Support both local kernels and remote (via kernel gateway) at the same time. #1187

Open
ojarjur opened this issue Jan 25, 2023 · 14 comments

@ojarjur
Contributor

ojarjur commented Jan 25, 2023

Problem

I love the option of connecting a Jupyter server to a kernel gateway, but it is currently an all-or-nothing experience; either all of your kernels run locally or they all run using the kernel gateway.

I would like it if I could pick either local or remote when I am selecting a kernelspec.

For example, I want to be able to have two notebooks open in JupyterLab and run one of them using a kernel started by my local server while the other uses a kernel started by a kernel gateway.

Proposed Solution

It is possible to solve this by using some sort of intermediary proxy as a kernel gateway, which is responsible for deciding whether to run each kernel locally or remotely.

In fact, I have a proof-of-concept implementation of this and was able to verify that it works as you might hope.

However, this approach has a big drawback: you then have to run two separate instances of the Jupyter server locally, one for creating kernels and one for connecting to this proxy, and those two servers have to use different configs (telling them where to run their kernels).

It would be much simpler (both in the sense of being cleaner and easier to use) if the jupyter server was able to do this switching natively instead of relying on an intermediary proxy.

Additional context

The proof of concept I linked to above is very specific to my use case and not a general solution to this problem (e.g. it assumes a specific, hard-coded form of auth, etc).

The general approach, however, should be reusable and would work for a solution built into jupyter-server (see the sketch after this list):

  1. For kernelspecs, take both the local and remote kernelspecs and combine them into a unified view, adding a prefix onto each kernelspec name to identify whether it is local or remote.
  2. For creating kernels, figure out if the kernelspec is local or remote, strip off the prefix, and then forward the request to the corresponding backend.
  3. For switching kernels, send a delete request to the old backend and then a create request to the new one.
  4. Keep a map in memory from kernel IDs to the backend that holds them.
  5. Forward all other kernel requests to the corresponding backend.
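
To make the routing concrete, here is a minimal sketch of the approach. Everything in it is illustrative rather than lifted from the proof of concept: the local and remote objects are hypothetical stand-ins for clients that speak the kernels and kernelspecs REST APIs.

# Hypothetical prefixes used to disambiguate kernelspec names (step 1).
LOCAL_PREFIX = "local-"
REMOTE_PREFIX = "remote-"

# Step 4: map from kernel ID to the backend that owns the kernel.
kernel_backends = {}

def list_kernelspecs(local, remote):
    """Step 1: merge both backends' kernelspecs under prefixed names."""
    merged = {}
    for name, spec in local.list_kernelspecs().items():
        merged[LOCAL_PREFIX + name] = spec
    for name, spec in remote.list_kernelspecs().items():
        merged[REMOTE_PREFIX + name] = spec
    return merged

def start_kernel(local, remote, prefixed_name):
    """Step 2: strip the prefix and forward creation to the matching backend."""
    if prefixed_name.startswith(LOCAL_PREFIX):
        backend, name = local, prefixed_name[len(LOCAL_PREFIX):]
    else:
        backend, name = remote, prefixed_name[len(REMOTE_PREFIX):]
    kernel_id = backend.start_kernel(name)
    kernel_backends[kernel_id] = backend
    return kernel_id

def forward_request(kernel_id, request):
    """Step 5: all other kernel requests go to the backend that owns the kernel."""
    return kernel_backends[kernel_id].handle(request)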
@kevin-bates
Member

Hi @ojarjur - thank you for opening this issue. I'm unable to respond to this right now, but I have spent a fair amount of time thinking about this and sharing some of those ideas with others, and I can see your general approach is similar. (I don't think number 4 is necessary, as that functionality should "just work" via the existing Change Kernel behavior.)

I hope to be able to circle back to this in a few days (hopefully sooner).

@kevin-bates
Member

The general approach I've been mulling over is to introduce the notion of a Kernel Server, where a single jupyter server could be configured with one or more Kernel Servers. A Kernel Server would consist of a MappingKernelManager, a KernelSpecManager and, in some fashion, a KernelWebsocketHandler. In essence, a KernelServer is a GatewayClient, with a special "local" KernelServer that exists by default.

I'm not sure you had joined the Server/Kernels Team Meeting by this time, but I raised the question about how traits could be configured to apply to multiple instances where each instance had potentially different values. Since traits are class-based, specifying configuration settings for each KernelServer (besides the "local" KernelServer, since it really wouldn't require any configuration for B/C purposes) would probably require some kind of "config loader" mechanism that I'm sure we could tackle. (Thinking about a kernel_servers subdirectory in jupyter_server_config.d that contains a set of "named" files.)

Each KernelServer has a name that, as you also intimated, would be used to prefix that server's kernelspecs. We may also want to adjust the Display Names since these are what the user sees and these would require uniqueness.

The handlers would essentially do what they do today but call into the KernelServers (rather than MappingKernelManager) and the KernelServers would act as a broker taking a kernel name, locating its prefix in the set of KernelServers, and forwarding the request to that KernelServer.

As you also intimated, a second index, keyed by kernel_id, would resolve to the applicable KernelServer instance. When a WebSocket request (or any lifecycle request) is submitted using a kernel_id, the KernelServer would be identified and that request forwarded to that kernel server's KernelWebsocketHandler, or server, etc.

So I think this becomes a matter of the following (at a high level of course):

  1. Introduce a brokering layer - KernelServers - which is essentially injected in front of the MappingKernelManager (see the sketch after this list).
  2. Address how specific configurations can be loaded into the KernelServers. Thinking that this would iterate the previously mentioned files in jupyter_server_config.d/kernel_servers and instantiate instances when KernelServers is instantiated - in addition to the default "local" KernelServer (which could be configured off for deployments that don't want to support any local kernels).
  3. Update the handlers to forward the requests to the KernelServers broker. This may be as easy as supporting the same methods, requiring minimal changes.
  4. We'd probably want to introduce KernelSpecCaching (could port over EG's) since these would get pounded. This would be a great place to introduce events so the front end isn't requesting these every 10 seconds like it does today.
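
To make that concrete, here is a rough sketch of the brokering layer. Every name below is hypothetical - none of these classes exist in jupyter_server today - and the config-file handling and prefix scheme are placeholders for whatever we settle on.

from pathlib import Path

class KernelServer:
    """Hypothetical per-destination wrapper; essentially a GatewayClient."""

    def __init__(self, name, url=None):
        self.name = name
        self.url = url  # None for the default "local" server

    @classmethod
    def from_config_file(cls, path):
        # Placeholder: load the named config file and build an instance.
        return cls(name=Path(path).stem)

    def start_kernel(self, spec_name):
        ...  # forward to the local manager or the remote REST API

class KernelServers:
    """Broker injected in front of the MappingKernelManager (item 1)."""

    def __init__(self, config_dir="jupyter_server_config.d/kernel_servers"):
        # Item 2: the default "local" server plus one instance per named file.
        self.servers = {"local": KernelServer("local")}
        for f in Path(config_dir).glob("*"):
            self.servers[f.stem] = KernelServer.from_config_file(f)
        self.kernels = {}  # kernel_id -> owning KernelServer

    def start_kernel(self, kernel_name):
        # Kernelspec names carry a server-name prefix, e.g. "remote1/python3".
        server_name, _, spec_name = kernel_name.partition("/")
        server = self.servers[server_name]
        kernel_id = server.start_kernel(spec_name)
        self.kernels[kernel_id] = server
        return kernel_id

    def server_for(self, kernel_id):
        # WebSocket and lifecycle requests resolve their owner by kernel_id.
        return self.kernels[kernel_id]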

At any rate, I think we're on the same page here. At a high level, this is doable and would be a useful addition. By default, the server would behave just as today - supporting only local kernels. I think we could also accomplish backward compatibility for single-gateway configs keying off --gateway-url or the single configuration items in the server config file itself.

Thoughts?

@ojarjur
Contributor Author

ojarjur commented Jan 31, 2023

@kevin-bates thanks for the detailed and thoughtful response, and for bringing the topic up in the weekly meeting.

Also, thanks for the class references, that will help if I try to prototype this by converting my existing proof-of-concept into a Jupyter server extension.

The general approach I've been mulling over is to introduce the notion of a Kernel Server, where a single jupyter server could be configured with one or more Kernel Servers. A Kernel Server would consist of a MappingKernelManager, a KernelSpecManager and, in some fashion, a KernelWebsocketHandler. In essence, a KernelServer is a GatewayClient, with a special "local" KernelServer that exists by default.

I like this approach and I had considered something along these lines but I wasn't sure if the jupyter-server project was the right home for that level of configurability.

Conceptually, if we take it as a given that we always want to support local kernels, then this can be viewed as an instance of the "zero, one, or infinitely-many" design question in terms of kernel gateways... by default there are zero kernel gateways supported, you can currently opt into supporting one kernel gateway, and the approach you've described extends that to infinitely-many kernel gateways.

The "infinitely-many" case is the most general, but supporting it inside of jupyter-server itself opens up a huge dimension of complexity that can be hard to manage.

The question of configuration that you mentioned is one example of this complexity, but it's not the only one. There's also complexity in how multiple backends are managed, and I don't think that a single approach to that will satisfy all users.

For example, the simplest approach would be a fixed set of backends. However, I suspect most users would be better served by having some sort of automated discovery mechanism that dynamically finds all of the Jupyter servers available to them. That's inherently specific to the user's environment so we can't build a one-size-fits-all solution to it.

After you mentioned this at the weekly meeting I had some time to mull it over, and wanted to present another option:

What if, instead of a set of static configs, we defined a base class for providing the set of KernelServers? It could have just a single method that returns a set of KernelServer instances.

We could provide canned implementations of this class for the local-only use case, the one-remote-only use case, and the one-local-plus-one-remote use case.

Then, users who wanted to use arbitrarily many backends could provide their own implementation of this base class that took advantage of what they know about their particular environment (e.g. knowing how to look up backends and which configs are common to all of them).
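
As a minimal sketch of that option (the class and method names are placeholders, and KernelServer stands in for whatever per-destination wrapper we end up with):

from abc import ABC, abstractmethod

class KernelServerProvider(ABC):
    """Supplies the set of KernelServers a Jupyter server should use."""

    @abstractmethod
    def kernel_servers(self):
        """Return the KernelServer instances to multiplex across."""

class LocalOnlyProvider(KernelServerProvider):
    """Canned implementation matching today's default behavior."""

    def kernel_servers(self):
        return [KernelServer("local")]

class LocalAndGatewayProvider(KernelServerProvider):
    """Canned implementation for one local plus one remote server."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url

    def kernel_servers(self):
        return [KernelServer("local"),
                KernelServer("remote", url=self.gateway_url)]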

What do you think of that option? Would that still line up with what you wanted?

Each KernelServer has a name that, as you also intimated, would be used to prefix that server's kernelspecs. We may also want to adjust the Display Names since these are what the user sees and these would require uniqueness.

Yes, you are right. I forgot to mention it, but that is exactly what I've been doing. I add a suffix on each display name which defaults to " (local)" for local kernelspecs and " (remote)" for remote ones. I wanted these suffixes to be configurable so that the user can change them to their local language.
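
For illustration, those suffixes could be exposed as configurable traits along these lines (the class name is hypothetical; only the traitlets usage is real):

from traitlets import Unicode
from traitlets.config import Configurable

class KernelSpecRenaming(Configurable):
    """Hypothetical holder for the display-name suffixes."""

    local_suffix = Unicode(
        " (local)",
        help="Suffix appended to local kernelspec display names.",
    ).tag(config=True)

    remote_suffix = Unicode(
        " (remote)",
        help="Suffix appended to remote kernelspec display names.",
    ).tag(config=True)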

The handlers would essentially do what they do today but call into the KernelServers (rather than MappingKernelManager) and the KernelServers would act as a broker taking a kernel name, locating its prefix in the set of KernelServers, and forwarding the request to that KernelServer.

As you also intimated, a second index, keyed by kernel_id, would resolve to the applicable KernelServer instance. When a WebSocket request (or any lifecycle request) is submitted using a kernel_id, the KernelServer would be identified and that request forwarded to that kernel server's KernelWebsocketHandler, or server, etc.

That sounds good to me.

@kevin-bates
Member

Hi @ojarjur - thanks for the response. I think there's some alignment here (in the majority) but I'm hoping we could perhaps have a call together because I believe there may be some "terminology disconnects" that I'd like to iron out. Could you please contact me via email (posted on my GH profile) and we can set something up?

In the meantime, I would like to respond to some items.

if we take it as a given that we always want to support local kernels

I don't think we should take this for granted. Users configuring the use of a gateway today essentially disable their local kernels and I don't think we should assume that every installation tomorrow will want local kernel support. Many will, as evidenced by those that have asked those kinds of questions, but I think there's some value to operators to know there won't be kernels running locally. Nevertheless, this is still a zero, one, or many proposition with respect to gateway servers.

One thing I think we can take for granted is that a given Jupyter Server deployment will have at least one server against which kernels can be run. I also think we can assume that that server (against which kernels are run) will be managed via the existing api/kernels (and api/kernelspecs) REST API.

I suspect most users would be better served by having some sort of automated discovery mechanism that dynamically finds all of the Jupyter servers available to them. That's inherently specific to the user's environment so we can't build a one-size-fits-all solution to it.

I'd like to better understand how this discovery mechanism would work. (I'm assuming by "Jupyter servers" [and elsewhere "backends"] you mean "Gateway servers" - or "Kernel servers".) In this particular instance, I think operators would prefer explicit configurations that are loaded at startup, so they know exactly where their requests are going. But, again, I may not be understanding how discovery would work.

users who wanted to use arbitrarily many backends could provide their own implementation of this base class that took advantage of what they know about their particular environment (e.g. knowing how to look up backends and which configs are common to all of them)

I agree that any class we provide should be extensible and substitutable, provided there's a well-known interface that the server can interact with. However, given the assumption above that all kernel servers will honor the existing REST APIs, I think a single implementation would be sufficient (at least for a vast majority of cases). Today's GatewayClient is essentially a class that communicates with a server to manage kernels. Yes, each destination would require a separate configuration, but, from an implementation standpoint, I think it's relatively straightforward. Perhaps I'm not understanding your comment correctly, but I would definitely like to avoid the need for operators to have to implement their own "KernelServer" class in order to connect to a different server via the REST API.

e.g. knowing how to look up backends and which configs are common to all of them

I think this is driving at the discovery stuff, and I could see an implementation of KernelsServers (note the plurality) that "discovers" its KernelServer instances, not by reading files in a directory and loading each instance, but by discovering them in some other way, like hitting a clearinghouse (DB) of sorts, for example.
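
For example, such an implementation might look like the following, where the registry URL and its response shape are pure assumptions and KernelServer is the hypothetical per-destination wrapper from the earlier sketch:

import json
from urllib.request import urlopen

class DiscoveringKernelServers:
    """Builds its KernelServer instances from a registry, not config files."""

    def __init__(self, registry_url):
        self.registry_url = registry_url

    def kernel_servers(self):
        # Ask the clearinghouse which kernel servers are available.
        with urlopen(self.registry_url) as response:
            entries = json.load(response)
        return [KernelServer(e["name"], url=e["url"]) for e in entries]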

This is an interesting conversation. Please reach out to me via my email. If we're in widely separate TZs I'd still like to have the conversation.

If others are interested in joining our sidebar, please let me know.

@Zsailer
Member

Zsailer commented Jan 31, 2023

Just got back from parent leave and catching up on this thread.

If y'all have a meeting, I'd love to join (or at least see some notes! 😎).

@kevin-bates
Member

@ojarjur - could you please send Zach an invite for our 2 pm (PST) chat today?

ojarjur added a commit to ojarjur/jupyter_server that referenced this issue Apr 29, 2023
This change adds support for kernel spec managers that rename kernel specs
based on configured traits.

This is a necessary step in the work to support multiplexing between
multiple kernel spec managers (jupyter-server#1187), as we need to be able to rename
kernel specs in order to prevent collisions between the kernel specs
provided by multiple kernel spec managers.
@waaffles

waaffles commented Dec 9, 2024

Hey, all! I've been trying to find the answer to this problem for a few months myself and would love to be able to use the local + remote scenario described here. Personally, I'm not in a place where I can contribute to the work involved here due to my extreme lack of knowledge in this area 🥲 But it's been about a year and a half since this was last discussed, so I'm hoping someone here has had more time to either work around this or think about how to better handle this within jupyter_server?

I see PR #1267 is still open. Is it integral for getting us across the finish line? I also couldn't tell from reading that PR what was blocking it from being merged.

@kevin-bates
Member

Thanks for posting this @waaffles. I'm sorry I'm unable to help with this (although I certainly have opinions 😄). @ojarjur (I hope you're doing well!) - do you still have momentum for this feature? I'm sure David and yourself aren't the only ones that would like to enable multiple kernel "environments"(?).

@lresende lresende pinned this issue Dec 9, 2024
@lresende lresende unpinned this issue Dec 9, 2024
@ojarjur
Contributor Author

ojarjur commented Dec 12, 2024

@waaffles @kevin-bates Sorry for the delay in responding; I've been pretty busy with other work.

I largely went silent here for a long time because I wound up building a minimum-viable solution via a server extension.

That one provides a config helper that is tailored to Google's Cloud Platform, but the rest of the code is actually generic and usable in any other environment.

You would just install that package and then add something like this to your Jupyter config python file:

from jupyter_server.services.sessions.sessionmanager import SessionManager

from kernels_mixer.kernelspecs import MixingKernelSpecManager
from kernels_mixer.kernels import MixingMappingKernelManager
from kernels_mixer.websockets import DelegatingWebsocketConnection

# Merge local and remote kernelspecs into a single unified view.
c.ServerApp.kernel_spec_manager_class = MixingKernelSpecManager

# Route kernel creation and lifecycle requests to the local or remote manager.
c.ServerApp.kernel_manager_class = MixingMappingKernelManager
c.ServerApp.session_manager_class = SessionManager

# Route websocket traffic to whichever connection class each kernel needs.
c.ServerApp.kernel_websocket_connection_class = DelegatingWebsocketConnection

# Add configs for your gateway client below...

This is far from ideal for multiple reasons, but it was good enough for the remaining work to fall off my radar.

The big limitations of this are:

  1. It doesn't support more than one gateway client at a time.
  2. It doesn't do any sort of renaming of kernelspecs; if a local and remote kernel have the same name, then the remote one is ignored.

However, fixing both of those becomes a much larger-scoped change.

I would very much like to get that large change into Jupyter server itself, but the size of the change makes that a large amount of work and I have had difficulty finding the time to do it.

What this probably needs next is a group discussion on the overall architecture so that we can build a consensus on the right approach.

I joined the Jupyter Server Community meeting today to ask about the best way to do that, and @Zsailer suggested using an issue in the team-compass repo to drive that discussion.

I will try to write one up today and then will link it to this issue.

@lresende
Member

@ojarjur, I believe that adopting this approach will eventually lead to the deprecation of the current gateway client. To minimize the impact on the Jupyter Server's architecture, we should explore more modular and isolated solutions, such as leveraging kernel providers. This would help ensure that any changes to the server's behavior remain minimal and well-contained.

However, a significant limitation with the current kernel provider implementation is its lack of flexibility in defining different communication mechanisms (e.g., websockets versus ZMQ) based on the kernel provider in use. This constraint makes it challenging to support scenarios where local kernels use ZMQ connections while remote kernels communicate over websockets. Addressing this limitation will be critical to achieving a seamless integration and maintaining compatibility across diverse kernel configurations.

@Zsailer
Member

Zsailer commented Dec 12, 2024

@lresende you mean kernel provisioners instead of "providers", correct?

@lresende
Member

@lresende you mean kernel provisioners instead of "providers", correct?

Sorry, @Zsailer is correct.

@ojarjur
Contributor Author

ojarjur commented Dec 12, 2024

@lresende

I believe that adopting this approach will eventually lead to the deprecation of the current gateway client.

Can you explain your reasoning here? All of my implementations of this concept have relied on the GatewayClient, so I don't understand how this would lead to that getting deprecated.

The only change I can anticipate to the GatewayClient would be to change it from being a singleton.

To minimize the impact on the Jupyter Server's architecture, we should explore more modular and isolated solutions, such as leveraging kernel providers.

I'm having difficulty imagining what you are proposing.

If you are suggesting that there is no gateway server at all, and the Jupyter server directly provisions remote resources, then that approach would not work for my use case because we have kernel gateway servers that we want to connect to.

Alternatively, are you suggesting that kernel provisioners somehow connect to kernel gateway servers? If so, I don't see how that would work when the server needs to list the remote kernelspecs.

However, a significant limitation with the current kernel provider implementation is its lack of flexibility in defining different communication mechanisms (e.g., websockets versus ZMQ) based on the kernel provider in use. This constraint makes it challenging to support scenarios where local kernels use ZMQ connections while remote kernels communicate over websockets. Addressing this limitation will be critical to achieving a seamless integration and maintaining compatibility across diverse kernel configurations.

I was able to address that already.

I simply defined an intermediary implementation of BaseKernelWebsocketConnection that routes to either a wrapped ZMQChannelsWebsocketConnection instance or a GatewayWebSocketConnection instance, depending on the type of the KernelManager. The easiest way to do that is to make each KernelManager class include a trait for its websocket connection class, and fall back to ZMQChannelsWebsocketConnection if that trait isn't defined.
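
For illustration, a simplified sketch of that delegation follows. The jupyter_server import paths are from the 2.x websocket connection API, but the trait lookup and the constructor arguments for the wrapped class are assumptions, not the exact kernels_mixer implementation.

from jupyter_server.gateway.connections import GatewayWebSocketConnection
from jupyter_server.services.kernels.connection.base import (
    BaseKernelWebsocketConnection,
)
from jupyter_server.services.kernels.connection.channels import (
    ZMQChannelsWebsocketConnection,
)

class DelegatingWebsocketConnection(BaseKernelWebsocketConnection):
    """Routes websocket traffic to a connection chosen per kernel manager."""

    @property
    def delegate(self):
        if not hasattr(self, "_delegate"):
            # Hypothetical trait on each KernelManager naming its websocket
            # connection class; plain local kernels fall back to ZMQ, while a
            # remote manager would set it to GatewayWebSocketConnection.
            klass = getattr(self.kernel_manager, "websocket_connection_class",
                            ZMQChannelsWebsocketConnection)
            self._delegate = klass(parent=self.parent,
                                   websocket_handler=self.websocket_handler,
                                   config=self.config)
        return self._delegate

    async def connect(self):
        await self.delegate.connect()

    async def disconnect(self):
        await self.delegate.disconnect()

    def handle_incoming_message(self, incoming_msg):
        self.delegate.handle_incoming_message(incoming_msg)

    def handle_outgoing_message(self, stream, outgoing_msg):
        self.delegate.handle_outgoing_message(stream, outgoing_msg)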

@waaffles

waaffles commented Dec 13, 2024

Thanks @ojarjur so much for your follow-up, the contributions, and for sharing the extension you had made! I came across your original GitHub project but not the extension! It also solves my current use case for now, where we only need to connect to a single gateway while keeping local kernels around, without having to battle with toggling. Thanks for also getting it on the docket with the jupyter compass discussions. I'm gonna continue to keep an eye on this to hopefully eventually be able to adopt the blessed path moving forward. I don't know in what capacity I can help here as I'm still learning the ecosystem, but I hope to be involved in whatever capacity I can.

Thanks as well to @Zsailer, @lresende, and @kevin-bates for a lot of the insight along the way here 🙏
