Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support shared sessions #55

Merged
merged 4 commits into from
Jul 29, 2024
Merged

Conversation

filchristou
Copy link
Collaborator

Redesign for #45

Novelties and summary

In #45 we discussed that it would be better to use the underlying protocol to change the module.
It became clear that this is easier done if shared sessions were supported.
So instead of directly solving my problem I tried to introduce multiple shared sessions.
As a result, module update for Pluto is now also more straightforward.

1. session_id

Basically I introduced a session_id that characterizes every ServerSideSession.
The server holds all the sessions in a Dict{UUID, ServerSideSession}.
Everytime a client connects, it send the session UUID it wants to connect to as the first thing.

2. SessionSideSession

Now SessionSideSession is not described by a single Socket but a vector of Sockets, which is kept up to date.

3. @remoterepl

I need a way to still connect remotely and change the module, i.e. %module ..., without having a REPL available (like Pluto).
@remote support only command execution. Therefore I introduced @remoterepl which gives access to the REPL magic commands for all clients.
The question is.. why not collapse @remote to @remoterepl, since the second is a superset ?

4. Structural changes

I put all the using dependencies on the src/RemoteREPL.jl file.
I find it much more readable this way and it also follows community conventions.

Pluto side

Pluto notebook should look something like this gist.
I am not sure how to avoid the updating every second, but don't focus on that; this is a Pluto issue/discussion.
Basically Pluto just uses @remoterepl to upgrade the module in a specific session.
The core code boils down to

server = Sockets.listen(Sockets.localhost, 27765)

@async serve_repl(server)

session_id = UUID("f03aec15-3e14-4d58-bcfa-82f8d33c9f9a")

con2server = connect_remote(Sockets.localhost, 27765; session_id=session_id)

# define function to get more recent module
takemodulesymbol() = Symbol("workspace#" ,PlutoRunner.moduleworkspace_count[])

# get module
mod = eval(takemodulesymbol())

# use `@remoterepl` to update module
@eval(@remoterepl $"%module $mod")

Test

Run the pluto notebook.
Open a julia kernel, add dependencies and do:

julia> using RemoteREPL, Sockets, UUIDs

julia> connect_repl(Sockets.localhost, 27765; session_id=UUID("f03aec15-3e14-4d58-bcfa-82f8d33c9f9a"))
[ Info: Using session id f03aec15-3e14-4d58-bcfa-82f8d33c9f9a
REPL mode remote_repl initialized. Press > to enter and backspace to exit.
"Prompt(\"julia@localhost:27765> \",...)"

julia@localhost:27765> temp
1

julia@localhost:27765> @__MODULE__
Main.var"workspace#1261"

julia@localhost:27765> @__MODULE__
Main.var"workspace#1265"

julia> # uncomment "# anewvar = 2"

julia> @remote(anewvar)
2

As you can see the remote client closely follows the changes in the Pluto notebook.

Let me know how that looks and I am happy to hear your thoughts!

@filchristou filchristou mentioned this pull request Jun 21, 2023
Copy link
Collaborator

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!!

I think there's a difference between parsing on remote vs local side - I don't think one is quite a superset of the other. Especially when it comes to interpolating local variables, as might be done if #53 is implemented.

I'd prefer to keep @remote as it is, and have a plain function rather than the @remoterepl macro for string-based REPL communication - there's really no need for this to be a macro if users are just passing a string.

I'd suggest we just make a shorter more convenient name for the run_remote_repl_command function and add some overloads to default to stdout and the current default connection.

Maybe call it remotecmd or something? So you can have

mod = "YourModule"
remotecmd("%module $mod")

(Actually I'd prefer that all the "magic" non Julia commands like %module had their own functions rather than going through the REPL string parser. So to set the module you could have RemoteREPL.remote_module(mod) or something. But that could be a separate PR.)

src/server.jl Outdated Show resolved Hide resolved
src/server.jl Outdated Show resolved Hide resolved
src/server.jl Outdated Show resolved Hide resolved
src/server.jl Outdated Show resolved Hide resolved
src/client.jl Show resolved Hide resolved
src/client.jl Outdated Show resolved Hide resolved
@filchristou
Copy link
Collaborator Author

Some summary for the force-push:

I generally agree with your comments, so I removed @remoterepl and I overloaded run_remote_repl_command

run_remote_repl_command

I think it's better to keep the same name and not unnecessarily introduce breaking changes.
Now it's a bit more welcoming to use as I exported it and added docs.
Because of the argument order, we need to manually overload by defining the other two methods.
However everything could be simplified to a single function signature like the following:

function run_remote_repl_command(cmdstr::String, conn::Connection=_repl_client_connection, out_stream::IO=Base.stdout)

but it would be breaking.

Maybe we could make a list of desired breaking changes and include this one (?)

mutex

I implemented the design as shown in #55 (comment)
Let me know if you have a different implementation in mind.

remote_module!

I also implemented the RemoteREPL.remote_module! function you were saying. The function is exported.
The Pluto notebook now also uses remote_module! instead.

docs/Manifest.toml

Currently CI.yml Pkg.develops the current directory.
Since the Manifest.toml is already checked in, we can have dev .. there.
This is also easier for local docs generation, where the CI.yml is unusable.
I also added some docs for session_id and the workflow with Pluto.

tests

It would be nice to develop some tests about the sessions sharing, but currently nothing is implemented.
If you don't consider it urgent, maybe in a following PR ?

@c42f
Copy link
Collaborator

c42f commented Jul 3, 2023

Because of the argument order, we need to manually overload by defining the other two methods

No problems. I like the current argument ordering anyway by analogy to other functions like show etc which take the IO as an initial argument. And also I feel like the Connection is the "primary argument" so should go first.

but it would be breaking.

Well run_remote_repl_command is not documented anywhere as part of the public API. So I think it would be ok to rename it. Also it would probably be ok to just release version 1.0 if we were worried. This package seems somewhat stable at this point 🤷‍♀️

[Tests] If you don't consider it urgent, maybe in a following PR ?

Mmm I'd prefer tests added as part of the PR. Tests can kind of be a pain but they're so essential in the long run. It's easier to write them when the new stuff is fresh in mind and to be motivated to do it if you want to use the feature they are testing.

Testing shouldn't need to be super extensive for this feature. Just a few tens of lines should be enough and the client-server setup already exists in runtests.jl.

@filchristou
Copy link
Collaborator Author

It would probably be ok to just release version 1.0 if we were worried.

before going to 1.0, it would be nice to talk a bit about #56

Update summary

  1. Renamed run_remote_repl_command to remotecmd
  2. Added some further docs
  3. reordered remote_module! function's arguments to resemble to remotecmd etc.
  4. Added tests. Also the test server now uses TestEnv, because otherwise the registry version of RemoteREPL is used rather the developed d9b8369#diff-3b9314a6f9f2d7eec1d0ef69fa76cfabafdbe6d0df923768f9ec32f27a249c63R117
    I think I successfully added TestEnv in the dependencies of the GHAction-julia, and shouldn't be a problem d9b8369#diff-3ab46ee209a127470fce3c2cf106b1a1dbadbb929a4b5b13656a4bc4ce19c0b8R76
    Could you try running the test please ?

@filchristou
Copy link
Collaborator Author

@c42f Hi Claire. If there are no issues with the PR, could you please merge it and maybe make a new release ?

@filchristou
Copy link
Collaborator Author

@c42f Are there any blocking points? Do you think we can merge that soon?

@c42f c42f force-pushed the sharedsessions branch from d9b8369 to 14ddf46 Compare July 16, 2024 04:17
@c42f
Copy link
Collaborator

c42f commented Jul 16, 2024

I've fixed the conflicts by rebasing and pushing to the pull request branch.

That also seems to have triggered CI. Can you have a look at what's gone wrong there?

@filchristou
Copy link
Collaborator Author

ouf. that's gonna be hard. Locally using TestEnv I get the tests passed, but with the ] test something fails with a different error from the CI. Also sometimes (mostly when I exit the kernel) I get a non fatal error printing the following

error in running finalizer: Base.IOError(msg="write: broken pipe (EPIPE)", code=-32)
uv_write at ./stream.jl:1066
unsafe_write at ./stream.jl:1120
write at ./strings/io.jl:248 [inlined]
writeheader at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:703
serialize at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:776
unknown function (ip: 0x7fa575ae73b9)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
close at /u/home/wima/fchrstou/GitProjects/RemoteREPL.jl/src/client.jl:166
unknown function (ip: 0x7fa575ad01e5)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
run_finalizer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:454
ijl_atexit_hook at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/init.c:299
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:732
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7fa581eb1249)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)

I will have to look closer to that when I find some time. Btw thanks for making me a collaborator :)

@c42f
Copy link
Collaborator

c42f commented Jul 17, 2024

Oof. I suspect I've seen that finalizer error before. Probably not your fault! It's hard to clean up the ssh tunnel cleanly when the process exists. (Unless there's a way to hook the REPL shutdown more cleanly?)

This is a relatively hard package to test.

@filchristou filchristou mentioned this pull request Jul 23, 2024
@filchristou
Copy link
Collaborator Author

As said. The problem was very obscure and I still don't understand what was the issue. include("test/runtests.jl") was working but ] test was not. I just removed all the TestEnv stuff and now it seems that the tests consistently pass. @c42f can we merge ?

Copy link
Collaborator

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I found a little time to look through this again.

My main comment is that the server and client should always exchange the protocol header (including PROTOCOL_VERSION) before they start doing anything else. This is to allow some comprehensible error message for users when both sides of the connection are on different RemoteREPL versions.

src/client.jl Outdated
@@ -140,6 +139,8 @@ function setup_connection!(conn::Connection)
namespace=conn.namespace)
end
Sockets.nagle(socket, false) # Disables nagles algorithm. Appropriate for interactive connections.
# transmit session id
serialize(socket, conn.session_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to go after verify_header() and you should increment PROTOCOL_VERSION if you're making an incompatible change to the RemoteREPL protocol (such as adding a session ID)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. done that. send_header and verify_header now come first.

It's an incompatible change from protocol perspective, but all user code is still backwards compatible. If no UUID is passed a random will be generated. I increased the PROTOCOL_VERSION.

Equivalent to using the `%module` magic.
"""
remote_module!(conn::Connection, mod::Module) = remotecmd(conn, Base.stdout, "%module $(mod)")
remote_module!(mod::Module) = remotecmd(_repl_client_connection, Base.stdout, "%module $(mod)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally these would not use string concatenation because the %module just needs to be parsed again on the client side in order to send the :in_module command. But doing that cleanly that would require a bit of refactoring to pull out the last couple of paragraphs of remotecmd() into a separate internal function dedicated to sending already-parsed command tuples. So I leave it to you to decide what to do about this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see that. I don't think it's too much work but I would like it to keep it like that for this PR. I made an issue to not forget #69

src/server.jl Show resolved Hide resolved
@filchristou
Copy link
Collaborator Author

@c42f pushed an update. magic protocol is checked first now.

Copy link
Collaborator

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to go, I think

@filchristou filchristou merged commit c273759 into JuliaWeb:main Jul 29, 2024
4 checks passed
@filchristou
Copy link
Collaborator Author

Should we make a v0.3.0 release ? If I have permissions and it's fine by you I can make one.

@c42f
Copy link
Collaborator

c42f commented Jul 30, 2024

This PR seems to have broken the SSH tunnel tests :-(

They work for me before this PR, but not after.

I can't remember exactly, but I think the SSH tests are not enabled in CI because getting the SSH server working in the CI environment wasn't finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants