
Adding Zeppelin support #206

Open
cmenguy opened this issue Sep 14, 2018 · 2 comments
Labels
idea An idea which is open to discussion rather than a particular issue or bug

Comments

@cmenguy

cmenguy commented Sep 14, 2018

Hi, we are considering adopting Papermill for parameterizing and running our notebooks, but the main thing stopping us is the lack of support for Zeppelin. Our notebooks are a mix of Jupyter and Zeppelin, and having the ability to run both with the same library would be invaluable.

I was wondering whether this has been discussed before, and whether it would be a good fit for Papermill.

If this is something that would be of interest, I would be happy to try contributing something there.

@mpacer
Member

mpacer commented Sep 14, 2018

So from what I understand, this is slightly tricky because of the way Zeppelin thinks about a notebook file. Under the hood, each note is stored as a note.json file. I was not able to track down a spec for that file, so we may have no guarantees about what we can expect to find there.

Because it doesn't seem to have a standard, versioned spec that we can adhere to, it can be tricky to parameterise. It would likely require creating a library like nbformat for Zeppelin notebooks that would plug into what we're currently doing with nbformat to parameterise Jupyter notebooks.
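
A minimal sketch of that loading step, assuming a note.json has a top-level `paragraphs` list whose entries keep their source in `text`, with an optional interpreter directive such as `%pyspark` on the first line. That layout is an assumption, since there is no published spec to check it against. The sketch turns a note into a plain dict shaped like an nbformat v4 notebook, which `nbformat.from_dict` could then wrap; `split_interpreter` and `zeppelin_to_nbformat_dict` are hypothetical names.

```python
# Sketch: convert a parsed Zeppelin note.json into an nbformat-v4-shaped dict.
# ASSUMPTION: notes have a top-level "paragraphs" list and each paragraph keeps
# its source in "text"; there is no versioned spec guaranteeing this layout.


def split_interpreter(text):
    """Separate a leading '%interpreter' directive line from the paragraph body."""
    first, _, rest = text.partition("\n")
    if first.startswith("%"):
        return first[1:].strip(), rest
    return None, text


def zeppelin_to_nbformat_dict(note):
    """Return a dict shaped like an nbformat v4 notebook (hypothetical helper)."""
    cells = []
    for para in note.get("paragraphs", []):
        interpreter, source = split_interpreter(para.get("text") or "")
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            # Keep Zeppelin-specific fields so the note can be written back later.
            "metadata": {"zeppelin": {"id": para.get("id"),
                                      "title": para.get("title"),
                                      "interpreter": interpreter}},
            "outputs": [],
            "source": source,
        })
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {"zeppelin": {"id": note.get("id"), "name": note.get("name")}},
        "cells": cells,
    }


if __name__ == "__main__":
    note = {"id": "demo", "name": "demo",
            "paragraphs": [{"id": "p1", "text": "%pyspark\nx = 1"}]}
    nb = zeppelin_to_nbformat_dict(note)
    print(nb["cells"][0]["source"])  # prints: x = 1
```

Stashing the Zeppelin ids and interpreter in cell metadata keeps enough information around to write the note back out after execution.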

Additionally, I'm not sure how Zeppelin thinks about metadata, so while it might be possible to apply tags to cells, we may need to figure out a different convention for marking the cell that holds the parameters.
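
Papermill finds the parameters cell by looking for a `parameters` tag in a Jupyter cell's metadata. For Zeppelin, one hypothetical convention would be to treat a paragraph as the parameters cell if its title is `parameters`, or if its first line after any `%interpreter` directive is the comment `# parameters`. Both the convention and the helper names below are assumptions for illustration.

```python
# Sketch of parameter-cell detection for both formats.
# Jupyter: papermill's convention, a "parameters" entry in metadata["tags"].
# Zeppelin: HYPOTHETICAL convention, a paragraph titled "parameters" or one whose
# first line (after any %interpreter directive) is the comment "# parameters".

PARAMETERS_MARKER = "parameters"


def is_jupyter_parameters_cell(cell):
    return PARAMETERS_MARKER in cell.get("metadata", {}).get("tags", [])


def is_zeppelin_parameters_paragraph(paragraph):
    if (paragraph.get("title") or "").strip().lower() == PARAMETERS_MARKER:
        return True
    lines = (paragraph.get("text") or "").splitlines()
    if lines and lines[0].startswith("%"):
        lines = lines[1:]  # skip an interpreter directive such as "%pyspark"
    return bool(lines) and lines[0].strip().lower() == "# " + PARAMETERS_MARKER


def find_parameters_index(items, predicate):
    """Return the index of the first cell/paragraph matching predicate, or None."""
    for index, item in enumerate(items):
        if predicate(item):
            return index
    return None
```

A comment-based marker has the advantage of surviving any export that drops paragraph titles, at the cost of being easier to type by accident.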

@MSeal
Member

MSeal commented Sep 14, 2018

Hi, and welcome to the 🎉 @cmenguy!

I think there's definitely room in papermill for processing Zeppelin notebooks. As M mentioned, Zeppelin uses a different format than Jupyter, so a few components would need some abstraction upgrades.

The first abstraction that needs adjusting is the note format. We'd need something to load the note.json into nbformat or an nbformat-like object for processing. Parameterization would then need to apply to both notebook formats in a similar manner, or, if an nbformat-like in-memory representation is ruled out, parameterization itself would need to become more abstract. Either way, this might require moving parameterization to a more plug-and-play pattern, like we already use for other components of papermill.
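
One way to picture plug-and-play parameterization is a registry that maps a format name to a function injecting parameters into that format's document. Roughly like papermill, the Jupyter handler here inserts a cell tagged `injected-parameters` right after the `parameters` cell, or at the top if there is none. The registry, the handler names, and the Zeppelin convention of a paragraph titled `parameters` are all hypothetical, and values are rendered as Python assignments only.

```python
# Sketch of a plug-in style parameterizer registry. All names here
# (PARAMETERIZERS, register, inject_parameters) are HYPOTHETICAL, not papermill's API.

PARAMETERIZERS = {}


def register(fmt):
    """Decorator that registers an injection function for a notebook format."""
    def decorator(func):
        PARAMETERIZERS[fmt] = func
        return func
    return decorator


def render_python(parameters):
    # Python-flavoured assignments; repr() keeps strings and numbers as literals.
    return "\n".join(f"{name} = {value!r}" for name, value in parameters.items())


@register("jupyter")
def inject_jupyter(nb, parameters):
    cells = nb["cells"]
    # Insert after the cell tagged "parameters", or at the top if none exists.
    idx = next((i + 1 for i, c in enumerate(cells)
                if "parameters" in c.get("metadata", {}).get("tags", [])), 0)
    cells.insert(idx, {"cell_type": "code", "execution_count": None,
                       "metadata": {"tags": ["injected-parameters"]},
                       "outputs": [], "source": render_python(parameters)})
    return nb


@register("zeppelin")
def inject_zeppelin(note, parameters, interpreter="pyspark"):
    paragraphs = note["paragraphs"]
    # HYPOTHETICAL convention: a paragraph titled "parameters" marks the spot.
    idx = next((i + 1 for i, p in enumerate(paragraphs)
                if (p.get("title") or "").lower() == "parameters"), 0)
    paragraphs.insert(idx, {"title": "injected-parameters",
                            "text": f"%{interpreter}\n" + render_python(parameters)})
    return note


def inject_parameters(fmt, document, parameters):
    """Dispatch to whichever injector is registered for `fmt`."""
    try:
        handler = PARAMETERIZERS[fmt]
    except KeyError:
        raise ValueError(f"no parameterizer registered for format {fmt!r}") from None
    return handler(document, parameters)
```

Adding another format then means registering one more function rather than touching the core execution path.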

Then we'd want to extend #204 with an --engine=zeppelin option that wraps a Zeppelin executor. This would add a Java dependency for this particular engine, but that's OK: the engine can just raise an exception if a JRE isn't available.
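
The fail-fast JRE check could be as small as the following, assuming a `java` executable on `PATH` is a good enough signal that a JRE is available. `ZeppelinEngine` and `JavaNotFoundError` are hypothetical names; the class does not subclass papermill's real engine base class, and the execution step itself is left unimplemented.

```python
import shutil


class JavaNotFoundError(RuntimeError):
    """Raised when the (hypothetical) Zeppelin engine is requested without a JRE."""


def require_jre(which=shutil.which):
    """Return the path to the java executable, or raise JavaNotFoundError.

    `which` is injectable so the check can be exercised without a real JRE.
    ASSUMPTION: a `java` binary on PATH is treated as "JRE available".
    """
    java = which("java")
    if java is None:
        raise JavaNotFoundError(
            "the zeppelin engine needs a Java runtime on PATH; "
            "install a JRE or choose a different engine")
    return java


class ZeppelinEngine:
    """HYPOTHETICAL engine wrapper; the actual execution step is left open."""

    name = "zeppelin"

    def __init__(self, which=shutil.which):
        # Fail fast when the engine is constructed, not partway through a run.
        self.java = require_jre(which)

    def execute(self, note_path):
        raise NotImplementedError("wrap a Zeppelin executor here")
```

Checking at construction keeps the Java dependency confined to users who actually select this engine.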

And finally, we'd need to figure out how to handle the iorw patterns for a non-Jupyter document. This one would require a little more thought, but I don't see any reason we couldn't solve it there too.
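
For the iorw side, one option is to keep storage handlers format-agnostic (they only move text) and sniff the format after reading. This sketch assumes Jupyter files have top-level `nbformat` and `cells` keys and Zeppelin notes have a top-level `paragraphs` list; `load_document` and the injectable `read_text` handler are hypothetical stand-ins for papermill's real I/O handlers.

```python
import json
from pathlib import Path


def read_local(path):
    """Default handler: read text from the local filesystem."""
    return Path(path).read_text(encoding="utf-8")


def load_document(path, read_text=read_local):
    """Read through a format-agnostic handler, then sniff the notebook format.

    ASSUMPTIONS: Jupyter files carry top-level "nbformat" and "cells" keys;
    Zeppelin notes carry a top-level "paragraphs" list. `read_text` is a
    HYPOTHETICAL stand-in for a storage handler (local disk, S3, ...).
    """
    doc = json.loads(read_text(path))
    if not isinstance(doc, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    if "nbformat" in doc and "cells" in doc:
        return "jupyter", doc
    if isinstance(doc.get("paragraphs"), list):
        return "zeppelin", doc
    raise ValueError(f"{path}: unrecognised notebook format")
```

Because detection sits above the handler, remote storage backends would not need any Zeppelin-specific code.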

@MSeal MSeal added the idea An idea which is open to discussion rather than a particular issue or bug label Sep 14, 2018