Add filename and URL support for Image #1676

Merged (8 commits), Mar 28, 2018

pganssle
Contributor

Fixes #435.

This adds two convenience methods for pulling images directly from filenames (one alternate constructor and one "setter"-style method). It also adds support for the "url" format (per @SylvainCorlay's advice), which interprets the value as a URL.

I would also like to automatically detect when value has been set to a six.text_type (str/unicode) and automatically set format to 'url', but apparently validators fire after the type checking. Is there a hook in traitlets that fires before type checking?

I added some very basic tests, but I was not entirely sure how to mock the front-end and capture the messages - if someone can point me in the right direction I can improve these tests. I did not add any tests on the JS side - any advice on how to write tests for the front end would be appreciated.

@pganssle
Contributor Author

The test failures seem to occur because I was assuming that ipywidgets/widgets/tests/data would be available at test time. However, ipywidgets packages the tests directory as a submodule under widgets, and the data directory doesn't come along with it because it doesn't have an __init__.py in it.

I was under the impression that generally a tests directory is not included as part of the package (part of the source distribution, but not available from the library), is there a specific reason it was done this way?

For the moment I guess I can add the image as part of the package_data, but it might be worth fooling around with the way tests are handled so that the actual installed package can be slimmer?

@maartenbreddels
Member

I actually forgot during the sprint that I've done sth 'similar' with the VideoStream in ipywebrtc. Note that having a VideoStream doesn't mean we should not have a Video widget, but I'll touch on that in a different issue/PR.

(cc @jasongrout )
Maybe it's useful to have a similar API for Image, VideoStream (and Video, if we want a Video widget), if we want VideoStream to go upstream. I believe you two also discussed downloading the image. Something like this is useful to have for the VideoStream as well, since cross-origin videos are 'tainted': it is not possible to, say, stream them over WebRTC, and they will taint other canvases or streams if you start mixing them. If you download them and send them over as blobs you won't run into these issues.
Would an API like this make sense:

Image.from_file(...)
Video(Stream).from_file(...)

Image.from_url(...)
Video(Stream).from_url(...)

Although the from_url may not make it clear it will download it.

Member

@maartenbreddels left a comment

Looks good! Maybe some rewording from filename -> file.

@@ -32,3 +33,40 @@ class Image(DOMWidget, ValueWidget, CoreWidget):
    width = CUnicode(help="Width of the image in pixels.").tag(sync=True)
    height = CUnicode(help="Height of the image in pixels.").tag(sync=True)
    value = Bytes(help="The image data as a byte string.").tag(sync=True)

    @classmethod
    def from_filename(cls, filename, **kwargs):
Member

I'd say from_file

Contributor Author

That's a reasonable change. The main reason I didn't go with from_file was that it seemed like from_file would take a stream buffer object (like an open file), rather than a filename.

I suppose I could change it so that it's from_file and try to infer whether it's already an open file or a file path.

Member

You're right, that will be expected, and a nice feature :)

    @classmethod
    def from_filename(cls, filename, **kwargs):
        """
        Create an `Image` from a local filename.
Member

... from a local file. ?

var oldurl = this.el.src;
this.el.src = url;
if (oldurl) {
if (oldurl && typeof oldurl !== 'string') {
Member

quite tricky, but createObjectURL does create a string object. Maybe sth like:

if (oldurl && oldurl.startsWith('blob:')) {
 ...
}

Contributor Author

Hm... If it's just a string does it still need to be freed? Or is it just cast to a string when accessed through this.el.src?

Member

It is a string of the form, 'blob:<someid>', see #1685 how I solved it for the videostream.

Contributor Author

Yeah, I played around with it in the console and I see what's going on. The src attribute is a string that points to the location of the resource, which in this case is an in-memory blob.

I think the only downside of this startsWith('blob:') approach is that a user might create another blob that they don't want revoked, and pass that one's resource locator to the model, and when they change the image, we'll be freeing their blob, not our blob. This seems like an extreme edge case, though, and I'm sure there are simple enough workarounds that dealing with this edge case is not worth adding additional complexity.

I imagine that the "right" way to do this (which avoids this edge case) would be to set a flag on the model that indicates whether the current URL was allocated by update, and if so free it - rather than trying to infer this from the value at all.

Member

I don't think users have access to the blob:... strings, so that would indeed be rather unlikely.
The docs are a bit vague, but it seems to be doing reference counting behind the scenes, so I wouldn't worry about it.

@maartenbreddels
Member

I think it's better to include the tests and (provided the data for testing is small) to include this as well. In case of issues you can always ask users to run the tests.
Another option that may work is to install with -e in here

@pganssle
Contributor Author

@maartenbreddels I'm not sure it particularly helps if the user has the tests in the module unless there's a specific mechanism for test discovery I don't know about. I think this is not common, but it's less work for me to do it that way so I guess I'm not complaining too much ;)

@jasongrout jasongrout added this to the 7.x milestone Oct 19, 2017
@jasongrout
Member

@maartenbreddels, @pganssle - where are we on this?

@jasongrout
Member

@pganssle - I think I resolved the conflict with master correctly, but it wouldn't hurt to look at it.

@maartenbreddels
Member

As mentioned in #1685 it would be nice to have the API for Image and VideoStream with the static methods similar:

  • from_url will put the url in value
  • from_download will download and put the content in value
  • from_file will read the file and put the content in value
    Does that sound good?
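
A minimal sketch of how two of these constructors could fit together, under stated assumptions: `SketchImage` is a hypothetical stand-in class rather than the ipywidgets `Image`, and the `value`/`format` attributes only mirror the names used in this thread. `from_download` is left out because it would need network access.

```python
import os
import tempfile


class SketchImage:
    """Hypothetical stand-in for an image widget; not the ipywidgets class."""

    def __init__(self, value=b"", format="png"):
        self.value = value    # raw bytes: image data, or an encoded URL
        self.format = format  # image format, or 'url' when value holds a URL

    @classmethod
    def from_file(cls, path, **kwargs):
        # Read the file's content and put it in value.
        with open(path, "rb") as f:
            return cls(value=f.read(), **kwargs)

    @classmethod
    def from_url(cls, url, **kwargs):
        # Put the URL itself in value (no download happens here).
        return cls(value=url.encode("utf-8"), format="url", **kwargs)


# Usage: write a few bytes to a temporary file and load them back.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG fake data")
from_file_img = SketchImage.from_file(tmp.name, format="png")
os.remove(tmp.name)

from_url_img = SketchImage.from_url("https://example.com/cat.png")
```

Keeping `from_url` free of any download makes the difference from `from_download` explicit in the name, which addresses the earlier concern that `from_url` might be read as "fetch the content".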

@jasongrout
Member

Does that sound good?

Those are convenience methods? That sounds all right to me.

@pganssle
Contributor Author

Last time I changed anything on this, I ended up spinning my wheels again trying to mess around with the tests. I guess I'll just ship the test data with the tests, but I still think it's not a good idea for the test directory itself to be importable from the package. It means you have to ship a bunch of unnecessary data with your package for something that's not useful for end-users anyway.

I'll clean this up soon, and maybe make a separate PR for review that moves the tests outside of the package.

@maartenbreddels
Member

maartenbreddels commented Oct 24, 2017 via email

@pganssle
Contributor Author

I think for one thing the abstraction is wrong - tests are not user-facing code, they are part of development and deployment. They don't serve to benefit the user at all, except indirectly in that it might aid the developers to diagnose or fix a problem. Some of the problems of shipping the tests are basically problems mainly because they are burdens on the user (albeit possibly minor) with no benefit for the user.

Some downsides I see:

  1. To the extent that your tests are intended for use by the users, they become part of the public API, meaning changes to the tests and test organization might be a backwards-incompatible change (though probably most people aren't actually relying on the tests for anything).
  2. As you add more data, the fact that the tests are there at all is an increasing burden - you may end up shipping 100 MB of images, videos, etc for your tests, when the library/app itself is only 10kb.
  3. You either end up introducing unnecessary runtime dependencies on libraries you only need for tests (e.g. pytest, freezegun, etc), or you're shipping something where the tests don't even work in a standard installation out of the box anyway.

Most of these are not huge show-stoppers in most situations, but the benefit you get (users can run tests even from a wheel install) is not really worth it. As long as the tests are included in the source distribution, users who really want to run the tests can do so pretty easily, though I think this won't be something that is of general interest anyway, since most of the time a user has problems with something specific that your library is failing at (i.e. they already have a failing test case). Unit tests are more useful for saying, "is this library failing, and if so, how?" As long as the tests have already been run upstream, the only reason to run them again once you have them is to check that they work on your particular platform. That's something that is useful for people running package managers (e.g. debian, arch linux), but they're probably building from source or repo anyway.

@jasongrout
Member

jasongrout commented Oct 24, 2017

+1 for moving tests out of the distributed package - I think you've eloquently spoken of reasons for doing that. That change should be a different PR as you mention, of course.

@maartenbreddels
Member

Thanks Paul, I've made up my mind I think. I'm currently splitting vaex up into multiple packages (but monorepo), and then it becomes a bit blurry which test belongs to which package, so it would be more natural to not include it in the source tree.

For the testing framework, my vote goes to py.test, nosetests is maintained but dead, and I'm really positive about py.test (fixtures are great, capturing stdout keeps the noise down, just running failed tests, jumping to the debugger on fail).

@jasongrout
Member

Ping on this. We thought it would be nice to go in to the 7.2 release, which we're hoping to do by early next week.

let url;
let format = this.model.get('format');
let value = this.model.get('value');
if (format != 'url') {
Member

!== please

var blob = new Blob([value], {type: `image/${this.model.get('format')}`});
url = URL.createObjectURL(blob);
} else {
url = String.fromCharCode.apply(null, new Uint8Array(value.buffer));
Member

In fact, apparently the notebook itself uses TextDecoder to get the JSON parts of a binary message: https://github.com/jupyter/notebook/blob/179bb24fbf79d153812858126127a91431da3319/notebook/static/services/kernels/serialize.js#L20

Member

Weird that it is used in the notebook, especially since it is not supported in IE 11 and in development in Edge: https://developer.microsoft.com/en-us/microsoft-edge/platform/status/encodingstandard/

Member

Looks like the notebook already includes a polyfill: jupyter/notebook#3457

I think we should assume the TextDecoder api works on the page. The notebook has a polyfill, and lab doesn't support IE/Edge. People embedding widgets, if the widgets use binary messages, will need to put it on the page.

@pganssle
Contributor Author

pganssle commented Mar 22, 2018

@jasongrout Sorry this has totally fallen off my radar. Depending on how early my son falls asleep, I'll either make these changes tonight, tomorrow night or this weekend.

Because I don't use any sort of task management software or anything, I'm going to leave the tab open, nagging at me until I get to it 😛

@vidartf
Member

vidartf commented Mar 22, 2018

Because I don't use any sort of task management software or anything, I'm going to leave the tab open, nagging at me until I get to it

At this point, I consider open tabs as my task management system 😅

@pganssle pganssle changed the title [WIP] Add filename and URL support for Image Add filename and URL support for Image Mar 28, 2018
@jasongrout
Member

This looks good to me. I'll leave it open for the European working hours for any comments from @SylvainCorlay or @maartenbreddels. If there is no objection, I'll merge.

Thanks @pganssle!

else:
return str

_text_type = _text_type()
Member

I'd say a more pythonic way would be

_text_type = str
try:
    _text_type = unicode
except:
    pass

url: [str, bytes]
The location of a URL to load.
"""
if isinstance(url, _text_type):
Member

This would fail for a str type in Python 2, you could do (avoiding six, and what is done in widget.py):

from ipython_genutils.py3compat import string_types
if isinstance(url, string_types):
 ....

Contributor Author

It's not intended to match str, it's intended to match unicode. str in Python 2 does not need to be decoded.

Member

Ok, makes sense (maybe a comment?)
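
The distinction discussed here can be sketched in Python 3 terms: only text needs encoding, while bytes pass through untouched. `encode_url` is a hypothetical helper name used for illustration; it is not taken from the PR.

```python
def encode_url(url):
    """Return url as bytes, encoding text as UTF-8 and passing bytes through."""
    # str is the text type in Python 3 (the role unicode plays in Python 2);
    # values that are already bytes do not need to be encoded.
    if isinstance(url, str):
        return url.encode("utf-8")
    return url


text_result = encode_url("https://example.com/cat.png")
bytes_result = encode_url(b"https://example.com/cat.png")
```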

Returns an `Image` with the value set from the filename.
"""
value = cls._load_file_value(filename)

Member

What about guessing format:

base, ext = os.path.splitext(filename)
if ext and 'format' not in kwargs:
    kwargs = dict(**kwargs, format=ext[1:])  # copy dict and remove leading .

Contributor Author

@maartenbreddels Good idea, though I think the implementation needs to be a bit more complicated - it should try to map the extension to the MIME type, which is not always the same thing.

I'll fix that up.

Member

Yes, we can choose maybe from a whitelist (all the ones starting with image/). Maybe give a warning if it's not known?

Member

You could just use the python library for mapping filenames to formats: https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
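
A small sketch of why the raw extension and the MIME subtype can differ, and how `mimetypes.guess_type` combined with an `image/` prefix check covers the whitelist idea. The two helper names are hypothetical, chosen only for this comparison.

```python
import mimetypes
import os


def format_from_extension(filename):
    """Take the extension as the format, as in the splitext suggestion."""
    _, ext = os.path.splitext(filename)
    return ext[1:] or None  # drop the leading dot


def format_from_mimetype(filename):
    """Map the filename to a MIME type and keep only image/* subtypes."""
    mtype, _ = mimetypes.guess_type(filename)
    if mtype is not None and mtype.startswith("image/"):
        return mtype[len("image/"):]
    return None  # unknown type, or not an image
```

For a name such as `photo.jpg` the first helper gives the bare extension, while the second gives the subtype of the registered MIME type, which is the case where the two approaches disagree.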


    @classmethod
    def _load_file_value(cls, filename):
        if getattr(filename, 'read', None) is not None:
Member

Any reason not to use if hasattr(filename, 'read') ?

Contributor Author

Yes. It is mainly because it's dangerous in Python 2.

Member

Thanks, always learning!
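
For context, Python 2's `hasattr` swallows every exception raised during attribute lookup, whereas `getattr` with a default only handles `AttributeError`. The sketch below shows the resulting pattern for accepting either an open file or a path; `load_file_value` is a hypothetical free function, not the method from the PR.

```python
import io
import os
import tempfile


def load_file_value(file_or_path):
    """Return the bytes from an open binary file object or from a path."""
    # Use getattr with a default rather than hasattr to look for read().
    read = getattr(file_or_path, "read", None)
    if read is not None:
        return read()
    # Otherwise treat the argument as a filesystem path.
    with open(file_or_path, "rb") as f:
        return f.read()


# Usage with an in-memory file object and with a path on disk.
from_buffer = load_file_value(io.BytesIO(b"abc"))

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"xyz")
from_path = load_file_value(tmp.name)
os.remove(tmp.name)
```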

Member

@maartenbreddels left a comment

Hi Paul,

thanks a lot, good work, I got a few comments, feel free to disagree!

cheers,

Maarten

@pganssle
Contributor Author

@maartenbreddels OK, added in the MIME type inference.

name = getattr(filename, 'name', None)
name = name or filename

try:
Member

Can we use the python mimetypes library? That seems cleaner...

Contributor Author

@jasongrout Good call, didn't know about that.

Contributor Author

Updated. I think this is much cleaner.

    @classmethod
    def _guess_format(cls, filename):
        # file objects may have a .name parameter
        name = getattr(filename, 'name', None)
Member

Didn't know that!
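
A short illustration of the `.name` behavior: file objects returned by `open` carry the path they were opened with, while in-memory buffers such as `io.BytesIO` have no `name` attribute. `name_for_guessing` is a hypothetical helper that only resolves a string name usable for format guessing.

```python
import io
import os
import tempfile


def name_for_guessing(file_or_name):
    """Return a string name for format guessing, or None if there is none."""
    # File objects may have a .name attribute; plain strings do not.
    name = getattr(file_or_name, "name", None)
    name = name or file_or_name
    # Only string names are kept; nameless buffers fall through to None.
    return name if isinstance(name, str) else None


# Usage: a real file on disk, an in-memory buffer, and a plain filename.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"data")
path = tmp.name
with open(path, "rb") as f:
    name_from_file = name_for_guessing(f)
os.remove(path)

name_from_buffer = name_for_guessing(io.BytesIO(b"data"))
name_from_string = name_for_guessing("cat.png")
```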

@jasongrout
Member

@maartenbreddels - feel free to merge when this looks good to you!

@maartenbreddels
Member

Excellent work Paul!
It's all green light, happy to merge this, but now we're all together, what about PIL Images and numpy arrays?
What I'm doing a lot is sth like this:

    # I is numpy arrays with 'intensities' of shape (height, width)
    colormap = matplotlib.cm.get_cmap(colormap)
    rgba = colormap(I, bytes=True)
    width, height = rgba.shape[1], rgba.shape[0]
    f = StringIO()
    img = PIL.Image.frombuffer("RGBA", (width, height), rgba, 'raw', 'RGBA', 0, 0)
    img.save(f, "png")
    return f.getvalue()

Do we want?

  • from_pil_image(img)
  • from_ndarray(image_array) # takes (height, width, channels(3, 4)) of type byte or floatX type as argument
    It feels a bit out of scope, but I think many people would use it.

I think we can add support for this without explicitly importing PIL/Pillow, and happy to do a separate PR for this if Paul feels like it's enough :)

@jasongrout
Member

happy to do a separate PR for this if Paul feels like it's enough :)

I think definitely separate PR - this one has evolved enough.

@pganssle
Contributor Author

@maartenbreddels Probably best not to let the best be the enemy of the good. It's always possible to add features in later PRs or later releases.

One thing to note is that if you are going to have a lot of variant forms of roughly the same thing, for explicit dispatch I have written a library called variants that provides some syntactic sugar for adding variant methods.

Unfortunately it does not currently work with classmethods, but that's definitely coming (plus there are other ways to implement the same general concept without the variants library that will work with classmethods). Here are the slides for a lightning talk I gave about this, if you are interested.

@maartenbreddels maartenbreddels merged commit 377594c into jupyter-widgets:master Mar 28, 2018
@maartenbreddels
Member

@pganssle you seemed to be on a roll, didn't want to hold you back if you had energy left ;)

@sruthiiyer

sruthiiyer commented Oct 1, 2020

Can someone tell me how to add a hyperlink to an image in ipywidgets?

I tried the syntax below. It works perfectly in a Jupyter notebook cell output, but the image is not displayed in Voila.

out = '<html><head><title>HTML Image as link</title></head><body><a href="https://cat/"><img alt="Qries" src="cat.jpeg" width="130" height="280"></a></body></html>'
widgets.HTML(value=out)

@github-actions github-actions bot added the resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Feb 7, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 7, 2021
Successfully merging this pull request may close these issues.

Image does not accept URLs, only data
5 participants