
Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema #1396

Merged
rholinshead merged 1 commit into main on Mar 4, 2024

Conversation

@rholinshead (Contributor) commented on Mar 4, 2024:

Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema

For #1392

This will add the prompt schema so that visual question answering prompts have the nice UI for input and settings


Stack created with Sapling. Best reviewed with ReviewStack.
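
For context, a rough sketch of the shape such a prompt schema can take; everything here other than the `model` setting quoted in the review thread below is an assumption for illustration, not the merged code:

```typescript
// Hypothetical sketch of a visual question answering prompt schema: an input
// definition (question text plus an image attachment) paired with model
// settings. Field names other than the reviewed `model` setting are assumed.
export const HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema = {
  input: {
    type: "object",
    required: ["attachments", "data"],
    properties: {
      // The question to ask about the attached image.
      data: { type: "string" },
      // The image the question refers to.
      attachments: {
        type: "array",
        items: { type: "attachment", mime_types: ["image/jpeg", "image/png"] },
        max_items: 1,
      },
    },
  },
  model_settings: {
    type: "object",
    properties: {
      model: {
        type: "string",
        description: `Hugging Face model to use. Can be a model ID hosted on the Hugging Face Hub or a URL
          to a deployed Inference Endpoint`,
        default: "dandelin/vilt-b32-finetuned-vqa",
      },
    },
  },
};
```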

@Ankush-lastmile (Member) left a comment:


lgtm, we might want to revisit how text + other modality input box renders

[Screenshot 2024-03-04 at 4 56 29 PM]

type: "string",
description: `Hugging Face model to use. Can be a model ID hosted on the Hugging Face Hub or a URL
to a deployed Inference Endpoint`,
default: "dandelin/vilt-b32-finetuned-vqa",
@Ankush-lastmile (Member) commented:

does this model work fine on the free inference endpoint?

@rholinshead (Contributor, Author) replied:

Yep -
[Screenshot 2024-03-04 at 5 01 40 PM]
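
For illustration, one way to reproduce that check against the free serverless Inference API from TypeScript; using the `@huggingface/inference` client here is an assumption for the sketch, not how aiconfig issues the request:

```typescript
import { HfInference } from "@huggingface/inference";

// Hypothetical standalone check that the default model answers visual
// question answering requests on the free serverless Inference API.
const hf = new HfInference(process.env.HF_TOKEN);

// Any publicly reachable image works; this URL is a placeholder.
const image = await (await fetch("https://example.com/cat.jpg")).blob();

const result = await hf.visualQuestionAnswering({
  model: "dandelin/vilt-b32-finetuned-vqa",
  inputs: { image, question: "What animal is in this picture?" },
});

console.log(result); // e.g. { answer: "cat", score: 0.99 }
```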

@rholinshead (Contributor, Author) commented on Mar 4, 2024:

> lgtm, we might want to revisit how text + other modality input box renders
>
> [Screenshot 2024-03-04 at 4 56 29 PM]

Ya, I think we can improve it a bit. Probably makes more sense to have attachment first since it seems to be the main focus?

# Implement HuggingFaceVisualQuestionAnsweringRemoteInferencePromptSchema

For #1392

This will add the prompt schema so that visual question answering prompts have the nice UI for input and settings
@rholinshead merged commit c843134 into main on Mar 4, 2024
2 checks passed
rholinshead added a commit that referenced this pull request Mar 4, 2024
# Render Attachment before Data When Both Exist in Input

See comment in
#1396 (review)

When we have a Prompt Input with both attachment and data, most likely
the attachment will be the main focus (e.g. Visual Question Answering)
so we can show it first in the prompt input:

![Screenshot 2024-03-04 at 5 08 17 PM](https://github.com/lastmile-ai/aiconfig/assets/5060851/40ec0a8d-9255-44b7-aebb-f0e46dee44aa)
![Screenshot 2024-03-04 at 5 08 44 PM](https://github.com/lastmile-ai/aiconfig/assets/5060851/9568cae3-e48f-4a1d-a608-e011234c83b4)

Was initially thinking we could order it based on the property order in the
schema, but just doing this for now since property order in the schema may not
be guaranteed when we define it in Python and then serialize it to the client
(we will just be passing a JSON object in the response). A sketch of this
ordering follows the commit message.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1397).
* __->__ #1397
* #1396
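
For illustration, a minimal sketch of the attachment-first ordering the commit describes; the type and function names are hypothetical, standing in for however the editor actually assembles the input UI:

```typescript
// Hypothetical sketch: rather than relying on schema property order (not
// guaranteed once the schema is serialized from Python as JSON), the client
// hard-codes attachments before data when both exist in a prompt input.
type PromptInput = {
  data?: string;
  attachments?: unknown[];
};

function inputSectionsInRenderOrder(input: PromptInput): string[] {
  const sections: string[] = [];
  // Attachment first: for multimodal tasks like VQA, the image is the focus.
  if (input.attachments !== undefined) {
    sections.push("attachments");
  }
  if (input.data !== undefined) {
    sections.push("data");
  }
  return sections;
}

// A VQA input with both an image and a question renders the attachment first:
console.log(
  inputSectionsInRenderOrder({
    data: "What is in this image?",
    attachments: ["<image attachment>"],
  })
); // => ["attachments", "data"]
```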
Successfully merging this pull request may close these issues:

Add visual-question-answering / multimodal support to gradio notebook tasks