Llama 3.2-Vision Implementation #1160
base: main
Conversation
❌ Changes requested. Reviewed everything up to 6c646b5 in 1 minute and 7 seconds
More details
- Looked at 1058 lines of code in 11 files
- Skipped 0 files when reviewing
- Skipped posting 6 drafted comments based on config settings
1. skyvern/forge/sdk/api/llm/utils.py:17
   - Draft comment: The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
   - Reason this comment was not posted: Confidence changes required: 50%. The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
2. skyvern/forge/sdk/api/llm/utils.py:96
   - Draft comment: The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
   - Reason this comment was not posted: Confidence changes required: 50%. The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
3. skyvern/forge/sdk/api/llm/api_handler_factory.py:101
   - Draft comment: The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
   - Reason this comment was not posted: Confidence changes required: 50%. The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
4. skyvern/forge/sdk/api/llm/api_handler_factory.py:159
   - Draft comment: The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
   - Reason this comment was not posted: Confidence changes required: 50%. The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
5. skyvern/forge/sdk/api/llm/llama_handler.py:25
   - Draft comment: Consider making the API endpoint URL configurable instead of hardcoding it to http://localhost:11434/api/chat (see the sketch after this comment).
   - Reason this comment was not posted: Confidence changes required: 50%. The llama_handler function in llama_handler.py uses a hardcoded URL for the API endpoint. This should be configurable to allow flexibility in different environments.
6. skyvern/__init__.py:23
   - Draft comment: The llama_handler function is not implemented. Ensure this is completed to provide the necessary functionality.
   - Reason this comment was not posted: Comment did not seem useful.
Workflow ID: wflow_NRvLU0214y5uG6Ns
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
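For reference, a minimal sketch of the direction draft comments 1–5 point at: the Ollama chat endpoint read from an environment variable rather than hardcoded, and a Llama message builder with no unused add_assistant_prefix parameter. Everything here (LLAMA_API_BASE, the function signatures, the model tag) is an assumption for illustration, not the PR's actual code.

```python
# Hypothetical sketch only (not the PR's code): the chat endpoint comes from an
# environment variable instead of a hardcoded URL, and the Llama message builder
# takes no add_assistant_prefix parameter.
from __future__ import annotations

import base64
import os

import httpx

# Assumed setting name; the PR would likely expose this via settings_manager instead.
LLAMA_API_BASE = os.getenv("LLAMA_API_BASE", "http://localhost:11434")


def llama_messages_builder(prompt: str, screenshots: list[bytes] | None = None) -> list[dict]:
    """Build Ollama-style chat messages; screenshots go in a base64 `images` list."""
    message: dict = {"role": "user", "content": prompt}
    if screenshots:
        message["images"] = [base64.b64encode(s).decode("utf-8") for s in screenshots]
    return [message]


async def llama_handler(prompt: str, screenshots: list[bytes] | None = None) -> str:
    payload = {
        "model": "llama3.2-vision",  # assumed model tag
        "messages": llama_messages_builder(prompt, screenshots),
        "stream": False,
    }
    async with httpx.AsyncClient(timeout=300.0) as client:
        response = await client.post(f"{LLAMA_API_BASE}/api/chat", json=payload)
        response.raise_for_status()
        return response.json()["message"]["content"]
```

With that shape, add_assistant_prefix simply never enters the Llama path, which is what comments 1–4 ask for.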
@@ -11,3 +13,12 @@
    },
)
setup_logger()

async def llama_handler(
The new llama_handler function appears to duplicate existing functionality in skyvern/forge/sdk/api/llm/llama_handler.py. Consider reusing or extending the existing function instead of adding a new one.
- function llama_handler (llama_handler.py)
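A minimal illustration of that suggestion, assuming the existing module path named above:

```python
# In skyvern/__init__.py: import and reuse the existing handler instead of redefining it.
from skyvern.forge.sdk.api.llm.llama_handler import llama_handler
```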
…. Closer to success with extract-action.
I've encountered similar problems with context size; otherwise the Ollama integration seems to be working with only some additions to config.py, .env, and config_registry.py. Is there a way to limit the context size sent by the application? (I increased the Ollama context limit to 16k, but the context sometimes explodes to 450k tokens.)
I have yet to find a way to address that. The prompt length required for some jobs gets quite large using Llama, and I do not know why at this point. Also, Llama struggles in particular with the extract-action prompt and doesn't like to output the required JSON. Sometimes it will also provide fake element IDs. I'm not sure this model is capable of doing the job, but I am also new to utilizing these LLMs. Any input here would be appreciated.
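For what it's worth, Ollama's chat API accepts a per-request context window (options.num_ctx) and a JSON output constraint (format). Raising num_ctx does not shrink what the application sends, but it controls the window the model will accept, and format: "json" may help with the extract-action prompt. A rough sketch, not Skyvern code; the endpoint and model tag are assumptions.

```python
import requests

# Rough sketch, not Skyvern code: raise the context window for this request and
# ask Ollama to constrain the reply to valid JSON.
payload = {
    "model": "llama3.2-vision",
    "messages": [{"role": "user", "content": "Return the extracted actions as JSON."}],
    "format": "json",                # constrain output to a valid JSON object
    "options": {"num_ctx": 16384},   # per-request context window, in tokens
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
print(resp.json()["message"]["content"])
```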
Ollama does provide other vision models; have you tried llava or the 90B version of llama3.2 vision?
Yes, 90B is just as clueless as 11B, sadly. I had the same results.
That's unfortunate. Could it be a quantization issue? Ollama models usually come in Q4; perhaps that's doing weird things?
Honestly, I do not know. I am new to the LLM space. I will look into it. Thanks!
Best of luck!
What about qwen-vl?
@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet, afaik.
Ollama supports inference of custom models through the ‘ollama create’ endpoint.
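As a general illustration of that mechanism (placeholder model and file names; per the reply further down, this does not yet help for unsupported vision architectures):

```python
import subprocess

# Placeholder names; assumes the `ollama` CLI is on PATH and a Modelfile exists
# alongside this script (e.g. "FROM llama3.2-vision" plus PARAMETER overrides).
subprocess.run(["ollama", "create", "custom-vision", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "custom-vision", "Describe this page."], check=True)
```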
Using Ollama, I keep getting errors like this: level=WARN source=runner.go:126 msg="truncating input prompt" limit=128000 prompt=781154 numKeep=5. That prompt length is insane. Does anyone have any idea what's going on here?
You exceeded the context length of the model in question: it's trying to get the message history for the last 5 messages, and the total of those exceeded the 128k context length.
Right, I understand that. I just don't understand how to fix this, or why this happens using any model within Ollama. These prompt lengths are outrageous.
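One possible workaround, outside of anything Skyvern currently does: clip the oldest messages before the request is sent. A rough sketch using character count as a crude proxy for tokens (it also ignores image payloads):

```python
# Rough sketch, not Skyvern code: keep system messages and drop the oldest chat
# messages until the payload fits an approximate budget.
def clip_messages(messages: list[dict], max_chars: int = 48_000) -> list[dict]:
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    total = sum(len(m.get("content") or "") for m in system)
    clipped: list[dict] = []
    for message in reversed(rest):       # newest first
        size = len(message.get("content") or "")
        if total + size > max_chars:
            break
        clipped.insert(0, message)       # restore chronological order
        total += size
    return system + clipped
```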
Not sure what Skyvern is doing; can you give an example of what the prompt is like?
That doesn't work for vision models yet. Each VLM has a different enough architecture to be a pain in the ass. Might need to fork Ollama and implement it myself.
A pity, although Ollama is evolving fast. Who knows? Perhaps in a few months.
I'm thinking of closing this branch down. The goal here should simply be to plug Ollama in. Any rabbit holes in terms of Ollama's capabilities should not be hard-coded here.
Def drop a new link in here if you open a new PR! Can’t wait to see Ollama powering Skyvern.
This is a work in progress. Simply got it working. Need to refactor the code back to best practices after the Llama model functions properly and handles all requests made of it. Here are some things I have noted so far.
Important
Integrate the Llama 3.2-Vision model by adding handlers, configurations, and environment setups while disabling other providers.
- llama_handler in llama_handler.py to handle Llama 3.2-Vision API requests.
- api_handler_factory.py to register llama_handler if Llama is enabled.
- config_registry.py to register the Llama configuration with LLMConfigRegistry.
- config.py and settings_manager.py to include Llama-specific settings and disable other providers.
- .env.example.
- Dockerfile and docker-compose.yml to support the Llama setup.
- setup.sh to configure Llama during the setup process.
- llm_messages_builder in utils.py to handle Llama-specific message formatting.
- parse_api_response in utils.py to parse Llama responses correctly (sketched below).

This description was created for 6c646b5 and will automatically update as commits are pushed.
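To make the last bullet concrete, a hedged sketch of what Llama response parsing could look like; the function name and body are assumptions for illustration, not the code in this PR. Ollama's non-streaming /api/chat reply nests the generated text under message.content, and Skyvern's prompts expect that text to be a JSON object.

```python
import json
import re

# Hedged sketch, not the PR's implementation: pull the JSON object out of the
# model's reply, tolerating extra prose or code fences around it.
def parse_llama_response(response: dict) -> dict:
    content = response["message"]["content"]
    match = re.search(r"\{.*\}", content, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in model output: {content[:200]!r}")
    return json.loads(match.group(0))
```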