Python: Include a function_invoke_attempt index with Streaming CMC #10009
base: main
Conversation
Added some big questions for you :D
```python
print("\n[No tool calls returned by the model]")

async def handle_streaming(
```
To me this makes it seem quite complex, like it takes a lot of code to make streaming function calling work. Can't we do two samples: one fully auto and streaming only, the other non-auto-invoke with this stuff?
I agree we should break the samples into streaming and non-streaming. A similar structure already exists in the chat_completion concept samples.
```diff
@@ -148,9 +148,10 @@ def _create_streaming_chat_message_content(
     chunk: ChatCompletionChunk,
     choice: ChunkChoice,
     chunk_metadata: dict[str, Any],
+    function_invoke_attempt: int = 0,
```
could we make function_invoke_attempt part of the settings/FunctionChoiceBehavior? this seems like it's quite messy?
```diff
@@ -51,6 +53,12 @@ class StreamingChatMessageContent(ChatMessageContent, StreamingContentMixin):
     __add__: Combines two StreamingChatMessageContent instances.
     """

+    function_invoke_attempt: int | None = Field(
```
I can see why this is needed more for streaming, but wouldn't regular CMC also benefit from this?
I don't think the regular CMC will benefit from this because regular CMCs aren't chunks. They contain the full content, and users won't need to concatenate them, so they don't need to know which request attempt a message belongs to. The ordering of CMCs in the chat history already encodes that information.
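The concatenation concern above can be sketched in plain Python. This is a hypothetical stand-in, not the real `semantic_kernel` API: `Chunk` mimics the idea that `StreamingChatMessageContent` supports `+` for merging and now carries a `function_invoke_attempt` index, so a caller can group streamed chunks per attempt before folding them together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """Stand-in for StreamingChatMessageContent (hypothetical, simplified)."""
    function_invoke_attempt: int
    text: str

    def __add__(self, other: "Chunk") -> "Chunk":
        # Only chunks from the same auto-invoke attempt should be merged.
        if self.function_invoke_attempt != other.function_invoke_attempt:
            raise ValueError("cannot merge chunks from different attempts")
        return Chunk(self.function_invoke_attempt, self.text + other.text)


def merge_by_attempt(chunks: list[Chunk]) -> dict[int, Chunk]:
    """Group streamed chunks by attempt index, then fold each group with +."""
    groups: dict[int, list[Chunk]] = defaultdict(list)
    for chunk in chunks:
        groups[chunk.function_invoke_attempt].append(chunk)
    merged: dict[int, Chunk] = {}
    for attempt, group in groups.items():
        acc = group[0]
        for chunk in group[1:]:
            acc = acc + chunk
        merged[attempt] = acc
    return merged


chunks = [Chunk(0, "Hel"), Chunk(0, "lo"), Chunk(1, "Wor"), Chunk(1, "ld")]
merged = merge_by_attempt(chunks)
```

Without the attempt index, a caller concatenating everything would silently mix text from separate invocation attempts.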
```diff
@@ -154,6 +154,7 @@ async def _inner_get_streaming_chat_message_contents(
     self,
     chat_history: "ChatHistory",
     settings: "PromptExecutionSettings",
+    function_invoke_attempt: int = 0,
```
When does a user want to pass in a value that's not 0?
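The flow the threads above discuss can be sketched end to end. This is a hedged simplification, not the actual semantic-kernel implementation: the names mirror the PR (`_inner_get_streaming_chat_message_contents`, `function_invoke_attempt`, `request_index`), but the classes and the fake two-chunk stream are stand-ins for illustration only.

```python
import asyncio
from typing import AsyncIterator


class StreamingChatMessageContent:
    """Stand-in for the real class; only the fields needed for the sketch."""

    def __init__(self, content: str, function_invoke_attempt: int = 0):
        self.content = content
        self.function_invoke_attempt = function_invoke_attempt


async def _inner_get_streaming_chat_message_contents(
    prompt: str, function_invoke_attempt: int = 0
) -> AsyncIterator[StreamingChatMessageContent]:
    # A real AI service would stream model output; here we fake two chunks
    # and tag each one with the attempt index passed down by the loop.
    for piece in (prompt[: len(prompt) // 2], prompt[len(prompt) // 2 :]):
        yield StreamingChatMessageContent(piece, function_invoke_attempt)


async def auto_invoke_loop(prompt: str, maximum_auto_invoke_attempts: int = 3):
    # Each pass of the auto function-calling loop forwards the current
    # request_index, so every yielded chunk knows which attempt produced it.
    for request_index in range(maximum_auto_invoke_attempts):
        async for chunk in _inner_get_streaming_chat_message_contents(
            prompt, function_invoke_attempt=request_index
        ):
            yield chunk


async def main() -> list[int]:
    return [c.function_invoke_attempt async for c in auto_invoke_loop("hi!?", 2)]


tags = asyncio.run(main())
```

Because `function_invoke_attempt` defaults to `0`, callers that never enter the auto-invoke loop are unaffected, which is why the keyword argument is additive rather than breaking.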
Motivation and Context

During auto function calling, we're yielding all messages back without any indication as to which invocation index they are related to. This information could be helpful to the caller to understand in which order message chunks were received during the auto function invocation loop.

Depending upon the behavior of auto function calling, the `request_index` iterates up to the `maximum_auto_invoke_attempts`. The caller doesn't know today which auto function invoke attempt they're currently on, so simply handing back all yielded messages can be confusing. In a new PR, we will handle adding the `request_index` (perhaps with a different name) to make it easier to know which streaming message chunks to concatenate, which should help reduce the confusion down the line.

Description

This PR adds:
- A `function_invoke_attempt` attribute to the `StreamingChatMessageContent` class. This can help callers/users track which streaming chat message chunks belong to which auto function invocation attempt.
- An update to `_inner_get_streaming_chat_message_contents` to allow the `function_invoke_attempt` int to be passed through to the `StreamingChatMessageContent` creation in each AI Service. This additive keyword argument should not break existing code.

Contribution Checklist