
Python: Include a function_invoke_attempt index with Streaming CMC #10009

Open · wants to merge 1 commit into main
Conversation

moonbox3 (Contributor)

Motivation and Context

During auto function calling, we yield all messages back without any indication of which invocation attempt they relate to. This information could help the caller understand the order in which message chunks were received during the auto function invocation loop.

Depending on the behavior of auto function calling, the request_index iterates up to maximum_auto_invoke_attempts. Today the caller doesn't know which auto function invoke attempt they're currently on -- so simply handing back all yielded messages can be confusing. In a follow-up PR, we will handle adding the request_index (perhaps with a different name) to make it easier to know which streaming message chunks to concatenate, which should help reduce the confusion down the line.
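For context, here is a simplified conceptual sketch of that loop (not the actual implementation, which lives in ChatCompletionClientBase; `has_pending_function_calls` is a hypothetical helper standing in for the real termination check):

```python
# Conceptual sketch only; the real auto function invocation loop is more involved.
async def auto_invoke_stream(inner_stream, chat_history, settings,
                             maximum_auto_invoke_attempts: int):
    for request_index in range(maximum_auto_invoke_attempts):
        async for chunk in inner_stream(chat_history, settings):
            # New in this PR: tag each chunk with the attempt that produced
            # it, so callers can group chunks per attempt.
            chunk.function_invoke_attempt = request_index
            yield chunk
        # Hypothetical helper: stop once the model returns no more tool calls.
        if not has_pending_function_calls(chat_history):
            break
```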

Description

This PR adds:

  • The function_invoke_attempt attribute on the StreamingChatMessageContent class. This helps callers/users track which streaming chat message chunks belong to which auto function invocation attempt (see the usage sketch below).
  • A new keyword argument on _inner_get_streaming_chat_message_contents that lets the function_invoke_attempt int be passed through to the StreamingChatMessageContent creation in each AI service. This additive keyword argument should not break existing code.
  • Updated unit tests.
  • One combined sample: three previously distinct samples related to auto function calling are merged, and the user can configure other chat completion services that support auto function calling.
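As a rough usage sketch (the service setup and method names here are illustrative assumptions based on the Semantic Kernel Python chat completion API, not code from this PR):

```python
# Sketch: grouping streamed chunks by the new function_invoke_attempt field.
# Assumes `service` is a chat completion service configured with auto
# function calling; the setup below is illustrative only.
from collections import defaultdict

from semantic_kernel.contents import ChatHistory


async def collect_chunks_by_attempt(service, settings, kernel):
    chunks_by_attempt: dict[int, list] = defaultdict(list)
    history = ChatHistory()
    history.add_user_message("What's the weather in Seattle today?")
    async for chunk_list in service.get_streaming_chat_message_contents(
        chat_history=history, settings=settings, kernel=kernel
    ):
        for chunk in chunk_list:
            # The new attribute tells us which pass of the auto function
            # invocation loop produced this chunk.
            chunks_by_attempt[chunk.function_invoke_attempt].append(chunk)
    return chunks_by_attempt
```

Chunks within the same attempt can then be concatenated, while chunks from different attempts stay separate.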


@moonbox3 moonbox3 self-assigned this Dec 18, 2024
@moonbox3 moonbox3 requested a review from a team as a code owner December 18, 2024 10:45
@markwallace-microsoft markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Dec 18, 2024
@markwallace-microsoft (Member)

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| semantic_kernel/connectors/ai/chat_completion_client_base.py | 127 | 2 | 98% | 406, 416 |
| semantic_kernel/connectors/ai/anthropic/services/anthropic_chat_completion.py | 158 | 6 | 96% | 141, 147, 160, 166, 170, 382 |
| semantic_kernel/connectors/ai/azure_ai_inference/services/azure_ai_inference_chat_completion.py | 105 | 6 | 94% | 110–113, 122, 144, 168 |
| semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py | 136 | 14 | 90% | 117, 139, 164, 168–171, 229, 247–266, 325 |
| semantic_kernel/connectors/ai/google/google_ai/services/google_ai_chat_completion.py | 119 | 4 | 97% | 126, 153, 179, 181 |
| semantic_kernel/connectors/ai/google/vertex_ai/services/vertex_ai_chat_completion.py | 119 | 4 | 97% | 121, 148, 174, 176 |
| semantic_kernel/connectors/ai/mistral_ai/services/mistral_ai_chat_completion.py | 122 | 38 | 69% | 119–122, 132, 147–150, 165, 181–185, 200–208, 225–233, 246–259, 265, 274–278, 323–326 |
| semantic_kernel/connectors/ai/ollama/services/ollama_chat_completion.py | 138 | 34 | 75% | 116, 141, 145–146, 156, 169, 186, 206–207, 211, 224–234, 245–247, 258–267, 279, 289–290, 312, 323–324, 350, 359–367 |
| semantic_kernel/connectors/ai/onnx/services/onnx_gen_ai_chat_completion.py | 72 | 6 | 92% | 69–70, 100, 126, 174, 180 |
| semantic_kernel/connectors/ai/open_ai/services/open_ai_chat_completion_base.py | 127 | 7 | 94% | 71, 81, 102, 122, 143, 179, 291 |
| semantic_kernel/contents/streaming_chat_message_content.py | 70 | 1 | 99% | 225 |
| TOTAL | 16777 | 1849 | 89% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2966 | 4 💤 | 0 ❌ | 0 🔥 | 1m 14s ⏱️ |

@eavanvalkenburg (Member) left a comment:


Added some big questions for you :D

print("\n[No tool calls returned by the model]")


async def handle_streaming(
@eavanvalkenburg (Member):

To me this makes it seem quite complex, as if it takes lots of code to make streaming function calling work. Can't we do two samples: one fully auto and streaming only, the other non-auto-invoke with this stuff?

@moonbox3 (Contributor):
I agree we should break the samples into streaming and non-streaming. A similar structure already exists in the chat_completion concept samples.

@@ -148,9 +148,10 @@ def _create_streaming_chat_message_content(
chunk: ChatCompletionChunk,
choice: ChunkChoice,
chunk_metadata: dict[str, Any],
+    function_invoke_attempt: int = 0,
@eavanvalkenburg (Member):
Could we make function_invoke_attempt part of the settings/FunctionChoiceBehavior? This seems quite messy.

@@ -51,6 +53,12 @@ class StreamingChatMessageContent(ChatMessageContent, StreamingContentMixin):
__add__: Combines two StreamingChatMessageContent instances.
"""

+    function_invoke_attempt: int | None = Field(
@eavanvalkenburg (Member):
I can see why this is needed more for streaming, but wouldn't regular CMC also benefit from this?

@moonbox3 (Contributor):
I don't think the regular CMC will benefit from this, because regular CMCs aren't chunks. They contain the full content, and users won't need to concatenate them, so they don't need to know which request attempt a message belongs to. The ordering of the CMCs in the chat history already encodes that information.
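For example (a minimal sketch; `chunks_by_attempt` is a hypothetical dict mapping each attempt index to the list of chunks collected from the stream):

```python
from functools import reduce
from operator import add

# Concatenate chunks per attempt via StreamingChatMessageContent.__add__;
# chunks from different attempts are kept separate.
full_messages = {
    attempt: reduce(add, chunks)
    for attempt, chunks in chunks_by_attempt.items()
}
for attempt, message in sorted(full_messages.items()):
    print(f"[attempt {attempt}] {message.content}")
```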


@@ -154,6 +154,7 @@ async def _inner_get_streaming_chat_message_contents(
self,
chat_history: "ChatHistory",
settings: "PromptExecutionSettings",
+    function_invoke_attempt: int = 0,
Contributor:
When does a user want to pass in a value that's not 0?

