.Net: Add PostgresVectorStore Memory connector. (#9324)
This PR adds a PostgresVectorStore and related classes to
Microsoft.SemanticKernel.Connectors.Postgres.

### Motivation and Context

As part of the move to having memory connectors implement the new
Microsoft.Extensions.VectorData.IVectorStore architecture (see
https://github.com/microsoft/semantic-kernel/blob/main/docs/decisions/0050-updated-vector-store-design.md),
each memory connector needs to be updated with the new architecture.
This PR tackles updating the existing
Microsoft.SemanticKernel.Connectors.Postgres package to include this
implementation. This will supersede the PostgresMemoryStore
implementation.

Some high-level comments about the design:
- PostgresVectorStore and PostgresVectorStoreRecordCollection get
injected with an IPostgresVectorStoreDbClient. This abstracts the
database communication and allows for unit tests to mock database
interactions.
- The PostgresVectorStoreDbClient is passed an NpgsqlDataSource by
the user, which is used to manage connections to the database. The
user is responsible for managing the connection pool lifecycle.
- The IPostgresVectorStoreDbClient is designed to accept and produce the
storage model, which in this case is a Dictionary<string, object?>.
This is the intermediate type that the IVectorStoreRecordMapper maps
to.
- The PostgresVectorStoreDbClient also takes an
IPostgresVectorStoreCollectionSqlBuilder, which generates SQL command
information for interacting with the database. This abstracts the SQL
queries related to each task and allows for future expansion. This is
particularly targeted at creating an AzureDBForPostgre vector store that
will enable alternate vector implementations like
[DiskANN](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/introducing-diskann-vector-index-in-azure-database-for/ba-p/4261192),
while leveraging the same database client as the Postgres connector.
- The integration tests for the vector store use Docker.DotNet to
bring up a pgvector/pgvector Docker container, which the tests are run
against.
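
The layering described in the design notes can be sketched roughly as follows. This is a minimal illustration, not the connector's code: `IStorageClient`, `InMemoryStorageClient`, `GlossaryMapper`, `GlossaryCollection` and the `Glossary` record are all hypothetical names. They stand in for the roles of the database client abstraction, a mock used in unit tests, the record mapper and the record collection. The SQL builder is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical record type used only for this illustration.
public sealed record Glossary(Guid Key, string Term, float[] Embedding);

// Hypothetical mapper: converts between the record and the storage model,
// which is a Dictionary<string, object?> as in the design notes.
public static class GlossaryMapper
{
    public static Dictionary<string, object?> ToStorage(Glossary record) => new()
    {
        ["Key"] = record.Key,
        ["Term"] = record.Term,
        ["Embedding"] = record.Embedding,
    };

    public static Glossary FromStorage(Dictionary<string, object?> row) =>
        new((Guid)row["Key"]!, (string)row["Term"]!, (float[])row["Embedding"]!);
}

// Hypothetical stand-in for the database client abstraction: it only
// accepts and produces the storage model, never the record type.
public interface IStorageClient
{
    void Upsert(string table, string keyColumn, Dictionary<string, object?> row);
    Dictionary<string, object?>? Get(string table, object key);
}

// In-memory fake that a unit test could inject instead of a real database client.
public sealed class InMemoryStorageClient : IStorageClient
{
    private readonly Dictionary<(string Table, object Key), Dictionary<string, object?>> _rows = new();

    public void Upsert(string table, string keyColumn, Dictionary<string, object?> row) =>
        this._rows[(table, row[keyColumn]!)] = row;

    public Dictionary<string, object?>? Get(string table, object key) =>
        this._rows.TryGetValue((table, key), out var row) ? row : null;
}

// Hypothetical collection: depends only on the client interface and the mapper.
public sealed class GlossaryCollection
{
    private readonly IStorageClient _client;
    private readonly string _table;

    public GlossaryCollection(IStorageClient client, string table)
    {
        this._client = client;
        this._table = table;
    }

    public void Upsert(Glossary record) =>
        this._client.Upsert(this._table, "Key", GlossaryMapper.ToStorage(record));

    public Glossary? Get(Guid key)
    {
        var row = this._client.Get(this._table, key);
        return row is null ? null : GlossaryMapper.FromStorage(row);
    }
}
```

Because the collection only sees the client interface, a test can construct `new GlossaryCollection(new InMemoryStorageClient(), "glossary")` and exercise upsert and get without a running database.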

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Rob Emanuele <[email protected]>
Co-authored-by: Dmytro Struk <[email protected]>
3 people authored Dec 16, 2024
1 parent 12a4d40 commit c7a371e
Showing 42 changed files with 5,074 additions and 72 deletions.
9 changes: 9 additions & 0 deletions dotnet/SK-dotnet.sln
@@ -411,6 +411,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "AotCompatibility", "samples
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SemanticKernel.AotTests", "src\SemanticKernel.AotTests\SemanticKernel.AotTests.csproj", "{39EAB599-742F-417D-AF80-95F90376BB18}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Connectors.Postgres.UnitTests", "src\Connectors\Connectors.Postgres.UnitTests\Connectors.Postgres.UnitTests.csproj", "{232E1153-6366-4175-A982-D66B30AAD610}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Process.Utilities.UnitTests", "src\Experimental\Process.Utilities.UnitTests\Process.Utilities.UnitTests.csproj", "{DAC54048-A39A-4739-8307-EA5A291F2EA0}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "GettingStartedWithVectorStores", "samples\GettingStartedWithVectorStores\GettingStartedWithVectorStores.csproj", "{8C3DE41C-E2C8-42B9-8638-574F8946EB0E}"
@@ -1074,6 +1076,12 @@ Global
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Publish|Any CPU.Build.0 = Debug|Any CPU
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Release|Any CPU.Build.0 = Release|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Debug|Any CPU.Build.0 = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Publish|Any CPU.ActiveCfg = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Publish|Any CPU.Build.0 = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Release|Any CPU.ActiveCfg = Release|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Release|Any CPU.Build.0 = Release|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Publish|Any CPU.ActiveCfg = Debug|Any CPU
@@ -1311,6 +1319,7 @@ Global
{E82B640C-1704-430D-8D71-FD8ED3695468} = {5A7028A7-4DDF-4E4F-84A9-37CE8F8D7E89}
{6ECFDF04-2237-4A85-B114-DAA34923E9E6} = {5D4C0700-BBB5-418F-A7B2-F392B9A18263}
{39EAB599-742F-417D-AF80-95F90376BB18} = {831DDCA2-7D2C-4C31-80DB-6BDB3E1F7AE0}
{232E1153-6366-4175-A982-D66B30AAD610} = {0247C2C9-86C3-45BA-8873-28B0948EDC0C}
{DAC54048-A39A-4739-8307-EA5A291F2EA0} = {0D8C6358-5DAA-4EA6-A924-C268A9A21BC9}
{8C3DE41C-E2C8-42B9-8638-574F8946EB0E} = {FA3720F1-C99A-49B2-9577-A940257098BF}
{DB58FDD0-308E-472F-BFF5-508BC64C727E} = {0D8C6358-5DAA-4EA6-A924-C268A9A21BC9}
3 changes: 3 additions & 0 deletions dotnet/samples/Concepts/Concepts.csproj
@@ -102,6 +102,9 @@
  </ItemGroup>

  <ItemGroup>
    <None Update="appsettings.Development.json">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Include="Resources\Plugins\ApiManifestPlugins\**\apimanifest.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
@@ -10,6 +10,51 @@ namespace Memory.VectorStoreFixtures;
/// </summary>
internal static class VectorStoreInfra
{
    /// <summary>
    /// Setup the postgres pgvector container by pulling the image and running it.
    /// </summary>
    /// <param name="client">The docker client to create the container with.</param>
    /// <returns>The id of the container.</returns>
    public static async Task<string> SetupPostgresContainerAsync(DockerClient client)
    {
        await client.Images.CreateImageAsync(
            new ImagesCreateParameters
            {
                FromImage = "pgvector/pgvector",
                Tag = "pg16",
            },
            null,
            new Progress<JSONMessage>());

        var container = await client.Containers.CreateContainerAsync(new CreateContainerParameters()
        {
            Image = "pgvector/pgvector:pg16",
            HostConfig = new HostConfig()
            {
                PortBindings = new Dictionary<string, IList<PortBinding>>
                {
                    {"5432", new List<PortBinding> {new() {HostPort = "5432" } }},
                },
                PublishAllPorts = true
            },
            ExposedPorts = new Dictionary<string, EmptyStruct>
            {
                { "5432", default },
            },
            Env = new List<string>
            {
                "POSTGRES_USER=postgres",
                "POSTGRES_PASSWORD=example",
            },
        });

        await client.Containers.StartContainerAsync(
            container.ID,
            new ContainerStartParameters());

        return container.ID;
    }

    /// <summary>
    /// Setup the qdrant container by pulling the image and running it.
    /// </summary>
@@ -0,0 +1,67 @@
// Copyright (c) Microsoft. All rights reserved.

using Docker.DotNet;
using Npgsql;

namespace Memory.VectorStoreFixtures;

/// <summary>
/// Fixture to use for creating a Postgres container before tests and delete it after tests.
/// </summary>
public class VectorStorePostgresContainerFixture : IAsyncLifetime
{
    private DockerClient? _dockerClient;
    private string? _postgresContainerId;

    public async Task InitializeAsync()
    {
    }

    public async Task ManualInitializeAsync()
    {
        if (this._postgresContainerId == null)
        {
            // Connect to docker and start the docker container.
            using var dockerClientConfiguration = new DockerClientConfiguration();
            this._dockerClient = dockerClientConfiguration.CreateClient();
            this._postgresContainerId = await VectorStoreInfra.SetupPostgresContainerAsync(this._dockerClient);

            // Delay until the Postgres server is ready.
            var connectionString = TestConfiguration.Postgres.ConnectionString;
            var succeeded = false;
            var attemptCount = 0;
            while (!succeeded && attemptCount++ < 10)
            {
                try
                {
                    NpgsqlDataSourceBuilder dataSourceBuilder = new(connectionString);
                    dataSourceBuilder.UseVector();
                    using var dataSource = dataSourceBuilder.Build();
                    NpgsqlConnection connection = await dataSource.OpenConnectionAsync().ConfigureAwait(false);

                    await using (connection)
                    {
                        // Create extension vector if it doesn't exist
                        await using (NpgsqlCommand command = new("CREATE EXTENSION IF NOT EXISTS vector", connection))
                        {
                            await command.ExecuteNonQueryAsync();
                        }
                    }

                    // The server accepted the connection and the command, so stop retrying.
                    succeeded = true;
                }
                catch (Exception)
                {
                    await Task.Delay(1000);
                }
            }
        }
    }

    public async Task DisposeAsync()
    {
        if (this._dockerClient != null && this._postgresContainerId != null)
        {
            // Delete docker container.
            await VectorStoreInfra.DeleteContainerAsync(this._dockerClient, this._postgresContainerId);
        }
    }
}
@@ -0,0 +1,85 @@
// Copyright (c) Microsoft. All rights reserved.

using Azure.Identity;
using Memory.VectorStoreFixtures;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;

namespace Memory;

/// <summary>
/// An example showing how to use common code, that can work with any vector database, with a Postgres database.
/// The common code is in the <see cref="VectorStore_VectorSearch_MultiStore_Common"/> class.
/// The common code ingests data into the vector store and then searches over that data.
/// This example is part of a set of examples each showing a different vector database.
///
/// For other databases, see the following classes:
/// <para><see cref="VectorStore_VectorSearch_MultiStore_AzureAISearch"/></para>
/// <para><see cref="VectorStore_VectorSearch_MultiStore_Redis"/></para>
/// <para><see cref="VectorStore_VectorSearch_MultiStore_InMemory"/></para>
///
/// To run this sample, you need a local instance of Docker running, since the associated fixture will try and start a Postgres container in the local docker instance.
/// </summary>
public class VectorStore_VectorSearch_MultiStore_Postgres(ITestOutputHelper output, VectorStorePostgresContainerFixture PostgresFixture) : BaseTest(output), IClassFixture<VectorStorePostgresContainerFixture>
{
    [Fact]
    public async Task ExampleWithDIAsync()
    {
        // Use the kernel for DI purposes.
        var kernelBuilder = Kernel
            .CreateBuilder();

        // Register an embedding generation service with the DI container.
        kernelBuilder.AddAzureOpenAITextEmbeddingGeneration(
            deploymentName: TestConfiguration.AzureOpenAIEmbeddings.DeploymentName,
            endpoint: TestConfiguration.AzureOpenAIEmbeddings.Endpoint,
            credential: new AzureCliCredential());

        // Initialize the Postgres docker container via the fixtures and register the Postgres VectorStore.
        await PostgresFixture.ManualInitializeAsync();
        kernelBuilder.Services.AddPostgresVectorStore(TestConfiguration.Postgres.ConnectionString);

        // Register the test output helper common processor with the DI container.
        kernelBuilder.Services.AddSingleton<ITestOutputHelper>(this.Output);
        kernelBuilder.Services.AddTransient<VectorStore_VectorSearch_MultiStore_Common>();

        // Build the kernel.
        var kernel = kernelBuilder.Build();

        // Build a common processor object using the DI container.
        var processor = kernel.GetRequiredService<VectorStore_VectorSearch_MultiStore_Common>();

        // Run the process and pass a key generator function to it, to generate unique record keys.
        // The key generator function is required, since different vector stores may require different key types.
        // E.g. Postgres supports Guid and ulong keys, but others may support strings only.
        await processor.IngestDataAndSearchAsync("skglossaryWithDI", () => Guid.NewGuid());
    }

    [Fact]
    public async Task ExampleWithoutDIAsync()
    {
        // Create an embedding generation service.
        var textEmbeddingGenerationService = new AzureOpenAITextEmbeddingGenerationService(
            TestConfiguration.AzureOpenAIEmbeddings.DeploymentName,
            TestConfiguration.AzureOpenAIEmbeddings.Endpoint,
            new AzureCliCredential());

        // Initialize the Postgres docker container via the fixtures and construct the Postgres VectorStore.
        await PostgresFixture.ManualInitializeAsync();
        var dataSourceBuilder = new NpgsqlDataSourceBuilder(TestConfiguration.Postgres.ConnectionString);
        dataSourceBuilder.UseVector();
        await using var dataSource = dataSourceBuilder.Build();
        var vectorStore = new PostgresVectorStore(dataSource);

        // Create the common processor that works for any vector store.
        var processor = new VectorStore_VectorSearch_MultiStore_Common(vectorStore, textEmbeddingGenerationService, this.Output);

        // Run the process and pass a key generator function to it, to generate unique record keys.
        // The key generator function is required, since different vector stores may require different key types.
        // E.g. Postgres supports Guid and ulong keys, but others may support strings only.
        await processor.IngestDataAndSearchAsync("skglossaryWithoutDI", () => Guid.NewGuid());
    }
}
82 changes: 82 additions & 0 deletions dotnet/samples/Concepts/README.md
@@ -215,3 +215,85 @@ dotnet test -l "console;verbosity=detailed" --filter "FullyQualifiedName=ChatCom
- [OpenAI_TextToImage](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToImage/OpenAI_TextToImage.cs)
- [OpenAI_TextToImageLegacy](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToImage/OpenAI_TextToImageLegacy.cs)
- [AzureOpenAI_TextToImage](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToImage/AzureOpenAI_TextToImage.cs)

## Configuration

### Option 1: Use Secret Manager

Concept samples require secrets and credentials to access OpenAI, Azure OpenAI,
Bing and other resources.

We suggest using .NET [Secret Manager](https://learn.microsoft.com/en-us/aspnet/core/security/app-secrets)
to avoid the risk of leaking secrets into the repository, branches and pull requests.
You can also use environment variables if you prefer.

To set your secrets with Secret Manager:

```
cd dotnet/samples/Concepts
dotnet user-secrets init
dotnet user-secrets set "OpenAI:ServiceId" "gpt-3.5-turbo-instruct"
dotnet user-secrets set "OpenAI:ModelId" "gpt-3.5-turbo-instruct"
dotnet user-secrets set "OpenAI:ChatModelId" "gpt-4"
dotnet user-secrets set "OpenAI:ApiKey" "..."
...
```

### Option 2: Use Configuration File
1. Create an `appsettings.Development.json` file next to the `Concepts.csproj` file. This file is ignored by git,
so its content will not end up in pull requests and it is safe for personal settings. Keep the file safe.
2. Edit `appsettings.Development.json` and set the appropriate configuration for the samples you are running.

For example:

```json
{
  "OpenAI": {
    "ServiceId": "gpt-3.5-turbo-instruct",
    "ModelId": "gpt-3.5-turbo-instruct",
    "ChatModelId": "gpt-4",
    "ApiKey": "sk-...."
  },
  "AzureOpenAI": {
    "ServiceId": "azure-gpt-35-turbo-instruct",
    "DeploymentName": "gpt-35-turbo-instruct",
    "ChatDeploymentName": "gpt-4",
    "Endpoint": "https://contoso.openai.azure.com/",
    "ApiKey": "...."
  },
  // etc.
}
```

### Option 3: Use Environment Variables
You may also set the settings in your environment variables. The environment variables will override the settings in the `appsettings.Development.json` file.

When setting environment variables, use a double underscore (i.e. "\_\_") to delineate between parent and child properties. For example:

- bash:

```bash
export OpenAI__ApiKey="sk-...."
export AzureOpenAI__ApiKey="...."
export AzureOpenAI__DeploymentName="gpt-35-turbo-instruct"
export AzureOpenAI__ChatDeploymentName="gpt-4"
export AzureOpenAIEmbeddings__DeploymentName="azure-text-embedding-ada-002"
export AzureOpenAI__Endpoint="https://contoso.openai.azure.com/"
export HuggingFace__ApiKey="...."
export Bing__ApiKey="...."
export Postgres__ConnectionString="...."
```

- PowerShell:

```ps
$env:OpenAI__ApiKey = "sk-...."
$env:AzureOpenAI__ApiKey = "...."
$env:AzureOpenAI__DeploymentName = "gpt-35-turbo-instruct"
$env:AzureOpenAI__ChatDeploymentName = "gpt-4"
$env:AzureOpenAIEmbeddings__DeploymentName = "azure-text-embedding-ada-002"
$env:AzureOpenAI__Endpoint = "https://contoso.openai.azure.com/"
$env:HuggingFace__ApiKey = "...."
$env:Bing__ApiKey = "...."
$env:Postgres__ConnectionString = "...."
```
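
The double-underscore convention above can be pictured with a small sketch. The .NET environment-variable configuration provider treats `__` as the hierarchy separator and exposes it as `:` in configuration keys, so `Postgres__ConnectionString` is read as `Postgres:ConnectionString`. The helper below is a hypothetical illustration of that mapping only (`EnvKeyMapping` and `ToConfigurationKey` are not part of the samples or of any library):

```csharp
using System;

// Hypothetical helper, for illustration only: shows how an environment variable
// name using "__" corresponds to a hierarchical configuration key using ":".
public static class EnvKeyMapping
{
    public static string ToConfigurationKey(string environmentVariableName) =>
        environmentVariableName.Replace("__", ":");
}
```

For example, `EnvKeyMapping.ToConfigurationKey("AzureOpenAI__Endpoint")` yields the key that corresponds to the `Endpoint` property under `AzureOpenAI` in `appsettings.Development.json`.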
@@ -29,4 +29,9 @@
    <ProjectReference Include="..\..\SemanticKernel.Core\SemanticKernel.Core.csproj" />
  </ItemGroup>

  <ItemGroup>
    <InternalsVisibleTo Include="SemanticKernel.Connectors.Postgres.UnitTests" />
    <InternalsVisibleTo Include="DynamicProxyGenAssembly2" />
  </ItemGroup>

</Project>
@@ -9,7 +9,7 @@
namespace Microsoft.SemanticKernel.Connectors.Postgres;

/// <summary>
-/// Interface for client managing postgres database operations.
+/// Interface for client managing postgres database operations for <see cref="PostgresMemoryStore"/>.
/// </summary>
public interface IPostgresDbClient
{