PGVector Duplicates Entries #739

Open
MichaelMMeskhi opened this issue Dec 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@MichaelMMeskhi
Describe the bug
When training the RAG layer with PGVector as the vector store, entries are duplicated. ChromaDB, by contrast, skips duplicate entries.

To Reproduce
Steps to reproduce the behavior:

  1. Run a script that embeds 10 documents into PGVector.
  2. Check the Vanna app to confirm the training data has 10 entries.
  3. Rerun the training script.
  4. The training data now has 20 entries (10 duplicates).

Expected behavior
Should skip duplicate embeddings.
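
A minimal sketch of the expected behavior, assuming the store keys each entry by an ID derived from its content (a UUIDv5 of the text here) and inserts with a conflict-ignoring statement. SQLite from the Python standard library stands in for PostgreSQL so the example runs without a database server. The `embeddings` table, its columns, and the helper names are hypothetical, not Vanna's actual PGVector schema, and the vector column is omitted. PostgreSQL accepts the same `INSERT ... ON CONFLICT (id) DO NOTHING` form.

```python
import sqlite3
import uuid


def deterministic_id(content: str) -> str:
    # UUIDv5 hashes (namespace, name), so identical content always yields the same ID.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, content))


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; a real PGVector table would also store the embedding vector.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (id TEXT PRIMARY KEY, document TEXT NOT NULL)"
    )


def train(conn: sqlite3.Connection, documents: list[str]) -> None:
    # A document whose ID already exists is skipped instead of inserted again.
    for doc in documents:
        conn.execute(
            "INSERT INTO embeddings (id, document) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
            (deterministic_id(doc), doc),
        )
    conn.commit()


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    docs = [f"document {i}" for i in range(10)]
    train(conn, docs)  # first run inserts 10 rows
    train(conn, docs)  # rerun: every ID conflicts, so nothing is added
    print(count_rows(conn))  # prints 10
```

If IDs were random instead (for example `uuid.uuid4()`), the rerun would generate fresh keys, the conflict clause would never fire, and the table would grow to 20 rows, which would be consistent with the symptom in step 4.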

Desktop (please complete the following information):

  • OS: Ubuntu
  • Version: 24.04
  • Python: 3.11
  • Vanna: 0.7.5
@MichaelMMeskhi MichaelMMeskhi added the bug Something isn't working label Dec 18, 2024
@MichaelMMeskhi (author)

For comparison, ChromaDB warns the user that an embedding with the same ID already exists and skips it. PGVector gives no such warning and silently inserts the duplicates.
