GitHub - hotchpotch/yasem: YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings

YASEM (Yet Another Splade|Sparse Embedder)

YASEM is a simple and efficient library for executing SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) and creating sparse vectors. It provides a straightforward interface inspired by SentenceTransformers for easy integration into your projects.
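
To make "sparse vector" concrete, here is a minimal, dependency-free sketch of the pooling step that SPLADE-style models commonly use: each token's vocabulary logits pass through log(1 + ReLU(x)) and are then max-pooled over the sequence. This is an illustration under that assumption, not YASEM's actual internals; `splade_pool` and `toy_logits` are hypothetical names, and real models work on tensors over a vocabulary of tens of thousands of entries.

```python
import math


def splade_pool(token_logits):
    """Collapse per-token vocabulary logits into one vocabulary-sized vector.

    token_logits: list of rows, one row per input token, each row holding
    one logit per vocabulary entry.
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            # log(1 + ReLU(x)): negative logits contribute exactly zero
            weight = math.log1p(max(logit, 0.0))
            # max-pool over the token dimension
            pooled[j] = max(pooled[j], weight)
    return pooled


# Two tokens, four vocabulary entries (toy numbers, not model output)
toy_logits = [
    [2.0, -1.0, 0.0, -3.0],
    [0.5, -0.2, 1.0, -0.1],
]
vector = splade_pool(toy_logits)
nonzero = {j: round(w, 4) for j, w in enumerate(vector) if w > 0.0}
print(nonzero)  # {0: 1.0986, 2: 0.6931}
```

Only vocabulary entries with a positive logit for at least one token end up non-zero, which is why the resulting vectors are mostly zeros.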

Why YASEM?

Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.
Efficiency: Generate sparse embeddings quickly and easily.
Flexibility: Works with both NumPy and PyTorch backends.
Convenience: Includes helpful utilities like get_token_values for inspecting feature representations.

Installation

You can install YASEM using pip:

pip install yasem

Quick Start

Here's a simple example of how to use YASEM:

from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)
# Or return a scipy.sparse.csr_matrix instead:
# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)

# Compute similarity
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372  18.86930016  22.87525314]
#  [106.88184372 122.79656474  17.45339064  21.44758757]
#  [ 18.86930016  17.45339064  61.00272733  40.92700849]
#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]


# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
#  'message': 2.38671875, 'greeting': 2.259765625,
#    ...

# Inspect token values for the fourth sentence
token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
#  'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
#    ...
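
The outputs above suggest that get_token_values maps each non-zero dimension of an embedding back to its vocabulary token, heaviest first. A rough sketch of that idea follows, assuming a dense embedding and a simple index-to-token list; `token_values_sketch`, `toy_vocab`, and `toy_embedding` are hypothetical and are not part of YASEM.

```python
def token_values_sketch(embedding, id_to_token, top_k=None):
    """Return {token: weight} for the non-zero entries, heaviest first."""
    pairs = [(id_to_token[i], w) for i, w in enumerate(embedding) if w > 0.0]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        pairs = pairs[:top_k]
    # dicts keep insertion order, so the result stays sorted by weight
    return dict(pairs)


# Toy five-word vocabulary and a matching toy embedding (not model output)
toy_vocab = ["hello", "dog", "cat", "cute", "ramen"]
toy_embedding = [6.9, 6.5, 0.0, 4.6, 0.0]

token_values = token_values_sketch(toy_embedding, toy_vocab)
print(token_values)  # {'hello': 6.9, 'dog': 6.5, 'cute': 4.6}
```

With a real model, tokens such as 'message' or 'greeting' appear even though they are not in the input text, because SPLADE expands the input to related vocabulary terms.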

rank API

# Rank documents against a query (reuses the embedder from Quick Start)
query = "What programming language is best for machine learning?"
documents = [
    "Python is widely used in machine learning due to its extensive libraries like TensorFlow and PyTorch",
    "JavaScript is primarily used for web development and front-end applications",
    "SQL is essential for database management and data manipulation",
]

# Get ranked results with relevance scores
results = embedder.rank(query, documents)
print(results)
# [
#   {'corpus_id': 0, 'score': 12.453},  # Python/ML document ranks highest
#   {'corpus_id': 2, 'score': 5.234},
#   {'corpus_id': 1, 'score': 3.123}
# ]

# Get ranked results including document text
results = embedder.rank(query, documents, return_documents=True)
print(results)  
# [
#   {
#     'corpus_id': 0,
#     'score': 12.453,
#     'text': 'Python is widely used in machine learning due to its extensive libraries like TensorFlow and PyTorch'
#   },
#   {
#     'corpus_id': 2, 
#     'score': 5.234,
#     'text': 'SQL is essential for database management and data manipulation'
#   },
#   ...
# ]
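
Conceptually, ranking amounts to scoring every document vector against the query vector with a dot product and sorting by score. The sketch below shows that flow with toy vectors in plain Python; `rank_sketch` and `dot` are hypothetical helpers that only mirror the result shape shown above, and the real rank method may differ in its details.

```python
def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_sketch(query_vec, doc_vecs, documents=None):
    """Score each document against the query and sort best-first."""
    results = []
    for corpus_id, vec in enumerate(doc_vecs):
        entry = {"corpus_id": corpus_id, "score": dot(query_vec, vec)}
        if documents is not None:
            # Mirrors return_documents=True in the example above
            entry["text"] = documents[corpus_id]
        results.append(entry)
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


# Toy three-dimensional vectors (not model output)
toy_query = [1.0, 0.0, 2.0]
toy_docs = [
    [0.0, 3.0, 0.5],  # score 1.0
    [2.0, 0.0, 1.5],  # score 5.0
    [0.0, 1.0, 0.0],  # score 0.0
]

ranked = rank_sketch(toy_query, toy_docs)
print([r["corpus_id"] for r in ranked])  # [1, 0, 2]
```

The corpus_id in each result is the position of the document in the input list, so it can be used to look the text up again when it is not returned.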

Features

Easy-to-use API inspired by SentenceTransformers
Support for both NumPy and scipy.sparse.csr_matrix
Efficient dot product similarity computation
Utility function to inspect token values in embeddings
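
For readers unfamiliar with the CSR layout, the following dependency-free sketch shows the three arrays a CSR matrix stores (data, indices, indptr, the same attribute names scipy.sparse.csr_matrix uses) and why a dot product between sparse rows only needs to touch shared non-zero columns. The helpers `to_csr`, `csr_row`, and `sparse_dot` are hypothetical illustrations, not YASEM or SciPy functions.

```python
def to_csr(dense_rows):
    """Build (data, indices, indptr) from a list of dense rows."""
    data, indices, indptr = [], [], [0]
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0.0:
                data.append(value)
                indices.append(col)
        # indptr[i + 1] marks where row i ends inside data/indices
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(csr, i):
    """Return row i as a {column: value} dict of its non-zero entries."""
    data, indices, indptr = csr
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


def sparse_dot(row_a, row_b):
    """Dot product that only visits columns present in both rows."""
    if len(row_a) > len(row_b):
        row_a, row_b = row_b, row_a  # iterate over the shorter row
    return sum(w * row_b[c] for c, w in row_a.items() if c in row_b)


dense = [
    [0.0, 2.0, 0.0, 1.0],
    [3.0, 2.0, 0.0, 0.0],
]
csr = to_csr(dense)
print(csr)  # ([2.0, 1.0, 3.0, 2.0], [1, 3, 0, 1], [0, 2, 4])
print(sparse_dot(csr_row(csr, 0), csr_row(csr, 1)))  # 4.0
```

Because SPLADE vectors have only a small fraction of non-zero entries relative to the vocabulary size, storing and multiplying them this way avoids work on the zeros.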

License

See LICENSE.txt in the repository for the license terms.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgements

This library is inspired by the SPLADE model and aims to provide a simple interface for its usage. Special thanks to the authors of the original SPLADE paper and the developers of the model.
