This repository has been archived by the owner on Apr 30, 2024. It is now read-only.

A fuzzy string comparison library for Elixir


grain-team/fuzzy_compare

 
 

FuzzyCompare

Getting started

To compare two strings with each other, call FuzzyCompare.similarity/2:

iex> FuzzyCompare.similarity("Oscar-Claude Monet", "monet, claude")
0.95

Inner workings

Imagine you had to match some names.

Try to match the following list of painters:

  • "Oscar-Claude Monet"
  • "Edouard Manet"
  • "Monet, Claude"

For a human, it is easy to see that some of the names have just been flipped and that others are different but similar sounding.

A first approach could be to compare the strings with a string similarity function such as the Jaro distance, which Elixir provides as String.jaro_distance/2.

iex> String.jaro_distance("Oscar-Claude Monet", "Monet, Claude")
0.6032763532763533

iex> String.jaro_distance("Oscar-Claude Monet", "Edouard Manet")
0.6749287749287749

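This is not much of an improvement over exact equality: the unrelated name "Edouard Manet" scores higher (about 0.67) than the reordered name "Monet, Claude" (about 0.60).

To improve the results, this library uses two different approaches: FuzzyCompare.ChunkSet and FuzzyCompare.SortedChunks.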

Sorted chunks

This approach yields good results when the words within a string have been shuffled around. The strategy splits each string into words, sorts the words, and compares the resulting sorted strings.

iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Monet, Claude")
1.0

iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Edouard Manet")
0.6944444444444443
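
The sorting idea can be sketched with the standard library alone. The module below is a hypothetical illustration, not the library's implementation: the name SortedWordsSketch, the lowercasing, and the choice to split on anything that is not a letter or digit are all assumptions.

```elixir
defmodule SortedWordsSketch do
  # Hypothetical helper for illustration only; not part of FuzzyCompare.

  # Lowercase the string, split it on runs of characters that are neither
  # letters nor digits, sort the resulting words, and join them again.
  def normalize(string) do
    string
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> Enum.sort()
    |> Enum.join(" ")
  end

  # Compare the normalized forms with the Jaro distance.
  def similarity(left, right) do
    String.jaro_distance(normalize(left), normalize(right))
  end
end

SortedWordsSketch.normalize("Monet, Claude")
# => "claude monet"

SortedWordsSketch.similarity("Claude Monet", "Monet, Claude")
# => 1.0
```

Word order and punctuation no longer matter after normalization. This sketch alone scores "Oscar-Claude Monet" against "Monet, Claude" below 1.0, because the extra word "oscar" survives the sort; the library's result of 1.0 above presumably relies on additional handling such as the substring comparison described further down.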

Chunkset

The chunkset approach works best when the strings contain additional substrings that are not relevant to what is being searched for.

iex> FuzzyCompare.ChunkSet.standard_similarity("Claude Monet", "Alice Hoschedé was the wife of Claude Monet")
1.0
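
One way to picture a set-based comparison is the sketch below. It is an assumption about the general technique (comparing the shared words against the shared words plus each string's remaining words), not a copy of FuzzyCompare.ChunkSet; the module name TokenSetSketch and its helpers are hypothetical.

```elixir
defmodule TokenSetSketch do
  # Hypothetical helper for illustration only; not part of FuzzyCompare.

  # Turn a string into a set of lowercase words.
  defp tokens(string) do
    string
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> MapSet.new()
  end

  # Join a set of words into a single string in sorted order.
  defp join(set), do: set |> Enum.sort() |> Enum.join(" ")

  def similarity(left, right) do
    a = tokens(left)
    b = tokens(right)

    # Words shared by both strings, and the shared words followed by the
    # words that appear in only one of the two strings.
    common = join(MapSet.intersection(a, b))
    common_plus_a = String.trim(common <> " " <> join(MapSet.difference(a, b)))
    common_plus_b = String.trim(common <> " " <> join(MapSet.difference(b, a)))

    # The best of the three pairwise comparisons is the result.
    Enum.max([
      String.jaro_distance(common, common_plus_a),
      String.jaro_distance(common, common_plus_b),
      String.jaro_distance(common_plus_a, common_plus_b)
    ])
  end
end

TokenSetSketch.similarity("Claude Monet", "Alice Hoschedé was the wife of Claude Monet")
# => 1.0
```

Because every word of the shorter string also occurs in the longer one, the shared words equal the shorter string and the first comparison is between two identical strings.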

Substring comparison

Should one of the strings be much longer than the other, the library will attempt to compare matching substrings only.
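
How the library selects matching substrings is not described here. A simple way to approximate the idea, offered only as a sketch under that caveat, is to slide a window the length of the shorter string across the longer one and keep the best score. The module and function names below are hypothetical.

```elixir
defmodule WindowSketch do
  # Hypothetical helper for illustration only; not part of FuzzyCompare.

  def best_window_similarity(left, right) do
    # Decide which string is the shorter one.
    {short, long} =
      if String.length(left) <= String.length(right),
        do: {left, right},
        else: {right, left}

    size = String.length(short)

    if size == 0 do
      # Nothing to compare against.
      0.0
    else
      long
      |> String.graphemes()
      # Every window of `size` consecutive graphemes, advancing one at a time.
      |> Enum.chunk_every(size, 1, :discard)
      |> Enum.map(&Enum.join/1)
      |> Enum.map(&String.jaro_distance(short, &1))
      |> Enum.max()
    end
  end
end

WindowSketch.best_window_similarity(
  "Claude Monet",
  "Alice Hoschedé was the wife of Claude Monet"
)
# => 1.0
```

The sketch compares the shorter string with every window, so its cost grows with the length of the longer string; it does no case folding or punctuation handling.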

Credits

This library is inspired by a SeatGeek blog post from 2011.
