Jeremy Trips

Where I share about my journey in tech and life.

This article is writen manually and uses ai in order to check the spelling.

String metric

End goal is to measure the the similiratiy between two strings.

I needed this to match the profile of companies in two different datasets. The end goal was to match the profile of a company in our software with the european public funding registry.

We are here talking about a statistical approach of string comparison, not a semantic one.

The most known matrics in the Levenshtein distance

I am aware of TF-IDF and word embedding approaches, but they does not suit my case as the matching will most certainly be based on small amount of text like Company name, vat number and website.

pg_trgm with efcore

TrigramsSimilarity vs TrigramsWordSimilarity

sources

https://en.wikipedia.org/wiki/String_metric