This article was written manually; AI was only used to check the spelling.
One year ago, I wrote a search engine for the company I am working on. After spending a long time on other parts of the software, I finally decided to test it and analyse the results... It was sad, really sad. Here is how I fixed it.
At Subalta, our goal is to help companies identify public funding. We try to clear the path to public funding, which is fraught with pitfalls.
To address that, I developed a hybrid search engine combining eligibility filtering and relevance-based ranking, allowing users to discover funding opportunities based on their project, company location, size, and other constraints.
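To make the shape of that pipeline concrete, here is a toy sketch in Python. The `Funding` fields, the `search` signature, and the naive keyword ranker are all illustrative placeholders, not our actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Funding:
    title: str
    region: str | None    # None means no geographic constraint
    max_size: int | None  # None means no company-size constraint

def search(project: str, region: str, size: int, catalog: list[Funding]) -> list[Funding]:
    # Stage 1: eligibility filtering (hard constraints, no exceptions)
    eligible = [
        f for f in catalog
        if (f.region is None or f.region == region)
        and (f.max_size is None or size <= f.max_size)
    ]
    # Stage 2: relevance-based ranking over the eligible subset
    # (a crude word-overlap score stands in for the real relevance model)
    def overlap(f: Funding) -> int:
        return len(set(project.lower().split()) & set(f.title.lower().split()))
    return sorted(eligible, key=overlap, reverse=True)
```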
The core problem quickly became obvious: the ranking algorithm was fundamentally broken. It surfaced irrelevant programs and, more critically, failed to return the perfect match for the user.
The funding world is highly constrained at every level: many programs apply only to specific geographic zones (for example, Regional Aid Areas), specific company profiles, and so on. In this context, false positives are not tolerable.
For that reason, we have modeled a set of hard constraints based on specific information attached to each funding program. These hard constraints are translated into SQL queries before the ranking step.
This first stage enforces that every program returned is eligible with respect to the hard constraints.
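As a rough illustration, here is what that translation could look like against a hypothetical `fundings` table; the column names are simplified stand-ins for our real schema:

```python
import sqlite3

def eligible_fundings(conn: sqlite3.Connection, region: str, company_size: int) -> list[tuple]:
    # Hard constraints become a parameterized WHERE clause: a program is
    # returned only if every constraint it declares is satisfied. A NULL
    # column means the program does not declare that constraint.
    query = """
        SELECT id, title
        FROM fundings
        WHERE (eligible_region IS NULL OR eligible_region = ?)
          AND (max_company_size IS NULL OR max_company_size >= ?)
    """
    return conn.execute(query, (region, company_size)).fetchall()
```

Only the rows that survive this filter are handed to the ranking step, so an ineligible program can never be surfaced, no matter how relevant it looks.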
Query pathologies are an issue faced by every search engine. One thing that was hard to accept was that not all queries deserve a response: overly vague queries led to nothing but noise. Here is an example of the kind of queries users tried in the "describe your project" field:
These queries contain no actionable signal about the user's intent. To tackle this issue, we developed a request preprocessor that uses embeddings to determine whether a query is precise enough for the search engine, and rejects it with a bad request error when it is not.
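One simple way to implement such a check, sketched below, is to compare the query embedding against the centroid of queries already labeled as too vague. The model choice, the example queries, and the threshold are all illustrative assumptions, not our production setup:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical reference set of queries labeled as too vague.
VAGUE_EXAMPLES = ["help", "I need money", "funding for my company"]
vague_centroid = model.encode(VAGUE_EXAMPLES, normalize_embeddings=True).mean(axis=0)
vague_centroid /= np.linalg.norm(vague_centroid)

def is_too_vague(query: str, threshold: float = 0.75) -> bool:
    # Embed the query and measure its cosine similarity to the vague
    # centroid; a high similarity means it carries no actionable signal.
    vec = model.encode(query, normalize_embeddings=True)
    return float(np.dot(vec, vague_centroid)) >= threshold

# In the API layer, a query flagged as too vague gets a 400 Bad Request
# instead of a noisy result page.
```

In practice the threshold has to be tuned on labeled queries: too aggressive a value would reject legitimate but terse requests, too lenient a value lets the noise back in.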