About two weeks ago I had the great honor of delivering the keynote at the Haystack EU conference held in Berlin, Germany.
In it, I shared some of what I have learned from interviewing vector search makers on the Vector Podcast. These lessons are worth sharing here:
- Malte Pietsch (CTO Deepset, Haystack): Metric blindness
- Jo Bergum (Vespa, Yahoo): Keep your ears to the ground and don’t sell hype
- Max Irwin (Mighty): Why does Vector Search / AI have to be locked to Python only?
- Doug Turnbull (Shopify): Stop talking about yourself as a Vector Database and switch to relevance-oriented applications
This is still very much an emerging space, one that aims to conquer some of the toughest problems in search. As one example, each year these problems lead to search abandonment, which represents a $300 billion opportunity in US retail alone (according to 2021 research by McKinsey & Co and Google).
Because it is emerging, it is important to keep researching and sharing, without selling hype to your customers and followers. Of course, there is a business need to market the proposed solutions, and that’s all right. But don’t go too far by saying that all problems have been solved. For example, symbolic filtering contradicts the nature of the geometric space, and there is no easy way to escape that. Sharing also makes our collective ride smoother: for example, it helps to know that for hybrid search there are several methods, not just one or two (linear and RRF).
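To make the two baseline methods named above concrete, here is a minimal sketch of linear score combination and reciprocal rank fusion (RRF). The function names, the `alpha` weight, the min-max normalization step, and the `k=60` constant are my assumptions for illustration; they are not taken from the talk or from any particular engine.

```python
from collections import defaultdict


def min_max(scores):
    """Rescale a {doc_id: score} dict to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a top hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the keyword side (assumed default)."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    docs = set(kw) | set(vec)
    # A document missing from one result list contributes 0 from that side.
    fused = {d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0) for d in docs}
    # Sort by fused score, highest first; break ties by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def rrf_fusion(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all ranked lists (ranks start at 1)."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Linear fusion needs comparable scores, hence the normalization step, while RRF only looks at rank positions and so sidesteps the score-scale problem. Either function can be fed the top results of a keyword query and a vector query for the same user request.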
I also emphasized the need to do fundamental research, illustrating it with our work last year on the BuddyPQ algorithm in the billion-scale vector search competition. You don’t always need to invent a completely new algorithm: it can be an efficient modification of an existing one. It is of course understandable that startups won’t have the ability to focus on research only: they need to win deals. But at the same time, if you turn this into a game of taking research done elsewhere and making it work for your clients, you can only walk so far before your competitor does the same.
I also keep suggesting a way to glue things together in this space by offering this nice pyramid:
The numbers attached to the pyramid levels indicate how open each level is: do you have access to its source code and / or the data, or is the principal algorithm known to the public? As you can see, the higher in the pyramid you are, the less open it is. In other words, if you are sitting next to the user, you probably don’t disclose or share as much with the community as when you sit very far away (in the ANN algorithm space).
I also showed a multimodal and multilingual search demo to spice things up.
Here is the full recording:
Thanks to the OSC team for making this event happen and for practical support during the event: Charlie Hull, Daniel Wrigley, René Kriegler and Aditya Bhise.