Tom Lackner — VP Engineering — Classic.com — on Qdrant, NFT, challenges and joys of ML engineering
In the 5th episode of Vector Podcast I had a good chat with Tom Lackner about how he approached selecting a vector database and ended up with Qdrant (and is happy with it so far!). Qdrant is written in Rust and offers the feature set and simple-to-use API you’ll need to build neural search functionality into your product. If you are wondering which vector DB to pick for your project, I wrote this succinct comparison of 7 of them: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696
We also covered topics like CLIP (the image and text model from OpenAI), engineering ML pipelines, researching ML models, and search engines!
Watch this episode:
Or listen to it:
Here are the topics we covered:
00:00 Intro
00:53 — Tom’s background: over 20 years in IT engineering management
01:21 — What’s Classic and what kind of cars can one get there?
03:17 What does the search flow look like on the site? Transition from Elasticsearch/Postgres to embeddings
04:12 — Typo-tolerance issue in search
05:49 — NFTs, https://lookpop.co/ and similarity search
07:38 — Tom’s experience with CLIP, and what about MLOps?
10:20 — Systems like determined.ai can help
11:36 — Can I buy an NFT on lookpop?
12:21 — Qdrant and support on Telegram
14:45 — Other vector DB options Tom looked at, and criteria for choosing Qdrant
21:06 — Rust as a preferred language for distributed and high-load search engines
21:59 — Modes of search: the need to combine sparse search (BM25) with neural search, and how to do it
26:30 — Query typos and byte-level encoding
28:12 — “The amount of effort going into training models is absurd”
33:16 — We need confidence levels from sparse and dense retrieval
37:20 — Tom’s way of relying on query-document distance to measure the quality of query serving
40:03 — Read papers or code? ML vs Engineering
44:08 — Why did Tom pick the field of ML / vector search?
46:15 — Tom’s announcement