Vector Podcast episode with Connor Shorten

Dmitry Kan
2 min read · Mar 8, 2023

Connor is one of the top professionals in the vector search field, and I have known him for two years now. He has a background in academia working on deep learning, which is a definite unfair advantage when it comes to the core of this field. He can not only recommend a paper or a survey to read, but also connect the dots beyond papers and make his own predictions (tweet link). At the same time, working for Weaviate as a Research Scientist, he keeps looking for scientific solutions that will work in practice.

Connor Shorten, episode II, on Vector Podcast with Dmitry Kan

When we meet on a podcast (hopefully this becomes a tradition), the conversation branches in so many directions that we both can barely keep up. For me personally, this is a rare occasion to brainstorm during an interview (his for my podcast or mine for his) and to find joy in the serendipitous discoveries we make.

In this episode we covered many topics around Large Language Models and ChatGPT for search and recommenders. You will find the show notes packed with papers and search presentations, from the sparse (BM25, TF-IDF) to the dense world. Give it a listen and let me know what you think!

Topics:

00:00 Intro

01:54 Things Connor learnt in the past year that changed his perception of Vector Search

02:42 Is search becoming conversational?

05:46 Connor asks Dmitry: How will Large Language Models change search?

08:39 Vector Search Pyramid

09:53 Large models, data, Form vs Meaning and octopus underneath the ocean

13:25 Examples of getting help from ChatGPT and how it compares to web search today

18:32 Classical search engines with URLs for verification vs ChatGPT-style answers

20:15 Hybrid search: keywords + semantic retrieval

23:12 Connor asks Dmitry about his experience with sparse retrieval

28:08 SPLADE vectors

34:10 OOD-DiskANN: handling out-of-distribution queries, and nuances of sparse vs dense indexing and search

39:54 Ways to debug a query case in dense retrieval (spoiler: it is a challenge!)

44:47 Intricacies of teaching ML models to understand your data and re-vectorization

49:23 Local IDF vs global IDF and how dense search can approach this issue

54:00 Realtime index

59:01 Natural language to SQL

1:04:47 Turning text into a causal DAG

1:10:41 Engineering and Research as two highly intelligent disciplines

1:18:34 Podcast search

1:25:24 Ref2Vec for recommender systems

1:29:48 Announcements


Dmitry Kan

Founder and host of Vector Podcast, tech team lead, software engineer, manager, but also: cat lover and cyclist. Host: https://www.youtube.com/c/VectorPodcast