Vector Podcast with Sid Probstein: Search in siloed data with SWIRL

Dmitry Kan
2 min read · Jul 23, 2023

Change of format: in this episode you will find a demo of SWIRL!

Vector Podcast with Sid Probstein on SWIRL

Imagine you have an existing search engine and would like to extend it with a new data source. Most of us would get this data (by downloading, scraping or some other way, like an SFTP push) and add it to the existing index. But in real production settings this process can take months, because you need to download and store the data, handle user-level entitlements (if the data is not public), remodel it to your data representation (with mandatory fields like title, body, publication date and type), index it, and update your search logic (for example, check that the ranking formula still makes sense). On top of this comes handling periodic data updates, so an estimate of months is probably conservative.

In certain other situations this is not even possible. Financial reports that each cost thousands of dollars might not be released to you. Instead, your client will have their own search engine over these documents.

So how would you solve this?

The answer is a federated or metasearch engine:

Credit: https://en.wikipedia.org/wiki/Metasearch_engine

SWIRL implements such a metasearch engine, which lets you unlock all the silos within your organization and spice up the results with ChatGPT responses.
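To make the idea concrete, here is a minimal sketch of the metasearch pattern in Python. It is not SWIRL's code or API: the names `Hit`, `make_toy_connector` and `federated_search` are hypothetical, the "sources" are in-memory lists standing in for remote search engines, and the round-robin merge is only a naive placeholder for real relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import zip_longest
from typing import Callable, List


@dataclass
class Hit:
    source: str  # which silo the result came from
    title: str


# A connector takes a query and returns that source's hits in its own ranked order.
Connector = Callable[[str], List[Hit]]


def make_toy_connector(name: str, titles: List[str]) -> Connector:
    """Hypothetical stand-in for a remote search engine: substring match on titles."""
    def search(query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(name, t) for t in titles if q in t.lower()]
    return search


def federated_search(query: str, connectors: List[Connector]) -> List[Hit]:
    """Send the query to every source in parallel, then merge the result lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        # pool.map keeps the results in the same order as the connectors.
        per_source = list(pool.map(lambda c: c(query), connectors))

    # Naive merge: take the 1st hit of each source, then the 2nd, and so on.
    merged: List[Hit] = []
    for tier in zip_longest(*per_source):
        merged.extend(hit for hit in tier if hit is not None)
    return merged


if __name__ == "__main__":
    wiki = make_toy_connector("wiki", ["Vector search basics", "Metasearch engine"])
    reports = make_toy_connector("reports", ["Quarterly search market report"])
    for hit in federated_search("search", [wiki, reports]):
        print(f"[{hit.source}] {hit.title}")
```

Nothing is copied or indexed here: each source keeps its own data and answers the query itself, and only the result lists travel back to be merged.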

There is another daunting issue with federated search: ranking the documents. We have discussed this and many other topics in the episode with Sid. For the first time, the episode offers you a real system demo. Take a look or give a listen!

The episode is also available on Spotify, Apple Podcasts, Google Podcasts, and via RSS:

https://rss.com/podcasts/vector-podcast/1047952/

Dmitry Kan

Founder and host of Vector Podcast, tech team lead, software engineer, manager, but also: cat lover and cyclist. Host: https://www.youtube.com/c/VectorPodcast