Neural Search with BERT and Solr

Dmitry Kan
10 min read · Aug 18, 2020

It is exciting to read about the latest research advances in computational linguistics. In particular, the better the language models we build, the more accurate the downstream NLP systems we can design.

Update: if you are looking to run neural search with the latest Solr versions (starting from 8.x), I have just published a new blog post where I walk you through the low-level implementation of the vector format and search, and the story of upgrading from 6.x to 8.x: https://medium.com/@dmitry.kan/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559

Image: Bert in a Solr hat

Having a background in production systems, I have a strong conviction that it is important to deploy the latest theoretical achievements into real-life systems. This allows you to:

  • see NLP in action in practical systems
  • identify possible shortcomings and continue research and experimentation in promising directions
  • iterate to achieve better performance: quality, speed, and other important parameters, such as memory consumption

For this story I’ve chosen to deploy BERT, a language model by Google, into Apache Solr, a production-grade search engine, to implement neural search. Traditionally, out-of-the-box search engines use some form of TF-IDF (and, more recently, BM25) ranking of the documents they retrieve. TF-IDF, for instance, scores a document by how often a query term occurs in it (term frequency), discounted by how common that term is across the whole collection (inverse document frequency).
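To make the neural search idea concrete, here is a minimal sketch of the indexing side: documents are encoded with BERT and their embeddings are stored in Solr alongside the text. It assumes a bert-as-service server is running locally; the core name "vector-demo" and the "vector" field are illustrative, not from this article.

# Minimal sketch: encode documents with BERT via bert-as-service and store
# each embedding in Solr as a comma-delimited string that a custom ranking
# component (or re-ranker) can parse at query time.
#
# Assumptions: a bert-serving-start server is running on localhost, and a
# Solr core named "vector-demo" with "text" and "vector" fields exists.
from bert_serving.client import BertClient  # pip install bert-serving-client
import pysolr                               # pip install pysolr

bc = BertClient()  # connects to the local bert-as-service server
solr = pysolr.Solr("http://localhost:8983/solr/vector-demo")

docs = [
    "Apache Solr is a production-grade search engine.",
    "BERT produces contextual embeddings for text.",
]
embeddings = bc.encode(docs)  # one 768-dim vector per document (BERT-base)

solr.add([
    {
        "id": str(i),
        "text": text,
        "vector": ",".join(f"{x:.6f}" for x in emb),
    }
    for i, (text, emb) in enumerate(zip(docs, embeddings))
])
solr.commit()

At query time, the incoming query is encoded with the same BERT model, and documents are ranked by the similarity (typically cosine) between the query vector and each stored document vector, rather than by term-overlap statistics alone.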


Written by Dmitry Kan

Founder and host of Vector Podcast, software engineer, product manager, but also: cat lover and cyclist. Host: https://www.youtube.com/c/VectorPodcast
