Neural Search with BERT and Solr

Dmitry Kan
10 min read · Aug 18, 2020

It is exciting to read the latest research advances in computational linguistics. In particular, the better the language models we build, the more accurate the downstream NLP systems we can design.

Update: if you are looking to run neural search with the latest Solr versions (starting from version 8.x), I have just published a new blog where I walk you through the low-level implementation of the vector format and search, and the story of upgrading from 6.x to 8.x: https://medium.com/@dmitry.kan/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559

[Image: BERT in a Solr hat]

Having a background in production systems, I have a strong conviction that it is important to deploy the latest theoretical advances into real-life systems. This allows you to:

  • see NLP in action in practical systems
  • identify shortcomings and continue research and experimentation in promising directions
  • iterate to achieve better performance: quality, speed, and other important parameters, like memory consumption

For this story I’ve chosen to deploy BERT, a language model by Google, into Apache Solr, a production-grade search engine, to implement neural search. Traditionally, out-of-the-box search engines rank retrieved documents with some variant of TF-IDF, and more recently BM25. TF-IDF, for instance, is based on…
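To make the contrast between the two ranking styles concrete, here is a minimal sketch (my own illustration, not code from this post) that scores a toy corpus with a basic TF-IDF formula and then computes the cosine similarity that neural search ranks by. The corpus, the query, and the toy vectors standing in for BERT embeddings are all assumptions made up for the example.

```python
import math
from collections import Counter

# Toy corpus, invented for this illustration.
docs = [
    "apache solr is a search engine",
    "bert is a language model by google",
    "neural search ranks documents with dense vectors",
]

def tf_idf_score(query, doc, corpus):
    """Basic TF-IDF scoring: sum over query terms of
    tf(t, d) * log(N / df(t)), where N is the corpus size
    and df(t) is the number of documents containing t."""
    n_docs = len(corpus)
    term_counts = Counter(doc.split())
    score = 0.0
    for term in query.split():
        tf = term_counts[term]
        df = sum(1 for d in corpus if term in d.split())
        if tf and df:
            score += tf * math.log(n_docs / df)
    return score

def cosine(u, v):
    """Cosine similarity: the core ranking signal in neural search."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = "neural search"
for doc in docs:
    print(f"{tf_idf_score(query, doc, docs):.3f}  {doc}")

# Toy 3-d "embeddings" standing in for real BERT vectors (an assumption).
q_vec = [0.1, 0.9, 0.2]
d_vec = [0.2, 0.8, 0.1]
print(f"cosine similarity: {cosine(q_vec, d_vec):.3f}")
```

In the neural setup, the hand-made vectors would be replaced by the output of a BERT encoder, and the search engine would store document vectors at index time and compare them against the query vector at search time.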
