Getting to know your data with metric learning
I hope your Spring (or Autumn, if you happen to be in the Southern Hemisphere) is going well so far. In this new episode of Vector Podcast I sat down with Yusuf Sarıgöz, AI Research Engineer at Qdrant, a vector database and one of the seven discussed in the related blog post.
We discussed metric learning, a technique you can use in data-scarce scenarios when you need to build a production-grade model, for instance a classifier.
You might have only a small amount of labelled data, but if you manage to present positive and negative samples to the metric learning model, it will learn embeddings well suited to your end-user model.
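To make "learning from positive and negative samples" concrete, here is a minimal triplet-loss sketch in PyTorch. This is one common metric learning setup, not necessarily the exact recipe from the episode, and the network shape and margin are my own illustrative assumptions:

```python
# Minimal metric learning sketch: pull an anchor toward a positive sample
# and push it away from a negative one. Layer sizes are illustrative.
import torch
import torch.nn as nn

embedding_net = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 32),
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embedding_net.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """One optimization step on a (anchor, positive, negative) triplet."""
    optimizer.zero_grad()
    loss = triplet_loss(
        embedding_net(anchor),
        embedding_net(positive),
        embedding_net(negative),
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```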
To implement metric learning as a neural network, we can use an autoencoder architecture:
The input sample X of dimension K is passed through the Encoder, which computes an embedding of dimension N << K from it. The Decoder then reconstructs a sample X' from the embedding. The network optimizes the objective D(X, X') -> 0: the distance between the input sample and its reconstruction should be as close to 0 as possible.
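As a concrete illustration, here is a minimal PyTorch sketch of this setup. The layer sizes (K = 784, N = 32), the MLP architecture, and MSE as the distance D are my own illustrative assumptions, not values from the episode:

```python
# Autoencoder sketch: Encoder compresses X (dim K) to an embedding (dim N),
# Decoder reconstructs X' from it, and training drives D(X, X') toward 0.
import torch
import torch.nn as nn

K, N = 784, 32  # input dimension and embedding dimension, N << K

encoder = nn.Sequential(nn.Linear(K, 128), nn.ReLU(), nn.Linear(128, N))
decoder = nn.Sequential(nn.Linear(N, 128), nn.ReLU(), nn.Linear(128, K))

distance = nn.MSELoss()  # D(X, X'): mean squared error as the distance
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

def reconstruction_step(x: torch.Tensor) -> float:
    """One optimization step pushing D(X, X') toward 0."""
    optimizer.zero_grad()
    embedding = encoder(x)               # X -> embedding of dimension N
    reconstruction = decoder(embedding)  # embedding -> X'
    loss = distance(reconstruction, x)
    loss.backward()
    optimizer.step()
    return loss.item()
```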
Once the optimum is reached across all input samples, we can discard the Decoder part of the network and use the Encoder to compute embeddings for our data. Once you have the embeddings, you can store them in a vector database for efficient retrieval during classification.
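Storing and querying the embeddings might then look like the following sketch. It assumes the qdrant-client Python package, reuses the trained encoder and embedding size N from the snippet above, and the labelled_samples and new_sample variables are hypothetical placeholders:

```python
# Store encoder embeddings in Qdrant and classify a new sample by its
# nearest labelled neighbours. Collection name and payloads are made up.
import torch
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory instance, handy for experiments

client.create_collection(
    collection_name="embeddings",
    vectors_config=VectorParams(size=N, distance=Distance.COSINE),
)

# Embed the labelled samples with the frozen encoder and store them with labels.
with torch.no_grad():
    points = [
        PointStruct(id=i, vector=encoder(x).tolist(), payload={"label": label})
        for i, (x, label) in enumerate(labelled_samples)
    ]
client.upsert(collection_name="embeddings", points=points)

# Classify a new sample by looking up its nearest labelled neighbours.
with torch.no_grad():
    query_vector = encoder(new_sample).tolist()
hits = client.search(
    collection_name="embeddings", query_vector=query_vector, limit=5
)
predicted_label = hits[0].payload["label"]
```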
Watch the episode on YouTube:
You can also listen to it on Spotify:
Or on Apple Podcasts:
I’ve also received a request to publish the RSS feed of the podcast episodes, so that you can plug it into your favorite podcast software. Here it is:
https://media.rss.com/vector-podcast/feed.xml
You will find lots of links to papers, blogs, and tools around metric learning to guide your learning process. Good luck, and remember to subscribe to the podcast to get new episodes in your feed.