In 2018 I upgraded the related posts functionality on this blog to use word2vec. This was hacked together by averaging the vectors for interesting words in each post together and then looking for the closest vectors. It worked quite well, but the state of the art has moved on just a little bit since then.
OpenAI has an embeddings API and recently released a cheaper model called text-embedding-ada-002. The vectors have 1,536 dimensions, a pretty significant increase from the 300 I was using with word2vec. Creating vectors for all my posts took a few minutes and cost $0.11 which is pretty affordable. As you'd expect those related posts are now significantly more related and useful. Thanks OpenAI!
I shared some code previously for the word2vec hack. This is a lot more straightforward - call the API with the post text and then compare the vectors with cosine distance to find the most related. It works well for search too.
- Better related posts with word2vec (C#)
- OpenAGI, or why we shouldn't trust Open AI to protect us from the Singularity
- Shipping a website in a day with Generative AI