Upgrading from word2vec to OpenAI


In 2018 I upgraded the related posts functionality on this blog to use word2vec. This was hacked together by averaging the vectors for interesting words in each post together and then looking for the closest vectors. It worked quite well, but the state of the art has moved on just a little bit since then.

OpenAI has an embeddings API and recently released a cheaper model called text-embedding-ada-002. The vectors have 1,536 dimensions, a pretty significant increase from the 300 I was using with word2vec. Creating vectors for all my posts took a few minutes and cost $0.11 which is pretty affordable. As you'd expect those related posts are now significantly more related and useful. Thanks OpenAI!

I shared some code previously for the word2vec hack. This is a lot more straightforward - call the API with the post text and then compare the vectors with cosine distance to find the most related. It works well for search too.

Add your comment...

Related Posts

(All Code Posts)

(Published to the Fediverse as: Upgrading from word2vec to OpenAI #code #ml #openai #ithcwy #word2vec Using the Open AI embeddings API to find better related posts for a blog. )

Add Comment

All comments are moderated. Your email address is used to display a Gravatar and optionally for notification of new comments and to sign up for the newsletter.