Links for April 2021

Updated on Thursday, April 22, 2021

Links for March 2021

ITHCWY Redesign

Updated on Sunday, November 6, 2022

I've just launched a redesign of I Thought He Came With You. The main thrust is to make the site more usable on desktops. Which seems nuts, but the data doesn't lie. The site has low mobile traffic and for a while I thought this was some kind of technical issue. I optimized the design heavily for mobile and spent a lot of time on speed and some AMP. I guess it's the content. Google loves it when I write documentation for them and doesn't think I have anything useful to say on politics. They're probably right. So I've gone back to having an old school sidebar and I've taken the performance hit of using Bootstrap to get some better looking forms and navigation without spending a lot of time on it. I hope you enjoy it, and if you find anything broken please email or leave a comment.

Better related posts with word2vec (C#)

Updated on Tuesday, February 13, 2024

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post, and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job, which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, also learns a vector for each document. This could be useful but isn't really what I needed here (e.g. I could solve my forgetfulness around adding tags by training a model to suggest them for me, which might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
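As a minimal sketch of the averaging step (not the code from my CMS), the class below assumes the word vectors have already been pulled out of the model into a dictionary. The dictionary is a hypothetical stand-in for GetRepresentationFor(...).NumericVector, and the PostVectors class and Average method names are made up for illustration. Words the model doesn't know are simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class PostVectors
{
    // Averages the vectors of every word found in the model.
    // "model" is a hypothetical stand-in for a loaded word2vec vocabulary.
    // Returns null when none of the words are in the model.
    public static double[] Average(IEnumerable<string> words,
        IReadOnlyDictionary<string, float[]> model)
    {
        double[] sum = null;
        int count = 0;

        foreach (var word in words)
        {
            // Skip words that are missing from the vocabulary.
            if (!model.TryGetValue(word, out var vector)) continue;

            if (sum == null) sum = new double[vector.Length];
            for (int i = 0; i < vector.Length; i++) sum[i] += vector[i];
            count++;
        }

        if (count == 0) return null;

        // Divide the component-wise sum by the number of words used.
        for (int i = 0; i < sum.Length; i++) sum[i] /= count;
        return sum;
    }
}
```

Calling PostVectors.Average(words, model) once per post and caching the result keeps the model out of the request path, which matters on a small server.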

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.
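Building the word list could look something like the sketch below. This is an illustration under assumptions rather than my actual code: the WordList class, the Build method, the regular expression used to split words, and the tiny stop word list are all made up for the example, and a real stop word list would be much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordList
{
    // Tiny illustrative stop word list (an assumption, not a complete list).
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "the", "and", "a", "an", "of", "to", "in", "is", "it"
    };

    // Takes any number of text fields (post body, title, meta description,
    // tags), lower-cases them, splits them into runs of letters and digits
    // and drops stop words.
    public static List<string> Build(params string[] fields)
    {
        return fields
            .Where(f => !string.IsNullOrWhiteSpace(f))
            .SelectMany(f => Regex.Matches(f.ToLowerInvariant(), @"[\p{L}\p{Nd}]+").Cast<Match>())
            .Select(m => m.Value)
            .Where(w => !StopWords.Contains(w))
            .ToList();
    }
}
```

For example, WordList.Build(post, title, description, tags) returns one flat list that can be fed straight into the averaging step.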

Now that each post has a vector representation it's easy to compute the most related posts. For a given post, compute the cosine distance between its vector and the vector of every other post. Sort the list in ascending order and pick however many you want from the top (the distance between the post and itself would be 0, a totally unrelated post would be 1). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
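The ranking step can be sketched without any library at all. The version below is an assumption-laden illustration, not the Accord.Math call: it takes cosine distance to be 1 minus cosine similarity, so identical directions score 0 and smaller means more related. The RelatedPosts class, its method names and the idea of keying posts by a string id are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedPosts
{
    // Cosine distance = 1 - cosine similarity.
    // Vectors pointing the same way give 0, orthogonal vectors give 1.
    public static double CosineDistance(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as unrelated to everything (an assumption).
        if (normA == 0 || normB == 0) return 1;

        return 1 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the ids of the "count" closest posts, nearest first,
    // excluding the post itself.
    public static List<string> MostRelated(string postId,
        IReadOnlyDictionary<string, double[]> postVectors, int count)
    {
        var target = postVectors[postId];
        return postVectors
            .Where(p => p.Key != postId)
            .OrderBy(p => CosineDistance(target, p.Value))
            .Take(count)
            .Select(p => p.Key)
            .ToList();
    }
}
```

This compares one post against every other post, which is fine for a blog with a few thousand posts; a much larger site would probably want an approximate nearest neighbor index instead.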

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Updated 2023-01-29 22:00:

I have recently moved this functionality to use an OpenAI embeddings API integration.

Privacy Policy Update and Comment Notifications

Updated on Sunday, November 6, 2022

The ITHCWY privacy policy has been updated to reflect changes in the blog comment system. Previously email addresses submitted with comments were only used to display a Gravatar. Starting today they will also be used for notifications and newsletter signup.

The first notification is when a comment is approved. You'll always be notified in this case if you enter an email address.

When you leave a comment you can opt in to receiving notifications when another comment is added to the same post.

Finally, you can also subscribe to the monthly newsletter when leaving a comment.

Subscribe via Messenger

Updated on Sunday, November 6, 2022

Thanks to revoice.me you can now subscribe to I Thought He Came With You on Facebook Messenger, Telegram, Slack and/or Chrome Notifications. To sign up visit ITHCWY on revoice.me.

Host change

Updated on Sunday, November 6, 2022

I'm switching hosts so there will be various DNS changes and some downtime today.

Excessive Book Reviews

Updated on Sunday, May 3, 2020

You know how you're debugging and comment out that return statement that stops book reviews from being posted more than once a month so you can get to the bottom of a problem without constantly deleting posts? And then you get distracted and push a new version of the blog software with that return statement still commented out? Thankfully that task is only scheduled to run every four hours. Sorry.

Get ITHCWY By Email

Updated on Sunday, November 6, 2022

I'm over social media - the Facebook page for this blog is a hopeless way to reach people and I removed the slow horrible sharing widgets a while ago. But I have this nagging suspicion that RSS is a super-niche activity for techno-libertarians harking back to the good old days of the Internet with open protocols and wall-free gardens and isn't entirely up to snuff either. So I'm going to experiment with a monthly email list for people who vaguely follow the blog or use Catfood Software products but don't quite manage to come back here every day to check for updates. Sign up here.

Why? Excellent question. The rules for blogs are to pick a narrow topic of interest, know your audience, do keyword research and drop SEO honeypot bombs to draw that audience in. I did that for Catfood Software but this isn't that kind of blog. It's a random collection of my hobbies and interests. So if you're not sure, read through the Featured section in the sidebar to get a preview.

I write a lot of code so what you'll get for sure is updates from Catfood Software and other occasional side projects. When I struggle with the process or discover something I write about that as well - these posts are more interesting to other developers and less exciting if you just want your desktop wallpaper (or Android phone) to look awesome. I also love to make videos that don't have me in them, mainly complicated time-lapses, so you'll find a lot of those too. Also hikes in and around the San Francisco Bay Area. Occasionally politics.

If that works for you and you're not an RSS type then please join and let me know how I'm doing.

Correlation is not causation but...

Updated on Wednesday, February 22, 2017

Blog post length vs number of children.

I have crunched the numbers and calculated the average length of blog posts on I Thought He Came With You vs. how many children I had at the time the post was written.

(Published to the Fediverse as: Correlation is not causation but... #etc #blog #ithcwy Statistically insignificant study of blog post length vs. number of children here on I Thought He Came With You. )