I Thought He Came With You is Robert Ellison’s blog about software, marketing, politics, photography and time lapse.

Pulling the plug on Facebook and Twitter, Tweet Archive

A year ago I uninstalled Facebook and Twitter from my phone in an effort to slim down my social media fake news diet. The idea was that I'd occasionally check in from my laptop. Which I didn't. So this week I've finally taken the plunge and deleted both accounts. Or rather, deactivated them; you have to wait 30 days before they actually delete anything. I also nuked Quora, because of the hack rather than any particular tendency to undermine the foundations of democracy.

This leaves me with a potential problem. As a person with a rapidly decreasing social media footprint I might be asked to host the Oscars. It would be nice to be tapped, but I really don't want to and so I've published a complete archive of all my tweets. I'm pretty sure some of them would be disqualifying. Whew.

Elephant Seals at San Simeon

Elephant Seals at San Simeon (Piedras Blancas elephant seal rookery).

Book reviews for November 2018

The Increment by David Ignatius

3/5

What Have You Done by Matthew Farrell

2/5

Salvation (Salvation Sequence #1) by Peter F. Hamilton

4/5

The Christmas Scorpion (Jack Reacher, #22.5) by Lee Child

1/5

Less by Andrew Sean Greer

4/5

Grand Canyon

Hoover Dam (Fixed)

High Roller Timelapse

High Roller Ferris Wheel at The Linq in Las Vegas

Sunset timelapse of the 550-foot Ferris Wheel at The Linq in Las Vegas, Nevada (technically the High Roller Observation Wheel).

Book reviews for September 2018

Updated on Friday, October 5, 2018
The Strange Library by Haruki Murakami

3/5

Revenge by Yōko Ogawa

4/5

Ball Lightning by Liu Cixin

4/5

Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker

4/5

Provenance by Ann Leckie

4/5

Autumnal Equinox 2018

Autumnal Equinox 2018 in Catfood Earth

It's Autumn for the Northern Hemisphere and Spring south of the equator.

Rendered in Catfood Earth.

(Previously, previously, previously, previously, previously)

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post, and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job, which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS, but the sketches below should be enough to get this up and running anywhere.

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a single word (or a short phrase, depending on how the model is trained). A related technique, doc2vec, learns a vector for the whole document as well. That could be useful but isn't really what I needed here (although I could solve my forgetfulness around adding tags by training a model to suggest them for me; might be a good project for another day). I ended up using a pre-trained model and averaging together the vectors for each word in a post. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
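The original snippet isn't reproduced here, but a minimal sketch of that step might look like the following. The reader class, the ContainsWord check and the GetPostVector helper are assumptions about the Word2vec.Tools API (only GetRepresentationFor(...).NumericVector is confirmed above), and the model file name is the standard word2vec-slim download.

```csharp
using System.Collections.Generic;
using Word2vec.Tools;

public static class PostVectors
{
    // Load the word2vec-slim model once at startup. The binary reader class and
    // ContainsWord are assumptions about the Word2vec.Tools API; only
    // GetRepresentationFor(...).NumericVector is taken from the post above.
    private static readonly Vocabulary Model =
        new Word2VecBinaryReader().Read("GoogleNews-vectors-negative300-SLIM.bin");

    // Average the vectors of every in-vocabulary word to get one vector per post.
    public static double[] GetPostVector(IEnumerable<string> words)
    {
        var vectors = new List<float[]>();
        foreach (var word in words)
        {
            if (!Model.ContainsWord(word)) continue; // skip out-of-vocabulary words
            vectors.Add(Model.GetRepresentationFor(word).NumericVector);
        }
        if (vectors.Count == 0) return null;

        var average = new double[vectors[0].Length]; // 300 dimensions for this model
        foreach (var vector in vectors)
        {
            for (int i = 0; i < average.Length; i++)
            {
                average[i] += vector[i];
            }
        }
        for (int i = 0; i < average.Length; i++)
        {
            average[i] /= vectors.Count;
        }
        return average;
    }
}
```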

I haven't included code to build the word list, but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc.) and converted everything to lower case.
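A rough sketch of that word list step follows; the stop word list here is just a tiny illustrative sample and the fields passed in are whatever your CMS exposes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordList
{
    // A tiny illustrative stop word list; a real one would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "the", "and", "a", "an", "of", "to", "in", "is", "it", "etc"
    };

    // Combine post body, title, meta description and tags, lower case everything,
    // keep only alphabetic tokens and drop stop words.
    public static List<string> Build(params string[] fields)
    {
        return fields
            .Where(field => !string.IsNullOrEmpty(field))
            .SelectMany(field => Regex.Split(field.ToLowerInvariant(), "[^a-z]+"))
            .Where(word => word.Length > 1 && !StopWords.Contains(word))
            .ToList();
    }
}
```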

Now that each post has a vector representation it's easy to compute the most related posts. For a given post, compute the cosine similarity between its vector and the vector of every other post. Sort the list in descending order and pick however many you want from the top (the similarity between a post and itself would be 1; a totally unrelated post would be close to 0). The comparison for a pair of posts can be done with Accord.Math, also on NuGet.
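Here is a minimal sketch of that ranking step. The post uses Accord.Math for the vector comparison; the similarity is computed inline here to keep the example dependency-free and to match the 1 = identical convention above, and the (Id, Vector) tuples are placeholders for however the CMS stores posts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedPosts
{
    // Cosine similarity: 1.0 for identical orientation, near 0 for unrelated.
    // (The original uses Accord.Math for this step; inlined here so the sketch
    // has no dependencies.)
    public static double CosineSimilarity(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Rank every other post by similarity to the current one and keep the top few.
    public static List<int> FindRelated(
        (int Id, double[] Vector) current,
        IEnumerable<(int Id, double[] Vector)> allPosts,
        int count = 5)
    {
        return allPosts
            .Where(post => post.Id != current.Id && post.Vector != null)
            .OrderByDescending(post => CosineSimilarity(current.Vector, post.Vector))
            .Take(count)
            .Select(post => post.Id)
            .ToList();
    }
}
```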

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Lassen Star Trails

Star trails from the Manzanita Lake campground at Lassen Volcanic National Park

A timelapse, done two ways, shot from the Manzanita Lake campground at Lassen Volcanic National Park (the second time I've visited and the second time that Bumpass Hell has been closed). First, a regular 4K timelapse looking up from the campsite:

The second version is the same footage in HD, where each frame is the cumulative maximum pixel value of all the frames up to that point (so star trails build up as the video runs):
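For illustration, a minimal sketch of that cumulative-maximum idea, assuming each frame has already been decoded to a raw RGB byte buffer of identical size (decoding and re-encoding the video, with ffmpeg or similar, is out of scope here).

```csharp
using System.Collections.Generic;

public static class StarTrails
{
    // Each output frame keeps the brightest value seen so far for every pixel
    // channel, which is what makes the trails grow as the video plays.
    public static IEnumerable<byte[]> AccumulateMax(IEnumerable<byte[]> frames)
    {
        byte[] runningMax = null;
        foreach (var frame in frames)
        {
            if (runningMax == null)
            {
                runningMax = (byte[])frame.Clone();
            }
            else
            {
                for (int i = 0; i < runningMax.Length; i++)
                {
                    if (frame[i] > runningMax[i]) runningMax[i] = frame[i];
                }
            }
            yield return (byte[])runningMax.Clone();
        }
    }
}
```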