Links for April 2021

Updated on Thursday, April 22, 2021
The New York Times: How Brexit Ruined Easter for Britain’s Chocolate Makers

Interesting that there seems to be a shortage on British supermarket shelves as well, then.

--

Easter egg hunt: UK shoppers disappointed by shortages | Easter | The Guardian

Here's the Guardian on the shortage. So British chocolate can't be found in Europe or the UK. Where is it going? Eezy Freezy!

--

The story behind Kidlapse

Latest update on Kidlapse.

--

Search Engine Roundtable: Google: User Generated Content Products Reviews Will Have A Hard Time Ranking Well

And instead we get affiliate-link-stuffed fluff pieces?

--

The Washington Post: We should soon stop catering to the vaccine holdouts

Yes

Links for March 2021

Solar System Cartogram

Might need revisiting when we get a submarine into Enceladus.

ITHCWY Redesign

I've just launched a redesign of I Thought He Came With You. The main thrust is to make the site more usable on desktops, which seems nuts, but the data doesn't lie. The site has low mobile traffic, and for a while I thought this was some kind of technical issue. I optimized the design heavily for mobile and spent a lot of time on speed and some AMP. I guess it's the content. Google loves it when I write documentation for them and doesn't think I have anything useful to say on politics. They're probably right. So I've gone back to having an old-school sidebar, and I've taken the performance hit of using Bootstrap to get some better-looking forms and navigation without spending a lot of time on it. I hope you enjoy it, and if you find anything broken please email or leave a comment.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.
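To make the vector arithmetic concrete, here is a minimal Python sketch. The three-dimensional vectors are made up by hand for illustration (real word2vec vectors have hundreds of dimensions and come from training), and the `analogy` helper is a hypothetical name, not part of any word2vec library.

```python
import math

# Hand-made toy vectors (NOT real word2vec output). The three axes loosely
# stand for "royalty", "maleness" and "femaleness".
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same orientation."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(start, minus, plus, vectors):
    """Return the word whose vector is closest to start - minus + plus."""
    target = [
        s - m + p
        for s, m, p in zip(vectors[start], vectors[minus], vectors[plus])
    ]
    # Exclude the three input words so the answer is a different word.
    candidates = [w for w in vectors if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With these toy values the nearest remaining word is "queen"; a trained model does the same search over its whole vocabulary.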

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post, and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job, which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, also learns a vector for the document as a whole. This could be useful (e.g. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day) but isn't really what I needed here. I ended up using a pre-trained model and then averaging together the vectors for each word in the post. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
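The averaging step can be sketched in a few lines of Python. This is an illustration only: a plain dictionary stands in for the loaded model, the toy vectors are invented, and `average_vector` is a hypothetical helper rather than anything provided by Word2vec.Tools. Skipping words that are missing from the vocabulary is also an assumption of this sketch.

```python
def average_vector(words, word_vectors):
    """Average the vectors of all words found in the vocabulary.

    Words missing from the vocabulary are skipped (an assumption of this
    sketch). Returns None if no word is known.
    """
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


# Toy two-dimensional "model"; a real one maps each word to a long vector.
toy_model = {
    "solar": [1.0, 0.0],
    "system": [0.0, 1.0],
    "map": [1.0, 1.0],
}

# "cartogram" is not in the toy vocabulary, so it is ignored.
post_vector = average_vector(["solar", "system", "map", "cartogram"], toy_model)
print(post_vector)
```

Repeated words count once per occurrence here, so a word used often in a post pulls the average towards its own vector.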

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.
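As a rough Python sketch of that word list step: the stop word set below is a tiny illustrative sample, and the function name, parameter names and tokenizing regular expression are all assumptions rather than the blog's actual code.

```python
import re

# A tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "for", "with"}


def build_word_list(title, body, meta_description, tags):
    """Collect lower-cased words from every field, minus stop words."""
    text = " ".join([title, body, meta_description, " ".join(tags)]).lower()
    # Keep runs of letters, digits and apostrophes; drop punctuation.
    words = re.findall(r"[a-z0-9']+", text)
    return [w for w in words if w not in STOP_WORDS]


words = build_word_list(
    title="Better related posts with word2vec",
    body="The new system is live.",
    meta_description="Related posts and SEO",
    tags=["Code", "ML"],
)
print(words)
```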

Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post. Sort the list in ascending order and pick however many you want from the top (the distance between the post and itself would be 0, a totally unrelated post would be close to 1). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
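The ranking step might look like the following Python sketch. It defines cosine distance as 1 minus cosine similarity, so a post compared with itself scores 0 and smaller numbers mean more closely related. The post names and two-dimensional vectors are invented, and `related_posts` is a hypothetical helper, not the Accord.Math API.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical orientation, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # Treat an all-zero vector as unrelated to everything.
    return 1.0 - dot / (norm_a * norm_b)


def related_posts(post_id, post_vectors, count=3):
    """Return the ids of the `count` posts closest to `post_id`."""
    target = post_vectors[post_id]
    scored = [
        (cosine_distance(target, vec), other_id)
        for other_id, vec in post_vectors.items()
        if other_id != post_id  # Skip the post itself.
    ]
    scored.sort()  # Ascending: smallest distance (most related) first.
    return [other_id for _, other_id in scored[:count]]


# Toy post vectors; real ones would be averaged word vectors.
toy_posts = {
    "word2vec": [0.9, 0.1],
    "related-posts": [0.8, 0.3],
    "hiking": [0.1, 0.9],
    "politics": [0.0, 1.0],
}

top = related_posts("word2vec", toy_posts, count=2)
print(top)
```

Comparing every post with every other post is quadratic in the number of posts, which is fine for a personal blog; the results could also be cached and recomputed only when a post changes.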

I'm really happy with the results. This was a fast implementation and a huge improvement over tag-based related posts.

Privacy Policy Update and Comment Notifications

The ITHCWY privacy policy has been updated to reflect changes in the blog comment system. Previously email addresses submitted with comments were only used to display a Gravatar. Starting today they will also be used for notifications and newsletter signup.

The first notification is when a comment is approved. You'll always be notified in this case if you enter an email address.

When you leave a comment you can opt in to receiving notifications when another comment is added to the same post.

Finally, you can also subscribe to the monthly newsletter when leaving a comment.

Subscribe via Messenger

Updated on Wednesday, January 15, 2020

Thanks to revoice.me you can now subscribe to I Thought He Came With You on Facebook Messenger, Telegram, Slack and/or Chrome Notifications. To sign up visit ITHCWY on revoice.me.

Host change

Updated on Wednesday, June 28, 2017

I'm switching hosts so there will be various DNS changes and some downtime today.

Excessive Book Reviews

Updated on Sunday, May 3, 2020

You know how you're debugging and comment out that return statement that stops book reviews from being posted more than once a month so you can get to the bottom of a problem without constantly deleting posts? And then you get distracted and push a new version of the blog software with that return statement still commented out? Thankfully that task is only scheduled to run every four hours. Sorry.

Get ITHCWY By Email

I'm over social media - the Facebook page for this blog is a hopeless way to reach people and I removed the slow, horrible sharing widgets a while ago. But I have this nagging suspicion that RSS is a super-niche activity for techno-libertarians harking back to the good old days of the Internet with open protocols and wall-free gardens and isn't entirely up to snuff either. So I'm going to experiment with a monthly email list for people who vaguely follow the blog or use Catfood Software products but don't quite manage to come back here every day to check for updates. Sign up here.

Why? Excellent question. The rules for blogs are to pick a narrow topic of interest, know your audience, do keyword research and drop SEO honeypot bombs to draw that audience in. I did that for Catfood Software but this isn't that kind of blog. It's a random collection of my hobbies and interests. So if you're not sure, read through the Featured section in the sidebar to get a preview.

I write a lot of code, so what you'll get for sure is updates from Catfood Software and other occasional side projects. When I struggle with the process or discover something I write about that as well - these posts are more interesting to other developers and less exciting if you just want your desktop wallpaper (or Android phone) to look awesome. I also love to make videos that don't have me in them, mainly complicated time-lapses, so you'll find a lot of those too. Also hikes in and around the San Francisco Bay Area. Occasionally politics.

If that works for you and you're not an RSS type then please join and let me know how I'm doing.

Reviews and Links for March 2012

Updated on Friday, May 22, 2020

No books this month.

Links

RT @drclue: "drclue: #pearlhunt making progress... http://t.co/FAgLQ2UH" --http://www.twitter.com/drclue/status/185829093244280832

We won #pearlhunt and all we won was this... http://t.co/vUBgLWeK

Bald eagle, fox, and cat are porch friends - Boing Boing http://t.co/5WGciNLD via @BoingBoing

ITHCWY: Agua: Little known fact, geologists would tell you that Bernal Hill is made of chert, actually it's mostly… http://t.co/xMRm2J9n

ITHCWY: Mangler: I don't know what the machine attached to our office does but it's giving me nightmares. http://t.co/BSbvoshq

ITHCWY: It was where he left it: Not to bang on about the BBC and their horrible headlines but 'lost' is a bit… http://t.co/bDKjUbl4

ITHCWY: Executive Clubbing: I used to really love British Airways. I even got over their silly new livery and… http://t.co/NC2Bt9bM

ITHCWY: Sand Ladder at Fort Funston http://t.co/5aeQjoti

ITHCWY: SFO http://t.co/sB1QdXCt

BBC News - The Spanish link in cracking the Enigma code http://t.co/Qv6qYqqQ #fb

ITHCWY: Robot Ahead http://t.co/tn7mHTI8

ITHCWY: Goldilocks: Israel just banned models with a BMI under 18.5. That's not severely underweight, it's the… http://t.co/G5HCE5Ey

External impact report for @IDEX at http://t.co/BsMp8ADa

RT @CatfoodSoftware: Blog: Vernal (Spring) #Equinox 2012 in Catfood #Earth: Spring starts right now in the… http://t.co/xxnASTMp

Good weekend to skip Fort Funston: http://t.co/UE8blE4c

ITHCWY: Catfood: PdfScan 1.40: Catfood PdfScan 1.40 is a small bug fix release. PdfScan converts documents to PDFs… http://t.co/YXdMn6ux

RT @CatfoodSoftware: Blog: Catfood #PdfScan 1.40: I’ve just released Catfood PdfScan 1.40. This is a minor update… http://t.co/6bzjCdfi

Shamed... http://t.co/AlSwTzzY

ITHCWY: Three reasons the dream of a robot companion isn't over: David Lee reports from the Innorobo 2012… http://t.co/JndJZahn

ITHCWY: Fixing dropped wireless connection for Linksys E4200: I've been going quietly mad trying to fix a constant… http://t.co/VVZ2dl2m

Why is this firefighting robot familiar: http://t.co/rqXLfCdC vs. http://t.co/wCL3Hu2d US Navy, call Cybernetics #fb

RT @CatfoodSoftware: Blog: To Follow or Not To Follow: The Third Way: Mashable published an article by Christine… http://t.co/MEzWlryS

ITHCWY: Sweeney Ridge: Sweeney Ridge, starting from Skyline College and walking up to the Portola Expedition… http://t.co/g1HIms1F

ITHCWY: Upgrading to http://t.co/0gDd7HHJ 2.5: Today I upgraded this blog to the latest and greatest version of… http://t.co/HHDj7VdM

"not a threat to the penguins, we don't suspect" - http://t.co/oIrzQOEj - it wasn't a dream!

http://t.co/qRCS8Qhb (new #SF data portal) #todo @myEN