ITHCWY Redesign

I've just launched a redesign of I Thought He Came With You. The main thrust is to make the site more usable on desktops. Which seems nuts, but the data doesn't lie. The site has low mobile traffic and for a while I thought this was some kind of technical issue. I optimized the design heavily for mobile and spent a lot of time on speed and some AMP. I guess it's the content. Google loves it when I write documentation for them and doesn't think I have anything useful to say on politics. They're probably right. So I've gone back to having an old school sidebar and I've taken the performance hit of using Bootstrap to get some better looking forms and navigation without spending a lot of time on it. I hope you enjoy it, and if you find anything broken please email or leave a comment.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post, and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job, which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, learns a vector for the whole document as well. This could be useful but isn't really what I needed here (for example, I could solve my forgetfulness around adding tags by training a model to suggest them for me, which might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
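A minimal sketch of the averaging step, written in Python with no libraries rather than the C# the blog runs on. The `model` dictionary is a hypothetical stand-in for the pre-trained model lookup (it is not the Word2vec.Tools API), the three-dimensional vectors are made up, and skipping words that are missing from the vocabulary is an assumption:

```python
from typing import Dict, List, Optional, Sequence


def average_vector(words: Sequence[str],
                   model: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of every word found in the model.

    Returns None when none of the words are in the vocabulary.
    """
    total: Optional[List[float]] = None
    count = 0
    for word in words:
        vector = model.get(word)
        if vector is None:
            continue  # assumption: out-of-vocabulary words are ignored
        if total is None:
            total = [0.0] * len(vector)
        for i, value in enumerate(vector):
            total[i] += value
        count += 1
    if total is None:
        return None
    return [value / count for value in total]


# Toy three-dimensional "model"; real word2vec vectors have hundreds of dimensions.
toy_model = {
    "hike": [1.0, 0.0, 0.0],
    "trail": [0.0, 1.0, 0.0],
}

post_vector = average_vector(["hike", "trail", "unknownword"], toy_model)
print(post_vector)
```

The result is one fixed-length vector per post, regardless of how long the post is, which is what makes the later comparison between posts cheap.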

I haven't included code to build the word list, but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc.) and converted to lower case.
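Building the word list might look something like the following Python sketch. The stop word set is a tiny illustrative sample, the regular expression used to split words is an assumption, and `build_word_list` is a hypothetical helper name, not part of the blog's CMS:

```python
import re
from typing import Iterable, List

# Tiny illustrative stop word list (assumption); a real list would be much longer.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is", "it"}


def build_word_list(parts: Iterable[str]) -> List[str]:
    """Collect lower-cased words from each text part, dropping stop words."""
    words: List[str] = []
    for part in parts:
        # Lower-case first, then treat runs of letters, digits and apostrophes as words.
        for word in re.findall(r"[a-z0-9']+", part.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return words


words = build_word_list([
    "Sweeney Ridge",                                          # title
    "A hike in the Bay Area",                                 # meta description
    "The trail starts at Skyline College and it is steep.",  # post body
    "hike",                                                   # tag
])
print(words)
```

Words that appear in several places (here "hike") are kept each time, so they carry more weight once the vectors are averaged.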

Now that each post has a vector representation it's easy to compute the most related posts. For a given post, compute the cosine distance between its vector and the vector of every other post. Sort the list in ascending order and pick however many you want from the top (with cosine distance, the distance between the post and itself would be 0 and a totally unrelated post would be close to 1; cosine similarity is the other way round and would need a descending sort). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
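As a rough illustration of the ranking step, here is a self-contained Python sketch. It defines cosine distance as 1 minus cosine similarity, so smaller values mean more closely related posts. The post names and vectors are invented, and `related_posts` is a hypothetical helper rather than anything from Accord.Math:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def related_posts(post_id: str,
                  vectors: Dict[str, List[float]],
                  count: int) -> List[Tuple[str, float]]:
    """Return the `count` closest other posts, nearest first."""
    target = vectors[post_id]
    distances = [(other_id, cosine_distance(target, vector))
                 for other_id, vector in vectors.items()
                 if other_id != post_id]
    distances.sort(key=lambda pair: pair[1])  # ascending: smallest distance first
    return distances[:count]


# Made-up post vectors purely for illustration.
post_vectors = {
    "sweeney-ridge": [0.9, 0.1, 0.0],
    "fort-funston": [0.8, 0.2, 0.1],
    "pdfscan-release": [0.0, 0.1, 0.9],
}

top = related_posts("sweeney-ridge", post_vectors, 2)
print([post_id for post_id, _ in top])
```

With only a few hundred posts, comparing every pair like this is cheap enough to run whenever a post is published.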

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Privacy Policy Update and Comment Notifications

The ITHCWY privacy policy has been updated to reflect changes in the blog comment system. Previously email addresses submitted with comments were only used to display a Gravatar. Starting today they will also be used for notifications and newsletter signup.

The first notification is when a comment is approved. You'll always be notified in this case if you enter an email address.

When you leave a comment you can opt in to receiving notifications when another comment is added to the same post.

Finally, you can also subscribe to the monthly newsletter when leaving a comment.

Subscribe via Messenger

Updated on Wednesday, January 15, 2020

Thanks to revoice.me you can now subscribe to I Thought He Came With You on Facebook Messenger, Telegram, Slack and/or Chrome Notifications. To sign up visit ITHCWY on revoice.me.

Host change

Updated on Wednesday, June 28, 2017

I'm switching hosts so there will be various DNS changes and some downtime today.

Excessive Book Reviews

Updated on Sunday, May 3, 2020

You know how you're debugging and comment out that return statement that stops book reviews from being posted more than once a month so you can get to the bottom of a problem without constantly deleting posts? And then you get distracted and push a new version of the blog software with that return statement still commented out? Thankfully that task is only scheduled to run every four hours. Sorry.

Get ITHCWY By Email

I'm over social media - the Facebook page for this blog is a hopeless way to reach people and I removed the slow horrible sharing widgets a while ago. But I have this nagging suspicion that RSS is a super-niche activity for techno-libertarians harking back to the good old days of the Internet with open protocols and wall-free gardens and isn't entirely up to snuff either. So I'm going to experiment with a monthly email list for people who vaguely follow the blog or use Catfood Software products but don't quite manage to come back here every day to check for updates. Sign up here.

Why? Excellent question. The rules for blogs are to pick a narrow topic of interest, know your audience, do keyword research and drop SEO honeypot bombs to draw that audience in. I did that for Catfood Software but this isn't that kind of blog. It's a random collection of my hobbies and interests. So if you're not sure, read through the Featured section in the sidebar to get a preview.

I write a lot of code, so what you'll get for sure is updates from Catfood Software and other occasional side projects. When I struggle with the process or discover something, I write about that as well - these posts are more interesting to other developers and less exciting if you just want your desktop wallpaper (or Android phone) to look awesome. I also love to make videos that don't have me in them, mainly complicated time-lapses, so you'll find a lot of those too. Also hikes in and around the San Francisco Bay Area. Occasionally politics.

If that works for you and you're not an RSS type then please join and let me know how I'm doing.

Reviews and Links for March 2012

Updated on Friday, May 22, 2020

No books this month.

Links

RT @drclue: "drclue: #pearlhunt making progress... http://t.co/FAgLQ2UH" --http://www.twitter.com/drclue/status/185829093244280832

We won #pearlhunt and all we won was this... http://t.co/vUBgLWeK

Bald eagle, fox, and cat are porch friends - Boing Boing http://t.co/5WGciNLD via @BoingBoing

ITHCWY: Agua: Little known fact, geologists would tell you that Bernal Hill is made of chert, actually it's mostly… http://t.co/xMRm2J9n

ITHCWY: Mangler: I don't know what the machine attached to our office does but it's giving me nightmares. http://t.co/BSbvoshq

ITHCWY: It was where he left it: Not to bang on about the BBC and their horrible headlines but 'lost' is a bit… http://t.co/bDKjUbl4

ITHCWY: Executive Clubbing: I used to really love British Airways. I even got over their silly new livery and… http://t.co/NC2Bt9bM

ITHCWY: Sand Ladder at Fort Funston http://t.co/5aeQjoti

ITHCWY: SFO http://t.co/sB1QdXCt

BBC News - The Spanish link in cracking the Enigma code http://t.co/Qv6qYqqQ #fb

ITHCWY: Robot Ahead http://t.co/tn7mHTI8

ITHCWY: Goldilocks: Israel just banned models with a BMI under 18.5. That's not severely underweight, it's the… http://t.co/G5HCE5Ey

External impact report for @IDEX at http://t.co/BsMp8ADa

RT @CatfoodSoftware: Blog: Vernal (Spring) #Equinox 2012 in Catfood #Earth: Spring starts right now in the… http://t.co/xxnASTMp

Good weekend to skip Fort Funston: http://t.co/UE8blE4c

ITHCWY: Catfood: PdfScan 1.40: Catfood PdfScan 1.40 is a small bug fix release. PdfScan converts documents to PDFs… http://t.co/YXdMn6ux

RT @CatfoodSoftware: Blog: Catfood #PdfScan 1.40: I’ve just released Catfood PdfScan 1.40. This is a minor update… http://t.co/6bzjCdfi

Shamed... http://t.co/AlSwTzzY

ITHCWY: Three reasons the dream of a robot companion isn't over: David Lee reports from the Innorobo 2012… http://t.co/JndJZahn

ITHCWY: Fixing dropped wireless connection for Linksys E4200: I've been going quietly mad trying to fix a constant… http://t.co/VVZ2dl2m

Why is this firefighting robot familiar: http://t.co/rqXLfCdC vs. http://t.co/wCL3Hu2d US Navy, call Cybernetics #fb

RT @CatfoodSoftware: Blog: To Follow or Not To Follow: The Third Way: Mashable published an article by Christine… http://t.co/MEzWlryS

ITHCWY: Sweeney Ridge: Sweeney Ridge, starting from Skyline College and walking up to the Portola Expedition… http://t.co/g1HIms1F

ITHCWY: Upgrading to http://t.co/0gDd7HHJ 2.5: Today I upgraded this blog to the latest and greatest version of… http://t.co/HHDj7VdM

"not a threat to the penguins, we don't suspect" - http://t.co/oIrzQOEj - it wasn't a dream!

http://t.co/qRCS8Qhb (new #SF data portal) #todo @myEN

Reviews and Links for February 2012

Updated on Friday, May 22, 2020

The Snowman by Jo Nesbø

4/5

Very good, enjoying the entire Harry Hole series. Wishing for translations of the first two now!

The Devil's Star by Jo Nesbø

3/5

Slightly weaker than the others in the series I've read so far but still knocked it back quickly.

The Redbreast by Jo Nesbø

4/5

Best so far on my quest to read through Nesbø...

Nemesis by Jo Nesbø

4/5

On a Jo Nesbø binge...

The Leopard by Jo Nesbø

4/5

Compelling crime thriller, rather worryingly one of a series featuring Harry Hole so I'm going to have to go back to the beginning and read all of them.

Links

Catfood.Shapefile 1.51: http://t.co/BKtkx9Zq (ESRI Shapefile Parser, fixed release binary issue).

4 of 5 stars to The Snowman by Jo Nesbø http://t.co/IrvdrDBf

Breaking Good: how to synthesize Pseudoephedrine (Sudafed) From N-Methylamphetamine (crystal meth): http://t.co/fviYaj5P

ITHCWY: Catfood.Shapefile 1.50: I've just released a small update to my C# Shapefile library on Codeplex. Catfood… http://t.co/lXoGoBsY

4 of 5 stars to The Redbreast by Jo Nesbø http://t.co/PqrOQnQL

Epic #Bernal Panorama: http://t.co/zVqZYosG - via @bernalwood

Neal Stephenson on getting big stuff done http://t.co/6PHS1VD1 #todo @myEN

Stop Colbert: http://t.co/kBtSC7NV via @NancyPelosi

Wolfram|Alpha Pro: http://t.co/G88eWq6Y #tools @myEN

A History of the Sky for One Year: http://t.co/UKMjosCK (very cool)

+1: A U.S. appeals court rules Prop. 8 unconstitutional: http://t.co/TZgdKU9k #fb

ITHCWY: Badge Driven Development: Microsoft has released Visual Studio Achievements, an extension that brings… http://t.co/5BOyNF03

ITHCWY: GGNRA Dog Management Plan Update: I love it when making some noise works. The NPS is pushing its dog… http://t.co/fzqaJWM2

Unicode Character 'PILE OF POO' (U+1F4A9): http://t.co/LkGffsvW

http://t.co/NA6TOdQk #todo @myEN

BBC News - Can the US Army embrace atheists? http://t.co/5ubkKT7r

Running an API at HUGE Scale - Webinar: http://t.co/tEnxdRBM #API

4 of 5 stars to The Leopard by Jo Nesbø http://t.co/tIIPs1M5

ITHCWY: Reviews and Links for January 2012: Damned by Chuck Palahniuk 3/5 Very much a vehicle for Palahniuk to rant… http://t.co/6kvApyf1

Reviews and links for April 2010

Updated on Friday, May 22, 2020

The Spire by Richard North Patterson

3/5

A good enough holiday read and nice to see Patterson return to a straight psychological thriller rather than the last few OpEds loosely wrapped with some plot.

Advanced .NET Debugging (Addison-Wesley Microsoft Technology Series) by Mario Hewardt

5/5

Comprehensive introduction to low level .NET debugging - when you need to fire up WinDbg to check out the state of the managed heap, or debug a crash dump from the field you'll find this book invaluable. I wish it had been available when I started figuring out how to use SOS.

The Complete Stories of J. G. Ballard by J.G. Ballard

5/5

Wonderful collection of all of Ballard's short stories. It's a huge book with surprisingly few duds. My favorites include The Illuminated Man, clearly the inspiration for The Crystal World, which includes meaning bombs like "It's almost as if a sequence of displaced but identical images were being produced by refraction through a prism, but with the element of time replacing the role of light." and The Ultimate City (which isn't using ultimate in the sense of being good...). I've read most of Ballard's novels but not many of the short stories before. They're well worth the time.

Links

- Microsoft Agrees With Apple And Google: “The Future Of The Web Is HTML5” from TechCrunch (Which makes it all the more tragic that a huge number of clients will still be running IE6 :().

- Comedian criticises BBC 'rebuke' from BBC News | News Front Page | World Edition (The problem isn't that it was anti-Semitic, it's that it wasn't funny.).

- UK 'has a high early death rate' from BBC News | News Front Page | World Edition (That'll be the deep fried mars bars and chips.).

- Oklahoma, where women's rights are swept away from All Salon (Competing with AZ to be the most fucked up state? Sigh :().

- Cameras capture 'Highland tiger' from BBC News | News Front Page | World Edition (Tabbs was bigger than that (a house cat)).

- MI5 dumps staff lacking IT skills from BBC News | News Front Page | World Edition (MI5 has staff without computer skills?).

- The Internet Provides. from jwz (Disturbing).

- Who Really Spends The Most On Their Military? from Information Is Beautiful (Click through to the Guardian blog post, interesting reading.).