I Thought He Came With You is Robert Ellison’s blog about software, marketing, politics, photography and time lapse.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, adds the document to the vector. This could be useful but isn't really what I needed here (i.e. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.

Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post. Sort the list in ascending order and pick however many you want from the top (the distance between the post and itself would be 1, a totally unrelated post would be 0). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on Nuget.

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Reading and Writing Office 365 Excel from a Console app using the Microsoft.Graph C# Client API

Updated on Sunday, September 30, 2018

I needed a console app that reads some inputs from an online Excel workbook, does some processing and then writes back the results to a different worksheet. Because I enjoy pain I decided to use the thinly documented new Microsoft.Graph client library. The sample code below assumes that you have a work or education Office 365 subscription.

Paste the code into a new console project and then follow the instructions at the top to add the necessary NuGet packages. You'll also need to register an application at https://portal.azure.com/. You want a Native application and you'll need the Application ID and the redirect URL (just make up some non-routable URL for this). Under Required Permissions for the app you should add read and write files delegated permissions for the Microsoft Graph API.

Hope this saves you a few hours. Comment below if you need a more detailed explanation for any of the above.

Shapefile Update

A few people have asked for 3D shape support in my ESRI Shapefile library. I've never got around to it, but CodePlex user ekleiman has forked a version in his ESRI Shapefile to Image Convertor that supports PointZ, PolygonZ and PolyLineZ shapes. If that's what you need please check it out.

Crushing PNGs in .NET

Updated on Sunday, September 30, 2018

Crushing PNGs in .NET

I'm working on page speed and Google PageSpeed Insights is telling me that my PNGs are just way too large. Sadly .NET does not provide any way to optimize PNG images so there is no easy fix - just unmanaged libraries and command line tools.

I have an allergy to manual processes so I've lashed up some code to automatically find and optimize PNGs in my App_Data folder using PNGCRUSH. I can call CrushAllImages() to fix up everything or CrushImage() when I need to fix up a specific PNG. Code below:

Minify and inline CSS for ASP.NET MVC

Updated on Sunday, September 30, 2018

ASP.NET has a CssMinify class (and a JavaScript variant as well) designed for use in the bundling pipeline. But what if you want to have your CSS minified and inline? Here is an action that is working for me (rendered into a style tag on my _Layout.cshtml using @Html.Action("InlineCss", "Home")).

Note that I'm using this to inline CSS for this blog. The pages are cached so I'm not worried about how well this action performs. My blog is also basically all landing pages so I'm also not worried about caching a non-inline version for later use, I just drop all the CSS on every page.

Personal Finger Daemon for Windows

Updated on Sunday, September 30, 2018

Did you know that Windows still has a vestigial finger command with just about nothing left to talk to? One of my New Year's resolutions is to bring finger back and unlike the stalled webfinger project I need to make some progress. Here's some C# to run your own personal finger daemon... you just need to create a .plan file in your home directory (haven't done that for a while):

Fix search on enter problem in BlogEngine.NET

Updated on Thursday, November 12, 2015

Search on enter has been broken for a while in BlogEngine.NET (I'm running the latest version). Finally got a chance to look at this today and there is a simple patch to the JavaScript to fix it. See the issue I just filed on CodePlex for details.

How to get SEO credit for Facebook Comments (the missing manual)

Updated on Sunday, September 30, 2018

How to get SEO credit for Facebook Comments (the missing manual)

I've been using the Facebook Comments Box on this blog since I parted ways with Disqus. One issue with the Facebook system is that you won't get SEO credit for comments displayed in an iframe. They have an API to retrieve comments but the documentation is pretty light and so here are three critical tips to get it working.

The first thing to know is that comments can be nested. Once you've got a list of comments to enumerate through you need to check each comment to see if it has it's own list of comments and so on. This is pretty easy to handle.

The second thing is that the first page of JSON returned from the API is totally different from the other pages. This is crazy and can bite you if you don't test it thoroughly. For https://developers.facebook.com/docs/reference/plugins/comments/ the first page is https://graph.facebook.com/comments/?ids=https://developers.facebook.com/docs/reference/plugins/comments/. The second page is embedded at the bottom of the first page and is currently https://graph.facebook.com/10150360250580608/comments?limit=25&offset=25&__after_id=10150360250580608_28167854 (if that link is broken check the first page for a new one). The path to the comment list is "https://developers.facebook.com/docs/reference/plugins/comments/" -> "comments" -> "data" on the first page and just "data" on the second. So you need to handle both formats as well as the URL being included as the root object on the first page. Don't know why this would be the case, just need to handle it.

Last but not least you want to include the comments in a way that can be indexed by search engines but not visible to regular site visitors. I've found that including the SEO list in the tag does the trick, i.e.

I've included the source code for an ASP.NET user control below - this is the code I'm using on the blog. You can see an example of the output on any page with Facebook comments. The code uses Json.net.



The curious case of the missing slugs (in BlogEngine.net 2.8)

Updated on Sunday, September 30, 2018

2013-06-16 Update: There is now a patch for the issue discussed below.

I just upgraded to BlogEngine.net 2.8 as it contains a fix for broken links from Facebook. There were a couple of hitches that I'll share in case they help anyone else.

I messed up the first upgrade attempt because the updater utility updates the source folder (containing the newly downloaded 2.8 code) instead of the destination folder (containing the current version of your blog). This is a little odd and the result is I uploaded an unchanged instance and then embarrassingly complained the the Facebook bug hadn't been fixed. It had, just not in the folder I was expecting. I probably didn't pay enough attention to the instruction video.

Having got that out of the way I discovered that new posts were appearing with a bad link (to /.aspx instead of /blog-title.aspx). I rarely post using the editor as I have a home-grown post by email service running. After a bit of digging it turns out that prior to 2.8 you could leave the slug empty when creating a post but now this results in the bad link. Luckily there isn't much effort require to fix this, you just need to set the slug before saving the new post:

In the middle of playing with this my live site died and started returning a 500 error. No amount of uploading the working local copy would fix this. Happily Server Intellect have outstanding support and restored a working backup for me in the middle of the night. Thanks chaps!

Catfood: Earth for Android 1.10

Updated on Friday, February 24, 2017

Catfood Earth for Android 1.10

I’ve just released Catfood Earth for Android 1.10. You can control the center of the screen manually (the most requested new feature) and also tweak the transparency of each layer and the width of the terminator between day and night. It also starts a lot faster and has fewer update glitches. Grab it from Google Play if this looks like your sort of live wallpaper.