I Thought He Came With You is Robert Ellison’s blog about software, marketing, politics, photography and time lapse.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, adds the document to the vector. This could be useful but isn't really what I needed here (i.e. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.

Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post. Sort the list in ascending order and pick however many you want from the top (the distance between the post and itself would be 1, a totally unrelated post would be 0). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on Nuget.

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Animation of a year of Global Cloud Cover

Animation of a year of Global Cloud Cover

Here's an animation showing a year of global cloud cover (from July 2017 to July 2018) :

The clouds are sourced from the free daily download at xplanet. I run a Google apps script that saves a copy of the image to Google Drive every day (basically the same as this script to save Nest cam images). The hard part was waiting a year to get enough frames. Xplanet combines GEOS, METEOSAT and GMS satellite imagery with some reflection near the poles. Although I didn't need to for this project note that you can subscribe to higher quality / more frequent downloads.

As well as the clouds you can also see the terminator between day and night change shape over the course of the year. This video starts and ends with the Summer equinox when days are longest in the Northern hemisphere.

Where it's nighttime the image is based on NASA's Black Marble. The daytime is based on Blue Marble, but blended with a different older image which has better ocean colors and interpolated daily between twelve monthly Blue Marble satellite images. The result of this is that you can see snow and ice coverage changing over the course of the year. If you look closely you'll also notice vegetation growing and dying back with the seasons.

Rendered in a slightly modified build of Catfood Earth (the main release doesn't know how to access my private cache of xplanet cloud images). As well as combining day, night and cloud images Catfood Earth can also show you earthquakes, volcanoes, US weather radar, political borders, places and time zones. It has been enlivening Windows desktop wallpaper for fifteen years now (as shareware back when that was a thing, these days it's a free download for Windows and Android).

Export Google Fit Daily Steps to a Google Sheet

Updated on Saturday, December 8, 2018

Google Fit Daily Step Export

Google Fit is a great way to keep track of your daily step count without needing to carry a Fitbit or other dedicated tracker. It's not easy to get that data out though, as far as I can tell the only way is Google Takeout which is not made for automation. Luckily there is an API and you can do almost anything with Google Sheets.

If you're looking to export your step count this post has everything you need, just follow the instructions below to get your spreadsheet up and running. This is also a good primer on using OAuth2 with Google Apps Script and should be a decent starting point for a more complex Google Fit integration. If you have any questions or feedback please leave a comment below.

To get started you need a Google Sheet, an apps script project attached to the sheet and a Google API Project that will provide access to the Fitness API. That might sound intimidating but it should only take a few minutes to get everything up and running.

In Google Drive create a new spreadsheet and call it whatever you like. Rename the first tab to 'Steps'. Enter 'Date' in cell A1 and 'Steps' in cell B1. To grab history as well create another tab called 'History' with the same headers. Next select 'Script editor...' from the Tools menu which will open a new apps script project.

Give the apps script project a name and then select 'Libraries...' from the Resources menu. Next to 'Add a library' enter 1B7FSrk5Zi6L1rSxxTDgDEUsPzlukDsi4KGuTMorsTQHhGBzBkMun4iDF ad click Add. This will find the Google OAuth2 library. Choose the most recent version (24 at the time of writing) and click Save. Then select 'Project properties' from the File menu and make a note of the Script ID (a long series of letters and numbers).

Open the Google API Console. Create a new project and name it something like 'Google Fit Sheet'. From the Dashboard click Enable APIs and Services and find and select the Fitness API. Then go to Keys and create an OAuth Client ID. You'll be asked to create a consent screen, the only field you need to enter is the product name (i.e. 'My Fit App'). Then choose Web Application as the application type. You need to set the name and the authorized redirect URL. The redirect URL is https://script.google.com/macros/d/{SCRIPTID}/usercallback replacing {SCRIPTID} with the actual Script ID you made a note of above. After adding this make a note of the Client ID and Client Secret.

Go back to the apps script project and paste the code below into the Code.gs window:

Right at the top of the code there are spaces to enter the Client ID and Client Secret from the API Console. Enter these and save the project.

Switch back to your Google Sheet and reload. After reloading there will be a Google Fit menu item. First select Authorize... You'll get a screen to authorize the script and then a sidebar with a link. Click the link to authorize the script to access your Google Fit data. You can then close the sidebar and select Get Steps for Yesterday from the Google Fit menu. You should see a new row added to the spreadsheet with yesterday's date and step count.

The final step is to automate pulling in the data. Go back to the apps script project and select Current project's triggers from the Edit menu. Add a trigger to run getSteps() as a time driven day timer - I recommend between 5 and 6am. You can also click notifications to add an email alert if anything goes wrong, like your Google Fit authorization expiring (in which case you just need to come back and authorize from the Google Fit menu again.

At this point you're all set. Every day the spreadsheet will automatically update with your step count from the day before. You can add charts, moving averages, export to other systems, pull in your weight or BMI, etc. I want to add a seven day moving average step count to this blog somewhere as a semi-public motivational tool... watch this space.

If you are looking to extend this sample to other data types then this API explorer page is very helpful for finding data types that the API documentation doesn't list.

Enable GZIP compression for Amazon S3 hosted website in CloudFront

Updated on Wednesday, October 18, 2017

Enable GZIP compression for Amazon S3 hosted website in CloudFront

By default compression doesn't work in CloudFront for a website backed by an Amaxon S3 bucket.

The first step is pretty obvious - switch on compression in CloudFront:

Compress Objects Automatically option in Amazon CloudFront

To get to this setting open you distribution, go to the Behaviors tab and edit your behavior(s). Scroll down to the bottom and toggle Compress Objects Automatically to On. Save and drum your fingers while the distribution updates.

The less obvious piece is that CloudFront will only compress files between 1,000 and 10,000,000 bytes (as of writing this post) and it detects the filesize from the Content-Length header. What the documentation doesn't mention is that S3 does not send the Content-Length header by default and so no compression is applied.

Go to S3 and open the properties for your bucket (not for individual files). Expand Permissions and then click Edit CORS Configuration. You need to add Content-Length as an allowed header like this:

Amazon S3 CORS Configuration

Catfood Software Support

Updated on Thursday, March 1, 2018

Catfood Software Support

Need help with a Catfood Software product? Please leave a comment below.

Catfood Earth 3.41

Updated on Wednesday, February 22, 2017

Catfood Earth 3.41

Catfood Earth 3.41 fixes a problem that was preventing the weather radar layer from loading.

I've also updated to the latest (2015g) time zone database and the latest time zone map from Eric Muller.

Download the latest Catfood Earth.

(Previously)

Catfood Earth 3.40

Updated on Wednesday, February 22, 2017

Catfood Earth 3.40

I've just released Catfood Earth 3.40 for Windows and 1.50 for Android.

Both updates fix a problem with the clouds layer not updating. The Android update also adds compatibility for Android 5 / Lollipop.

Also, Catfood Earth for Android is now free. I had been charging $0.99 for the Android version but I've reached the conclusion that I'm never going to retire based on this (or even buy more than a couple of beers) so it's not worth the hassle. Catfood Earth for Windows has been free since 3.20.

Enjoy!

Fortune Cookies for Android

Updated on Thursday, November 12, 2015

Fortune Cookies for Android

Fortune is now available on Google Play. It's an Android version of the UNIX fortune program and will send a random fortune cookie to your notification area at 8ish every morning.

ZoneInfo Update (tzdata for .NET)

Updated on Thursday, November 12, 2015

ZoneInfo Update (tzdata for .NET)

I've used the ZoneInfo (PublicDomain.ZoneInfo) project from CodePlex for quite a few years, especially in Catfood Earth. The project had rusted a little so I emailed the author (Mark Rodrigues) and he was kind enough to add me as a developer. I've just updated ZoneInfo with some of the local changes I'd made and a variety of patches from the CodePlex community. It now works with the latest IANA tzdata file, at least for the test cases I can run. Let me know if I missed something (and thanks Mark for letting me contribute back to this very helpful project).

Thank you Feedly

Updated on Thursday, November 12, 2015

Thank you Feedly

It has been brought to my attention that I've been whinging too much recently

So I'd like to take a break from that and say how much I'm enjoying feedly. It's a wonderfully well designed RSS reader. I use the Chrome Extension version and the Android app. It preserves the Google Reader keyboard shortcuts so I can sail through my subscriptions and it brings back social sharing. 

I looked at feedly once before and didn't really get it. I thought it was just one of those algorithmic recommendation news manglers that tries to guess what you want to read. It might do that on the home page but the 'All' view is a perfect replacement for Google Reader. 

I love it. I want to pay for it to make sure it stays around. Thank you feedly.