I Thought He Came With You is Robert Ellison’s blog about software, marketing, politics, photography and time lapse.

Catfood.Shapefile 1.60

I just released Catfood.Shapefile 1.60. This contains a fix from Libor Weigl that factors out the enumerator so that you can still access the shapefile after enumeration.
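
For readers unfamiliar with the pattern, here is a minimal sketch of what factoring out an enumerator generally looks like in C#. It assumes the usual problem: a collection that is its own enumerator gets disposed when a foreach loop ends. This is not the Catfood.Shapefile source; RecordFile and RecordEnumerator are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a file-backed collection (not the real Shapefile class).
public sealed class RecordFile : IEnumerable<string>
{
    private readonly string[] _records;

    public RecordFile(params string[] records) { _records = records; }

    public int Count => _records.Length;

    // Each call hands out a fresh enumerator, so a foreach loop disposing
    // its enumerator never touches the state of the collection itself.
    public IEnumerator<string> GetEnumerator() => new RecordEnumerator(_records);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class RecordEnumerator : IEnumerator<string>
    {
        private readonly string[] _records;
        private int _index = -1;

        public RecordEnumerator(string[] records) { _records = records; }

        public string Current => _records[_index];
        object IEnumerator.Current => Current;

        public bool MoveNext() => ++_index < _records.Length;
        public void Reset() { _index = -1; }

        // Only the enumerator's own position is discarded here.
        public void Dispose() { }
    }
}
```

With this shape the same RecordFile instance can be enumerated repeatedly, and its other members (Count here) stay usable after any loop finishes.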

Catfood.Shapefile is a .NET library for parsing ESRI Shapefiles.


Book reviews for August 2019

C# Machine Learning Projects: Nine real-world projects to build robust and high-performing machine learning models with C# by Yoon Hyup Hwang

5/5

Digital Marketing in an AI World: Futureproofing Your PPC Agency by Frederick Vallaeys

4/5

The Redemption of Time: A Three-Body Problem Novel by Baoshu

3/5

Not bad. A few moments are a little too fan fic, but overall it has the tone and scope of the original.

The Expert System's Brother by Adrian Tchaikovsky

3/5

The Lions of Lucerne (Scot Harvath, #1) by Brad Thor

3/5

The Possession (The Anomaly Files #2) by Michael Rutger

4/5

Path of the Assassin (Scot Harvath, #2) by Brad Thor

3/5

Photo Sorter 1.00

I've just tidied up and released a tool I've used for a while to sort photos and videos. It does a pretty good job of figuring out the date each one was taken and then moves it to a year + month subfolder. The source code and a binary release are now available on GitHub - see photo-sorter.

This is a command line application with two arguments, a source folder and a destination folder. Use it like this (paths are examples and note that if there are spaces then the entire argument needs to be in quotes):

PhotoSorter.exe "C:\Users\My Name\Google Drive\Google Photos" "C:\Users\My Name\Photos"

This will process all files in the source folder, including subfolders, even if they are not photos or videos. Each file will be moved to a year + month subfolder in the destination (e.g. 2019-08) or to a special subfolder (An Unknown Date) for any files where the date the photo or video was taken cannot be determined.
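
The folder naming rule can be sketched in a few lines. This is not the PhotoSorter source: the class and method names are hypothetical, and working out the date taken (from metadata, file names and so on) is assumed to have happened already, with null meaning it could not be determined.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper illustrating the year + month folder rule.
public static class DestinationFolder
{
    public const string UnknownDateFolder = "An Unknown Date";

    // taken is null when no date could be determined for the file.
    public static string For(string destinationRoot, DateTime? taken)
    {
        string subfolder = taken.HasValue
            ? taken.Value.ToString("yyyy-MM", CultureInfo.InvariantCulture)
            : UnknownDateFolder;

        return Path.Combine(destinationRoot, subfolder);
    }
}
```

For example, a photo taken on 17 August 2019 would land in a 2019-08 folder under the destination root. The invariant culture keeps the folder name stable regardless of the machine's regional settings.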

In addition to moving files the tool also handles de-duplication. If the file already exists in the destination folder it is just deleted from the source and not moved. This is checked by file contents (hash) and not by name. If a different file with the same name already exists in the destination folder then PhotoSorter will move the incoming file under a new, unique filename.
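
Here is a rough sketch of that de-duplication logic, under some assumptions. It is not the PhotoSorter source. SHA-256 is used as the content hash, and a numeric suffix is used for the unique filename. Both choices are guesses, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of hash-based de-duplication; assumes the source file
// lives outside the destination folder.
public static class Deduplicator
{
    public static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Returns the final path, or null when the file was a duplicate and was deleted.
    public static string MoveOrDiscard(string sourcePath, string destinationFolder)
    {
        Directory.CreateDirectory(destinationFolder);
        string sourceHash = HashFile(sourcePath);

        // Same contents already present under any name: drop the source copy.
        foreach (string existing in Directory.GetFiles(destinationFolder))
        {
            if (HashFile(existing) == sourceHash)
            {
                File.Delete(sourcePath);
                return null;
            }
        }

        // Different contents but a clashing name: pick an unused name.
        string name = Path.GetFileNameWithoutExtension(sourcePath);
        string extension = Path.GetExtension(sourcePath);
        string target = Path.Combine(destinationFolder, name + extension);
        for (int suffix = 1; File.Exists(target); suffix++)
        {
            target = Path.Combine(destinationFolder, name + "-" + suffix + extension);
        }

        File.Move(sourcePath, target);
        return target;
    }
}
```

Re-hashing every destination file for each move is slow on large folders. A real tool would likely cache hashes or compare file sizes first.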

I originally wrote this to handle my 'Google Photos' folder. When that sync feature worked it just dumped everything from Google Photos into one Drive folder with no organization. I used the tool periodically to tidy everything into my Photos folder, which is also backed up to Google Drive. Now that Google has stopped syncing Drive and Photos this is still useful, especially with my script that copies new photos over to Google Drive.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, adds the document to the vector. This could be useful but isn't really what I needed here (i.e. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
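
The averaging step can be sketched without any library, assuming the word vectors have already been pulled out of the model into a dictionary. The dictionary, the class name and the method signature are hypothetical; in the setup described above the vectors would come from Word2vec.Tools instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a post vector as the mean of its word vectors.
public static class PostVectors
{
    public static float[] Average(
        IEnumerable<string> words,
        IReadOnlyDictionary<string, float[]> wordVectors,
        int dimensions)
    {
        var sum = new float[dimensions];
        int found = 0;

        foreach (string word in words)
        {
            // Words missing from the model's vocabulary are skipped.
            if (!wordVectors.TryGetValue(word, out float[] vector)) continue;

            for (int i = 0; i < dimensions; i++) sum[i] += vector[i];
            found++;
        }

        // With no known words the result stays the zero vector.
        if (found == 0) return sum;

        for (int i = 0; i < dimensions; i++) sum[i] /= found;
        return sum;
    }
}
```

Skipping out-of-vocabulary words matters more with a slimmed-down model, since rarer words are more likely to be missing.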

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.
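
A word list along those lines could be built roughly as follows. This is only a sketch: the stop word list is a tiny illustrative sample, the tokenizing regular expression is an assumption, and the WordList name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: lower-case the text, split into words, drop stop words.
public static class WordList
{
    // Illustrative sample only; a real list would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "the", "and", "a", "an", "of", "to", "in", "is", "it"
    };

    // Pass the post body, title, meta description and tags as separate strings.
    public static List<string> Build(params string[] texts)
    {
        return texts
            .Where(t => !string.IsNullOrEmpty(t))
            .SelectMany(t => Regex.Matches(t.ToLowerInvariant(), @"[\p{L}\p{Nd}']+").Cast<Match>())
            .Select(m => m.Value)
            .Where(w => !StopWords.Contains(w))
            .ToList();
    }
}
```

Duplicates are kept on purpose in this sketch, so a word that appears often in a post pulls the averaged vector further in its direction.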

Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post vector. Sort the list in ascending order and pick however many you want from the top. Taking distance as 1 minus cosine similarity, the distance between the post and itself would be 0 and a totally unrelated post would be close to 1. The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
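
The ranking step can be sketched in plain C#, with a hand-written cosine calculation standing in for Accord.Math. This sketch defines distance as 1 minus cosine similarity, so smaller values mean more closely related posts. The class, method names and post identifiers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: rank other posts by cosine distance to one post.
public static class RelatedPosts
{
    // 1 - cosine similarity: 0 for the same orientation, 1 for orthogonal vectors.
    public static double CosineDistance(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no orientation; treat it as unrelated.
        if (normA == 0 || normB == 0) return 1.0;

        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    public static List<string> MostRelated(
        string postId,
        IReadOnlyDictionary<string, float[]> postVectors,
        int count)
    {
        float[] target = postVectors[postId];

        return postVectors
            .Where(p => p.Key != postId)                    // never recommend the post itself
            .OrderBy(p => CosineDistance(target, p.Value))  // closest first
            .Take(count)
            .Select(p => p.Key)
            .ToList();
    }
}
```

Comparing every post with every other post is quadratic in the number of posts. That is fine for a personal blog, and the results can be cached because they only change when a post is added or edited.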

I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.

Reading and Writing Office 365 Excel from a Console app using the Microsoft.Graph C# Client API

Updated on Sunday, September 30, 2018

I needed a console app that reads some inputs from an online Excel workbook, does some processing and then writes back the results to a different worksheet. Because I enjoy pain I decided to use the thinly documented new Microsoft.Graph client library. The sample code below assumes that you have a work or education Office 365 subscription.

Paste the code into a new console project and then follow the instructions at the top to add the necessary NuGet packages. You'll also need to register an application at https://portal.azure.com/. You want a Native application and you'll need the Application ID and the redirect URL (just make up some non-routable URL for this). Under Required Permissions for the app you should add read and write files delegated permissions for the Microsoft Graph API.

Hope this saves you a few hours. Comment below if you need a more detailed explanation for any of the above.

Shapefile Update

A few people have asked for 3D shape support in my ESRI Shapefile library. I've never got around to it, but CodePlex user ekleiman has forked a version in his ESRI Shapefile to Image Convertor that supports PointZ, PolygonZ and PolyLineZ shapes. If that's what you need please check it out.

Crushing PNGs in .NET

Updated on Sunday, September 30, 2018

I'm working on page speed and Google PageSpeed Insights is telling me that my PNGs are just way too large. Sadly .NET does not provide any way to optimize PNG images so there is no easy fix - just unmanaged libraries and command line tools.

I have an allergy to manual processes so I've lashed up some code to automatically find and optimize PNGs in my App_Data folder using PNGCRUSH. I can call CrushAllImages() to fix up everything or CrushImage() when I need to fix up a specific PNG. Code below:
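
For anyone wiring this up from scratch, shelling out to pngcrush from .NET could look roughly like the sketch below. It is not the original code from this post. The method names mirror CrushImage() and CrushAllImages(), but the signatures are hypothetical. It assumes pngcrush is on the PATH, and that the installed build supports -ow (overwrite the input file) and -q (quiet), which is worth confirming with pngcrush -h.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch of driving the pngcrush command line tool.
public static class PngCrusher
{
    public static ProcessStartInfo BuildStartInfo(string pngPath, string pngcrushExe = "pngcrush")
    {
        return new ProcessStartInfo
        {
            FileName = pngcrushExe,
            // Assumed flags: -q for quiet output, -ow to overwrite the input in place.
            Arguments = "-q -ow \"" + pngPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Returns true when pngcrush exits with code 0.
    public static bool CrushImage(string pngPath, string pngcrushExe = "pngcrush")
    {
        using (Process process = Process.Start(BuildStartInfo(pngPath, pngcrushExe)))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }

    // Crushes every PNG under the folder and returns how many succeeded.
    public static int CrushAllImages(string folder, string pngcrushExe = "pngcrush")
    {
        int crushed = 0;
        foreach (string png in Directory.EnumerateFiles(folder, "*.png", SearchOption.AllDirectories))
        {
            if (CrushImage(png, pngcrushExe)) crushed++;
        }
        return crushed;
    }
}
```

Quoting the path keeps file names with spaces intact. Overwriting in place is convenient but destructive, so it may be worth testing against a copy of the folder first.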

Minify and inline CSS for ASP.NET MVC

Updated on Sunday, September 30, 2018

ASP.NET has a CssMinify class (and a JavaScript variant as well) designed for use in the bundling pipeline. But what if you want to have your CSS minified and inline? Here is an action that is working for me (rendered into a style tag on my _Layout.cshtml using @Html.Action("InlineCss", "Home")).

Note that I'm using this to inline CSS for this blog. The pages are cached so I'm not worried about how well this action performs. My blog is also basically all landing pages, so I'm not worried about caching a non-inline version for later use either; I just drop all the CSS on every page.
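
To show the general idea without the ASP.NET dependencies, here is a deliberately naive, self-contained minifier. It is a stand-in for CssMinify, not a replacement for it and not the action described above. The class name is hypothetical, and the regular expressions ignore string literals and url(...) values, so it is only safe for simple stylesheets.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, naive CSS minifier used only to illustrate the idea.
public static class NaiveCssMinifier
{
    public static string Minify(string css)
    {
        // Remove /* ... */ comments, including multi-line ones.
        string result = Regex.Replace(css, @"/\*.*?\*/", string.Empty, RegexOptions.Singleline);

        // Collapse every run of whitespace to a single space.
        result = Regex.Replace(result, @"\s+", " ");

        // Drop spaces around braces, semicolons and commas.
        result = Regex.Replace(result, @"\s*([{};,])\s*", "$1");

        // Drop spaces after colons (kept before them so selectors like "div :hover" survive).
        result = Regex.Replace(result, @":\s+", ":");

        // The last semicolon in a block is optional.
        result = result.Replace(";}", "}");

        return result.Trim();
    }
}
```

In an MVC action the minified string would typically be returned as content and rendered inside a style tag, with the real CssMinify doing the work in place of this sketch.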

Personal Finger Daemon for Windows

Updated on Sunday, September 30, 2018

Did you know that Windows still has a vestigial finger command with just about nothing left to talk to? One of my New Year's resolutions is to bring finger back and unlike the stalled webfinger project I need to make some progress. Here's some C# to run your own personal finger daemon... you just need to create a .plan file in your home directory (haven't done that for a while):
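
A personal finger daemon boils down to a small TCP server: listen on port 79, read one query line, send back some text and close the connection. The sketch below is not the original code from this post. The class name, response wording and method signatures are hypothetical, and it answers for a single user only.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical single-user finger daemon sketch.
public static class FingerDaemon
{
    public static string BuildResponse(string query, string userName, string planPath)
    {
        string requested = query.Trim();

        // Clients may prefix the query with /W to ask for verbose output.
        if (requested.StartsWith("/W", StringComparison.OrdinalIgnoreCase))
        {
            requested = requested.Substring(2).Trim();
        }

        // An empty query means "whoever is here"; otherwise the name must match.
        if (requested.Length > 0 && !requested.Equals(userName, StringComparison.OrdinalIgnoreCase))
        {
            return "No such user.\r\n";
        }

        string plan = File.Exists(planPath) ? File.ReadAllText(planPath) : "No plan.";
        return "Login: " + userName + "\r\nPlan:\r\n" + plan + "\r\n";
    }

    // Blocks forever, serving one connection at a time.
    public static void Run(string userName, string planPath, int port = 79)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();

        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            using (NetworkStream stream = client.GetStream())
            using (var reader = new StreamReader(stream, Encoding.ASCII))
            {
                string query = reader.ReadLine() ?? string.Empty;
                byte[] reply = Encoding.ASCII.GetBytes(BuildResponse(query, userName, planPath));
                stream.Write(reply, 0, reply.Length);
            }
        }
    }
}
```

Run could be called with Environment.UserName and a .plan path built from Environment.GetFolderPath(Environment.SpecialFolder.UserProfile). Listening on port 79 may need a firewall exception, and a daemon exposed beyond the local network would want read timeouts and a cap on query length.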

ZoneInfo Update (tzdata for .NET)

Updated on Thursday, November 12, 2015

I've used the ZoneInfo (PublicDomain.ZoneInfo) project from CodePlex for quite a few years, especially in Catfood Earth. The project had rusted a little so I emailed the author (Mark Rodrigues) and he was kind enough to add me as a developer. I've just updated ZoneInfo with some of the local changes I'd made and a variety of patches from the CodePlex community. It now works with the latest IANA tzdata file, at least for the test cases I can run. Let me know if I missed something (and thanks Mark for letting me contribute back to this very helpful project).