I've just tidied up and released a tool I've used for a while to sort photos and videos. It does a pretty good job of figuring out the date each was taken and then moves them to a year + month subfolder. The source code and a binary release are now available on GitHub - see photo-sorter.
This is a command line application with two arguments, a source folder and a destination folder. Use it like this (paths are examples and note that if there are spaces then the entire argument needs to be in quotes):
This will process all files in the source folder, including subfolders, even if they are not photos or videos. Each file will be moved to a year + month subfolder in the destination (e.g. 2019-08) or to a special subfolder (An Unknown Date) for any files where the date the photo or video was taken cannot be determined.
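As a rough illustration of the folder naming described above, here is a minimal Python sketch (not the tool's actual source). The `destination_folder` helper and the `UNKNOWN_FOLDER` constant are hypothetical names, and working out the taken date from metadata is left out entirely.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Placeholder name for the fallback folder; the real tool's folder name may differ.
UNKNOWN_FOLDER = "Unknown Date"


def destination_folder(dest_root: Path, taken: Optional[datetime]) -> Path:
    """Return the year + month subfolder for a file, or the fallback folder."""
    if taken is None:
        # No usable date could be determined for this file.
        return dest_root / UNKNOWN_FOLDER
    # Zero-padded so folders sort chronologically, e.g. 2019-08.
    return dest_root / f"{taken.year:04d}-{taken.month:02d}"
```

For example, `destination_folder(Path("Photos"), datetime(2019, 8, 3))` gives `Photos/2019-08`.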
In addition to moving files the tool also handles de-duplication. If the file already exists in the destination folder it is just deleted from the source and not moved. This is checked by file contents (hash) and not by name. If a different file with the same name already exists in the destination folder then PhotoSorter will move the incoming file under a new, unique filename.
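The de-duplication rules can be sketched roughly as follows. This is an assumption-laden Python illustration, not PhotoSorter's code: SHA-256 as the hash, the `name (1).ext` renaming pattern and both function names are choices made for the example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def file_hash(path: Path) -> str:
    """Hash file contents in chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()  # hash algorithm is an assumption
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_without_duplicates(src: Path, dest_dir: Path) -> Optional[Path]:
    """Move src into dest_dir; return the new path, or None if it was a duplicate."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    src_hash = file_hash(src)
    # Same contents already present (under any name): drop the source copy.
    for existing in dest_dir.iterdir():
        if existing.is_file() and file_hash(existing) == src_hash:
            src.unlink()
            return None
    # Different contents but a clashing name: pick a unique new filename.
    target = dest_dir / src.name
    counter = 1
    while target.exists():
        target = dest_dir / f"{src.stem} ({counter}){src.suffix}"
        counter += 1
    shutil.move(str(src), str(target))
    return target
```

Hashing every destination file on each move is slow for big folders; comparing file sizes first or caching hashes would be an obvious refinement.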
I originally wrote this to handle my 'Google Photos' folder - when this feature worked it just dumped everything from Google Photos into one Drive folder with no organization. I used this periodically to tidy everything into my Photos folder also backed up to Google Drive. Now that Google has stopped syncing Drive and Photos this is still useful, especially with my script that copies new photos over to Google Drive.
I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.
This sounds like it should work well for finding related posts. Spoiler alert: it does!
My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.
I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:
The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, adds the document to the vector. This could be useful but isn't really what I needed here (e.g. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.
For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
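The averaging step is simple enough to show in a few lines. This is a Python sketch rather than the C# used on the blog; a plain dictionary stands in for the loaded model and `average_vector` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Sequence


def average_vector(
    words: Sequence[str], model: Dict[str, List[float]]
) -> Optional[List[float]]:
    """Average the vectors of all words the model knows; None if it knows none."""
    # Words missing from the vocabulary are skipped rather than treated as zeros.
    vectors = [model[word] for word in words if word in model]
    if not vectors:
        return None
    dimensions = len(vectors[0])
    return [
        sum(vector[i] for vector in vectors) / len(vectors)
        for i in range(dimensions)
    ]
```

With a toy model `{"cat": [1.0, 0.0], "dog": [0.0, 1.0]}`, the words `["cat", "dog", "zzz"]` average to `[0.5, 0.5]` because the unknown word is ignored.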
I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.
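A minimal Python sketch of that word list step might look like the following. The stop word set here is a tiny illustrative sample, and the tokenizing regular expression and the `build_word_list` name are assumptions.

```python
import re
from typing import List

# Tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = frozenset({"the", "and", "a", "an", "of", "to", "in", "is", "it"})


def build_word_list(*parts: str) -> List[str]:
    """Combine post text, title, description and tags into lower-case words."""
    text = " ".join(parts).lower()
    # Keep runs of letters, digits and apostrophes; everything else splits words.
    words = re.findall(r"[a-z0-9']+", text)
    return [word for word in words if word not in STOP_WORDS]
```

Calling `build_word_list(body, title, description, " ".join(tags))` would then feed straight into the vector averaging.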
Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post vector. Sort the list in ascending order and pick however many you want from the top (taking cosine distance as 1 minus cosine similarity, the distance between the post and itself would be 0 and a totally unrelated post would be close to 1). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
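To make the ranking concrete, here is a small Python sketch that does not use Accord.Math. It ranks by cosine similarity (1 for vectors pointing the same way, 0 for orthogonal ones), so the most related posts are the highest scores; sorting by one minus that value in ascending order gives the same result. The function names and the dictionary of post vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_posts(
    post_id: str, vectors: Dict[str, Sequence[float]], count: int = 3
) -> List[str]:
    """Return the ids of the posts most similar to post_id, best match first."""
    target = vectors[post_id]
    scored = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in vectors.items()
        if other_id != post_id  # a post is always a perfect match for itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scored[:count]]
```

Comparing every post with every other post is quadratic, which is fine for a blog with a few hundred posts and a cached result.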
I'm really happy with the results. This was a fast implementation and a huge improvement over tag-based related posts.
I needed a console app that reads some inputs from an online Excel workbook, does some processing and then writes back the results to a different worksheet. Because I enjoy pain I decided to use the thinly documented new Microsoft.Graph client library. The sample code below assumes that you have a work or education Office 365 subscription.
Paste the code into a new console project and then follow the instructions at the top to add the necessary NuGet packages. You'll also need to register an application at https://portal.azure.com/. You want a Native application and you'll need the Application ID and the redirect URL (just make up some non-routable URL for this). Under Required Permissions for the app you should add read and write files delegated permissions for the Microsoft Graph API.
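For orientation, the client library calls end up as Microsoft Graph REST requests against workbook ranges. The Python sketch below only builds the request URL and body, on the assumption that a range is read with GET and written with PATCH on the same URL; authentication is omitted and the helper names and item id are hypothetical.

```python
import json
from typing import List
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def range_url(item_id: str, worksheet: str, address: str) -> str:
    """URL for a worksheet range in a workbook stored in the signed-in user's drive."""
    return (
        f"{GRAPH_ROOT}/me/drive/items/{quote(item_id, safe='')}"
        f"/workbook/worksheets/{quote(worksheet, safe='')}"
        f"/range(address='{address}')"
    )


def write_range_body(values: List[list]) -> str:
    """JSON body for a PATCH that overwrites the range with a 2D array of values."""
    return json.dumps({"values": values})
```

The shape of `values` has to match the range, so a PATCH to `A1:B2` needs two rows of two cells each.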
Hope this saves you a few hours. Comment below if you need a more detailed explanation for any of the above.
I'm working on page speed and Google PageSpeed Insights is telling me that my PNGs are just way too large. Sadly .NET does not provide any way to optimize PNG images so there is no easy fix - just unmanaged libraries and command line tools.
I have an allergy to manual processes so I've lashed up some code to automatically find and optimize PNGs in my App_Data folder using PNGCRUSH. I can call CrushAllImages() to fix up everything or CrushImage() when I need to fix up a specific PNG. Code below:
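The general shape of this is easy to sketch. The Python below is an illustration rather than the blog's C#: it assumes a `pngcrush` build on the PATH that supports the `-ow` (overwrite in place) and `-q` (quiet) options, and the function names simply mirror the ones mentioned above.

```python
import subprocess
from pathlib import Path
from typing import List


def find_pngs(root: Path) -> List[Path]:
    """Find every PNG under root, including subfolders, ignoring extension case."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".png"
    )


def crush_command(png: Path, exe: str = "pngcrush") -> List[str]:
    """Command line that optimizes one PNG in place."""
    return [exe, "-q", "-ow", str(png)]


def crush_image(png: Path) -> None:
    """Optimize a single PNG; raises CalledProcessError if pngcrush fails."""
    subprocess.run(crush_command(png), check=True)


def crush_all_images(root: Path) -> None:
    """Optimize every PNG found under root."""
    for png in find_pngs(root):
        crush_image(png)
```

Passing the arguments as a list avoids any quoting problems with paths that contain spaces.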
Note that I'm using this to inline CSS for this blog. The pages are cached so I'm not worried about how well this action performs. My blog is also basically all landing pages so I'm also not worried about caching a non-inline version for later use, I just drop all the CSS on every page.
Did you know that Windows still has a vestigial finger command with just about nothing left to talk to? One of my New Year's resolutions is to bring finger back and unlike the stalled webfinger project I need to make some progress. Here's some C# to run your own personal finger daemon... you just need to create a .plan file in your home directory (haven't done that for a while):
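The finger protocol is about as simple as they come: the client connects over TCP (port 79 by convention), sends one line, and the server replies with text and closes the connection. Here is a minimal single-user sketch in Python rather than C#; it ignores the query and always returns the `.plan` file, and the "No plan." fallback text and helper names are assumptions.

```python
import socketserver
from pathlib import Path


def plan_response(query: bytes, plan_path: Path) -> bytes:
    """Build the reply for a finger query; the query is ignored (single user)."""
    try:
        text = plan_path.read_text(encoding="utf-8")
    except OSError:
        text = "No plan."
    # Network text conventionally uses CRLF line endings.
    return ("\r\n".join(text.splitlines()) + "\r\n").encode("utf-8")


class FingerHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Read the single query line (capped so a client cannot send forever).
        query = self.rfile.readline(512)
        self.wfile.write(plan_response(query, Path.home() / ".plan"))


def serve(host: str = "0.0.0.0", port: int = 79) -> None:
    """Run the daemon until interrupted; port 79 usually needs elevated rights."""
    with socketserver.TCPServer((host, port), FingerHandler) as server:
        server.serve_forever()
```

Calling `serve()` and then running `finger user@localhost` from another window should print the plan.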
I've been using the Facebook Comments Box on this blog since I parted ways with Disqus. One issue with the Facebook system is that you won't get SEO credit for comments displayed in an iframe. They have an API to retrieve comments but the documentation is pretty light and so here are three critical tips to get it working.
The first thing to know is that comments can be nested. Once you've got a list of comments to enumerate through you need to check each comment to see if it has its own list of comments and so on. This is pretty easy to handle.
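Handling the nesting is a short recursive walk. The Python sketch below assumes each comment is a dictionary with `message`, `from.name` and an optional `comments.data` list of replies; that shape is an assumption for illustration, so check it against the JSON you actually get back.

```python
from typing import Any, Dict, List, Tuple


def flatten_comments(
    comments: List[Dict[str, Any]], depth: int = 0
) -> List[Tuple[int, str, str]]:
    """Walk nested comments depth-first, returning (depth, author, message) rows."""
    flat: List[Tuple[int, str, str]] = []
    for comment in comments:
        author = comment.get("from", {}).get("name", "")
        flat.append((depth, author, comment.get("message", "")))
        # Replies, if any, are assumed to live under comments -> data.
        replies = comment.get("comments", {}).get("data", [])
        flat.extend(flatten_comments(replies, depth + 1))
    return flat
```

The depth value makes it easy to indent replies when rendering the list.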
Last but not least you want to include the comments in a way that can be indexed by search engines but not visible to regular site visitors. I've found that including the SEO list in the tag does the trick, i.e.
I've included the source code for an ASP.NET user control below - this is the code I'm using on the blog. You can see an example of the output on any page with Facebook comments. The code uses Json.net.
I messed up the first upgrade attempt because the updater utility updates the source folder (containing the newly downloaded 2.8 code) instead of the destination folder (containing the current version of your blog). This is a little odd and the result is I uploaded an unchanged instance and then embarrassingly complained that the Facebook bug hadn't been fixed. It had, just not in the folder I was expecting. I probably didn't pay enough attention to the instruction video.
Having got that out of the way I discovered that new posts were appearing with a bad link (to /.aspx instead of /blog-title.aspx). I rarely post using the editor as I have a home-grown post by email service running. After a bit of digging it turns out that prior to 2.8 you could leave the slug empty when creating a post but now this results in the bad link. Luckily there isn't much effort required to fix this, you just need to set the slug before saving the new post:
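If the blog engine's own slug helper isn't to hand, deriving a slug from the title is straightforward. This Python sketch shows one common approach and is not the engine's routine; the `make_slug` name and the `post` fallback are assumptions.

```python
import re
import unicodedata


def make_slug(title: str) -> str:
    """Turn a post title into a lower-case, hyphen-separated URL slug."""
    # Strip accents by decomposing characters and dropping the non-ASCII parts.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Never return an empty slug, which is what caused the /.aspx link.
    return slug or "post"
```

For example, `make_slug("Hello, World!")` returns `hello-world`.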
In the middle of playing with this my live site died and started returning a 500 error. No amount of uploading the working local copy would fix this. Happily Server Intellect have outstanding support and restored a working backup for me in the middle of the night. Thanks chaps!