I've just tidied up and released a tool I've used for a while to sort photos and videos. It does a pretty good job of figuring out the date each was taken and then moves them to a year + month subfolder. The source code and a binary release are now available on GitHub - see photo-sorter.
This is a command line application with two arguments, a source folder and a destination folder. Use it like this (paths are examples and note that if there are spaces then the entire argument needs to be in quotes):
This will process all files in the source folder, including subfolders, even if they are not photos or videos. Each file will be moved to a year + month subfolder in the destination (e.g. 2019-08) or to a special subfolder (An Unknown Date) for any files where the date the photo or video was taken cannot be determined.
In addition to moving files the tool also handles de-duplication. If the file already exists in the destination folder it is just deleted from the source and not moved. This is checked by file contents (hash) and not by name. If a different file with the same name already exists in the destination folder then PhotoSorter will move it to a unique, new filename.
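The move and de-duplication rules above can be sketched roughly as follows. This is not the PhotoSorter source, just a minimal Python illustration. The function names, the use of SHA-256, the " (1)" renaming scheme and the choice to compare only against files in the target subfolder are all assumptions, and the date detection itself is left out (the caller passes in the date, or None if it could not be determined).

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

UNKNOWN_FOLDER = "An Unknown Date"


def subfolder_for(taken: Optional[datetime]) -> str:
    """Year + month folder name such as 2019-08, or the unknown-date folder."""
    return taken.strftime("%Y-%m") if taken else UNKNOWN_FOLDER


def file_hash(path: Path) -> str:
    """Hash the file contents (SHA-256 is an assumption, any strong hash works)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sort_file(source: Path, destination_root: Path,
              taken: Optional[datetime]) -> Optional[Path]:
    """Move one file; return its new path, or None if it was a duplicate."""
    folder = destination_root / subfolder_for(taken)
    folder.mkdir(parents=True, exist_ok=True)

    # Same contents already present: delete the source copy instead of moving.
    source_hash = file_hash(source)
    for existing in folder.iterdir():
        if existing.is_file() and file_hash(existing) == source_hash:
            source.unlink()
            return None

    # Different contents but a clashing name: pick a new, unique filename.
    target = folder / source.name
    counter = 1
    while target.exists():
        target = folder / f"{source.stem} ({counter}){source.suffix}"
        counter += 1
    shutil.move(str(source), str(target))
    return target
```

Hashing every existing file on each call is slow for large folders, so a real tool would probably cache the hashes. The sketch keeps it simple to show the rule.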
I originally wrote this to handle my 'Google Photos' folder - when this feature worked it just dumped everything from Google Photos into one Drive folder with no organization. I used this periodically to tidy everything into my Photos folder, which is also backed up to Google Drive. Now that Google has stopped syncing Drive and Photos this is still useful, especially with my script that copies new photos over to Google Drive.
I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.
This sounds like it should work well for finding related posts. Spoiler alert: it does!
My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.
I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:
The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, also learns a vector for the document itself. This could be useful but isn't really what I needed here (e.g. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.
For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.
I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc) and converted to lower case.
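For anyone who wants the shape of the word list and averaging steps without the CMS around it, here is a minimal sketch in Python. It is not the Word2vec.Tools code from the post: the model is assumed to be a plain dictionary from word to vector, the function names are made up for illustration, and the stop word set is a tiny placeholder.

```python
from typing import Dict, List, Optional, Sequence

# Illustrative only, a real stop word list is much longer.
STOP_WORDS = {"the", "and", "a", "of", "to", "in"}


def words_for(text: str) -> List[str]:
    """Lower-case the text, trim punctuation and drop stop words."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word and word not in STOP_WORDS:
            words.append(word)
    return words


def post_vector(words: Sequence[str],
                model: Dict[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of every word the model knows about."""
    known = [model[word] for word in words if word in model]
    if not known:
        return None  # nothing in the post is in the vocabulary
    size = len(known[0])
    return [sum(vector[i] for vector in known) / len(known) for i in range(size)]
```

Words missing from the model are simply skipped, which matters more with a slimmed-down vocabulary.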
Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine similarity between the post vector and every other post. Sort the list in descending order and pick however many you want from the top (the similarity between the post and itself would be 1, a totally unrelated post would be 0). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
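As a rough sketch of the ranking step, assuming every post already has a vector stored in a dictionary keyed by some post id, the comparison can be written with cosine similarity, where higher means more related. This is plain Python rather than Accord.Math, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """1.0 for vectors pointing the same way, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_posts(post_id: str, vectors: Dict[str, Sequence[float]],
                  count: int = 3) -> List[str]:
    """Ids of the posts most similar to post_id, most similar first."""
    target = vectors[post_id]
    scored = [(cosine_similarity(target, vector), other)
              for other, vector in vectors.items() if other != post_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:count]]
```

If a library reports cosine distance instead (1 minus the similarity), the sort order flips, so it is worth checking which one a given function returns.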
I'm really happy with the results. This was a fast implementation and a huge improvement over tag based related posts.
I needed a console app that reads some inputs from an online Excel workbook, does some processing and then writes back the results to a different worksheet. Because I enjoy pain I decided to use the thinly documented new Microsoft.Graph client library. The sample code below assumes that you have a work or education Office 365 subscription.
Paste the code into a new console project and then follow the instructions at the top to add the necessary NuGet packages. You'll also need to register an application at https://portal.azure.com/. You want a Native application and you'll need the Application ID and the redirect URL (just make up some non-routable URL for this). Under Required Permissions for the app you should add read and write files delegated permissions for the Microsoft Graph API.
Hope this saves you a few hours. Comment below if you need a more detailed explanation for any of the above.
I'm working on page speed and Google PageSpeed Insights is telling me that my PNGs are just way too large. Sadly .NET does not provide any way to optimize PNG images so there is no easy fix - just unmanaged libraries and command line tools.
I have an allergy to manual processes so I've lashed up some code to automatically find and optimize PNGs in my App_Data folder using PNGCRUSH. I can call CrushAllImages() to fix up everything or CrushImage() when I need to fix up a specific PNG. Code below:
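The general approach can be sketched like this in Python. It is an illustration of shelling out to pngcrush, not the code from this post: the function names mirror the ones mentioned above but the bodies are assumptions, pngcrush is assumed to be on the PATH, and the only invocation used is the basic pngcrush input-file output-file form.

```python
import os
import subprocess
from pathlib import Path


def crush_image(png: Path, run=subprocess.run) -> bool:
    """Optimize one PNG in place; return True if the file got smaller."""
    temp = png.with_name(png.name + ".crushed")
    # pngcrush writes the optimized copy to a second file.
    run(["pngcrush", str(png), str(temp)], check=True)
    if temp.exists() and 0 < temp.stat().st_size < png.stat().st_size:
        os.replace(temp, png)  # keep the smaller version
        return True
    if temp.exists():
        temp.unlink()  # no gain, discard the copy
    return False


def crush_all_images(folder: Path, run=subprocess.run) -> int:
    """Optimize every PNG under folder; return how many were replaced."""
    return sum(crush_image(png, run) for png in sorted(folder.rglob("*.png")))
```

The run parameter only exists so the command can be swapped out when testing without pngcrush installed.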
Note that I'm using this to inline CSS for this blog. The pages are cached so I'm not worried about how well this action performs. My blog is also basically all landing pages so I'm also not worried about caching a non-inline version for later use, I just drop all the CSS on every page.
Did you know that Windows still has a vestigial finger command with just about nothing left to talk to? One of my New Year's resolutions is to bring finger back and unlike the stalled webfinger project I need to make some progress. Here's some C# to run your own personal finger daemon... you just need to create a .plan file in your home directory (haven't done that for a while):
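For comparison, the core of a finger daemon is small enough to sketch in a few lines of Python. This is not the C# from this post, just a minimal illustration of the protocol: the client sends one line containing a user name (optionally prefixed with /W) and the server replies with text and closes the connection. The /home layout, the reply strings and the function names are assumptions.

```python
import socketserver
from pathlib import Path

HOME_ROOT = Path("/home")  # assumption: one folder per user under /home


def plan_for(query: str, home_root: Path = HOME_ROOT) -> str:
    """Return the contents of the requested user's .plan file."""
    parts = [part for part in query.split() if part.upper() != "/W"]
    user = parts[0] if parts else ""
    # Only plain user names are accepted, which also blocks path tricks.
    if not user or not user.replace("_", "").replace("-", "").isalnum():
        return "No such user.\r\n"
    plan = home_root / user / ".plan"
    if not plan.is_file():
        return "No plan.\r\n"
    return plan.read_text(encoding="utf-8", errors="replace")


class FingerHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # A finger query is a single line terminated by CRLF.
        query = self.rfile.readline(512).decode("ascii", errors="replace")
        self.wfile.write(plan_for(query).encode("utf-8"))


def serve(port: int = 79) -> None:
    """Listen for finger queries (port 79 usually needs elevated rights)."""
    with socketserver.TCPServer(("", port), FingerHandler) as server:
        server.serve_forever()
```

Calling serve() starts the listener. Passing a high port number is an easy way to try it without administrator rights.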
I've used the ZoneInfo (PublicDomain.ZoneInfo) project from CodePlex for quite a few years, especially in Catfood Earth. The project had rusted a little so I emailed the author (Mark Rodrigues) and he was kind enough to add me as a developer. I've just updated ZoneInfo with some of the local changes I'd made and a variety of patches from the CodePlex community. It now works with the latest IANA tzdata file, at least for the test cases I can run. Let me know if I missed something (and thanks Mark for letting me contribute back to this very helpful project).
I had a play with the Google Spreadsheets API recently to feed in some data from a C# application. The getting started guide is great and I was authenticated and adding dummy data in no time. But as soon as I started to work with real data I got:
"The remote server returned an error: (400) Bad Request."
And digging deeper into the response:
"We're sorry, a server error occurred. Please wait a bit and try reloading your spreadsheet."
The original sample code still worked so it didn't seem like any sort of temporary glitch as the message suggests. After much hair tearing it turns out I was getting this error because I had used the literal column names from my spreadsheet. The API expects them to be lower case with spaces removed. If no columns match you get the unhelpful error above; if at least one column matches you get a successful insert with some missing data.
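The fix boils down to normalizing each header before using it as a key. A minimal sketch in Python, with hypothetical function names and covering only the rule described here (lower case, spaces removed):

```python
from typing import Any, Dict


def api_column_name(header: str) -> str:
    """Turn a spreadsheet header such as 'First Name' into 'firstname'."""
    return header.replace(" ", "").lower()


def row_for_api(row: Dict[str, Any]) -> Dict[str, Any]:
    """Re-key a row so every column name is in the form the API expects."""
    return {api_column_name(header): value for header, value in row.items()}
```

So a row built as {"First Name": "Ada", "Total Cost": 3} would be sent with the keys firstname and totalcost.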
Error messages are one of the hardest parts of an API to get right. If you're not very detailed then what seems obvious to you can leave your developers stumped.