Links for November 2021

Mimica, which automates RPA, raises $6M Series A funding led by Khosla Ventures

I'm holding out for the company that automates robotic process automation automation.

(All Etc Posts)

Links for September 2021

Updated on Tuesday, September 14, 2021
Snap Out of It, America: Give Kids the Right to Vote

So I disagree with this but it's interesting and well argued. A better idea is my life expectancy weighted voting plan.

--

The ease of mail-in voting may increase turnout in California’s recall election.

NYT finally twigs.

--

A woman is suing S.F. for $50 million over a parking ticket, saying tire chalk is unconstitutional

In one of the cases, filed Sept. 4, plaintiff Maria Infante seeks $50 million and class-action status after a San Francisco parking enforcement officer wielding chalk on a residential street gave her a $95 ticket.
The second case, filed the same day against San Leandro, demands $5 million for class members whose tires were chalked to financially benefit the city.

Civilization continues to collapse. I had my tongue in my cheek for this proposed constitutional amendment but I'm not so sure any more...

--

How to Call Customer Service and Actually Get What You Want

Wired has this generic article on getting support with some insights that might have been cutting edge a decade ago. I'm still waiting for CAPTGUAs.

Links for August 2021

Updated on Wednesday, September 1, 2021

Links for July 2021

The New York Times: Opinion | America Needs to Break Up Its Biggest States

I'd rather get rid of the electoral college and have a national, ranked choice vote, but this is interesting.

Route map and elevation profile for Hike Posts

Hike Profiles and Map

I have just updated my blog with two new features for hike posts.

The route is now shown in an embedded Google map. The map has a pin for the start of the hike (and for any geolocated photos included with the post) and a red line for the GPS track. Don't use this for following the route - better to download the KML file and have an offline map - but it's great for getting a sense of the hike.

Below the map is a new hike profile that plots elevation against distance together with estimated total distance and elevation gain. This is fairly tricky as my phone GPS isn't the best and altitude in particular jumps around a fair bit. To make this reasonably stable I take trackpoints every 60th of a mile and I only count elevation gain when it has gone up more than 100 feet. This probably errs on the side of underestimating unless the signal is really bad. The overall shape is a pretty good guide but please don't depend on the distance and elevation gain.
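The smoothing described above can be sketched roughly as follows. This is a minimal Python illustration, not the code that runs on the blog: the function names, the (distance in miles, elevation in feet) input format and the exact thresholding rule are all assumptions. It keeps one trackpoint per 60th of a mile and then ignores any rise or fall of 100 feet or less from the last accepted elevation, which tends to undercount rather than overcount.

```python
def resample(points, step_miles=1 / 60):
    """Keep the first trackpoint at or past each multiple of step_miles.

    points is a list of (distance_miles, elevation_feet) tuples in track order
    (an assumed input format).
    """
    kept = []
    next_mark = 0.0
    for dist, elev in points:
        if dist >= next_mark:
            kept.append((dist, elev))
            # Advance past this point so a gap in the track yields one point, not many.
            while next_mark <= dist:
                next_mark += step_miles
    return kept


def elevation_gain(points, threshold_feet=100.0):
    """Sum climbs, ignoring any change of threshold_feet or less (assumed rule)."""
    if not points:
        return 0.0
    gain = 0.0
    reference = points[0][1]  # last elevation accepted as real
    for _, elev in points[1:]:
        if elev - reference > threshold_feet:
            # A climb big enough to trust: count it and move the reference up.
            gain += elev - reference
            reference = elev
        elif reference - elev > threshold_feet:
            # A real descent: move the reference down without counting anything.
            reference = elev
    return gain
```

With this rule a track that wobbles between 990 and 1060 feet contributes no gain at all, while a steady climb is counted in chunks of a little over 100 feet.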

(Hike Map)

Links for June 2021

Updated on Thursday, June 24, 2021

Links for April 2021

Updated on Thursday, April 22, 2021
The New York Times: How Brexit Ruined Easter for Britain’s Chocolate Makers

Interesting that there also seems to be a shortage on British supermarket shelves, then.

--

Easter egg hunt: UK shoppers disappointed by shortages | Easter | The Guardian

Here's the Guardian on the shortage. So British chocolate can't be found in Europe or the UK. Where is it going? Eezy Freezy!

--

The story behind Kidlapse

Latest update on Kidlapse.

--

Search Engine Roundtable: Google: User Generate Content Products Reviews Will Have A Hard Time Ranking Well

And instead we get affiliate link stuffed fluff pieces?

--

The Washington Post: We should soon stop catering to the vaccine holdouts

Yes

Links for March 2021

Solar System Cartogram

Might need revisiting when we get a submarine into Enceladus.

ITHCWY Redesign

I've just launched a redesign of I Thought He Came With You. The main thrust is to make the site more usable on desktops. Which seems nuts, but the data doesn't lie. The site has low mobile traffic and for a while I thought this was some kind of technical issue. I optimized the design heavily for mobile and spent a lot of time on speed and some AMP. I guess it's the content. Google loves it when I write documentation for them and doesn't think I have anything useful to say on politics. They're probably right. So I've gone back to having an old school sidebar and I've taken the performance hit of using Bootstrap to get some better looking forms and navigation without spending a lot of time on it. I hope you enjoy it, and if you find anything broken please email or leave a comment.

Better related posts with word2vec (C#)

I have been experimenting with word2vec recently. Word2vec trains a neural network to guess which word is likely to appear given the context of the surrounding words. The result is a vector representation of each word in the trained vocabulary with some amazing properties (the canonical example is king - man + woman = queen). You can also find similar words by looking at cosine distance - words that are close in meaning have vectors that are close in orientation.

This sounds like it should work well for finding related posts. Spoiler alert: it does!

My old system listed posts with similar tags. This worked reasonably well, but it depended on me remembering to add enough tags to each post and a lot of the time it really just listed a few recent posts that were loosely related. The new system (live now) does a much better job which should be helpful to visitors and is likely to help with SEO as well.

I don't have a full implementation to share as it's reasonably tightly coupled to my custom CMS but here is a code snippet which should be enough to get this up and running anywhere:

The first step is getting a vector representation of a post. Word2vec just gives you a vector for a word (or short phrase depending on how the model is trained). A related technology, doc2vec, learns a vector for the whole document as well. This could be useful but isn't really what I needed here (e.g. I could solve my forgetfulness around adding tags by training a model to suggest them for me - might be a good project for another day). I ended up using a pre-trained model and then averaging together the vectors for each word. This paper (PDF) suggests that this isn't too crazy.

For the model I used word2vec-slim which condenses the Google News model down from 3 million words to 300k. This is because my blog runs on a very modest EC2 instance and a multi-gigabyte model might kill it. I load the model into Word2vec.Tools (available via NuGet) and then just get the word vectors (GetRepresentationFor(...).NumericVector) and average them together.

I haven't included code to build the word list but I just took every word from the post, title, meta description and tag list, removed stop words (the, and, etc.) and converted to lower case.
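As a rough illustration of the word list and averaging steps, here is a minimal Python sketch. It is not the C# that runs on this blog and does not use Word2vec.Tools. The stop word set, the tokenizing regex, the function names and the plain dict standing in for the loaded model are all assumptions made for the example.

```python
import re

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"the", "and", "a", "of", "to", "in"}


def words_for_post(*fields):
    """Lower-case every word in the given text fields and drop stop words."""
    words = []
    for text in fields:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return words


def post_vector(words, embeddings):
    """Average the vectors of the words the model knows; None if it knows none.

    embeddings is a dict of word -> list of floats standing in for the
    pre-trained model (an assumption for this sketch).
    """
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]
```

Words missing from the model's vocabulary are simply skipped, which matters more with a slimmed-down model than with the full one.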

Now that each post has a vector representation it's easy to compute the most related posts. For a given post compute the cosine distance between the post vector and every other post. Sort the list in ascending order and pick however many you want from the top (the distance between the post and itself would be 0, a totally unrelated post would be around 1). The last line in the code sample shows this comparison for one post pair using Accord.Math, also on NuGet.
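The ranking step might look something like this in Python. Again this is a sketch under assumptions rather than the Accord.Math call used on the blog: cosine distance is taken here as 1 minus cosine similarity, so smaller means more related, and the function names and the dict of post id to vector are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / norm


def related_posts(post_id, vectors, count=3):
    """Return the ids of the `count` posts closest to post_id.

    vectors is a dict of post id -> averaged post vector (hypothetical structure).
    """
    target = vectors[post_id]
    scored = [
        (cosine_distance(target, vec), other_id)
        for other_id, vec in vectors.items()
        if other_id != post_id
    ]
    scored.sort()  # ascending: smallest distance first
    return [other_id for _, other_id in scored[:count]]
```

For a blog-sized archive comparing one post against every other post is cheap, so the list can be recomputed whenever a post is saved.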

I'm really happy with the results. This was a fast implementation and a huge improvement over tag-based related posts.

(All Code Posts)