Links for February 2023

Upgrading from word2vec to OpenAI

In 2018 I upgraded the related posts functionality on this blog to use word2vec. This was hacked together by averaging the vectors for the interesting words in each post and then looking for the closest vectors. It worked quite well, but the state of the art has moved on just a little bit since then.

OpenAI has an embeddings API and recently released a cheaper model called text-embedding-ada-002. The vectors have 1,536 dimensions, a pretty significant increase from the 300 I was using with word2vec. Creating vectors for all my posts took a few minutes and cost $0.11, which is pretty affordable. As you'd expect, those related posts are now significantly more related and useful. Thanks OpenAI!

I shared some code previously for the word2vec hack. This is a lot more straightforward - call the API with the post text and then compare the vectors with cosine distance to find the most related. It works well for search too.
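As a rough illustration of the comparison step, here is a minimal Python sketch (not the code this blog runs). It assumes the vectors have already been fetched from the embeddings API and stored by post, and it ranks by cosine similarity, which is just one minus cosine distance. The post names and three-dimensional toy vectors are made up; real vectors would have 1,536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def most_related(post_id, embeddings, top_n=3):
    """Return the top_n (post_id, similarity) pairs closest to the given post."""
    target = embeddings[post_id]
    scored = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in embeddings.items()
        if other_id != post_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data standing in for stored post embeddings.
toy_embeddings = {
    "word2vec-post": [0.9, 0.1, 0.0],
    "openai-post": [0.8, 0.2, 0.1],
    "hiking-post": [0.0, 0.1, 0.9],
}
related = most_related("openai-post", toy_embeddings, top_n=2)
```

The same ranking works for search: embed the query text instead of a post and compare it against every stored post vector.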

(Published to the Fediverse as: Upgrading from word2vec to OpenAI #code #ml #openai #ithcwy #word2vec Using the OpenAI embeddings API to find better related posts for a blog. )

Migrating a C# Integration from GA3 to GA4

Updated on Saturday, May 6, 2023

This blog has a couple of Google Analytics integrations - the popular posts list is pulled from GA, as is the unnecessarily accurate count of non-visitors in the footer. I just migrated from the GA3 API to the GA4 API. The backend for this blog is ASP.NET MVC with .NET 4.8. One day I might catch up with the cool kids and try to get on .NET Core, but not today.

Here's where I stubbed my toe:

I followed this code sample to make my first GA4 call. After installing the NuGet package I couldn't find BetaAnalyticsDataClient anywhere. It turns out that there are two packages: Google.Apis.AnalyticsData.v1beta, and Google.Analytics.Data.V1Beta, which only shows up if you check 'Include prerelease' when searching. You want the second one. I'm not in love with BetaAnalyticsDataClient as a class name, it suggests all sorts of breaking changes are coming. My GA3 integration has ticked over for years with no changes. Maybe GA4 is going to be more like Google Ads and shank you with breaking changes every few months. Moving on...

Wow the error messages are good. Kudos to the API team. I'm so used to cryptic bullshit but this API tells you what you're doing wrong and sends back helpful pointers and even URLs. Every API should be this friendly. I got through the remaining problems fairly quickly because of this.

The code sample passes the property ID as 'property/nnnnnnn' but the API is expecting 'properties/nnnnnnn'.

I'd been using a ServiceAccountCredential created from a .p12 file for GA3. This doesn't seem to be supported for BetaAnalyticsDataClient, but I was able to generate a new credential serialized as .json, and passing this to BetaAnalyticsDataClient worked fine. I then had a permission denied error. This was because I hadn't added the service account email address to the property, and doing so got me some data.

The client library is pretty classy (as in too many classes). Creating a filter to exclude internal users involves four nested classes - a FilterExpression that has another FilterExpression for a not condition, and then this needs a Filter and the Filter needs a StringFilter. Tedious. And including enums for metrics and dimensions was apparently too much trouble, so adding those requires Metric and Dimension classes that are just initialized with a string. The list is here.
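To make the nesting concrete, here is a hedged sketch of the same request shape. It is written in Python as the JSON body for the Data API's REST runReport method rather than with the C# client classes, so it is an illustration and not the code this blog uses. The dimension name customUser:internal and the value "yes" are hypothetical stand-ins for however internal users are tagged, and the 30 day date range is an assumption. pagePath and screenPageViews are standard GA4 API names.

```python
import json


def property_name(property_id):
    """Build the resource name in the 'properties/NNN' form the API expects."""
    return f"properties/{property_id}"


def build_popular_posts_request(internal_dimension, internal_value):
    """Build a runReport body that excludes rows matching the internal-user value."""
    # Level 4: StringFilter
    string_filter = {"matchType": "EXACT", "value": internal_value}
    # Level 3: Filter
    field_filter = {"fieldName": internal_dimension, "stringFilter": string_filter}
    # Levels 1 and 2: a FilterExpression wrapping another FilterExpression as a "not"
    not_internal = {"notExpression": {"filter": field_filter}}
    return {
        # Metrics and dimensions are plain strings wrapped in objects
        "dimensions": [{"name": "pagePath"}],
        "metrics": [{"name": "screenPageViews"}],
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "dimensionFilter": not_internal,
    }


# Hypothetical property ID and internal-user dimension, for illustration only.
resource = property_name("1234567")
request_body = build_popular_posts_request("customUser:internal", "yes")
request_json = json.dumps(request_body, indent=2)
```

The C# classes map onto these levels one for one, which is where the four nested objects come from.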

Lastly, when it comes to running the thing, the site won't start and says:

"CS0012: The type 'System.Object' is defined in an assembly that is not referenced. You must add a reference to assembly 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'."

Presumably due to some NuGet horror or other. Adding that reference indeed fixes the problem and hopefully doesn't create a new one.

I am now technically if not emotionally prepared for GA3 to be switched off.

(Published to the Fediverse as: Migrating a C# Integration from GA3 to GA4 #code #ga4 #ithcwy #c# Some tips on avoiding pitfalls when migrating a C# application from Google Analytics 3 to Google Analytics 4. )

Server Migration Complete

2022

ITHCWY just migrated to a new server with more capacity. From clouds to pollution to covid monitoring there is a lot going on under the hood here these days. I can already see that search is a lot more snappy.

Man, moving Windows servers just sucks. Internal GDI error - that's a permissions issue obviously (not my first time). Can't publish? You must need to install Web Deploy twice and then uninstall it and install it one more time for luck. Web.config error at startup? Of course Windows Server 2022 still doesn't come with URL Rewriting and you need to install it from some corner of the MS website like an animal. Yes, I should use something else but my custom CMS is probably reaching the same level of complexity as an F-16 and I'm just one person. Enjoy the fast search!

(Published to the Fediverse as: Server Migration Complete #etc #ithcwy #microsoft ITHCWY has migrated to a beefier box and a new OS and still seems to be working. Oh, and search is faster! )

ITHCWY on Mastodon

Updated on Thursday, February 16, 2023

A Mastodon in a primordial forest

Well, I picked a bad year to dabble back in Twitter. I'm not ready to delete it all again, but I am experimenting with Mastodon. There is an official ITHCWY account here. It is fully automated via the Mastodon API and will publish posts, comments and news shares. It's currently posting its way through the back catalog. I also have a personal account here. So far Mastodon is a pretty good experience. There is some stress in picking a server but once you're on everything else is easy. Will just have to see if there is a regression to the mean as the number of users continues to increase.

Links for December 2022

Updated on Saturday, December 31, 2022

1,000th Post!

This is the 1,000th (possibly voluntary) post on I Thought He Came With You. To celebrate, here are 17 posts in no particular order from the past 17+ years of blogging.

Methyl L-α-aspartyl-L-fucking-phenylalaninate: What does the UK have against sugar?

How to fix software patents: I co-founded a startup that was killed by a patent lawsuit and so I think it's fair to say I have strong feelings about this. I've had various ideas for improving the system over the years, and this one is still the best. It's a radical proposal to stop examining patents altogether, while continuing to protect genuine innovation. Fuck the trolls.

Extreme Environmentalism: Environmentalists are slowly coming around to nuclear power, but is it possible that the greenest thing to do is even more radical?

What do you get when you multiply six by nine? Brexit.: If Douglas Adams were still alive I'm 42% certain this is what he would have written about Brexit. The problems we need to solve as a species are better solved together, stop putting the 'B' arkers in power FFS.

The real reason Americans don't have passports: Maybe this is finally going to get fixed, but why can't we overhaul basic government services like other countries seem to manage to do routinely?

ESRI Shapefile Reader in .NET: A shapefile is a common file format for GIS (Geographic Information System) data like county or country borders. I needed to work with this for an update to Catfood Earth and there was nothing that made it easy to just load the data and do something with it. I ended up writing and releasing an open source library which became quite popular. I'm still discovering interesting and/or frightening places where this is used, like laying out power lines or forming evacuation plans.

Response to GGNRA Draft Dog Management Plan: In which I fought the National Park Service and eventually won. It wasn't just me, but it still felt good. Let happy dogs forever roam Fort Funston.

Bishops: Get them off of the kids and out of my government.

I didn't think I'd ever fall for fake news on Facebook: I discover that I'm just as dumb as everyone else which is a bit humbling. Social media sucks and we should all go back to wonderful blogs like this one. It's a problem that only you can fix - do something like this.

Got It: I hate this trend in interaction design.

Export Google Fit Daily Steps, Weight and Distance to a Google Sheet: This is by far the most popular post on my blog. I think that's partly because Google loves it when you do the hard work of supporting their products for them. It also scratches a real itch for a lot of people who want to liberate their data and do something interesting with it. And I just love Apps Script, which gives you free and easy cloud computing. It's Google at their very best.

Sod Searle And Sod His Sodding Room: So much sloppy thinking about AI. I had lectures about this at university when they should have just had us read Gödel, Escher, Bach instead.

Meeting Defragmenter: I have achieved this vision manually via slow nudges and strategic calendar blocks and it's actually pretty great. You end up with some horrible days and some transcendentally good ones. YMMV.

Reviews and links for March 2011: I still read a lot, but I'm not as good as I used to be at taking the time to write a thoughtful review. Whenever I remember the time I tore Eric Carle (RIP) a new one for The Very Quiet Cricket I make a resolution to start again.

Cam of Fortune!: I so nearly got fired for this, when the Managing Director at my first post-University job got the fax instead of me. He thought it was for real. And then I did this when we moved offices. How did I stay employed?

Lock up the Flexible Spending Account Administrators: It turns out that meaningless paperwork is a worse problem for society than actual serial killers.

Bernal Hill: My very first blog post from August 13, 2005. I had just received a GPS for my birthday and was very excited about getting data from hikes but also very embarrassed about writing anything in public on the internet. I had been online for more than a decade at this point. My first email address was in 1991 and my first website in 1996 (lost even to the Internet Archive, www.catfood.demon.co.uk). The first thing I actually ever published online (in 1997) was this article exposing something very interesting about the Pentium processor. I wrote this with a friend earlier and it ended up in the first issue of Catfood Magazine. It has been a long time since I've used a standalone GPS, but I have posted a lot of hikes.

(Published to the Fediverse as: 1,000th Post! #etc #ithcwy Celebrating 1,000 posts on I Thought He Came With You, Robert Ellison's blog, with seventeen posts from the past seventeen years of blogging. )

Webmention on ITHCWY

Updated on Sunday, July 23, 2023

ITHCWY now supports a basic webmention implementation. Any inbound mentions will be dropped in the post moderation queue (so may take up to a few hours to appear as I check everything manually to keep the spam out). If an outbound link supports webmention then it will be mentioned. I'm only doing this for new and updated posts, not for the full archive. I'm a little Fediverse curious and this is a first step towards maybe implementing Bridgy Fed or even rolling my own ActivityPub implementation. Mostly I miss trackbacks and hope that we can figure out how to have nice things again.

Updated 2023-07-23 23:45:

I just added Bridgy Fed support so ITHCWY is sailing into the Fediverse. Details here.
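For anyone curious what the outbound side involves, here is a minimal Python sketch of the general Webmention flow, not the implementation running here. It looks for a link or anchor tag with rel="webmention" in the target page and builds the form-encoded body to POST to that endpoint. It assumes the target HTML has already been downloaded, and it skips the HTTP Link header check that a complete sender should try first. The example URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class WebmentionLinkFinder(HTMLParser):
    """Record the href of the first <link> or <a> tag with rel="webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "webmention" in rels and attributes.get("href") is not None:
            self.endpoint = attributes["href"]


def find_endpoint(page_url, html):
    """Return the absolute webmention endpoint advertised in html, or None."""
    finder = WebmentionLinkFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    # Endpoints may be relative, so resolve against the page that advertised them
    return urljoin(page_url, finder.endpoint)


def mention_body(source_url, target_url):
    """Form-encoded body to POST to the endpoint: source links to target."""
    return urlencode({"source": source_url, "target": target_url})


# Hypothetical pages, for illustration only.
target_html = '<html><head><link rel="webmention" href="/webmention"></head></html>'
endpoint = find_endpoint("https://target.example/post", target_html)
body = mention_body("https://source.example/my-post", "https://target.example/post")
```

If find_endpoint returns None, the target does not advertise webmention in its HTML and there is nothing to send.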

(Published to the Fediverse as: Webmention on ITHCWY #etc #ithcwy #bridgyfed #fediverse #indieweb #webmention Inbound and outbound webmentions are now supported for I Thought He Came With You blog posts. )

Links for October 2022

Links for July 2022

Updated on Friday, July 29, 2022