Upgrading from word2vec to OpenAI

In 2018 I upgraded the related posts functionality on this blog to use word2vec. This was hacked together by averaging the vectors for interesting words in each post together and then looking for the closest vectors. It worked quite well, but the state of the art has moved on just a little bit since then.
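For anyone who hasn't seen the averaging trick, here is a minimal sketch of the idea in Python. It is not the original code: the toy three-dimensional vectors and the names `average_vectors` and `post_vector` are made up for illustration, and a real word2vec model would supply 300-dimensional vectors.

```python
def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def post_vector(words, word_vectors):
    """Average the vectors of the words we have a vector for.

    Words missing from the vocabulary are skipped. Returns None when
    no word in the post is known.
    """
    known = [word_vectors[w] for w in words if w in word_vectors]
    return average_vectors(known) if known else None


# Toy stand-in for a word2vec vocabulary (hypothetical values).
TOY_WORD_VECTORS = {
    "cloud": [1.0, 0.0, 0.0],
    "fog": [0.8, 0.2, 0.0],
    "api": [0.0, 1.0, 0.0],
    "code": [0.0, 0.8, 0.2],
}

weather_post = post_vector(["cloud", "fog", "the"], TOY_WORD_VECTORS)
code_post = post_vector(["api", "code"], TOY_WORD_VECTORS)
```

Each post ends up with one vector, and related posts are the ones whose vectors sit closest together.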

OpenAI has an embeddings API and recently released a cheaper model called text-embedding-ada-002. The vectors have 1,536 dimensions, a pretty significant increase from the 300 I was using with word2vec. Creating vectors for all my posts took a few minutes and cost $0.11 which is pretty affordable. As you'd expect those related posts are now significantly more related and useful. Thanks OpenAI!

I shared some code previously for the word2vec hack. This is a lot more straightforward - call the API with the post text and then compare the vectors with cosine distance to find the most related. It works well for search too.
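The comparison step looks roughly like this. It's a sketch rather than the code running on this blog: the embeddings are tiny made-up vectors standing in for the 1,536-dimensional ones from the API, the post ids are invented, and `rank_by_similarity` is a hypothetical helper name. Ranking by highest cosine similarity is the same as ranking by lowest cosine distance (one minus the similarity).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings, exclude=None, top_n=3):
    """Return up to top_n post ids, most similar to the query vector first."""
    scored = [
        (cosine_similarity(query, vector), post_id)
        for post_id, vector in embeddings.items()
        if post_id != exclude
    ]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:top_n]]


# Made-up embeddings keyed by hypothetical post ids.
TOY_EMBEDDINGS = {
    "word2vec-upgrade": [0.9, 0.1, 0.0],
    "ga4-migration": [0.7, 0.3, 0.1],
    "cloud-timelapse": [0.0, 0.2, 0.9],
}

related = rank_by_similarity(
    TOY_EMBEDDINGS["word2vec-upgrade"],
    TOY_EMBEDDINGS,
    exclude="word2vec-upgrade",
    top_n=2,
)
```

For search the query text gets embedded the same way and passed in as `query` with nothing excluded.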

Migrating a C# Integration from GA3 to GA4

This blog has a couple of Google Analytics integrations - the popular posts list is pulled from GA, and the unnecessarily accurate count of non visitors in the footer. I just migrated from the GA3 API to the GA4 API. The backend for this blog is ASP.NET MVC with .NET 4.8. One day I might catch up with the cool kids and try to get on .NET Core, but not today.

Here's where I stubbed my toe:

I'm following this code sample to make my first GA4 call. After installing the NuGet package I couldn't find BetaAnalyticsDataClient anywhere. It turns out that there is a Google.Apis.AnalyticsData.v1beta package and a Google.Analytics.Data.V1Beta which is only available if you check 'Include prerelease' when searching. You want the second one. I'm not in love with BetaAnalyticsDataClient as a class name, it suggests all sorts of breaking changes are coming. My GA3 integration has ticked over for years with no changes. Maybe GA4 is going to be more like Google Ads and shank you with breaking changes every few months. Moving on...

Wow the error messages are good. Kudos to the API team. I'm so used to cryptic bullshit but this API tells you what you're doing wrong and sends back helpful pointers and even URLs. Every API should be this friendly. I got through the remaining problems fairly quickly because of this.

The code sample passes the property ID as 'property/nnnnnnn' but the API is expecting 'properties/nnnnnnn'.

I'd been using a ServiceAccountCredential created from a .p12 file for GA3. This doesn't seem to be supported for BetaAnalyticsDataClient, but I was able to generate a new credential as a .json file and passing this to BetaAnalyticsDataClient worked fine. I then hit a permission denied error because I hadn't added the service account email address to the property. Adding it got me some data.

The client library is pretty classy (as in too many classes). Creating a filter to exclude internal users involves four nested classes - a FilterExpression that has another FilterExpression for a not condition and then this needs a Filter and the Filter needs a StringFilter. Tedious. And including enums for metrics and dimensions is too much trouble so adding those now requires Metric and Dimension classes but these are just initialized with a string. The list is here.
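To show the nesting without the C# ceremony, here is a sketch of the equivalent request body built as plain Python dictionaries in the JSON shape the Data API's runReport method accepts. It is not the code from my integration. The `customUser:internal` dimension is a hypothetical custom dimension for flagging internal users (your property will have its own), and the helper names are invented for this example. `pagePath` and `screenPageViews` are from the published list of dimensions and metrics.

```python
import json


def property_name(property_id):
    """Resource name the API expects: 'properties/...', not 'property/...'."""
    return f"properties/{property_id}"


def exclude_string_filter(field_name, value):
    """FilterExpression -> notExpression -> Filter -> StringFilter."""
    return {
        "notExpression": {
            "filter": {
                "fieldName": field_name,
                "stringFilter": {"matchType": "EXACT", "value": value},
            }
        }
    }


def popular_posts_request(days=30):
    """Request body for page views per path, minus internal users."""
    return {
        "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "today"}],
        # Dimensions and metrics are just names wrapped in an object.
        "dimensions": [{"name": "pagePath"}],
        "metrics": [{"name": "screenPageViews"}],
        # Hypothetical custom dimension marking internal traffic.
        "dimensionFilter": exclude_string_filter("customUser:internal", "true"),
    }


body = popular_posts_request()
body_json = json.dumps(body, indent=2)
```

The same four levels show up in the client library as FilterExpression, FilterExpression, Filter and StringFilter.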

Lastly when it comes to running the thing the site won't start and says:

"CS0012: The type 'System.Object' is defined in an assembly that is not referenced. You must add a reference to assembly 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'."

Presumably due to some NuGet horror or other. Adding that reference indeed fixes the problem and hopefully doesn't create a new one.

I am now technically if not emotionally prepared for GA3 to be switched off.

Timelapse of San Francisco Clouds After Various Atmospheric Rivers

Four time lapse sequences of cool clouds over San Francisco at sunset following the January 2022 sequence of atmospheric rivers.

Server Migration Complete

ITHCWY just migrated to a new server with more capacity. From clouds to pollution to covid monitoring there is a lot going on under the hood here these days. I can already see that search is a lot more snappy.

Man, moving Windows servers just sucks. Internal GDI error - that's a permissions issue obviously (not my first time). Can't publish? You must need to install Web Deploy twice and then uninstall it and install it one more time for luck. Web.config error at startup? Of course Windows Server 2022 still doesn't come with URL Rewriting and you need to install it from some corner of the MS website like an animal. Yes, I should use something else but my custom CMS is probably reaching the same level of complexity as an F16 and I'm just one person. Enjoy the fast search!

Echo Show Me The Door

Dogs, Ability to Poop, Echo Show

I have had my Echo Show for a little over five years. That's an eternity in AI, you'd expect some amazing advances over half a decade but it's still a timer with a screen.

A timer that you can talk to while your hands are busy or dirty is actually an amazing thing, and it can switch off my Christmas lights without me having to vault a sofa and risk losing an eye. But apparently Amazon isn't making any money from it and so they're laying off staff and paring back their smart home additions.

Possibly they've been working on the wrong thing? When asking about dogs Alexa managed to put them firmly in the camp of things that can poop. ChatGPT says:

"A dog is a mammal and a common household pet, known for its loyalty and ability to be trained. It is a member of the Canidae family, which also includes wolves, coyotes, and foxes, and it is believed to have been the first domesticated animal. Dogs come in a wide variety of shapes and sizes, and they are used for a variety of purposes, such as hunting, herding, protection, and companionship."

That's just a funny bug somewhere. The real irritation is the device getting more aggressive every year. "By the way..." it says while everyone shouts at it to shut the fuck up. Worse, every couple of weeks now they push something new to the home screen. I just want to see my photos and there is now a list of 75,000 adverts to switch off first. And, they've started dropping promotions that can't be switched off in settings as well. So this thing that I bought is so aggravating now that I'll never buy another one.

It could have been different. No, I'm never going to say Alexa, buy me a printer (if you're using text to speech on this post my deepest apologies). But (just one idea) what if I could have a meal planning conversation that adds the ingredients to my shopping basket and the recipes to my home screen? I can then edit the cart before I order and have the instructions on a convenient screen while I cook and listen to a podcast.

San Francisco 311 Cases Animation

This animation shows a random sample of 311 cases that have a photo and specific location. It covers July 4, 2013 to January 5, 2023. Created using the 311 dataset plotted over a street map of San Francisco.

After the Storm

SONY ILCE-7C 20mm f1.8 1/320s ISO100

Photo of the twisted wreckage of a tree on Strawberry Hill in the middle of Stow Lake (Golden Gate Park, San Francisco).

Rocky Outcrop Park

SONY ILCE-7C 20mm f3.5 1/160s ISO100

Photo of Rocky Outcrop Park in San Francisco. 14th Avenue next to this urban cliff is usually a wind tunnel.

Reviews for January 2023

Updated on Sunday, January 29, 2023

Spoilers!

Books

Borne

Borne by Jeff VanderMeer is a Ballardian visit to a ruined city where biotech from a mysterious corporation has spread out and taken over. Borne is a fast growing lump of said biotech, found on a giant flying Bear by a scavenger called Rachel. Then, things get weird. Excellent book, and there is another installment called Dead Astronauts which I will get to soon(ish).

Movies

Berlin Syndrome

Australian girl falls for the wrong German guy, much captivity and violence ensues. Meh.

Emergency Declaration

Emergency Declaration is a Korean film where a hemorrhagic virus is released onto a plane. It's reasonably diverting although fairly predictable. What I loved about it is that the introduction banged on about what a big deal an emergency declaration was and that it was martial law for a plane and they could land with top priority anywhere. By the time the captain declares an emergency half the passengers are dead, everyone else is infected and at least two countries have threatened to shoot them down. But being the title of the film they linger on it like it's a big deal that will change everything.

Lou

Allison Janney plays Liam Neeson rescuing a kidnapped child. I enjoyed it a lot.

The Lost City

The Lost City with Sandra Bullock, Channing Tatum and Daniel Radcliffe is trying to bring Romancing the Stone back. It's funny.

Troll

Roar Uthaug is the undisputed master of Norwegian disaster movies. Troll is less enthralling than his waves and earthquakes and tunnels though. There is a scene with some helicopters toting church bells that is worth the price of admission (free with Netflix) but the ending is very underwhelming. Also, a lot of it reads like a criticism of a multicultural Norway what with Christianity having driven the Trolls out to start with, and the need for church bells to scare the beast away. Then that plot is dropped and a Muslim soldier is introduced like some scriptwriters were fundamentally at odds with each other. Reasonably fun though.

Music

Habits (Acoustic)

I'm listening to a lot of Tove Lo at the moment, and this acoustic version of Habits is why YouTube beats the living daylights out of any Dolby Atmos enabled streaming service.

Podcasts

Crypto Island

Crypto Island from PJ Vogt barely got started and then slowly died with months between episodes. Vogt tells a good story so it was worth a listen when it occasionally appeared but it was hard to piece together. I'm looking forward to whatever he does next, and he's apparently working on something.

The New Gurus

New Year's resolution: write more reviews. I stopped using Goodreads for anything meaningful a few years ago but had integrated it into my blog CMS to automatically post reviews. As a result these have become a little sad. I've just whipped up a new review system that doesn't use Goodreads, as motivation both to write more and to review things that are not books (gasp).

The New Gurus is a BBC Podcast presented by Helen Lewis that surveys all kinds of modern woo. From bitcoin to productivity via health scams and equity training. As a skeptically inclined person I loved it and binged the whole series over a couple of days. Helen concludes that the New Gurus are mostly men mostly looking to find the right path to masculinity. I think you always need to be looking for what they're selling beyond the obvious - if it's vitamin supplements or a political candidacy the cult suddenly makes a lot more sense. With only half an hour per topic I also found myself wishing she could spend more time on each - I'd happily listen to a series on most of the subjects covered here. Highly recommended unless you're a believer in which case it will probably make you quite mad.

The Reith Lectures

I always enjoy The Reith Lectures, it's what the BBC is for. They also have a massive archive which is fun to dig through. This year has four speakers covering FDR's four freedoms: freedom of speech, freedom of worship, freedom from want, and freedom from fear.

Two standouts for me. Chimamanda Ngozi Adichie covered freedom of speech, and in the sense that I understand it, which is that you might get offended. An absolute barnstormer of a lecture. At the other end of the spectrum Rowan Williams mounted an incredibly poor defense for the freedom to worship. His central point seems to be that we need to prioritize minority opinions (i.e. Anglicans) in order to make progress on human issues and to the best of my recollection religion has usually been dragged to the new consensus at the end rather than the beginning. He also cloaked this intellectual dishonesty in feminism and gay rights which is frankly offensive from the former leader of a church that won't marry gays and doesn't think women can handle the job. I loved both for very different reasons.

TV

Bosch Legacy

I enjoyed the Bosch series on Amazon Prime. I haven't read the actual Bosch novels (there are a lot, if I start I might never end) but I have read the four that costar Renée Ballard and those are pretty good. So I had high expectations for this new series.

Bosch Legacy is not on Amazon Prime, it's on Amazon Freevee which Amazon somewhat confusingly describes as a 'premium free streaming service'. Sounds too good to be true, and it is. It's chock full of really bad ads. I pay all kinds of money to avoid ads but in this case you can't even buy the series on Amazon, you have to choke down those ads to watch it. It's not as clunky as Hulu, but it suffers from not having many advertisers and so by the time I've seen the same ad for the 50th time I loathe that company with a vengeance. I'll never buy their products in the future out of spite, and if I happen to have any in the house I'll use the ad break to throw them out. Also, after using Hulu once I never did again. I think Freevee is in the same category.

At least all those ads paid to bring Bosch back to the faithful, right? Well, yes and no. The core cast of Titus Welliver, Mimi Rogers, Madison Lintz are fantastic and the plot is there, but the production is cheap and shows it. Early on Bosch's nice house which must cost a few bucks to film in gets red tagged and he's forced to move into some generic office. And they can't afford many cameras or crew either, there are regular cuts that just look off (like in a conversation when switching between actors and the person is clearly not quite where they should be). Overall it was OK, but tainted by being too cheap and in a bad streaming neighborhood.

His Dark Materials Season Three

The BBC/HBO version of His Dark Materials wraps up with season three based on Philip Pullman's The Amber Spyglass. What a fantastic trilogy. And after the disappointing film this adaptation is a revelation. Pullman is going after the Catholic Church specifically and religion in general, but in a world of materialized souls in the form of daemons and actual angels. In the end the church is defeated and the multiverse saved by individual sacrifice more than the army assembled for the final battle. What a heartbreaking ending. Amazing.

Jack Ryan Series 3

Jack Ryan is back, thwarting an attempt to overthrow the Russian government. Right now that seems like it might be a good idea? Is he the baddie? This is very much Tom Clancy's and not Tom Clancy, but for Season 3 they've figured out that in a spy thriller the plot is of little consequence if it's happening with urgency against a backdrop of beautiful European cities and I'll watch that all day (or at least for a few nights). The main problem was watching this the same month as Slow Horses.

Slow Horses Season 2

I love Mick Herron's Slough House books and was very excited that Apple adapted them for TV. The second season just dropped, and it's even better than the first. Still very faithful to the book but the characters are more comfortable and distinct. I hope they start work on the Oxford Investigations as well (a different but also good set of Herron books). With the first season I was very unconvinced by Gary Oldman as Jackson Lamb (in my head this role was played by Daniel Ryan). Oldman has managed to get a lot more Lamb-like for season 2.

The White Lotus Season Two

The second season of The White Lotus is good but not great. There is a moment when Jennifer Coolidge's Tanya McQuoid asks for an Oreo Cheesecake and it's such a devastatingly wrong thing to ask for at a hotel in Sicily but a humanizing moment at the same time. But this series is more of a murder mystery and less of a tone poem. It's just a little less luscious, and outrageous and funny and relaxing. Please stop now and don't make any more. We don't need to go to The White Lotus Rochester Parkway.

Treason

Treason on Netflix is a brief five episode double-crossing spy caper. It's a little soapy - sort of a Harlan Coben's Bond, if that Bond was stuck in London.

(All images included with ITHCWY reviews are the property of their respective owners and are used to illustrate reviews only.)

Assassination Coordinates

Testing out some shapefile code with a zoom into San Francisco. This uses five different shapefiles:

Country borders are from Eric Muller's fips-10 shapefile.

States and US Counties come from the United States Census Bureau.

San Francisco 5 foot elevation contours from DataSF.

Finally the street map for San Francisco is from data.gov.

These are almost all based on different projections and I did my best to actually line everything up but if you're heading over for coffee it's probably best to stick with Google Maps.
