Shipping a website in a day with Generative AI

Updated on Saturday, November 18, 2023

Can you tell me a story about a shop?

It usually takes me a few weeks to get a new website up and running. Last weekend I tried an experiment with Cloudflare Pages and generative AI.

I have wanted to find an excuse to test Pages for a while. It's a pretty awesome product. I'm not doing anything too fancy with it - I have a local generator app that creates the pages for my site. Committing to the right branch in git automatically deploys to Cloudflare's edge network. It seems to do the right thing with all the file types I've thrown at it so far. My only complaint at this point is that it doesn't handle subdirectories. Everything needs to hang off the root unless you want to write some code. I think this is possible with Cloudflare Workers but that's for another day.

The generative piece is automatically writing content for review and publication. For each generated page I'm creating a prompt to write the post, and then another prompt to summarize it for meta descriptions and referencing it from other pages. I also create an embedding to use for interlinking related posts. Finally I create a third prompt to gin up an appropriate image. The site generator stitches these together into HTML and as soon as I commit, the updates are live.
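
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the actual Shop Stories generator. The prompt wording, the `complete`, `embed` and `make_image` callables (stand-ins for whatever text, embedding and image APIs are used) and the HTML layout are all hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class GeneratedPage:
    title: str
    body: str               # full post from the "write the post" prompt
    summary: str            # one-line summary for meta descriptions and links
    embedding: List[float]  # vector used to interlink related posts
    image_url: str          # output of the image prompt


def generate_page(
    title: str,
    complete: Callable[[str], str],       # hypothetical text-completion call
    embed: Callable[[str], List[float]],  # hypothetical embedding call
    make_image: Callable[[str], str],     # hypothetical image call, returns a URL
) -> GeneratedPage:
    # Hypothetical prompts; the real ones are not given in the post.
    body = complete(f"Write a short story about: {title}")
    summary = complete(f"Summarize this in one sentence:\n\n{body}")
    image_url = make_image(f"An illustration for: {summary}")
    return GeneratedPage(title, body, summary, embed(body), image_url)


def render_html(page: GeneratedPage) -> str:
    # Escape all generated text so stray markup in model output can't break the page.
    paragraphs = "".join(
        f"<p>{escape(p.strip())}</p>" for p in page.body.split("\n\n") if p.strip()
    )
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{escape(page.title)}</title>"
        f'<meta name="description" content="{escape(page.summary)}">'
        "</head><body>"
        f"<h1>{escape(page.title)}</h1>"
        f'<img src="{escape(page.image_url)}" alt="{escape(page.summary)}">'
        f"{paragraphs}"
        "</body></html>"
    )
```

Injecting the API calls keeps the stitching logic testable offline. The embedding is stored on the page object rather than rendered because it is only needed later, when choosing related posts.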

The site is not yet a work of art, and there is plenty to optimize and add, but the basic thing was working in a few hours. It's all ridiculously cheap as well. I'm more than a little frightened for Google given how much of this must be going on right now. And then the next generation of LLMs will be trained on the garbage produced by the current crop.

My super rapid site is called Shop Stories, collecting / dreaming tales of ecommerce heroics. I'll report back if anyone goes there.

code, ml

Vernal (Spring) Equinox 2023

Spring for the Northern Hemisphere, and Autumn south of the Equator, starts right now - 21:25 UTC on March 20, 2023. The image above shows the exact moment of the equinox in Catfood Earth.

(Published to the Fediverse as: Vernal (Spring) Equinox 2023 #code #earth #equinox #spring #autumn #vernal Catfood Earth render of the exact moment of the Spring Equinox for 2023 (21:25 UTC on March 20, 2023). )

Predicting when fog will flow through the Golden Gate using ML.NET

I'd like to make a time lapse of the moment when fog enters the Golden Gate and flows under the Golden Gate Bridge. It's surprisingly hard to know when conditions will be just right though. Often the weather is pleasant at my house while the fog is sneaking through and there is very little chance of me checking a webcam or satellite image. I decided to fix this about a year ago and started collecting data. The best bet seemed to be GOES-West CONUS - Band 2, which is a high resolution daylight satellite image that shows clouds and fog. I put together a Google Apps Script project to save an hourly snapshot and left it running. Here's a video of the data so far, zoomed in for an HD aspect ratio and scaled up a bit:

It's pretty obvious to me when conditions are just right. Could an ML model learn that this was about to happen from an image that was three hours older?

The first step was dividing thousands of images into two classes - frames where the fog would be perfect in three hours and frames where this was not going to happen. I built a little WPF tool to label the data (I don't use this often these days and every time I do I marvel at how the Image control has defaults that won't show the image FFS). This had the potential to be tedious so I built in some heuristics to flag likely candidates and then knocked out the false positives. Because the satellite images include clouds there is often white in the Golden Gate that is cloud cover rather than fog. At the end of the process I had two subfolders full of images to work with.
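
The labeling rule itself is simple to state in code. A frame is a positive example if the fog was perfect three hours after it was captured, so the labels are just the fog observations shifted back by the lead time. The post doesn't say which heuristics flagged candidates, so the brightness check below is a hypothetical stand-in that works on grayscale pixel values (0 to 255) cropped to the Golden Gate; the function names and thresholds are illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Sequence

LEAD_TIME = timedelta(hours=3)  # predict the fog three hours ahead


def label_frames(frame_times: Iterable[datetime], fog_times: Iterable[datetime]) -> Dict[datetime, str]:
    """Label a frame 'Fog' if perfect fog was observed LEAD_TIME after it, else 'NoFog'."""
    fog = set(fog_times)
    return {t: ("Fog" if t + LEAD_TIME in fog else "NoFog") for t in frame_times}


def is_candidate(pixels: Sequence[int], threshold: int = 200, min_fraction: float = 0.6) -> bool:
    """Hypothetical heuristic: flag a frame when most pixels in the Golden Gate crop are white.

    Clouds are white too, so flagged frames still need a manual pass to drop false positives.
    """
    if not pixels:
        return False
    bright = sum(1 for p in pixels if p >= threshold)
    return bright / len(pixels) >= min_fraction
```

Flagging only narrows down what has to be reviewed by eye; the final Fog and NoFog folders still come from human judgment.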

My goal this weekend was to get something working, and then refine every few months as I get more data. Right now I have 18 images that are in the Fog class and 7,539 that are NoFog. I also wanted this running on my blog, which is .NET 4.8 and will stay that way until I get a couple of weeks of forced bed rest. ML.NET says that it's based on .NET Standard and so should run anywhere.

Having local AutoML is very cool once you get it working. For large datasets this might not be a great option, but not having to wrangle with the cloud was also very appealing for this project.

Getting GPU training configured involved many gigabytes of installs. Get the latest Visual Studio 2022. Get the latest ML.NET model builder. Sign up for an NVIDIA developer account and install terrifyingly old and specific versions of CUDA and cuDNN. This last part was the worst because the CUDA installer wanted to downgrade my graphics driver, warned directly that this would cause problems and then claimed that it couldn't find a supported version of Visual Studio. I nervously unchecked everything that was already installed, and so far model builder has run fine and I don't seem to have caused any driver problems.

For image classification settings you can choose micro-accuracy (the default), macro-accuracy, logarithmic loss, or logarithmic loss reduction. Micro-accuracy aggregates the contributions of all classes (effectively the fraction of all images classified correctly), and unsurprisingly it's useless in this case, as just predicting 'no' works very well overall. Macro-accuracy is the average of the accuracy of each class and this produced reasonable results for me. Possibly too good: I probably have some overfitting and will spend some time on that soon.
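
The class counts given earlier in this post (18 Fog, 7,539 NoFog) show why micro-accuracy is useless here. The sketch below follows the definitions above (micro-accuracy over all samples, macro-accuracy as the mean of per-class accuracy) and scores a "model" that always predicts NoFog. It illustrates the arithmetic and is not ML.NET's implementation.

```python
from typing import List, Sequence


def micro_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all samples predicted correctly, so large classes dominate."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class accuracy, so each class counts equally."""
    per_class: List[float] = []
    for cls in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        per_class.append(sum(p == cls for p in preds) / len(preds))
    return sum(per_class) / len(per_class)


# Class counts from this post: 18 Fog frames and 7,539 NoFog frames.
y_true = ["Fog"] * 18 + ["NoFog"] * 7539
always_no = ["NoFog"] * len(y_true)  # never predicts fog
print(round(micro_accuracy(y_true, always_no), 4))  # 0.9976
print(macro_accuracy(y_true, always_no))            # 0.5
```

A useless classifier scores almost 99.8% micro-accuracy but only 50% macro-accuracy, which is why macro-accuracy is the more honest optimization target for this dataset.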

After training, Model Builder has an Evaluate tab, which is pretty worthless, at least for this model/case. You can spot check the prediction for specific images, and then there is one overall number for the performance of the model. I'm used to looking at precision and recall and it looks like I'll have to spend some time building separate tooling to do this. Hopefully this will improve in future versions.

At this point I have a .NET 6 console application that can make plausible looking predictions. Overall I'm very impressed with how easy it was to get this far.

Integrating with my blog though was very sad. After a lot of NuGet'ing and Googling I came to realize that ML.NET will not play nice with .NET 4.8, at least for image classification. Having dared to anger the NuGet gods I did a git reset --hard and called out to a new .NET 6 process to handle the classification. For my application I'm only running the prediction once per hour so I'm not bothered by performance. That .NET Standard claim proved to be unhelpful and I could have used just about anything.
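
Shelling out to a separate .NET 6 process is a general pattern: pass the image path as an argument and read the label back from standard output. Here is a minimal Python sketch of the calling side. It assumes a classifier executable that prints a single label such as Fog or NoFog; the command in the comment is a made-up example.

```python
import subprocess
import sys
from typing import Sequence


def classify_image(image_path: str, classifier_cmd: Sequence[str], timeout: float = 120) -> str:
    """Run an out-of-process classifier and return the label it prints to stdout."""
    result = subprocess.run(
        [*classifier_cmd, image_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the classifier exits non-zero
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Real use might look like ["dotnet", "FogClassifier.dll"] (hypothetical name).
    # A Python one-liner stands in for the classifier so the sketch runs anywhere.
    stand_in = [sys.executable, "-c", "import sys; print('NoFog')"]
    print(classify_image("latest.png", stand_in))
```

Starting a new process for every prediction is slow, but for a job that runs once an hour the start-up cost is irrelevant, and the process boundary sidesteps the framework mismatch entirely.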

The model is now running hourly. I have put up a dedicated page, Golden Gate Fog Prediction, with the latest forecast and plan to improve this over time. If this would be a useful tool for you please leave a comment below (right now it emails me when there is a positive prediction, it could potentially email a list of people).

Updated 2023-03-12 23:24:

After building some tooling to quantify this first model I have some hard metrics to add. Precision is 23%. This means there is a high rate of false positives. Recall is 78%. This means that when there really is fog the model does a pretty good job of predicting it. Overall the F1 score is 35%, which is not great. In practice the model doesn't often miss the condition I'm trying to detect, but it will send you out only to be disappointed most of the time. I'm not that surprised given how few positive cases I had to work with so far. My next steps are collecting more training data and looking more carefully at the labeling process to make sure I'm not missing some reasonable positive cases.
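
These metrics are straightforward to compute once predictions are collected. Below is a minimal sketch of that kind of tooling, assuming string labels Fog and NoFog with Fog as the positive class. Plugging in the reported precision and recall gives an F1 of roughly 35%, matching the figure above.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Fog") -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted Fog, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real Fog, how many were caught
    return precision, recall, f1_score(precision, recall)


# The figures reported above: precision 23%, recall 78%.
print(round(f1_score(0.23, 0.78), 3))  # 0.355
```

Because F1 is a harmonic mean, the low precision drags it down far more than the good recall lifts it.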

(Published to the Fediverse as: Predicting when fog will flow through the Golden Gate using ML.NET #code #video #ml #fog Using Microsoft's AutoML in ML.NET to build an image classifier that predicts fog flowing under the Golden Gate Bridge. )

Catfood.Shapefile 2.00

Updated on Sunday, March 12, 2023

I just released Catfood.Shapefile 2.00, my .NET parser for ESRI Shapefiles.

The big change is that I have migrated to .NET Standard 2.0. This makes it possible to use from .NET Core as well as classic .NET Framework from 4.6.1 up. If you need to use an older version of .NET Framework then you'll want to stick with Catfood.Shapefile 1.60.

Catfood.Shapefile is now also available via NuGet. This is the recommended way to install. The source code is still available on GitHub.

Upgrading from word2vec to OpenAI

[1536]

In 2018 I upgraded the related posts functionality on this blog to use word2vec. This was hacked together by averaging the vectors for interesting words in each post together and then looking for the closest vectors. It worked quite well, but the state of the art has moved on just a little bit since then.
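
The averaging step takes only a few lines. This sketch assumes word vectors have already been loaded into a dictionary (for example from a pretrained 300-dimension word2vec model). It uses a tiny stopword list as a stand-in for whatever "interesting words" filter the original hack used, which isn't described here.

```python
import re
from typing import Dict, List, Optional

# Illustrative stopword list; the original filter for "interesting" words isn't specified.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}


def interesting_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def post_vector(text: str, word_vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of every known, non-stopword word in a post."""
    vectors = [word_vectors[w] for w in interesting_words(text) if w in word_vectors]
    if not vectors:
        return None  # nothing in the post is in the vocabulary
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Averaging throws away word order and context, which is a large part of why a purpose-built document embedding does better.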

OpenAI has an embeddings API and recently released a cheaper model called text-embedding-ada-002. The vectors have 1,536 dimensions, a pretty significant increase from the 300 I was using with word2vec. Creating vectors for all my posts took a few minutes and cost $0.11 which is pretty affordable. As you'd expect those related posts are now significantly more related and useful. Thanks OpenAI!

I shared some code previously for the word2vec hack. This is a lot more straightforward - call the API with the post text and then compare the vectors with cosine distance to find the most related. It works well for search too.
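
Once each post has an embedding (fetched from the embeddings API with text-embedding-ada-002 and cached, which is not shown), related posts and search are the same operation: rank the stored vectors by cosine similarity to a target vector. Here is a minimal sketch that uses made-up post slugs and toy three-dimensional vectors in place of the real 1,536-dimension ones.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: Sequence[float], embeddings: Dict[str, List[float]], k: int = 5) -> List[str]:
    """Return the k posts whose embeddings are most similar to the query vector."""
    return sorted(embeddings, key=lambda slug: cosine_similarity(query, embeddings[slug]), reverse=True)[:k]


def related_posts(slug: str, embeddings: Dict[str, List[float]], k: int = 5) -> List[str]:
    """Most similar posts to `slug`, excluding the post itself."""
    others = {s: v for s, v in embeddings.items() if s != slug}
    return search(embeddings[slug], others, k)
```

For search, the query text would first be embedded with the same model. OpenAI's documentation notes that these embeddings are normalized to length 1, so a plain dot product would produce the same ranking. The full cosine is kept here so the functions work with any vectors.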

(Published to the Fediverse as: Upgrading from word2vec to OpenAI #code #ml #openai #ithcwy #word2vec Using the Open AI embeddings API to find better related posts for a blog. )

Migrating a C# Integration from GA3 to GA4

Updated on Saturday, May 6, 2023

GA4

This blog has a couple of Google Analytics integrations - the popular posts list is pulled from GA, and the unnecessarily accurate count of non visitors in the footer. I just migrated from the GA3 API to the GA4 API. The backend for this blog is ASP.NET MVC with .NET 4.8. One day I might catch up with the cool kids and try to get on .NET Core, but not today.

Here's where I stubbed my toe:

I'm following this code sample to make my first GA4 call. After installing the NuGet package I couldn't find BetaAnalyticsDataClient anywhere. It turns out that there is a Google.Apis.AnalyticsData.v1beta package and a Google.Analytics.Data.V1Beta which is only available if you check 'Include prerelease' when searching. You want the second one. I'm not in love with BetaAnalyticsDataClient as a class name, it suggests all sorts of breaking changes are coming. My GA3 integration has ticked over for years with no changes. Maybe GA4 is going to be more like Google Ads and shank you with breaking changes every few months. Moving on...

Wow the error messages are good. Kudos to the API team. I'm so used to cryptic bullshit but this API tells you what you're doing wrong and sends back helpful pointers and even URLs. Every API should be this friendly. I got through the remaining problems fairly quickly because of this.

The code sample passes the property ID as 'property/nnnnnnn' but the API is expecting 'properties/nnnnnnn'.

I'd been using a ServiceAccountCredential created from a .p12 file for GA3. This doesn't seem to be supported for BetaAnalyticsDataClient, but I was able to generate a new credential with a .json serialization of the credentials, and passing this to BetaAnalyticsDataClient worked fine. I then had a permission denied error; this was because I hadn't added the service account email address to the property, and doing so got me some data.

The client library is pretty classy (as in too many classes). Creating a filter to exclude internal users involves four nested classes - a FilterExpression that has another FilterExpression for a not condition and then this needs a Filter and the Filter needs a StringFilter. Tedious. And including enums for metrics and dimensions is too much trouble so adding those now requires Metric and Dimension classes but these are just initialized with a string. The list is here.
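
The nesting is easier to see in the JSON body that the GA4 Data API's runReport method accepts, which the client library classes mirror. The sketch below builds that body as plain Python dictionaries rather than using the C# library. The dimension and metric names (pagePath, screenPageViews) are real GA4 API names, but choosing them for a popular-posts report is an assumption, and the field used to exclude internal users is hypothetical. The sketch also normalizes the property ID to the properties/nnnnnnn form the API expects.

```python
from typing import Any, Dict, Optional


def property_resource(property_id) -> str:
    """Normalize '1234', 'property/1234' or 'properties/1234' to 'properties/1234'."""
    pid = str(property_id)
    for prefix in ("properties/", "property/"):
        if pid.startswith(prefix):
            pid = pid[len(prefix):]
            break
    return f"properties/{pid}"


def popular_posts_request(
    exclude_field: Optional[str] = None,
    exclude_value: Optional[str] = None,
    limit: int = 10,
) -> Dict[str, Any]:
    body: Dict[str, Any] = {
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "dimensions": [{"name": "pagePath"}],      # Dimension objects are just names
        "metrics": [{"name": "screenPageViews"}],  # same for Metric objects
        "orderBys": [{"metric": {"metricName": "screenPageViews"}, "desc": True}],
        "limit": limit,
    }
    if exclude_field is not None:
        # FilterExpression -> notExpression (another FilterExpression) -> Filter -> StringFilter
        body["dimensionFilter"] = {
            "notExpression": {
                "filter": {
                    "fieldName": exclude_field,
                    "stringFilter": {"matchType": "EXACT", "value": exclude_value},
                }
            }
        }
    return body
```

The body would be sent as JSON in a POST to https://analyticsdata.googleapis.com/v1beta/{property}:runReport, authorized as the service account. That part is omitted here.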

Lastly when it comes to running the thing the site won't start and says:

"CS0012: The type 'System.Object' is defined in an assembly that is not referenced. You must add a reference to assembly 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'."

Presumably due to some NuGet horror or other. Adding that reference indeed fixes the problem and hopefully doesn't create a new one.

I am now technically if not emotionally prepared for GA3 to be switched off.

(Published to the Fediverse as: Migrating a C# Integration from GA3 to GA4 #code #ga4 #ithcwy #c# Some tips on avoiding pitfalls when migrating a C# application from Google Analytics 3 to Google Analytics 4. )

Winter Solstice 2022

Winter Solstice 2022 (December 21, 2022 at 21:48 UTC) as rendered in Catfood Earth. Winter starts right now in the Northern Hemisphere, Summer if you happen to be south of the Equator.

(Published to the Fediverse as: Winter Solstice 2022 #code #winter #solstice #catfood #earth The exact moment (2022-12-21 21:48 UTC) of Winter Solstice 2022 rendered in Catfood Earth. )

Catfood WebCamSaver 3.30

Updated on Sunday, January 22, 2023

Catfood WebCamSaver 3.30 is now available to download. This release contains the latest webcam updates.

Send event parameters with every event and multiple tags in Google Analytics 4

gtag

There are some event parameters that are useful to send with every event. Google has a helpful guide here which even covers the case where you have multiple tags (I'm running GA4 and UA during the migration and this isn't unusual). You're supposed to call gtag set before calling config on each tag. This isn't working for me though: I see nothing coming through in the debug view.

Calling set before config works fine for user properties (my journey of discovery yesterday), but unless I was doing something stupid that I haven't spotted yet, no dice for event parameters. What does work is the config method of initialization: pass the parameters in each config call, using a shared object so the code isn't duplicated across tags. This seems to work fine.

(Published to the Fediverse as: Send event parameters with every event and multiple tags in Google Analytics 4 #code #ga4 How to send common event parameters with every event in Google Analytics 4 )

User scoped custom dimensions in Google Analytics 4 using gtag

A user parameter in the GA4 debug view

Given that everything in Google Analytics 4 is an event I expected user scoped custom dimensions to be a regular event parameter as well. They're not. And most of the documentation both from Google and others talks about Google Tag Manager which doesn't help if you're just using gtag. It's not that hard to implement, but figuring out all the pieces was way harder than it should be. I hope this helps the next person...

To create a user scoped custom dimension you need a user property. This is different from an event parameter. In gtag you call set on user_properties with an object containing the user properties to set, for example gtag('set', 'user_properties', { user_quality: 'low' });

If you want the user property on the hit sent with page load, do this right before the config call when loading your gtag. You can also pass it as part of the config call if you're just setting up one tag; a separate set call before config is helpful if you're using more than one tag. If you don't know the value of the user property at page load time you can also set it later and then send an event.

From the client you can inspect the beacon sent to GA4 for an event, for example in the network tab of your browser's developer tools.

Event parameters get a prefix of ep or epn. In this case I'm using a parameter rc_score set to 0.9, which appears in the beacon as epn.rc_score=0.9; GA auto-detects numeric values and uses epn for these. The user property gets an up prefix and in this case is up.user_quality=low. (In this specific case I'm sending a reCAPTCHA score as both a custom metric and a user scoped dimension so I can segment out high and low quality users, at least from the perspective of reCAPTCHA.)
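
Because of these prefixes, a captured beacon is easy to pick apart programmatically. Here is a small sketch that groups the query parameters of a collect request by prefix. The sample query string is a trimmed, illustrative one built from the values in this post, not a complete real beacon. Only the ep., epn. and up. prefixes are handled, and everything else (such as en, the event name) is returned unchanged.

```python
from typing import Dict, Union
from urllib.parse import parse_qsl

PREFIXES = {"ep.": "event", "epn.": "event_numeric", "up.": "user"}


def split_beacon(query: str) -> Dict[str, Dict[str, Union[str, float]]]:
    """Group beacon query parameters by their GA4 prefix."""
    groups: Dict[str, Dict[str, Union[str, float]]] = {name: {} for name in PREFIXES.values()}
    groups["other"] = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        for prefix, group in PREFIXES.items():
            if key.startswith(prefix):
                # epn. parameters are the ones GA detected as numeric
                groups[group][key[len(prefix):]] = float(value) if prefix == "epn." else value
                break
        else:
            groups["other"][key] = value
    return groups


print(split_beacon("v=2&en=page_view&epn.rc_score=0.9&up.user_quality=low"))
```

This is a handy way to confirm that a user property is actually being sent before waiting on the debug view.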

More visually you can use the debug view in the configure section of GA4 (why there are settings in the reporting interface as well as the settings interface I have no idea). To use this pass { 'debug_mode':true } to your gtag config call. To confirm that you're getting user properties look for an orange icon in the timeline (see screenshot at the top of this post). There is also a helpful user properties active now box at the bottom right of this screen.

Once you have this working you still need to wire it up in GA4. Wait 24 hours... then go to Configure -> Custom definitions. Add a new custom dimension, make sure you pick user scoped and you should then be able to select the user property to use to populate the dimension.

(Published to the Fediverse as: User scoped custom dimensions in Google Analytics 4 using gtag #code #ga4 Step by step guide for sending user scoped custom dimensions to Google Analytics 4 via gtag. )