Predicting when fog will flow through the Golden Gate using ML.NET

I'd like to make a time lapse of the moment when fog enters the Golden Gate and flows under the Golden Gate Bridge. It's surprisingly hard to know when conditions will be just right though. Often the weather is pleasant at my house while the fog is sneaking through and there is very little chance of me checking a webcam or satellite image. I decided to fix this about a year ago and started collecting data. The best bet seemed to be GOES-West CONUS - Band 2 which is a high resolution daylight satellite image that shows clouds and fog. I put together a Google Apps Script project to save an hourly snapshot and left it running. Here's a video of the data so far, zoomed in for an HD aspect ratio and scaled up a bit:

It's pretty obvious to me when conditions are just right. Could an ML model learn that this was about to happen from an image that was three hours older?

The first step was dividing thousands of images into two classes - frames where the fog would be perfect in three hours and frames where this was not going to happen. I built a little WPF tool to label the data (I don't use this often these days and every time I do I marvel at how the Image control has defaults that won't show the image FFS). This had the potential to be tedious so I built in some heuristics to flag likely candidates and then knocked out the false positives. Because the satellite images include clouds there is often white in the Golden Gate that is cloud cover rather than fog. At the end of the process I had two subfolders full of images to work with.
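
The labeling rule described above (a frame is a positive if the fog will be perfect three hours later) can be sketched in a few lines. This is a minimal Python illustration and not the WPF tool from the post: `label_frames`, the timestamps and the hand-picked "perfect" moment are all hypothetical, and it assumes snapshots land exactly on the hour.

```python
from datetime import datetime, timedelta

# Assumption: the three-hour horizon mentioned in the post.
LEAD_TIME = timedelta(hours=3)

def label_frames(frame_times, perfect_fog_times, lead_time=LEAD_TIME):
    """Map each frame time to "Fog" or "NoFog" based on conditions lead_time later."""
    perfect = set(perfect_fog_times)
    return {
        t: "Fog" if (t + lead_time) in perfect else "NoFog"
        for t in frame_times
    }

# Hypothetical hourly snapshots from 08:00 to 15:00 on one day.
start = datetime(2023, 3, 1, 8, 0)
frames = [start + timedelta(hours=h) for h in range(8)]

# Hypothetical moment that was labeled as perfect fog by hand.
perfect_fog = [datetime(2023, 3, 1, 14, 0)]

labels = label_frames(frames, perfect_fog)
fog_frames = [t for t, label in labels.items() if label == "Fog"]
print(fog_frames)  # the 11:00 frame, three hours before the 14:00 event
```

Only the frame taken three hours before the event ends up in the Fog class, which is why positives are so rare compared to the NoFog frames.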

My goal this weekend was to get something working, and then refine every few months as I get more data. Right now I have 18 images that are in the Fog class and 7,539 that are NoFog. I also wanted this running on my blog, which is .NET 4.8 and will stay that way until I get a couple of weeks of forced bed rest. ML.NET says that it's based on .NET Standard and so should run anywhere.

Having local AutoML is very cool once you get it working. For large datasets this might not be a great option, but not having to wrangle with the cloud was also very appealing for this project.

Getting GPU training configured involved many gigabytes of installs. Get the latest Visual Studio 2022. Get the latest ML.NET model builder. Sign up for an NVIDIA developer account and install terrifyingly old and specific versions of CUDA and cuDNN. This last part was the worst because the CUDA installer wanted to downgrade my graphics driver, warned directly that this would cause problems and then claimed that it couldn't find a supported version of Visual Studio. I nervously unchecked everything that was already installed, and so far model builder has run fine and I don't seem to have caused any driver problems.

For image classification settings you can choose micro-accuracy (the default), macro-accuracy, logarithmic loss, or logarithmic loss reduction. Micro-accuracy pools the predictions for all classes into one overall fraction correct, and unsurprisingly it's useless in this case as just predicting 'no' works very well overall. Macro-accuracy is the average of the accuracy of each class and this produced reasonable results for me. Possibly too good, I probably have some overfitting and will spend some time on that soon.
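
To see why the choice of metric matters with this much class imbalance, here is a small Python sketch (not ML.NET code) that scores a classifier which always answers NoFog. It borrows the class counts quoted earlier (18 Fog, 7,539 NoFog) purely as an illustration; the helper names are made up for this example.

```python
def micro_accuracy(per_class):
    """Overall fraction correct: pool every prediction regardless of class."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(n for _, n in per_class.values())
    return correct / total

def macro_accuracy(per_class):
    """Unweighted mean of the per-class accuracies."""
    return sum(c / n for c, n in per_class.values()) / len(per_class)

# (correct predictions, total examples) per class for an "always NoFog" model.
always_no = {"Fog": (0, 18), "NoFog": (7539, 7539)}

micro = micro_accuracy(always_no)
macro = macro_accuracy(always_no)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.9976 macro=0.5000
```

A model that never predicts fog looks nearly perfect on micro-accuracy but scores only 50% on macro-accuracy, so optimizing for the latter forces the trainer to care about the rare class.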

After training, Model Builder has an Evaluate tab which is pretty worthless, at least for this model/case. You can spot check the prediction for specific images, and then there is one overall number for the performance of the model. I'm used to looking at precision and recall and it looks like I'll have to spend some time building separate tooling to do this. Hopefully this will improve in future versions.

At this point I have a .NET 6 console application that can make plausible looking predictions. Overall I'm very impressed with how easy it was to get this far.

Integrating with my blog though was very sad. After a lot of NuGet'ing and Googling I came to realize that ML.NET will not play nice with .NET 4.8, at least for image classification. Having dared to anger the NuGet gods I did a git reset --hard and called out to a new .NET 6 process to handle the classification. For my application I'm only running the prediction once per hour so I'm not bothered by performance. That .NET Standard claim proved to be unhelpful and I could have used just about anything.

The model is now running hourly. I have put up a dedicated page, Golden Gate Fog Prediction, with the latest forecast and plan to improve this over time. If this would be a useful tool for you please leave a comment below (right now it emails me when there is a positive prediction, it could potentially email a list of people).

Updated 2023-03-12 23:24:

After building some tooling to quantify this first model I have some hard metrics to add. Precision is 23%, so fewer than one in four positive predictions is correct and there is a high rate of false positives. Recall is 78%. This means that when there really is fog the model does a pretty good job of predicting it. Overall the F1 score is 35% which is not great. In practice the model doesn't miss the condition I'm trying to detect often but it will send you out only to be disappointed most of the time. I'm not that surprised given how few positive cases I had to work with so far. My next steps are collecting more training data and looking more carefully at the labeling process to make sure I'm not missing some reasonable positive cases.
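
For reference, the three metrics above relate to each other as in this minimal Python sketch. The confusion counts (14 true positives, 47 false positives, 4 false negatives) are an assumption chosen only because they round to the reported percentages; they are not the actual numbers from the evaluation tooling.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts that happen to match the reported figures.
p, r, f1 = precision_recall_f1(tp=14, fp=47, fn=4)
print(f"precision={p:.0%} recall={r:.0%} f1={f1:.0%}")
# precision=23% recall=78% f1=35%
```

Because F1 is the harmonic mean of precision and recall, the low precision drags the combined score down even though recall is decent.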

(Published to the Fediverse as: Predicting when fog will flow through the Golden Gate using ML.NET #code #video #ml #fog Using Microsoft's AutoML in ML.NET to build an image classifier that predicts fog flowing under the Golden Gate Bridge. )

Links for March 2023

58 Birds

Picture of 58 birds on some power lines recently.

Google Pixel 6 Pro 19mm f3.5 1/230s ISO33

(Published to the Fediverse as: 58 Birds #photo #birds 58 Birds )

World Webcams 2

This is a 4K sequel to the World Time Lapse movie I made many years ago. It also uses webcams from the Catfood WebCamSaver database. I used a Google Apps Script project to save frames, upscaled them to 4K using Topaz Gigapixel AI, turned the upscaled frames into movies with ffmpeg, and finally edited the highlights together with DaVinci Resolve.

(Published to the Fediverse as: World Webcams 2 #timelapse #video #catfood #webcamsaver #4k A 4k time lapse of many different webcams around the world (from the Catfood WebCamSaver database). )

ITHCWY Newsletter for February 2023

GSC Monitor

Timelapse of some great clouds after the January storms in California.

Here's an animation of ten years of San Francisco 311 cases using photos and locations.

I've started a new more comprehensive review format that includes TV, Movies and Podcasts. Check out January and February.

My Echo Show is driving me nuts.

If you're into reading ESRI Shapefiles in .NET my library has migrated to .NET Standard and is now on NuGet. Read more. And enjoy this shapefile based zoom to my neighborhood.

Hikes to Phantom Falls and Shell Ridge Open Space.

Related posts are now way more related thanks to moving from Word2Vec to OpenAI embeddings. I've also been reluctantly moving to Google Analytics 4. Here are some API tips.

Analysis of OpenAI's risible blog post on how close they are to creating artificial general intelligence (AGI).

Previously:

Links for February 2023

OpenAGI, or why we shouldn't trust Open AI to protect us from the Singularity

Open AI just dropped a pretty remarkable blog post on their roadmap for not destroying civilization with their imminent artificial general intelligence (AGI):

"As our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models. Our decisions will require much more caution than society usually applies to new technologies, and more caution than many users would like."

Now, I'm around 98% sure that Open AI mostly answers the question: What if we allocated unlimited resources to building a better auto-complete? ChatGPT is an amazing tool but it's amazing at guessing which word (token) is likely to appear next. Quite possibly their blog post is just an exercise in anchoring - if they're 95% of the way to AGI then GPT-4 must be pretty amazing and therefore worth a lot of money. If everyone realized that they're more like 2% of the way there, and the next 1% is going to be exponentially difficult, then some of the froth would blow off.

But what if they really are close to the singularity? After all, we have no idea what causes non-artificial intelligence.

Their ideas for keeping us safe are a little disturbing:

"We think public standards about when an AGI effort should stop a training run, decide a model is safe to release, or pull a model from production use are important."

Given the lack of transparency around the inner workings of ML models, and the lack of knowledge around what intelligence even looks like, this is a pretty risible idea. And:

"Finally, we think it’s important that major world governments have insight about training runs above a certain scale."

We are facing down the prospect of a second Trump term while the UK has a Prime Minister who thinks that a homeless person might be 'in business'.

The most concerning part for me is:

"...we hope for a global conversation about three key questions: how to govern these systems, how to fairly distribute the benefits they generate, and how to fairly share access."

Creating AGI would be an amazing and terrifying accomplishment. Treating it as a slave feels like the most surefire way to usher in the most terrifying possible consequences, for us and for the AGIs.

Full disclosure: I use Open AI embeddings for related posts and site search. The words on this blog are my own though. I do occasionally generate a post image using Stable Diffusion like the rather strange one above.

(Published to the Fediverse as: OpenAGI, or why we shouldn't trust Open AI to protect us from the Singularity #etc #openai #ml What OpenAI got wrong in their blog post on AGI and how we should treat AGIs if they ever arrive. )

Phantom Falls

We did this four mile out and back hike during a small gap between record breaking winter storms. There was snow, fog, mud, many ordinary cows and one very furious one. The first waterfall, Ravine Falls, was very pretty and the eponymous Phantom Falls lived up to its name. Will try again sometime. This is close to Oroville, California in the North Table Mountain Ecological Reserve.

Hike starts at: 39.595612, -121.541953. View in Google Earth.

(Published to the Fediverse as: Phantom Falls #hike #waterfall #map Four mile out and back hike to Ravine Falls and Phantom Falls in the North Table Mountain Ecological Reserve near Oroville, California. )

Links for February 2023

Short Shell Ridge Open Space Loop

A two mile Shell Ridge Open Space loop in Walnut Creek. This follows Fossil Hill Trail to Indian Creek Trail, then loops back via the Briones to Mount Diablo Regional Trail and finally Jeep Trail. The Jeep Trail portion was to avoid getting the dog wet and it failed miserably thanks to a hidden muddy wallow.

Hike starts at: 37.894476, -122.030699. View in Google Earth.

(Published to the Fediverse as: Short Shell Ridge Open Space Loop #hike #dogwalk #walnutcreek #shellridge #map An easy two mile loop in Shell Ridge Open Space, Walnut Creek. Suitable for dogs. )

Reviews for February 2023

By Robert Ellison. Updated on Monday, February 27, 2023.

Spoilers!

Books

A Hundred Billion Ghosts

One day, for no reason, all the ghosts suddenly become visible. Hijinks, romance and too much detail about breakfast cereal ensue. Fun.

Children of Memory

This is the third book in Adrian Tchaikovsky's Children of Time space opera series. Children of Time is about spiders becoming intelligent, Children of Ruin adds intelligent octopuses, a simulated human that runs on ants, and some kind of brain occupying bacteria which all together substantially expand the Children of... universe. Children of Memory stops things down a bit and focuses on a single colony gone awry, being investigated by an octopus, the ant-simulation (an instance on more conventional hardware, well eventually more than one instance), a couple of spiders, a super-bacteria host and some possibly sentient crows. I think that's roughly it, it's been a while since I started reading these books and my memory might be mixing up a few plot points by now. The crows are really interesting - probably not sentient individually but in pairs they probably are. This is quite a poke at the state of modern AI and maybe a spin on Searle's Room (which I hate). It's a thought-provoking and sometimes exposition-heavy look at the nature of consciousness taking place against the backdrop of a holodeck episode of Star Trek given a movie budget. It's a unique series and I'd like another one please!

Movies

White Noise

I haven't read the book this is based on but maybe I should add it to my list. The movie is a gorgeously rendered '80s meditation on the fear of death. It's mostly funny or willfully strange but has a touch of disaster movie in the mix as well. I quite enjoyed it.

Podcasts

Burn Wild

Burn Wild sets out to answer the question "How far is too far to go to save the planet?" and I nearly gave up halfway through as it seemed to be giving too much credence to the idea that as long as nobody is hurt, it's OK to blow shit up. In the end though it doesn't let anyone off the hook, and several people come to realize that they did no good and may even have done some harm. Worthwhile.

The Missing Cryptoqueen

I'd never heard about Dr Ruja Ignatova or OneCoin before listening to this podcast from Jamie Bartlett that makes the case that it's a Ponzi scheme rather than a revolutionary cryptocurrency. She has been in the news very recently and so I'm expecting another episode or two soon.

Although, to be fair, crypto has always bothered me as a way to describe Bitcoin and other blockchain technologies. Crypto means hidden or secret. Blockchains by their nature are very open and not secret - that's kind of the whole point. A ledger that is not distributed is a database, like what a bank uses (or a Ponzi scheme). So maybe she has a point.

TV

Black Summer Season 1

My favorite part of any apocalypse drama is the moment when civilization is just starting to fall apart. Like some anti-Gibson, the end of the world is here, it's just not evenly distributed. Black Summer just goes straight to the biting. No foreplay.

Why make yet another zombie show? Couldn't it be Mormons trying to fill the book of the dead a bit faster? Or environmentalists trying to slim the population down by seven billion people and you get infected by podcasts? Netflix, call me.

I loved early Walking Dead, and I think it was mostly the dialog. Those scenes where the cast is strolling through Georgia, on the way to some salvation that won't pan out and shooting the shit along the way. Black Summer probably has five lines of dialog in total. As if it realizes it has nothing to say the runtime reduces with each episode - 44 mins at the start, halved to 22 mins by the finale.

It's tense though so I'll probably still watch season 2.

Black Summer Season 2

I tried. It's now winter (get it, the sequel to summer) and there seems to be a completely different cast. I wasn't that invested in the original lot, but I certainly don't even begin to care about these people. It keeps everything that made season 1 miserable but drops the tension. I made it to halfway through the second episode and then bailed.

(All images included with ITHCWY reviews are the property of their respective owners and are used to illustrate reviews only.)
