Predicting when fog will flow through the Golden Gate using ML.NET
I'd like to make a time lapse of the moment when fog enters the Golden Gate and flows under the Golden Gate Bridge. It's surprisingly hard to know when conditions will be just right though. Often the weather is pleasant at my house while the fog is sneaking through and there is very little chance of me checking a webcam or satellite image. I decided to fix this about a year ago and started collecting data. The best bet seemed to be GOES-West CONUS - Band 2 which is a high resolution daylight satellite image that shows clouds and fog. I put together a Google Apps Script project to save an hourly snapshot and left if running. Here's a video of the data so far, zoomed in for a HD aspect ratio and scaled up a bit:
It's pretty obvious to me when conditions are just right. Could an ML model learn that this was about to happen from an image that was three hours older?
The first step was dividing thousands of images into two classes - frames where the fog would be perfect in three hours and frames where this was not going to happen. I built a little WPF tool to label the data (I don't use this often these days and every time I do I marvel at how the Image control has defaults that won't show the image FFS). This had the potential to be tedious so I built in some heuristics to flag likely candidates and then knocked out the false positives. Because the satellite images include clouds there is often white in the Golden Gate that is cloud cover rather than fog. At the end of the process I had two subfolders full of images to work with.
My goal this weekend was to get something working, and then refine every few months as I get more data. Right now I have 18 images that are in the Fog class and 7,539 that are NoFog. I also wanted this running on my blog, which is .NET 4.8 and will stay that way until I get a couple of weeks of forced bed rest. ML.NET says that it's based on .NET Standard and so should run anywhere.
Having local automl is very cool once you get it working. For large datasets this might not be a great option, but not having to wrangle with the cloud was also very appealing for this project.
Getting GPU training configured involved many gigabytes of installs. Get the latest Visual Studio 2022. Get the latest ML.NET model builder. Sign up for an NVIDIA developer account and install terrifyingly old and specific versions of CUDA and cuDNN. This last part was the worst because the CUDA installer wanted to downgrade my graphics driver, warned directly that this would cause problems and then claimed that it couldn't find a supported version of Visual Studio. I nervously unchecked everything that was already installed, and so far model builder has run fine and I don't seem to have caused any driver problems.
For image classification settings you can choose micro-accuracy (the default), macro-accuracy, logarithmic loss, or logarithmic loss reduction. Micro-accuracy is based on the contribution of all classes and unsurprisingly it's useless in this case as just predicting 'no' works very well overall. Maco-accuracy is the average of the accuracy of each class and this produced reasonable results for me. Possibly too good, I probably have some overfitting and will spend some time on that soon.
After training the model builder has an evaluate tab which is pretty worthless, at least for this model/case. You can spot check the prediction for specific images, and then there is one overall number for the performance of the model. I'm used to looking at precision and recall and it looks like I'll have to spend some time building separate tooling to do this. Hopefully this will improve in future versions.
At this point I have a .NET 6 console application that can make plausible looking predictions. Overall I'm very impressed with how easy it was to get this far.
Integrating with my blog though was very sad. After a lot of NuGet'ing and Googling I came to realize that ML.NET will not play nice with .NET 4.8, at least for image classification. Having dared to anger the NuGet gods I did a git reset --hard and called out to a new .NET 6 process to handle the classification. For my application I'm only running the prediction once per hour so I'm not bothered by performance. That .NET Standard claim proved to be unhelpful and I could have used just about anything.
The model is now running hourly. I have put up a dedicated page, Golden Gate Fog Prediction, with the latest forecast and plan to improve this over time. If this would be a useful tool for you please leave a comment below (right now it emails me when there is a positive prediction, it could potentially email a list of people).
Updated 2023-03-12 23:24:
After building some tooling to quantify this first model I have some hard metrics to add. Precision is 23%. This means there is a high rate of false positives. Recall is 78%. This means that when there really is fog the model does a pretty good job of predicting it. Overall the f1 score is 35% which is not great. In practice the model doesn't miss the condition I'm trying to detect often but it will send you out only to be disappointed most of the time. I'm not that surprised given how few positive cases I had to work with so far. My next steps are collecting more training data and looking more carefully at the labeling process to make sure I'm not missing some reasonable positive cases.
All comments are moderated. Your email address is used to display a Gravatar and optionally for notification of new comments and to sign up for the newsletter.