Monitor page index status with Google Sheets, Apps Script and the Google Search Console API

Updated on Friday, May 20, 2022

(Screenshot: the GSC Monitor spreadsheet)

Overview

Google just released URL inspection as part of the Search Console API. I check for issues periodically in Search Console but it would be great to just get an email when an issue crops up. The Apps Script project below does just that by monitoring URLs from your sitemap for changes and sending an email whenever anything is detected. The Search Console API has a limit of 2,000 calls per day and Apps Script also imposes a time limit on scripts. The approach I take below assigns a random day of the week to each URL to limit the number checked on each run. Depending on the size of your site you may want to remove this check (or go the other way if you have a large number of URLs to monitor). Follow the steps below to get your own monitoring spreadsheet up and running.
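To make the day-of-week partitioning concrete, here is a minimal sketch of the idea (the helper names are mine, not necessarily the ones used in the script below):

  // Assign each URL a random day of the week (0-6) the first time it is seen,
  // then only inspect URLs whose stored day matches today.
  function assignCheckDay() {
    return Math.floor(Math.random() * 7); // 0 = Sunday ... 6 = Saturday
  }

  function isDueToday(checkDay) {
    return checkDay === new Date().getDay();
  }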

Google Sheet

Create a new spreadsheet in Google Drive and call it anything you want. Rename an empty sheet to 'gsc'; this sheet will store the index data. You don't need to make any other changes to this sheet.

Choose Apps Script from the Extensions menu. This will open the script editor for your spreadsheet. I find that sometimes the editor opens with the wrong account if you're signed into more than one; if that applies to you, check quickly that the right account is selected. With Code.gs selected, copy and paste the script below, replacing the default function:

There are a few configuration variables to enter at the top. AlertEmail is the email address to notify when index status changes are detected. SitemapUrl is the full URL of your sitemap; the current implementation does not support sitemap index files, so this needs to be a regular sitemap containing URLs. SearchConsoleProperty should be the URL of the site to monitor, or sc-domain: followed by the domain for domain properties (for this site, sc-domain:ithoughthecamewithyou.com).
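For reference, the top of the script might look something like this (the values are placeholders; only the variable names come from the description above):

  // Configuration - replace with your own values.
  const AlertEmail = 'you@example.com';                      // where change notifications are sent
  const SitemapUrl = 'https://www.example.com/sitemap.xml';  // a plain sitemap, not a sitemap index
  const SearchConsoleProperty = 'sc-domain:example.com';     // or the full site URL for URL-prefix properties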

Click Project Settings (the cog in the left-hand menu of the script editor) and copy the Script ID. Make a note of this for later.

API Console

Next we need to configure the Search Console API in the Google Cloud Platform Console.

  1. Create a new project (click the drop-down to the right of 'Google Cloud Platform Console' and then New Project). Pick any name you like.
  2. Once the project is created, find APIs and Services in the left-hand menu and choose Library.
  3. Search for Search Console, click on the Search Console API and then click Enable.
  4. A new screen will load; click Credentials in the left-hand menu.
  5. Click Configure Consent Screen and choose the Internal type.
  6. Fill in the required fields - application name and contact emails.
  7. Add the https://www.googleapis.com/auth/webmasters.readonly scope for read-only access to Search Console data.
  8. Once the consent screen is complete, click Credentials in the left-hand menu again.
  9. Click Create Credentials at the top and choose OAuth client ID.
  10. Choose Web application.
  11. Add https://script.google.com/macros/d/{SCRIPTID}/usercallback to the Authorized redirect URIs, replacing {SCRIPTID} (including the braces) with the Apps Script ID you noted above.
  12. Make a note of the Project ID.

Complete the Apps Script

Return to the Apps Script project and open the Project Settings page. Set the GCP project to the project ID you noted above. Also on this page, check the Show "appsscript.json" manifest file in editor option.

Go to the code editor and open appsscript.json. Add the following line:

"oauthScopes": ["https://www.googleapis.com/auth/script.external_request", "https://www.googleapis.com/auth/webmasters.readonly", "https://www.googleapis.com/auth/spreadsheets.currentonly", "https://www.googleapis.com/auth/script.send_mail"],

Make sure the script is saved and close the script editor window. Reload the spreadsheet. Once the reload completes you should have a Search Console menu at the top of the spreadsheet. Choose Update Data from the Search Console menu. This will run for a few minutes and then populate the 'gsc' sheet with your URL data. See UrlInspectionResult in the API documentation for more information about the meaning of each field. You should also get a lengthy email with a notification for each URL that was inspected. This will continue for the first week and then you'll only get updates for interesting status changes.
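Under the hood, the inspection call looks roughly like the sketch below (a simplified version, not the exact code from the project; it reuses the SearchConsoleProperty variable from the configuration and omits error handling):

  // Inspect a single URL via the Search Console URL Inspection API and
  // return the index status portion of the result.
  function inspectUrl(url) {
    const response = UrlFetchApp.fetch(
      'https://searchconsole.googleapis.com/v1/urlInspection/index:inspect', {
        method: 'post',
        contentType: 'application/json',
        headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
        payload: JSON.stringify({
          inspectionUrl: url,
          siteUrl: SearchConsoleProperty
        })
      });
    const result = JSON.parse(response.getContentText()).inspectionResult;
    return result.indexStatusResult; // coverageState, lastCrawlTime, verdict, ...
  }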

Scheduling

Now that the project is working, open the script editor again (Apps Script from the Extensions menu) and open Triggers (the clock icon in the left-hand menu). Click Add Trigger at the very bottom right of the window. Select runUpdate as the function to run, change the event source to Time-driven, and then select Day timer and an hour to run the script. Lastly, click Save. Your Google Search Console monitor will now run every day, and if the index status of a page changes you'll get an email about it within a week. The spreadsheet will also come in handy for other analysis and reporting.
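If you prefer to set this up from code rather than the Triggers UI, an equivalent trigger can be created with a one-off function like this sketch:

  // Run once to schedule runUpdate() every day at roughly the given hour.
  function createDailyTrigger() {
    ScriptApp.newTrigger('runUpdate')
      .timeBased()
      .everyDays(1)
      .atHour(3) // around 3am in the script's time zone
      .create();
  }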

Troubleshooting

You might need to tweak the script if you start hitting limits. The URL Inspection API currently has a quota of 2,000 calls per day. Apps Script will only run for around six minutes on a free account and up to 30 minutes with Google Workspace. If either of these limits applies you could modify the 'checkDay' logic to use day of the month (or year, or ...) to reduce the number of URLs inspected on each run. If you do this, remember to update the Check Day column on the 'gsc' sheet as well.
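As a sketch of that change, reusing the hypothetical helpers from the overview:

  // Partition by day of month instead of day of week, so roughly 1/28th of
  // URLs are inspected per run instead of 1/7th.
  function assignCheckDay() {
    return Math.floor(Math.random() * 28) + 1; // 1-28 so the day exists in every month
  }

  function isDueToday(checkDay) {
    return checkDay === new Date().getDate();
  }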

The script assumes that the URLs you want to monitor are in your sitemap. If this is not the case you can add URLs to the sheet directly. As long as they are part of the configured property you will still get results. If you use this method you might want to comment out the updatePagesFromSitemap() call in runUpdate() to save time.
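For reference, the sitemap fetch can be done with Apps Script's XmlService; the sketch below is illustrative rather than the exact updatePagesFromSitemap() implementation:

  // Fetch a plain sitemap (not a sitemap index) and return its <loc> URLs.
  function readSitemapUrls() {
    const xml = UrlFetchApp.fetch(SitemapUrl).getContentText();
    const doc = XmlService.parse(xml);
    const ns = XmlService.getNamespace('http://www.sitemaps.org/schemas/sitemap/0.9');
    return doc.getRootElement().getChildren('url', ns).map(function (u) {
      return u.getChild('loc', ns).getText();
    });
  }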

If anything else goes wrong please leave a comment below and I'll do my best to help you.

Updated 2022-02-08 17:40:

(Chart: sessions vs. days since the last Google crawl)

After a couple of days I have a full dump of my sitemap from the page index status API. I wrote this script for the alerting possibilities but couldn't resist some analytics once the dataset was complete.

The chart above shows sessions vs. days since the last Google crawl. Pretty stark: Google keeps a close eye on the pages it sends traffic to and not so much on the others.

I set lastmod honestly and there is good news here: I could only find two cases where Google had not crawled the page since the last modified date. So when the sitemap says a page has changed, the odds are good that it will get another crack at the index. The two exceptions are unusual posts that are updated hourly and weekly respectively, and both have been crawled recently.

The breakdown of index status matches Google Search Console pretty well but I have a handful of pages that are 'Indexed, not submitted in sitemap', even though they are in the sitemap and no such status is shown on Search Console. I don't know if this is a glitch in the index status API or something to do with how the pages were discovered. Some light searching suggests that this message is usually what you would expect it to be.

Lastly, updating the sheet for my site is bound more by script execution time than by the API limits. I changed it to run every hour and, instead of partitioning by day of week, used a random hour of the day, which means every URL is checked at least once every 24 hours.
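A sketch of that variant, again with hypothetical helper names:

  // Assign each URL a random hour (0-23) and run the trigger hourly, so every
  // URL is inspected once in any 24 hour period.
  function assignCheckHour() {
    return Math.floor(Math.random() * 24);
  }

  function isDueThisRun(checkHour) {
    return checkHour === new Date().getHours();
  }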

Updated 2022-04-24 10:50:

I just updated the code and post above. I've had occasional issues where updating the sheet failed, which caused the next run to go back to the beginning with no saved index status. To reduce the chance of this happening I've added some retry logic and also improved the speed of the sheet load and save functions. A comment on the code also suggested an easier way to handle OAuth, which I've incorporated in the new version.
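The retry logic is along these lines (a sketch, not the exact code from the project):

  // Retry a function a few times with exponential backoff before giving up.
  function withRetry(fn, attempts) {
    for (let i = 0; i < attempts; i++) {
      try {
        return fn();
      } catch (e) {
        if (i === attempts - 1) throw e;
        Utilities.sleep(1000 * Math.pow(2, i)); // wait 1s, 2s, 4s, ...
      }
    }
  }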


How to get SEO credit for Facebook Comments (the missing manual)

Updated on Sunday, September 30, 2018

I've been using the Facebook Comments Box on this blog since I parted ways with Disqus. One issue with the Facebook system is that you won't get SEO credit for comments displayed in an iframe. They have an API to retrieve comments, but the documentation is pretty light, so here are three critical tips to get it working.

The first thing to know is that comments can be nested. Once you've got a list of comments to enumerate through, you need to check each comment to see if it has its own list of comments, and so on. This is pretty easy to handle.

The second thing is that the first page of JSON returned from the API is totally different from the other pages. This is crazy and can bite you if you don't test it thoroughly. For https://developers.facebook.com/docs/reference/plugins/comments/ the first page is https://graph.facebook.com/comments/?ids=https://developers.facebook.com/docs/reference/plugins/comments/. The second page is embedded at the bottom of the first page and is currently https://graph.facebook.com/10150360250580608/comments?limit=25&offset=25&__after_id=10150360250580608_28167854 (if that link is broken check the first page for a new one). The path to the comment list is "https://developers.facebook.com/docs/reference/plugins/comments/" -> "comments" -> "data" on the first page and just "data" on the second. So you need to handle both formats, as well as the URL being included as the root object on the first page. I don't know why this is the case; you just need to handle it.
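The control below is ASP.NET with Json.NET, but the shape of the logic is easy to see in a short JavaScript sketch (hypothetical helper names; the field names follow the structure described above):

  // Walk one page of Graph API comment data, recursing into nested replies.
  function collectComments(commentList, out) {
    commentList.forEach(function (comment) {
      out.push(comment.message);
      if (comment.comments && comment.comments.data) {
        collectComments(comment.comments.data, out); // a comment can have its own comments
      }
    });
  }

  // The first page nests the data under the requested URL; later pages do not.
  function commentDataForPage(page, url) {
    return page[url] ? page[url].comments.data : page.data;
  }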

Last but not least, you want to include the comments in a way that can be indexed by search engines but is not visible to regular site visitors. I've found that including the SEO list in a <noscript> tag does the trick.

I've included the source code for an ASP.NET user control below - this is the code I'm using on the blog. You can see an example of the output on any page with Facebook comments. The code uses Json.net.

FacebookComments.ascx:

FacebookComments.ascx.cs


It was where he left it

Updated on Thursday, November 12, 2015

Not to bang on about the BBC and their horrible headlines, but 'lost' is a bit different from 'I was in a panic for 20 minutes and then found it exactly where I left it'. How quickly can you go from 'Nation shall speak peace unto nation' to SEO whore...


Is PAD dead?

Updated on Thursday, December 26, 2019

I've been a member of the Association of Software Professionals (née Shareware Professionals) for nearly ten years and I publish PAD files for most Catfood products. The idea is that PAD provides a standardized XML format for syndicating product information out to download sites. I used the ASP's PAD directory back in 2009 to analyze product information available in this format.

It used to be that download sites were a significant source of traffic and downloads, especially CNET's Download.com. This just isn't the case for me any more. I need to really massage Google Analytics to find any referral traffic from download sites. While I have PAD files available I no longer make any effort to promote them. Given a spare minute or two, it's far more effective to write a blog post.

Historically Download.com did a great job of surfacing new products. They've shifted to emphasizing paid placements and the most popular titles, so unless you're a category killer (in a pre-defined category) it's much harder to get traction. Lower-tier download sites used to offer some SEO benefit, but I really don't see this any more as the sites have got wise to preserving their limited link juice.

Other than inbound marketing efforts my best source of traffic is creating Google Gadget versions of my products where appropriate. These provide independent value and link to the downloadable version for people who want more (a fusion of freemium and shareware).

Having got to the point of giving up on PAD, I'm excited to see Ryan Smyth's rallying cry in the introduction to a series on this topic on cynic.me:

“I’m going to once and for all shut up the many nay-sayers that are constantly poo-pooing on PAD, Robosoft, and download sites.”

Maybe I’m just doing it wrong. I hope so and I’ll be happy to learn something if this is the case.
