Monitor page index status with Google Sheets, Apps Script and the Google Search Console API

Updated on Friday, February 9, 2024

GSC Monitor

Overview

Google just released URL inspection as part of the Search Console API. I check for issues periodically in Search Console, but it would be great to just get an email when an issue crops up. The Apps Script project below does just that: it monitors the URLs in your sitemap for index status changes and sends an email whenever one is detected. The Search Console API has a limit of 2,000 calls per day, and Apps Script also imposes a time limit on scripts, so the approach I take below assigns a random day of the week to each URL to limit the number checked on each run. Depending on the size of your site you may want to remove this check (or go the other way, to a longer cycle, if you have a large number of URLs to monitor). Follow the steps below to get your own monitoring spreadsheet up and running.

Google Sheet

Create a new spreadsheet in Google Drive and call it anything you want. Rename an empty sheet to 'gsc'; this sheet will store the index data. You don't need to make any other changes to this sheet.

Choose Apps Script from the Extensions menu. This will open up the script editor for your spreadsheet. I find that the editor sometimes opens with the wrong account if you're signed into more than one, so if that applies to you, check quickly to make sure the right account is selected. With Code.gs selected, copy and paste the script below, replacing the default function:

There are a few configuration variables to enter at the top. AlertEmail is the email address to notify when index status changes are detected. SitemapUrl is the full URL of your sitemap. The current implementation does not support sitemap index files, so this needs to be a regular sitemap containing URLs. SearchConsoleProperty should be the URL of the site to monitor, or sc-domain: followed by the domain for domain properties (for this site, sc-domain:ithoughthecamewithyou.com).
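As a rough illustration, the configuration block might look something like the sketch below. The three variable names come from the description above; the example values and the validateConfig helper are hypothetical and only there to catch obvious typos before the first run.

```javascript
// Configuration sketch. Variable names are from the post; values are placeholders.
const AlertEmail = 'you@example.com';
const SitemapUrl = 'https://www.example.com/sitemap.xml';
// URL-prefix property: 'https://www.example.com/'
// Domain property:     'sc-domain:example.com'
const SearchConsoleProperty = 'sc-domain:example.com';

// Hypothetical helper (not part of the original script): returns a list of
// human-readable problems, or an empty list if the settings look plausible.
function validateConfig(config) {
  const problems = [];
  if (!/^[^@\s]+@[^@\s]+$/.test(config.alertEmail)) {
    problems.push('AlertEmail does not look like an email address');
  }
  if (!/^https?:\/\//.test(config.sitemapUrl)) {
    problems.push('SitemapUrl should be a full http(s) URL');
  }
  if (!/^(https?:\/\/|sc-domain:)/.test(config.searchConsoleProperty)) {
    problems.push('SearchConsoleProperty should be a URL or start with sc-domain:');
  }
  return problems;
}
```

Calling validateConfig({ alertEmail: AlertEmail, sitemapUrl: SitemapUrl, searchConsoleProperty: SearchConsoleProperty }) should return an empty array when all three settings have the expected shape.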

Click Project Settings (the cog in the left hand menu of the script editor) and copy the Script ID. Make a note of this for later.

API Console

Next we need to configure the Search Console API in the Google Cloud Platform Console.

  1. Create a new project (click the drop down to the right of 'Google Cloud Platform Console' and then New Project). Pick any name you like.
  2. Once the project is created, find APIs and Services in the left hand menu and choose Library.
  3. Search for Search Console, click on the Search Console API and then click Enable.
  4. When the new screen loads, click Credentials in the left hand menu.
  5. Click Configure Consent Screen and choose the internal type.
  6. Fill in the required fields - application name and contact emails.
  7. Add the ./auth/webmasters.readonly scope for read-only access to Search Console data.
  8. Once the consent screen is complete click Credentials in the left hand menu again.
  9. Click Create Credentials at the top and choose OAuth Client ID.
  10. Choose Web Application.
  11. Add https://script.google.com/macros/d/{SCRIPTID}/usercallback to the authorized redirect URIs, replacing {SCRIPTID} (including the brackets) with the Apps Script ID you noted above.
  12. Make a note of the Project ID.
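
The redirect address in step 11 is easy to get wrong by leaving the brackets in place. As a small sketch, the substitution can be written as a function; buildRedirectUri is a hypothetical helper name, not something from the script itself.

```javascript
// Builds the OAuth redirect address from step 11 for a given Apps Script ID.
// Hypothetical helper: it rejects empty IDs and IDs that still contain the
// {SCRIPTID} placeholder brackets.
function buildRedirectUri(scriptId) {
  const id = String(scriptId).trim();
  if (id === '' || /[{}]/.test(id)) {
    throw new Error('Pass the real Script ID, without the {SCRIPTID} brackets');
  }
  return 'https://script.google.com/macros/d/' + id + '/usercallback';
}
```

For a Script ID of abc123 this returns https://script.google.com/macros/d/abc123/usercallback, which is the value to paste into the console.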

Complete the Apps Script

Return to the Apps Script project and open Project Settings. Set the GCP project to the project ID you noted above. On the same page, check the Show "appsscript.json" manifest file in editor option.

Go to the code editor and open appsscript.json. Add the following line:

"oauthScopes": ["https://www.googleapis.com/auth/script.external_request", "https://www.googleapis.com/auth/webmasters.readonly", "https://www.googleapis.com/auth/spreadsheets.currentonly", "https://www.googleapis.com/auth/script.send_mail"],

Make sure the script is saved and close the script editor window. Reload the spreadsheet. Once the reload completes you should have a Search Console menu at the top of the spreadsheet. Choose Update Data from the Search Console menu. This will run for a few minutes and then populate the 'gsc' sheet with your URL data. See UrlInspectionResult in the API documentation for more information about the meaning of each field. You should also get a lengthy email with a notification for each URL that was inspected. This will continue for the first week and then you'll only get updates for interesting status changes.
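
For a sense of what happens on each run, here is a minimal sketch of one URL inspection call and how the result could be flattened into a sheet row. The endpoint and the indexStatusResult field names are from the URL Inspection API; the function names and the column order are my assumptions and may not match the actual script.

```javascript
// URL Inspection API endpoint (index.inspect).
const INSPECT_ENDPOINT =
  'https://searchconsole.googleapis.com/v1/urlInspection/index:inspect';

// Builds the request for one page. In Apps Script the returned pieces could be
// passed to UrlFetchApp.fetch(req.url, req.options), with the token coming
// from ScriptApp.getOAuthToken(). Function name is hypothetical.
function buildInspectionRequest(pageUrl, property, accessToken) {
  return {
    url: INSPECT_ENDPOINT,
    options: {
      method: 'post',
      contentType: 'application/json',
      headers: { Authorization: 'Bearer ' + accessToken },
      payload: JSON.stringify({ inspectionUrl: pageUrl, siteUrl: property }),
      muteHttpExceptions: true
    }
  };
}

// Flattens the parsed JSON response into one row. Missing fields become
// empty strings. The column order here is an assumption.
function flattenIndexStatus(pageUrl, responseBody) {
  const result = (responseBody.inspectionResult || {}).indexStatusResult || {};
  return [
    pageUrl,
    result.verdict || '',
    result.coverageState || '',
    result.robotsTxtState || '',
    result.indexingState || '',
    result.lastCrawlTime || '',
    result.pageFetchState || '',
    result.googleCanonical || '',
    result.userCanonical || ''
  ];
}
```

Comparing a freshly flattened row with the row already stored in the 'gsc' sheet is one straightforward way to decide whether a status change is worth an email.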

Scheduling

Now that the project is working, open the script editor again (Apps Script from the Extensions menu) and open Triggers (the clock icon in the left hand menu). Click Add Trigger at the very bottom right of the window. Select runUpdate as the function to run. Change the event source to time-driven, then select day timer and an hour to run the script. Lastly, click Save. Your Google Search Console monitor will now run every day, and if the index status of a page changes you'll get an email about it within a week. The spreadsheet will also come in handy for other analysis and reporting.
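
If you would rather create the trigger from code than from the Triggers screen, something along these lines should work. This is a sketch: installDailyTrigger is a hypothetical helper, and it takes the ScriptApp service as a parameter purely so the logic can be exercised outside Apps Script. The runUpdate function name is the one selected in the steps above.

```javascript
// Installs a daily time-driven trigger for runUpdate unless one already exists.
// In Apps Script, call it as installDailyTrigger(ScriptApp, 6) to run at
// roughly 6am in the script's time zone. Returns true if a trigger was created.
function installDailyTrigger(scriptApp, hour) {
  const alreadyInstalled = scriptApp.getProjectTriggers()
    .some(function (trigger) {
      return trigger.getHandlerFunction() === 'runUpdate';
    });
  if (alreadyInstalled) {
    return false;
  }
  scriptApp.newTrigger('runUpdate')
    .timeBased()
    .everyDays(1)
    .atHour(hour)
    .create();
  return true;
}
```

Checking for an existing trigger first avoids stacking up duplicates if the function is run more than once.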

Troubleshooting

You might need to tweak the script if you start hitting limits. The URL inspection API currently has a quota of 2,000 calls per day. Apps Script will only run for around 7 minutes on a free account and 20 minutes if you have Google Workspace. If either of these limits applies, you could modify the 'checkDay' logic to use day of the month (or year, or ...) to reduce the number of URLs inspected on each run. If you need to do this, remember to update the Check Day column on the 'gsc' sheet as well.
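
To make the partitioning idea concrete, here is a minimal sketch of assigning each URL a random slot and then picking the URLs due on a given run. The function names, the checkSlot property and the mode strings are hypothetical; the real script's 'checkDay' logic may be organized differently.

```javascript
// Picks a random slot in [0, slotCount). Use 7 for day of week, 24 for hour of
// day, and so on. The optional random argument makes the function testable.
function assignCheckSlot(slotCount, random) {
  const rnd = random || Math.random;
  return Math.floor(rnd() * slotCount);
}

// Maps a date to the slot that is due right now for the chosen cycle.
function currentSlot(date, mode) {
  if (mode === 'dayOfWeek') return date.getDay();        // 0 (Sunday) to 6
  if (mode === 'dayOfMonth') return date.getDate() - 1;  // 0 to 30
  if (mode === 'hourOfDay') return date.getHours();      // 0 to 23
  throw new Error('Unknown mode: ' + mode);
}

// Returns the URLs whose stored slot matches the current one.
// rows is an array of { url, checkSlot } objects read from the sheet.
function urlsDueNow(rows, date, mode) {
  const slot = currentSlot(date, mode);
  return rows
    .filter(function (row) { return row.checkSlot === slot; })
    .map(function (row) { return row.url; });
}
```

With a day-of-month cycle, slots 28 to 30 only come up in longer months, so assigning slots with assignCheckSlot(28) keeps every URL on a monthly schedule at the cost of checking nothing on days 29 to 31.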

The script assumes that the URLs you want to monitor are in your sitemap. If this is not the case you can add URLs to the sheet directly. As long as they are part of the configured property you will still get results. If you use this method you might want to comment out the updatePagesFromSitemap() call in runUpdate() to save time.
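
Since the script expects a regular sitemap rather than a sitemap index, a quick check of the file's root element can save some head scratching. The sketch below uses simple pattern matching instead of a real XML parser, so treat it as a diagnostic aid and not as how the script itself reads the sitemap; both function names are hypothetical, and CDATA-wrapped URLs are not handled.

```javascript
// Classifies sitemap text by its root element: a sitemap index uses
// <sitemapindex>, a regular sitemap uses <urlset>. Anything else (for example
// an HTML error page) is reported as 'unknown'.
function classifySitemap(xmlText) {
  if (/<sitemapindex[\s>]/i.test(xmlText)) return 'index';
  if (/<urlset[\s>]/i.test(xmlText)) return 'sitemap';
  return 'unknown';
}

// Extracts the contents of every <loc> element. Namespaced elements such as
// <image:loc> are deliberately not matched.
function extractLocs(xmlText) {
  const locs = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match;
  while ((match = pattern.exec(xmlText)) !== null) {
    // Sitemaps escape ampersands in URLs as &amp;
    locs.push(match[1].replace(/&amp;/g, '&'));
  }
  return locs;
}
```

If classifySitemap returns 'index', point SitemapUrl at one of the child sitemaps listed by extractLocs instead. A result of 'unknown' usually means the URL is returning something other than a sitemap.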

If anything else goes wrong please leave a comment below and I'll do my best to help you.

Updated 2022-02-08 17:40:

After a couple of days I have a full dump of my sitemap from the page index status API. I wrote this script for the alerting possibilities but couldn't resist some analytics once the dataset was complete.

The chart above shows sessions vs days since the last Google crawl. Pretty stark - Google keeps a close eye on the pages it sends traffic to and not so much on the others.

I set lastmod honestly and there is good news here. I could only find two cases where Google had not crawled the page since the last modified date. So when the sitemap says a page has changed the odds are good that it will get another crack at the index. The two exceptions are unusual posts that are updated hourly and weekly respectively and both have been crawled recently.
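
The lastmod comparison is simple to reproduce once the data is in the sheet. A minimal sketch, assuming each page has been loaded as an object with url, lastmod (from the sitemap) and lastCrawlTime (from the API) properties; those property names and the function name are my own, not the script's.

```javascript
// Returns the URLs that Google has not crawled since their sitemap lastmod.
// Pages with a missing or unparseable date are skipped, not flagged.
function notCrawledSinceLastmod(pages) {
  return pages
    .filter(function (page) {
      if (!page.lastmod || !page.lastCrawlTime) return false;
      const modified = Date.parse(page.lastmod);
      const crawled = Date.parse(page.lastCrawlTime);
      if (isNaN(modified) || isNaN(crawled)) return false;
      return crawled < modified;
    })
    .map(function (page) { return page.url; });
}
```

Pages that change very frequently will tend to show up in this list simply because lastmod moves faster than any reasonable crawl schedule.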

The breakdown of index status matches Google Search Console pretty well but I have a handful of pages that are 'Indexed, not submitted in sitemap', even though they are in the sitemap and no such status is shown on Search Console. I don't know if this is a glitch in the index status API or something to do with how the pages were discovered. Some light searching suggests that this message is usually what you would expect it to be.

Lastly, updating the sheet for my site is bound more by script execution time than by the API limits. I changed it to run every hour, and instead of partitioning by day of week I used a random hour of the day, which means I check every URL at least once every 24 hours.

Updated 2022-04-24 10:50:

I just updated the code and post above. I've had occasional issues where updating the sheet failed, which caused the next run to go back to the beginning with no saved index status. To reduce the chance of this happening I've added some retry logic and also improved the speed of the sheet load and save functions. I also had a comment on the code that suggested an easier way to handle OAuth and have incorporated this in the new version.
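
For anyone curious what retry logic of this kind can look like, here is a generic sketch with exponential backoff. It is not the exact code in the script: withRetry is a hypothetical helper, and the sleep function is passed in so that in Apps Script you can hand it Utilities.sleep.

```javascript
// Runs operation up to `attempts` times, pausing between failures with a delay
// that doubles each time (baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...).
// Returns the operation's result, or rethrows the last error if every
// attempt fails.
function withRetry(operation, attempts, sleep, baseDelayMs) {
  if (attempts < 1) {
    throw new Error('attempts must be at least 1');
  }
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return operation();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (i < attempts - 1) {
        sleep(baseDelayMs * Math.pow(2, i));
      }
    }
  }
  throw lastError;
}
```

In Apps Script, a sheet save could be wrapped as withRetry(function () { return saveSheet(data); }, 3, Utilities.sleep, 1000), where saveSheet stands in for whatever function writes the rows.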

Comments

Robert Ellison

MC - this is the Google Cloud Project ID. It's in the URL when you're editing the project (project=xxx) and you can find it in a more digestible form by clicking the three dots at the top right of the screen and choosing Project settings.

John Mueller

@ithoughthecamewithyou.com This is pretty cool, nice work!

MC

Your section regarding "Make note of Project ID" is ambiguous. There is no detail of where to get the project ID and this is the first instance in which you use the word project ID in the article; so I'm unsure if you're referencing the script ID or something else. Stuck at this portion.

JL

Is there a way to increase the 2,000 request/day limit?

Robert Ellison

Check that you have the right URL for the sitemap (and that the sitemap is not a sitemap index).

David

I just launched this and a new sheet was not created. I created a new tab called gsc but no headers showed up and it said couldn't find last data. I added some previous data and now I am getting the error "Exception: Error on line 15: The entity "raquo" was referenced, but not declared."

Robert Ellison

If you post the full URL I can check. Or look at the file, if it has a reference to another XML file or files then it's an index, if it has your actual site URLs in then it's a regular sitemap.

Mohammad

I have a question about this part: "SitemapUrl is the full URL of your sitemap. The current implementation does not support sitemap index files, this needs to be a regular sitemap containing URLs"

I use the Yoast sitemap, specifically "namesite/post-sitemap.xml".

Is this wrong?

Or should I create a URL-only sitemap with an online tool?

Thanks for answering.

Robert Ellison

If you're getting as far as headers written to the sheet with no errors then the problem is likely the sitemap file. Check that the URL is correct, and that it's a sitemap file (not a sitemap index file).

Mohammad

Thanks for answering

I filled in all the fields in the script, but only the sitemap fields are read! These columns are not filled: "Verdict","ROBOTS.TXT" "State","Indexing State","Last Crawl Time","Page Fetch State","Google Canonical","User Canonical"

I do not know why.

I followed the tutorial exactly, step by step.
