Monitor page index status with Google Sheets, Apps Script and the Google Search Console API

Updated on Friday, May 20, 2022

Overview

Google just released URL inspection as part of the Search Console API. I check for issues periodically in Search Console, but it would be great to just get an email when an issue crops up. The Apps Script project below does just that: it monitors the URLs in your sitemap for changes and sends an email whenever one is detected. The Search Console API has a limit of 2,000 calls per day, and Apps Script also imposes a time limit on scripts. To stay within both, the approach I take below assigns a random day of the week to each URL, so only a fraction of the URLs are checked on each run. Depending on the size of your site you may want to remove this check (or go the other way and partition further if you have a large number of URLs to monitor). Follow the steps below to get your own monitoring spreadsheet up and running.
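To make the day-of-week idea concrete, here is a minimal sketch of how such partitioning might look. It is an illustration only, not the script from this post: the function names, the row shape and the example URLs are all assumptions.

```javascript
// Number of partitions: 7 spreads URLs across the days of the week.
const PARTITIONS = 7;

// Pick a random partition (0-6) for a newly discovered URL.
// `random` is injectable so the function can be tested deterministically.
function assignCheckDay(random = Math.random) {
  return Math.floor(random() * PARTITIONS);
}

// Return only the rows whose check day matches the day of `now`.
// Date.getDay() returns 0 (Sunday) through 6 (Saturday).
function urlsDueToday(rows, now = new Date()) {
  const today = now.getDay();
  return rows.filter((row) => row.checkDay === today);
}

// Usage: hypothetical rows, as they might be loaded from the 'gsc' sheet.
const rows = [
  { url: 'https://example.com/a', checkDay: 0 },
  { url: 'https://example.com/b', checkDay: 3 },
  { url: 'https://example.com/c', checkDay: 3 },
];

// 2022-05-18 was a Wednesday, so getDay() returns 3 and two rows are due.
const due = urlsDueToday(rows, new Date(2022, 4, 18));
console.log(due.map((row) => row.url));
```

With roughly even random assignment, each daily run inspects about one seventh of the URLs, which is what keeps a run under the quota and the time limit.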

Google Sheet

Create a new spreadsheet in Google Drive and call it anything you want. Rename an empty sheet to 'gsc'; this sheet will store the index data. You don't need to make any other changes to this sheet.

Choose Apps Script from the Extensions menu. This will open up the script editor for your spreadsheet. I find that the editor sometimes opens with the wrong account if you're signed into more than one. If that applies to you, check quickly to make sure the right account is selected. With Code.gs selected, copy and paste the script below, replacing the default function:

There are a few configuration variables to enter at the top. AlertEmail is the email address to notify when index status changes are detected. SitemapUrl is the full URL of your sitemap. The current implementation does not support sitemap index files, so this needs to be a regular sitemap containing URLs. SearchConsoleProperty should be the URL of the site to monitor, or sc-domain: followed by the domain for domain properties (for this site, sc-domain:ithoughthecamewithyou.com).
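Because a sitemap index (or an HTML page fetched by mistake) is a common cause of an empty sheet, it can help to check what the SitemapUrl actually returns. The sketch below is an assumption-laden illustration rather than part of the original script: it classifies sitemap XML by its root element and pulls out the page URLs with a simple regular expression. The helper names are hypothetical, and inside Apps Script you would more likely fetch the file with UrlFetchApp and parse it with XmlService.

```javascript
// Classify sitemap XML by its root element.
// Returns 'urlset' for a regular sitemap, 'sitemapindex' for an index file,
// or 'unknown' (for example when an HTML page was fetched by mistake).
function sitemapKind(xml) {
  // The first tag that starts with a letter is the root element; this skips
  // the <?xml ...?> declaration and <!DOCTYPE ...>. An optional namespace
  // prefix (such as "sm:") is captured separately and ignored.
  const match = xml.match(/<([A-Za-z][\w.-]*:)?([A-Za-z][\w.-]*)/);
  const root = match ? match[2].toLowerCase() : '';
  if (root === 'urlset') return 'urlset';
  if (root === 'sitemapindex') return 'sitemapindex';
  return 'unknown';
}

// Extract the contents of every <loc> element.
// Only the &amp; entity is decoded; this is a sketch, not a full XML parser.
function extractLocs(xml) {
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const urls = [];
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

// Usage with two small hypothetical documents.
const regularSitemap =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  '  <url><loc>https://example.com/</loc></url>\n' +
  '  <url><loc>https://example.com/post?a=1&amp;b=2</loc></url>\n' +
  '</urlset>';
const indexSitemap =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  '  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>\n' +
  '</sitemapindex>';

console.log(sitemapKind(regularSitemap), extractLocs(regularSitemap));
console.log(sitemapKind(indexSitemap));
```

If the result is 'sitemapindex', point SitemapUrl at one of the child sitemaps listed in the index instead.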

Click Project Settings (the cog in the left hand menu of the script editor) and copy the Script ID. Make a note of this for later.

API Console

Next we need to configure the Search Console API in the Google Cloud Platform Console.

  1. Create a new project (click the drop down to the right of 'Google Cloud Platform Console' and then New Project). Pick any name you like.
  2. Once the project is created, find APIs and Services in the left hand menu and choose Library.
  3. Search for Search Console, click on the Search Console API and then click Enable. 
  4. A new screen will load, click Credentials in the left hand menu. 
  5. Click Configure Consent Screen and choose the internal type.
  6. Fill in the required fields - application name and contact emails. 
  7. Add the .../auth/webmasters.readonly scope (https://www.googleapis.com/auth/webmasters.readonly) for read-only access to Search Console data.
  8. Once the consent screen is complete click Credentials in the left hand menu again.
  9. Click Create Credentials at the top and choose OAuth Client ID. 
  10. Choose Web Application.
  11. Add https://script.google.com/macros/d/{SCRIPTID}/usercallback to Authorized redirect URIs, replacing {SCRIPTID} (including the braces) with the Apps Script ID you noted above.
  12. Make a note of the Project ID.

Complete the Apps Script

Return to the Apps Script project and find the settings page. Set the GCP project to the project ID you noted above. Also on this page check the Show "appsscript.json" manifest file in editor option.

Go to the code editor and open appsscript.json. Add the following line:

"oauthScopes": ["https://www.googleapis.com/auth/script.external_request", "https://www.googleapis.com/auth/webmasters.readonly", "https://www.googleapis.com/auth/spreadsheets.currentonly", "https://www.googleapis.com/auth/script.send_mail"],

Make sure the script is saved and close the script editor window. Reload the spreadsheet. Once the reload completes you should have a Search Console menu at the top of the spreadsheet. Choose Update Data from the Search Console menu. This will run for a few minutes and then populate the 'gsc' sheet with your URL data. See UrlInspectionResult in the API documentation for more information about the meaning of each field. You should also get a lengthy email with a notification for each URL that was inspected. This will continue for the first week and then you'll only get updates for interesting status changes.
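For readers who want to see what a single inspection involves, the sketch below shows one way to build the request for the URL Inspection endpoint, flatten the indexStatusResult part of the response into a row, and decide which fields changed. It is a hedged illustration, not the original script: the function names, the watched-field list and the sample response are assumptions, and the access token is expected to come from whatever OAuth flow you use. The request options are shaped so they could be passed to UrlFetchApp.fetch(url, options).

```javascript
const INSPECT_ENDPOINT =
  'https://searchconsole.googleapis.com/v1/urlInspection/index:inspect';

// Build the URL and options for one inspection call.
function buildInspectRequest(pageUrl, property, accessToken) {
  return {
    url: INSPECT_ENDPOINT,
    options: {
      method: 'post',
      contentType: 'application/json',
      headers: { Authorization: 'Bearer ' + accessToken },
      payload: JSON.stringify({ inspectionUrl: pageUrl, siteUrl: property }),
      muteHttpExceptions: true,
    },
  };
}

// Flatten the index status part of a parsed response into a plain row.
// Missing fields become empty strings so the sheet stays rectangular.
function flattenResult(response) {
  const status = (response.inspectionResult || {}).indexStatusResult || {};
  const fields = ['verdict', 'coverageState', 'robotsTxtState', 'indexingState',
    'lastCrawlTime', 'pageFetchState', 'googleCanonical', 'userCanonical'];
  const row = {};
  fields.forEach((name) => { row[name] = status[name] || ''; });
  return row;
}

// Assumption: a new crawl time alone is not worth an email, so it is not watched.
const WATCHED_FIELDS = ['verdict', 'coverageState', 'robotsTxtState',
  'indexingState', 'pageFetchState', 'googleCanonical', 'userCanonical'];

// Return the names of watched fields that differ between two rows.
function changedFields(previous, current, watched = WATCHED_FIELDS) {
  return watched.filter((name) => previous[name] !== current[name]);
}

// Usage with a hypothetical parsed response.
const sampleResponse = {
  inspectionResult: {
    indexStatusResult: {
      verdict: 'PASS',
      coverageState: 'Submitted and indexed',
      robotsTxtState: 'ALLOWED',
      indexingState: 'INDEXING_ALLOWED',
      lastCrawlTime: '2022-05-18T10:00:00Z',
      pageFetchState: 'SUCCESSFUL',
      googleCanonical: 'https://example.com/a',
      userCanonical: 'https://example.com/a',
    },
  },
};
const current = flattenResult(sampleResponse);
const previous = Object.assign({}, current, {
  verdict: 'NEUTRAL',
  coverageState: 'Discovered - currently not indexed',
  lastCrawlTime: '',
});
console.log(changedFields(previous, current));
```

An empty result from changedFields means nothing worth reporting happened for that URL on this run.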

Scheduling

Now the project is working, open the script editor again (Apps Script from the Extensions menu) and open Triggers (the clock icon in the left hand menu). Click Add Trigger at the very bottom right of the window. Select runUpdate as the function to run. Change event source to time-driven, and then select day timer and an hour to run the script. Lastly click Save. Your Google Search Console monitor will now run every day, and if the index status of a page changes you'll get an email about it within a week. The spreadsheet will also come in handy for other analysis and reporting.
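If you prefer to create the trigger from code rather than through the Triggers screen, something like the following sketch should be equivalent. It assumes the function to run is called runUpdate, as above; the helper name scheduleDailyUpdate is hypothetical, and the second parameter exists only so the function can be exercised outside Apps Script.

```javascript
// Create a daily time-driven trigger for runUpdate at the given hour (0-23).
// `scriptApp` defaults to the Apps Script ScriptApp service; passing a
// stand-in object makes the function testable outside Apps Script.
function scheduleDailyUpdate(hour, scriptApp = ScriptApp) {
  // Remove any existing runUpdate triggers so repeated calls don't stack up.
  scriptApp.getProjectTriggers()
    .filter((trigger) => trigger.getHandlerFunction() === 'runUpdate')
    .forEach((trigger) => scriptApp.deleteTrigger(trigger));

  // Run once a day, some time during the chosen hour.
  return scriptApp.newTrigger('runUpdate')
    .timeBased()
    .everyDays(1)
    .atHour(hour)
    .create();
}
```

In the script editor you would run scheduleDailyUpdate(6) once (for example from a small wrapper function) and approve the extra authorization prompt for managing triggers.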

Troubleshooting

You might need to tweak the script if you start hitting limits. The URL inspection API currently has a quota of 2,000 calls per day. Apps Script will only run for around 7 minutes on a free account and 20 minutes if you have Google Workspace. If either of these limits applies, you could modify the 'checkDay' logic to use the day of the month (or year, or ...) to reduce the number of URLs inspected on each run. If you need to do this, remember to update the Check Day column on the 'gsc' sheet as well.
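Another way to stay under the execution limit is to stop a run early once a time budget is used up and leave the remaining URLs for the next run. The sketch below shows the idea under stated assumptions: processWithinBudget and its parameters are hypothetical names, and the inspect callback stands in for whatever your script does per URL.

```javascript
// Process URLs one at a time until the time budget is used up.
// Returns how many URLs were processed. `now` is injectable for testing.
function processWithinBudget(urls, inspect, budgetMs, now = Date.now) {
  const start = now();
  let processed = 0;
  for (const url of urls) {
    // Check the clock before each call so a run never starts new work
    // after the budget has elapsed.
    if (now() - start >= budgetMs) break;
    inspect(url);
    processed++;
  }
  return processed;
}

// Usage with a fake clock that advances 100 ms every time it is read.
let fakeTime = 0;
const fakeNow = () => {
  const value = fakeTime;
  fakeTime += 100;
  return value;
};
const inspected = [];
const count = processWithinBudget(
  ['u1', 'u2', 'u3', 'u4', 'u5'],
  (url) => inspected.push(url),
  250,
  fakeNow
);
console.log(count, inspected);
```

In a real run the budget would be set comfortably below the execution limit, for example five minutes, to leave time for saving the sheet.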

The script assumes that the URLs you want to monitor are in your sitemap. If this is not the case you can add URLs to the sheet directly. As long as they are part of the configured property you will still get results. If you use this method you might want to comment out the updatePagesFromSitemap() call in runUpdate() to save time.

If anything else goes wrong please leave a comment below and I'll do my best to help you.

Updated 2022-02-08 17:40:

After a couple of days I have a full dump of my sitemap from the page index status API. I wrote this script for the alerting possibilities but couldn't resist some analytics once the dataset was complete.

The chart above shows sessions vs days since the last Google Crawl. Pretty stark - Google keeps a close eye on the pages it sends traffic to and not so much on the others.

I set lastmod honestly and there is good news here. I could only find two cases where Google had not crawled the page since the last modified date. So when the sitemap says a page has changed the odds are good that it will get another crack at the index. The two exceptions are unusual posts that are updated hourly and weekly respectively and both have been crawled recently.

The breakdown of index status matches Google Search Console pretty well but I have a handful of pages that are 'Indexed, not submitted in sitemap', even though they are in the sitemap and no such status is shown on Search Console. I don't know if this is a glitch in the index status API or something to do with how the pages were discovered. Some light searching suggests that this message is usually what you would expect it to be.

Lastly, updating the sheet for my site is more bound by script execution time than the API limits. I changed it to run every hour and instead of partitioning by day of week I used a random hour of the day which means I check every URL at least once every 24 hours.

Updated 2022-04-24 10:50:

I just updated the code and post above. I've had occasional issues where updating the sheet failed which caused the next run to go back to the beginning with no saved index status. To reduce the chance of this happening I've added some retry logic and also improved the speed of the sheet load and save functions. I also had a comment on the code that suggested an easier way to handle OAuth and have incorporated this in the new version.
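Retry logic of this kind can be sketched in a few lines. The version below is an illustration under assumptions, not the code in the updated script: withRetry is a hypothetical helper, and the sleep function is passed in so that in Apps Script you could supply (ms) => Utilities.sleep(ms).

```javascript
// Call `operation` up to `maxAttempts` times, doubling the delay after each
// failure. Rethrows the last error if every attempt fails.
function withRetry(operation, maxAttempts, baseDelayMs, sleep) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return operation();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is coming.
      if (attempt < maxAttempts) {
        sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw lastError;
}

// Usage: a hypothetical save operation that fails twice and then succeeds.
let attempts = 0;
const delays = [];
const outcome = withRetry(
  () => {
    attempts++;
    if (attempts < 3) throw new Error('save failed');
    return 'saved';
  },
  5,
  1000,
  (ms) => delays.push(ms) // record the delay instead of actually sleeping
);
console.log(outcome, attempts, delays);
```

Wrapping only the sheet load and save calls keeps a transient failure from wiping out the saved index status for the next run.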

Comments

bucheron

Hi Robert

Thank you very much for this script, it's fine work.

Would you know by any chance why I get many URLs with just "Added from sitemap" as data? No verdict, no indexing state.

Thank you

Robert Ellison

Hi Bucheron, the 'Added from sitemap' status is the default when a URL is first loaded into the sheet. This will change to a real status once the index status API has been called for that URL.

bucheron

@robert and let's say I have a small website with a small sitemap. What lines do I need to change in the code to check instantly and not wait specific days ? thank you

Robert Ellison

Change getDay() to getHours() on line 35 and schedule the script to run every hour instead of every day. This will run across hours 0-6. Optionally change line 111 to 24 instead of 7 (to spread out checks throughout the day). If you do this you need to change the spreadsheet column checkday to have 0-23 values instead of 0-6 (or just delete all rows and allow the script to regenerate them). Hope that helps.

bucheron

I played a lot with the script and the script itself is working fine, but the GSC authorisation seems to wear off really quickly, after about 2 hours.

Script works fine, then 2 hours later back to "Error processing url Error: Access not granted or expired." and I have to authorize again.

Have you encountered the same issue ?

Robert Ellison

I don't have that issue but I have seen Apps Script get into a bad state. Try this, go to your Google Account, Security and find the Apps with access to your account section. The script will be in there with search console access. Click Remove Access. Then in the spreadsheet choose Reset Settings from the Search Console menu. Then run Authorize... from the same menu and authorize again. Then see if the authorization sticks.

bucheron

No, it still doesn't stick; I tried many times. I must say I don't know where to go from here.

Robert Ellison

The error suggests that the refresh token isn't available. Clearing script properties and reauthorizing usually fixes any issue like this. The only other possibility I can think of right now is that the script is being run by a different user than the one that authorized it - is this possible in your case (multiple Google accounts)? The property store used to save the refresh token is user specific.

Mohammad

hello

Step 5 says "Click Configure Consent Screen and choose the internal type." For user type, only the external type is enabled. Why is the internal type disabled?

please help me

thanks

Robert Ellison

Try external and put something in all the required fields. Let me know if that doesn't work for you.

Mohammad

Thanks for answering

I filled in all the fields, but the script only reads the fields from the sitemap! These columns are not filled: "Verdict", "ROBOTS.TXT" "State", "Indexing State", "Last Crawl Time", "Page Fetch State", "Google Canonical", "User Canonical"

I do not know why?

I went exactly step by step with the training.

Robert Ellison

If you're getting as far as headers written to the sheet with no errors then the problem is likely the sitemap file. Check that the URL is correct, and that it's a sitemap file (not a sitemap index file).

Mohammad

I have a question about this part: "SitemapUrl is the full URL of your sitemap. The current implementation does not support sitemap index files, this needs to be a regular sitemap containing URLs"

I use the Yoast sitemap, this part: "namesite/post-sitemap.xml"

Is this wrong?

Or should I create a list of only the site URLs with an online tool?

thanks for answering

Robert Ellison

If you post the full URL I can check. Or look at the file, if it has a reference to another XML file or files then it's an index, if it has your actual site URLs in then it's a regular sitemap.

David

I just launched this and a new sheet was not created. I created a new tab called gsc but no headers showed up and it said couldn't find last data. I added some previous data and now I am getting the error "Exception: Error on line 15: The entity "raquo" was referenced, but not declared."

Robert Ellison

Check that you have the right URL for the sitemap (and that the sitemap is not a sitemap index).

I Thought He Came With You is Robert Ellison's blog.
