Real-Time Indexing

Category
Complexity
1/5
Date published
2020-02-04
Author
  • Vincent Terrasi
Prerequisites
  • Bing API Token
  • Google API Token
Links

Automatically Detect New URLs and Submit Them to Search Engines for Indexing

Context

Getting new pages indexed is a challenge for SEOs in industries with frequently evolving sites, particularly e-commerce, classifieds, and online publishers, where the rapid search visibility of new pages directly affects the business.

The major search engines provide means of manually submitting pages for indexing.

However, in the use cases described above, this can require SEO teams to maintain extensive daily lists of pages created (sometimes automatically) by production and content teams. Obtaining a complete list can be difficult.

Furthermore, depending on the number of new pages, manual submission is often not a feasible option.

To submit URLs to search engines, we will use the following methods, which you should be familiar with before beginning: the URL Submission API (Bing), and the Indexing API and sitemap availability (Google).

Objectives
  • Automate the creation of a list of new URLs based on crawl data
  • Automate the batch submission of the list of new URLs for indexing by search engines
  • Submit new URLs to search engines in real time for indexing or discovery, as soon as they are discovered by your crawl analyses

Method

To detect new URLs, we use OnCrawl's API and the Crawl over Crawl feature, which compares two crawls run on the same website. This allows OnCrawl to analyze the differences on the website between the time of the first crawl and the time of the second. This analysis includes a list of new URLs that were not present in the earlier crawl.
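The comparison itself can be illustrated with a minimal Python sketch. It does not call OnCrawl's API: it assumes the URL lists of the two crawls have already been exported, and the function name find_new_urls and the example.com URLs are hypothetical.

```python
from typing import Iterable, List


def find_new_urls(previous_crawl: Iterable[str], latest_crawl: Iterable[str]) -> List[str]:
    """Return URLs found in the latest crawl but not in the previous one.

    The order of the latest crawl is preserved, and a URL that appears
    several times in the latest crawl is reported only once.
    """
    seen = set(previous_crawl)
    new_urls = []
    for url in latest_crawl:
        if url not in seen:
            seen.add(url)  # prevents reporting the same new URL twice
            new_urls.append(url)
    return new_urls


# Hypothetical crawl exports, for illustration only.
previous = ["https://example.com/", "https://example.com/shoes"]
latest = [
    "https://example.com/",
    "https://example.com/shoes",
    "https://example.com/boots",
    "https://example.com/boots",
]
print(find_new_urls(previous, latest))  # ['https://example.com/boots']
```

In practice, URLs may need normalizing (trailing slashes, tracking parameters) before the comparison, which this sketch leaves out.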

To submit URLs to Bing, we use the "Submit URL" API, a faster way to tell Bing about your new or updated URLs. It can be used to submit up to 10,000 URLs per day for immediate crawl and indexation.
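As a rough sketch of how a batch submission to Bing might look, the Python below splits the list of new URLs into batches and builds one JSON request per batch. The batch endpoint and payload shape follow Bing's documented SubmitUrlbatch call as we understand it. The 500-URL batch size is an assumption to check against your own quota, and the helper names (make_batches, build_request, submit_to_bing) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

BING_ENDPOINT = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlbatch"
BATCH_SIZE = 500      # assumed per-request limit; verify against your account
DAILY_QUOTA = 10_000  # daily limit mentioned above


def make_batches(urls: List[str], batch_size: int = BATCH_SIZE,
                 daily_quota: int = DAILY_QUOTA) -> List[List[str]]:
    """Cap the list at the daily quota, then split it into batches."""
    capped = urls[:daily_quota]
    return [capped[i:i + batch_size] for i in range(0, len(capped), batch_size)]


def build_request(api_key: str, site_url: str, batch: List[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request for one batch of URLs."""
    query = urllib.parse.urlencode({"apikey": api_key})
    body = json.dumps({"siteUrl": site_url, "urlList": batch}).encode("utf-8")
    return urllib.request.Request(
        f"{BING_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit_to_bing(api_key: str, site_url: str, urls: List[str]) -> None:
    """Send every batch; urlopen raises an HTTPError on a non-2xx response."""
    for batch in make_batches(urls):
        with urllib.request.urlopen(build_request(api_key, site_url, batch)) as response:
            response.read()
```

Calling submit_to_bing("YOUR_BING_API_TOKEN", "https://example.com", new_urls) would send the requests. URLs beyond the daily quota are dropped by this sketch and would need to be queued for the next day.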

To submit URLs to Google, we rely partially on the Indexing API, which allows any site owner to directly notify Google when pages that contain JobPosting or BroadcastEvent Schema objects are added or removed. This allows Google to schedule these pages for a fresh crawl, which can lead to faster indexing and higher quality user traffic. For all other URLs, we take advantage of Google's use of sitemaps as a key source of URLs to be discovered.
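The two Google paths can be sketched in Python as follows. This is an illustration under assumptions, not a definitive implementation: it assumes an OAuth 2.0 access token for the Indexing API has already been obtained (for example through a service account, which is out of scope here), it does not decide which pages carry JobPosting or BroadcastEvent markup, and the helper names build_indexing_request and build_sitemap are hypothetical.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterable

INDEXING_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_indexing_request(access_token: str, url: str,
                           removed: bool = False) -> urllib.request.Request:
    """Build (but do not send) an Indexing API notification for one URL."""
    notification = {"url": url, "type": "URL_DELETED" if removed else "URL_UPDATED"}
    return urllib.request.Request(
        INDEXING_ENDPOINT,
        data=json.dumps(notification).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",  # token obtained elsewhere
        },
        method="POST",
    )


def build_sitemap(urls: Iterable[str], lastmod: str) -> str:
    """Return a sitemap listing the given URLs, for pages not eligible for the Indexing API."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes '&' and '<'
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical new URL, for illustration only.
print(build_sitemap(["https://example.com/boots"], "2020-02-04"))
```

The request returned by build_indexing_request can be sent with urllib.request.urlopen. The sitemap string still needs to be written to a file that is published on the site and declared to Google, for instance through robots.txt or Search Console.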