
Boost Up your Drupal 9 SEO game – Implementing the RobotsTxt module in Drupal 9
Ankitha
02 Feb, 2021

The robots.txt file is a very underrated on-page SEO factor. Not everybody realizes the value it brings to the table. The robots.txt file acts like an access control system that tells crawler bots which pages should be crawled and which ones shouldn't. It is a rule book for your website that web spiders read before they attempt to crawl it.

There are tons of amazing Drupal SEO modules in versions 9 and 8 that make our jobs easier and boost SEO rankings. One of them is the RobotsTxt module. The RobotsTxt module in Drupal 9 (and 8) is a handy utility that gives you easy control over the robots.txt file in a multisite Drupal environment, letting you dynamically create and edit a robots.txt file for each site via the UI. Let’s learn more about this utility module and how to implement it in Drupal 9.
RobotsTxt module for Drupal SEO

But how does robots.txt help with SEO?

So, a robots.txt file restricts crawlers from crawling certain pages. But why wouldn’t you want all of your pages and files to be crawled? Why have any restrictions at all? Well, in this case, more isn’t always merrier.

  • Without a robots.txt file, you are allowing web spiders to crawl all of your web pages, sections, and files. This uses up your crawl budget (yes, that’s a thing), which can affect your SEO.
  • A crawl budget is the number of your pages that web spiders (Googlebot, Yahoo, Bing, etc.) crawl in a given timeframe. Too many pages to crawl means it can take longer for your pages to get indexed. Not only that, you might also lose out on indexing the important pages!
  • Not all of your pages need to be crawled. For example, you probably don’t want Google to crawl your development/staging environment pages or your internal login pages.
  • You might want to restrict media files (images, videos, or other documents) from being crawled (see the sample directives after this list).
  • If you have a sizeable number of duplicate-content pages, it can be easier to block them in the robots.txt file than to add canonical links to each of those pages.
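As an illustration, a few robots.txt directives along these lines would keep crawlers away from a staging area, the login pages, and a media folder. This is only a sketch; the /staging/ and /media/ paths are hypothetical placeholders, so adjust them to your own site structure:

# Hypothetical example; adjust the paths to match your site.
User-agent: *
Disallow: /staging/
Disallow: /user/login
Disallow: /media/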

How to Install and Implement the RobotsTxt Module in Drupal 9

The RobotsTxt module for Drupal 9 is great when you want to dynamically generate a robots.txt file for each of your websites while running many sites from one codebase (a multisite environment).

Step 1: Install the RobotsTxt Module for Drupal 9

Using Composer:

  
composer require 'drupal/robotstxt:^1.4'

Step 2: Enable the module

Go to Home > Administration > Extend (/admin/modules) and enable the RobotsTxt module.
Generates Robots.txt File
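Alternatively, if you have Drush available on the site, you can enable the module from the command line (a quick sketch, assuming a standard Drush setup):

drush en robotstxt -y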

 

Step 3: Remove the existing Robots.txt file

Once the module is installed, make sure to delete (or rename) the robots.txt file in the root of your Drupal installation so that the module can serve its own robots.txt file(s). Otherwise, the module cannot intercept requests for the /robots.txt path.
Remove Robots.txt file
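For example, from the web root of your Drupal installation (the path below is just a placeholder), you can rename the file so you still have a backup:

cd /path/to/drupal/webroot
mv robots.txt robots.txt.bak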

Step 4: Configure

Navigate to Home > Administration > Configuration > Search and metadata > RobotsTxt (/admin/config/search/robotstxt), where you can add your directives to the “Contents of robots.txt” field. Save the configuration.
Contents of Robots.txt
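If you script your multisite deployments with Drush, you may also be able to read or update this text from the command line. The configuration name robotstxt.settings and its content key used below are assumptions based on how the module typically stores its settings, so verify them on your install first:

# Assumed config object and key; verify on your site before using.
drush config:get robotstxt.settings content
drush config:set robotstxt.settings content 'User-agent: *' -y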

Step 5: Verify

To verify your changes, visit https://yoursitename.com/robots.txt
Verify Robots.txt File
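You can also check the output from the command line, for example with curl (replace the hostname with your own site’s):

curl -s https://yoursitename.com/robots.txt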

RobotsTxt API

If you want to share a list of directives across your multisite environment, you can use the RobotsTxt API. The module exposes a single hook, hook_robotstxt(), which allows you to define extra directives in code.

The example below adds Disallow rules for /foo and /bar to the bottom of the robots.txt file without having to add them manually to the “Contents of robots.txt” field in the UI.

  
/**
 * Add additional lines to the site's robots.txt file.
 *
 * @return array
 *   An array of strings to add to robots.txt.
 */
function hook_robotstxt() {
  // In a real module, replace "hook" with your module's machine name,
  // e.g. mymodule_robotstxt() in mymodule.module.
  return [
    'Disallow: /foo',
    'Disallow: /bar',
  ];
}
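Note that the robots.txt output may be cached, so if your new directives do not show up right away, try clearing Drupal's caches, for example with Drush (assuming it is installed):

drush cr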
