
User Agent Blocking

With user agent blocking at Raidboxes you can deny access to your site for bots and crawlers. This article explains how to use the feature.

Written by Niko Baumgartl
Updated over a year ago

Every request to your site carries a specific user agent. However, not all traffic to your site is wanted or even appropriate. It can therefore make sense to block certain user agents from accessing your website entirely, especially when their requests originate from a wide range of IP addresses or from changing (floating) IP addresses.



What is a user agent?

The user agent is part of the HTTP header that is sent by the client while connecting to your site. The user agent contains information about the browser that's being used, its version and the user's operating system.

Examples:

General Syntax:
User-Agent: <product> / <product-version> <comment>

Syntax of a browser user agent:
User-Agent: Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>

Example browser user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36

Example BOT User Agent:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)


You can find detailed information on user agent syntax and further examples in the MDN Web Docs.
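To see how a client attaches a user agent to a request, you can build a request object and inspect its headers. A minimal Python sketch (the URL and the user agent string are placeholders, not real values):

```python
import urllib.request

# Build a request with an explicit User-Agent header (placeholder values).
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "MyCrawler/1.0 (+https://example.com/bot)"},
)

# The header is stored on the request object; no network call is made here.
print(req.get_header("User-agent"))
```

If no User-Agent header is set explicitly, the client library sends its own default, which is how most bots and tools identify themselves.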

Why block certain user agents?

Blocking certain user agents is especially helpful when high CPU loads occur due to aggressive crawling by specific bots. You can then identify those bots by their user agent and set up rules that prevent them from accessing your site.

Most bots and crawlers don't pose a threat to your site, however. There are even bots that are crucial for your website's success, like the Googlebot.

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)


Sometimes these bots generate an unusually high number of requests and load by crawling all of your site's URLs at high frequency. Most often this happens when certain SEO tools crawl your site.

You can also forbid bots from crawling your site, or parts of it, entirely by setting up rules in the robots.txt file. Keep in mind that these rules are only soft directives and are not followed by all bots and crawlers.
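For example, a robots.txt entry that asks one specific crawler to stay away from the whole site might look like this (the bot name is a placeholder; cooperative bots honor such rules, others may ignore them):

```
User-agent: ExampleBot
Disallow: /
```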

Another way to block access to your site is our IP Blocking feature. In the case of bots and crawlers, it might prove difficult to effectively block them by IP because their requests commonly originate from a lot of different IP addresses.

How to block user agents

To block specific user agents, follow these steps:

  • Navigate to the Settings page of your Box

  • Click the menu item Security

  • Click on the submenu item User Agent Blocking

  • Click the NEW ENTRY button

  • A modal will pop up where you can specify how to block the user agent

  • USER AGENT NAME: Enter a name for the blocking rule that you'll recognize later. This name has no technical implications and is only for internal purposes. Example: Google Bot Mobile

  • STRING: Enter the string of the user agent you want to block.

  • MATCH MODE: Select the match mode you want to be applied to the specified string.

  • String contains the complete user agent string: Select this method if your string contains the complete user agent. With this setting, only user agents with this exact string will be blocked. (so-called exact match)


    Example:

    Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

  • String contains just a part of the user agent string: Select this method if your string contains just a part of the user agent identifier you want to block. With this setting, all user agents that contain your specified string will be blocked. (so-called contains match)

    Example:

    Googlebot

  • When it's all set up, click the SAVE button

  • The modal will close and you'll now find a list of all of your blocked user agents

  • Click CREATE ENTRY if you want to block additional user agents

  • Click the pencil icon if you want to edit an existing entry

  • Click the trash icon if you want to delete an existing entry


Please note that each new entry and each change will take approximately one minute to take effect on your box.
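The two match modes described above behave like an exact string comparison and a substring check, respectively. A minimal Python sketch of that logic (this only illustrates the matching behavior; it is not Raidboxes' actual implementation):

```python
def is_blocked(user_agent: str, rule_string: str, exact: bool) -> bool:
    """Case-insensitive match, mirroring the two match modes described above."""
    ua = user_agent.lower()
    rule = rule_string.lower()
    if exact:
        # "String contains the complete user agent string" (exact match)
        return ua == rule
    # "String contains just a part of the user agent string" (contains match)
    return rule in ua

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_blocked(googlebot, "Googlebot", exact=False))  # contains match hits
print(is_blocked(googlebot, "Googlebot", exact=True))   # exact match misses
```

This is also why a contains-match rule for a short string like "Mozilla" is dangerous: it matches almost every browser.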


FAQ - Frequently asked questions about user agent blocking

Is it possible to check which user agents are currently accessing my site?

You can check the user agents of each request in the access logs of your box. If you need help identifying the number of requests generated by each bot, please contact our support team. We can provide a bot traffic analysis for the current and previous day.
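If you prefer to analyze a downloaded access log yourself, counting requests per user agent is straightforward. A Python sketch, assuming the common combined log format where the user agent is the last quoted field (the sample lines below are made up; in practice you would read your box's actual log file):

```python
from collections import Counter

# Made-up sample lines in combined log format.
log_lines = [
    '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.9 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# The user agent is the last double-quoted field on each line.
agents = Counter(line.rsplit('"', 2)[1] for line in log_lines)
for agent, count in agents.most_common():
    print(count, agent)
```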

What is the response of the NGINX server if a blocked user agent tries to access my website?
The server will respond with a 403 status code.

Is it possible to see how many and which user agents are blocked per day?
You can search for requests with a 403 status code response in your box's access logs.
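To count blocked requests, filter the log for the 403 status code, which in the combined log format is the field right after the quoted request line. A Python sketch with made-up sample lines:

```python
# Made-up sample access-log lines; the status code follows the quoted request.
log_lines = [
    '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET /page HTTP/1.1" 403 162 "-" "BadBot/1.0"',
]

# Keep only requests that were answered with 403 (i.e. blocked).
blocked = [line for line in log_lines if line.split('"')[2].split()[0] == "403"]
print(len(blocked))
```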

Is it possible to upload a file with the user agent strings that should be blocked?
We currently don't support the import of user agent lists.

How long does it take for a new rule to take effect?
It will take from a couple of seconds to about 1-2 minutes to get the new rule up and running. On average, it takes around one minute.

What should I consider when setting up new rules?
Don't block browser user agents with a contains match, e.g. Mozilla, AppleWebKit, Chrome, Chromium, Gecko. Also don't block operating system identifiers with a contains match, e.g. Macintosh, iPhone, Windows NT, Linux, iPad, X11, Android. This would most probably block a lot of your website's normal traffic.

Is there a difference if the user agent string is lower case or upper case?
No, both modes are case-insensitive.

Is it possible to add regular expressions (regex) in the "user agent string" field?
No, regular expressions are not supported.
