
User Agent Blocking

In this article, we’ll show you how to block user agents and explain the technical basics behind it.

Written by Kerstin Kegel

When does User Agent Blocking make sense?

Aggressive crawling or bot traffic can lead to high CPU usage and performance issues.

If requests are coming from many different IP addresses, traditional IP blocking is often not enough. In these cases, you can identify and block bots more effectively based on their user agent.

💡 What is a User Agent?

A user agent is part of the HTTP headers sent by a client to a server. It contains information about the browser and its version, the operating system, or the identity of a bot.

Example of a browser user agent:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36

Example of a bot:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

🔗 For more technical details about the syntax, check the official Mozilla documentation.
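To make this concrete: the user agent is simply a header the client itself chooses to send with each request. Here is a minimal Python sketch (standard library only; the URL and bot name are placeholders) that sends a request with a custom user agent:

import urllib.request

# The User-Agent is an ordinary HTTP request header.
# The client sets it itself and can choose any value.
req = urllib.request.Request(
    "https://www.example.com/",  # placeholder URL
    headers={"User-Agent": "MyTestBot/1.0 (+https://www.example.com/bot-info)"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # e.g. 200 if the server accepts the request

Because the client controls this value, a user agent can be spoofed. User agent blocking therefore stops honestly identified bots; it is not a defense against clients that disguise themselves.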


Blocking a User Agent

Here’s how to do it:

Step 1: Open the feature

a) Open the relevant Box and go to Box Settings

b) Select Security and navigate to User Agent Blocking

c) Click the grey 'New Entry' button

Step 2: Fill in the fields

In the modal window, enter the required values:

a) User Agent Name

An internal name to help you recognize the rule later.
This name does not affect how the rule works.

💡 Tip:
As a best practice, use a clear naming structure:

[Bot or crawler name] – [purpose or reason]

Examples:

  • AhrefsBot – high crawl volume

  • SemrushBot – performance impact

  • GPTBot – unwanted AI crawling

  • MJ12bot – excessive requests

b) User Agent String

This is the string used to identify the bot you want to block.

You can find the correct value in your Box’s access logs.
The user agent appears at the end of each log line (usually in quotes or brackets, depending on the format).

Example log entry

66.249.66.1 - - [18/Feb/2026:12:34:56 +0000] "GET / HTTP/1.1" 200 5321 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"

In this example, the User Agent String would be: DotBot
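If you prefer to extract the user agent from your logs programmatically, a small Python sketch like the following works for the combined log format shown above (it assumes the user agent is the last quoted field, which may differ in other log formats):

import re

log_line = (
    '66.249.66.1 - - [18/Feb/2026:12:34:56 +0000] "GET / HTTP/1.1" 200 5321 '
    '"-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"'
)

# The user agent is the last quoted field in the combined log format.
match = re.search(r'"([^"]*)"\s*$', log_line)
if match:
    print(match.group(1))  # Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)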

Common bot examples from real-world usage:

BLEXBot, SeznamBot, YandexBot, PetalBot, Bytespider, ZoominfoBot

(Before blocking a bot, always check whether it is relevant for your website.)

c) Match Mode

Exact Match: the string must match the entire user agent exactly

Best if you want to block one very specific bot
Low risk, but rarely needed in practice

Contains Match: the string only needs to appear somewhere in the user agent

Best for clearly identifiable bot names like AhrefsBot or DotBot

⚠️ Avoid using generic terms like:

Mozilla, Chrome, Safari, AppleWebKit, Gecko, Windows, Linux

These are used by real browsers and would block legitimate visitors.
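Conceptually, the two match modes behave like the following Python sketch. This is an illustration of the matching logic, not the feature's actual implementation, and details such as case sensitivity may differ:

def is_blocked(user_agent: str, pattern: str, mode: str) -> bool:
    if mode == "exact":
        # Exact Match: the whole user agent must equal the pattern
        return user_agent == pattern
    if mode == "contains":
        # Contains Match: the pattern may appear anywhere in the user agent
        return pattern in user_agent
    raise ValueError(f"unknown mode: {mode}")

ua = "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
print(is_blocked(ua, "DotBot", "contains"))   # True
print(is_blocked(ua, "DotBot", "exact"))      # False: the full string is longer
print(is_blocked(ua, "Mozilla", "contains"))  # True, which is why generic terms are risky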


Step 3: Save the rule

a) Click Save to apply the rule

The modal will close, and you’ll see an overview of all existing entries.

From there, you can:

  • create new entries

  • edit existing rules

  • delete rules

ℹ️ New or updated rules usually take effect after about 1 minute.


Before blocking a user agent

Not every bot is a problem. Some crawlers are essential for visibility and SEO.

Only block user agents if at least one of the following applies:

  • A very high number of requests within a short period of time

  • Noticeable CPU load or performance issues

  • Several thousand requests per hour

  • Access to many irrelevant URLs

  • Repeated crawling of the same pages

If there are no performance issues, blocking is usually not necessary.



Which bots you should usually not block

The following bots are generally recommended and should not be blocked:

  • Googlebot

  • bingbot

  • AdsBot-Google

  • Applebot

Blocking these bots can negatively impact your SEO and search engine indexing.


How to identify problematic crawling

Check your Access Logs for the following indicators:

  • Number of requests per IP or user agent

  • Request frequency

  • Response times

  • Status codes

If a bot is only generating a small number of requests per hour, there is usually no need to take action.
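One practical way to get these numbers is to count requests per user agent directly from the log file. A rough Python sketch, again assuming the combined log format with the user agent as the last quoted field (adjust the file path to your setup):

import re
from collections import Counter

counts = Counter()
with open("access.log") as f:  # placeholder path to your access log
    for line in f:
        match = re.search(r'"([^"]*)"\s*$', line)
        if match:
            counts[match.group(1)] += 1

# Print the ten busiest user agents with their request counts.
for user_agent, n in counts.most_common(10):
    print(f"{n:8d}  {user_agent}")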


Temporary blocking during high server load

In exceptional cases, it can make sense to temporarily block bots that are normally desirable (e.g. search engines or SEO crawlers).

For example:

  • During a traffic spike

  • In case of high CPU load

  • During migrations or maintenance

  • When the site is already unstable

⚠️ Important:
Permanently blocking search engines can negatively impact your SEO and indexing.

However, temporary blocking can help stabilize your site in the short term.

Once the load has decreased, you should remove the rule again.


AI Bots - opportunities and risks

In recent years, so-called AI bots have increasingly started accessing websites.
Examples include:

  • GPTBot

  • ClaudeBot

  • PerplexityBot

  • Google-Extended

  • Bytespider

These bots are used to train AI models or to power AI-driven search and discovery systems.

Potential downsides

  • High crawling frequency

  • Access to many pages in a short period of time

  • Increased CPU load

  • Unwanted reuse of your content

Potential benefits

  • Visibility in AI-powered search systems

  • Potential additional traffic

  • Relevance in future search environments

Whether to block AI bots is a deliberate strategic decision.

There is no clear “right” or “wrong” here.

Consider the following:

  • Do you want your content to appear in AI systems?

  • Is the bot causing measurable performance issues?

  • Is your priority maximum reach, or protecting your content?

As with any other bot: if there are no performance issues, blocking is usually not necessary.


What happens when a user agent is blocked?

Blocked requests receive the HTTP status code 403 (Forbidden).

This means the server understood the request but refuses to fulfill it.
The bot or client is not allowed to access the requested resource.

Because the request is rejected early:

  • no PHP processes are started

  • the WordPress instance is not loaded

  • no database queries are executed

As a result, server load is significantly reduced compared to normal page requests.
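You can verify an active rule by sending a request with the blocked user agent yourself. A sketch using Python's standard library (replace the URL and user agent with your own values):

import urllib.request
from urllib.error import HTTPError

req = urllib.request.Request(
    "https://www.example.com/",        # placeholder: your site
    headers={"User-Agent": "DotBot"},  # placeholder: the blocked user agent
)
try:
    with urllib.request.urlopen(req) as resp:
        print("Not blocked, status:", resp.status)
except HTTPError as e:
    print("Blocked, status:", e.code)  # expect 403 if the rule matches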


Difference between robots.txt and User Agent Blocking

The robots.txt file allows you to give instructions to search engines and bots about which parts of your website may be crawled.

However, these instructions are voluntary.
Reputable search engines follow them, but many bots simply ignore them.
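To illustrate the voluntary nature: a well-behaved crawler checks robots.txt before fetching a page, as in this Python sketch, but nothing technically forces a bot to perform this check:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder URL
rp.read()  # fetch and parse the robots.txt file

# A compliant crawler asks before fetching; an abusive bot simply skips this step.
print(rp.can_fetch("DotBot", "https://www.example.com/private/page"))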

User Agent Blocking, on the other hand, is enforced at the server level.
Requests from blocked user agents are immediately rejected with a 403 (Forbidden) status code.

Key difference:

  • robots.txt - controls crawling behavior

  • User Agent Blocking - completely blocks access

When to use which?

  • Use robots.txt if you want to limit crawling of specific areas

  • Use User Agent Blocking if a bot ignores robots.txt and causes noticeable load


