When does User Agent Blocking make sense?
Aggressive crawling or bot traffic can lead to high CPU usage and performance issues.
If requests are coming from many different IP addresses, traditional IP blocking is often not enough. In these cases, you can identify and block bots more effectively based on their user agent.
💡 What is a User Agent?
A user agent is part of the HTTP headers a client sends to a server. It identifies the client, for example the browser name and version, the operating system, or the identity of a bot.
Example of a browser user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Example of a bot:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
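Inside an HTTP request, the user agent is sent as an ordinary header line. A minimal request (with example.com as a placeholder host) looks like this:

```
GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
```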
🔗 For more technical details about the syntax, check the official Mozilla documentation.
Blocking a User Agent
Here’s how to do it:
Step 1: Open the feature
a) Open the relevant Box and go to Box Settings
b) Select Security and navigate to User Agent Blocking
c) Click the grey 'New Entry' button
Step 2: Fill in the fields
In the modal window, enter the required values:
a) User Agent Name
An internal name to help you recognize the rule later.
This name does not affect how the rule works.
💡 Tip:
As a best practice, use a clear naming structure:
[Bot or crawler name] – [purpose or reason]
Examples:
AhrefsBot – high crawl volume
SemrushBot – performance impact
GPTBot – unwanted AI crawling
MJ12bot – excessive requests
b) User Agent String
This is the string used to identify the bot you want to block.
You can find the correct value in your Box’s access logs.
The user agent appears at the end of each log line (usually in quotes or brackets, depending on the format).
Example log entry:
66.249.66.1 - - [18/Feb/2026:12:34:56 +0000] "GET / HTTP/1.1" 200 5321 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
In this example, the User Agent String would be: DotBot
Common bot examples from real-world usage:
BLEXBot, SeznamBot, YandexBot, PetalBot, Bytespider, ZoominfoBot
(Before blocking a bot, always check whether it is relevant for your website.)
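If you have shell access, you can count requests per user agent directly from a log file. The log lines and the file name sample_access.log below are made up for illustration; point the command at your Box's real access log instead.

```shell
# Build a small sample access log (hypothetical entries in combined log format)
cat > sample_access.log <<'EOF'
66.249.66.1 - - [18/Feb/2026:12:34:56 +0000] "GET / HTTP/1.1" 200 5321 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
66.249.66.1 - - [18/Feb/2026:12:35:01 +0000] "GET /blog HTTP/1.1" 200 4100 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
203.0.113.7 - - [18/Feb/2026:12:35:05 +0000] "GET / HTTP/1.1" 200 5321 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36"
EOF

# The user agent is the last double-quoted field on each line;
# split on '"' and count how often each value occurs.
awk -F'"' '{ counts[$(NF-1)]++ } END { for (ua in counts) print counts[ua], ua }' sample_access.log | sort -rn
```

A bot that dominates this list with thousands of entries is a candidate for blocking; its distinctive name (here DotBot) is the value to enter as the User Agent String.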
c) Match Mode
Exact Match: the rule matches only if the string is identical to the complete user agent
Best if you want to block one very specific bot
Low risk, but rarely practical, since full user agent strings often change between versions
Contains Match: the rule matches if the string appears anywhere in the user agent
Best for clearly identifiable bot names like AhrefsBot or DotBot
⚠️ Avoid using generic terms like:
Mozilla, Chrome, Safari, AppleWebKit, Gecko, Windows, Linux
These are used by real browsers and would block legitimate visitors.
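You can check how safe or dangerous a Contains Match string is before saving a rule. A small sketch (the two user agents are examples):

```shell
# One bot and one real browser, written to a scratch file
printf '%s\n' \
  'Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)' \
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36' \
  > user_agents.txt

# A specific bot name only matches the bot line:
grep -c 'AhrefsBot' user_agents.txt   # prints 1

# A generic token matches the real browser too and would block visitors:
grep -c 'Mozilla' user_agents.txt     # prints 2
```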
Step 3: Save the rule
a) Click Save to apply the rule
The modal will close, and you’ll see an overview of all existing entries.
From there, you can:
create new entries
edit existing rules
delete rules
ℹ️ New or updated rules usually take effect after about 1 minute.
Before blocking a user agent
Not every bot is a problem. Some crawlers are essential for visibility and SEO.
Only block user agents if at least one of the following applies:
A very high number of requests within a short period of time
Noticeable CPU load or performance issues
Several thousand requests per hour
Access to many irrelevant URLs
Repeated crawling of the same pages
If there are no performance issues, blocking is usually not necessary.
Which bots you should usually not block
The following bots are generally recommended and should not be blocked:
Googlebot
bingbot
AdsBot-Google
Applebot
Blocking these bots can negatively impact your SEO and search engine indexing.
How to identify problematic crawling
Check your Access Logs for the following indicators:
Number of requests per IP or user agent
Request frequency
Response times
Status codes
If a bot is only generating a small number of requests per hour, there is usually no need to take action.
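These indicators can be pulled out of a log with standard tools. The sample entries below are invented; run the same commands on your real access log.

```shell
# Sample access log (hypothetical entries)
cat > sample_access.log <<'EOF'
66.249.66.1 - - [18/Feb/2026:12:34:56 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
66.249.66.1 - - [18/Feb/2026:12:35:01 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
66.249.66.1 - - [18/Feb/2026:13:02:10 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)"
203.0.113.7 - - [18/Feb/2026:12:35:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36"
EOF

# Requests per client IP (the first field of each line):
awk '{ print $1 }' sample_access.log | sort | uniq -c | sort -rn

# Request frequency: hits per hour for one user agent (here "DotBot");
# the hour is the second colon-separated field, from the timestamp.
grep 'DotBot' sample_access.log | cut -d: -f2 | sort | uniq -c
```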
Temporary blocking during high server load
In exceptional cases, it can make sense to temporarily block bots that are normally desirable (e.g. search engines or SEO crawlers).
For example:
During a traffic spike
In case of high CPU load
During migrations or maintenance
When the site is already unstable
⚠️ Important:
Permanently blocking search engines can negatively impact your SEO and indexing.
However, temporary blocking can help stabilize your site in the short term.
Once the load has decreased, you should remove the rule again.
AI Bots - opportunities and risks
In recent years, so-called AI bots have increasingly started accessing websites.
Examples include:
GPTBot
ClaudeBot
PerplexityBot
Google-Extended
Bytespider
These bots are used to train AI models or to power AI-driven search and discovery systems.
Potential downsides
High crawling frequency
Access to many pages in a short period of time
Increased CPU load
Unwanted reuse of your content
Potential benefits
Visibility in AI-powered search systems
Potential additional traffic
Relevance in future search environments
Whether to block AI bots is a deliberate strategic decision.
There is no clear “right” or “wrong” here.
Consider the following:
Do you want your content to appear in AI systems?
Is the bot causing measurable performance issues?
Is your priority maximum reach, or protecting your content?
If there are no performance issues, blocking is usually not necessary.
What happens when a user agent is blocked?
Blocked requests receive the HTTP status code 403 (Forbidden).
This means the server recognizes the request but actively denies access.
The bot or client is not allowed to access the requested resource.
Because the request is rejected early:
no PHP processes are started
the WordPress instance is not loaded
no database queries are executed
As a result, server load is significantly reduced compared to normal page requests.
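Conceptually, the check works like a guard that runs before any application code. The following sketch illustrates that logic; it is not the actual server implementation:

```shell
# Hypothetical list of blocked patterns (contains match)
blocked='DotBot AhrefsBot'

handle_request() {
  ua="$1"
  for pattern in $blocked; do
    case "$ua" in
      *"$pattern"*)
        # Rejected early: no PHP process, no WordPress, no database query
        echo '403 Forbidden'
        return
        ;;
    esac
  done
  echo '200 OK'
}

handle_request 'Mozilla/5.0 (compatible; DotBot/1.2; +https://moz.com/dotbot)'   # prints 403 Forbidden
handle_request 'Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36'      # prints 200 OK
```

On a live site you can verify a rule the same way with curl -A 'DotBot' -I https://your-domain.example and check for a 403 response.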
Difference between robots.txt and User Agent Blocking
The robots.txt file allows you to give instructions to search engines and bots about which parts of your website may be crawled.
However, these instructions are voluntary.
Reputable search engines follow them, but many bots simply ignore them.
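For comparison, a robots.txt entry that asks one crawler to stay away might look like this (the bot name and path are placeholders):

```
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /internal/
```

Nothing enforces these rules; a bot that ignores the file can still request every URL.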
User Agent Blocking, on the other hand, is enforced at the server level.
Requests from blocked user agents are immediately rejected with a 403 (Forbidden) status code.
Key difference:
robots.txt - controls crawling behavior
User Agent Blocking - completely blocks access
When to use which?
Use robots.txt if you want to limit crawling of specific areas
Use User Agent Blocking if a bot ignores robots.txt and causes noticeable load