Bot, Hidden, and Invalid Traffic
To measure market share accurately, bot sessions, hidden sessions, and invalid sessions must be removed from the data. NetMarketShare reporting detects and
removes visitor sessions that are determined to be either hidden or invalid. A large and growing percentage of visitor sessions
are detected as invalid. We use a variety of methods to determine whether a visitor session is valid.
Data Breakdown (November 2017)
Category | Percent of Sessions |
Valid sessions | 75.4% |
Hidden (see below for more info) | 11.6% |
Block list | 4.1% |
Colocation | 3.4% |
Spoofed user agent | 2.7% |
Known bot | 1.3% |
Other proprietary methods | the remaining |
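The last row of the table is not quantified. As a quick arithmetic check, the remainder can be derived from the listed figures. The sketch below uses only the percentages shown above and assumes they are rounded shares of the same total, so the result is approximate.

```python
from decimal import Decimal

# Percentages copied from the November 2017 table above.
listed_shares = {
    "Valid sessions": Decimal("75.4"),
    "Hidden": Decimal("11.6"),
    "Block list": Decimal("4.1"),
    "Colocation": Decimal("3.4"),
    "Spoofed user agent": Decimal("2.7"),
    "Known bot": Decimal("1.3"),
}

listed_total = sum(listed_shares.values(), Decimal("0"))

# Whatever is left over is attributed to "Other proprietary methods".
other_methods = Decimal("100") - listed_total

# Everything except valid sessions is removed from the reported data.
removed_total = Decimal("100") - listed_shares["Valid sessions"]

print(f"Listed categories: {listed_total}%")
print(f"Other proprietary methods: {other_methods}%")
print(f"Removed sessions: {removed_total}%")
```

With these figures, the listed categories sum to 98.5%, leaving about 1.5% for other proprietary methods, and 24.6% of all sessions are removed in total.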
Detection Methods
Method | Description |
Spoofed User Agents | Bots often rotate their user agents to appear to be more than one device and to generate realistic-looking traffic. We have developed technology that matches the user agent to the browser's capabilities and detects sessions that have altered their user agent. |
Block List | We check every IP address against our database of known infected machines. This detects machines that have been hijacked as spambots as well as machines that are infected with viruses and generate large amounts of automated traffic and clicks. This database is maintained in real time in order to detect emerging sources. |
Data Center Origin | We maintain a database of data center IP address ranges, since many bot networks use data centers in other countries to proxy traffic. A session from within, for example, an Amazon AWS data center address block is unlikely to be a human. |
Public Web Proxies | Public web proxies are used to proxy traffic in much the same way as data centers. We maintain a database of public web proxies in order to exclude sessions from them. |
Invalid Searches | Bots often create fake referrer headers to appear to come from a search engine. In many cases, these headers differ from real search engine referrer structures. |
Other Proprietary Methods | We have developed many other methods for detecting fraudulent sessions, and this continues to be a primary focus of our research efforts because of the magnitude of the problem. |
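The IP-based checks and the referrer check in the table above can be illustrated with a minimal sketch. This is not NetMarketShare's implementation: the lists, the `classify_ip` and `is_plausible_search_referrer` helpers, the `search.example.com` host, and its assumed `/search?q=...` referrer structure are all hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress
from urllib.parse import urlparse, parse_qs

# Hypothetical sample data; real lists would be far larger and updated continuously.
BLOCKED_IPS = {ipaddress.ip_address("192.0.2.15")}
DATA_CENTER_RANGES = [ipaddress.ip_network("198.51.100.0/24")]
PUBLIC_PROXY_IPS = {ipaddress.ip_address("203.0.113.7")}

# Assumption: a made-up search engine whose genuine referrers look like
# https://search.example.com/search?q=<terms>
SEARCH_ENGINES = {"search.example.com": ("/search", "q")}


def classify_ip(raw_ip: str) -> str:
    """Return the name of the first IP-based rule that matches, or "ok"."""
    ip = ipaddress.ip_address(raw_ip)
    if ip in BLOCKED_IPS:
        return "block_list"
    if any(ip in network for network in DATA_CENTER_RANGES):
        return "data_center"
    if ip in PUBLIC_PROXY_IPS:
        return "public_proxy"
    return "ok"


def is_plausible_search_referrer(referrer: str) -> bool:
    """Check that a referrer from a known search host has the expected shape."""
    parsed = urlparse(referrer)
    expected = SEARCH_ENGINES.get(parsed.hostname or "")
    if expected is None:
        return True  # Not a listed search engine, so this rule does not apply.
    expected_path, query_param = expected
    has_query = query_param in parse_qs(parsed.query)
    return parsed.path == expected_path and has_query


print(classify_ip("198.51.100.23"))
print(is_plausible_search_referrer("https://search.example.com/?ref=1"))
```

Checking the block list before the data center ranges is an arbitrary ordering chosen for the example; a session matching any one of the rules would be excluded either way.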
Hidden Session Detection
A significant percentage of web pages loaded are never visible.
For a variety of reasons, pages downloaded from the web are often never visible on the user's device. This can skew usage share data because the
number of hidden pages varies by browser and platform. Some of the reasons are:
Reason | Description |
Preloading | Search engines will preload pages in the background while a user types in a search query. The search engine attempts to predict which link or links the user will click on and loads the pages from those links. This improves the performance of web browsing; however, many of the preloaded pages are never made visible and should not be counted in usage share statistics. |
Browser Window Hidden | This occurs when a browser window is behind another window. |
Background Browser Tabs | A browser tab can be launched in the background and load pages. These pages are never visible unless the user opens the tab. |
Bots | A large and growing amount of website traffic is generated by bots with the intention of committing ad fraud. A significant portion of our data collection analysis involves removing this traffic from our usage share statistics. Even if we cannot detect that the session was generated by a bot, the page will often never be visible. |
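To see why hidden sessions skew usage share when their rate differs by browser, consider the following sketch. The `Session` record, its `was_visible` flag, and the sample counts are all made up for illustration; how visibility is actually detected is not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Session:
    browser: str
    # Hypothetical flag: True if any page in the session was ever
    # visible to the user.
    was_visible: bool


def usage_share(sessions):
    """Return each browser's share of the given sessions, in percent."""
    counts = Counter(s.browser for s in sessions)
    total = sum(counts.values())
    return {browser: 100 * n / total for browser, n in counts.items()}


# Made-up sample data, chosen only to show the skew.
sessions = (
    [Session("Browser A", True)] * 6
    + [Session("Browser A", False)] * 4  # e.g. preloaded pages never shown
    + [Session("Browser B", True)] * 9
    + [Session("Browser B", False)] * 1
)

raw = usage_share(sessions)
visible_only = usage_share([s for s in sessions if s.was_visible])

print("All sessions:    ", raw)
print("Visible sessions:", visible_only)
```

In this example both browsers appear tied at 50% when every session is counted, but once hidden sessions are removed the split becomes 40% to 60%, because Browser A produces more hidden page loads.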