Market Share Statistics for Internet Technologies

Bot, Hidden, and Invalid Traffic

To measure market share accurately, bot, hidden, and invalid sessions must be removed from the data. NetMarketShare reporting detects and removes visitor sessions that are determined to be hidden or invalid. A large and growing percentage of visitor sessions is detected as invalid. We use a variety of methods to determine whether a visitor session is valid.

Data Breakdown (November 2017)

Category                            Percent of Sessions
Valid sessions                      75.4%
Hidden (see below for more info)    11.6%
Block list                          4.1%
Colocation                          3.4%
Spoofed user agent                  2.7%
Known bot                           1.3%
Other proprietary methods           the remainder
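
The share attributed to other proprietary methods is given only as the remainder. As a rough check, it can be derived from the listed figures. This sketch assumes the categories are mutually exclusive and together account for 100% of sessions, which the table implies but does not state.

```python
# Percentages copied from the November 2017 breakdown above.
breakdown = {
    "Valid sessions": 75.4,
    "Hidden": 11.6,
    "Block list": 4.1,
    "Colocation": 3.4,
    "Spoofed user agent": 2.7,
    "Known bot": 1.3,
}

# Assumption: categories do not overlap and sum to 100% with the remainder.
listed_total = round(sum(breakdown.values()), 1)
other_proprietary = round(100.0 - listed_total, 1)
removed_total = round(100.0 - breakdown["Valid sessions"], 1)

print(f"Listed categories: {listed_total}%")                  # 98.5%
print(f"Other proprietary methods: {other_proprietary}%")     # 1.5%
print(f"Sessions removed in total: {removed_total}%")         # 24.6%
```

Under that assumption, roughly one session in four is excluded before market share is calculated.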

Detection Methods

Method: Description
Spoofed User Agents: Bots often rotate their user agents to appear to be more than one device and to generate realistic-looking traffic. We have developed technology that matches the user agent to the browser's capabilities and detects sessions that have altered their user agent.
Block List: We check every IP address against our database of known infected machines. This detects machines that have been hijacked as spambots, as well as machines that are infected with viruses and generate large amounts of automated traffic and clicks. The database is maintained in real time in order to detect emerging sources.
Data Center Origin: We maintain a database of data center IP address ranges, since many bot networks use data centers in other countries to proxy traffic. A session from within, for example, an Amazon AWS data center address block is unlikely to be a human.
Public Web Proxies: Public web proxies are used to proxy traffic in the same way as data centers. We maintain a database of public web proxies in order to exclude sessions from them.
Invalid Searches: Bots often create fake referrer headers to appear to come from a search engine. In many cases, these headers differ from real search engine referrer structures.
Other Proprietary Methods: We have developed many other methods for detecting fraudulent sessions, and this continues to be a primary focus of our research efforts due to the magnitude of the problem.
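
As an illustration only, three of the checks described above (block list, data center origin, and invalid searches) could be sketched as follows. This is not NetMarketShare's implementation. The function name `classify_session`, the sample addresses (taken from documentation-reserved ranges), the host `search.example`, and the rule that a genuine search referrer must carry a `q` parameter are all hypothetical stand-ins.

```python
import ipaddress
from urllib.parse import parse_qs, urlparse

# Hypothetical sample data; real lists would be far larger and updated continuously.
BLOCK_LIST = {ipaddress.ip_address("203.0.113.7")}
DATA_CENTER_RANGES = [ipaddress.ip_network("198.51.100.0/24")]
# Hypothetical rule: referrers from this host must carry the named query parameter.
SEARCH_QUERY_PARAMS = {"search.example": "q"}


def classify_session(ip: str, referrer: str = "") -> str:
    """Return the first check a session fails, or "valid" if it passes all."""
    addr = ipaddress.ip_address(ip)

    # Block list: exact match against known infected machines.
    if addr in BLOCK_LIST:
        return "block list"

    # Data center origin: address falls inside a known data center range.
    if any(addr in network for network in DATA_CENTER_RANGES):
        return "data center"

    # Invalid searches: referrer claims a search engine but lacks its structure.
    if referrer:
        parsed = urlparse(referrer)
        param = SEARCH_QUERY_PARAMS.get(parsed.hostname or "")
        if param is not None and param not in parse_qs(parsed.query):
            return "invalid search"

    return "valid"


print(classify_session("203.0.113.7"))                                  # block list
print(classify_session("198.51.100.23"))                                # data center
print(classify_session("192.0.2.10", "https://search.example/results")) # invalid search
```

The checks run in a fixed order and stop at the first failure, so each session lands in exactly one category, in line with the single-category breakdown reported earlier. Whether the production system assigns categories this way is an assumption.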

Hidden Session Detection

A significant percentage of web pages loaded are never visible.

For a variety of reasons, pages downloaded from the web are often not visible on the user's device. This can skew the usage share data, since the number of hidden pages varies by browser and platform. Some reasons for this are:

Reason: Description
Preloading: Search engines will preload pages in the background while a user types in a search query. The search engine attempts to predict which link or links the user will click on and loads the pages from those links. This improves the performance of web browsing; however, many of the preloaded pages are never made visible and should not be counted in usage share statistics.
Browser Window Hidden: This occurs when a browser window is behind another window.
Background Browser Tabs: A browser tab can be launched in the background and load pages. These pages are never visible unless the user opens the tab.
Bots: A large and growing amount of website traffic is generated by bots with the intention of committing ad fraud. A significant portion of our data collection analysis involves removing this traffic from our usage share statistics. Even if we cannot detect that the session was generated by a bot, the page will often never be visible.
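
To see why hidden pages can skew usage share when their proportion differs by browser, consider the following minimal sketch. The records and the names `page_views` and `usage_share` are hypothetical. It assumes each page view is tagged with a visibility state such as the `visible` or `hidden` value that browsers expose through the Page Visibility API (`document.visibilityState`). How NetMarketShare actually collects this signal is not described here.

```python
from collections import Counter

# Hypothetical page-view records: (browser, visibility state reported by the page).
page_views = [
    ("Browser A", "visible"),
    ("Browser A", "hidden"),
    ("Browser A", "hidden"),
    ("Browser B", "visible"),
    ("Browser B", "visible"),
    ("Browser B", "hidden"),
]


def usage_share(views):
    """Return each browser's percentage of the given page views."""
    counts = Counter(browser for browser, _ in views)
    total = sum(counts.values())
    return {browser: round(100 * n / total, 1) for browser, n in counts.items()}


raw_share = usage_share(page_views)
visible_share = usage_share([view for view in page_views if view[1] == "visible"])

print(raw_share)      # {'Browser A': 50.0, 'Browser B': 50.0}
print(visible_share)  # {'Browser A': 33.3, 'Browser B': 66.7}
```

In this made-up sample the two browsers look evenly matched when every page load is counted, but Browser B has twice the share once hidden pages are excluded.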