Ultimate Guide to Filtering Google Analytics Spam & Bot Traffic

Google Analytics referral spam and bot traffic are the latest in a long line of ways to spam large audiences over the Internet.

EDITED 12/15/2016

EDITED 7-24-15: Google has announced that they are looking into a global solution. Analytics Edge released an intuitive Advanced Segment for Google Analytics which implements most of what is discussed below.

If you are interested in learning more about referral spam, keep reading!

You’ve probably recently logged into your Google Analytics account and seen odd domains like darodar.com, buttons-for-website.com, Huffington Post, and countless others in your referral report.

Perhaps you have looked at your organic keyword reports and discovered “I love Italy”.

You may have noticed a spike in traffic to your site, either directly or indirectly. However, this tends to disappear after a while.

You may have seen visits to URLs that do not exist on your site (and are not hacked pages).

Spam Filtering is Important

Two types of spam can end up in your Analytics profile.

The first type is bots that never visit your site. We call them “ghost bots”. Ghost bots are spam in the same sense as email spam, comment spam, and flyers under your car windshield.

The second type is bots that do visit your site. These are the zombie bots discussed below. They produce analytics spam as a side effect of whatever else they were built to do.

To understand them and filter them out, you need to be able to tell the two apart, even though their effect on your data is almost the same.

Both can cause data skew and pollute website analytics. This could lead to poor interpretations or poor marketing decisions.

Analytics goes beyond counting visits. Analytics tells you the entire story of your online business.

Referral traffic can be skewed by bots. Bots can also depress your conversion rate because they never buy anything or send leads.

These are not things you can ignore or dismiss. You must address them without side effects, such as slowing down your website or excluding legitimate traffic (false positives) from your analytics.

Who Is Behind It, and Why Is It Happening?

I have previously written about comment spam, analytics spam, and the people behind them.

Ghost bots are run by people who want an almost-free way to reach an audience, or by annoying digital graffiti artists.

There are many types of zombie bots. They can be malicious or poorly designed.

Sometimes bots can create fraudulent advertising networks.

Either way, they leave spam behind in your analytics, and they will continue to do so.

What it does and how to fix it

There is no single solution that stops all bots (at least not without Google's help), but there are some things that you can do to improve your analytics.

An aside: do not use the Referral Exclusion list under the Property settings to filter spam.

  • It’s not what the feature is for.
  • It’s not accurate.
  • It can simply turn the spam session into a direct visit instead of removing it.
  • There is no good way to check it for false positives against historical data.

Many websites (even highly respected ones) recommend server-side technical changes such as .htaccess edits. Those do nothing against ghost bots, which never reach your server.

The Google Analytics checkbox for filtering known bots and spiders is ineffective against ghost bots and many other bots.

These are the steps to eliminate most analytics spam without losing data or filtering out legitimate traffic.

We will create a separate view so that you have clean data going forward. To clean up reports in your historical view, we’ll create an Advanced Segment.

Create a new view in Analytics. Keep your existing view 100% unfiltered so that you can always check for false positives.

Click on Admin and go to the View column. Next, click View Settings. Then, click Copy View.

Name it something like 2 – [www.yourwebsite.com] // Bot Exclusion View.

This is the view where we will filter bot traffic. Once it is set up, we’ll create an Advanced Segment that you can apply to the primary view.

Ghost Bot Filtering

Ghost referrals are sessions that appear in Analytics but never occurred. The bot creates them by firing hits at Analytics tracking IDs (UA codes), often chosen at random, without ever loading your site. In other words, your tracking code is being abused.

They don’t appear on your server, so they can’t be blocked or filtered there.

Because the spam domains change frequently, they cannot be reliably filtered by domain name either.

The answer is to filter by hostname. In your historical view, navigate to Audience > Technology > Network and choose Hostname as the primary dimension. Set the date range to at least the last year.

Hostname is the “full domain address of page requested”. Ghost bots usually can’t set this dimension correctly, because they fire at UA codes at random and never visit the sites behind them.

Open the hostname report in your historical view and set the date range as far back as you can. You should find visits on your own domain, translate.google.com, and maybe web.archive.org. If you have an ecommerce store, your payment processor’s domain will also appear. Any other hostname that does not serve your content is likely spam, particularly (not set).

Create a list of all valid hostnames. Next, create a regex that matches only those hostnames. An example would be:

yourwebsite\.com|translate\.google\.com|archive\.org

This regex matches our main domain and its subdomains, as well as our site when it is loaded through Google Translate or archive.org.
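Before saving a pattern like this as a filter, it can help to test it against the hostnames from your own report. The snippet below is only a rough sketch: the sample hostnames are made up, and it assumes Analytics treats the filter pattern as a partial-match regex, which is why it uses `search()` rather than a full match. The dots are escaped so that `.` only matches a literal dot.

```python
import re

# Hypothetical pattern: replace the names with the valid hostnames from your own report.
VALID_HOSTNAME = re.compile(r"yourwebsite\.com|translate\.google\.com|archive\.org")


def is_valid_hostname(hostname: str) -> bool:
    """Return True if the hostname would pass an include-only hostname filter."""
    # Assumption: the filter matches anywhere in the value (partial match).
    return VALID_HOSTNAME.search(hostname) is not None


# Made-up sample of values as they might appear in a hostname report.
sample = [
    "www.yourwebsite.com",
    "shop.yourwebsite.com",
    "web.archive.org",
    "(not set)",
    "darodar.com",
]

kept = [h for h in sample if is_valid_hostname(h)]
dropped = [h for h in sample if not is_valid_hostname(h)]
print("kept:", kept)
print("dropped:", dropped)
```

Anything that lands in the dropped list but is actually one of your own properties is a false positive, and the pattern needs to be widened before you use it.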

In Admin, open Filters for the Bot Exclusion View and add a custom filter.

Select Include, choose Hostname as the filter field, and enter your regex as the pattern.

Name the filter and save it.

This View filters ghost bots that do not set your domain name within the hostname dimension. It was nearly foolproof until November 2016.

It’s harder than it used to be.

Take care over how precise this filter is. Our example comes from November 2016, in one of our Bot Exclusion profiles, where our website is located on the www subdomain.

Note that you will need to update the hostname filter whenever you start serving content from a new subdomain (e.g. a new microsite or shopping cart).

Keep searching your Analytics for suspicious traffic. This round of spam used legitimate traffic sources but had very spam-like language footprints.

Filtering Zombie Bots

Zombie bots actually visit your site and load its pages, so you have more options. You can block them at your server, which reduces your server load and adds an extra layer of scrubbing.

Doing that safely takes technical knowledge if you want to keep your site up and running.

These are the steps that I used to filter out bots from analytics.

First, you need to find a common footprint. Start with the Network Domain dimension under Audience > Technology > Network. This report shows the ISP your visitors use to reach your site.

Most people will use familiar retail ISP brands such as Comcast or Verizon.

If you sort this report by Bounce Rate, a few entries should stand out. You can collect the domains that show no user engagement into a regex, such as:

amazon|google|msn|microsoft|automattic
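To get a feel for what an exclude filter built on this pattern would remove, you can run it over a few rows exported from the Network Domain report. This is a minimal sketch with invented rows and session counts. It assumes case-insensitive, partial matching, which is how I understand Analytics filters to behave by default.

```python
import re

# The network-domain pattern from above, matched case-insensitively (assumption).
BOT_NETWORKS = re.compile(r"amazon|google|msn|microsoft|automattic", re.IGNORECASE)


def looks_like_bot_network(network_domain: str) -> bool:
    """Return True if an exclude filter on this pattern would drop the row."""
    return BOT_NETWORKS.search(network_domain) is not None


# Invented (network domain, sessions) rows for illustration only.
rows = [
    ("comcast.net", 120),
    ("amazonaws.com", 45),
    ("verizon.net", 80),
    ("googleusercontent.com", 30),
]

human_rows = [(d, s) for d, s in rows if not looks_like_bot_network(d)]
excluded_sessions = sum(s for d, s in rows if looks_like_bot_network(d))
print("kept:", human_rows)
print("sessions excluded:", excluded_sessions)
```

Short tokens such as `msn` can match more than you intend, so it is worth scanning the excluded rows for real ISPs before turning the pattern into a filter.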

Next, you’ll use the Browser & OS report. It can be found at Audience > Technology > Browser & OS.

Check whether Mozilla Compatible Agent shows up among your visitors. We will add these bots to a filter shortly.

These footprints capture most of the zombie bots. Let’s look at how to identify specific zombie bots.

Go to Acquisition > All Traffic > Source/Medium, and then click into each medium.

Add a secondary dimension and keep cycling through the options under Users and Traffic. A dimension value such as Internet Explorer 7 with unusual engagement metrics could indicate a bot.

Watch out for additional footprints. Some zombie bots may not leave footprints.
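If you export one of these reports, the footprint hunt can be roughed out in a few lines. The sketch below flags dimension values with plenty of sessions but essentially no engagement. The rows, field names, and thresholds are all assumptions for illustration, not values taken from Analytics, so tune them against your own data.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: str             # dimension value, e.g. a browser or network domain
    sessions: int
    bounce_rate: float     # between 0.0 and 1.0
    avg_duration_s: float  # average session duration in seconds


def suspicious(rows, min_sessions=20, min_bounce=0.95, max_duration_s=1.0):
    """Return rows with enough sessions but almost no engagement, largest first."""
    flagged = [
        r for r in rows
        if r.sessions >= min_sessions
        and r.bounce_rate >= min_bounce
        and r.avg_duration_s <= max_duration_s
    ]
    return sorted(flagged, key=lambda r: r.sessions, reverse=True)


# Invented report rows for illustration only.
rows = [
    Row("Chrome", 900, 0.48, 95.0),
    Row("Internet Explorer 7", 140, 1.00, 0.0),
    Row("Mozilla Compatible Agent", 60, 0.99, 0.2),
    Row("Safari", 300, 0.52, 80.0),
]

flagged_values = [r.value for r in suspicious(rows)]
print(flagged_values)
```

A flagged value is only a candidate. Check it against your unfiltered view before excluding it, since some real visitors also bounce immediately.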

Now, navigate to the Admin section and open Filters in your Bot Exclusion View.

Repeat the steps you used for ghost bots, but instead of an Include filter on Hostname, create two Exclude filters: one for the Network Domain regex and one for the Browser/OS regex.

Create a new filter to exclude any specific zombie bots you find. For example, you can create a new filter to Exclude all Referrals from example.com|example.com and/or any others you can find. To check your filter, make sure you use the filter verification feature.

Filtering using advanced segments

You now have a view that filters the majority of bot traffic. It will need occasional auditing and amendments, but it can run itself.

But what if you want to view historical traffic in your original view?

To replicate the filters you have already set up, you will need an Advanced Segment.

In your historical view, go to the Reporting dashboard and click Add Segment. Give it any name you like, such as “Filter known bots”.

Under Advanced, click on Conditions.

Now add the same filters you set up in your bot view. Be sure to match Include/Exclude for each one. Verify your filtering by using the verification function.

Save.

Now you can select the Advanced Segment in any report. This will filter bot traffic according to the given date range. Historical data can also be used with the segment.
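The logic of the segment is simply all of the filters combined: a session stays only if it passes the hostname include and misses every exclude. Here is a rough sketch of that combination. The session fields, the sample sessions, and the browser pattern are assumptions for illustration, so substitute your own regexes.

```python
import re
from typing import NamedTuple


class Session(NamedTuple):
    hostname: str
    network_domain: str
    browser: str


# Hypothetical patterns mirroring the three filters described in this post.
INCLUDE_HOSTNAME = re.compile(r"yourwebsite\.com|translate\.google\.com|archive\.org", re.I)
EXCLUDE_NETWORK = re.compile(r"amazon|google|msn|microsoft|automattic", re.I)
EXCLUDE_BROWSER = re.compile(r"mozilla compatible agent", re.I)


def in_segment(s: Session) -> bool:
    """Keep a session only if it passes the include and misses both excludes."""
    return (
        INCLUDE_HOSTNAME.search(s.hostname) is not None
        and EXCLUDE_NETWORK.search(s.network_domain) is None
        and EXCLUDE_BROWSER.search(s.browser) is None
    )


# Invented sessions for illustration only.
sessions = [
    Session("www.yourwebsite.com", "comcast.net", "Chrome"),
    Session("(not set)", "(not set)", "(not set)"),
    Session("www.yourwebsite.com", "amazonaws.com", "Chrome"),
    Session("www.yourwebsite.com", "verizon.net", "Mozilla Compatible Agent"),
]

clean = [s for s in sessions if in_segment(s)]
print(len(clean), "of", len(sessions), "sessions kept")
```

Comparing the kept and dropped counts for a period you know well is a quick sanity check that the segment is not removing real visitors.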

Next steps

Spam in Analytics ranges from a low-level, frustrating nuisance to a serious problem for data-driven campaigns. You need to monitor your stats closely and look for fixes like the ones in this post.

Until the analytics giants find a better way, we’re stuck with filters that remove most bot traffic without catching false positives.

  1. Find out how your site is affected by ghost and zombie bots.
  2. Create a new view for filtering bots.
  3. Add filters for ghost bots (Hostname) and zombie bots (Network Domain & Browser).
  4. Add an Advanced Segment to your historical view, using the same filters, to clean up historical traffic.
  5. Audit your analytics regularly. Don’t just trust traffic numbers. Make sure you are getting the right story.

Analytics Edge offers an excellent article about the subject. You can also listen to the Bamboo Chalupa podcast episode, “Why Your Analytics Are Bullshit”.
