Eight Steps to Cleaner Data in Google Analytics

  1. Secure your site
  2. Tag your sources
  3. Update organic search
  4. Normalise your data
  5. Exclude internal traffic
  6. Block the bots and spiders
  7. Remove anomalies
  8. Remove duplicate transactions

1. Secure your Site

One often misunderstood trouble spot in Google Analytics is what happens to clicks from a secured site (HTTPS) to an unsecured site (HTTP). Browsers drop the referrer on that downgrade, so if your site is still on HTTP and a growing number of the sites linking to you are on HTTPS, an increasing percentage of your referral traffic will lose its source and be reported as Direct traffic. The quick solution is to secure your site so that inbound links from HTTPS pages land on HTTPS pages of your own.
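To make the rule concrete, here is a minimal Python sketch of when a click keeps its referrer. It assumes the browser's default referrer policy and only looks at the two URL schemes; the function name and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit

def referrer_survives(from_url, to_url):
    """Return False when a click downgrades from HTTPS to HTTP."""
    source_scheme = urlsplit(from_url).scheme.lower()
    target_scheme = urlsplit(to_url).scheme.lower()
    # Only the HTTPS -> HTTP downgrade loses the referrer.
    return not (source_scheme == "https" and target_scheme == "http")

print(referrer_survives("https://blog.example.org/post", "http://www.example.com/"))
print(referrer_survives("https://blog.example.org/post", "https://www.example.com/"))
```

The first call is the case that ends up as Direct traffic; the second is what you get once your own site is on HTTPS.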

2. Tag your Sources

This one is simple and it’s all on you. If you’re advertising across multiple channels or running email campaigns, add campaign tracking parameters (utm_source, utm_medium and utm_campaign) to every link you control. Miss this step and there’s a good chance your paid campaigns will be reported as Referral, Direct or Other traffic, as shown here.

Default Channel Grouping showing Other channel with misplaced traffic
Referral source showing missing Medium tag
New channel to sort paid social campaigns
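If you build your campaign links by hand, a small helper keeps the tagging consistent. The sketch below is one possible approach in Python; the function name, the landing page and the campaign values are placeholders, and it lowercases the values so they also line up with the normalisation advice in step 4.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def tag_url(url, source, medium, campaign):
    """Return url with utm_source, utm_medium and utm_campaign appended."""
    parts = urlparse(url)
    # Keep existing query parameters, but drop any stale utm_* values.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not key.startswith("utm_")]
    tags = [
        ("utm_source", source.lower()),
        ("utm_medium", medium.lower()),
        ("utm_campaign", campaign.lower()),
    ]
    return urlunparse(parts._replace(query=urlencode(kept + tags)))

print(tag_url("https://www.example.com/sale?ref=1",
              "Facebook", "paid-social", "Spring Sale"))
# https://www.example.com/sale?ref=1&utm_source=facebook&utm_medium=paid-social&utm_campaign=spring+sale
```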

3. Update Organic Search

Though Google dominates organic search in the US and many other countries, it is by no means the only player in town. There are a number of other established and new entrants in the space that tend to get missed by Google Analytics. In fact, it’s not uncommon to find search engines like Yahoo, Ask and DuckDuckGo sitting in your Referral basket.

Misplaced organic search traffic sitting in Referral channel
Example: Organic Search Source filter
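The logic behind such a filter can be sketched in a few lines of Python. This is an illustration only: inside Google Analytics you would set it up as a filter or as an extra organic search source. The pattern covers just the three engines named above, and the function name is an assumption.

```python
import re

# Search engines named in the text; extend the list for your own market.
SEARCH_ENGINES = re.compile(r"(^|\.)(yahoo|ask|duckduckgo)\.", re.IGNORECASE)

def channel_for(source, medium):
    """Reclassify referrals from known search engines as organic."""
    if medium == "referral" and SEARCH_ENGINES.search(source):
        return "organic"
    return medium

print(channel_for("duckduckgo.com", "referral"))    # organic
print(channel_for("search.yahoo.com", "referral"))  # organic
print(channel_for("task.com", "referral"))          # referral
```

Anchoring the engine name to the start of the hostname or to a preceding dot keeps lookalike domains such as task.com out of the organic bucket.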

4. Normalize your Data on the Fly

A common issue with Google Analytics is that you may see separate entries for what are effectively the same campaign name, content or page URL. The reason is that Google Analytics is case sensitive and treats each distinct string as its own entry. In other words, if your campaign names mix lowercase and uppercase (such as red dresses, Red Dresses and Red dresses), Google Analytics will report them as three separate campaigns rather than one.

Example: New filter to normalise all campaign names in lowercase
Example: New filter to normalize page paths to lowercase
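To see what a lowercase filter buys you, here is a small Python sketch that merges the three spellings from the example above. The session counts are invented for illustration.

```python
from collections import Counter

def lowercase_counts(rows):
    """Sum sessions per campaign after forcing names to lowercase."""
    totals = Counter()
    for name, sessions in rows:
        totals[name.lower()] += sessions
    return dict(totals)

rows = [("red dresses", 120), ("Red Dresses", 45), ("Red dresses", 30)]
print(lowercase_counts(rows))  # {'red dresses': 195}
```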
  • l.facebook.com
  • facebook.com
  • business.facebook.com
  • lm.facebook.com
Example: Search and Replace filter to combine all Facebook traffic as a single source
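The same search-and-replace idea, sketched in Python. The pattern is an assumption that covers the four hostnames listed above plus any other facebook.com subdomain; adapt it before pasting anything similar into a filter.

```python
import re

# Matches facebook.com itself and any subdomain such as l. or lm.
FACEBOOK = re.compile(r"^(.*\.)?facebook\.com$", re.IGNORECASE)

def normalise_source(source):
    """Collapse every Facebook hostname into a single source."""
    return "facebook.com" if FACEBOOK.match(source) else source

for host in ["l.facebook.com", "facebook.com",
             "business.facebook.com", "lm.facebook.com"]:
    print(host, "->", normalise_source(host))
```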
  • Combine all Facebook, Linkedin, Twitter, Instagram & Pinterest sources
  • Remove all Query Parameters from Page Paths
  • Exclude Internal Traffic from IP Address
  • Include your Hostname (see section 5)
  • Exclude Referral Spam & Crawlers (see section 6)
Example: How to normalise data in Google Tag Manager
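For the query parameter item in the list above, the clean-up amounts to keeping only the path portion of each page URL. A minimal Python sketch follows, with a made-up function name and example path. Note that it drops every parameter, so if your site search relies on a query parameter you would want to capture that term before stripping.

```python
from urllib.parse import urlsplit

def clean_page_path(page_path):
    """Drop the query string and fragment, then lowercase the path."""
    path = urlsplit(page_path).path.lower()
    return path or "/"

print(clean_page_path("/Products/Red-Dress?utm_source=x&size=m"))
# /products/red-dress
```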

5. Exclude Internal Traffic

This may sound like a no-brainer, but if you work for a large company whose employees regularly access the company website, there’s a good chance their actions are diluting your data. To remove this internal traffic, add a filter that excludes your company’s IP address(es), along with the IP addresses of any third-party agencies or developers who browse the site for their work.

Example: IP address exclusion filter
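If you want to check a list of addresses before building the filter, Python’s ipaddress module does the range matching for you. The networks below are reserved documentation ranges standing in for your real office and agency addresses.

```python
import ipaddress

# Placeholder ranges; replace with your office and agency addresses.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # head office (assumed)
    ipaddress.ip_network("198.51.100.7/32"),  # external developer (assumed)
]

def is_internal(ip):
    """Return True when the address falls inside an internal range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)

print(is_internal("203.0.113.42"))  # True
print(is_internal("198.51.100.8"))  # False
```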

6. Block the Bots & Spiders

If you spend any time reviewing your site’s traffic sources, you’ve probably come across weird-looking sources that leave you scratching your head. There are plenty of opportunities to scrape content, probe for known vulnerabilities and conduct a host of other activities, so many bots and spiders roam the web and take what they find back to their owner’s hive. While many of them are fairly benign, they end up inflating your data and messing with your sense of cleanliness.

Example: Basic bot filtering in Google Analytics

Exorcise your Ghosts

If you really want to ditch the dirty data, you need to block spam and other bots. One of the most common types of spam is from sources with fake hostnames, also called ghost spam.

Example: Filter to remove ghost spam
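Ghost spam never touches your site, so the hostname it reports is fake or missing. That makes an include filter on your real hostnames an effective test. Here is a Python sketch of that check, assuming example.com, www.example.com and shop.example.com are the only hostnames you serve.

```python
import re

# Hostnames you actually serve (placeholders); everything else is dropped.
VALID_HOSTNAMES = re.compile(r"^(www\.|shop\.)?example\.com$", re.IGNORECASE)

def keep_hit(hostname):
    """Keep a hit only when it reports one of your own hostnames."""
    return bool(VALID_HOSTNAMES.match(hostname or ""))

print(keep_hit("www.example.com"))        # True
print(keep_hit("(not set)"))              # False
print(keep_hit("example.com.evil.test"))  # False
```

Remember to add every legitimate hostname, such as payment gateways or translation services that load your tracking code, or you will filter out real visits.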

ISP Organisation Networks

Interestingly, one common area of bot related traffic arrives direct from ISP organisations like Google, Alibaba, Microsoft, Facebook and others. To remove these ISP organisations, you need to create a Filter with the following expression:

Example: Filter to remove bots from common Internet Service Providers
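As an illustration only, a filter of this kind boils down to matching the ISP organisation field against a list of names. The pattern below is an assumption built from the organisations mentioned above, not the expression from the screenshot, and a match this broad could also catch real visitors on those networks, so check your own reports first.

```python
import re

# Organisation names taken from the text; the pattern itself is assumed.
BOT_NETWORKS = re.compile(r"google|alibaba|microsoft|facebook", re.IGNORECASE)

def is_bot_network(isp_organisation):
    """Return True when the ISP organisation matches a listed network."""
    return bool(BOT_NETWORKS.search(isp_organisation))

print(is_bot_network("microsoft corporation"))  # True
print(is_bot_network("local cable co."))        # False
```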

Watch your Language

Another way bots can affect your data is by disguising their language setting so they show up under some out-of-place language. One problematic source is traffic that shows up with the language set to C, which doesn’t correspond to any actual language.

Example of bot traffic with language disguised as c
Example of bot traffic with special messages
Example: Filter to remove language spam
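One way to think about a language filter is to describe what a real language code looks like and reject everything else. The Python sketch below assumes codes shaped like en, en-us or es-419; the pattern and function name are illustrative, and a value such as (not set) would also be flagged, so decide whether you want to keep it.

```python
import re

# Real codes look like "en", "en-us", "zh-cn" or "es-419" (assumed shape).
VALID_LANGUAGE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,4})?$", re.IGNORECASE)

def is_language_spam(language):
    """Flag language values that do not look like a language code."""
    return not VALID_LANGUAGE.match(language)

print(is_language_spam("en-us"))                    # False
print(is_language_spam("c"))                        # True
print(is_language_spam("Visit spam-example.test"))  # True
```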

Eliminate Fake Referral Spam

Well before fake news entered the political discourse, we had fake referrals, also known as referral spam. Referral spam is a fake URL that registers as a source in Google Analytics in the hope that you will visit the URL and learn what its owners have to offer.

Example: Referral Source Report in Google Analytics with Bounce Rate filter set to 100%
Example: Custom Report in Google Analytics with Bounce Rate filter set to 100%
Example: Exclusion filter to remove fake referral spam
Example: A segment template to filter historical data using scripts provided by Carlos Escalera
Example: Side by side of data with Clean segment applied
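An exclusion filter for referral spam is essentially one long alternation of spam domains. The Python sketch below builds such a pattern from a list, escaping the dots along the way. The domains are invented placeholders, not real offenders; take the real ones from your own Referral report.

```python
import re

# Placeholder domains; replace with the spam sources in your own reports.
SPAM_DOMAINS = ["spam-example.test", "free-traffic.example", "seo-offers.invalid"]

def build_exclusion_pattern(domains):
    """Join the domains into one regex, escaping special characters."""
    return "|".join(re.escape(domain) for domain in domains)

SPAM = re.compile(build_exclusion_pattern(SPAM_DOMAINS), re.IGNORECASE)

def is_referral_spam(source):
    """Return True when the source contains a listed spam domain."""
    return bool(SPAM.search(source))

print(is_referral_spam("www.spam-example.test"))  # True
print(is_referral_spam("news.example.org"))       # False
```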

7. Remove Anomalies

While referral spam and bots can mess with your numbers, you may also come across anomalies in your own data that skew any comparisons.

Example: Month over month comparison with data containing anomalies
Example: Source of anomaly showing high number of Transactions & low Avg. Order Value
Example: True month over month comparison with anomalies removed
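How you define an anomaly depends on your business. As one possible approach, the Python sketch below flags days that combine an unusually high number of transactions with an unusually low average order value, the pattern described in the example above. The thresholds and the daily figures are assumptions made up for illustration.

```python
from statistics import median

def find_anomalies(days, tx_factor=3.0, aov_factor=0.5):
    """days: list of (date, transactions, revenue) tuples."""
    typical_tx = median(tx for _, tx, _ in days)
    typical_aov = median(revenue / tx for _, tx, revenue in days if tx)
    flagged = []
    for date, tx, revenue in days:
        aov = revenue / tx if tx else 0.0
        # Flag days with far more orders AND far smaller orders than usual.
        if tx > tx_factor * typical_tx and aov < aov_factor * typical_aov:
            flagged.append(date)
    return flagged

days = [
    ("day 1", 20, 2000.0),
    ("day 2", 22, 2310.0),
    ("day 3", 400, 800.0),  # e.g. test orders: many transactions, tiny values
    ("day 4", 19, 1805.0),
    ("day 5", 21, 2100.0),
]
print(find_anomalies(days))  # ['day 3']
```

Once you know which days or sources are affected, you can exclude them with a segment before running your month over month comparison.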

8. Remove Duplicate Transactions

One of the more annoying aspects of Google Analytics is that once a session or action is recorded, there’s no way to remove it. This becomes a problem if your customers can revisit the order confirmation or receipt page after their original session: each revisit fires the tracking again and causes Google to record another transaction, as you can see in this example:

Example: A custom report to check for duplicate transactions
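If you export your transactions, the same check is easy to run offline. The Python sketch below lists transaction IDs that appear more than once and keeps only the first hit for each. The field name transaction_id and the sample rows are assumptions; match them to your own export.

```python
from collections import Counter

def duplicated_ids(hits):
    """Return the transaction IDs that were recorded more than once."""
    counts = Counter(hit["transaction_id"] for hit in hits)
    return sorted(tx_id for tx_id, n in counts.items() if n > 1)

def dedupe_transactions(hits):
    """Keep only the first hit recorded for each transaction ID."""
    seen = set()
    unique = []
    for hit in hits:
        if hit["transaction_id"] in seen:
            continue
        seen.add(hit["transaction_id"])
        unique.append(hit)
    return unique

hits = [
    {"transaction_id": "T1001", "revenue": 80.0},
    {"transaction_id": "T1002", "revenue": 45.0},
    {"transaction_id": "T1001", "revenue": 80.0},  # receipt page revisited
]
print(duplicated_ids(hits))            # ['T1001']
print(len(dedupe_transactions(hits)))  # 2
```

On the site itself, the equivalent safeguard is to remember which transaction IDs have already been sent and skip the tracking call on repeat views of the receipt page.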

Ecommerce & Data Driven Executive and Mentor @ Founders Institute Vancouver & Futurepreneur Canada