Securing Google Analytics 4 (GA4): Mitigating Spam and Protecting Measurement IDs with Server-Side Strategies

Disclaimer: I've received a report that this may break some Google Ads integrations. I'm gathering details to understand the issue and to find a workaround (if possible).

One of the most recurrent issues when trying to keep our data clean is having to fight the spammers and script kiddies that pollute our properties. This is nothing new; it has been around since the beginning of time.

This was an even bigger issue on Universal Analytics, where property IDs were sequential. This made it easy for malicious actors to systematically spam all accounts with minimal effort. With the transition to Google Analytics 4 (GA4), the new Measurement IDs are no longer sequential. This change makes it significantly more difficult to indiscriminately spam all accounts, as the previous method of programmatically targeting consecutive IDs is no longer feasible.

Still, we're not safe: anyone can look at the network requests, or crawl websites to build a list of the Measurement IDs in use, and then fill our properties with unwanted data.

Rick Dronkers has also been talking about this on LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7210944583294177281/

Sadly, Google Analytics 4 (GA4) doesn't have good controls to stop or filter out unwanted data from coming into your account.

Because analytics tracking happens mainly on the client side, there's no way to stop this spam completely. But we can take some actions to mitigate it, and we're going to talk about the most important one from my point of view:

Not Allowing spammers to know our real Measurement ID

Last month my company (Analytics Debugger) became a Stape.io partner, since we started offering Server-Side services. This is allowing me to play more with the technology, and luckily this reported spam attack makes the perfect introduction for a new Server-Side trick.

Running GTM/GA4 via SGTM in Stealth Mode.

We're using a Client Template to make our Measurement ID invisible to attackers. This will keep away the script kiddies that crawl our sites, since they'll be getting a fake ID, and the template will take care of forwarding the final request with the proper ID.
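To make the ID swap concrete, here is a minimal sketch in plain JavaScript. It is not the actual template code (a real SGTM Client Template runs on the sandboxed template APIs). The real IDs below are placeholders, and the function names and the GA4 collect URL used as the forwarding target are my assumptions for illustration.

```javascript
// Hypothetical mapping: the fake IDs are the ones shown in this post,
// the "real" IDs are placeholders, not actual properties.
const ID_MAP = {
  'G-DEBUGEMALL': 'G-REAL123456',
  'GTM-DEBUGEMALL': 'GTM-REAL123',
};

// Assumed forwarding target for GA4 hits.
const GA4_COLLECT_URL = 'https://www.google-analytics.com/g/collect';

// Replaces the value of `paramName` when it carries a known fake ID.
// Unknown IDs are left untouched.
function swapId(requestUrl, paramName) {
  const url = new URL(requestUrl);
  const incoming = url.searchParams.get(paramName);
  if (incoming !== null && Object.prototype.hasOwnProperty.call(ID_MAP, incoming)) {
    url.searchParams.set(paramName, ID_MAP[incoming]);
  }
  return url;
}

// Builds the outgoing GA4 URL: same payload, real Measurement ID in &tid.
function buildGa4Url(incomingUrl) {
  return GA4_COLLECT_URL + swapId(incomingUrl, 'tid').search;
}

// Example: the browser only ever sent the fake ID.
const forwarded = buildGa4Url(
  'https://our.server.side.endpoint.com/g/collect?v=2&tid=G-DEBUGEMALL&en=page_view'
);
console.log(forwarded);
```

The same lookup would apply to the id parameter of the gtm.js request, so the container ID stays hidden as well.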

But David, they could still point at our server-side endpoint and the hits would end up in our property. That's right! But since we're running a server-side endpoint, we can enforce some rules to prevent the spam (I'm working on refactoring some old PHP code of mine into an SGTM template). At this point we have a lot of rules available to fight spam traffic, for example:

User Agent Checking

Checking the request IP address against its associated ASN (Autonomous System Number). Most ISPs and data center providers have one, so it's an easy task to filter out non-residential connections using this method. There are even some open IP databases (this information is provided by RIPE).

Using public IP lists of data centers, cloud service providers, VPNs, etc. For example: https://github.com/jhassine/server-ip-addresses , https://udger.com/resources/datacenter-list , https://github.com/growlfm/ipcat/blob/main/datacenters-stats.csv . If you're really into it, there are some companies maintaining paid services to check whether the current IP belongs to a non-residential connection.
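The first and last rules above can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions: the CIDR ranges are reserved documentation ranges standing in for a real data center list, the User-Agent pattern is a made-up sample, and only IPv4 is handled. The ASN check is left out because it needs an external database.

```javascript
// Hypothetical sample data: documentation ranges standing in for a real
// data center list such as the public ones linked above.
const DATACENTER_CIDRS = ['203.0.113.0/24', '198.51.100.0/24'];
// Hypothetical pattern; a real list would be much longer.
const BOT_UA_PATTERN = /(bot|crawler|spider|curl|python-requests|headless)/i;

// Converts a dotted IPv4 string to an unsigned 32-bit integer, or null if invalid.
function ipv4ToInt(ip) {
  const parts = String(ip).split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) return null;
  return parts.reduce((acc, p) => ((acc << 8) | Number(p)) >>> 0, 0);
}

// Returns true when `ip` falls inside the IPv4 `cidr` block.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  // A shift by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipInt & mask) >>> 0) === ((baseInt & mask) >>> 0);
}

// Returns the list of rules a request tripped (empty list = looks fine).
function looksSuspicious(request) {
  const reasons = [];
  if (!request.userAgent || BOT_UA_PATTERN.test(request.userAgent)) {
    reasons.push('user-agent');
  }
  if (DATACENTER_CIDRS.some((cidr) => inCidr(request.ip, cidr))) {
    reasons.push('datacenter-ip');
  }
  return reasons;
}

console.log(looksSuspicious({ ip: '203.0.113.7', userAgent: 'curl/8.0' }));
```

Returning the reasons instead of a plain yes/no keeps the door open for scoring the hit later rather than dropping it outright.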

But that's not all. We're on the server side, meaning that we can easily keep a database of the hits from the last 15 minutes and build a throttling mechanism on top of it, or we could check the IP geolocation. Let's be honest, I'm from Spain, so getting too much traffic from some countries may be unusual. But the best of all is that we could even build an internal IP score, allowing us to tag spam traffic (not even removing it), for example by assigning the &tt= parameter via SGTM if the current score is > 5 (you'll need to set up the rules).
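As a rough sketch of the throttling and scoring idea, assuming an in-memory store (a real SGTM deployment with several instances would need shared storage instead): the weights, the hit limit, the expected country list and the 'spam' value for &tt= are all made-up placeholders. Only the 15-minute window and the > 5 threshold come from the text above.

```javascript
const WINDOW_MS = 15 * 60 * 1000;           // "last 15 minutes" window
const MAX_HITS_PER_WINDOW = 100;            // hypothetical throttle limit
const SCORE_THRESHOLD = 5;                  // tag the hit when score > 5
const EXPECTED_COUNTRIES = new Set(['ES']); // hypothetical: where my traffic usually comes from

// In-memory hit log per IP: ip -> array of timestamps (ms).
const hitsByIp = new Map();

// Records a hit and returns how many hits this IP made inside the window.
function recordHit(ip, now) {
  const recent = (hitsByIp.get(ip) || []).filter((t) => now - t < WINDOW_MS);
  recent.push(now);
  hitsByIp.set(ip, recent);
  return recent.length;
}

// Adds up hypothetical weights for each rule the hit trips.
function scoreHit(hit, now) {
  let score = 0;
  if (recordHit(hit.ip, now) > MAX_HITS_PER_WINDOW) score += 3;
  if (hit.isDatacenterIp) score += 3;
  if (hit.isBotUserAgent) score += 3;
  if (hit.country && !EXPECTED_COUNTRIES.has(hit.country)) score += 2;
  return score;
}

// Tags the outgoing payload instead of dropping the hit.
function tagHit(queryString, score) {
  const params = new URLSearchParams(queryString);
  if (score > SCORE_THRESHOLD) params.set('tt', 'spam'); // 'spam' is a placeholder value
  return params.toString();
}

const score = scoreHit(
  { ip: '203.0.113.7', isDatacenterIp: true, isBotUserAgent: true, country: 'ES' },
  Date.now()
);
console.log(tagHit('v=2&tid=G-DEBUGEMALL&en=page_view', score));
```

Tagging rather than deleting means a false positive can still be recovered later by filtering on the tag in reports.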

In any case, I know other people have been talking about this in the past. I feel this could be a great chance to run a hackathon with the smart people around (maybe at the Analytics DevFest, if it becomes a reality :)

So, filtering hits has been a long-debated topic, where Server-Side can play an important role. However, if we don't hide our Measurement ID, spammers can directly target GA4 servers and bypass any implemented checks.

Let me start by showing what your setup will look like. Please note that a possible attacker will only see our GTM-DEBUGEMALL and G-DEBUGEMALL IDs. At no point will they be able to target Google's servers directly, since they never get to see the real IDs :)

Meanwhile, our Server-Side endpoint still sends the data to Google, with the &tid parameter replaced by the real one:

Implementing this involves setting up a Server-Side Client Template, which you can download from here: GTM/GA4 Stealth Mode Client Template, and setting up the real and fake IDs you want to use in your setup. Refer to the following screenshot for guidance:

The last step is slightly updating our GTM loading snippet. You may notice that there's an extra /s/ before gtm.js; this is because SGTM doesn't seem to allow you to claim requests to the known GTM/GA endpoints (or I was not able to do it...).

	<script>
	// Standard GTM loader, pointed at our server-side endpoint (note the extra /s/)
	// and using the fake container ID instead of the real one.
	(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
	new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
	j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
	'https://our.server.side.endpoint.com/s/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
	})(window,document,'script','dataLayer','GTM-DEBUGEMALL');
	</script>

The template needs to be improved, and I don't consider it production ready (despite the fact that I'm using it on this very blog).

Essentially, we can utilize all our methods to combat spam, as we have been doing, while ensuring that the ID remains concealed to prevent attackers from circumventing any measures aimed at safeguarding our data integrity.