
How to redact PII Data from Google Analytics 4 hits


David Vallejo

If I were asked to name a missing feature in Google Analytics 4 (a.k.a. APP+WEB, New Web Analytics), I would say it is the customTask functionality that my friend Simo has leveraged over the last few years.

Sadly, at the moment there's nothing similar available (I really hope we get something in the future). In the past I collaborated on Brian Clifton's post and code about How to Remove PII from Google Analytics, so I decided to base the redacting logic on it, simply because many people may already have a custom regex list and setup that can be reused here.

How it works

Google Analytics 4 bases its tracking on navigator.sendBeacon for sending hits, falling back to the old-fashioned new Image() approach if for any reason the current browser doesn't support the former.
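To make the two transports concrete, here is a minimal sketch of a "beacon first, image second" sender. This is not GA4's actual source: sendHit and its env parameter are hypothetical names used only for illustration, and the sketch sends the URL only (no request body).

```javascript
// Hypothetical helper, NOT GA4's real code: prefer sendBeacon, fall back to an image request.
// `env` is an assumption added so the sketch can run outside a browser; it defaults to window.
function sendHit(url, env) {
    env = env || (typeof window !== 'undefined' ? window : {});
    var nav = env.navigator;
    // sendBeacon returns true when the browser has queued the request
    if (nav && typeof nav.sendBeacon === 'function' && nav.sendBeacon(url)) {
        return 'beacon';
    }
    if (typeof env.Image === 'function') {
        // Old-fashioned pixel request: the whole hit travels in the URL of a GET request
        var img = new env.Image();
        img.src = url;
        return 'image';
    }
    return 'none';
}
```

The return value only reports which transport was picked, which makes it easy to see when the fallback path is taken.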

What we are doing is monkey patching the browser's sendBeacon functionality using a Proxy pattern, in order to remove any PII (Personally Identifiable Information) from the hit payloads before they reach the Google Analytics 4 endpoint.

Monkey patching is a technique to add, modify, or suppress the default behavior of a piece of code at runtime without changing its original source code. It has been extensively used in the past by libraries, such as MooTools, and developers to add methods that were missing in JavaScript.

https://www.audero.it/blog/2016/12/05/monkey-patching-javascript/
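As a rough illustration of the technique, the sketch below monkey patches a method on a plain stand-in object. The transport object and the secret parameter are made up for this example; in the browser, the target would be window.navigator.sendBeacon.

```javascript
// Stand-in object for illustration only (hypothetical); it records what was "sent"
var transport = {
    sent: [],
    send: function (url) {
        this.sent.push(url);
        return true;
    }
};

// 1. Keep a reference to the original function
var originalSend = transport.send;

// 2. Replace it with a wrapper that can inspect or rewrite the arguments
transport.send = function (url) {
    var cleaned = String(url).replace(/secret=[^&]+/g, 'secret=[REDACTED]');
    // 3. Delegate to the original, preserving `this` and the return value
    return originalSend.call(this, cleaned);
};

transport.send('https://example.com/collect?id=1&secret=abc');
// transport.sent[0] is now 'https://example.com/collect?id=1&secret=[REDACTED]'
```

Callers keep calling transport.send exactly as before; only the wrapper knows the original function exists.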

I don't expect GA4 to fall back to new Image hits very often, but I'm currently working on adding support for redacting hits sent using this method as well.

Before going forward

Monkey patching is "never" the right way to go, but neither Google Analytics 4 nor sendBeacon offers anything to achieve this functionality, so it's the last resort.

The current code only tries to override hits going to the Google Analytics 4 endpoint and lets any other hits pass through transparently. I've also tried to check everything I could think of in order to prevent any issues.
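A minimal sketch of that filtering step, reusing the URL pattern from the code in this post. isGa4Hit is a hypothetical helper name, and the measurement IDs and URLs are placeholders.

```javascript
// Same pattern the code in this post uses to recognise GA4 hits (protocol version v=2)
var GA4_HIT = /google-analytics\.com.*v\=2\&/;

// Hypothetical helper: true only for string URLs that look like GA4 hits
function isGa4Hit(url) {
    return typeof url === 'string' && GA4_HIT.test(url);
}

isGa4Hit('https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXX'); // true
isGa4Hit('https://www.google-analytics.com/collect?v=1&tid=UA-XXXXX-Y');  // false
isGa4Hit('https://example.com/beacon?foo=bar');                           // false
```

Anything that does not match is handed to the original sendBeacon unchanged, so other tools relying on beacons are not affected.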

Setting Up Everything

The only thing you need to do is run the code below on your site "before" GA4 fires any hit.

If you are using Google Tag Manager, you should use Tag Sequencing to fire the code before the Config tag is fired; refer to the next screenshot for more details:

If you're using Tealium, you could run this as a "Pre Loader" extension, for example.

Example of Redacted GA4 Payload Hit
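As a text-only illustration of what the redaction does to a single parameter value, the sketch below applies two of the rules from the code in this post to a made-up, already URL-decoded page location. The URL and the personal data in it are invented for the example.

```javascript
// Two rules copied from the regex list used by the code in this post
var rules = [
    { name: 'EMAIL', regex: /[^\/]{4}(@|%40)(?!example\.com)[^\/]{4}/gi, group: '' },
    { name: 'NAME', regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi, group: '$1' }
];

// Made-up page location value, as it looks after URL-decoding
var value = 'https://www.mysite.test/thanks?firstname=David&email=john.doe@gmail.com';

rules.forEach(function (pii) {
    // `group` keeps the parameter name (when captured) in front of the marker
    value = value.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
});

console.log(value);
// https://www.mysite.test/thanks?firstname=[REDACTED NAME]&email=john[REDACTED EMAIL]l.com
```

Note that the EMAIL rule only replaces the four characters on either side of the @ sign, which is enough to make the address unusable while leaving the rest of the value intact.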

The Code

(function() {

    /*
    *  
    * Analytics Debugger S.L.U. 2021 ( David Vallejo @thyng )
    *  MIT  License
    * All redaction logic runs within this function
    * 
    */
    window.__piiRedact = window.__piiRedact || false;
    var piiRedact = function piiRedact(payload) {
        // Regex List
        var piiRegex = [{
            name: 'EMAIL',
            regex: /[^\/]{4}(@|%40)(?!example\.com)[^\/]{4}/gi,
            group: ''
        }, {
            name: 'SELF-EMAIL',
            regex: /[^\/]{4}(@|%40)(?=example\.com)[^\/]{4}/gi,
            group: ''
        }, {
            name: 'TEL',
            regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
            group: '$1'
        }, {
            name: 'NAME',
            regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi,
            group: '$1'
        }, {
            name: 'PASSWORD',
            regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi,
            group: '$1'
        }, {
            name: 'ZIP',
            regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
            group: '$1'
        }];

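        // Helper: convert a query string into an object
        var queryString2Object = function queryString2Object(str) {
            var obj = {};
            (str || '').replace(/^\?/, '').split('&').forEach(function(pair) {
                // Skip empty chunks (e.g. an empty payload or a trailing "&")
                if (!pair) return;
                // Split on the first "=" only, so values containing "=" stay intact
                var idx = pair.indexOf('=');
                var key = idx === -1 ? pair : pair.slice(0, idx);
                // A key without a value becomes an empty string instead of "undefined"
                obj[key] = idx === -1 ? '' : decodeURIComponent(pair.slice(idx + 1));
            });
            return obj;
        };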
        // Helper Convert an Object to a QueryString
        var Object2QueryString = function Object2QueryString(obj) {
            return Object.keys(obj).map(function(key) {
                return key + '=' + encodeURIComponent(obj[key]);
            }).join('&');
        };
        // Convert the current payload into an object
        var parsedPayload = queryString2Object(payload);
        // Loop through all keys and check the values against our regex list
        for (var pair in parsedPayload) {
            piiRegex.forEach(function(pii) {
                // The value is matching?
                if (parsedPayload[pair].match(pii.regex)) {
                    // Let's replace the key value based on the regex
                    parsedPayload[pair] = parsedPayload[pair].replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
                }
            });
        }
        // Build and send the payload back
        return Object2QueryString(parsedPayload);
    };
    if (!window.__piiRedact && typeof window.navigator.sendBeacon === 'function') {
        window.__piiRedact = true;
        // Monkey patch sendBeacon, keeping a reference to the native function
        var proxied = window.navigator.sendBeacon;
        window.navigator.sendBeacon = function() {
            try {
                // Only touch GA4 hits (v=2); any other beacon passes through untouched
                var url = String(arguments[0]);
                if (url.match(/google-analytics\.com.*v\=2\&/)) {
                    var parts = url.split('?');
                    var beacon = {
                        endpoint: parts[0],
                        // Check for PII
                        query: piiRedact(parts.slice(1).join('?')),
                        events: []
                    };
                    // This is a multiple events hit (string body, one event per line)
                    if (typeof arguments[1] === 'string' && arguments[1]) {
                        arguments[1].split("\r\n").forEach(function(event) {
                            // Check for PII, leaving empty lines as they are
                            beacon.events.push(event ? piiRedact(event) : event);
                        });
                    }

                    // We're all done, let's reassemble everything
                    arguments[0] = [beacon.endpoint, beacon.query].join('?');
                    if (beacon.events.length > 0) {
                        // Write the redacted events back into the request body
                        arguments[1] = beacon.events.join("\r\n");
                    }
                }
            } catch (e) {
                // In case something goes wrong, fall through and send the hit as it was received
            }
            return proxied.apply(this, arguments);
        };
    }
}
)();