
Cross-Domain Tracking on Google Analytics 4 (GA4)


David Vallejo

Now that the Universal Analytics deprecation has been officially announced, it's time to start writing some technical posts about how things work on Google Analytics 4.

The first one is going to be about how Cross-Domain Tracking works (not only for GA4 but also for any GTAG.JS-based tool).

Some time ago (about three years) I noticed a new "_gl" parameter being attached to some links by Google Analytics. I'd say it stands for "Google Linker", judging by how it offers cross-domain tracking support not only for Google Analytics but also for all the other pixels based on the gtag containers (AdWords, DoubleClick). At that point I started to reverse engineer the code to see how it works. If you're interested, you can take a quick look at a draft of the notes I wrote at that time on this Gist:

To date, Google has only officially published one way to set up cross-domain tracking: adding our domains in the Data Stream configuration section (Admin > Data Streams > Select the stream > More Tagging Settings > Configure your domains).

We may think that some things are happening in the backend, but what is really happening when we add this configuration in our Admin section is that the GTAG container being served gets some extra settings.

In this case, when we add some domains to our configuration, the GA4 GTAG container adds a click/submit listener with some conditions based on the domains we've added. If you're used to working with Google Tag Manager, think of this as adding a Click Listener and a Form Submit trigger with a triggering condition based on that domain list.

Please keep in mind that this will also affect the built-in outgoing link tracking on Google Analytics 4 (it will not trigger the events if the link's domain name is within the list defined on the admin side of our Data Stream).

If you read between the lines of the previous paragraphs, you may have guessed that there must be some code in gtag.js taking care of the cross-domain linking, and this means that we can reverse it.

How Cross-Domain Tracking works on GA4/GTAG

The first thing to do is look at what the new linker parameter looks like in GA4:

_gl=1*tp0qzs*_ga*OTYxNDI4MjA4LjE2NDg1NzM2OTM.*_ga_RNYCK86MYK*MTY0OTE3NjMxMy41LjEuMTY0OTE4MDM1OC4w

We can easily see that this can be split into different values by the "*" character:

Key             | Value
1               | tp0qzs
_ga             | OTYxNDI4MjA4LjE2NDg1NzM2OTM.
_ga_RNYCK86MYK  | MTY0OTE3NjMxMy41LjEuMTY0OTE4MDM1OC4w

The first value (if I'm not wrong) is a fingerprint based on the current user agent and browser plugins plus a time-checking hash. It checks that the browser receiving the linker is the same one that generated it, and that it was not generated long ago (it used to have a 2-minute expiration time on the Universal Analytics linker). This is done to prevent cookies from being overridden by mistake because someone shares a link with a linker value in it.

Please note that we will have as many _ga_[A-Z]{6,9} keys as GA4 streams being used on our website, and that the linker will also hold some other cookie values like the AdWords or DoubleClick ones. This will vary depending on your current setup. For now we'll just focus on the Google Analytics 4 ones.
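To make the structure more tangible, here is a minimal sketch that splits the example linker above and decodes its values. The function names (parseLinkerValue, decodeLinkerValue) are made up for this post, and the sketch assumes the values are encoded with the URL-safe base64 alphabet ("-", "_" and "." as padding) seen in the reversed code; it does not validate the fingerprint.

```javascript
// Hypothetical helper: values use a URL-safe base64 alphabet where
// "-" and "_" replace "+" and "/", and "." is used as padding.
function decodeLinkerValue(value) {
    var b64 = value.replace(/-/g, "+").replace(/_/g, "/").replace(/\./g, "=");
    return typeof atob === "function"
        ? atob(b64)
        : Buffer.from(b64, "base64").toString("binary"); // Node.js fallback
}

// Hypothetical helper: split a "_gl" value into version, fingerprint and ids.
function parseLinkerValue(gl) {
    var parts = gl.split("*");
    var result = { version: parts[0], fingerprint: parts[1], ids: {} };
    // Everything after the fingerprint comes in key/value pairs
    for (var i = 2; i + 1 < parts.length; i += 2) {
        result.ids[parts[i]] = decodeLinkerValue(parts[i + 1]);
    }
    return result;
}

var parsed = parseLinkerValue(
    "1*tp0qzs*_ga*OTYxNDI4MjA4LjE2NDg1NzM2OTM.*_ga_RNYCK86MYK*MTY0OTE3NjMxMy41LjEuMTY0OTE4MDM1OC4w"
);
console.log(parsed.ids._ga);            // 961428208.1648573693
console.log(parsed.ids._ga_RNYCK86MYK); // 1649176313.5.1.1649180358.0
```

Note how the decoded values are just the cookie values without their "GA1.1." / "GS1.1." prefixes.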

GA4 Cookies Info

If we look back into Google Analytics history, we used to have the _utma, _utmb, _utmc and _utmz cookies for Urchin (urchin.js) and the first Google Analytics version (ga.js). At that time all client/session tracking (a, b, c cookies) and attribution info (z cookie) was calculated client-side. Then, when Universal Analytics was released, all this logic was moved to the server side and Google switched to using just one cookie (_ga) to hold the clientId (&cid).

Now it seems we have switched to a hybrid method, where the session calculation is done client-side again and the attribution is calculated server-side. That's why we have a new cookie (in addition to the _ga one) that is used to hold the current session_id (&sid), session_count (&sct) and session_time and, based on those, to report the session_start (&_ss) and first_visit (&_fv) internal events.

Let's take a look at a typical Google Analytics 4 cookie set:

_ga:             GA1.1.961428208.1648573693
_ga_THYNGSTER:  GS1.1.1649176313.5.1.1649181571.0

There haven't been any changes to our well-known "_ga" cookie. It still holds the current hostname segments count, a random ID and lastly the time when the cookie was first created. Here is a table showing the values of the new _ga_ stream cookie (I need to double-check 2 of the values to be sure what they are for; that's why I've set them as TBD):

Value       | Description
GS1         | Fixed string: Google Stream 1
1           | Current hostname segments count
1649176313  | Current session start time
5           | Sessions count
1           | TBD
1649181571  | Last hit time
0           | TBD

This is how GA4 is able to determine when a session starts, the session duration and the current session count. By checking the _ga cookie it is even able to determine the first visit time.
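As a small illustration of the table above, this sketch splits the stream cookie value into named fields. parseStreamCookie and its field names are my own labels, not anything exposed by GA4, and the two TBD positions are kept as raw strings.

```javascript
// Hypothetical helper: split a _ga_<STREAM> cookie value into named fields,
// following the positions described in the table above.
function parseStreamCookie(value) {
    var p = value.split(".");
    return {
        prefix: p[0],                   // "GS1"
        hostnameSegments: Number(p[1]), // hostname segments count
        sessionStart: Number(p[2]),     // current session start time (seconds)
        sessionCount: Number(p[3]),     // sessions count
        unknown1: p[4],                 // TBD
        lastHit: Number(p[5]),          // last hit time (seconds)
        unknown2: p[6]                  // TBD
    };
}

var cookie = parseStreamCookie("GS1.1.1649176313.5.1.1649181571.0");
// Seconds elapsed between the session start and the last hit
var elapsed = cookie.lastHit - cookie.sessionStart;
console.log(cookie.sessionCount, elapsed); // 5 5258
```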

Looking Inside the Google Analytics 4 Cross-Domain Tracking

If we take a look at the official docs, there isn't much info about how to customize the cross-domain tracking, other than telling us to add our domains in the admin.

This can be an issue/limitation if our setup is not based only on clicks or form submits. I can think of some examples, like wanting to do cross-domain linking to an iFrame, or a site that redirects to a dynamically generated destination page (for example, forms that run a validation and then redirect the user to a search listing page without doing any form submit at all).

These situations won't be handled by GA4, and in these cases we'd want to build the linkerParam for GA4 ourselves so we can attach it wherever we want.

The good thing is that, even if it's not documented, we can use some global variables to grab the data, and even make use of some of the available helpers to generate the linker, decorate links or anything else we want :)

The first thing we'll learn about is the window.google_tag_data.gl object. This global variable holds ALL the cross-domain config and information. That is: the current data model for the Google Linker config.

It looks like this:
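For those iFrame or redirect cases, attaching the value is the easy part once we have it. Here is a minimal sketch, assuming you already hold a valid linker value; appendLinker is a hypothetical helper name and it relies only on the standard URL API.

```javascript
// Hypothetical helper: add (or replace) the "_gl" parameter on any URL.
function appendLinker(url, linkerValue) {
    var target = new URL(url);
    // searchParams keeps the existing query parameters and the fragment
    target.searchParams.set("_gl", linkerValue);
    return target.toString();
}

var decorated = appendLinker(
    "https://outgoing.com/search?q=shoes#results",
    "1*tp0qzs*_ga*OTYxNDI4MjA4LjE2NDg1NzM2OTM."
);
console.log(decorated);
// https://outgoing.com/search?q=shoes&_gl=1*tp0qzs*_ga*OTYxNDI4MjA4LjE2NDg1NzM2OTM.#results

// Browser usage (illustrative only):
// iframe.src = appendLinker(iframe.src, linkerValue);
// window.location.href = appendLinker(redirectUrl, linkerValue);
```

Keep in mind the linker expires quickly, so it should be generated right before the redirect or the iFrame load, not cached.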

{
    "decorators": [
        {
            "domains": [
                /outgoing\.com/
            ],
            "fragment": false,
            "placement": 1,
            "forms": true,
            "sameHost": false
        }
    ],
    "data": {
        "query": {},
        "fragment": {}
    },
    "init": true
}
Field                     | Description
decorators[:decorator]    | Holds the current decorators for the cross-domain tracking. We may have more than 1 decorator, since an AdWords gtag container may add one and GA4 another one.
:decorator.domains        | An array of regexes to be matched against the currently clicked link
:decorator.fragment       | Should we attach the _gl parameter to the fragment (#)
:decorator.forms          | Is this decorator for "submit" events
:decorator.placement      | The current order preference for applying the decorator
:decorator.sameHost       | N/A
data{}                    | Current linker info
data.query                | Cross-linking parameters read from the query string (decoded)
data.fragment             | Cross-linking parameters read from the fragment (decoded)

It's simpler than it looks at first sight. This variable allows us to know if there are any "decorators" configured (a decorator is meant to "add" the linkerParam (_gl) to any link or form that matches the domains on the list).

Also, if there was a valid linker parameter on the current page load, the data object will show us which clientIds (cookies) were overridden. Nice!


Note: In the event of an invalid linker (if it was generated on a different browser or some minutes ago), the linker won't work and the data here will be empty.
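To illustrate how this data model could be read, here is a small sketch that checks which decorators would apply to a given hostname. matchingDecorators is a hypothetical helper, and the way it treats the forms flag (skipping decorators with forms: false for submit events) is my reading of the field, not confirmed behaviour.

```javascript
// Hypothetical helper: return the decorators whose domain regexes match a hostname.
// gl is expected to look like window.google_tag_data.gl.
function matchingDecorators(gl, hostname, isFormSubmit) {
    if (!gl || !gl.decorators) return [];
    return gl.decorators.filter(function (decorator) {
        // Assumption: decorators with forms === false do not apply to submits
        if (isFormSubmit && !decorator.forms) return false;
        return decorator.domains.some(function (domain) {
            return domain.test(hostname);
        });
    });
}

// Sample object mirroring the structure shown above
var sampleGl = {
    decorators: [
        { domains: [/outgoing\.com/], fragment: false, placement: 1, forms: true, sameHost: false }
    ],
    data: { query: {}, fragment: {} },
    init: true
};

console.log(matchingDecorators(sampleGl, "www.outgoing.com", false).length); // 1
console.log(matchingDecorators(sampleGl, "www.example.org", false).length);  // 0
```

In the browser you would pass window.google_tag_data.gl instead of the sample object.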

Trick #1 - Adding a new domain name for the auto-decoration

There's a small trick you can use to add new domains dynamically to the GTAG decorator. Remember we said that the settings in the Admin are reflected in some code in our GTAG containers? After checking that code, it turns out we can push new domains programmatically from JS, just like this:

// Guard every level, since google_tag_data.gl may not exist yet
if (window.google_tag_data && window.google_tag_data.gl && window.google_tag_data.gl.decorators && window.google_tag_data.gl.decorators.length > 0) {
    window.google_tag_data.gl.decorators[0].domains.push(/analytics-ninja\.com/);
}

glBridge Utilities

In case we want to grab the current linker so we can add it ourselves to any link, there is also some good news: analytics.js exposes some utilities for performing this task.

The utils are available on the window.google_tag_data.glBridge object.

They are the same ones we used to have on Universal Analytics for setting up auto-linking, decorating links and generating the linkerParam. We are just focusing on the generate and get ones: the first one is equivalent to getLinkerParam, and the second one allows us to "unhash" the linker values.

google_tag_data.glBridge.generate({})

This function takes an object of clientIds values as an argument and returns a valid "_gl" linker value that we could attach to our links.

window.google_tag_data.glBridge.generate({
    _ga: '121321321321.2315648466',      
    _ga_THYNGSTER: '1649176313.5.1.1649183273.0'
});

As you can see, we'll need to grab our current cookie values (without their "GA1.1." / "GS1.1." prefixes) and just pass them to the function, and it will return our precious linkerParam :)
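Grabbing those cookie values can be sketched like this. collectLinkerCookies is a hypothetical helper that only handles the _ga and _ga_<STREAM> cookies and assumes their values always start with a "GA1.<n>." or "GS1.<n>." prefix, as in the examples above.

```javascript
// Hypothetical helper: build the object expected by glBridge.generate()
// from a cookie string such as document.cookie.
function collectLinkerCookies(cookieString) {
    var ids = {};
    cookieString.split(/;\s*/).forEach(function (pair) {
        var idx = pair.indexOf("=");
        if (idx < 0) return;
        var name = pair.slice(0, idx);
        var value = pair.slice(idx + 1);
        // Keep only _ga / _ga_<STREAM> and drop the "GA1.1." or "GS1.1." prefix
        var match = /^_ga(_[A-Z0-9]+)?$/.test(name) && /^G[AS]1\.\d+\.(.+)$/.exec(value);
        if (match) ids[name] = match[1];
    });
    return ids;
}

var ids = collectLinkerCookies(
    "_ga=GA1.1.961428208.1648573693; foo=bar; _ga_THYNGSTER=GS1.1.1649176313.5.1.1649181571.0"
);
console.log(ids._ga);           // 961428208.1648573693
console.log(ids._ga_THYNGSTER); // 1649176313.5.1.1649181571.0

// Browser usage (assumes the glBridge object described above is present):
// var linker = window.google_tag_data.glBridge.generate(collectLinkerCookies(document.cookie));
```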


google_tag_data.glBridge.get()

This one is also pretty self-explanatory: it grabs the linker param from the current URL (if present) and returns the decoded client cookie/ID values held in the linker.

Advice

Please note that analytics.js (the Universal Analytics library) is likely to be gone in about 1.5 years. I don't expect Google to remove the analytics.js library itself; more likely it will just stop processing hits at some point (or maybe modify the library so it doesn't fire hits at all). At the moment the gtag.js container doesn't expose these bridge functions, but it may do so in the near future.

If you're looking for some guidance on how to implement this functionality without relying on that library, I'm providing some examples (reversed from the analytics.js code).

If the current linkerParam value is "invalid" (i.e. it was not generated from the same browser or it was generated long ago), glBridge.get() will just return an empty object {}.

NOTE: I'm working on a library that fully replicates this analytics.js Google Linker Bridge functionality, to have a future-proof solution for when Universal Analytics is sunset. It will be published in the coming weeks.

If you're interested in all this, I'm publishing some proof-of-concept functions that you could use as a base for your own code. This code should be adapted to support the AdWords, DoubleClick, multiple GA4 cookies, Google Signals IDs (_gid cookie) and Google Remarketing cookies (_gac) before we can say it's a good replacement. But at this point I'm offering these snippets (all of them reversed/copied from the analytics.js source code):
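Since the bridge is undocumented and may not be present on every page, it's worth wrapping the call defensively. This is just a sketch: readIncomingLinker is a hypothetical name, and it assumes get() can be called without arguments, as shown above.

```javascript
// Hypothetical wrapper: read the decoded linker values, or {} when the
// bridge is not available or returns nothing.
function readIncomingLinker(win) {
    var bridge = win.google_tag_data && win.google_tag_data.glBridge;
    if (!bridge || typeof bridge.get !== "function") return {};
    return bridge.get() || {};
}

// Simulated window objects, only to show the behaviour outside a browser
var withoutBridge = readIncomingLinker({});
var withBridge = readIncomingLinker({
    google_tag_data: {
        glBridge: { get: function () { return { _ga: "961428208.1648573693" }; } }
    }
});
console.log(Object.keys(withoutBridge).length); // 0
console.log(withBridge._ga);                    // 961428208.1648573693

// Browser usage: var ids = readIncomingLinker(window);
```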

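// Reads the decoded linker values (if any) into the shared Google Linker data model.
// readFragment: when true, values found in the URL fragment (#) are merged in too.
// NOTE: simplified sketch. Unlike the real library, it does NOT validate the fingerprint.
var decrypt_cookies_ids = function(readFragment) {
    // Decode the URL-safe base64 variant used by the linker ("-", "_" and "." as padding)
    var decodeValue = function(value) {
        return atob(value.replace(/-/g, "+").replace(/_/g, "/").replace(/\./g, "="));
    };

    // Extract "_gl=1*fingerprint*key*value*..." from a query string or fragment
    var parseLinker = function(source) {
        var result = {};
        var match = /(?:^|[?&#])_gl=([^&#]*)/.exec(source || "");
        if (!match) return result;
        try {
            var parts = decodeURIComponent(match[1]).split("*");
            // parts[0] is the version ("1"), parts[1] the fingerprint (not verified here)
            if (parts[0] !== "1") return result;
            for (var i = 2; i + 1 < parts.length; i += 2) {
                result[parts[i]] = decodeValue(parts[i + 1]);
            }
        } catch (e) {
            // Malformed linker: behave as if there was none
            return {};
        }
        return result;
    };

    // Copy own properties from source into target
    var merge = function(target, source) {
        for (var key in source) source.hasOwnProperty(key) && (target[key] = source[key]);
    };

    // Get (or create) window.google_tag_data.gl
    var getLinkerModel = function() {
        window.google_tag_data = window.google_tag_data || {};
        var gl = window.google_tag_data.gl;
        if (!gl || !gl.decorators) {
            gl = { decorators: [] };
            window.google_tag_data.gl = gl;
        }
        return gl;
    };

    var model = getLinkerModel();
    if (!model.data) {
        model.data = {
            query: parseLinker(window.location.search),
            fragment: parseLinker(window.location.hash)
        };
    }
    var ids = {};
    merge(ids, model.data.query);
    readFragment && merge(ids, model.data.fragment);
    return ids;
};
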
var generateLinkerParam = function() {
    // Read a GA cookie and strip its "GA1.<n>." prefix; returns undefined if missing
    var getCookiebyName = function(name) {
        var pair = document.cookie.match(new RegExp('(?:^|;\\s*)' + name + '=([^;]+)'));
        var id = pair && pair[1].match(/GA1\.[0-9]\.(.+)/);
        return id ? id[1] : undefined;
    };

    // These are the 3 values used by the new linker
    var cookies = {
        _ga: getCookiebyName("_ga"),   // Google Analytics client ID
        _gac: undefined,               // Google Remarketing (not handled in this sketch)
        _gid: getCookiebyName("_gid")  // Google ID
    };

    // Calculate the browser fingerprint: a CRC32 over the UA, timezone, language,
    // current minute and the payload (my reading of the trailing `a` argument in
    // the original function is that it carries the key*value payload)
    var browser_fingerprint = function(payload, minutesAgo) {
        var input = [window.navigator.userAgent, (new Date).getTimezoneOffset(), window.navigator.userLanguage || window.navigator.language, Math.floor((new Date).getTime() / 60 / 1E3) - (void 0 === minutesAgo ? 0 : minutesAgo), payload].join("*");
        // Build the CRC32 lookup table
        var table = Array(256);
        for (var c = 0; 256 > c; c++) {
            for (var d = c, e = 0; 8 > e; e++)
                d = d & 1 ? d >>> 1 ^ 3988292384 : d >>> 1;
            table[c] = d;
        }
        var crc = 4294967295;
        for (c = 0; c < input.length; c++)
            crc = crc >>> 8 ^ table[(crc ^ input.charCodeAt(c)) & 255];
        return ((crc ^ -1) >>> 0).toString(36);
    };

    // Encode a cookie value using a URL-safe base64 alphabet ("." is the padding char)
    var hash_cookie_value = function(val) {
        var A = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";
        var a = String(val);
        for (var b = [], c = 0; c < a.length; c += 3) {
            var d = c + 1 < a.length
              , e = c + 2 < a.length
              , g = a.charCodeAt(c)
              , f = d ? a.charCodeAt(c + 1) : 0
              , h = e ? a.charCodeAt(c + 2) : 0
              , p = g >> 2;
            g = (g & 3) << 4 | f >> 4;
            f = (f & 15) << 2 | h >> 6;
            h &= 63;
            e || (h = 64,
            d || (f = 64));
            b.push(A[p], A[g], A[f], A[h]);
        }
        return b.join("");
    };

    // Build the key*value payload, skipping cookies that are not set
    var payload = [];
    for (var key in cookies) {
        if (cookies.hasOwnProperty(key) && cookies[key] !== undefined) {
            payload.push(key, hash_cookie_value(cookies[key]));
        }
    }
    payload = payload.join("*");

    // Now we have all the data. Let's build the linker string! =)
    // First value is a fixed "1" value, the current GA code does the same. May change in the future
    return ["1", browser_fingerprint(payload), payload].join("*");
};