Google Analytics 4 (GA4) Events Demystified

At this point, many (if not all) of you have heard that Google Analytics is moving to an "events"-based tracking model with Google Analytics 4. But what does that really imply? Do we have to worry about it? To be honest, from the implementation side it's not a big deal, since we have been using "events" all along; we just used to call them hit types. From the reporting side it may lead to some "hard times" when trying to use the data, not because it's better or worse, just because it's different.

This post will try to explain Google Analytics 4 events from a technical perspective: how the current event model works, where the events can come from, the limitations, etc.

I'd say that one of the most important things when working with GA4 is realizing how important the data model we define at the start is going to be, because it will condition the future of our implementation and data.

But don't worry about this for now. We'll dig into it throughout the post.

How does Google Analytics 4 record the data

Google Analytics 4 works in much the same way as Universal Analytics.

We'll be sending hits (network requests) to a specific endpoint ( https://endpoint.url/collect ). This shouldn't be anything new for anyone, that's how all analytics tools and pixels work. And this is the way it works for the client-side tracking (gtag.js), server-side tracking ( measurement protocol ), and the app tracking ( Firebase Analytics SDK ).

Tracking endpoints

I found there are 5 different endpoints that we could use to send the data to Google Analytics 4, these are:

https://www.google-analytics.com/g/collect
https://analytics.google.com/g/collect
https://custom.domain/g/collect (this will really forward the hits to the first one on this list)
https://app-measurement.com
https://www.google-analytics.com/mp/collect

Depending on where we are doing the tracking we'll be using one of them.

We could see hits flowing to 4 different endpoints for GA4 + 1 for Firebase

The first two endpoints are the ones used by client-side tracking, but you may wonder why we sometimes see the hits going through analytics.google.com and other times through the google-analytics.com domain. The reason is that if the current GA4 property has "Enable Google signals data collection" turned on, GA4 will use the *.google.com endpoint (so that Google is able to use its cookies to identify the users, I guess).
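To make the rule above concrete, here is a minimal sketch of the endpoint choice. The function name and the `googleSignalsEnabled` flag are hypothetical stand-ins for the property setting; this is not code taken from gtag.js.

```javascript
// Hypothetical helper: returns the client-side collect endpoint
// depending on whether Google signals is enabled for the property.
function pickCollectEndpoint(googleSignalsEnabled) {
  const host = googleSignalsEnabled
    ? 'analytics.google.com'      // Google signals turned on
    : 'www.google-analytics.com'; // default endpoint
  return `https://${host}/g/collect`;
}

console.log(pickCollectEndpoint(true));
console.log(pickCollectEndpoint(false));
```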

JavaScript Client Library

The page tracking is done using a library provided by Google, the same way we used to have the analytics.js, ga.js or urchin.js libraries in past Google Analytics versions.

The default code snippet will look like this:

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-THYNGSTER"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-THYNGSTER');
</script>

As you may have noticed, the snippet loads a JavaScript file from the www.googletagmanager.com domain. This is because all gtag.js snippets are, in essence, a predefined Google Tag Manager template. It's not just a plain GTM container, since it does some internal stuff, but it also works based on tags, triggers, and variables.
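It may help to see that the gtag() function from the snippet only queues its arguments in the dataLayer array; the container loaded from googletagmanager.com is what later reads that queue. The sketch below reproduces the queue outside a browser (a plain array instead of window.dataLayer), and the `newsletter_signup` event with its `method` parameter is a made-up example.

```javascript
// Stand-in for window.dataLayer so the sketch also runs outside a browser.
const dataLayer = [];

// Same body as in the default snippet: every call is stored as-is.
function gtag() { dataLayer.push(arguments); }

gtag('js', new Date());
gtag('config', 'G-THYNGSTER');
// Hypothetical custom event, queued exactly like the commands above.
gtag('event', 'newsletter_signup', { method: 'footer_form' });

console.log(dataLayer.length);  // number of queued commands
console.log(dataLayer[2][1]);   // name of the queued event
```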

Previous tracking libraries offered a public API to perform all the tracking at our end, i.e. they accepted some methods/calls and converted them to hits, handled the cross-domain tracking, and allowed us to use Tasks, while at the same time taking care of generating the cookies and reading the browser details. The same library file was shared by every site worldwide.

This is no longer the case: now each Data Stream / Measurement ID has its own snippet and loads a separate JS file. We may look at this as a performance penalty, but it's done this way for a reason.

Each gtag.js container is now built dynamically at Google's end. It contains personalized code for the current property and also holds the settings for the current Data Stream / Measurement ID. That's why the size differs for each container we check. Don't worry, this is normal and expected. The container size will vary depending on many things, such as which Enhanced Measurement features we have enabled or the settings we defined in the admin interface for our property.

One thing that has been confusing me since Google Analytics 4 arrived was thinking that there were lots of things happening in the background that were hardly possible to debug, like the conversions or the created/modified events.

And well, that's not the way it works: almost any setting or feature you enable in the admin is translated into code and executed client-side. This means that when you add a new event in the interface, some code is added to the gtag.js container to send that event, so you "may" end up seeing "ghost" events in the browser. Don't waste your time like I did trying to work out why your implementation is firing duplicated events :). The same happens, for example, when we define a conversion event, or when we configure our internal domains or the ignored referrals.

While this approach may help some people with common tracking tasks, it also prevents some advanced implementations, because some "loved" features like "customTasks" are now missing. I'm OK with Google trying to control how things are done, but there will always be sites that need custom/personalized implementations, and I really feel that Google should provide some public/documented API methods to easily perform the most common tasks, like cross-domain tracking, in Google Analytics 4.

Let's see some examples. When you "create a new event" from the Admin Interface, this event won't be created server-side; what is happening is that GA4 adds some code logic to send that hit client-side.

Google Analytics 4 events creation modal

Another example is Enhanced Measurement: enabling it adds some code to your container. Remember we mentioned that GA4 was in essence a Google Tag Manager container? If you take a look at the current measurement categories, you'll notice that they all match the triggers available in GTM (click tracking, scroll tracking, YouTube tracking).

And that's not all: when we change the session duration or the engagement time, some session_timeout variables are updated internally (engagementSeconds, sessionMinutes, sessionHours).

We could keep going with examples, or build a full list, but that would likely become outdated sooner rather than later. The main idea to take from this part of the post is that GTAG is like a "predefined" GTM template and that all the tracking happens in the client's browser.

Firebase Analytics SDK

Apps are usually tracked using the Firebase Analytics SDK. A good starting point would be visiting the following URL: https://firebase.google.com/docs/analytics/get-started?platform=android&hl=en

App hits use their own endpoint and format: they go to https://app-measurement.com and the payload is sent in a binary format, which makes it really difficult to debug, even when using Charles, Fiddler, or any other MITM proxy app.

If you want to debug your Firebase implementation, I recommend you use my Android Debugger for Windows. Once you install the app, you'll be able to request a free lifetime license.

Google Analytics 4 Measurement Protocol

Google Analytics finally offers a proper Measurement "Protocol", which at the time of writing this post is in Beta stage.

This protocol will use the https://www.google-analytics.com/mp/collect endpoint, and rather than having the developers build the request payloads using some non-intuitive keys, now it accepts a POST request with a JSON string attached to the body using application/json Content-Type:

// measurement_id and api_secret travel in the query string, the events go in the JSON body
fetch('https://www.google-analytics.com/mp/collect?measurement_id=G-THYNGSTER&api_secret=12zneF6DSDFSDFjJPgDAzzQ', {
  method: "POST",
  headers: {
     'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "client_id": "12345678.87654321",
    "user_id": "RandomUserIdHash",
    // up to 25 events can be batched in a single request
    "events": [{
      "name": "follow_me_at_twitter",
      "params": {
        "twitter_handle": "@thyng",
        "value": 7.77
      }
    },{
      "name": "follow_intent",
      "params": {
        "status": "success"
      }
    }]
  })
});

Key	Type	Notes
`client_id`	`str`	Required.
`user_id`	`str`	Optional.
`timestamp_micros`	`int`	Optional. Hit offset. Events can be backdated up to 3 days ( 2.592e+11 microseconds ), based on the property's defined timezone.
`user_properties`	`{}`	Optional.
`non_personalized_ads`	`bool`	Optional. ( whether these events should be excluded from ads personalization )
`events[]`	`[]`	Required. ( Max 25 events per request )
`events[].name`	`str`	Required.
`events[].params`	`{}`	Optional.
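The required fields and the 25-event cap from the table above can be checked before a request is sent. This is only a sketch: `validateMpPayload` and its error messages are hypothetical, and it covers just the rules listed in the table, not everything Google validates.

```javascript
// Limit taken from the table above.
const MAX_EVENTS_PER_REQUEST = 25;

// Hypothetical helper: returns a list of problems found in a payload object
// (an empty list means the checks below all passed).
function validateMpPayload(payload) {
  const errors = [];
  if (typeof payload.client_id !== 'string' || payload.client_id === '') {
    errors.push('client_id is required and must be a string');
  }
  if (!Array.isArray(payload.events) || payload.events.length === 0) {
    errors.push('events is required and must be a non-empty array');
    return errors;
  }
  if (payload.events.length > MAX_EVENTS_PER_REQUEST) {
    errors.push(`events holds ${payload.events.length} items, the maximum is ${MAX_EVENTS_PER_REQUEST}`);
  }
  payload.events.forEach((event, index) => {
    if (typeof event.name !== 'string' || event.name === '') {
      errors.push(`events[${index}].name is required`);
    }
  });
  return errors;
}

console.log(validateMpPayload({
  client_id: '12345678.87654321',
  events: [{ name: 'follow_intent', params: { status: 'success' } }],
}));
console.log(validateMpPayload({ events: [{ params: {} }] }));
```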

In any case, there are some things to keep in mind. You should never expose your API Secret, which means this endpoint should not be used client-side. It is more likely to be used to track offline interactions (like refunds) or to track our transactions server-side.
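One way to keep the secret out of the browser is to build the request on the server and read the credentials from the environment. The sketch below assumes Node.js; the `GA4_MEASUREMENT_ID` and `GA4_API_SECRET` variable names, the `buildMpRequest` helper, and the sample refund payload are all hypothetical.

```javascript
// Hypothetical helper for Node.js: builds the URL and fetch options
// from credentials kept in environment variables (names are assumptions).
function buildMpRequest(payload, env = process.env) {
  const query = new URLSearchParams({
    measurement_id: env.GA4_MEASUREMENT_ID,
    api_secret: env.GA4_API_SECRET,
  });
  return {
    url: `https://www.google-analytics.com/mp/collect?${query}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

// Example with placeholder credentials passed in explicitly.
const request = buildMpRequest(
  { client_id: '12345678.87654321', events: [{ name: 'refund', params: { transaction_id: 'T-1001' } }] },
  { GA4_MEASUREMENT_ID: 'G-THYNGSTER', GA4_API_SECRET: 'not-a-real-secret' }
);
console.log(request.url);
// fetch(request.url, request.options) would then send the hit from the server.
```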

At the time of writing this post (Apr 2022), one of the biggest handicaps of this protocol is that it doesn't support any sessionId parameter, meaning that you won't be able to stitch the server-side hits to the client-side session. This should be fixed over the next months.

In the meantime, I've published the GA4 Payload Parameters CheatSheet, which you could use to send some server-side hits in the old-school way (like we used to do with the first Measurement Protocol for Universal Analytics) and where you could attach the "&sid" parameter.

There are of course some other points to keep in mind, such as GA4 having some reserved event and parameter names that you should not use. We'll cover this later in the "events" section.

Events Model / Hit Types

Let's start by saying that everything in Google Analytics 4 is an "event". I'm sure it's not the first time you've heard that, and it's totally right, but if we look strictly at Universal Analytics we were also sending "events"; we just used to call them "hit types".

Technically speaking, nothing has changed at all. We have network requests to some endpoints. That's it! If you want to learn a bit more about how the hits are built and sent from the web tracking library, take a look at the "GA4: Google Analytics Measurement Protocol version 2" post.

The main difference in GA4 is that Google no longer offers a fixed tracking data model beyond page_views and e-commerce, meaning that the responsibility for building a proper data model falls on us. While working on our definition we need to keep in mind that there are some predefined/reserved event and parameter names, and that there are some limits to take into account (on total events, and on name and value lengths).

Universal Analytics Hit Types Model

If we take a closer look, we've been using "events" for our tracking in Google Analytics since the Urchin days. Yep, I'm not joking: we had them, we just called them "hit types".

Just so you know, we could replicate the current Universal Analytics Data Model in Google Analytics 4 following the next table of events:

Hit Type / Event	Parameters
`pageview`	- Location - Path - Title
`event`	- Category - Action - Label - Value - Non Interaction
`timing`	- Category - Variable - Label - Value
`social`	- Network - Action - Opt. Target
`exception`	- Description - Fatal
`screenview`	- Screen Name
`transaction` ( Legacy Ecommerce )	- Id - Affiliation - Revenue - Tax - Shipping - Coupon
`item` ( Legacy Ecommerce )	- Id - Name - Brand - Category - Variant - Price - Quantity

Google even offers a setting that will automatically convert all your ga() calls to some predefined events in GA4. From your Data Stream configuration you can enable this feature, and all event, timing, and exception hits will be converted to GA4 events (it adds a listener to the ga('send', 'event|exception|timing') calls to do this).

This tool will map the data in the following way:

Event Name	Parameters
`[event_name]`	This will take the current eventAction. eventCategory > event_category, eventAction > event, eventLabel > event_label, eventValue > value
`timing_complete`	timingCategory > event_category, timingLabel > event_label, timingValue > value, timingVar > name
`exception`	exDescription > description, exFatal > fatal
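As a rough illustration of the first row of the table, the sketch below maps the fields of a Universal Analytics event hit to a GA4 event. It reads "eventAction > event" as "the action becomes the event name", which is my interpretation; `mapUaEventToGa4` is a hypothetical helper and not the code Google injects.

```javascript
// Hypothetical helper: converts UA event fields into a GA4-style event object.
// Assumption: eventAction is used as the GA4 event name.
function mapUaEventToGa4(fields) {
  const params = {};
  if (fields.eventCategory !== undefined) params.event_category = fields.eventCategory;
  if (fields.eventLabel !== undefined) params.event_label = fields.eventLabel;
  if (fields.eventValue !== undefined) params.value = fields.eventValue;
  return { name: fields.eventAction, params };
}

const ga4Event = mapUaEventToGa4({
  eventCategory: 'Videos',
  eventAction: 'play',
  eventLabel: 'Intro clip',
  eventValue: 10,
});
console.log(JSON.stringify(ga4Event));
```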

Beware: since it converts every Event Action into an event name, depending on your current events definition in Universal Analytics you may end up hitting the unique event names limit (500).

Google Analytics 4 Events

Event Sources

The events on Google Analytics 4 can come from 4 different sources. These are:

Public Web/App endpoint.
Measurement Protocol ( Server Side )
Internal self-generated events
Admin defined events

Public Web Endpoint

We've already talked about the main origin of GA4 events: the events generated on our site by the GTAG.js container (check the GA4 Payload Parameters CheatSheet here).

Measurement Protocol ( Server Side )

Another source for our events is the Measurement Protocol. It works similarly to the public endpoint, but the hits are sent server-side and we need to include an API Secret in our requests.

Internal self-generated Events

This one can be a bit confusing. GA4 auto-generates some of the events we see in the reports, which means that some events in our reports won't be seen in our browser.

This doesn't mean that they're being generated randomly or using some server-side logic. Most (if not all) of these events are created because a parameter was added to some event.

Our event payloads may sometimes have some extra parameters attached that make GA4 internally spawn a separate event. As far as I've been able to identify, this is the list of internally generated events and the parameter that triggers each of them.

Event Name	Trigger
session_start	&_ss
first_visit	&_fv
user_engagement	&seg

For example, if the current event payload contains a &_ss parameter, a session_start will be generated; if it contains a &_fv, then we should be able to see a first_visit event, and so on. This list may grow in the future (and it may be missing some events that I've not been able to spot yet).
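When inspecting captured hits, the flags from the table can be looked up directly in the request URL. The sketch below does that with the standard URL API; the helper name and the sample hit URL are made up for illustration, and the mapping only covers the three flags listed above.

```javascript
// Flag-to-event mapping copied from the table above.
const INTERNAL_EVENT_FLAGS = {
  _ss: 'session_start',
  _fv: 'first_visit',
  seg: 'user_engagement',
};

// Hypothetical helper: lists the internal events a hit URL would trigger.
function internalEventsFor(hitUrl) {
  const query = new URL(hitUrl).searchParams;
  return Object.keys(INTERNAL_EVENT_FLAGS)
    .filter((flag) => query.has(flag))
    .map((flag) => INTERNAL_EVENT_FLAGS[flag]);
}

// Made-up hit carrying the session start and first visit flags.
const hit = 'https://www.google-analytics.com/g/collect?v=2&tid=G-THYNGSTER&en=page_view&_ss=1&_fv=1';
console.log(internalEventsFor(hit).join(', ')); // prints "session_start, first_visit"
```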

If we've enabled Enhanced Measurement, we may also see some extra events in our reports (this time the events will be visible within the browser requests). These are:

Event Name	Parameters
`click`	link_id, link_classes, link_url, link_domain, outbound
`file_download`	link_id, link_text, link_url, file_name, file_extension
`video_play`, `video_pause`, `video_seek`, `video_buffering`, `video_progress`, `video_complete`	video_url, video_title, video_provider, video_current_time, video_duration, video_percent, visible
`view_search_results`	search_term
`scroll`	percent_scrolled
`page_view`	page_referrer ( URL and Title are Shared Parameters )

On the other hand, the Firebase Analytics SDK will automatically track a lot of events without us needing to explicitly define them.

Here is the current list of autogenerated event names by Firebase:

ad_activeview	`APP`
ad_click	`APP`
ad_exposure	`APP`
ad_impression	`APP`
ad_query	`APP`
adunit_exposure	`APP`
app_clear_data	`APP`
app_install	`APP`
app_update	`APP`
app_remove	`APP`
error	`APP`
first_open	`APP`
in_app_purchase	`APP`
notification_dismiss	`APP`
notification_foreground	`APP`
notification_open	`APP`
notification_receive	`APP`
os_update	`APP`
screen_view	`APP`
user_engagement	`APP`

Note: These events will not count towards the unique events name limit

Admin defined events

We've already talked about these: when we create or modify an event within the admin section, these settings are translated to the client-side tracking.

This means the following:

You may see events being fired in the browser that you didn't define in Google Tag Manager or GTAG. This is normal, don't go crazy over it. If you see a duplicate event, or a new event and you don't know where it's coming from, take a look at the Data Stream Settings.
You may have some unexpected parameters or event names if a "modify" rule is being used.

Events Limitations

Google Analytics 4 is full of limitations in many aspects, and it's a bit difficult to understand all of them, even more so when the limits keep changing.

We have limits for event names and values length, same for the event parameters and the user properties. At the same time, we have a limit on how many parameters and properties we can append to each event. And these limits may vary between the free and 360 versions.

There are also some exporting limitations (the free version is capped at 1M daily hits exported to BigQuery) and data retention limits: the free version tops out at 14 months, while 360 allows holding up to 50 months of data.

But these are not all the limits we'll have... there are also limits on the total conversions, audiences, insights, and funnels we can set. These are not directly related to events, so if you're interested you can visit the official Configuration Limits Information.

Collecting and Names Limitations

We can attach up to 25 event parameters (100 on GA4 360) to each event, and we can easily identify them in our hits: they are the ones matching "^ep(|n).*". Event parameters are meant to add some metadata to our events.

ep.event_origin: gtag

Each of these parameters should have a name no longer than 40 characters and a value not bigger than 100 characters.

At the same time, we have the "user properties". We can attach up to 25 user properties to each hit; these are attributes that describe segments of our users. For example, we could record the current user's newsletter sign-up status, or the total purchases made by the current user. We can identify this data in our hits because the keys match "^up(|n).*".

up.newsletter_opt_in: yes
upn.user_total_purchases: 43

Each of these properties should have a name no longer than 24 characters and a value not bigger than 36 characters.

Logged item	Limit	Free	360
`Events`	Event Name	40 chars
	Event parameter Name	40 chars
	Event parameter Value	100 chars
	Params per event	25	100
`User properties`	Total per Property	25
	Property Name	24 chars
	Property Value	36 chars
	User-ID	256 characters
`Custom dimensions`	Event Scope	50	125
	Item Scope	10
	User Scope	25	100
`Custom Metrics`	Event Scope	50	125
`Events Offset`		3 days

Full Limits Table
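Since these limits are easy to hit silently, it can be worth checking events against them before they are sent. The sketch below uses the free-tier numbers from the table above; `checkLimits` and `checkPairs` are hypothetical helpers, and the numbers should be re-checked against the official limits page because they change over time.

```javascript
// Free-tier limits copied from the table above (GA4 360 allows 100 params per event).
const LIMITS = {
  eventName: 40, paramName: 40, paramValue: 100, paramsPerEvent: 25,
  userProperties: 25, userPropertyName: 24, userPropertyValue: 36,
};

// Checks the name and value length of every key/value pair in an object.
function checkPairs(pairs, nameLimit, valueLimit, label) {
  const problems = [];
  for (const [name, value] of Object.entries(pairs)) {
    if (name.length > nameLimit) problems.push(`${label} name "${name}" is longer than ${nameLimit} chars`);
    if (String(value).length > valueLimit) problems.push(`${label} "${name}" value is longer than ${valueLimit} chars`);
  }
  return problems;
}

// Hypothetical helper: returns every limit an event (and its user properties) breaks.
function checkLimits(event, userProperties = {}) {
  const params = event.params || {};
  const problems = [];
  if (event.name.length > LIMITS.eventName) problems.push(`event name is longer than ${LIMITS.eventName} chars`);
  if (Object.keys(params).length > LIMITS.paramsPerEvent) problems.push(`more than ${LIMITS.paramsPerEvent} event parameters`);
  if (Object.keys(userProperties).length > LIMITS.userProperties) problems.push(`more than ${LIMITS.userProperties} user properties`);
  return problems.concat(
    checkPairs(params, LIMITS.paramName, LIMITS.paramValue, 'event parameter'),
    checkPairs(userProperties, LIMITS.userPropertyName, LIMITS.userPropertyValue, 'user property')
  );
}

console.log(checkLimits(
  { name: 'follow_intent', params: { status: 'success' } },
  { newsletter_opt_in: 'yes', user_total_purchases: 43 }
));
```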

Event Values Typing

You may have noticed that some of the parameters start with up, ep, upn, or epn. This is because an event parameter/user property can be either a string or a number. The good news is that we don't need to declare the type, since the parameters are typed automatically by GA4. Just take a look at the logic used to decide whether a parameter is a string or a number.

var value = 'something';
if(typeof(value) === "number" && !isNaN(value)){
    console.log("is a number parameter");   
}else{
    console.log("is a string parameter");
}
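Building on the check above, the sketch below shows how a plain object of parameters could end up as the ep./epn. and up./upn. keys we see in the hits. `toHitParams` is a hypothetical helper written for illustration; it is not the actual gtag.js code.

```javascript
// Hypothetical helper: prefixes each key with "ep"/"up" for strings
// and "epn"/"upn" for numbers, using the same typing rule as above.
function toHitParams(values, prefix) {
  const hitParams = {};
  for (const [name, value] of Object.entries(values)) {
    const isNumber = typeof value === 'number' && !isNaN(value);
    hitParams[`${prefix}${isNumber ? 'n' : ''}.${name}`] = String(value);
  }
  return hitParams;
}

console.log(toHitParams({ event_origin: 'gtag', value: 7.77 }, 'ep'));
console.log(toHitParams({ newsletter_opt_in: 'yes', user_total_purchases: 43 }, 'up'));
```

Note that a numeric string such as '43' would still be sent with the string prefix under this rule, so the type has to be right before the value reaches the tracking call.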

SGTM - Google Analytics 4 Hits

The last thing I want to call out is that GA4 hits sent via Server-Side Google Tag Manager are capable of doing two things that we won't see on regular hits.

The first is that hits sent through the server-side container are able to set first-party cookies in the user's browser. This is achieved by adding a Set-Cookie header to the response.

The second is that the response may contain a body, which is used to send some pixels back to the client. I.e. SGTM builds a pixel request and returns it to the browser so that the browser sends it if, for example, it was missing some third-party cookie value (where sending it server-side wouldn't make any difference).