Bug 1990006 - Create an initial ETL workflow for gecko-trace component #9553

Open
jjjalkanen wants to merge 10 commits into mozilla:main from jjjalkanen:adopted-etl-gecko-trace

Conversation

@jjjalkanen

Description

This PR adds an initial ETL workflow for processing traces collected by the gecko-trace component from various Gecko-based Firefox products. It's adopted from the original PR.

Related Tickets & Documents

https://bugzilla.mozilla.org/show_bug.cgi?id=1990006
https://docs.google.com/document/d/1HIcggXk8EZ7_4x57M20Rc9RmXenmh6cDtQ2KCJClvKo/edit?usp=sharing

@jjjalkanen jjjalkanen requested a review from a team as a code owner June 9, 2026 11:32
---
friendly_name: Gecko Trace
description: |-
  Unified views of spans and traces from the Gecko Firefox engine across all Firefox applications.

Contributor

Can traces contain sensitive data? Mainly trying to understand if mozilla-confidential is appropriate here.

Author

This is strictly data-classification category 1 (technical data).

AVG(duration_nano) AS average_duration_nano,
COUNT(*) AS hits
FROM
`{{ target_project }}.{{ app_id }}_derived.gecko_trace_traces_v1`

Contributor

Is it really worth creating a derived dataset for this instead of having this logic in the view on top of gecko_trace_traces_v1?

Author

No. I changed it to a view (let me know if I didn't do it correctly)

Comment thread sql_generators/gecko_trace/__init__.py Outdated
PING_NAME = "gecko_trace"
APPLICATIONS = (
"firefox_desktop", # The desktop version of Firefox
"org_mozilla_fenix_nightly", # Nightly channel of Firefox Preview

@kik-kik kik-kik Jun 9, 2026

Contributor

Suggested change
- "org_mozilla_fenix_nightly", # Nightly channel of Firefox Preview
+ "org_mozilla_fenix_nightly", # Nightly channel of Firefox for Android

Author

Done

TEMPLATES = THIS_MODULE / "templates"
PING_NAME = "gecko_trace"
APPLICATIONS = (
"firefox_desktop", # The desktop version of Firefox

Contributor

Is the intention to only work with non-release channels?

Author

The plan was to stabilize this with non-release channels only. I suppose we could also add all channels here and just not send any data from the release client builds.

Comment thread dags.yaml Outdated
owner: mvanstraten@mozilla.com
retries: 2
retry_delay: 30m
start_date: "2025-09-26"

Contributor

Is this the date from which we want the table to start having data? My suggestion is to update this to reflect the date when we add this DAG and then, after this gets merged, add a managed_backfill.yaml to load historical data into the table.

Author

Let me know if it doesn't look right. The start date doesn't need to be so far back in the past, thanks for catching it.
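The split between a recent DAG start date and a separate backfill can be sketched as a date-range calculation. This is only an illustration: `backfill_range` is a hypothetical helper, not part of bigquery-etl, and it assumes the DAG's first run handles the partition for its own start date, so the backfill stops one day earlier.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def backfill_range(data_start: date, dag_start: date) -> Optional[Tuple[date, date]]:
    """Return the inclusive (first, last) dates a backfill would need to cover.

    Hypothetical helper: data_start is the earliest date with data,
    dag_start is the start_date configured for the DAG.
    """
    if dag_start <= data_start:
        # The DAG already starts at or before the first day of data.
        return None
    # Assumption: the DAG covers dag_start itself, so stop the day before.
    return data_start, dag_start - timedelta(days=1)
```

For example, with the original start date of 2025-09-26 and an assumed DAG start date of 2026-06-12, the backfill would span 2025-09-26 through 2026-06-11.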

@jjjalkanen

Author

While changing gecko_trace_signatures_v1 to a view, I realized that we will need the capability to group the results by origin (mobile vs. desktop, nightly vs. release, etc.), so I have now included app_id in the derived tables, following the example of the other datasets.
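A rough sketch of how a generator might assemble such a view across applications, grouped by app_id. The function name `signatures_view_sql`, the `signature` column, and the project name in the usage note are assumptions for illustration; only the table name, `app_id`, `duration_nano`, and the two aggregates come from the snippets in this thread.

```python
APPLICATIONS = ("firefox_desktop", "org_mozilla_fenix_nightly")


def signatures_view_sql(project: str, apps=APPLICATIONS) -> str:
    """Build a query that unions the per-app derived tables and aggregates them.

    Assumes each derived table exposes app_id, signature (hypothetical
    column name), and duration_nano.
    """
    selects = [
        "SELECT app_id, signature, duration_nano\n"
        f"FROM `{project}.{app}_derived.gecko_trace_traces_v1`"
        for app in apps
    ]
    inner = "\nUNION ALL\n".join(selects)
    return (
        "SELECT app_id, signature,\n"
        "  AVG(duration_nano) AS average_duration_nano,\n"
        "  COUNT(*) AS hits\n"
        f"FROM (\n{inner}\n)\n"
        "GROUP BY app_id, signature"
    )
```

Calling `signatures_view_sql("my-project")` yields a single statement with one `UNION ALL` branch per application, so results can be sliced by origin without a separate derived table.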

michaelvanstraten and others added 7 commits June 12, 2026 13:53
This patch adds an initial ETL workflow for processing traces collected
by the [gecko-trace component][1] from various Gecko-based Firefox
products.

[1]: https://searchfox.org/firefox-main/source/toolkit/components/gecko-trace

The issues need to be sent to the DOM LWS team for the time being.

Clarifies Firefox Preview to Firefox Android and changes an early start
date to a recent start date plus backfill.

sqlglot doesn't understand right shift, and it appears that the UDFs
can be inlined into queries when we use a standardized way to
generate signatures.
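The commit message does not say how the right shift was replaced. One common workaround, shown here purely as an assumption and not as the PR's actual change, is to rewrite `x >> n` as integer division by `2**n`, which gives the same result for non-negative integers. The helper name `shift_right` is hypothetical.

```python
def shift_right(x: int, n: int) -> int:
    """Emulate x >> n without using a shift operator.

    Valid for non-negative x and n; this mirrors the arithmetic identity
    x >> n == x // 2**n.
    """
    if x < 0 or n < 0:
        raise ValueError("x and n must be non-negative")
    return x // (2 ** n)
```

In SQL the same identity would be written with an integer-division function instead of the `>>` operator, which sidesteps the parser limitation at the cost of readability.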
@jjjalkanen jjjalkanen force-pushed the adopted-etl-gecko-trace branch from 6b8f21c to eda5a24 Compare June 12, 2026 11:53
3 participants