Bug 1990006 - Create an initial ETL workflow for gecko-trace component #9553
jjjalkanen wants to merge 10 commits into
---
friendly_name: Gecko Trace
description: |-
  Unified views of spans and traces from the Gecko Firefox engine across all Firefox applications.
Can traces contain sensitive data? Mainly trying to understand if mozilla-confidential is appropriate here.
This is strictly data-classification category 1, technical data.
  AVG(duration_nano) AS average_duration_nano,
  COUNT(*) AS hits
FROM
  `{{ target_project }}.{{ app_id }}_derived.gecko_trace_traces_v1`
Is it really worth creating a derived dataset for this instead of having this logic in the view on top of gecko_trace_traces_v1?
No. I changed it to a view (let me know if I didn't do it correctly)
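As a rough illustration of the view-based approach discussed above, the aggregation could be defined as a view directly over gecko_trace_traces_v1. The Python sketch below only renders such a view definition as text. The view name `gecko_trace_signatures`, the `signature` grouping column, and the helper `render_signatures_view` are assumptions made for illustration; only the two aggregates and the source table name come from the diff.

```python
from string import Template

# Hypothetical view definition. The `signature` column and the view name are
# assumptions; the aggregates and the source table are taken from the diff.
VIEW_TEMPLATE = Template(
    """\
CREATE OR REPLACE VIEW
  `$target_project.$app_id.gecko_trace_signatures`
AS
SELECT
  signature,
  AVG(duration_nano) AS average_duration_nano,
  COUNT(*) AS hits
FROM
  `$target_project.${app_id}_derived.gecko_trace_traces_v1`
GROUP BY
  signature
"""
)


def render_signatures_view(target_project: str, app_id: str) -> str:
    """Render the view SQL for one application dataset."""
    return VIEW_TEMPLATE.substitute(target_project=target_project, app_id=app_id)


if __name__ == "__main__":
    # The project name here is only an example value.
    print(render_signatures_view("example-project", "firefox_desktop"))
```

Because a view is evaluated at query time, no scheduled query or extra derived table is needed for this aggregation, which is the trade-off the reviewer was pointing at.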
PING_NAME = "gecko_trace"
APPLICATIONS = (
    "firefox_desktop",  # The desktop version of Firefox
    "org_mozilla_fenix_nightly",  # Nightly channel of Firefox Preview
-    "org_mozilla_fenix_nightly",  # Nightly channel of Firefox Preview
+    "org_mozilla_fenix_nightly",  # Nightly channel of Firefox for Android
TEMPLATES = THIS_MODULE / "templates"
PING_NAME = "gecko_trace"
APPLICATIONS = (
    "firefox_desktop",  # The desktop version of Firefox
Is the intention to only work with non-release channels?
The plan was to stabilize this with non-release channels only. I suppose we could also add all channels here and just not send any data from the non-release client builds.
owner: mvanstraten@mozilla.com
retries: 2
retry_delay: 30m
start_date: "2025-09-26"
Is this the date we want the table to start having data from? My suggestion is to update this to reflect the date when we add this DAG and then, after this gets merged, add a managed_backfill.yaml to load historical data into the table.
Let me know if it doesn't look right. The start date doesn't need to be so far back in the past, thanks for catching it.
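To make the start-date discussion concrete, the sketch below lists the submission dates that a later backfill would have to cover if the DAG start date is moved forward. It assumes the backfill would begin at the originally proposed 2025-09-26; the new DAG start date used in the example and the helper name `backfill_dates` are hypothetical.

```python
from datetime import date, timedelta

# Start date from the original DAG configuration in the diff.
ORIGINAL_START_DATE = date(2025, 9, 26)


def backfill_dates(first_day: date, dag_start: date) -> list[date]:
    """Return the dates from first_day up to, but excluding, dag_start."""
    days = (dag_start - first_day).days
    # A non-positive gap means there is nothing to backfill.
    return [first_day + timedelta(days=n) for n in range(max(days, 0))]


if __name__ == "__main__":
    # Hypothetical new DAG start date, chosen only for illustration.
    for day in backfill_dates(ORIGINAL_START_DATE, date(2025, 10, 1)):
        print(day.isoformat())
```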
4ebe4da to d8da46e
While changing gecko_trace_signatures_v1 to a view, I realized that we will need the ability to group the results by origin (mobile vs. desktop, nightly vs. release, etc.). So I have now included app_id in the derived tables, following the example of the other datasets.
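A minimal sketch of what grouping by origin could look like once each derived table carries an app_id column: the Python helper below renders a query that unions the per-application tables and aggregates per app_id. The helper name `render_per_app_summary` is hypothetical, the application list contains only the two entries visible in the diff, and the sketch assumes every derived table exposes `app_id` and `duration_nano`.

```python
# Only the two applications visible in the diff; the real tuple may be longer.
APPLICATIONS = ("firefox_desktop", "org_mozilla_fenix_nightly")


def render_per_app_summary(target_project: str, applications=APPLICATIONS) -> str:
    """Render a query that aggregates trace durations per app_id."""
    # One SELECT per application dataset, combined with UNION ALL.
    branches = "\n  UNION ALL\n".join(
        f"  SELECT app_id, duration_nano\n"
        f"  FROM `{target_project}.{app_id}_derived.gecko_trace_traces_v1`"
        for app_id in applications
    )
    return (
        "SELECT\n"
        "  app_id,\n"
        "  AVG(duration_nano) AS average_duration_nano,\n"
        "  COUNT(*) AS hits\n"
        f"FROM (\n{branches}\n)\n"
        "GROUP BY\n"
        "  app_id"
    )


if __name__ == "__main__":
    # The project name here is only an example value.
    print(render_per_app_summary("example-project"))
```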
39972c0 to b309c89
b309c89 to 0eba4d8
0eba4d8 to 76e3666
76e3666 to 6b8f21c
This patch adds an initial ETL workflow for processing traces collected by the [gecko-trace component][1] from various Gecko-based Firefox products.

[1]: https://searchfox.org/firefox-main/source/toolkit/components/gecko-trace
The issues need to be sent to the DOM LWS team for the time being.
Clarifies Firefox Preview to Firefox Android and changes an early start date to a recent start date plus backfill.
sqlglot doesn't understand right shift, and it appears that the UDFs can be inlined into queries when we use a standardized way to generate signatures.
Partitioning by submission_date is sufficient for query optimization.
6b8f21c to eda5a24
Description
This PR adds an initial ETL workflow for processing traces collected by the gecko-trace component from various Gecko-based Firefox products. It's adapted from the original PR.
Related Tickets & Documents
https://bugzilla.mozilla.org/show_bug.cgi?id=1990006
https://docs.google.com/document/d/1HIcggXk8EZ7_4x57M20Rc9RmXenmh6cDtQ2KCJClvKo/edit?usp=sharing