Skip to content

Django URLs with multiple placeholders cut off after first placeholder #2392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Abdkhan14 opened this issue Sep 25, 2023 · 5 comments · Fixed by #2432
Closed

Django URLs with multiple placeholders cut off after first placeholder #2392

Abdkhan14 opened this issue Sep 25, 2023 · 5 comments · Fixed by #2432
Assignees

Comments

@Abdkhan14
Copy link

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

  1. The transaction is associated with loading a trace view.
  2. Here's the parent pageload: link
  3. Correct transaction summary: link
  4. Renamed transaction summary: link
  5. The change that introduced the bug: link

Expected Result

Transaction name is /api/0/organizations/{organization_slug}/events-trace/{trace_id}/

Actual Result

Transaction name is renamed from /api/0/organizations/{organization_slug}/events-trace/{trace_id}/ to /api/0/organizations/{organization_slug}/.

Product Area

Other

Link

No response

DSN

No response

Version

No response

@antonpirker antonpirker transferred this issue from getsentry/sentry Sep 26, 2023
@antonpirker antonpirker added this to the Django update milestone Sep 26, 2023
@smeubank
Copy link
Member

smeubank commented Oct 6, 2023

#2416

also noted here in this discussion

@sentrivana
Copy link
Contributor

Relevant/might be fixed by: #2325

@sentrivana sentrivana self-assigned this Oct 9, 2023
@sentrivana sentrivana linked a pull request Oct 9, 2023 that will close this issue
@sentrivana sentrivana changed the title Events trace endpoint transactions being renamed Django URLs with multiple placeholders cut off after first placeholder Oct 10, 2023
@sentrivana
Copy link
Contributor

sentrivana commented Oct 10, 2023

TL;DR: My proposed course of action for now:


Some background:

The fundamental problem here is that we're trying to parse a non-regular language with regular expressions in the resolver. Any regex we come up with will never fully "work". There will always be counter examples of perfectly fine URL patterns that will not be parsed correctly.

To illustrate, consider these proposed candidates and URL patterns they won't work for (paste e.g. to https://regex101.com/ to play with these):

  • r"\(\?P<(\w+)>.*\)": This is the currently used regex that matches too much, so e.g. for a URL pattern such as (?P<project>\w+)/product/(?P<pid>\w+) it greedily matches the whole string instead of two individual groups. This is what leads to us essentially erasing stuff after the first matched group if there's any closing parenthesis anywhere further in the string.
  • r"\(\?P<(\w+)>[^\)]+\)+": This is the original regex, which, while working fine for the multiple named groups case, as soon as there's an extra closing parenthesis in an unfortunate place, will match too little: (?P<project>\w+[()]+)/product/(?P<pid>\w+) (this is a very simplified version of the URL pattern from Django Integration complex url regex pattern cleanup #1527).
  • r"\(\?P<(\w+)>.*?\)(?=/|\$|$)": This is a proposal that covers both the Django CMS case and the multiple named groups case, but comes with a baked in assumption that any named group will be neatly in its own "container" ending with a slash or end of string. If someone has some static stuff appended to a named group, e.g. (?P<project>\w+)/product/(?P<pid>\d+)p, the named group will not be captured.

While this list is definitely not exhaustive, the point I want to make is that any regex we choose will have to come with some assumptions about what constitutes a "correct URL pattern", simply because regexes inherently can't solve this problem. In other words, any change we will make to the regex will introduce a regression, so we should approach any changes to this very carefully. (Or come up with a new approach to named group matching in the resolver.)

@sentrivana sentrivana removed a link to a pull request Oct 10, 2023
@salomvary
Copy link

👍 for the proposed course of action.

Would supporting "nice" transaction names for paths only, while re_paths would be exposed raw?

@sentrivana
Copy link
Contributor

@salomvary The problem is that paths can also contain "hidden" regexes, see e.g. #2446.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Status: No status
Development

Successfully merging a pull request may close this issue.

7 participants