-
Notifications
You must be signed in to change notification settings - Fork 587
fix(instrumentation-mysql2)!: Missing masking of sql queries #2732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
a369681
1c3dc7c
ea64bdc
0a2aad8
ee9d4dd
8a9c0e8
907b104
f47d8d4
963c343
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,7 @@ import { | |
SEMATTRS_NET_PEER_PORT, | ||
} from '@opentelemetry/semantic-conventions'; | ||
import type * as mysqlTypes from 'mysql2'; | ||
import { MySQL2InstrumentationQueryMaskingHook } from './types'; | ||
|
||
type formatType = typeof mysqlTypes.format; | ||
|
||
|
@@ -107,22 +108,50 @@ function getJDBCString( | |
export function getDbStatement( | ||
query: string | Query | QueryOptions, | ||
format?: formatType, | ||
values?: any[] | ||
values?: any[], | ||
maskStatement: boolean = true, | ||
maskStatementHook: MySQL2InstrumentationQueryMaskingHook = defaultMaskingHook | ||
): string { | ||
if (!format) { | ||
return typeof query === 'string' ? query : query.sql; | ||
} | ||
if (typeof query === 'string') { | ||
return values ? format(query, values) : query; | ||
} else { | ||
// According to https://github.com/mysqljs/mysql#performing-queries | ||
// The values argument will override the values in the option object. | ||
return values || (query as QueryOptions).values | ||
? format(query.sql, values || (query as QueryOptions).values) | ||
: query.sql; | ||
const [querySql, queryValues] = | ||
typeof query === 'string' | ||
? [query, values] | ||
: [query.sql, hasValues(query) ? values || query.values : values]; | ||
try { | ||
if (maskStatement) { | ||
return maskStatementHook(querySql); | ||
} else if (format && queryValues) { | ||
return format(querySql, queryValues); | ||
} else { | ||
return querySql; | ||
} | ||
} catch (e) { | ||
return 'Could not determine the query due to an error in masking or formatting'; | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm assuming the rationale here is that we don't want to accidentally leak information by including the error message in the attribute value ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Excatly. Since there could be some stacktrace with the actual string (like 'abc' ist not valid), its better to have this generic errormessage. Also, since the user can provide its own maskingHook, we never know, if there is any problematic code, which forces the leakage of information trough stacktrace. Therefore I thought this is the best way, to prevent this from happening. I can still adjust the errormessage, if you think its not clear enough :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Current message seems perfect to me |
||
} | ||
|
||
/** | ||
* Replaces numeric values and quoted strings in the query with placeholders ('?'). | ||
* | ||
* - `\b\d+\b`: Matches whole numbers (integers) and replaces them with '?'. | ||
* - `(["'])(?:(?=(\\?))\2.)*?\1`: | ||
* - Matches quoted strings (both single `'` and double `"` quotes). | ||
* - Uses a lookahead `(?=(\\?))` to detect an optional backslash without consuming it immediately. | ||
* - Captures the optional backslash `\2` and ensures escaped quotes inside the string are handled correctly. | ||
* - Ensures that only complete quoted strings are replaced with '?'. | ||
* | ||
* This prevents accidental replacement of escaped quotes within strings and ensures that the | ||
* query structure remains intact while masking sensitive data. | ||
*/ | ||
function defaultMaskingHook(query: string): string { | ||
return query | ||
.replace(/\b\d+\b/g, '?') | ||
.replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '?'); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm no regex expert so this might be something obvious, but what's the reasoning for using a lookahead here rather than just a literal backslash ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Me neither, but at least my understanding of regex is that the lookahead ((?=(\?))) is used to detect an optional backslash without consuming the next character right away. This ensures that escaped quotes (like ") inside a string don’t cause the pattern to end prematurely. By capturing the optional backslash in a group (\2) and then matching it with \2., both escaped and regular characters are handled consistently. Using a direct backslash match (e.g., \?) would make it harder to distinguish between cases where a backslash is present or not, leading to a more complex and less readable regex. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. Could you maybe just add a comment to describe the behaviour of the default hook so future contributors don't have to parse the regex to understand what's happening ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the following comment as js-doc at the defaultMaskingHook
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you ! |
||
} | ||
|
||
function hasValues(obj: any): obj is QueryOptions { | ||
return obj && typeof obj === 'object' && 'values' in obj; | ||
} | ||
|
||
/** | ||
* The span name SHOULD be set to a low cardinality value | ||
* representing the statement executed on the database. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the performance concerns around applying regex to large queries, I think this should not be on by default. In semconv the recommendation is to not include values from parameterized queries by default, which should remove the need to parse the string query text. This should only apply to non-parameterized queries. Now that the database semconv is stable, a lot of our instrumentations need to be updated and the parameterized value stripping (or rather not-including) should be included there.