
fix(instrumentation-mysql2)!: Missing masking of sql queries #2732

Merged
3 changes: 3 additions & 0 deletions plugins/node/opentelemetry-instrumentation-mysql2/README.md
@@ -48,6 +48,9 @@ You can set the following instrumentation options:
| ------- | ---- | ----------- |
| `responseHook` | `MySQL2InstrumentationExecutionResponseHook` (function) | Function for adding custom attributes from db response |
| `addSqlCommenterCommentToQueries` | `boolean` | If true, adds [sqlcommenter](https://github.com/open-telemetry/opentelemetry-sqlcommenter) specification compliant comment to queries with tracing context (default false). _NOTE: A comment will not be added to queries that already contain `--` or `/* ... */` in them, even if these are not actually part of comments_ |
| `maskStatement` | `boolean` | If true, masks the `db.statement` attribute in spans (default true) with the `maskStatementHook` |
Member

Given the performance concerns around applying regex to large queries, I think this should not be on by default. In semconv the recommendation is to not include values from parameterized queries by default, which should remove the need to parse the string query text. This should only apply to non-parameterized queries. Now that the database semconv is stable, a lot of our instrumentations need to be updated and the parameterized value stripping (or rather not-including) should be included there.
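
The distinction drawn in this comment can be sketched as follows. This is a minimal illustration, not code from the PR: `statementForSpan` and `maskNumbers` are hypothetical names, and it assumes that a query passed together with bound values contains only placeholders in its text.

```typescript
// Hypothetical helper, not part of the instrumentation: it decides what to
// record for db.statement without running a regex over parameterized queries.
type MaskingHook = (query: string) => string;

function statementForSpan(
  sql: string,
  values: unknown[] | undefined,
  maskNonParameterized: MaskingHook
): string {
  // Parameterized query: the text is assumed to hold only placeholders, so it
  // is recorded as-is. The bound values are never attached to the span.
  if (values !== undefined && values.length > 0) {
    return sql;
  }
  // Non-parameterized query: literals may be embedded in the text, so this is
  // the only path that pays the cost of masking.
  return maskNonParameterized(sql);
}

// Simplified stand-in for a masking hook (numbers only).
const maskNumbers: MaskingHook = q => q.replace(/\b\d+\b/g, '?');

console.log(statementForSpan('SELECT ?+? as solution', [1, 1], maskNumbers));
console.log(statementForSpan('SELECT 1+1 as solution', undefined, maskNumbers));
```

Both calls yield the same recorded text, but only the second one runs the regex.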

| `maskStatementHook` | `MySQL2InstrumentationQueryMaskingHook` (function) | Function for masking the `db.statement` attribute in spans. Default: `return query.replace(/\b\d+\b/g, '?').replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '?');` |
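
A minimal sketch of how the two new options might be combined. `MaskingOptions` is a local stand-in that mirrors the option shapes from `src/types.ts`, and `preview` is a hypothetical helper used only to show the effect. The built-in default hook is left out, so `preview` returns the raw query when no hook is supplied.

```typescript
// Local stand-in mirroring the two options added in this PR; in an
// application these would be passed to the instrumentation's config.
interface MaskingOptions {
  maskStatement?: boolean;
  maskStatementHook?: (query: string) => string;
}

// Custom hook: hide single-quoted strings but keep numbers visible.
const customMasking: MaskingOptions = {
  maskStatement: true,
  maskStatementHook: query => query.replace(/'[^']*'/g, "'***'"),
};

// Opt out of masking entirely, e.g. because of regex cost on large queries.
const noMasking: MaskingOptions = { maskStatement: false };

// Hypothetical preview helper; the real instrumentation would fall back to
// its default hook when none is configured, which this sketch omits.
function preview(options: MaskingOptions, query: string): string {
  const enabled = options.maskStatement ?? true; // documented default: true
  const hook = options.maskStatementHook;
  return enabled && hook ? hook(query) : query;
}

const sample = "SELECT * FROM user WHERE name = 'test' AND age = 35";
console.log(preview(customMasking, sample));
console.log(preview(noMasking, sample));
```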


## Semantic Conventions

@@ -137,13 +137,20 @@ export class MySQL2Instrumentation extends InstrumentationBase<MySQL2Instrumenta
  } else if (arguments[2]) {
    values = [_valuesOrCallback];
  }

+ const { maskStatement, maskStatementHook, responseHook } =
+   thisPlugin.getConfig();
  const span = thisPlugin.tracer.startSpan(getSpanName(query), {
    kind: api.SpanKind.CLIENT,
    attributes: {
      ...MySQL2Instrumentation.COMMON_ATTRIBUTES,
      ...getConnectionAttributes(this.config),
-     [SEMATTRS_DB_STATEMENT]: getDbStatement(query, format, values),
+     [SEMATTRS_DB_STATEMENT]: getDbStatement(
+       query,
+       format,
+       values,
+       maskStatement,
+       maskStatementHook
+     ),
    },
  });

@@ -166,7 +173,6 @@ export class MySQL2Instrumentation extends InstrumentationBase<MySQL2Instrumenta
        message: err.message,
      });
    } else {
-     const { responseHook } = thisPlugin.getConfig();
      if (typeof responseHook === 'function') {
        safeExecuteInTheMiddle(
          () => {
19 changes: 19 additions & 0 deletions plugins/node/opentelemetry-instrumentation-mysql2/src/types.ts
@@ -25,7 +25,26 @@ export interface MySQL2InstrumentationExecutionResponseHook {
(span: Span, responseHookInfo: MySQL2ResponseHookInformation): void;
}

export interface MySQL2InstrumentationQueryMaskingHook {
(query: string): string;
}

export interface MySQL2InstrumentationConfig extends InstrumentationConfig {
/**
* If true, the query will be masked before setting it as a span attribute, using the {@link maskStatementHook}.
*
* @default true
* @see maskStatementHook
*/
maskStatement?: boolean;

/**
* Hook that allows masking the query string before setting it as span attribute.
*
* @default (query: string) => query.replace(/\b\d+\b/g, '?').replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '?')
*/
maskStatementHook?: MySQL2InstrumentationQueryMaskingHook;

/**
* Hook that allows adding custom span attributes based on the data
* returned MySQL2 queries.
53 changes: 41 additions & 12 deletions plugins/node/opentelemetry-instrumentation-mysql2/src/utils.ts
@@ -23,6 +23,7 @@ import {
SEMATTRS_NET_PEER_PORT,
} from '@opentelemetry/semantic-conventions';
import type * as mysqlTypes from 'mysql2';
import { MySQL2InstrumentationQueryMaskingHook } from './types';

type formatType = typeof mysqlTypes.format;

@@ -107,22 +108,50 @@ function getJDBCString(
  export function getDbStatement(
    query: string | Query | QueryOptions,
    format?: formatType,
-   values?: any[]
+   values?: any[],
+   maskStatement: boolean = true,
+   maskStatementHook: MySQL2InstrumentationQueryMaskingHook = defaultMaskingHook
  ): string {
-   if (!format) {
-     return typeof query === 'string' ? query : query.sql;
-   }
-   if (typeof query === 'string') {
-     return values ? format(query, values) : query;
-   } else {
-     // According to https://github.com/mysqljs/mysql#performing-queries
-     // The values argument will override the values in the option object.
-     return values || (query as QueryOptions).values
-       ? format(query.sql, values || (query as QueryOptions).values)
-       : query.sql;
+   const [querySql, queryValues] =
+     typeof query === 'string'
+       ? [query, values]
+       : [query.sql, hasValues(query) ? values || query.values : values];
+   try {
+     if (maskStatement) {
+       return maskStatementHook(querySql);
+     } else if (format && queryValues) {
+       return format(querySql, queryValues);
+     } else {
+       return querySql;
+     }
+   } catch (e) {
+     return 'Could not determine the query due to an error in masking or formatting';
    }

I'm assuming the rationale here is that we don't want to accidentally leak information by including the error message in the attribute value?

Contributor Author

Exactly. An error could carry a stack trace that contains the actual string (e.g. "'abc' is not valid"), so it's better to use this generic error message. Also, since users can provide their own masking hook, we can never know whether it contains problematic code that would leak information through the stack trace. I thought this was the best way to prevent that from happening. I can still adjust the error message if you think it's not clear enough :)

Current message seems perfect to me
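
The rationale discussed in this thread can be sketched in isolation. `safeMask` and `leakyHook` are hypothetical names used only for illustration; the fixed message is the one from the diff above.

```typescript
const GENERIC_MESSAGE =
  'Could not determine the query due to an error in masking or formatting';

// Hypothetical wrapper: run a user-supplied hook and never let details of a
// failure reach the span attribute.
function safeMask(query: string, hook: (q: string) => string): string {
  try {
    return hook(query);
  } catch {
    // The error object is deliberately ignored: its message or stack trace
    // could embed the raw query text.
    return GENERIC_MESSAGE;
  }
}

// A hook that fails and puts the raw query into its error message.
const leakyHook = (q: string): string => {
  throw new Error(`cannot parse: ${q}`);
};

const recorded = safeMask("SELECT * FROM user WHERE name = 'secret'", leakyHook);
console.log(recorded);
```

Returning a fixed string means the recorded value is the same regardless of what the failing hook put into its error.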

}

/**
* Replaces numeric values and quoted strings in the query with placeholders ('?').
*
* - `\b\d+\b`: Matches whole numbers (integers) and replaces them with '?'.
* - `(["'])(?:(?=(\\?))\2.)*?\1`:
* - Matches quoted strings (both single `'` and double `"` quotes).
* - Uses a lookahead `(?=(\\?))` to detect an optional backslash without consuming it immediately.
* - Captures the optional backslash `\2` and ensures escaped quotes inside the string are handled correctly.
* - Ensures that only complete quoted strings are replaced with '?'.
*
* This prevents accidental replacement of escaped quotes within strings and ensures that the
* query structure remains intact while masking sensitive data.
*/
function defaultMaskingHook(query: string): string {
return query
.replace(/\b\d+\b/g, '?')
.replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '?');

I'm no regex expert so this might be something obvious, but what's the reasoning for using a lookahead here rather than just a literal backslash?

Contributor Author

Me neither, but my understanding is that the lookahead `(?=(\\?))` is used to detect an optional backslash without consuming the next character right away. This ensures that escaped quotes (like `\"`) inside a string don't cause the pattern to end prematurely. By capturing the optional backslash in a group (`\2`) and then matching it with `\2.`, both escaped and regular characters are handled consistently. Using a direct backslash match (e.g. `\\?`) would make it harder to distinguish between cases where a backslash is present or not, leading to a more complex and less readable regex.

Makes sense. Could you add a comment describing the behaviour of the default hook, so future contributors don't have to parse the regex to understand what's happening?

Contributor Author

I added the following comment as JSDoc on `defaultMaskingHook`:

/**
 * Replaces numeric values and quoted strings in the query with placeholders ('?').
 *
 * - `\b\d+\b`: Matches whole numbers (integers) and replaces them with '?'.
 * - `(["'])(?:(?=(\\?))\2.)*?\1`:
 *   - Matches quoted strings (both single `'` and double `"` quotes).
 *   - Uses a lookahead `(?=(\\?))` to detect an optional backslash without consuming it immediately.
 *   - Captures the optional backslash `\2` and ensures escaped quotes inside the string are handled correctly.
 *   - Ensures that only complete quoted strings are replaced with '?'.
 *
 * This prevents accidental replacement of escaped quotes within strings and ensures that the
 * query structure remains intact while masking sensitive data.
 */

Thank you !
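
To make the regex discussion concrete, the default hook from the diff can be run on a few inputs. The function body is copied from the PR; the sample queries are made up for illustration, and the third shows a limitation of the integer pattern on decimals.

```typescript
// Copied from the diff above so the snippet is self-contained.
function defaultMaskingHook(query: string): string {
  return query
    .replace(/\b\d+\b/g, '?')
    .replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '?');
}

const samples: string[] = [
  // Plain string and integer literals.
  "SELECT * FROM user WHERE name = 'test' AND age = 35",
  // Backslash-escaped quote inside the string: the lookahead captures the
  // backslash so that `\'` is consumed as a pair and does not end the match.
  "SELECT * FROM user WHERE name = 'O\\'Brien'",
  // Decimal literal: each integer part is masked separately.
  'SELECT price FROM item WHERE price > 3.14',
];

const masked = samples.map(defaultMaskingHook);
for (const line of masked) {
  console.log(line);
}
// SELECT * FROM user WHERE name = ? AND age = ?
// SELECT * FROM user WHERE name = ?
// SELECT price FROM item WHERE price > ?.?
```

A simpler pattern such as `'.*?'` would stop at the escaped quote in the second sample and leave part of the name unmasked, which is the case the lookahead is meant to avoid.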

}

function hasValues(obj: any): obj is QueryOptions {
return obj && typeof obj === 'object' && 'values' in obj;
}

/**
* The span name SHOULD be set to a low cardinality value
* representing the statement executed on the database.
205 changes: 205 additions & 0 deletions plugins/node/opentelemetry-instrumentation-mysql2/test/mysql.test.ts
@@ -146,6 +146,9 @@ describe('mysql2', () => {
contextManager = new AsyncHooksContextManager().enable();
context.setGlobalContextManager(contextManager);
instrumentation.setTracerProvider(provider);
instrumentation.setConfig({
maskStatement: false,
});
instrumentation.enable();
connection = createConnection({
port,
@@ -1118,6 +1121,7 @@ describe('mysql2', () => {
responseHook: (span, responseHookInfo) => {
throw new Error('random failure!');
},
maskStatement: false,
};
instrumentation.setConfig(config);
});
@@ -1145,6 +1149,7 @@ describe('mysql2', () => {
JSON.stringify(responseHookInfo.queryResults)
);
},
maskStatement: false,
};
instrumentation.setConfig(config);
});
@@ -1215,6 +1220,203 @@ describe('mysql2', () => {
});
});
});
describe('#maskStatementHook', () => {
beforeEach(done => {
//create table user and insert data
rootConnection.query(
'CREATE TABLE user (id INT, name VARCHAR(255), age INT)',
() => {
rootConnection.query(
'INSERT INTO user (id, name, age) VALUES (1, "test", 35)',
done
);
}
);
});

afterEach(done => {
rootConnection.query('DROP TABLE user', done);
});
describe('default maskStatementHook', () => {
beforeEach(done => {
instrumentation.setConfig({
maskStatement: true,
});
memoryExporter.reset();
done();
});

it('should mask string and numbers in statements', done => {
const query =
"SELECT * FROM user WHERE name = 'test' AND age = 35 AND id = 1";
const maskedQuery =
'SELECT * FROM user WHERE name = ? AND age = ? AND id = ?';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].name, 'test');
assert.strictEqual(res[0].age, 35);
assert.strictEqual(res[0].id, 1);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], maskedQuery);
done();
});
});
});
});
describe('custom maskStatementHook', () => {
beforeEach(done => {
instrumentation.setConfig({
maskStatement: true,
maskStatementHook: query => {
return query
.replace(/\b\d+\b/g, '*')
.replace(/(["'])(?:(?=(\\?))\2.)*?\1/g, '*');
},
});
memoryExporter.reset();
done();
});

it('should mask string and numbers in statements', done => {
const query =
"SELECT * FROM user WHERE name = 'test' AND age = 35 AND id = 1";
const maskedQuery =
'SELECT * FROM user WHERE name = * AND age = * AND id = *';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].name, 'test');
assert.strictEqual(res[0].age, 35);
assert.strictEqual(res[0].id, 1);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], maskedQuery);
done();
});
});
});
});
describe('maskStatementHook with error', () => {
beforeEach(done => {
instrumentation.setConfig({
maskStatement: true,
maskStatementHook: () => {
throw new Error('random failure!');
},
});
memoryExporter.reset();
done();
});
it('should not affect the behavior of the query', done => {
const query =
"SELECT * FROM user WHERE name = 'test' AND age = 35 AND id = 1";
const errorQuery =
'Could not determine the query due to an error in masking or formatting';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].name, 'test');
assert.strictEqual(res[0].age, 35);
assert.strictEqual(res[0].id, 1);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], errorQuery);
done();
});
});
});
});
});
describe('#maskStatement', () => {
beforeEach(done => {
memoryExporter.reset();
done();
});

it('should mask query if maskStatement is true', done => {
instrumentation.setConfig({
maskStatement: true,
});
const query = 'SELECT 1+1 as solution';
const maskedQuery = 'SELECT ?+? as solution';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].solution, 2);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], maskedQuery);
done();
});
});
});
it('should return masked query, if values are present', done => {
instrumentation.setConfig({
maskStatement: true,
});
const query = 'SELECT ?+? as solution';
const maskedQuery = 'SELECT ?+? as solution';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, [1, 1], (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].solution, 2);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], maskedQuery);
done();
});
});
});
it('should not mask query if maskStatement is false', done => {
instrumentation.setConfig({
maskStatement: false,
});
const query = 'SELECT 1+1 as solution';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].solution, 2);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], query);
done();
});
});
});
it('should return query with values, if values are present and maskStatement is false', done => {
instrumentation.setConfig({
maskStatement: false,
});
const query = 'SELECT ?+? as solution';
const queryWithValues = 'SELECT 1+1 as solution';
const span = provider.getTracer('default').startSpan('test span');
context.with(trace.setSpan(context.active(), span), () => {
connection.query(query, [1, 1], (err, res: RowDataPacket[]) => {
assert.ifError(err);
assert.ok(res);
assert.strictEqual(res[0].solution, 2);
const spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 1);
assertSpan(spans[0], queryWithValues);
done();
});
});
});
});
});
describe('promise API', () => {
let instrumentation: MySQL2Instrumentation;
@@ -1271,6 +1473,9 @@ describe('mysql2', () => {
contextManager = new AsyncHooksContextManager().enable();
context.setGlobalContextManager(contextManager);
instrumentation.setTracerProvider(provider);
instrumentation.setConfig({
maskStatement: false,
});
instrumentation.enable();
connection = await createConnection({
port,