-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Handle unavailable MD5 in ES|QL #130158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle unavailable MD5 in ES|QL #130158
Changes from 2 commits
fd8bfc4
33b6d87
05c8901
249fe6b
b03fa6c
552424f
4169b5b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
pr: 130158 | ||
summary: Handle unavailable MD5 in ES|QL | ||
area: ES|QL | ||
type: bug | ||
issues: [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
|
||
import org.elasticsearch.common.io.stream.NamedWriteableRegistry; | ||
import org.elasticsearch.common.io.stream.StreamInput; | ||
import org.elasticsearch.xpack.esql.VerificationException; | ||
import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
import org.elasticsearch.xpack.esql.core.tree.NodeInfo; | ||
import org.elasticsearch.xpack.esql.core.tree.Source; | ||
|
@@ -18,13 +19,16 @@ | |
import org.elasticsearch.xpack.esql.expression.function.scalar.string.Hash.HashFunction; | ||
|
||
import java.io.IOException; | ||
import java.security.MessageDigest; | ||
import java.security.NoSuchAlgorithmException; | ||
import java.util.List; | ||
import java.util.concurrent.atomic.AtomicReference; | ||
|
||
public class Md5 extends AbstractHashFunction { | ||
|
||
public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(Expression.class, "MD5", Md5::new); | ||
|
||
private static final HashFunction MD5 = HashFunction.create("MD5"); | ||
private final AtomicReference<HashFunction> MD5 = new AtomicReference<>(); | ||
|
||
@FunctionInfo( | ||
returnType = "keyword", | ||
|
@@ -39,9 +43,23 @@ private Md5(StreamInput in) throws IOException { | |
super(in); | ||
} | ||
|
||
/** | ||
* As of Java 14, it is permissible for a JRE to ship without the {@code MD5} {@link MessageDigest}. | ||
* We want the "md5" function in ES|QL to fail at runtime on such platforms (rather than at startup) | ||
* so we build the {@link HashFunction} lazily. | ||
*/ | ||
@Override | ||
protected HashFunction getHashFunction() { | ||
return MD5; | ||
HashFunction function = MD5.get(); | ||
if (function == null) { | ||
try { | ||
function = new HashFunction("MD5", MessageDigest.getInstance("MD5")); | ||
MD5.compareAndSet(null, function); | ||
} catch (NoSuchAlgorithmException e) { | ||
throw new VerificationException("function 'md5' is not available on this platform: {}", e.getMessage()); | ||
} | ||
} | ||
return function; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer if we do not try to initialize it for every query even if it is not available. May be something like:
and in Hash.java:
This would also mean we need to modify org.elasticsearch.xpack.esql.expression.function.scalar.string.AbstractHashFunction#toEvaluator to use different (warning/error only) hash evaluator when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
So would I. I can handle it as you describe, the issue is that we lose the original exception (its only in the logs) and have to treat There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Possibly it worth introducing wrapper that captures exception and re-throws it when executed in new or existing constant evaluator. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this is JVM specific, I am wondering if there are scenarios (like Sorry if I am mentioning something obvious, but thought on voicing it loud rather than just assuming this is already considered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Simple There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It doesn't. The hash is computed on the coordinator after each node replies back with 1000 (the default) rows. The coordinator will pick the final 1000 rows and then compute the This is how a physical plan would look like on the coordinator:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I introduced a |
||
} | ||
|
||
@Override | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
|
||
package org.elasticsearch.xpack.esql.expression.function.scalar.string; | ||
|
||
import org.elasticsearch.test.ESTestCase; | ||
import org.elasticsearch.xpack.esql.VerificationException; | ||
import org.elasticsearch.xpack.esql.core.expression.Literal; | ||
import org.elasticsearch.xpack.esql.core.tree.Source; | ||
|
||
import java.security.Provider; | ||
import java.security.Security; | ||
|
||
import static org.hamcrest.Matchers.notNullValue; | ||
|
||
public class Md5UnavailableTests extends ESTestCase { | ||
tvernum marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
public void testMd5Unavailable() { | ||
assumeFalse("We run with different security providers in FIPS, and changing them at runtime is more complicated", inFipsJvm()); | ||
final Provider sunProvider = Security.getProvider("SUN"); | ||
final Md5 md5; | ||
try { | ||
Security.removeProvider("SUN"); | ||
md5 = new Md5(Source.EMPTY, Literal.NULL); | ||
expectThrows(VerificationException.class, md5::getHashFunction); | ||
} finally { | ||
Security.addProvider(sunProvider); | ||
} | ||
|
||
assertThat(md5.getHashFunction(), notNullValue()); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the same applicable for SHA functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth mentioning that this might not be available in certain JVMs at
elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/Md5.java
Line 31 in fd1be8c
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as at Java 24, both SHA-1 and SHA-256 are still required
I could change those functions to work similarly, but I opted for the smallest change that solves the known problem. I actually started with a change in
HashFunction
that would be lenient for any algorithm, but switched to something MD5 specific.