QueryBatcher fails when using path range query #1283

Closed
rjrudin opened this issue Jan 21, 2021 · 3 comments
@rjrudin
Contributor

rjrudin commented Jan 21, 2021

So we can address your issue, please include the following:

Version of MarkLogic Java Client API

5.3.2

Version of MarkLogic Server

10.0-5

Java version

Java 8 and 11

OS and version

N/A

Input: Some code to illustrate the problem, preferably in a state that can be independently reproduced on our end

Below is a sample program to expose the bug. I have a path range index set up correctly on "/root/nst:dateTime" with "nst" declared as a path namespace in the database. And I have 6 documents that match the query (this is all from a marklogic-nifi test). Using queryManager.search, I get back the expected 6 documents. Using QueryBatcher, I get an error due to the namespace prefix not being recognized.

package org.apache.nifi.marklogic.processor;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.QueryBatcher;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.query.StructuredQueryDefinition;
import com.marklogic.client.util.EditableNamespaceContext;

import java.util.Arrays;

public class RangeIndexBug {

    public static void main(String[] args) {
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8006, new DatabaseClientFactory.DigestAuthContext("admin", "admin"));
        QueryManager queryManager = client.newQueryManager();

        StructuredQueryBuilder queryBuilder = queryManager.newStructuredQueryBuilder();
        EditableNamespaceContext namespaceContext = new EditableNamespaceContext();
        namespaceContext.put("nst", "namespace-test");
        queryBuilder.setNamespaces(namespaceContext);

        StructuredQueryDefinition queryDef = queryBuilder.range(
                queryBuilder.pathIndex("/root/nst:dateTime"),
                "xs:dateTime", StructuredQueryBuilder.Operator.GT, "1999-01-01T00:00:00"
        );

        // Try a regular search
        String results = queryManager.search(queryDef, new StringHandle()).get();
        System.out.println("Search results: " + results);

        // Try a QueryBatcher
        DataMovementManager dmm = client.newDataMovementManager();
        QueryBatcher qb = dmm.newQueryBatcher(queryDef)
                .onUrisReady(batch -> System.out.println("Items: " + Arrays.asList(batch.getItems())))
                .onQueryFailure(failure -> System.out.println("Failure: " + failure.getMessage()));
        dmm.startJob(qb);
        qb.awaitCompletion();
        dmm.stopJob(qb);
    }
}

Actual output: What did you observe? What errors did you see? Can you attach the logs? (Java logs, MarkLogic logs)

Here's the output of the queryManager.search (just a snippet to verify I get data back):

<search:response snippet-format="snippet" total="6" start="1" page-length="10" xmlns:search="http://marklogic.com/appservices/search">
  <search:result index="1" uri="/PutMarkLogicTest/5.xml" path="fn:doc(&quot;/PutMarkLogicTest/5.xml&quot;)" score="0" confidence="0" fitness="0" href="/v1/documents?uri=%2FPutMarkLogicTest%2F5.xml" mimetype="application/xml" format="xml">
    <search:snippet>
      <search:match path="fn:doc(&quot;/PutMarkLogicTest/5.xml&quot;)/root/*:dateTime"><search:highlight>2000-01-01T00:00:00.000000</search:highlight></search:match>
    </search:snippet>
  </search:result>

And here's the error I got from using QueryBatcher:

[main] INFO com.marklogic.client.datamovement.impl.QueryBatcherImpl - (withForestConfig) Using forests on [localhost] hosts for "test-marklogic-nifi-content"
[main] WARN com.marklogic.client.datamovement.impl.QueryBatcherImpl - threadCount not set--defaulting to number of forests (1)
[main] INFO com.marklogic.client.datamovement.impl.QueryBatcherImpl - Starting job batchSize=1000, threadCount=1, onUrisReady listeners=2, failure listeners=4
Failure: com.marklogic.client.FailedRequestException: Local message: failed to apply resource at internal/uris: Internal Server Error. Server Message: XDMP-UNBPRFX: (err:XPST0081) Prefix nst has no namespace binding . See the MarkLogic server error log for further detail.

Expected output: What specifically did you expect to happen?

I expected QueryBatcher to find the same 6 documents

Alternatives: What else have you tried, actual/expected?

No workaround that I can find.

@ehennum
Contributor

ehennum commented Jan 23, 2021

Good catch. Here's a guess as to what's going on.

Starting in 10.0-5, the Java API converts the query to a cts.query once during initialization instead of on every request.

In the com.marklogic.client.datamovement.impl.QueryBatcherImpl#QueryBatcherImpl() constructor, on line 99, the cts.query serialization is captured.

Somewhere, the conversion to the cts.query (possibly within the REST API internal endpoint) loses the namespace binding.

@ehennum
Contributor

ehennum commented Jan 25, 2021

Based on investigation...

Initialization converts the Search API representation of a path range query to the cts representation, which is serialized to JSON before returning to the client.

A cts.pathRangeQuery() doesn't take namespace declarations, so it serializes to JSON without namespace declarations.

By contrast, a cts.pathReference() does take namespace declarations, which are serialized to JSON.

A cts.rangeQuery() takes a cts.pathReference(), so one way to fix the issue would be to modify the conversion from the Search API representation to the cts representation in this case. That approach, however, would risk introducing a backward incompatibility on a stable component.

Another way to solve the problem would be to serialize to XML if namespaces are used and to JSON otherwise. That approach, however, would add complexity to both the interface and implementation of the REST API.

An expedient solution is to send the original query, rather than the serialized cts.query, when the query is a structured query builder query with namespaces or a raw query in XML format. The optimization will be skipped for such queries.

@ehennum
Contributor

ehennum commented Jan 25, 2021

The fix also skips the optimization and uses the original query if the query refers to persisted options.
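As a rough illustration of the rule described in these comments, the check could look something like the sketch below. Everything in it (the `OptimizationGuard` class, the `QueryFormat` enum, the `QueryInfo` holder and the `canUseSerializedCtsQuery` method) is hypothetical and exists only for this example; it is not the actual code in QueryBatcherImpl and uses no Java Client API types.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the decision only; none of these names come from the Java Client API.
public class OptimizationGuard {

    // Assumed, simplified classification of the query passed to a query batcher.
    public enum QueryFormat { BUILDER, RAW_XML, RAW_JSON }

    // Assumed, simplified description of a query.
    public static final class QueryInfo {
        final QueryFormat format;
        final Map<String, String> namespaces; // prefix -> namespace URI
        final String optionsName;             // name of persisted options, or null if none

        public QueryInfo(QueryFormat format, Map<String, String> namespaces, String optionsName) {
            this.format = format;
            this.namespaces = (namespaces == null)
                    ? Collections.<String, String>emptyMap()
                    : namespaces;
            this.optionsName = optionsName;
        }
    }

    // Returns true when the one-time conversion to a serialized cts.query may be used,
    // and false when the original query should be sent instead.
    public static boolean canUseSerializedCtsQuery(QueryInfo query) {
        // Queries that refer to persisted options keep the original query.
        if (query.optionsName != null && !query.optionsName.isEmpty()) {
            return false;
        }
        // Raw queries in XML format keep the original query.
        if (query.format == QueryFormat.RAW_XML) {
            return false;
        }
        // Structured query builder queries with namespace bindings keep the original query,
        // because the JSON serialization of the path range query would drop the bindings.
        if (query.format == QueryFormat.BUILDER && !query.namespaces.isEmpty()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        QueryInfo withNamespace = new QueryInfo(
                QueryFormat.BUILDER,
                Collections.singletonMap("nst", "namespace-test"),
                null);
        QueryInfo withoutNamespace = new QueryInfo(
                QueryFormat.BUILDER,
                Collections.<String, String>emptyMap(),
                null);

        System.out.println("namespaced builder query optimized: "
                + canUseSerializedCtsQuery(withNamespace));
        System.out.println("plain builder query optimized: "
                + canUseSerializedCtsQuery(withoutNamespace));
    }
}
```

Under these assumptions, the query from the sample program in this issue (a builder query with the "nst" prefix bound) would fall into the branch that keeps the original query, so the namespace binding would reach the server intact.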

If the functional tests include a path range index with a namespace, a good functional test would use a QueryBatcher to retrieve some results.
