Results missing when 2 forests failover #813
Comments
When a host is being brought down, or is in the process of shutting down, there can be scenarios where a request gets no response. With OkHttp in place, we get an EOF exception indicating an end of stream on the connection, so we have to take that into account during DMSDK failover. We have added a new listener called NoResponseListener, a subclass of HostAvailabilityListener, which handles empty responses from the server.

The reason we created a new listener, rather than adding the exception to HostAvailabilityListener's list of exceptions, is that the latter would also retry automatically when applying a listener fails after a set of URIs has been retrieved from the server. This is undesirable in ApplyTransformListener, where an empty response could mean either that the transform has been applied to the batch or that the batch has not been processed yet. Since transforms need not be idempotent, we shouldn't do a blind retry. Hence we created the new listener, which is registered by default on both QueryBatcher and WriteBatcher instances when they are created.

For idempotent listeners like DeleteListener, and even for WriteBatcher, we can do a blind retry. For WriteBatcher, since the NoResponseListener is registered by default, the listener retries and gets the batch written on the server. For idempotent listeners, the retry is set up by initializing the RetryListener of NoResponseListener and adding it to the onFailure listeners, similar to what we did for HostAvailabilityListener; this should be done in the overridden initializeListener(QueryBatcher). For listeners like ApplyTransformListener, there are two cases -
|
Is there a way for a user to flag her |
If the transform is idempotent, the user has to know to retry in their failure handler:
ApplyTransformListener listener = new ApplyTransformListener()
    .withTransform(transform)
    .withApplyResult(ApplyResult.REPLACE)
    .onSuccess(batch -> {
        success.addAndGet(batch.getItems().length);
    })
    .onSkipped(batch -> {
        skipped.addAndGet(batch.getItems().length);
    });
QueryBatcher batcher = dmManager.newQueryBatcher(new StructuredQueryBuilder().collection("XmlTransform"))
    .onUrisReady(listener)
    .withBatchSize(10)
    .withThreadCount(5);
// Opt in to retrying on an empty response; only safe because the transform is idempotent.
NoResponseListener noResponseListener = NoResponseListener.getInstance(batcher);
if (noResponseListener != null) {
    BatchFailureListener<QueryBatch> retryListener = noResponseListener.initializeRetryListener(listener);
    if (retryListener != null) {
        listener.onFailure(retryListener);
    }
} |
Able to obtain all URIs after failover |
@jmakeig A sample implementation is available in DeleteListener:
@Override
public void initializeListener(QueryBatcher queryBatcher) {
    // Retry when the host is reported unavailable.
    HostAvailabilityListener hostAvailabilityListener = HostAvailabilityListener.getInstance(queryBatcher);
    if ( hostAvailabilityListener != null ) {
        BatchFailureListener<QueryBatch> retryListener = hostAvailabilityListener.initializeRetryListener(this);
        if ( retryListener != null ) onFailure(retryListener);
    }
    // Retry when the server returns no response; safe here because deletes are idempotent.
    NoResponseListener noResponseListener = NoResponseListener.getInstance(queryBatcher);
    if ( noResponseListener != null ) {
        BatchFailureListener<QueryBatch> noResponseRetryListener = noResponseListener.initializeRetryListener(this);
        if ( noResponseRetryListener != null ) onFailure(noResponseRetryListener);
    }
}
For ApplyTransform alone, we have to do what Srinath described in #813 (comment), if the transform is idempotent. |
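The distinction drawn in these comments (blind retry on an empty response for idempotent listeners, no automatic retry otherwise) can be illustrated without any MarkLogic classes. The following is only a rough sketch: `NoResponseRetrySketch`, `BatchAction`, `NoResponseException`, `applyWithPolicy`, and `flakyAction` are all hypothetical names invented for this illustration, and the actual NoResponseListener may be structured quite differently.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in types for illustration; none of these are MarkLogic classes.
final class NoResponseRetrySketch {

    /** Thrown by the simulated call when the connection ends without a response. */
    static final class NoResponseException extends RuntimeException {
        NoResponseException(String message) { super(message); }
    }

    /** A per-batch action that declares whether repeating it is safe. */
    interface BatchAction {
        void apply(List<String> uris);
        boolean isIdempotent();
    }

    /**
     * Applies the action to a batch. On an empty response, retries only when the
     * action is idempotent and retries remain; otherwise hands the failure to the
     * caller's handler. Returns the number of attempts made.
     */
    static int applyWithPolicy(BatchAction action, List<String> uris, int maxRetries,
                               Consumer<NoResponseException> onFailure) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.apply(uris);
                return attempts;
            } catch (NoResponseException e) {
                boolean mayRetry = action.isIdempotent() && attempts <= maxRetries;
                if (!mayRetry) {
                    // The batch may or may not have been processed; let the caller decide.
                    onFailure.accept(e);
                    return attempts;
                }
            }
        }
    }

    /** Builds an action whose first `failures` calls end with no response. */
    static BatchAction flakyAction(int failures, boolean idempotent) {
        return new BatchAction() {
            private int calls = 0;
            @Override public void apply(List<String> uris) {
                if (calls++ < failures) throw new NoResponseException("end of stream");
            }
            @Override public boolean isIdempotent() { return idempotent; }
        };
    }
}
```

In this sketch, an idempotent action that loses two responses succeeds on its third attempt, while a non-idempotent one stops after the first attempt and surfaces the failure, which mirrors the choice left to the user for ApplyTransformListener.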
In the following test, I have a 3-node cluster (rh7v-intel64-90-java-stress-1/2/4.marklogic.com) with a forest on each host, and the forests on hosts rh7v-intel64-90-java-stress-2/4.marklogic.com are configured to fail over to rh7v-intel64-90-java-stress-1.marklogic.com. While the QueryBatcher is executing, I stop rh7v-intel64-90-java-stress-4.marklogic.com, and the forest QBFailover-3 fails over to rh7v-intel64-90-java-stress-1.marklogic.com. After some time, I stop rh7v-intel64-90-java-stress-2.marklogic.com, but before the node timeout (30 seconds by default) elapses, I start rh7v-intel64-90-java-stress-4.marklogic.com. After 30 seconds, the forest QBFailover-2 on rh7v-intel64-90-java-stress-2.marklogic.com also fails over to rh7v-intel64-90-java-stress-1.marklogic.com. In this scenario, the total number of URIs returned is less than expected. The log is attached:
TEST-com.marklogic.client.datamovement.functionaltests.QBFailover.txt
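One way to pin down which URIs go missing in a scenario like this is to record every URI the batcher hands back and compare it against the expected set afterwards. The helper below is a minimal sketch under that assumption; `UriTally` and its methods are hypothetical and not part of the Java Client API or the attached test.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not part of the Java Client API.
final class UriTally {
    // Concurrent set, because batches may be delivered on several threads.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Records every URI in a batch; duplicates are collapsed by the set. */
    void record(String[] batchUris) {
        for (String uri : batchUris) {
            seen.add(uri);
        }
    }

    /** Number of distinct URIs recorded so far. */
    int distinctCount() {
        return seen.size();
    }

    /** Returns the expected URIs that were never recorded, in sorted order. */
    Set<String> missing(Collection<String> expected) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(seen);
        return result;
    }
}
```

In a test, an onUrisReady callback could call `tally.record(batch.getItems())`, and once the job finishes the result of `missing(...)` would show exactly which documents were not returned, rather than only that the total is short.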