-
Notifications
You must be signed in to change notification settings - Fork 63
[FLINK-32097][Connectors/Kinesis] Implement support for Kinesis deaggregation #188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
767f576
ddcb1a9
fce6d03
82c6ed2
e1e274f
156dc6b
ae629b4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,14 @@ under the License. | |
<name>Flink : Connectors : AWS : Amazon Kinesis Data Streams Connector v2</name> | ||
<packaging>jar</packaging> | ||
|
||
<repositories> | ||
<!-- used for the kinesis aggregator dependency since it is not available in maven central --> | ||
<repository> | ||
<id>jitpack.io</id> | ||
<url>https://jitpack.io</url> | ||
</repository> | ||
</repositories> | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>org.apache.flink</groupId> | ||
|
@@ -52,6 +60,11 @@ under the License. | |
<artifactId>kinesis</artifactId> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>software.amazon.kinesis</groupId> | ||
<artifactId>amazon-kinesis-client</artifactId> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a shame, can we do this without KCL? It's a heavy weight dependency for the 1 class(?) needed for deaggregation. We have been talking about removing this dependency with the new connector for years, but never really solved this bit. Can we ask Kinesis to publish a separate library for this? Or, we could just copy that file (assuming it is still a small asking of code) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dannycranmer |
||
</dependency> | ||
|
||
<dependency> | ||
<groupId>software.amazon.awssdk</groupId> | ||
<artifactId>arns</artifactId> | ||
|
@@ -102,6 +115,15 @@ under the License. | |
<scope>test</scope> | ||
</dependency> | ||
|
||
<dependency> | ||
<!-- the kinesis aggregator dependency since it is not available in maven central --> | ||
<!-- look into issue https://github.com/awslabs/kinesis-aggregation/issues/120 --> | ||
<groupId>com.github.awslabs.kinesis-aggregation</groupId> | ||
<artifactId>amazon-kinesis-aggregator</artifactId> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This has Apache 2.0 licence, all good so |
||
<version>2.0.3</version> | ||
<scope>test</scope> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>nl.jqno.equalsverifier</groupId> | ||
<artifactId>equalsverifier</artifactId> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,8 +29,8 @@ | |
|
||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
import software.amazon.awssdk.services.kinesis.model.Record; | ||
import software.amazon.awssdk.services.kinesis.model.ResourceNotFoundException; | ||
import software.amazon.kinesis.retrieval.KinesisClientRecord; | ||
|
||
import javax.annotation.Nullable; | ||
|
||
|
@@ -41,7 +41,6 @@ | |
import java.util.Deque; | ||
import java.util.HashSet; | ||
import java.util.Iterator; | ||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.Set; | ||
|
||
|
@@ -50,10 +49,10 @@ | |
/** Base implementation of the SplitReader for reading from KinesisShardSplits. */ | ||
@Internal | ||
public abstract class KinesisShardSplitReaderBase | ||
implements SplitReader<Record, KinesisShardSplit> { | ||
implements SplitReader<KinesisClientRecord, KinesisShardSplit> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change actually changes the interface exposed for the KinesisSource, so it would be a backwards incompatible change. Is there a way we can wrap this internally? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have been looking at the implications of not changing from
Apart from that I don't see any more issues. Either way I don't mind changing it back to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Lzgpom Can you elaborate on what the user experience would be with the current changes in PR? Will it break compilation for existing apps that already depend on KinesisSource without further code changes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Lzgpom I think the gist is, can you please change the interface back to using |
||
|
||
private static final Logger LOG = LoggerFactory.getLogger(KinesisShardSplitReaderBase.class); | ||
private static final RecordsWithSplitIds<Record> INCOMPLETE_SHARD_EMPTY_RECORDS = | ||
private static final RecordsWithSplitIds<KinesisClientRecord> INCOMPLETE_SHARD_EMPTY_RECORDS = | ||
new KinesisRecordsWithSplitIds(Collections.emptyIterator(), null, false); | ||
|
||
private final Deque<KinesisShardSplitState> assignedSplits = new ArrayDeque<>(); | ||
|
@@ -65,7 +64,7 @@ protected KinesisShardSplitReaderBase(Map<String, KinesisShardMetrics> shardMetr | |
} | ||
|
||
@Override | ||
public RecordsWithSplitIds<Record> fetch() throws IOException { | ||
public RecordsWithSplitIds<KinesisClientRecord> fetch() throws IOException { | ||
KinesisShardSplitState splitState = assignedSplits.poll(); | ||
|
||
// When there are no assigned splits, return quickly | ||
|
@@ -103,7 +102,7 @@ public RecordsWithSplitIds<Record> fetch() throws IOException { | |
.get(splitState.getShardId()) | ||
.setMillisBehindLatest(recordBatch.getMillisBehindLatest()); | ||
|
||
if (recordBatch.getRecords().isEmpty()) { | ||
if (recordBatch.getDeaggregatedRecords().isEmpty()) { | ||
if (recordBatch.isCompleted()) { | ||
return new KinesisRecordsWithSplitIds( | ||
Collections.emptyIterator(), splitState.getSplitId(), true); | ||
|
@@ -115,12 +114,12 @@ public RecordsWithSplitIds<Record> fetch() throws IOException { | |
splitState.setNextStartingPosition( | ||
StartingPosition.continueFromSequenceNumber( | ||
recordBatch | ||
.getRecords() | ||
.get(recordBatch.getRecords().size() - 1) | ||
.getDeaggregatedRecords() | ||
.get(recordBatch.getDeaggregatedRecords().size() - 1) | ||
.sequenceNumber())); | ||
|
||
return new KinesisRecordsWithSplitIds( | ||
recordBatch.getRecords().iterator(), | ||
recordBatch.getDeaggregatedRecords().iterator(), | ||
splitState.getSplitId(), | ||
recordBatch.isCompleted()); | ||
} | ||
|
@@ -154,48 +153,20 @@ public void pauseOrResumeSplits( | |
splitsToResume.forEach(split -> pausedSplitIds.remove(split.splitId())); | ||
} | ||
|
||
/** | ||
* Dataclass to store a batch of Kinesis records with metadata. Used to pass Kinesis records | ||
* from the SplitReader implementation to the SplitReaderBase. | ||
*/ | ||
@Internal | ||
protected static class RecordBatch { | ||
private final List<Record> records; | ||
private final long millisBehindLatest; | ||
private final boolean completed; | ||
|
||
public RecordBatch(List<Record> records, long millisBehindLatest, boolean completed) { | ||
this.records = records; | ||
this.millisBehindLatest = millisBehindLatest; | ||
this.completed = completed; | ||
} | ||
|
||
public List<Record> getRecords() { | ||
return records; | ||
} | ||
|
||
public long getMillisBehindLatest() { | ||
return millisBehindLatest; | ||
} | ||
|
||
public boolean isCompleted() { | ||
return completed; | ||
} | ||
} | ||
|
||
/** | ||
* Implementation of {@link RecordsWithSplitIds} for sending Kinesis records from fetcher to the | ||
* SourceReader. | ||
*/ | ||
@Internal | ||
private static class KinesisRecordsWithSplitIds implements RecordsWithSplitIds<Record> { | ||
private static class KinesisRecordsWithSplitIds | ||
implements RecordsWithSplitIds<KinesisClientRecord> { | ||
|
||
private final Iterator<Record> recordsIterator; | ||
private final Iterator<KinesisClientRecord> recordsIterator; | ||
private final String splitId; | ||
private final boolean isComplete; | ||
|
||
public KinesisRecordsWithSplitIds( | ||
Iterator<Record> recordsIterator, String splitId, boolean isComplete) { | ||
Iterator<KinesisClientRecord> recordsIterator, String splitId, boolean isComplete) { | ||
this.recordsIterator = recordsIterator; | ||
this.splitId = splitId; | ||
this.isComplete = isComplete; | ||
|
@@ -209,7 +180,7 @@ public String nextSplit() { | |
|
||
@Nullable | ||
@Override | ||
public Record nextRecordFromSplit() { | ||
public KinesisClientRecord nextRecordFromSplit() { | ||
return recordsIterator.hasNext() ? recordsIterator.next() : null; | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.flink.connector.kinesis.source.reader; | ||
|
||
import org.apache.flink.annotation.Internal; | ||
import org.apache.flink.connector.kinesis.source.split.KinesisShardSplit; | ||
|
||
import software.amazon.awssdk.services.kinesis.model.Record; | ||
import software.amazon.kinesis.retrieval.AggregatorUtil; | ||
import software.amazon.kinesis.retrieval.KinesisClientRecord; | ||
|
||
import java.util.ArrayList; | ||
import java.util.List; | ||
|
||
/** | ||
* Dataclass to store a batch of Kinesis records with metadata. Used to pass Kinesis records from | ||
* the SplitReader implementation to the SplitReaderBase. | ||
* | ||
* <p>Input records are de-aggregated using KCL 3.x library. It is expected that AWS SDK v2.x | ||
* messages are converted to KCL 3.x {@link KinesisClientRecord}. | ||
*/ | ||
@Internal | ||
public class RecordBatch { | ||
private final List<KinesisClientRecord> deaggregatedRecords; | ||
private final long millisBehindLatest; | ||
private final boolean completed; | ||
|
||
public RecordBatch( | ||
final List<Record> records, | ||
final KinesisShardSplit subscribedShard, | ||
final long millisBehindLatest, | ||
final boolean completed) { | ||
this.deaggregatedRecords = deaggregateRecords(records, subscribedShard); | ||
this.millisBehindLatest = millisBehindLatest; | ||
this.completed = completed; | ||
} | ||
|
||
public List<KinesisClientRecord> getDeaggregatedRecords() { | ||
return deaggregatedRecords; | ||
} | ||
|
||
public long getMillisBehindLatest() { | ||
return millisBehindLatest; | ||
} | ||
|
||
public boolean isCompleted() { | ||
return completed; | ||
} | ||
|
||
private List<KinesisClientRecord> deaggregateRecords( | ||
final List<Record> records, final KinesisShardSplit subscribedShard) { | ||
final List<KinesisClientRecord> kinesisClientRecords = new ArrayList<>(); | ||
for (Record record : records) { | ||
kinesisClientRecords.add(KinesisClientRecord.fromRecord(record)); | ||
} | ||
|
||
final String startingHashKey = subscribedShard.getStartingHashKey(); | ||
final String endingHashKey = subscribedShard.getEndingHashKey(); | ||
|
||
return new AggregatorUtil() | ||
.deaggregate(kinesisClientRecords, startingHashKey, endingHashKey); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,7 @@ | |
import org.apache.flink.connector.kinesis.source.KinesisStreamsSource; | ||
import org.apache.flink.util.Collector; | ||
|
||
import software.amazon.awssdk.services.kinesis.model.Record; | ||
import software.amazon.kinesis.retrieval.KinesisClientRecord; | ||
|
||
import java.io.IOException; | ||
import java.io.Serializable; | ||
|
@@ -60,7 +60,7 @@ default void open(DeserializationSchema.InitializationContext context) throws Ex | |
* @param output the identifier of the shard the record was sent to | ||
* @throws IOException exception when deserializing record | ||
*/ | ||
void deserialize(Record record, String stream, String shardId, Collector<T> output) | ||
void deserialize(KinesisClientRecord record, String stream, String shardId, Collector<T> output) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a breaking change to the public interface There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 @Lzgpom Have we considered alternatives to changing the interface? I can see that Hong suggested wrapping internally. Have we considered alternatives such as having a separate KinesisClientRecordDeserializationSchema? |
||
throws IOException; | ||
|
||
static <T> KinesisDeserializationSchema<T> of(DeserializationSchema<T> deserializationSchema) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to work with the owner of the repo to get this published instead?