diff --git a/CHANGELOG.md b/CHANGELOG.md
index fb673393..c1053d8e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,6 +12,12 @@ This project follows semantic versioning.
 - Bidirectional collections have a new `ends(with:)` method that matches the behavior of the standard library's `starts(with:)` method. ([#224])
+- Sequences that are already sorted can use the `countSortedDuplicates` and
+  `deduplicateSorted` methods, with eager and lazy versions.
+  The former returns each unique value paired with the count of
+  that value's occurrences.
+  The latter returns each unique value,
+  turning a non-decreasing sequence into a strictly increasing one.
diff --git a/Guides/README.md b/Guides/README.md
index d4894882..effbfbb1 100644
--- a/Guides/README.md
+++ b/Guides/README.md
@@ -25,6 +25,7 @@ These guides describe the design and intention behind the APIs included in the `
 #### Subsetting operations
 
 - [`compacted()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Compacted.md): Drops the `nil`s from a sequence or collection, unwrapping the remaining elements.
+- [`deduplicateSorted()`, `deduplicateSorted(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/SortedDuplicates.md): Given an already-sorted sequence and its sorting predicate, reduces each run of equivalent values to a single element. Has eager and lazy variants.
 - [`partitioned(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): Returns the elements in a sequence or collection that do and do not match a given predicate.
 - [`randomSample(count:)`, `randomSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection.
 - [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
@@ -42,6 +43,7 @@ These guides describe the design and intention behind the APIs included in the `
 - [`adjacentPairs()`](https://github.com/apple/swift-algorithms/blob/main/Guides/AdjacentPairs.md): Lazily iterates over tuples of adjacent elements.
 - [`chunked(by:)`, `chunked(on:)`, `chunks(ofCount:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes or chunks of a given count.
+- [`countSortedDuplicates()`, `countSortedDuplicates(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/SortedDuplicates.md): Given an already-sorted sequence and its sorting predicate, returns each unique value paired with its number of occurrences. Has eager and lazy variants.
 - [`firstNonNil(_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/FirstNonNil.md): Returns the first non-`nil` result from transforming a sequence's elements.
 - [`grouped(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Grouped.md): Group up elements using the given closure, returning a Dictionary of those groups, keyed by the results of the closure.
 - [`indexed()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Indexed.md): Iterate over tuples of a collection's indices and elements.
diff --git a/Guides/SortedDuplicates.md b/Guides/SortedDuplicates.md
new file mode 100644
index 00000000..31d0142d
--- /dev/null
+++ b/Guides/SortedDuplicates.md
@@ -0,0 +1,65 @@
+# Sorted Duplicates
+[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/SortedDuplicates.swift) |
+ [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/SortedDuplicatesTests.swift)]
+
+Given a sequence that is already sorted, recognize each run of
+identical values.
+Use those runs to determine how many times
+each value occurs.
+Or filter out the duplicate values by removing all occurrences of
+a given value besides the first.
+
+```swift
+// Put examples here
+```
+
+## Detailed Design
+
+```swift
+extension Sequence {
+  public func countSortedDuplicates(
+    by areInIncreasingOrder: (Element, Element) throws -> Bool
+  ) rethrows -> [(value: Element, count: Int)]
+
+  public func deduplicateSorted(
+    by areInIncreasingOrder: (Element, Element) throws -> Bool
+  ) rethrows -> [Element]
+}
+
+extension Sequence where Self.Element : Comparable {
+  public func countSortedDuplicates() -> [(value: Element, count: Int)]
+
+  public func deduplicateSorted() -> [Element]
+}
+
+extension LazySequenceProtocol {
+  public func countSortedDuplicates(
+    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
+  ) -> LazyCountDuplicatesSequence<Elements>
+
+  public func deduplicateSorted(
+    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
+  ) -> some (Sequence<Element> & LazySequenceProtocol)
+}
+
+extension LazySequenceProtocol where Self.Element : Comparable {
+  public func countSortedDuplicates()
+    -> LazyCountDuplicatesSequence<Elements>
+
+  public func deduplicateSorted()
+    -> some (Sequence<Element> & LazySequenceProtocol)
+}
+
+public struct LazyCountDuplicatesSequence<Base: Sequence>
+  : LazySequenceProtocol
+{ /*...*/ }
+
+public struct CountDuplicatesIterator<Base: IteratorProtocol>
+  : IteratorProtocol
+{ /*...*/ }
+```
+
+### Complexity
+
+Calling the lazy
methods, those defined on `LazySequenceProtocol`, is O(_1_).
+Calling the eager methods, those returning an array, is O(_n_), where _n_ is the length of the sequence.
diff --git a/Sources/Algorithms/Documentation.docc/Filtering.md b/Sources/Algorithms/Documentation.docc/Filtering.md
index 85073ab8..7eb23a5a 100644
--- a/Sources/Algorithms/Documentation.docc/Filtering.md
+++ b/Sources/Algorithms/Documentation.docc/Filtering.md
@@ -21,6 +21,14 @@ let withNoNils = array.compacted()
 // Array(withNoNils) == [10, 30, 2, 3, 5]
 ```
 
+The `deduplicateSorted()` methods keep only the first element of each run of equivalent elements in an already-sorted sequence, turning a non-decreasing sequence into a strictly increasing one. A custom sorting predicate can be supplied.
+
+```swift
+let numbers = [0, 1, 2, 2, 2, 3, 5, 6, 6, 9, 10, 10]
+let deduplicated = numbers.deduplicateSorted()
+// Array(deduplicated) == [0, 1, 2, 3, 5, 6, 9, 10]
+```
+
 ## Topics
 
 ### Uniquing Elements
@@ -34,6 +42,13 @@ let withNoNils = array.compacted()
 - ``Swift/Collection/compacted()``
 - ``Swift/Sequence/compacted()``
 
+### Removing Duplicates from a Sorted Sequence
+
+- ``Swift/Sequence/deduplicateSorted(by:)``
+- ``Swift/Sequence/deduplicateSorted()``
+- ``Swift/LazySequenceProtocol/deduplicateSorted(by:)``
+- ``Swift/LazySequenceProtocol/deduplicateSorted()``
+
 ### Supporting Types
 
 - ``UniquedSequence``
diff --git a/Sources/Algorithms/Documentation.docc/Keying.md b/Sources/Algorithms/Documentation.docc/Keying.md
index aa296161..8625f19f 100644
--- a/Sources/Algorithms/Documentation.docc/Keying.md
+++ b/Sources/Algorithms/Documentation.docc/Keying.md
@@ -12,3 +12,15 @@ Convert a sequence to a dictionary, providing keys to individual elements or to
 ### Grouping Elements by Key
 
 - ``Swift/Sequence/grouped(by:)``
+
+### Counting Each Element in a Sorted Sequence
+
+- ``Swift/Sequence/countSortedDuplicates(by:)``
+- ``Swift/Sequence/countSortedDuplicates()``
+- ``Swift/LazySequenceProtocol/countSortedDuplicates(by:)``
+-
``Swift/LazySequenceProtocol/countSortedDuplicates()``
+
+### Supporting Types
+
+- ``LazyCountDuplicatesSequence``
+- ``CountDuplicatesIterator``
diff --git a/Sources/Algorithms/SortedDuplicates.swift b/Sources/Algorithms/SortedDuplicates.swift
new file mode 100644
index 00000000..663006fd
--- /dev/null
+++ b/Sources/Algorithms/SortedDuplicates.swift
@@ -0,0 +1,281 @@
+//===----------------------------------------------------------------------===//
+//
+// This source file is part of the Swift Algorithms open source project
+//
+// Copyright (c) 2025 Apple Inc. and the Swift project authors
+// Licensed under Apache License v2.0 with Runtime Library Exception
+//
+// See https://swift.org/LICENSE.txt for license information
+//
+//===----------------------------------------------------------------------===//
+
+extension Sequence {
+  /// Assuming this sequence is already sorted by the given predicate,
+  /// return a collection of the given type,
+  /// storing the first occurrence of each unique element value in
+  /// this sequence paired with its total number of occurrences.
+  ///
+  /// - Precondition: This sequence must be finite,
+  ///   and be sorted according to the given predicate.
+  ///
+  /// - Parameter type: A reference to the returned collection's type.
+  /// - Parameter areInIncreasingOrder: The sorting predicate.
+  /// - Returns: A collection of pairs,
+  ///   one for each element equivalence class present in this sequence,
+  ///   in order of appearance.
+  ///   The first member is the value of the earliest element for
+  ///   an equivalence class.
+  ///   The second member is the number of occurrences of that
+  ///   equivalence class.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of this sequence.
+  @usableFromInline
+  func countSortedDuplicates<T>(
+    storingIn type: T.Type,
+    by areInIncreasingOrder: (Element, Element) throws -> Bool
+  ) rethrows -> T
+  where T: RangeReplaceableCollection, T.Element == (value: Element, count: Int)
+  {
+    try withoutActuallyEscaping(areInIncreasingOrder) {
+      let sequence = LazyCountDuplicatesSequence(self, by: $0)
+      var iterator = sequence.makeIterator()
+      var result = T()
+      result.reserveCapacity(sequence.underestimatedCount)
+      while let element = try iterator.throwingNext() {
+        result.append(element)
+      }
+      return result
+    }
+  }
+
+  /// Assuming this sequence is already sorted by the given predicate,
+  /// return an array of each unique element paired with its number of
+  /// occurrences.
+  ///
+  /// - Precondition: This sequence must be finite,
+  ///   and be sorted according to the given predicate.
+  ///
+  /// - Parameter areInIncreasingOrder: The sorting predicate.
+  /// - Returns: An array of pairs,
+  ///   one for each element equivalence class present in this sequence,
+  ///   in order of appearance.
+  ///   The first member is the value of the earliest element for
+  ///   an equivalence class.
+  ///   The second member is the number of occurrences of that
+  ///   equivalence class.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of this sequence.
+  @inlinable
+  public func countSortedDuplicates(
+    by areInIncreasingOrder: (Element, Element) throws -> Bool
+  ) rethrows -> [(value: Element, count: Int)] {
+    try countSortedDuplicates(storingIn: Array.self, by: areInIncreasingOrder)
+  }
+
+  /// Assuming this sequence is already sorted by the given predicate,
+  /// return an array of each unique element, by equivalence class.
+  ///
+  /// - Precondition: This sequence must be finite,
+  ///   and be sorted according to the given predicate.
+  ///
+  /// - Parameter areInIncreasingOrder: The sorting predicate.
+  ///
+  /// - Returns: An array with the earliest element in this sequence for
+  ///   each equivalence class.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of this sequence.
+  @inlinable
+  public func deduplicateSorted(
+    by areInIncreasingOrder: (Element, Element) throws -> Bool
+  ) rethrows -> [Element] {
+    try countSortedDuplicates(by: areInIncreasingOrder).map(\.value)
+  }
+}
+
+extension Sequence where Element: Comparable {
+  /// Assuming this sequence is already sorted,
+  /// return an array of each unique value paired with its number of
+  /// occurrences.
+  ///
+  /// - Precondition: This sequence must be finite and sorted.
+  ///
+  /// - Returns: An array of pairs,
+  ///   one for each unique element value in this sequence,
+  ///   in order of appearance.
+  ///   The first member is the earliest element for a value.
+  ///   The second member is the count of that value's occurrences.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of this sequence.
+  @inlinable
+  public func countSortedDuplicates() -> [(value: Element, count: Int)] {
+    countSortedDuplicates(by: <)
+  }
+
+  /// Assuming this sequence is already sorted,
+  /// return an array of the first element of each unique value.
+  ///
+  /// - Precondition: This sequence must be finite and sorted.
+  ///
+  /// - Returns: An array with the earliest element in this sequence for
+  ///   each value.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of this sequence.
+  @inlinable
+  public func deduplicateSorted() -> [Element] {
+    deduplicateSorted(by: <)
+  }
+}
+
+extension LazySequenceProtocol {
+  /// Assuming this sequence is already sorted by the given predicate,
+  /// return a sequence that will lazily generate each unique
+  /// element paired with its number of occurrences.
+  ///
+  /// - Precondition: This sequence is sorted according to the given predicate,
+  ///   and cannot end with an infinite run of a single equivalence class.
+  ///
+  /// - Parameter areInIncreasingOrder: The sorting predicate.
+  ///
+  /// - Returns: A sequence that lazily generates the first element of
+  ///   each equivalence class present in this sequence paired with
+  ///   the number of occurrences for that class.
+  @inlinable
+  public func countSortedDuplicates(
+    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
+  ) -> LazyCountDuplicatesSequence<Elements> {
+    .init(elements, by: areInIncreasingOrder)
+  }
+
+  /// Assuming this sequence is already sorted by the given predicate,
+  /// return a sequence that will lazily vend each unique element.
+  ///
+  /// - Precondition: This sequence is sorted according to the given predicate,
+  ///   and cannot end with an infinite run of a single equivalence class.
+  ///
+  /// - Parameter areInIncreasingOrder: The sorting predicate.
+  ///
+  /// - Returns: A sequence that lazily generates the first element of
+  ///   each equivalence class present in this sequence.
+  @inlinable
+  public func deduplicateSorted(
+    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
+  ) -> some (Sequence<Element> & LazySequenceProtocol) {
+    countSortedDuplicates(by: areInIncreasingOrder).lazy.map(\.value)
+  }
+}
+
+extension LazySequenceProtocol where Element: Comparable {
+  /// Assuming this sequence is already sorted,
+  /// return a sequence that will lazily generate each unique value
+  /// paired with its number of occurrences.
+  ///
+  /// - Precondition: This sequence is sorted,
+  ///   and cannot end with an infinite run of a single value.
+  ///
+  /// - Returns: A sequence that lazily generates the first element of
+  ///   each value paired with the count of that value's occurrences.
+  @inlinable
+  public func countSortedDuplicates() -> LazyCountDuplicatesSequence<Elements> {
+    countSortedDuplicates(by: <)
+  }
+
+  /// Assuming this sequence is already sorted,
+  /// return a sequence that will lazily vend each unique value.
+  ///
+  /// - Precondition: This sequence is sorted,
+  ///   and cannot end with an infinite run of a single value.
+  ///
+  /// - Returns: A sequence that lazily generates the first element of
+  ///   each value.
+  @inlinable
+  public func deduplicateSorted() -> some (
+    Sequence<Element> & LazySequenceProtocol
+  ) {
+    deduplicateSorted(by: <)
+  }
+}
+
+// MARK: - Sequence
+
+/// Lazily vends the count of each run of duplicate values from
+/// a sorted source.
+public struct LazyCountDuplicatesSequence<Base: Sequence> {
+  /// The predicate by which `base` is sorted.
+  let areInIncreasingOrder: (Base.Element, Base.Element) throws -> Bool
+  /// The source of elements, which must be sorted by `areInIncreasingOrder`.
+  var base: Base
+
+  /// Creates a sequence based on the given sequence,
+  /// which must be sorted by the given predicate,
+  /// that'll vend each unique element value and that value's appearance count.
+  @usableFromInline
+  init(
+    _ base: Base,
+    by areInIncreasingOrder: @escaping (Base.Element, Base.Element) throws ->
+      Bool
+  ) {
+    self.base = base
+    self.areInIncreasingOrder = areInIncreasingOrder
+  }
+}
+
+extension LazyCountDuplicatesSequence: LazySequenceProtocol {
+  public var underestimatedCount: Int {
+    base.underestimatedCount.signum()
+  }
+
+  public func makeIterator() -> CountDuplicatesIterator<Base.Iterator> {
+    .init(base.makeIterator(), by: areInIncreasingOrder)
+  }
+}
+
+// MARK: - Iterator
+
+/// Vends the count of each run of duplicate values from a sorted source.
+public struct CountDuplicatesIterator<Base: IteratorProtocol> {
+  /// The predicate by which `base` is sorted.
+  let areInIncreasingOrder: (Base.Element, Base.Element) throws -> Bool
+  /// The source of elements, which must be sorted by `areInIncreasingOrder`.
+  var base: Base
+  /// The last element read, for comparisons.
+  var mostRecent: Base.Element?
+
+  /// Creates an iterator based on the given iterator,
+  /// whose virtual sequence must be sorted by the given predicate,
+  /// which counts the length of each run of duplicate values.
+  init(
+    _ base: Base,
+    by areInIncreasingOrder: @escaping (Base.Element, Base.Element) throws ->
+      Bool
+  ) {
+    self.base = base
+    self.areInIncreasingOrder = areInIncreasingOrder
+  }
+}
+
+extension CountDuplicatesIterator: IteratorProtocol {
+  public mutating func next() -> (value: Base.Element, count: Int)? {
+    // NOTE: This method is called only when the predicate isn't `throw`-ing,
+    //       so the forced `try` is OK.
+    try! throwingNext()
+  }
+
+  /// Extracts the next element that isn't equivalent to
+  /// the last unique one extracted.
+  mutating func throwingNext() throws -> Element? {
+    mostRecent = mostRecent ?? base.next()
+    guard let last = mostRecent else { return nil }
+
+    var count = 1
+    while let current = base.next() {
+      if try areInIncreasingOrder(last, current) {
+        mostRecent = current
+        return (last, count)
+      } else {
+        count += 1
+      }
+    }
+    mostRecent = nil
+    return (last, count)
+  }
+}
diff --git a/Tests/SwiftAlgorithmsTests/SortedDuplicatesTests.swift b/Tests/SwiftAlgorithmsTests/SortedDuplicatesTests.swift
new file mode 100644
index 00000000..824b7300
--- /dev/null
+++ b/Tests/SwiftAlgorithmsTests/SortedDuplicatesTests.swift
@@ -0,0 +1,91 @@
+//===----------------------------------------------------------------------===//
+//
+// This source file is part of the Swift Algorithms open source project
+//
+// Copyright (c) 2025 Apple Inc. and the Swift project authors
+// Licensed under Apache License v2.0 with Runtime Library Exception
+//
+// See https://swift.org/LICENSE.txt for license information
+//
+//===----------------------------------------------------------------------===//
+
+import XCTest
+
+@testable import Algorithms
+
+final class SortedDuplicatesTests: XCTestCase {
+  /// Test counting over an empty sequence.
+  func testEmpty() {
+    let emptyString = ""
+    let emptyStringCounts = emptyString.countSortedDuplicates()
+    expectEqualCollections(emptyStringCounts.map(\.value), [])
+    expectEqualCollections(emptyStringCounts.map(\.count), [])
+    expectEqualCollections(emptyString.deduplicateSorted(), [])
+
+    let lazyEmptyStringCounts = emptyString.lazy.countSortedDuplicates()
+    expectEqualSequences(lazyEmptyStringCounts.map(\.value), [])
+    expectEqualSequences(lazyEmptyStringCounts.map(\.count), [])
+    expectEqualSequences(emptyString.lazy.deduplicateSorted(), [])
+  }
+
+  /// Test counting over a single-element sequence.
+  func testSingle() {
+    let aString = "a"
+    let aStringCounts = aString.countSortedDuplicates()
+    let aStringValues = aString.deduplicateSorted()
+    expectEqualCollections(aStringCounts.map(\.value), ["a"])
+    expectEqualCollections(aStringCounts.map(\.count), [1])
+    expectEqualCollections(aStringValues, ["a"])
+
+    let lazyAStringCounts = aString.lazy.countSortedDuplicates()
+    expectEqualSequences(lazyAStringCounts.map(\.value), ["a"])
+    expectEqualSequences(lazyAStringCounts.map(\.count), [1])
+    expectEqualSequences(aString.lazy.deduplicateSorted(), ["a"])
+  }
+
+  /// Test counting over a repeated element.
+  func testRepeat() {
+    let count = 20
+    let letters = repeatElement("b" as Character, count: count)
+    let lettersCounts = letters.countSortedDuplicates()
+    let lazyLettersCounts = letters.lazy.countSortedDuplicates()
+    expectEqualCollections(lettersCounts.map(\.value), ["b"])
+    expectEqualCollections(lettersCounts.map(\.count), [count])
+    expectEqualCollections(letters.deduplicateSorted(), ["b"])
+    expectEqualSequences(lazyLettersCounts.map(\.value), ["b"])
+    expectEqualSequences(lazyLettersCounts.map(\.count), [count])
+    expectEqualSequences(letters.lazy.deduplicateSorted(), ["b"])
+  }
+
+  /// Test multiple elements.
+  func testMultiple() {
+    let sample = "Xacccddffffxzz"
+    let sampleCounts = sample.countSortedDuplicates()
+    let expected: [(value: Character, count: Int)] = [
+      ("X", 1),
+      ("a", 1),
+      ("c", 3),
+      ("d", 2),
+      ("f", 4),
+      ("x", 1),
+      ("z", 2),
+    ]
+    expectEqualCollections(sampleCounts.map(\.value), expected.map(\.0))
+    expectEqualCollections(sampleCounts.map(\.count), expected.map(\.1))
+    expectEqualCollections(sample.deduplicateSorted(), "Xacdfxz")
+
+    let lazySampleCounts = sample.lazy.countSortedDuplicates()
+    expectEqualSequences(lazySampleCounts.map(\.value), expected.map(\.0))
+    expectEqualSequences(lazySampleCounts.map(\.count), expected.map(\.1))
+    expectEqualSequences(sample.lazy.deduplicateSorted(), "Xacdfxz")
+  }
+
+  /// Test the example code from the Overview.
+  func testOverviewExample() {
+    let numbers = [0, 1, 2, 2, 2, 3, 5, 6, 6, 9, 10, 10]
+    let deduplicated = numbers.deduplicateSorted()
+    // Array(deduplicated) == [0, 1, 2, 3, 5, 6, 9, 10]
+
+    expectEqualSequences(deduplicated, [0, 1, 2, 3, 5, 6, 9, 10])
+  }
+}
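
The run-counting idea that `CountDuplicatesIterator` relies on can be tried without building the package. The block below is a minimal stand-alone sketch, not code from this patch: `countRuns(in:by:)` is a hypothetical helper name, and it assumes (as the patch does) that the input is already sorted by the predicate, so a run ends as soon as the predicate reports that the run's first element is strictly before the current one.

```swift
// Hypothetical helper, for illustration only (not part of the patch).
// Counts runs of equivalent elements in an already-sorted sequence.
func countRuns<S: Sequence>(
  in sorted: S,
  by areInIncreasingOrder: (S.Element, S.Element) throws -> Bool
) rethrows -> [(value: S.Element, count: Int)] {
  var result: [(value: S.Element, count: Int)] = []
  var iterator = sorted.makeIterator()
  // An empty source has no runs.
  guard var runStart = iterator.next() else { return result }
  var count = 1
  while let current = iterator.next() {
    if try areInIncreasingOrder(runStart, current) {
      // `current` is strictly after the run's first element: close the run.
      result.append((value: runStart, count: count))
      runStart = current
      count = 1
    } else {
      // Same equivalence class (given sorted input): extend the run.
      count += 1
    }
  }
  // Close the final run.
  result.append((value: runStart, count: count))
  return result
}

// Same sample string as `testMultiple()`.
let runs = countRuns(in: "Xacccddffffxzz", by: <)
print(runs.map(\.count))          // [1, 1, 3, 2, 4, 1, 2]
print(String(runs.map(\.value)))  // Xacdfxz
```

Only the first element of each run is compared against later elements, so a single pass with one predicate call per element suffices; this is consistent with the O(*n*) complexity documented for the eager methods.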