Skip to content

Non Sendable Hub.Config #172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
piotrkowalczuk opened this issue Feb 6, 2025 · 4 comments
Open

Non Sendable Hub.Config #172

piotrkowalczuk opened this issue Feb 6, 2025 · 4 comments
Labels
hub Related to the Swift -> Hugging Face Hub integration

Comments

@piotrkowalczuk
Copy link

The current implementation of Hub.Config can't be marked as Sendable, which restricts all types referencing Config from being passed across concurrency boundaries. This limitation prevents efficient use of Tokenizer in structured concurrency contexts, such as Task {}, within actors or SwiftUI. Therefore causes friction while adopting Swift 6, or some iOS features like AppIntent which requires passing concurrency boundaries.

Observations

  • Config internally uses Dictionary that in combination with NSString is inherently non Sendable.
  • Unlike NSString, String enforces unwanted unicode canonical equivalence.
@pcuenca
Copy link
Member

pcuenca commented Feb 6, 2025

Unlike NSString, String enforces unwanted unicode canonical equivalence.

Exactly, we had to move to NSString for that reason.

How would you suggest to go about this?

@piotrkowalczuk
Copy link
Author

piotrkowalczuk commented Feb 6, 2025

One way to approach it would be to introduce a custom type as an "in-place" replacement for NSString:

public struct BinaryDistinctString: Equatable, Hashable, Sendable, CustomStringConvertible, ExpressibleByStringLiteral {
    private let value: [UInt8]

    public var str: NSString {
        return NSString(data: Data(self.value), encoding: String.Encoding.utf8.rawValue)!
    }

    public var description: String {
        return String(self.str)
    }

    public init(_ str: NSString) {
        if let data = str.data(using: String.Encoding.utf8.rawValue) {
            self.value = [UInt8](data)

            return
        }

        self.value = []
    }

    public init(_ str: String) {
        self.init(str as NSString)
    }

    public init(stringLiteral value: String) {
        self.init(value)
    }

    public static func == (lhs: BinaryDistinctString, rhs: BinaryDistinctString) -> Bool {
        return lhs.value == rhs.value
    }
}

extension BinaryDistinctString: Collection {
    // Collection Protocol Conformance
    public typealias Index = Int
    public typealias Element = String

    public var startIndex: Index { value.startIndex }
    public var endIndex: Index { value.endIndex }

    public func index(after i: Index) -> Index {
        return value.index(after: i)
    }

    public subscript(position: Index) -> Element {
        return Element(value[position])
    }
}

this in combination with some minor changes to Config

  • new constructors
  • implements Sendable protocol
  • dictionary storesSendable instead of Any under BinaryDistinctString instead of NSString
@dynamicMemberLookup
public struct Config: Sendable {
    public private(set) var dictionary: [BinaryDistinctString: Sendable]

    public init(_ data: Data) throws {
        self.dictionary = try Config.jsonObjectWithBinaryDistinctKeys(from: data)
    }

    public init(_ dictionary: [NSString: Any]) {
        if let dict = Config.convertToBinaryDistinctKeys(dictionary as Any) as? [BinaryDistinctString: Any] {
            self.dictionary = dict
            return
        }

        self.dictionary = [:]
    }

    public init(_ dictionary: [BinaryDistinctString: Any]) {
        self.dictionary = dictionary
    }
    
    // ...

    private static func convertToBinaryDistinctKeys(_ object: Any) -> Any {
        if let dict = object as? [NSString: Any] {
            return Dictionary(uniqueKeysWithValues: dict.map { (BinaryDistinctString($0.key), convertToBinaryDistinctKeys($0.value)) })
        } else if let array = object as? [Any] {
            return array.map { convertToBinaryDistinctKeys($0) }
        } else {
            return object  // Keep primitive values (String, Int, Bool, etc.) unchanged
        }
    }

    private static func jsonObjectWithBinaryDistinctKeys(from data: Data) throws -> [BinaryDistinctString: Any] {
        let jsonObject = try JSONSerialization.jsonObject(with: data, options: [])
        guard let dict = convertToBinaryDistinctKeys(jsonObject) as? [BinaryDistinctString: Any] else {
            throw NSError(domain: "Invalid JSON structure", code: -1, userInfo: nil)
        }
        return dict
    }
}

allowed testConfigUnicode to pass

    func testConfigUnicode() {
        // These are two different characters
        let json = "{\"vocab\": {\"à\": 1, \"à\": 2}}"
        let data = json.data(using: .utf8)
        let dict = try! JSONSerialization.jsonObject(with: data!, options: []) as! [NSString: Any]
        let config = Config(dict)

        let vocab_nsdict = config.dictionary["vocab"] as! NSDictionary
        let vocab_nsstring = config.dictionary["vocab"] as! [BinaryDistinctString: Int]
        let vocab = config.vocab!.dictionary

        XCTAssertEqual(vocab_nsdict.count, 2)
        XCTAssertEqual(vocab_nsstring.count, 2)
        XCTAssertEqual(vocab.count, 2)
    }

On top of that:

  • it would be necessary to replace inheritance with composition in many places. Only the final class can conform to Sendable.
  • The Downloader would have to switch from the CurrentValueSubject to AsyncStream and get the async/await API.

WIP: https://github.com/piotrkowalczuk/swift-transformers/tree/sendable-config

WDYT?

@pcuenca
Copy link
Member

pcuenca commented Feb 11, 2025

cc @FL33TW00D, @greenrazer, this is quite interesting.

In fact, these guidelines look like half a PR already :) Would you like to flesh them out in a PR @piotrkowalczuk? It's of course totally ok if not.

This would most probably mean breaking API changes, we can do it as part of a 0.2.0 release maybe (which would also potentially include #168).

@FL33TW00D
Copy link
Collaborator

@piotrkowalczuk Great ideas here! Would be happy to review a PR doing this.

@FL33TW00D FL33TW00D added the hub Related to the Swift -> Hugging Face Hub integration label Feb 20, 2025
@piotrkowalczuk piotrkowalczuk mentioned this issue Mar 7, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hub Related to the Swift -> Hugging Face Hub integration
Projects
None yet
Development

No branches or pull requests

3 participants