feature: inspect first/outer "kind" without full decode #440

@extemporalgenome

Is your feature request related to a problem? Please describe.

A nice property of the json.RawMessage design is that it's fairly trivial to safely inspect the broad kind of JSON data with:

// A properly decoded json.RawMessage always
// starts with a non-space token byte.
switch theRawMessage[0] {
case '{': // object
case '[': // array
case '"': // string
case 'n': // null
case 'f': // false
case 't': // true
default:  // number
}

This can also be done by inspecting the leading byte of a cbor.RawMessage, but there are many more possible leading bytes, and they're much less memorable. Because this package does not provide kind detection as a cheap capability, the application effectively has to implement a partial CBOR decoder to work around it.
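For comparison, here is a rough sketch of the kind of partial decoder an application ends up writing today. It relies only on the initial-byte layout from RFC 8949 (major type in the high three bits, additional information in the low five). The names majorType and describe are made up for illustration and are not part of this package.

```go
package main

import "fmt"

// majorType returns the CBOR major type (0-7) stored in the high
// three bits of the initial byte (RFC 8949, section 3).
func majorType(initial byte) byte { return initial >> 5 }

// describe is a hypothetical helper that names the broad kind of the
// first data item. It looks at the initial byte only and does not
// validate the rest of the input.
func describe(data []byte) string {
	if len(data) == 0 {
		return "empty"
	}
	switch majorType(data[0]) {
	case 0:
		return "unsigned integer"
	case 1:
		return "negative integer"
	case 2:
		return "byte string"
	case 3:
		return "text string"
	case 4:
		return "array"
	case 5:
		return "map"
	case 6:
		return "tag"
	}
	// Major type 7: simple values and floats, told apart by the
	// additional information in the low five bits.
	switch data[0] & 0x1f {
	case 20:
		return "false"
	case 21:
		return "true"
	case 22:
		return "null"
	case 23:
		return "undefined"
	case 25, 26, 27:
		return "float"
	case 31:
		return "break"
	}
	return "simple value"
}

func main() {
	for _, item := range [][]byte{{0xa0}, {0x80}, {0x60}, {0xf6}, {0x01}} {
		fmt.Printf("%#x: %s\n", item[0], describe(item))
	}
}
```

Even this ignores reserved additional-information values and says nothing about what a tag wraps, which is part of why a package-provided helper would be nicer.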

Decoding into any to just check the kind is often undesirable because:

  1. It's expensive, especially in terms of garbage.
  2. The contract is not stable and is hard to account for exhaustively using type assertions, since DecOptions can yield uint64 vs. int64 variations, many possible map and slice combinations, etc. Use of reflect provides more stability, but is unwieldy.

Describe the solution you'd like

Introduce a cbor.Kind type, with values like cbor.KindInt. It's unclear whether distinctions between int vs. uint vs. big int, or the different size variants, should be represented, though bit-field-style constants (e.g. cbor.KindNumber = cbor.KindInt | cbor.KindFloat | ..., cbor.KindInt = cbor.KindInt8 | ...) or helper methods (func (Kind) IsNumber() bool) could address this.

A function such as func DetectKind([]byte) (Kind, error) could be used to obtain a Kind value. If a const KindInvalid Kind = 0 were available, such a function would not need to return an error.
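To make the proposal more concrete, here is a minimal sketch of what a bit-field Kind with a KindInvalid zero value might look like. Everything here is hypothetical: the constant names, the granularity (no size variants, bignums reported as tags), and the choice to inspect only the initial byte are assumptions for illustration, not an existing or agreed API.

```go
package main

import "fmt"

// Kind is a hypothetical bit-field classification of a CBOR data item.
type Kind uint16

// KindInvalid is the zero value, so DetectKind needs no error result.
const KindInvalid Kind = 0

const (
	KindUint Kind = 1 << iota
	KindNegInt
	KindFloat
	KindBytes
	KindText
	KindArray
	KindMap
	KindTag
	KindBool
	KindNull
	KindUndefined
	KindSimple
)

// Composite kinds built from the single-bit values above.
const (
	KindInt    = KindUint | KindNegInt
	KindNumber = KindInt | KindFloat
)

// IsNumber reports whether k is any integer or float kind.
func (k Kind) IsNumber() bool { return k&KindNumber != 0 }

// DetectKind classifies the first data item by its initial byte only.
// It does not check that the rest of the input is well-formed.
func DetectKind(data []byte) Kind {
	if len(data) == 0 {
		return KindInvalid
	}
	major, info := data[0]>>5, data[0]&0x1f
	if info >= 28 && info <= 30 {
		return KindInvalid // reserved additional-information values
	}
	if info == 31 && (major <= 1 || major >= 6) {
		// Integers and tags have no indefinite-length form, and a
		// bare "break" (0xff) is not a data item on its own.
		return KindInvalid
	}
	switch major {
	case 0:
		return KindUint
	case 1:
		return KindNegInt
	case 2:
		return KindBytes
	case 3:
		return KindText
	case 4:
		return KindArray
	case 5:
		return KindMap
	case 6:
		return KindTag // includes bignums (tags 2 and 3) in this sketch
	}
	switch info { // major type 7
	case 20, 21:
		return KindBool
	case 22:
		return KindNull
	case 23:
		return KindUndefined
	case 25, 26, 27:
		return KindFloat
	}
	return KindSimple
}

func main() {
	k := DetectKind([]byte{0x18, 0x2a}) // unsigned integer 42
	fmt.Println(k == KindUint, k.IsNumber())
}
```

With a layout like this, callers that only care about "is it a number" can use IsNumber or mask against KindNumber, while callers that care about signedness can compare against KindUint or KindNegInt directly.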

A companion DetectTagKind function, which returns a (uint64, Kind) pair (or similar), may also be useful.
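As a sketch of the tag half of that idea, the function below reads the outermost tag number and returns the bytes of the tagged content, whose first byte could then be handed to whatever kind detection exists. The name splitTag and its signature are assumptions for illustration; it handles one level of tagging and does not validate the content.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitTag is a hypothetical helper. If data starts with a CBOR tag
// (major type 6), it returns the tag number and the remaining bytes,
// which begin with the tagged content. ok is false if data does not
// start with a complete tag head followed by at least one byte.
func splitTag(data []byte) (tag uint64, content []byte, ok bool) {
	if len(data) == 0 || data[0]>>5 != 6 {
		return 0, nil, false
	}
	info := data[0] & 0x1f
	rest := data[1:]

	n := 0 // number of argument bytes that follow the initial byte
	switch {
	case info < 24:
		tag = uint64(info) // tag number stored in the initial byte
	case info <= 27:
		n = 1 << (info - 24) // 1, 2, 4, or 8 bytes, big-endian
	default:
		return 0, nil, false // reserved, or indefinite (invalid for tags)
	}
	if len(rest) <= n {
		return 0, nil, false // truncated head or missing content
	}
	if n > 0 {
		var buf [8]byte
		copy(buf[8-n:], rest[:n]) // right-align into 8 bytes
		tag = binary.BigEndian.Uint64(buf[:])
	}
	return tag, rest[n:], true
}

func main() {
	// Tag 32 (URI) wrapping an empty text string.
	tag, content, ok := splitTag([]byte{0xd8, 0x20, 0x60})
	fmt.Println(tag, content, ok)
}
```

A real DetectTagKind would presumably also need to decide how to report nested tags, for example by returning only the outermost tag number and a tag kind for the content.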

Describe alternatives you've considered

It seems there is a branch or effort to expose a streaming tokenizer. If so, that could provide equivalent functionality, where the above case would merely involve a peek at the next token, potentially followed by a normal decode or token consumption.

Labels: enhancement
