
What is count in a change object? #210


Closed
rococode opened this issue Dec 23, 2017 · 4 comments · Fixed by #458

Comments

@rococode

rococode commented Dec 23, 2017

What do those counts mean? Unless I'm missing something it doesn't seem to be mentioned in the spec at all. In the image above I'm using word diff.

Edit: Oh, it seems to be a count of whatever unit was diffed? So in this case it's a count of "words" (though spaces are counted too), and for diffChars it counts characters. What's a potential use case for that info?

@Dzenly

Dzenly commented Nov 11, 2018

I was also trying to work this out. It seems that 'count' is the number of tokens.

Tokens can be, for example:

  • words
  • groups of whitespace
  • punctuation marks
  • etc.
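A minimal sketch of what such tokens look like, using a hypothetical `tokenize` helper. Its regex (runs of word characters, runs of whitespace, single punctuation marks) is an assumption for illustration and is not jsdiff's actual tokenizer. A character-level diff would instead treat every character as a token, which is why the counts differ between the two modes.

```javascript
// Hypothetical tokenizer for illustration only; jsdiff's own tokenizers
// (e.g. for diffWords or diffWordsWithSpace) follow their own rules.
function tokenize(text) {
  // Each match is one token: a run of word characters, a run of
  // whitespace, or a single punctuation character.
  return text.match(/\w+|\s+|[^\w\s]/g) || [];
}

const text = "Hello,  world!";
const wordTokens = tokenize(text);
// Character-level tokens, roughly what a diffChars-style count would use.
const charTokens = Array.from(text);

console.log(wordTokens.length); // 5
console.log(charTokens.length); // 14
```

Note that the two spaces between "Hello," and "world" form a single token here, so a change covering them would contribute 1 to `count`, not 2.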

For example, with diffWordsWithSpace you can set a breakpoint at

node_modules/diff/lib/diff/base.js:38

on the line:

var newLen = newString.length,

and inspect the contents of newString; it holds the tokens.

@ExplodingCabbage
Collaborator

ExplodingCabbage commented Jan 2, 2024

> What's a potential use case for that info?

You can add up the numbers of added/removed tokens to show summary stats like GitHub does on pull requests.
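A minimal sketch of such summary stats, assuming change objects shaped like `{ value, added, removed, count }`. The `summarize` helper and the sample change objects are hypothetical. With a line-based diff each token is a line, so the totals read like GitHub's added/removed line counts.

```javascript
// Total up added/removed tokens from change objects, GitHub-style.
function summarize(changes) {
  let added = 0;
  let removed = 0;
  for (const change of changes) {
    const n = change.count ?? 0; // treat a missing count as 0
    if (change.added) added += n;
    else if (change.removed) removed += n;
  }
  return { added, removed };
}

// Made-up line-level change objects in the { value, added, removed, count }
// shape; with a line diff, each token is one line.
const changes = [
  { value: "line one\n", count: 1, added: false, removed: false },
  { value: "old two\nold three\n", count: 2, added: false, removed: true },
  { value: "new two\n", count: 1, added: true, removed: false },
];

const { added, removed } = summarize(changes);
console.log(`+${added} -${removed}`); // +1 -2
```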

It's also useful for understanding how the underlying algorithm works (in particular how it is tokenizing stuff), especially if you're getting results that intuitively seem like a sub-optimal diff! For instance, in the issue description you figured out from these numbers that diffWords treats spaces between words as tokens (something which I think is kinda broken behaviour, tbh, with bad consequences beyond the count values).

I can also imagine hypothetical uses like showing in some kind of review tool that the user has so far reviewed (or scrolled past) x% of all inserted content and y% of all deleted content, though I've not seen an example of such functionality in the wild.
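The review-tool idea could look roughly like this. `reviewProgress` is a hypothetical helper, "reviewed" is modelled crudely as "every change up to a given index", and the sample change objects are made up; the sketch only relies on the `added`, `removed` and `count` fields.

```javascript
// Hypothetical helper: what percentage of inserted and deleted tokens has the
// user seen, treating every change up to lastSeenIndex as "reviewed"?
function reviewProgress(changes, lastSeenIndex) {
  let totalAdded = 0;
  let totalRemoved = 0;
  let seenAdded = 0;
  let seenRemoved = 0;

  changes.forEach((change, index) => {
    const n = change.count ?? 0;
    const seen = index <= lastSeenIndex;
    if (change.added) {
      totalAdded += n;
      if (seen) seenAdded += n;
    } else if (change.removed) {
      totalRemoved += n;
      if (seen) seenRemoved += n;
    }
  });

  // Report 100% when there was nothing of that kind to review.
  const percent = (part, whole) => (whole === 0 ? 100 : (100 * part) / whole);
  return {
    insertedPct: percent(seenAdded, totalAdded),
    deletedPct: percent(seenRemoved, totalRemoved),
  };
}

// Made-up change objects; counts match a words-plus-whitespace tokenization.
const changes = [
  { value: "old words ", count: 4, added: false, removed: true },
  { value: "new ", count: 2, added: true, removed: false },
  { value: "unchanged text ", count: 4, added: false, removed: false },
  { value: "more new words ", count: 6, added: true, removed: false },
];

console.log(reviewProgress(changes, 1)); // { insertedPct: 25, deletedPct: 100 }
```

Weighting by `count` rather than by number of change objects is what makes the percentages reflect the amount of content rather than the number of hunks.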

@ExplodingCabbage
Collaborator

Anyway, you're right that the docs don't mention this, and that seems bad. We should document count. I'll probably do this and #149 in one swoop since both of these issues depend upon documenting a more fundamental point: that each diffing function exposed by the library has its own way of splitting the input into "tokens" and that the core diffing algorithm then acts on the series of tokens. It probably makes sense to write up all of that in one go.
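To illustrate that split between tokenizing and diffing, here is a small sketch: a hypothetical tokenizer turns each string into an array of tokens, a generic diff runs over those arrays, and each resulting change object's `count` is the number of tokens it spans. The diff here is a plain longest-common-subsequence table, not the Myers O(ND) algorithm jsdiff implements, so its output may differ from jsdiff's in some cases; `splitIntoTokens` and `diffTokens` are illustrative names, not library functions.

```javascript
// Hypothetical tokenizer: words, whitespace runs, single punctuation marks.
function splitIntoTokens(text) {
  return text.match(/\w+|\s+|[^\w\s]/g) || [];
}

// Illustrative token-level diff using a longest-common-subsequence table.
// jsdiff itself uses Myers' O(ND) algorithm, so results can differ.
function diffTokens(oldTokens, newTokens) {
  const n = oldTokens.length;
  const m = newTokens.length;
  // lcs[i][j] = LCS length of oldTokens.slice(i) and newTokens.slice(j)
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = oldTokens[i] === newTokens[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const changes = [];
  // Append one token, merging it into the previous change of the same kind,
  // so each change's count is the number of tokens it covers.
  function push(token, added, removed) {
    const last = changes[changes.length - 1];
    if (last && last.added === added && last.removed === removed) {
      last.value += token;
      last.count += 1;
    } else {
      changes.push({ value: token, count: 1, added, removed });
    }
  }

  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (oldTokens[i] === newTokens[j]) {
      push(oldTokens[i], false, false); // token kept
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push(oldTokens[i], false, true); // token removed
      i++;
    } else {
      push(newTokens[j], true, false); // token added
      j++;
    }
  }
  while (i < n) push(oldTokens[i++], false, true);
  while (j < m) push(newTokens[j++], true, false);
  return changes;
}

const changes = diffTokens(
  splitIntoTokens("the cat sat"),
  splitIntoTokens("the dog sat")
);
for (const c of changes) {
  const kind = c.added ? "added" : c.removed ? "removed" : "common";
  console.log(kind, JSON.stringify(c.value), c.count);
}
```

Diffing "the cat sat" against "the dog sat" this way yields four changes with counts 2, 1, 1 and 2: the spaces are tokens too, which is exactly the kind of detail the `count` values expose.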

@ExplodingCabbage
Collaborator

By the way, @rococode, I'd love any feedback you have on the updates to the README I've made to address this. :)


3 participants