Feature request: intelligent diff #16

callumacrae · 2013-02-22T18:37:05Z

Hi,

Thanks for the great library! I have one problem with it, though. With larger changes, this library just falls to pieces. For example, running this:

var start = 'This is some test copy which is designed to demo stuff as text is deleted and modifyd. Make a suggestion here.',
    end = 'Hello, please do not copy this text';

JsDiff.diffWords(start, end);

Produces this:

In the majority of cases, I want either word diffs or character diffs, but as larger changes are made (which isn't as common), I find it unpleasant. A character replacement here is even worse.

Would it be possible to add an "intelligent" diff mode which will detect the density of diffs in a word or sentence and run, say, a sentence diff on that sentence instead?

Thanks a lot!

harley · 2013-10-11T23:28:11Z

I think this can be done by:

having a function diffSentences (split by . -- only applicable in natural languages, of course)
having a hybrid diffWordsOrSentences mode: if two sentences don't have any word in common (we can change this threshold to n words), use diffSentences, otherwise use diffWords for that sentence.

thoughts?
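A minimal sketch of the per-sentence decision described above, assuming a naive punctuation-based sentence splitter. The names splitSentences, sharedWordCount and chooseGranularity are hypothetical and not part of jsdiff, and the sketch does not address how old and new sentences would be paired up in the first place.

```javascript
// Hypothetical helper: naive sentence split on '.', '!' or '?'.
// Only reasonable for natural-language text.
function splitSentences(text) {
  return text.match(/[^.!?]+[.!?]*\s*/g) || [];
}

// Count the distinct words (case-insensitive) that occur in both strings.
function sharedWordCount(a, b) {
  const wordsA = new Set(a.toLowerCase().match(/\w+/g) || []);
  const wordsB = new Set(b.toLowerCase().match(/\w+/g) || []);
  let shared = 0;
  for (const word of wordsA) {
    if (wordsB.has(word)) shared++;
  }
  return shared;
}

// Decide which granularity to use for one pair of sentences.
// minSharedWords is the adjustable threshold "n" from the proposal.
function chooseGranularity(oldSentence, newSentence, minSharedWords = 1) {
  return sharedWordCount(oldSentence, newSentence) >= minSharedWords
    ? 'words'
    : 'sentences';
}
```

A caller would then run a word diff or a sentence diff on each pair depending on the returned value; the threshold is a judgment call that likely needs tuning per type of text.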

matanox · 2014-08-06T18:51:58Z

@harleyttd splitting sentences is more versatile than just that single test, and different types of text warrant different tactics for it.

From experience, I think this should be handled by user code.
Having said that, trying Google's diff library may fit anybody's taste. Otherwise, in user code, check how many changes, and decide whether to output a line diff or a word diff according to some arbitrary or original threshold, after tokenizing sentences if so desired.
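As a rough illustration of the user-code approach, the sketch below measures how much of a fine-grained diff consists of changes and falls back to a coarser diff above a threshold. changeDensity and adaptiveDiff are hypothetical names, the 0.5 default is arbitrary, and the only assumption about the diff functions is that they return an array of parts with value, added and removed fields, as jsdiff's diffWords and diffLines do.

```javascript
// Fraction of characters in a diff result that belong to added or removed parts.
function changeDensity(changes) {
  let changed = 0;
  let total = 0;
  for (const part of changes) {
    total += part.value.length;
    if (part.added || part.removed) changed += part.value.length;
  }
  return total === 0 ? 0 : changed / total;
}

// fineDiff and coarseDiff are supplied by the caller,
// for example diffWords and diffLines (or diffSentences) from jsdiff.
function adaptiveDiff(oldText, newText, fineDiff, coarseDiff, threshold = 0.5) {
  const fine = fineDiff(oldText, newText);
  // Keep the fine-grained diff unless most of it is churn.
  return changeDensity(fine) > threshold ? coarseDiff(oldText, newText) : fine;
}
```

For example, adaptiveDiff(start, end, diffWords, diffLines) would return the word diff for small edits and the line diff for heavy rewrites.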

captbaritone · 2015-04-21T21:05:19Z

Google's diff library solved this problem for me, but the link above is bad:

https://code.google.com/p/google-diff-match-patch/

LNFWebsite · 2015-06-01T22:23:23Z

I agree... This would be a monumental feature.

I've been using Google's Diff Match Patch; however, I ran into problems where it omitted real changes that were made between comparison texts.

JSDiff is built much better IMO, but falls short of Google's intelligent semantic cleanup...

LNFWebsite · 2015-07-08T22:10:32Z

Kevin, could you give us an idea of when and if you will be adding this to jsDiff? I'd appreciate it.

kpdecker · 2015-07-11T06:38:40Z

@HelpingHand1 this isn't really on my radar right now. I'd be glad to look at pull requests if someone wanted to implement, of course.

joallard · 2015-11-30T05:10:53Z

Did anyone take this up? There are probably existing algorithms we could draw inspiration from; Git's comes to mind off the top of my head.

LNFWebsite · 2015-12-01T21:30:58Z

I haven't. In the meantime, I've given my users the ability to manually choose which type of diff they would like to see, but this feature would be optimal...

hubgit · 2016-10-04T23:08:37Z

In cases like this, would it make sense to merge consecutive word diffs of the same type, including the whitespace/punctuation between them?

johnloven · 2019-06-12T11:52:48Z

I've solved this by re-arranging the resulting array of diffs to chain diffs of the same type.

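// Assumes diffWords is in scope, e.g. import { diffWords } from 'diff';
function diffWordsPretty(words1, words2) {
    const uglyDiffs = diffWords(words1, words2);
    let result = [];
    // Pending runs of additions and removals, flushed at the next unchanged word.
    let added = [];
    let removed = [];
    uglyDiffs.forEach(diff => {
        if (diff.value.split("").every(c => c === " ") && !diff.removed && !diff.added) {
            // Unchanged spaces: copy into both runs so they do not split a chain.
            added.push({ ...diff, added: true });
            removed.push({ ...diff, removed: true });
        } else if (diff.added) {
            added.push(diff);
        } else if (diff.removed) {
            removed.push(diff);
        } else {
            // Unchanged word: emit all pending removals, then all pending additions.
            result = [ ...result, ...removed, ...added, diff ];
            added = [];
            removed = [];
        }
    });
    // Flush whatever is still pending at the end of the input.
    result = [ ...result, ...removed, ...added ];
    return result;
}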

ExplodingCabbage · 2023-12-18T19:22:28Z

Hmm. I've recently offered to help out maintaining jsdiff and am looking through old issues. I think this issue is really two issues in one.

On the one hand we have a clear observation of a problem I agree is real and would be nice to fix: word diffs of texts that have been substantially rewritten are a nightmare to read because they alternate between word removed, word added, word removed, word added etc; it'd be nicer if we got a chain of word removals followed by a chain of word additions. (@johnloven presents a workaround to achieve this.)

On the other hand, we have the titular request for a diff that automatically decides between character-level, word-level, or sentence-level diffs based on the density of changes.

I think my answer to the actual, specific request in this issue is "no", since:

we don't have a clear algorithm for how it would work. @harley's suggestion upthread isn't clear to me; in order for it to work I guess we would have to diff sentence by sentence? But then what happens if a sentence is inserted in the middle of an otherwise unchanged text? Seems we'd get horrible outcomes...
I'm not personally interested in devising one and upthread @kpdecker indicated he wasn't either
nobody has stepped up to provide one in a decade
having any significantly innovative algorithms in this library would expand its scope beyond basically being a Myers diff implementation with a bunch of config options and wrapper methods, and I don't want to be responsible for maintaining something with that bigger and more ambiguous scope; I'd rather anyone who wants to do anything algorithmically innovative makes their own library

For that reason, I'm going to close this issue. However, I wonder if we can nonetheless fix the problem that motivated this feature request, perhaps by baking logic like @johnloven's into jsdiff. I'll think about it some more...

ExplodingCabbage · 2023-12-18T20:01:59Z

FWIW it looks to me at a glance like @gdavoianb's proposal at #160 is very similar in nature to @johnloven's in this thread, and aims at solving the very same problem that this issue was motivated by. I haven't studied it in detail yet, but I will try to find time to do so in due course.

ExplodingCabbage closed this as completed Dec 18, 2023

ExplodingCabbage closed this as not planned Dec 18, 2023

ExplodingCabbage added the diffWords behaviour label Dec 20, 2023
