Skip to content

Feature request: intelligent diff #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
callumacrae opened this issue Feb 22, 2013 · 12 comments
Closed

Feature request: intelligent diff #16

callumacrae opened this issue Feb 22, 2013 · 12 comments

Comments

@callumacrae
Copy link

Hi,

Thanks for the great library! I have one problem with it, though. With larger changes, this library just falls to pieces. For example, running this:

var start = 'This is some test copy which is designed to demo stuff as text is deleted and modifyd. Make a suggestion here.',
    end = 'Hello, please do not copy this text';

JsDiff.diffWords(start, end);

Produces this:

Screen Shot 2013-02-22 at 18 34 06

In the majority of cases, I want either word diffs or character diffs, but as larger changes are made (which isn't as common), I find it unpleasant. A character replacement here is even worse.

Would it possible to add an "intelligent" diff mode which will detect the density of diffs in a word or sentence and run, say, a sentence diff on that sentence instead?

Thanks a lot!

@harley
Copy link

harley commented Oct 11, 2013

I think this can be done by:

  • having a function diffSentences (split by . -- only applicable in natural languages, of course)
  • having a hybrid diffWordsOrSentences mode: if two sentences don't have any word in common (we can change this threshold to n words), use diffSentences, otherwise use diffWords for that sentence.

thoughts?

@matanox
Copy link

matanox commented Aug 6, 2014

@harleyttd splitting sentences is more versatile than just that single test, and different types of text warrant different tactics for it.

From experience, I think this should be handled by user code.
Having said that, trying Google's diff library may fit anybody's taste. Otherwise, in user code, check how many changes, and decide whether to output a line diff or a word diff according to some arbitrary or original threshold, after tokenizing sentences if so desired.

@captbaritone
Copy link

Google's diff library to solved this problem for me, but the link above is bad:

https://code.google.com/p/google-diff-match-patch/

@LNFWebsite
Copy link

I agree... This would be a monumental feature.

I've been using Google's Diff Match Patch, however, ran into problems where it omitted real changes that were made between comparison texts.

JSDiff is built much better IMO, but falls short of Google's intelligent semantic cleanup...

@LNFWebsite
Copy link

Kevin, could you give us an idea of when and if you will be adding this to jsDiff? I'd appreciate it.

@kpdecker
Copy link
Owner

@HelpingHand1 this isn't really on my radar right now. I'd be glad to look at pull requests if someone wanted to implement, of course.

@joallard
Copy link

Did anyone take this up? There are probably already algorithms to look at we could inspire ourselves from, Git off the top of my head.

@LNFWebsite
Copy link

I haven't. In the meantime, I've given my users the ability to manually choose which type of diff they would like to see, but this feature would be optimal...

@hubgit
Copy link

hubgit commented Oct 4, 2016

In cases like this, would it make sense to merge consecutive word diffs of the same type, including the whitespace/punctuation between them?

@johnloven
Copy link

I've solved this by re-arranging the resulting array of diffs to chain diffs of the same type.

function diffWordsPretty(words1, words2) {
    const uglyDiffs = diffWords(words1, words2);
    let result = [];
    let added = [];
    let removed = [];
    uglyDiffs.forEach(diff => {
        if (diff.value.split("").every(c => c === " ") && !diff.removed && !diff.added) {
            added.push({ ...diff, added: true });
            removed.push({ ...diff, removed: true });
        } else if (diff.added) {
            added.push(diff);
        } else if (diff.removed) {
            removed.push(diff);
        } else {
            result = [ ...result, ...removed, ...added, diff ];
            added = [];
            removed = [];
        }
    });
    result = [ ...result, ...removed, ...added ];
    return result;
}

@ExplodingCabbage
Copy link
Collaborator

Hmm. I've recently offered to help out maintaining jsdiff and am looking through old issues. I think this issue is really two issues in one.

On the one hand we have a clear observation of a problem I agree is real and would be nice to fix: word diffs of texts that have been substantially rewritten are a nightmare to read because they alternate between word removed, word added, word removed, word added etc; it'd be nicer if we got a chain of word removals followed by a chain of word additions. (@johnloven presents a workaround to achieve this.)

On the other hand, we have the titular request for a diff that automatically decides between character-level, word-level, or sentence-level diffs based on the density of changes.

I think my answer to the actual, specific request in this issue is "no", since:

  • we don't have a clear algorithm for how it would work. @harley's suggestion upthread isn't clear to me; in order for it to work I guess we would have to diff sentence by sentence? But then what happens if a sentence is inserted in the middle of an otherwise unchanged text? Seems we'd get horrible outcomes...
  • I'm not personally interested in devising one and upthread @kpdecker indicated he wasn't either
  • nobody has stepped up to provide one in a decade
  • having any significantly innovative algorithms in this library would expand its scope beyond basically being a Myers diff implementation with a bunch of config options and wrapper methods, and I don't want to be responsible for maintaining something with that bigger and more ambiguous scope; I'd rather anyone who wants to do anything algorithmically innovative makes their own library

For that reason, I'm going to close this issue. However, I wonder if we can nonetheless fix the problem that motivated this feature request, perhaps by baking logic like @johnloven's into jsdiff. I'll think about it some more...

@ExplodingCabbage ExplodingCabbage closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023
@ExplodingCabbage
Copy link
Collaborator

FWIW it looks to me at a glance like @gdavoianb's proposal at #160 is very similar in nature to @johnloven's in this thread, and aims at solving the very same problem that this issue was motivated by. I haven't studied it in detail yet, but I will try to find time to do so in due course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants