diff --git a/README.md b/README.md
index 72fe9cafb..2960ef599 100644
--- a/README.md
+++ b/README.md
@@ -36,16 +36,14 @@ Broadly, jsdiff's diff functions all take an old text and a new text and perform
Options
* `ignoreCase`: If `true`, the uppercase and lowercase forms of a character are considered equal. Defaults to `false`.
-* `Diff.diffWords(oldStr, newStr[, options])` - diffs two blocks of text, treating each word and each word separator (punctuation, newline, or run of whitespace) as a token.
-
- (Whitespace-only tokens are automatically treated as equal to each other, so changes like changing a space to a newline or a run of multiple spaces will be ignored.)
+* `Diff.diffWords(oldStr, newStr[, options])` - diffs two blocks of text, treating each word and each punctuation mark as a token. Whitespace is ignored when computing the diff (but preserved as far as possible in the final change objects).
Returns a list of [change objects](#change-objects).
Options
* `ignoreCase`: Same as in `diffChars`. Defaults to false.
-* `Diff.diffWordsWithSpace(oldStr, newStr[, options])` - same as `diffWords`, except whitespace-only tokens are not automatically considered equal, so e.g. changing a space to a tab is considered a change.
+* `Diff.diffWordsWithSpace(oldStr, newStr[, options])` - diffs two blocks of text, treating each word, punctuation mark, newline, or run of (non-newline) whitespace as a token.
* `Diff.diffLines(oldStr, newStr[, options])` - diffs two blocks of text, treating each line as a token.
@@ -184,6 +182,7 @@ For even more customisation of the diffing behavior, you can create a `new Diff.
* `removeEmpty(array)`: called on the arrays of tokens returned by `tokenize` and can be used to modify them. Defaults to stripping out falsey tokens, such as empty strings. `diffArrays` overrides this to simply return the `array`, which means that falsey values like empty strings can be handled like any other token by `diffArrays`.
* `equals(left, right, options)`: called to determine if two tokens (one from the old string, one from the new string) should be considered equal. Defaults to comparing them with `===`.
* `join(tokens)`: gets called with an array of consecutive tokens that have either all been added, all been removed, or are all common. Needs to join them into a single value that can be used as the `value` property of the [change object](#change-objects) for these tokens. Defaults to simply returning `tokens.join('')`.
+* `postProcess(changeObjects)`: gets called at the end of the algorithm with the [change objects](#change-objects) produced, and can do final cleanups on them. Defaults to simply returning `changeObjects` unchanged.
### Change Objects
Many of the methods above return change objects. These objects consist of the following fields:
diff --git a/release-notes.md b/release-notes.md
index 08b41a06b..aff7d3461 100644
--- a/release-notes.md
+++ b/release-notes.md
@@ -4,17 +4,25 @@
[Commits](https://github.com/kpdecker/jsdiff/compare/master...v6.0.0-staging)
-- [#435](https://github.com/kpdecker/jsdiff/pull/435) Fix `parsePatch` handling of control characters. `parsePatch` used to interpret various unusual control characters - namely vertical tabs, form feeds, lone carriage returns without a line feed, and EBCDIC NELs - as line breaks when parsing a patch file. This was inconsistent with the behavior of both JsDiff's own `diffLines` method and also the Unix `diff` and `patch` utils, which all simply treat those control characters as ordinary characters. The result of this discrepancy was that some well-formed patches - produced either by `diff` or by JsDiff itself and handled properly by the `patch` util - would be wrongly parsed by `parsePatch`, with the effect that it would disregard the remainder of a hunk after encountering one of these control characters.
+- [#497](https://github.com/kpdecker/jsdiff/pull/497) **`diffWords` behavior has been radically changed.** Previously, even with `ignoreWhitespace: true`, runs of whitespace were tokens, which led to unhelpful and unintuitive diffing behavior in typical texts. Specifically, even when two texts contained overlapping passages, `diffWords` would sometimes choose to delete all the words from the old text and insert them anew in their new positions in order to avoid having to delete or insert whitespace tokens. Whitespace sequences are no longer tokens as of this release, which affects both the generated diffs and the `count`s.
+
+ Runs of whitespace are still tokens in `diffWordsWithSpace`.
+
+ As part of the changes to `diffWords`, **a new `.postProcess` method has been added on the base `Diff` type**, which can be overridden in custom `Diff` implementations.
+
+ **`diffLines` with `ignoreWhitespace: true` will no longer ignore the insertion or deletion of entire extra lines of whitespace at the end of the text**. Previously, these would not show up as insertions or deletions, as a side effect of a hack in the base diffing algorithm meant to help ignore whitespace in `diffWords`. More generally, **the undocumented special handling in the core algorithm for ignored terminals has been removed entirely.** (This special case behavior used to rewrite the final two change objects in a scenario where the final change object was an addition or deletion and its `value` was treated as equal to the empty string when compared using the diff object's `.equals` method.)
+
- [#500](https://github.com/kpdecker/jsdiff/pull/500) **`diffChars` now diffs Unicode code points** instead of UTF-16 code units.
-- [#439](https://github.com/kpdecker/jsdiff/pull/439) Prefer diffs that order deletions before insertions. When faced with a choice between two diffs with an equal total edit distance, the Myers diff algorithm generally prefers one that does deletions before insertions rather than insertions before deletions. For instance, when diffing `abcd` against `acbd`, it will prefer a diff that says to delete the `b` and then insert a new `b` after the `c`, over a diff that says to insert a `c` before the `b` and then delete the existing `c`. JsDiff deviated from the published Myers algorithm in a way that led to it having the opposite preference in many cases, including that example. This is now fixed, meaning diffs output by JsDiff will more accurately reflect what the published Myers diff algorithm would output.
-- [#455](https://github.com/kpdecker/jsdiff/pull/455) The `added` and `removed` properties of change objects are now guaranteed to be set to a boolean value. (Previously, they would be set to `undefined` or omitted entirely instead of setting them to false.)
+- [#435](https://github.com/kpdecker/jsdiff/pull/435) **Fix `parsePatch` handling of control characters.** `parsePatch` used to interpret various unusual control characters - namely vertical tabs, form feeds, lone carriage returns without a line feed, and EBCDIC NELs - as line breaks when parsing a patch file. This was inconsistent with the behavior of both JsDiff's own `diffLines` method and also the Unix `diff` and `patch` utils, which all simply treat those control characters as ordinary characters. The result of this discrepancy was that some well-formed patches - produced either by `diff` or by JsDiff itself and handled properly by the `patch` util - would be wrongly parsed by `parsePatch`, with the effect that it would disregard the remainder of a hunk after encountering one of these control characters.
+- [#439](https://github.com/kpdecker/jsdiff/pull/439) **Prefer diffs that order deletions before insertions.** When faced with a choice between two diffs with an equal total edit distance, the Myers diff algorithm generally prefers one that does deletions before insertions rather than insertions before deletions. For instance, when diffing `abcd` against `acbd`, it will prefer a diff that says to delete the `b` and then insert a new `b` after the `c`, over a diff that says to insert a `c` before the `b` and then delete the existing `c`. JsDiff deviated from the published Myers algorithm in a way that led to it having the opposite preference in many cases, including that example. This is now fixed, meaning diffs output by JsDiff will more accurately reflect what the published Myers diff algorithm would output.
+- [#455](https://github.com/kpdecker/jsdiff/pull/455) **The `added` and `removed` properties of change objects are now guaranteed to be set to a boolean value.** (Previously, they would be set to `undefined` or omitted entirely instead of setting them to false.)
- [#464](https://github.com/kpdecker/jsdiff/pull/464) Specifying `{maxEditLength: 0}` now sets a max edit length of 0 instead of no maximum.
-- [#460](https://github.com/kpdecker/jsdiff/pull/460) Added `oneChangePerToken` option.
-- [#467](https://github.com/kpdecker/jsdiff/pull/467) When passing a `comparator(left, right)` to `diffArrays`, values from the old array will now consistently be passed as the first argument (`left`) and values from the new array as the second argument (`right`). Previously this was almost (but not quite) always the other way round.
-- [#480](https://github.com/kpdecker/jsdiff/pull/480) Passing `maxEditLength` to `createPatch` & `createTwoFilesPatch` now works properly (i.e. returns undefined if the max edit distance is exceeded; previous behavior was to crash with a `TypeError` if the edit distance was exceeded).
-- [#486](https://github.com/kpdecker/jsdiff/pull/486) The `ignoreWhitespace` option of `diffLines` behaves more sensibly now. `value`s in returned change objects now include leading/trailing whitespace even when `ignoreWhitespace` is used, just like how with `ignoreCase` the `value`s still reflect the case of one of the original texts instead of being all-lowercase. `ignoreWhitespace` is also now compatible with `newlineIsToken`. Finally, `diffTrimmedLines` is deprecated (and removed from the docs) in favour of using `diffLines` with `ignoreWhitespace: true`; the two are, and always have been, equivalent.
-- [#490](https://github.com/kpdecker/jsdiff/pull/490) When calling diffing functions in async mode by passing a `callback` option, the diff result will now be passed as the *first* argument to the callback instead of the second. (Previously, the first argument was never used at all and would always have value `undefined`.)
-- [#489](github.com/kpdecker/jsdiff/pull/489) `this.options` no longer exists on `Diff` objects. Instead, `options` is now passed as an argument to methods that rely on options, like `equals(left, right, options)`. This fixes a race condition in async mode, where diffing behaviour could be changed mid-execution if a concurrent usage of the same `Diff` instances overwrote its `options`.
+- [#460](https://github.com/kpdecker/jsdiff/pull/460) **Added `oneChangePerToken` option.**
+- [#467](https://github.com/kpdecker/jsdiff/pull/467) **Consistent ordering of arguments to `comparator(left, right)`.** Values from the old array will now consistently be passed as the first argument (`left`) and values from the new array as the second argument (`right`). Previously this was almost (but not quite) always the other way round.
+- [#480](https://github.com/kpdecker/jsdiff/pull/480) **Passing `maxEditLength` to `createPatch` & `createTwoFilesPatch` now works properly** (i.e. returns undefined if the max edit distance is exceeded; previous behavior was to crash with a `TypeError` if the edit distance was exceeded).
+- [#486](https://github.com/kpdecker/jsdiff/pull/486) **The `ignoreWhitespace` option of `diffLines` behaves more sensibly now.** `value`s in returned change objects now include leading/trailing whitespace even when `ignoreWhitespace` is used, just like how with `ignoreCase` the `value`s still reflect the case of one of the original texts instead of being all-lowercase. `ignoreWhitespace` is also now compatible with `newlineIsToken`. Finally, **`diffTrimmedLines` is deprecated** (and removed from the docs) in favour of using `diffLines` with `ignoreWhitespace: true`; the two are, and always have been, equivalent.
+- [#490](https://github.com/kpdecker/jsdiff/pull/490) **When calling diffing functions in async mode by passing a `callback` option, the diff result will now be passed as the *first* argument to the callback instead of the second.** (Previously, the first argument was never used at all and would always have value `undefined`.)
+- [#489](https://github.com/kpdecker/jsdiff/pull/489) **`this.options` no longer exists on `Diff` objects.** Instead, `options` is now passed as an argument to methods that rely on options, like `equals(left, right, options)`. This fixes a race condition in async mode, where diffing behaviour could be changed mid-execution if a concurrent usage of the same `Diff` instances overwrote its `options`.
## v5.2.0
diff --git a/src/diff/base.js b/src/diff/base.js
index cc798a246..d58b5797e 100644
--- a/src/diff/base.js
+++ b/src/diff/base.js
@@ -11,6 +11,7 @@ Diff.prototype = {
let self = this;
function done(value) {
+ value = self.postProcess(value, options);
if (callback) {
setTimeout(function() { callback(value); }, 0);
return true;
@@ -41,7 +42,7 @@ Diff.prototype = {
let newPos = this.extractCommon(bestPath[0], newString, oldString, 0, options);
if (bestPath[0].oldPos + 1 >= oldLen && newPos + 1 >= newLen) {
// Identity per the equality and tokenizer
- return done(buildValues(self, bestPath[0].lastComponent, newString, oldString, self.useLongestToken, options));
+ return done(buildValues(self, bestPath[0].lastComponent, newString, oldString, self.useLongestToken));
}
// Once we hit the right edge of the edit graph on some diagonal k, we can
@@ -105,7 +106,7 @@ Diff.prototype = {
if (basePath.oldPos + 1 >= oldLen && newPos + 1 >= newLen) {
// If we have hit the end of both strings, then we are done
- return done(buildValues(self, basePath.lastComponent, newString, oldString, self.useLongestToken, options));
+ return done(buildValues(self, basePath.lastComponent, newString, oldString, self.useLongestToken));
} else {
bestPath[diagonalPath] = basePath;
if (basePath.oldPos + 1 >= oldLen) {
@@ -209,10 +210,13 @@ Diff.prototype = {
},
join(chars) {
return chars.join('');
+ },
+ postProcess(changeObjects) {
+ return changeObjects;
}
};
-function buildValues(diff, lastComponent, newString, oldString, useLongestToken, options) {
+function buildValues(diff, lastComponent, newString, oldString, useLongestToken) {
// First we convert our linked list of components in reverse order to an
// array in the right order:
const components = [];
@@ -256,22 +260,5 @@ function buildValues(diff, lastComponent, newString, oldString, useLongestToken,
}
}
- // Special case handle for when one terminal is ignored (i.e. whitespace).
- // For this case we merge the terminal into the prior string and drop the change.
- // This is only available for string mode.
- let finalComponent = components[componentLen - 1];
- if (
- componentLen > 1
- && typeof finalComponent.value === 'string'
- && (
- (finalComponent.added && diff.equals('', finalComponent.value, options))
- ||
- (finalComponent.removed && diff.equals(finalComponent.value, '', options))
- )
- ) {
- components[componentLen - 2].value += finalComponent.value;
- components.pop();
- }
-
return components;
}
diff --git a/src/diff/word.js b/src/diff/word.js
index 1319d93a0..ba8c682ea 100644
--- a/src/diff/word.js
+++ b/src/diff/word.js
@@ -1,5 +1,5 @@
import Diff from './base';
-import {generateOptions} from '../util/params';
+import { longestCommonPrefix, longestCommonSuffix, replacePrefix, replaceSuffix, removePrefix, removeSuffix, maximumOverlap } from '../util/string';
// Based on https://en.wikipedia.org/wiki/Latin_script_in_Unicode
//
@@ -21,12 +21,32 @@ import {generateOptions} from '../util/params';
// Latin Extended Additional, 1E00–1EFF
const extendedWordChars = 'a-zA-Z\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';
-// A token is any of the following:
-// * A newline (with or without a carriage return)
-// * A run of word characters
-// * A run of whitespace
-// * A single character that doesn't belong to any of the above categories (and is therefore considered punctuation)
-const tokenizeRegex = new RegExp(`\\r?\\n|[${extendedWordChars}]+|[^\\S\\r\\n]+|[^${extendedWordChars}]`, 'ug');
+// Each token is one of the following:
+// - A punctuation mark plus the surrounding whitespace
+// - A word plus the surrounding whitespace
+// - Pure whitespace (but only in the special case where the entire text
+// is just whitespace)
+//
+// We have to include surrounding whitespace in the tokens because the two
+// alternative approaches produce horribly broken results:
+// * If we just discard the whitespace, we can't fully reproduce the original
+// text from the sequence of tokens and any attempt to render the diff will
+// get the whitespace wrong.
+// * If we have separate tokens for whitespace, then in a typical text every
+// second token will be a single space character. But this often results in
+// the optimal diff between two texts being a perverse one that preserves
+// the spaces between words but deletes and reinserts actual common words.
+// See https://github.com/kpdecker/jsdiff/issues/160#issuecomment-1866099640
+// for an example.
+//
+// Keeping the surrounding whitespace of course has implications for .equals
+// and .join, not just .tokenize.
+
+// This regex does NOT fully implement the tokenization rules described above.
+// Instead, it gives runs of whitespace their own "token". The tokenize method
+// then handles stitching whitespace tokens onto adjacent word or punctuation
+// tokens.
+const tokenizeIncludingWhitespace = new RegExp(`[${extendedWordChars}]+|\\s+|[^${extendedWordChars}]`, 'ug');
export const wordDiff = new Diff();
wordDiff.equals = function(left, right, options) {
@@ -34,24 +54,225 @@ wordDiff.equals = function(left, right, options) {
left = left.toLowerCase();
right = right.toLowerCase();
}
- // The comparisons to the empty string are needed PURELY to signal to
- // buildValues that the whitespace token should be ignored. The empty string
- // will never be a token (removeEmpty removes it) but buildValues uses empty
- // string comparisons to test for ignored tokens and we need to handle that
- // query here.
- const leftIsWhitespace = (left === '' || (/^\s+$/).test(left));
- const rightIsWhitespace = (right === '' || (/^\s+$/).test(right));
- return left === right || (options.ignoreWhitespace && leftIsWhitespace && rightIsWhitespace);
+
+ return left.trim() === right.trim();
};
+
wordDiff.tokenize = function(value) {
- return value.match(tokenizeRegex) || [];
+ let parts = value.match(tokenizeIncludingWhitespace) || [];
+ const tokens = [];
+ let prevPart = null;
+ parts.forEach(part => {
+ if ((/\s/).test(part)) {
+ if (prevPart == null) {
+ tokens.push(part);
+ } else {
+ tokens.push(tokens.pop() + part);
+ }
+ } else if ((/\s/).test(prevPart)) {
+ if (tokens[tokens.length - 1] == prevPart) {
+ tokens.push(tokens.pop() + part);
+ } else {
+ tokens.push(prevPart + part);
+ }
+ } else {
+ tokens.push(part);
+ }
+
+ prevPart = part;
+ });
+ return tokens;
+};
+
+wordDiff.join = function(tokens) {
+ // Tokens being joined here will always have appeared consecutively in the
+ // same text, so we can simply strip off the leading whitespace from all the
+ // tokens except the first (and except any whitespace-only tokens - but such
+ // a token will always be the first and only token anyway) and then join them
+ // and the whitespace around words and punctuation will end up correct.
+ return tokens.map((token, i) => {
+ if (i == 0) {
+ return token;
+ } else {
+ return token.replace((/^\s+/), '');
+ }
+ }).join('');
+};
+
+wordDiff.postProcess = function(changes, options) {
+ if (!changes || options.oneChangePerToken) {
+ return changes;
+ }
+
+ let lastKeep = null;
+ // Change objects representing any insertion or deletion since the last
+ // "keep" change object. There can be at most one of each.
+ let insertion = null;
+ let deletion = null;
+ changes.forEach(change => {
+ if (change.added) {
+ insertion = change;
+ } else if (change.removed) {
+ deletion = change;
+ } else {
+ if (insertion || deletion) { // May be false at start of text
+ dedupeWhitespaceInChangeObjects(lastKeep, deletion, insertion, change);
+ }
+ lastKeep = change;
+ insertion = null;
+ deletion = null;
+ }
+ });
+ if (insertion || deletion) {
+ dedupeWhitespaceInChangeObjects(lastKeep, deletion, insertion, null);
+ }
+ return changes;
};
export function diffWords(oldStr, newStr, options) {
- options = generateOptions(options, {ignoreWhitespace: true});
+ // This option has never been documented and never will be (it's clearer to
+ // just call `diffWordsWithSpace` directly if you need that behavior), but
+ // has existed in jsdiff for a long time, so we retain support for it here
+ // for the sake of backwards compatibility.
+ if (options?.ignoreWhitespace != null && !options.ignoreWhitespace) {
+ return diffWordsWithSpace(oldStr, newStr, options);
+ }
+
return wordDiff.diff(oldStr, newStr, options);
}
+function dedupeWhitespaceInChangeObjects(startKeep, deletion, insertion, endKeep) {
+ // Before returning, we tidy up the leading and trailing whitespace of the
+ // change objects to eliminate cases where trailing whitespace in one object
+ // is repeated as leading whitespace in the next.
+ // Below are examples of the outcomes we want here to explain the code.
+ // I=insert, K=keep, D=delete
+ // 1. diffing 'foo bar baz' vs 'foo baz'
+ // Prior to cleanup, we have K:'foo ' D:' bar ' K:' baz'
+ // After cleanup, we want: K:'foo ' D:'bar ' K:'baz'
+ //
+ // 2. Diffing 'foo bar baz' vs 'foo qux baz'
+ // Prior to cleanup, we have K:'foo ' D:' bar ' I:' qux ' K:' baz'
+ // After cleanup, we want K:'foo ' D:'bar' I:'qux' K:' baz'
+ //
+ // 3. Diffing 'foo\nbar baz' vs 'foo baz'
+ // Prior to cleanup, we have K:'foo ' D:'\nbar ' K:' baz'
+ // After cleanup, we want K:'foo' D:'\nbar' K:' baz'
+ //
+ // 4. Diffing 'foo baz' vs 'foo\nbar baz'
+ // Prior to cleanup, we have K:'foo\n' I:'\nbar ' K:' baz'
+ // After cleanup, we ideally want K:'foo' I:'\nbar' K:' baz'
+ // but don't actually manage this currently (the pre-cleanup change
+ // objects don't contain enough information to make it possible).
+ //
+ // 5. Diffing 'foo bar baz' vs 'foo baz'
+ // Prior to cleanup, we have K:'foo ' D:' bar ' K:' baz'
+ // After cleanup, we want K:'foo ' D:' bar ' K:'baz'
+ //
+ // Our handling is unavoidably imperfect in the case where there's a single
+ // indel between keeps and the whitespace has changed. For instance, consider
+ // diffing 'foo\tbar\nbaz' vs 'foo baz'. Unless we create an extra change
+ // object to represent the insertion of the space character (which isn't even
+ // a token), we have no way to avoid losing information about the texts'
+ // original whitespace in the result we return. Still, we do our best to
+ // output something that will look sensible if we e.g. print it with
+ // insertions in green and deletions in red.
+
+ // Between two "keep" change objects (or before the first or after the last
+ // change object), we can have either:
+ // * A "delete" followed by an "insert"
+ // * Just an "insert"
+ // * Just a "delete"
+ // We handle the three cases separately.
+ if (deletion && insertion) {
+ const oldWsPrefix = deletion.value.match(/^\s*/)[0];
+ const oldWsSuffix = deletion.value.match(/\s*$/)[0];
+ const newWsPrefix = insertion.value.match(/^\s*/)[0];
+ const newWsSuffix = insertion.value.match(/\s*$/)[0];
+
+ if (startKeep) {
+ const commonWsPrefix = longestCommonPrefix(oldWsPrefix, newWsPrefix);
+ startKeep.value = replaceSuffix(startKeep.value, newWsPrefix, commonWsPrefix);
+ deletion.value = removePrefix(deletion.value, commonWsPrefix);
+ insertion.value = removePrefix(insertion.value, commonWsPrefix);
+ }
+ if (endKeep) {
+ const commonWsSuffix = longestCommonSuffix(oldWsSuffix, newWsSuffix);
+ endKeep.value = replacePrefix(endKeep.value, newWsSuffix, commonWsSuffix);
+ deletion.value = removeSuffix(deletion.value, commonWsSuffix);
+ insertion.value = removeSuffix(insertion.value, commonWsSuffix);
+ }
+ } else if (insertion) {
+ // The whitespaces all reflect what was in the new text rather than
+ // the old, so we essentially have no information about whitespace
+ // insertion or deletion. We just want to dedupe the whitespace.
+ // We do that by having each change object keep its trailing
+ // whitespace and deleting duplicate leading whitespace where
+ // present.
+ if (startKeep) {
+ insertion.value = insertion.value.replace(/^\s*/, '');
+ }
+ if (endKeep) {
+ endKeep.value = endKeep.value.replace(/^\s*/, '');
+ }
+ // otherwise we've got a deletion and no insertion
+ } else if (startKeep && endKeep) {
+ const newWsFull = endKeep.value.match(/^\s*/)[0],
+ delWsStart = deletion.value.match(/^\s*/)[0],
+ delWsEnd = deletion.value.match(/\s*$/)[0];
+
+ // Any whitespace that comes straight after startKeep in both the old and
+ // new texts, assign to startKeep and remove from the deletion.
+ const newWsStart = longestCommonPrefix(newWsFull, delWsStart);
+ deletion.value = removePrefix(deletion.value, newWsStart);
+
+ // Any whitespace that comes straight before endKeep in both the old and
+ // new texts, and hasn't already been assigned to startKeep, assign to
+ // endKeep and remove from the deletion.
+ const newWsEnd = longestCommonSuffix(
+ removePrefix(newWsFull, newWsStart),
+ delWsEnd
+ );
+ deletion.value = removeSuffix(deletion.value, newWsEnd);
+ endKeep.value = replacePrefix(endKeep.value, newWsFull, newWsEnd);
+
+ // If there's any whitespace from the new text that HASN'T already been
+ // assigned, assign it to the start:
+ startKeep.value = replaceSuffix(
+ startKeep.value,
+ newWsFull,
+ newWsFull.slice(0, newWsFull.length - newWsEnd.length)
+ );
+ } else if (endKeep) {
+ // We are at the start of the text. Preserve all the whitespace on
+ // endKeep, and just remove whitespace from the end of deletion to the
+ // extent that it overlaps with the start of endKeep.
+ const endKeepWsPrefix = endKeep.value.match(/^\s*/)[0];
+ const deletionWsSuffix = deletion.value.match(/\s*$/)[0];
+ const overlap = maximumOverlap(deletionWsSuffix, endKeepWsPrefix);
+ deletion.value = removeSuffix(deletion.value, overlap);
+ } else if (startKeep) {
+ // We are at the END of the text. Preserve all the whitespace on
+ // startKeep, and just remove whitespace from the start of deletion to
+ // the extent that it overlaps with the end of startKeep.
+ const startKeepWsSuffix = startKeep.value.match(/\s*$/)[0];
+ const deletionWsPrefix = deletion.value.match(/^\s*/)[0];
+ const overlap = maximumOverlap(startKeepWsSuffix, deletionWsPrefix);
+ deletion.value = removePrefix(deletion.value, overlap);
+ }
+}
+
+
+export const wordWithSpaceDiff = new Diff();
+wordWithSpaceDiff.tokenize = function(value) {
+ // Slightly different to the tokenizeIncludingWhitespace regex used above in
+ // that this one treats each individual newline as a distinct token, rather
+ // than merging them into other surrounding whitespace. This was requested
+ // in https://github.com/kpdecker/jsdiff/issues/180 &
+ // https://github.com/kpdecker/jsdiff/issues/211
+ const regex = new RegExp(`(\\r?\\n)|[${extendedWordChars}]+|[^\\S\\n\\r]+|[^${extendedWordChars}]`, 'ug');
+ return value.match(regex) || [];
+};
export function diffWordsWithSpace(oldStr, newStr, options) {
- return wordDiff.diff(oldStr, newStr, options);
+ return wordWithSpaceDiff.diff(oldStr, newStr, options);
}
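Reviewer note: the tokenization scheme introduced in `src/diff/word.js` (every whitespace run is attached to the tokens on both sides of it, and `join` strips the duplicated leading whitespace again) can be tried in isolation with the sketch below. It is a simplified stand-in rather than the patch's code: `WORD_CHARS` is an assumed ASCII-only class in place of `extendedWordChars`, and `tokenize` and `join` here are standalone functions, not methods on `wordDiff`.

```javascript
// Simplified word-character class: an assumption for this sketch. The patch
// itself uses the much broader `extendedWordChars` range.
const WORD_CHARS = 'a-zA-Z';
const partsRegex = new RegExp(`[${WORD_CHARS}]+|\\s+|[^${WORD_CHARS}]`, 'ug');

// Split into words, whitespace runs, and single punctuation characters, then
// attach each whitespace run to the tokens on both sides of it.
function tokenize(text) {
  const parts = text.match(partsRegex) || [];
  const tokens = [];
  let prevPart = null;
  for (const part of parts) {
    const prevIsWhitespace = prevPart !== null && /\s/.test(prevPart);
    if (/\s/.test(part)) {
      // Whitespace: append to the previous token, or stand alone at the start.
      tokens.push(prevPart === null ? part : tokens.pop() + part);
    } else if (prevIsWhitespace) {
      // Word or punctuation after whitespace: take that whitespace as a prefix.
      const last = tokens[tokens.length - 1];
      tokens.push(last === prevPart ? tokens.pop() + part : prevPart + part);
    } else {
      tokens.push(part);
    }
    prevPart = part;
  }
  return tokens;
}

// Consecutive tokens share whitespace, so drop the leading whitespace of every
// token except the first before concatenating.
function join(tokens) {
  return tokens
    .map((token, i) => (i === 0 ? token : token.replace(/^\s+/, '')))
    .join('');
}

const tokens = tokenize('foo bar, baz');
console.log(tokens);       // ['foo ', ' bar', ', ', ' baz']
console.log(join(tokens)); // foo bar, baz
```

Because the space between `foo` and `bar` appears in both neighbouring tokens, comparing tokens with `trim()` (as the new `equals` does) ignores whitespace changes, while `join` can still rebuild the original text of a consecutive run.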
diff --git a/src/util/string.js b/src/util/string.js
new file mode 100644
index 000000000..bcb0ae92b
--- /dev/null
+++ b/src/util/string.js
@@ -0,0 +1,88 @@
+export function longestCommonPrefix(str1, str2) {
+ let i;
+ for (i = 0; i < str1.length && i < str2.length; i++) {
+ if (str1[i] != str2[i]) {
+ return str1.slice(0, i);
+ }
+ }
+ return str1.slice(0, i);
+}
+
+export function longestCommonSuffix(str1, str2) {
+ let i;
+
+ // Unlike longestCommonPrefix, we need a special case to handle all scenarios
+ // where we return the empty string since str1.slice(-0) will return the
+ // entire string.
+ if (!str1 || !str2 || str1[str1.length - 1] != str2[str2.length - 1]) {
+ return '';
+ }
+
+ for (i = 0; i < str1.length && i < str2.length; i++) {
+ if (str1[str1.length - (i + 1)] != str2[str2.length - (i + 1)]) {
+ return str1.slice(-i);
+ }
+ }
+ return str1.slice(-i);
+}
+
+export function replacePrefix(string, oldPrefix, newPrefix) {
+ if (string.slice(0, oldPrefix.length) != oldPrefix) {
+ throw Error(`string ${JSON.stringify(string)} doesn't start with prefix ${JSON.stringify(oldPrefix)}; this is a bug`);
+ }
+ return newPrefix + string.slice(oldPrefix.length);
+}
+
+export function replaceSuffix(string, oldSuffix, newSuffix) {
+ if (!oldSuffix) {
+ return string + newSuffix;
+ }
+
+ if (string.slice(-oldSuffix.length) != oldSuffix) {
+ throw Error(`string ${JSON.stringify(string)} doesn't end with suffix ${JSON.stringify(oldSuffix)}; this is a bug`);
+ }
+ return string.slice(0, -oldSuffix.length) + newSuffix;
+}
+
+export function removePrefix(string, oldPrefix) {
+ return replacePrefix(string, oldPrefix, '');
+}
+
+export function removeSuffix(string, oldSuffix) {
+ return replaceSuffix(string, oldSuffix, '');
+}
+
+export function maximumOverlap(string1, string2) {
+ return string2.slice(0, overlapCount(string1, string2));
+}
+
+// Nicked from https://stackoverflow.com/a/60422853/1709587
+function overlapCount(a, b) {
+ // Deal with cases where the strings differ in length
+ let startA = 0;
+ if (a.length > b.length) { startA = a.length - b.length; }
+ let endB = b.length;
+ if (a.length < b.length) { endB = a.length; }
+ // Create a back-reference for each index
+ // that should be followed in case of a mismatch.
+ // We only need B to make these references:
+ let map = Array(endB);
+ let k = 0; // Index that lags behind j
+ map[0] = 0;
+ for (let j = 1; j < endB; j++) {
+ if (b[j] == b[k]) {
+ map[j] = map[k]; // skip over the same character (optional optimisation)
+ } else {
+ map[j] = k;
+ }
+ while (k > 0 && b[j] != b[k]) { k = map[k]; }
+ if (b[j] == b[k]) { k++; }
+ }
+ // Phase 2: use these references while iterating over A
+ k = 0;
+ for (let i = startA; i < a.length; i++) {
+ while (k > 0 && a[i] != b[k]) { k = map[k]; }
+ if (a[i] == b[k]) { k++; }
+ }
+ return k;
+}
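Reviewer note: `maximumOverlap(string1, string2)` appears to return the longest suffix of `string1` that is also a prefix of `string2`, computed with a KMP-style back-reference table. A brute-force version is easier to check by eye and could serve as a reference when testing the optimized helper. The function name `naiveMaximumOverlap` below is hypothetical and not part of the patch.

```javascript
// Longest suffix of `a` that is also a prefix of `b`, found by brute force.
// Quadratic in the worst case, which is fine for short whitespace runs.
function naiveMaximumOverlap(a, b) {
  for (let len = Math.min(a.length, b.length); len > 0; len--) {
    if (a.slice(a.length - len) === b.slice(0, len)) {
      return b.slice(0, len);
    }
  }
  return '';
}

// A deletion ending in ' \n' followed by a kept chunk starting with '\n\t'
// shares exactly one newline.
console.log(JSON.stringify(naiveMaximumOverlap(' \n', '\n\t'))); // "\n"

// The pitfall noted in longestCommonSuffix: slice(-0) is slice(0), which
// returns the whole string rather than an empty one.
console.log('abc'.slice(-0)); // abc
```

Slicing with `a.length - len` instead of `-len` sidesteps the `slice(-0)` edge case, and the loop never reaches `len === 0` anyway.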
diff --git a/test/convert/dmp.js b/test/convert/dmp.js
index f6ab6965d..c86c08423 100644
--- a/test/convert/dmp.js
+++ b/test/convert/dmp.js
@@ -1,12 +1,12 @@
import {convertChangesToDMP} from '../../lib/convert/dmp';
-import {diffWords} from '../../lib/diff/word';
+import {diffChars} from '../../lib/diff/character';
import {expect} from 'chai';
describe('convertToDMP', function() {
it('should output diff-match-patch format', function() {
- const diffResult = diffWords('New Value ', 'New ValueMoreData ');
+ const diffResult = diffChars('New Value ', 'New ValueMoreData ');
- expect(convertChangesToDMP(diffResult)).to.eql([[0, 'New '], [-1, 'Value'], [1, 'ValueMoreData'], [0, ' ']]);
+ expect(convertChangesToDMP(diffResult)).to.eql([[0, 'New '], [1, ' '], [0, 'Value'], [1, 'MoreData'], [0, ' '], [-1, ' ']]);
});
});
diff --git a/test/diff/line.js b/test/diff/line.js
index 894514ba7..1461b264d 100644
--- a/test/diff/line.js
+++ b/test/diff/line.js
@@ -157,6 +157,15 @@ describe('diff/line', function() {
expect(convertChangesToXML(diffResult4)).to.equal('line\n value\nline');
});
+ it('should not ignore the insertion or deletion of lines of whitespace at the end', function() {
+ const finalChange = diffLines('foo\nbar\n', 'foo\nbar\n \n \n \n', {ignoreWhitespace: true}).pop();
+ expect(finalChange.count).to.equal(3);
+ expect(finalChange.added).to.equal(true);
+
+ const finalChange2 = diffLines('foo\nbar\n\n', 'foo\nbar\n', {ignoreWhitespace: true}).pop();
+ expect(finalChange2.removed).to.equal(true);
+ });
+
it('should keep leading and trailing whitespace in the output', function() {
function stringify(value) {
return JSON.stringify(value, null, 2);
@@ -204,6 +213,19 @@ describe('diff/line', function() {
);
expect(convertChangesToXML(diffResult)).to.equal('line1\nline2\n \n\nline4\n \n');
});
+
+ it('supports async mode by passing a function as the options argument', function(done) {
+ diffTrimmedLines(
+ 'line\r\nold value \r\nline',
+ 'line \r\nnew value\r\nline',
+ function(diffResult) {
+ expect(convertChangesToXML(diffResult)).to.equal(
+ 'line \r\nold value \r\nnew value\r\nline'
+ );
+ done();
+ }
+ );
+ });
});
describe('#diffLinesNL', function() {
diff --git a/test/diff/word.js b/test/diff/word.js
index ee055561c..708ea5552 100644
--- a/test/diff/word.js
+++ b/test/diff/word.js
@@ -5,41 +5,26 @@ import {expect} from 'chai';
describe('WordDiff', function() {
describe('#tokenize', function() {
- it('should give words, punctuation marks, newlines, and runs of whitespace their own token', function() {
+ it('should give each word & punctuation mark its own token, including leading and trailing whitespace', function() {
expect(
wordDiff.tokenize(
'foo bar baz jurídica wir üben bla\t\t \txyzáxyz \n\n\n animá-los\r\n\r\n(wibbly wobbly)().'
)
).to.deep.equal([
- 'foo',
- ' ',
- 'bar',
- ' ',
- 'baz',
- ' ',
- 'jurídica',
- ' ',
- 'wir',
- ' ',
- 'üben',
- ' ',
- 'bla',
- '\t\t \t',
- 'xyzáxyz',
- ' ',
- '\n',
- '\n',
- '\n',
- ' ',
- 'animá',
+ 'foo ',
+ ' bar ',
+ ' baz ',
+ ' jurídica ',
+ ' wir ',
+ ' üben ',
+ ' bla\t\t \t',
+ '\t\t \txyzáxyz \n\n\n ',
+ ' \n\n\n animá',
'-',
- 'los',
- '\r\n',
- '\r\n',
- '(',
- 'wibbly',
- ' ',
- 'wobbly',
+ 'los\r\n\r\n',
+ '\r\n\r\n(',
+ 'wibbly ',
+ ' wobbly',
')',
'(',
')',
@@ -49,29 +34,111 @@ describe('WordDiff', function() {
});
describe('#diffWords', function() {
- it('should diff whitespace', function() {
- const diffResult = diffWords('New Value', 'New ValueMoreData');
- expect(convertChangesToXML(diffResult)).to.equal('New ValueValueMoreData');
+ it("should ignore whitespace changes between tokens that aren't added or deleted", function() {
+ const diffResult = diffWords('New Value', 'New \n \t Value');
+ expect(convertChangesToXML(diffResult)).to.equal('New \n \t Value');
});
- it('should diff multiple whitespace values', function() {
- const diffResult = diffWords('New Value ', 'New ValueMoreData ');
- expect(convertChangesToXML(diffResult)).to.equal('New ValueValueMoreData ');
- });
+ describe('whitespace changes that border inserted/deleted tokens should be included in the diff as far as is possible...', function() {
+ it('(add+del at end of text)', function() {
+ const diffResult = diffWords('New Value ', 'New ValueMoreData ');
+ expect(convertChangesToXML(diffResult)).to.equal('New Value ValueMoreData ');
+ });
+
+ it('(add+del in middle of text)', function() {
+ const diffResult = diffWords('New Value End', 'New ValueMoreData End');
+ expect(convertChangesToXML(diffResult)).to.equal('New Value ValueMoreData End');
+ });
+
+ it('(add+del at start of text)', function() {
+ const diffResult = diffWords('\tValue End', ' ValueMoreData End');
+ expect(convertChangesToXML(diffResult)).to.equal('\tValue ValueMoreData End');
+ });
+
+ it('(add at start of text)', function() {
+ const diffResult = diffWords('\t Value', 'More Value');
+ // Preferable would be:
+ // 'More Value'
+ // But this is hard to achieve without adding something like the
+ // .oldValue property I contemplate in
+ // https://github.com/kpdecker/jsdiff/pull/219#issuecomment-1858246490
+ // to change objects returned by the base diffing algorithm. The change-object
+ // cleanup done by diffWords simply doesn't have enough information to
+ // return the ideal result otherwise.
+ expect(convertChangesToXML(diffResult)).to.equal('More Value');
+ });
+
+ it('(del at start of text)', function() {
+ const diffResult = diffWords('More Value', '\t Value');
+ expect(convertChangesToXML(diffResult)).to.equal('More \t Value');
+ });
+
+ it('(add in middle of text)', function() {
+ const diffResult = diffWords('Even Value', 'Even More Value');
+ // Preferable would be:
+ // 'Even More Value'
+ // But this is hard to achieve without adding something like the
+ // .oldValue property I contemplate in
+ // https://github.com/kpdecker/jsdiff/pull/219#issuecomment-1858246490
+ // to change objects returned by the base diffing algorithm. The change-object
+ // cleanup done by diffWords simply doesn't have enough information to
+ // return the ideal result otherwise.
+ expect(convertChangesToXML(diffResult)).to.equal('Even More Value');
+ });
+
+ it('(del in middle of text)', function() {
+ const diffResult = diffWords('Even More Value', 'Even Value');
+ expect(convertChangesToXML(diffResult)).to.equal('Even More Value');
+
+ // Rules around how to split up the whitespace between the start and
+ // end "keep" change objects are subtle, as shown by the three examples
+ // below:
+ const diffResult2 = diffWords('foo\nbar baz', 'foo baz');
+ expect(convertChangesToXML(diffResult2)).to.equal('foo\nbar baz');
+
+ const diffResult3 = diffWords('foo bar baz', 'foo baz');
+ expect(convertChangesToXML(diffResult3)).to.equal('foo bar baz');
+
+ const diffResult4 = diffWords('foo\nbar baz', 'foo\n baz');
+ expect(convertChangesToXML(diffResult4)).to.equal('foo\nbar baz');
+ });
+
+ it('(add at end of text)', function() {
+ const diffResult = diffWords('Foo\n', 'Foo Bar\n');
+ // Preferable would be:
+ // 'Foo Bar\n'
+ // But this is hard to achieve without adding something like the
+ // .oldValue property I contemplate in
+ // https://github.com/kpdecker/jsdiff/pull/219#issuecomment-1858246490
+ // to change objects returned by the base diffing algorithm. The change-object
+ // cleanup done by diffWords simply doesn't have enough information to
+ // return the ideal result otherwise.
+ expect(convertChangesToXML(diffResult)).to.equal('Foo Bar\n');
+ });
- // Diff on word boundary
- it('should diff on word boundaries', function() {
- let diffResult = diffWords('New :Value:Test', 'New ValueMoreData ');
- expect(convertChangesToXML(diffResult)).to.equal('New :Value:TestValueMoreData ');
+ it('(del at end of text)', function() {
+ const diffResult = diffWords('Foo Bar', 'Foo ');
+ expect(convertChangesToXML(diffResult)).to.equal('Foo Bar');
+ });
+ });
- diffResult = diffWords('New Value:Test', 'New Value:MoreData ');
- expect(convertChangesToXML(diffResult)).to.equal('New Value:TestMoreData ');
+ it('should skip postprocessing of change objects in one-change-object-per-token mode', function() {
+ const diffResult = diffWords('Foo Bar', 'Foo Baz', {oneChangePerToken: true});
+ expect(convertChangesToXML(diffResult)).to.equal(
+ 'Foo Bar Baz'
+ );
+ });
- diffResult = diffWords('New Value-Test', 'New Value:MoreData ');
- expect(convertChangesToXML(diffResult)).to.equal('New Value-Test:MoreData ');
+ it('should respect options.ignoreCase', function() {
+ const diffResult = diffWords('foo bar baz', 'FOO BAR QUX', {ignoreCase: true});
+ expect(convertChangesToXML(diffResult)).to.equal(
+ 'FOO BAR bazQUX'
+ );
+ });
- diffResult = diffWords('New Value', 'New Value:MoreData ');
- expect(convertChangesToXML(diffResult)).to.equal('New Value:MoreData ');
+ it('should treat punctuation characters as tokens', function() {
+ let diffResult = diffWords('New:Value:Test', 'New,Value,More,Data ');
+ expect(convertChangesToXML(diffResult)).to.equal('New:,Value:Test,More,Data ');
});
// Diff without changes
@@ -99,35 +166,24 @@ describe('WordDiff', function() {
expect(convertChangesToXML(diffResult)).to.equal('New Value');
});
- // With without anchor (the Heckel algorithm error case)
- it('should diff when there is no anchor value', function() {
- const diffResult = diffWords('New Value New Value', 'Value Value New New');
- expect(convertChangesToXML(diffResult)).to.equal('NewValue Value New ValueNew');
- });
-
it('should include count with identity cases', function() {
expect(diffWords('foo', 'foo')).to.eql([{value: 'foo', count: 1, removed: false, added: false}]);
- expect(diffWords('foo bar', 'foo bar')).to.eql([{value: 'foo bar', count: 3, removed: false, added: false}]);
+ expect(diffWords('foo bar', 'foo bar')).to.eql([{value: 'foo bar', count: 2, removed: false, added: false}]);
});
it('should include count with empty cases', function() {
expect(diffWords('foo', '')).to.eql([{value: 'foo', count: 1, added: false, removed: true}]);
- expect(diffWords('foo bar', '')).to.eql([{value: 'foo bar', count: 3, added: false, removed: true}]);
+ expect(diffWords('foo bar', '')).to.eql([{value: 'foo bar', count: 2, added: false, removed: true}]);
expect(diffWords('', 'foo')).to.eql([{value: 'foo', count: 1, added: true, removed: false}]);
- expect(diffWords('', 'foo bar')).to.eql([{value: 'foo bar', count: 3, added: true, removed: false}]);
+ expect(diffWords('', 'foo bar')).to.eql([{value: 'foo bar', count: 2, added: true, removed: false}]);
});
it('should ignore whitespace', function() {
- expect(diffWords('hase igel fuchs', 'hase igel fuchs')).to.eql([{ count: 5, value: 'hase igel fuchs', removed: false, added: false }]);
- expect(diffWords('hase igel fuchs', 'hase igel fuchs\n')).to.eql([{ count: 5, value: 'hase igel fuchs\n', removed: false, added: false }]);
- expect(diffWords('hase igel fuchs\n', 'hase igel fuchs')).to.eql([{ count: 5, value: 'hase igel fuchs\n', removed: false, added: false }]);
- expect(diffWords('hase igel fuchs', 'hase igel\nfuchs')).to.eql([{ count: 5, value: 'hase igel\nfuchs', removed: false, added: false }]);
- expect(diffWords('hase igel\nfuchs', 'hase igel fuchs')).to.eql([{ count: 5, value: 'hase igel fuchs', removed: false, added: false }]);
- });
-
- it('should diff whitespace with flag', function() {
- const diffResult = diffWords('New Value', 'New ValueMoreData', {ignoreWhitespace: false});
- expect(convertChangesToXML(diffResult)).to.equal('New Value ValueMoreData');
+ expect(diffWords('hase igel fuchs', 'hase igel fuchs')).to.eql([{ count: 3, value: 'hase igel fuchs', removed: false, added: false }]);
+ expect(diffWords('hase igel fuchs', 'hase igel fuchs\n')).to.eql([{ count: 3, value: 'hase igel fuchs\n', removed: false, added: false }]);
+ expect(diffWords('hase igel fuchs\n', 'hase igel fuchs')).to.eql([{ count: 3, value: 'hase igel fuchs', removed: false, added: false }]);
+ expect(diffWords('hase igel fuchs', 'hase igel\nfuchs')).to.eql([{ count: 3, value: 'hase igel\nfuchs', removed: false, added: false }]);
+ expect(diffWords('hase igel\nfuchs', 'hase igel fuchs')).to.eql([{ count: 3, value: 'hase igel fuchs', removed: false, added: false }]);
});
it('should diff with only whitespace', function() {
@@ -137,72 +193,21 @@ describe('WordDiff', function() {
diffResult = diffWords(' ', '');
expect(convertChangesToXML(diffResult)).to.equal(' ');
});
- });
- describe('#diffWords - async', function() {
- it('should diff whitespace', function(done) {
- diffWords('New Value', 'New ValueMoreData', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New ValueValueMoreData');
+ it('should support async mode', function(done) {
+ diffWords('New Value', 'New \n \t Value', function(diffResult) {
+ expect(convertChangesToXML(diffResult)).to.equal('New \n \t Value');
done();
});
});
- it('should diff multiple whitespace values', function(done) {
- diffWords('New Value ', 'New ValueMoreData ', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New ValueValueMoreData ');
- done();
- });
- });
-
- // Diff on word boundary
- it('should diff on word boundaries', function(done) {
- diffWords('New :Value:Test', 'New ValueMoreData ', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New :Value:TestValueMoreData ');
- done();
- });
- });
-
- // Diff without changes
- it('should handle identity', function(done) {
- diffWords('New Value', 'New Value', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New Value');
- done();
- });
- });
- it('should handle empty', function(done) {
- diffWords('', '', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('');
- done();
- });
- });
- it('should diff has identical content', function(done) {
- diffWords('New Value', 'New Value', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New Value');
- done();
- });
- });
-
- // Empty diffs
- it('should diff empty new content', function(done) {
- diffWords('New Value', '', function(diffResult) {
- expect(diffResult.length).to.equal(1);
- expect(convertChangesToXML(diffResult)).to.equal('New Value');
- done();
- });
- });
- it('should diff empty old content', function(done) {
- diffWords('', 'New Value', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('New Value');
- done();
- });
- });
-
- // With without anchor (the Heckel algorithm error case)
- it('should diff when there is no anchor value', function(done) {
- diffWords('New Value New Value', 'Value Value New New', function(diffResult) {
- expect(convertChangesToXML(diffResult)).to.equal('NewValue Value New ValueNew');
- done();
- });
+ it('calls #diffWordsWithSpace if you pass ignoreWhitespace: false', function() {
+ const diffResult = diffWords(
+ 'foo bar',
+ 'foo\tbar',
+ {ignoreWhitespace: false}
+ );
+ expect(convertChangesToXML(diffResult)).to.equal('foo \tbar');
});
});
@@ -263,6 +268,17 @@ describe('WordDiff', function() {
});
});
+ // Without an anchor (the Heckel algorithm error case)
+ it('should diff when there is no anchor value', function() {
+ const diffResult = diffWordsWithSpace('New Value New Value', 'Value Value New New');
+ expect(convertChangesToXML(diffResult)).to.equal('NewValue Value New ValueNew');
+ });
+
+ it('should handle empty', function() {
+ const diffResult = diffWordsWithSpace('', '');
+ expect(convertChangesToXML(diffResult)).to.equal('');
+ });
+
describe('case insensitivity', function() {
it("is considered when there's a difference", function() {
const diffResult = diffWordsWithSpace('new value', 'New ValueMoreData', {ignoreCase: true});
diff --git a/test/util/string.js b/test/util/string.js
new file mode 100644
index 000000000..df151b322
--- /dev/null
+++ b/test/util/string.js
@@ -0,0 +1,90 @@
+import {longestCommonPrefix, longestCommonSuffix, replacePrefix, replaceSuffix, removePrefix, removeSuffix, maximumOverlap} from '../../lib/util/string';
+import {expect} from 'chai';
+
+describe('#longestCommonPrefix', function() {
+ it('finds the longest common prefix', function() {
+ expect(longestCommonPrefix('food', 'foolish')).to.equal('foo');
+ expect(longestCommonPrefix('foolish', 'food')).to.equal('foo');
+ expect(longestCommonPrefix('foolish', 'foo')).to.equal('foo');
+ expect(longestCommonPrefix('foo', 'foolish')).to.equal('foo');
+ expect(longestCommonPrefix('foo', '')).to.equal('');
+ expect(longestCommonPrefix('', 'foo')).to.equal('');
+ expect(longestCommonPrefix('', '')).to.equal('');
+ expect(longestCommonPrefix('foo', 'bar')).to.equal('');
+ });
+});
+
+describe('#longestCommonSuffix', function() {
+ it('finds the longest common suffix', function() {
+ expect(longestCommonSuffix('bumpy', 'grumpy')).to.equal('umpy');
+ expect(longestCommonSuffix('grumpy', 'bumpy')).to.equal('umpy');
+ expect(longestCommonSuffix('grumpy', 'umpy')).to.equal('umpy');
+ expect(longestCommonSuffix('umpy', 'grumpy')).to.equal('umpy');
+ expect(longestCommonSuffix('foo', '')).to.equal('');
+ expect(longestCommonSuffix('', 'foo')).to.equal('');
+ expect(longestCommonSuffix('', '')).to.equal('');
+ expect(longestCommonSuffix('foo', 'bar')).to.equal('');
+ });
+});
+
+describe('#replacePrefix', function() {
+ it('replaces a prefix on a string with a different prefix', function() {
+ expect((replacePrefix('food', 'foo', 'gla'))).to.equal('glad');
+ expect((replacePrefix('food', '', 'good '))).to.equal('good food');
+ });
+
+ it("throws if the prefix to remove isn't present", function() {
+ // eslint-disable-next-line dot-notation
+ expect(() => replacePrefix('food', 'drin', 'goo')).to.throw();
+ });
+});
+
+describe('#replaceSuffix', function() {
+ it('replaces a suffix on a string with a different suffix', function() {
+ expect((replaceSuffix('bangle', 'gle', 'jo'))).to.equal('banjo');
+ expect((replaceSuffix('bun', '', 'gle'))).to.equal('bungle');
+ });
+
+ it("throws if the suffix to remove isn't present", function() {
+ // eslint-disable-next-line dot-notation
+ expect(() => replaceSuffix('food', 'ool', 'ondle')).to.throw();
+ });
+});
+
+describe('#removePrefix', function() {
+ it('removes a prefix', function() {
+ expect(removePrefix('inconceivable', 'in')).to.equal('conceivable');
+ expect(removePrefix('inconceivable', '')).to.equal('inconceivable');
+ expect(removePrefix('inconceivable', 'inconceivable')).to.equal('');
+ });
+
+ it("throws if the prefix to remove isn't present", function() {
+ // eslint-disable-next-line dot-notation
+ expect(() => removePrefix('food', 'dr')).to.throw();
+ });
+});
+
+describe('#removeSuffix', function() {
+ it('removes a suffix', function() {
+ expect(removeSuffix('counterfactual', 'factual')).to.equal('counter');
+ expect(removeSuffix('counterfactual', '')).to.equal('counterfactual');
+ expect(removeSuffix('counterfactual', 'counterfactual')).to.equal('');
+ });
+
+ it("throws if the suffix to remove isn't present", function() {
+ // eslint-disable-next-line dot-notation
+ expect(() => removeSuffix('food', 'dr')).to.throw();
+ });
+});
+
+describe('#maximumOverlap', function() {
+ it('finds the maximum overlap between the end of one string and the start of the other', function() {
+ expect(maximumOverlap('qwertyuiop', 'uiopasdfgh')).to.equal('uiop');
+ expect(maximumOverlap('qwertyuiop', 'qwertyuiop')).to.equal('qwertyuiop');
+ expect(maximumOverlap('qwertyuiop', 'asdfghjkl')).to.equal('');
+ expect(maximumOverlap('qwertyuiop', '')).to.equal('');
+ expect(maximumOverlap('uiopasdfgh', 'qwertyuiop')).to.equal('');
+ expect(maximumOverlap('x ', ' x')).to.equal(' ');
+ expect(maximumOverlap('', '')).to.equal('');
+ });
+});
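
For readers who want to see what the `#maximumOverlap` tests above pin down, here is a minimal sketch of a function with the same observable behaviour. The name `maximumOverlapSketch` is hypothetical, and this is a naive quadratic version written for illustration. It is not the implementation in `lib/util/string`, which, judging by the prefix-table loop earlier in this diff, appears to use a linear-time approach.

```javascript
// Hypothetical stand-in for maximumOverlap; a naive sketch, not the library's code.
// Returns the longest string that is both a suffix of `a` and a prefix of `b`.
function maximumOverlapSketch(a, b) {
  // The overlap cannot be longer than the shorter of the two strings,
  // so start from that length and shrink until a match is found.
  for (let len = Math.min(a.length, b.length); len > 0; len--) {
    const candidate = b.slice(0, len);
    if (a.endsWith(candidate)) {
      return candidate;
    }
  }
  // No non-empty overlap exists.
  return '';
}

console.log(JSON.stringify(maximumOverlapSketch('qwertyuiop', 'uiopasdfgh')));
console.log(JSON.stringify(maximumOverlapSketch('x ', ' x')));
```

Note that the argument order matters: the function looks only at the end of the first string and the start of the second, which is why the tests expect `''` when the two arguments of the first example are swapped.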