Duplo Analyser
⚡️ Lightning fast duplicate code detection! Supports all text formats with special handling of comments for common languages.
Duplo Analyser is a GitHub Action for detecting duplicate code blocks in your repository. It scans source files and identifies similar code snippets based on configurable parameters. If any duplicate blocks are found, the action fails the build.
🔋 This action is powered by Duplo - the fastest (?) duplicate detector on GitHub.
Add the following step to your GitHub Actions workflow:
- name: Run Duplo Analyser
  uses: dlidstrom/duplo-analyser@v2
  with:
    directory: '.'
    include-pattern: '.*'
    minimum-lines: "10"
    minimum-line-length: "3"
    max-files: "100"
    ignore-preprocessor-directives: "true"
| 🔹 Input Name | 📝 Description | 🏷️ Default |
|---|---|---|
| `directory` | 📂 Top directory from which to search for files. Only used with `include-pattern`. | `.` |
| `include-pattern` | 🔍 Regular expression for including filenames (case-insensitive). Mutually exclusive with `file-list`. | `.*` |
| `exclude-pattern` | 🚫 Regular expression for excluding filenames (case-insensitive). Only used with `include-pattern`. | `.^` |
| `file-list` | 📝 File with filenames to analyse. Mutually exclusive with `include-pattern`. | `''` |
| `minimum-lines` | 📏 Minimum number of lines required for duplicate detection | `10` |
| `minimum-line-length` | ✂️ Minimum number of characters per line (shorter lines are ignored) | `3` |
| `max-files` | 📊 Maximum number of files to report (useful for large duplicate sets) | `100` |
| `ignore-preprocessor-directives` | 🛑 Removes preprocessor directives before duplicate detection | `true` |
| `version` | 📌 Version of Duplo to use | `v2.0.1` |
name: Detect Duplicate Code
on: [push, pull_request]
jobs:
  duplication-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Run Duplo Analyser
        uses: dlidstrom/duplo-analyser@v2
        with: # optionally override the defaults
          directory: '.'
          include-pattern: '.*'
          minimum-lines: "10"
          minimum-line-length: "3"
          max-files: "100"
          ignore-preprocessor-directives: "true"
Sample include patterns (partial match is sufficient):
- C/C++: `'\.(h|cpp)$'`
- JavaScript: `'\.js$'`
- or any other extension you need

The OR (`|`) operator only works inside groups `()`. Excluding files works in the same fashion.
The `grep` utility is used on all platforms, using posix-extended syntax.
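To make these pattern rules concrete, the sketch below filters a handful of made-up filenames with `grep -E -i`, which is one way to get case-insensitive, POSIX-extended matching. The exact `grep` invocation used inside the action is an assumption here, and both the filenames and the `filter_names` helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names read from stdin that match a pattern.
# -E selects POSIX extended syntax, -i makes the match case-insensitive
# (assumed to mirror what the action does; not taken from its source).
filter_names() {
    grep -E -i -e "$1"
}

# Made-up filenames, for illustration only.
names='src/main.cpp
src/main.h
src/UTIL.CPP
web/app.js
web/app.json
README.md'

# Partial match is enough: the pattern only has to match the end of the name.
# The | alternation sits inside a group, as required.
cpp_files=$(printf '%s\n' "$names" | filter_names '\.(h|cpp)$')

# The trailing $ keeps web/app.json out of the JavaScript set.
js_files=$(printf '%s\n' "$names" | filter_names '\.js$')

printf 'C/C++:\n%s\n' "$cpp_files"
printf 'JavaScript:\n%s\n' "$js_files"
```

Trying a pattern locally like this before committing it to the workflow is a cheap way to catch a missing `$` or an ungrouped `|`.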
name: Detect Duplicate Code
on: [push, pull_request]
jobs:
  duplication-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Run Duplo Analyser
        uses: dlidstrom/duplo-analyser@v2
        with:
          file-list: 'files.lst'
The file list option is useful when analysing only a specific subset of files, for example the files changed in a PR. This allows stricter rules for the files that are actively being worked on.
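One way to build such a list for a pull request is to ask git which files differ from the base branch. The sketch below is an assumption about how you might do that in a step before the action runs; it is not part of the action itself. The `write_changed_files` helper and the base ref are illustrative, the list is assumed to hold one filename per line, and the checkout needs enough history for the diff to resolve.

```shell
#!/bin/sh
# Hypothetical helper: write the files changed between the merge base of
# <base-ref> and HEAD into <list-file>, one filename per line.
# --diff-filter=d drops deleted files, which no longer exist on disk.
write_changed_files() {
    base_ref=$1
    list_file=$2
    git diff --name-only --diff-filter=d "$base_ref...HEAD" > "$list_file"
}
```

A call such as `write_changed_files origin/main files.lst` would then produce the file passed to `file-list`. You may want to filter the result further, since a PR can touch files that are not source code.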
The action prints duplicate code blocks to the workflow logs, allowing you to identify and refactor repeated code.
Sample:
Loading and hashing files ... 2 done.
tests/Quake2/g_chase.c(137)
tests/Quake2/g_chase.c(113)
int i;
edict_t *e;
if (!ent->client->chase_target)
return;
i = ent->client->chase_target - g_edicts;
do {
tests/Quake2/g_chase.c found: 1 block(s)
Configuration:
Number of files: 1
Minimal block size: 4
Minimal characters in line: 3
Ignore preprocessor directives: 0
Ignore same filenames: 0
Results:
Lines of code: 96
Duplicate lines of code: 6
Total 1 duplicate block(s) found.
- 🖥️ Platform-Specific Setup:
  - Sets the executable path based on the operating system.
- 📦 Caching:
  - Caches the Duplo binary to speed up future runs.
- 📥 Downloading Duplo (If Not Cached):
  - Fetches the specified version of Duplo and unzips it.
- 📂 File Analysis:
  - Uses `find` to locate files based on include/exclude patterns.
  - Runs Duplo on the matching files.
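The file analysis step can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the action's actual script: the `collect_files` helper is hypothetical, the combination of `find` with two `grep` passes is a guess at how the include and exclude patterns are applied, and the final Duplo invocation is left out because its command line is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory, keep the names
# that match the include pattern, then drop the names that match the exclude
# pattern. Both patterns are case-insensitive POSIX extended regexes.
collect_files() {
    dir=$1
    include=$2
    exclude=$3
    find "$dir" -type f \
        | grep -E -i -e "$include" \
        | grep -E -i -v -e "$exclude" \
        | sort
}
```

For example, `collect_files . '\.(h|cpp)$' '^\./vendor/'` would list C/C++ files while skipping anything under `./vendor/`. Note that the patterns are matched against the path as `find` prints it, so a leading `./` is part of the name.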
This action is open-source and available under the MIT License.