
dlidstrom/duplo-analyser



🚀 Duplo Analyser - GitHub Action

⚡️ Lightning fast duplicate code detection! Supports all text formats with special handling of comments for common languages.


📝 Overview

Duplo Analyser is a GitHub Action for detecting duplicate code blocks in your repository. It scans source files and identifies similar code snippets based on configurable parameters. If any duplicate blocks are found, the action fails the build.

🔋 This action is powered by Duplo - the fastest (?) duplicate detector on GitHub.

🚀 Usage

Add the following step to your GitHub Actions workflow:

- name: Run Duplo Analyser
  uses: dlidstrom/duplo-analyser@v2
  with:
    directory: '.'
    include-pattern: '.*'
    minimum-lines: "10"
    minimum-line-length: "3"
    max-files: "100"
    ignore-preprocessor-directives: "true"

🔧 Inputs

| 🔹 Input Name | 📝 Description | 🏷️ Default |
|---|---|---|
| directory | 📂 Top directory from which to search for files. Only used with include-pattern. | `.` |
| include-pattern | 🔍 Regular expression for including filenames (case-insensitive). Mutually exclusive with file-list. | `.*` |
| exclude-pattern | 🚫 Regular expression for excluding filenames (case-insensitive). Only used with include-pattern. | `.^` |
| file-list | 📝 File with filenames to analyse. Mutually exclusive with include-pattern. | `''` |
| minimum-lines | 📏 Minimum number of lines required for duplicate detection. | `10` |
| minimum-line-length | ✂️ Minimum number of characters per line (shorter lines are ignored). | `3` |
| max-files | 📊 Maximum number of files to report (useful for large duplicate sets). | `100` |
| ignore-preprocessor-directives | 🛑 Remove preprocessor directives before duplicate detection. | `true` |
| version | 📌 Version of Duplo to use. | `v2.0.1` |

🔄 Example Workflow

🔍 Using regular expressions

name: Detect Duplicate Code
on: [push, pull_request]

jobs:
  duplication-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Run Duplo Analyser
        uses: dlidstrom/duplo-analyser@v2
        with: # optionally override the defaults
          directory: '.'
          include-pattern: '.*'
          minimum-lines: "10"
          minimum-line-length: "3"
          max-files: "100"
          ignore-preprocessor-directives: "true"

Sample include patterns (partial match is sufficient):

  • C/C++: '\.(h|cpp)$'
  • JavaScript: '\.js$' (or any other extension you need)

The OR operator (|) only works inside a group, as in (h|cpp). Exclude patterns follow the same rules.

Patterns are matched with grep on all platforms, using POSIX extended regular expression syntax.
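
To check what an include/exclude pair selects before committing a workflow, you can run a similar filter locally. This is a sketch under assumptions: the README only says grep is used with POSIX extended syntax and that matching is case-insensitive, so the `grep -E -i` flags and the helper name `filter_paths` are illustrative, not the action's actual command. It also shows why the default exclude pattern `.^` excludes nothing: `^` only matches at the start of a line, so it can never follow another character.

```bash
#!/usr/bin/env bash
# Sketch: preview which paths an include/exclude pattern pair selects.
# Assumption: the action filters with case-insensitive POSIX ERE grep;
# its exact command line is not shown in this README.

# filter_paths INCLUDE [EXCLUDE]
# Reads one path per line on stdin and prints the paths that match INCLUDE
# and do not match EXCLUDE. EXCLUDE defaults to '.^', which never matches.
filter_paths() {
  local include="$1"
  local exclude="${2:-.^}"
  # A partial match is enough: '\.cpp$' matches any path ending in .cpp.
  # '|| true' keeps an empty result from being reported as a failure.
  grep -E -i -e "$include" | grep -E -i -v -e "$exclude" || true
}

# Example: C/C++ sources, skipping anything under a vendored directory.
# Prints src/main.cpp and src/util.H (matching is case-insensitive).
printf '%s\n' src/main.cpp src/util.H README.md third_party/lib.cpp |
  filter_paths '\.(h|cpp)$' 'third_party/'
```

The example uses an unanchored exclude pattern on purpose: since a partial match is sufficient, 'third_party/' works regardless of whether the paths the action sees start with `./`.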

📝 Using file list

name: Detect Duplicate Code
on: [push, pull_request]

jobs:
  duplication-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Run Duplo Analyser
        uses: dlidstrom/duplo-analyser@v2
        with:
          file-list: 'files.lst'

The file list option is useful when you only want to analyse a specific subset of files, for example the files changed in a pull request. This lets you apply stricter rules to the files that are actively being worked on.
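
One way to produce `files.lst` for a pull request is a `run:` step, placed before the Duplo Analyser step, that lists changed files with `git diff`. A minimal sketch under stated assumptions: the README does not specify the list format, so one path per line is assumed; the base branch must be present in the clone (for example by setting `fetch-depth: 0` on `actions/checkout`); and `changed_files` is a hypothetical helper name. In `pull_request` workflows GitHub sets `GITHUB_BASE_REF` to the target branch name.

```bash
#!/usr/bin/env bash
# Sketch: write the files changed since BASE (and matching PATTERN) to files.lst.
# Assumptions: one path per line is an accepted file-list format, and BASE is
# reachable in the local clone (e.g. checkout with fetch-depth: 0).

# changed_files BASE PATTERN
# Prints files added, copied, modified or renamed between the merge base of
# BASE and HEAD, keeping only paths that match PATTERN (case-insensitive ERE).
changed_files() {
  local base="$1"
  local pattern="$2"
  # Deleted files are left out by --diff-filter, since they cannot be analysed.
  git diff --name-only --diff-filter=ACMR "$base"...HEAD |
    grep -E -i -e "$pattern" || true
}

# Only runs in a pull_request workflow, where GITHUB_BASE_REF is set.
if [[ -n "${GITHUB_BASE_REF:-}" ]]; then
  changed_files "origin/$GITHUB_BASE_REF" '\.(h|cpp)$' > files.lst
fi
```

The README does not say how the action behaves when the list is empty, so you may want to skip the Duplo Analyser step when no matching files changed.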

📤 Output

The action prints duplicate code blocks to the workflow logs, allowing you to identify and refactor repeated code.

Sample:

Loading and hashing files ... 2 done.

tests/Quake2/g_chase.c(137)
tests/Quake2/g_chase.c(113)
	int i;
	edict_t *e;
	if (!ent->client->chase_target)
		return;
	i = ent->client->chase_target - g_edicts;
	do {

tests/Quake2/g_chase.c found: 1 block(s)
Configuration:
  Number of files: 1
  Minimal block size: 4
  Minimal characters in line: 3
  Ignore preprocessor directives: 0
  Ignore same filenames: 0

Results:
  Lines of code: 96
  Duplicate lines of code: 6
  Total 1 duplicate block(s) found.

🛠️ How It Works

  1. 🖥️ Platform-Specific Setup:
    • Sets the executable path based on the operating system.
  2. 📦 Caching:
    • Caches the Duplo binary to speed up future runs.
  3. 📥 Downloading Duplo (If Not Cached):
    • Fetches the specified version of Duplo and unzips it.
  4. 📂 File Analysis:
    • Uses find to locate files based on include/exclude patterns.
    • Runs Duplo on the matching files.
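
Steps 1 and 4 can be approximated in a few lines of bash. This is a hypothetical sketch, not the action's script: `RUNNER_OS` is the variable GitHub sets to `Linux`, `Windows` or `macOS` on its runners, while the executable names and the helper names `duplo_executable` and `find_source_files` are assumptions. The filtering follows the include/exclude semantics described in the inputs table (case-insensitive, POSIX extended).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "How It Works" steps 1 and 4; the action's own
# script may differ. Executable names below are assumptions.

# duplo_executable: choose an executable name from the runner's OS (step 1).
duplo_executable() {
  case "${RUNNER_OS:-Linux}" in
    Windows) echo "duplo.exe" ;;
    *)       echo "duplo" ;;
  esac
}

# find_source_files DIR INCLUDE [EXCLUDE]  (step 4, file discovery)
# Lists regular files under DIR, keeps paths matching INCLUDE and drops
# paths matching EXCLUDE (default '.^' matches nothing).
find_source_files() {
  local dir="$1" include="$2" exclude="${3:-.^}"
  find "$dir" -type f |
    grep -E -i -e "$include" |
    grep -E -i -v -e "$exclude" || true
}
```

For example, `find_source_files . '\.(h|cpp)$' 'third_party/' > files.lst` would collect the candidate files; how the action then passes them to Duplo together with the other inputs is not covered here.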

📜 License

This action is open-source and available under the MIT License.
