Skip to content

Conversation

@devin-ai-integration
Copy link
Contributor

fix(security): remediate 35 BLOCKER severity SonarQube issues

Summary

This PR fixes all 35 BLOCKER severity issues identified by SonarQube scan in the USPTO-Patent-Public-Data repository. The changes address critical security vulnerabilities, resource leaks, bugs, and code quality issues across 13 files.

Issue Categories Fixed:

  • 🔴 14 Security Vulnerabilities (8 GitHub Actions command injection + 6 XXE)
  • 🟠 11 Bugs (10 resource leaks + 1 infinite loop)
  • 🟡 10 Code Quality Issues (8 test assertions + 1 naming conflict + 1 incorrect operator)

Key Changes:

  1. GitHub Actions Security (.github/workflows/sonarcloud.yml): Sanitized all user-controlled variables to prevent command injection attacks
  2. XXE Protection (5 Java files): Added security features to all SAXReader instances to prevent XML External Entity attacks
  3. Resource Management (ExtractPatent.java, Look.java): Wrapped DumpFile objects in try-with-resources blocks to prevent file descriptor leaks
  4. Bug Fixes: Fixed infinite loop condition in JsonMapper.java and incorrect bitwise operator in TitleRule.java
  5. Test Improvements: Added proper assertions to 8 test methods across 3 test files
  6. Code Quality: Renamed conflicting field name in TransformerCli.java

Review & Testing Checklist for Human

⚠️ This PR contains significant refactoring and security changes that require careful review:

  • Test GitHub Actions workflow - Manually trigger a build to verify workflow changes don't break CI/CD (especially Devin auto-remediation functionality)
  • Verify resource leak fixes - The ExtractPatent.java and Look.java files were substantially refactored. Test CLI tools with various inputs:
    # Test ExtractPatent with APS format
    java -jar BulkDownloader/target/BulkDownloader.jar extract --aps --input sample.zip --output /tmp/output
    
    # Test Look command
    java -jar BulkDownloader/target/BulkDownloader.jar look --input sample.zip --limit 10
  • Validate XXE protections - Confirm XML parsing still works for legitimate use cases while blocking external entities. Test with sample patent XML files from various formats (Redbook, PAP, SGML).
  • Run SonarQube scan - Verify all 35 BLOCKER issues are actually resolved and no new issues were introduced:
    mvn clean verify sonar:sonar
  • Review test assertions - Check if the added assertions in test files are meaningful enough to catch regressions (they're currently quite basic - assertNotNull and simple contains() checks)

Recommended Test Plan:

  1. Trigger GitHub Actions workflow on this PR to verify workflow changes work
  2. Run CLI tools (ExtractPatent, Look) with real patent data files
  3. Run full test suite: mvn clean test
  4. Run SonarQube analysis and verify Quality Gate passes
  5. Check that existing functionality (patent parsing, transformation, extraction) still works correctly

Notes

  • Testing: Unit tests pass (only 4 pre-existing failures unrelated to these changes). No new test failures introduced.
  • XXE Fix Detail: Initially used disallow-doctype-decl feature but it was too restrictive and broke existing tests. Final implementation uses 3 security features: external-general-entities=false, external-parameter-entities=false, load-external-dtd=false.
  • Resource Leak Refactoring: The ExtractPatent.java and Look.java changes involved significant code restructuring to properly scope DumpFile objects within try-with-resources blocks. Logic flow changed but behavior should be equivalent.
  • Jira Documentation: Full details documented in MBA-750

Session Info:

- Fix 8 GitHub Actions command injection vulnerabilities (githubactions:S7630)
  - Sanitize user-controlled inputs using environment variables in workflow

- Fix 6 XML External Entity (XXE) vulnerabilities (java:S2755)
  - Disable external entity processing in all SAXReader instances
  - Files: Dom4JParser.java, CpcMasterReader.java, CpcXmlParser.java, MathML.java

- Fix 10 resource leak bugs (java:S2095)
  - Wrap DumpFile objects in try-with-resources blocks
  - Files: ExtractPatent.java (6 leaks), Look.java (4 leaks)

- Fix 1 infinite loop bug (java:S2189)
  - Correct loop condition in JsonMapper.java

- Fix 8 test assertion issues (java:S2699)
  - Add proper assertions to test methods
  - Files: FormattedTextCustomizeTest.java, JsonMapperTest.java, PatentParserTest.java

- Fix 1 naming conflict (java:S1845)
  - Rename field 'stdout' to 'stdoutEnabled' in TransformerCli.java

- Fix 1 incorrect operator (java:S2178)
  - Change bitwise OR to logical OR in TitleRule.java

All fixes verified with mvn clean test. No new test failures introduced.

Co-Authored-By: Jake Cosme <jake@cognition.ai>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants