User Story
As a maintainer,
I want robust test coverage for code blocks containing escaped backticks
so that the Markdown parser reliably handles edge cases without false positives.
Background
The current regex pattern in mdextractor/__init__.py uses r"```(?:\w+\s+)?(.*?)```" with re.DOTALL, which may prematurely close code blocks containing legitimate backticks (e.g., echo "```"). This creates maintenance risks:
- The test_nested_code_blocks unit test demonstrates incorrect parsing of inner backticks
- Real-world code snippets with escaped backticks could be truncated
- Language specifier detection might interfere with content extraction
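The two regex-related risks above can be reproduced in a few lines. This is a minimal sketch, not code from the repository: it assembles the same pattern from a `FENCE` constant (so the example itself contains no literal fence), and the names `FENCE`, `PATTERN`, `nested`, and `inline` are illustrative assumptions.

```python
import re

# Three backticks, built programmatically (illustrative helper, not part of mdextractor).
FENCE = "`" * 3

# Equivalent to the pattern quoted in the Background section.
PATTERN = re.compile(FENCE + r"(?:\w+\s+)?(.*?)" + FENCE, re.DOTALL)

# A bash block whose body legitimately contains three backticks.
nested = f'{FENCE}bash\necho "{FENCE}"\n{FENCE}'
truncated = PATTERN.findall(nested)

# A one-line block with no language tag: the optional language group
# consumes the first word and the whitespace after it.
inline = f"{FENCE}x = 1{FENCE}"
stripped = PATTERN.findall(inline)

print(truncated)  # ['echo "']  (the block is cut at the inner backticks)
print(stripped)   # ['= 1']     (the leading "x " is treated as a language tag)
```

The lazy `(.*?)` stops at the first run of three backticks it meets, so the match ends inside the string literal and the rest of the block is lost.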
Acceptance Criteria
- Tests in tests/test_mdextractor.py verifying handling of \`, ``, and ``` sequences
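One possible shape for these checks is a small table of cases, sketched below under stated assumptions. `extract_blocks` is a hypothetical stand-in that applies the pattern from the Background section, not the package's real entry point, and `CASES` and `failing_cases` are illustrative names. The expected value for the triple-backtick case assumes the desired behavior is to return the whole line, which the issue does not spell out.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically

# Stand-in for the extractor under test (assumption: same pattern as in Background).
PATTERN = re.compile(FENCE + r"(?:\w+\s+)?(.*?)" + FENCE, re.DOTALL)


def extract_blocks(text):
    """Return the bodies of all fenced blocks found in text."""
    return PATTERN.findall(text)


# (name, markdown source, expected extracted blocks)
CASES = [
    ("escaped single backtick",
     FENCE + "bash\necho \\`date\\`\n" + FENCE,
     ["echo \\`date\\`\n"]),
    ("double backticks",
     FENCE + "md\nuse ``code`` here\n" + FENCE,
     ["use ``code`` here\n"]),
    ("triple backticks",
     FENCE + 'bash\necho "' + FENCE + '"\n' + FENCE,
     ['echo "' + FENCE + '"\n']),  # assumed desired result
]


def failing_cases(extract):
    """Names of the cases whose extracted blocks differ from the expectation."""
    return [name for name, source, expected in CASES if extract(source) != expected]


print(failing_cases(extract_blocks))  # ['triple backticks']
```

With the current pattern only the triple-backtick case fails. The same table could be passed to `pytest.mark.parametrize` in tests/test_mdextractor.py once the intended behavior for that case is agreed.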