Problem
When using Azure Document Intelligence via markitdown, DocumentAnalysisFeature.FORMULAS is always enabled, even when formula recognition is not required.
This behavior leads to degraded recognition accuracy, especially for documents that do not contain mathematical formulas.
The relevant code is here:
https://github.com/microsoft/markitdown/blob/main/packages/markitdown/src/markitdown/converters/_doc_intel_converter.py#L232
# _doc_intel_converter.py
features=[
    DocumentAnalysisFeature.FORMULAS,
    DocumentAnalysisFeature.TABLES,
]
Currently, FORMULAS is unconditionally included in the features list, making it impossible to disable.
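One possible shape for making this configurable is sketched below. This is only an illustration, not the converter's actual code: `build_features` and its `enable_formulas` parameter are hypothetical names, and the enum is a local stand-in that mirrors the two members from the snippet above so the example runs without the Azure SDK installed.

```python
from enum import Enum


class DocumentAnalysisFeature(str, Enum):
    # Stand-in for the Azure SDK enum (assumption); only the two members
    # named in the snippet above are mirrored here.
    FORMULAS = "formulas"
    TABLES = "tables"


def build_features(enable_formulas: bool = True) -> list:
    # Hypothetical helper: keeps today's default (FORMULAS on),
    # but lets the caller opt out.
    features = [DocumentAnalysisFeature.TABLES]
    if enable_formulas:
        features.insert(0, DocumentAnalysisFeature.FORMULAS)
    return features


# Default matches the current behavior; passing False drops FORMULAS.
default_features = build_features()
no_formula_features = build_features(enable_formulas=False)
```

Defaulting the flag to `True` would keep existing callers unaffected, while callers that process formula-free documents could pass `False`.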
Steps to Reproduce
- Configure Azure Document Intelligence and enable it in markitdown
- Analyze a document that does not contain mathematical formulas
- Observe the extracted text / structure quality
- Compare results with and without the FORMULAS feature enabled (disabling it currently requires editing the features list locally)
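For the comparison step, a rough way to quantify the difference between the two runs is sketched below using only the standard library. The helper names (`similarity`, `show_diff`) are hypothetical, and the two input strings are assumed to be the Markdown output of each run, saved by the reader; nothing here calls Azure or markitdown.

```python
import difflib


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two outputs are identical.
    return difflib.SequenceMatcher(None, a, b).ratio()


def show_diff(with_formulas: str, without_formulas: str) -> str:
    # Line-level unified diff between the two outputs;
    # returns an empty string when they match.
    return "\n".join(
        difflib.unified_diff(
            with_formulas.splitlines(),
            without_formulas.splitlines(),
            fromfile="formulas_on",
            tofile="formulas_off",
            lineterm="",
        )
    )
```

A low similarity score or a long diff on a formula-free document would point at the lines worth inspecting by hand.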
Expected Behavior