parse_report_email incorrectly attempts to parse HTML bodies as DMARC reports (missing support for text/html) #626

@sis6326r

Description

parse_report_email() does not handle text/html MIME parts.
As a result, Microsoft’s DMARC “Preview” emails — which contain an HTML body and no actual DMARC XML in the main message — cause parsedmarc to try parsing the HTML as an aggregate report XML. This results in:

InvalidAggregateReport: Missing field: 'feedback'
ParserError: Message ... is not a valid DMARC report

The error is thrown before actual DMARC attachments (.xml.gz) are processed, so the whole email is incorrectly classified as invalid and moved to “Archive/Invalid”.

Environment

  • Python: 3.x
  • OS: Windows / Linux (reproduced in Docker python:3.13-slim)
  • parsedmarc version: 18.19.0
  • Mail sender: Microsoft enterprise.protection.outlook.com
  • Mail subject examples:
[Preview] Report Domain: example.com Submitter: enterprise.protection.outlook.com Report-ID: ...

Steps to Reproduce

1. Receive a Microsoft DMARC “Preview” email.

These emails contain:

  • an HTML body with a human-readable summary
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<div style="font-family:Segoe UI; font-size:14px;">This is a DMARC aggregate report from Microsoft Corporation. For Emails received between 2025-11-26 00:00:00 UTC to 2025-11-27 00:00:00 UTC.</div><br><br>
You're receiving this email because you have included your email address in the 'rua' tag of your DMARC record in DNS for netige.pl. Please remove your email address from the 'rua' tag if you don't want to receive this email.<br><br>
<div style="font-family:Segoe UI; font-size:12px; color:#666666;">Please do not respond to this e-mail. This mailbox is not monitored and you will not receive a response. For any feedback/suggestions, kindly mail to [email protected].<br><br>Microsoft respects your privacy. Review our Online Services 
<a href="https://privacy.microsoft.com/en-us/privacystatement">Privacy Statement</a>.<br>
One Microsoft Way, Redmond, WA, USA 98052.
</div>
  • optional attachments with the real DMARC .xml.gz report

2. Run:

parsedmarc.parse_report_email(eml_content)

3. Observe that payload for one MIME part is HTML, e.g.:

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<div>This is a DMARC aggregate report from Microsoft Corporation...</div>

4. parse_report_email eventually reaches the else: block and tries to parse this HTML as XML, which raises InvalidAggregateReport: Missing field: 'feedback'.
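
For anyone without a real sample at hand, the steps above can be approximated with a synthetic message. The following is only a sketch using the Python standard library: the addresses, the shortened subject, the placeholder XML and the helper names build_preview_email and list_part_types are all assumptions for illustration, not taken from a real Microsoft email or from parsedmarc. It builds an HTML body plus a .xml.gz attachment and lists the MIME parts a parser would encounter.

```python
import gzip
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder report body (assumption); real aggregate reports contain
# full metadata, policy and record elements.
SAMPLE_XML = b"<?xml version='1.0'?><feedback><report_metadata/></feedback>"


def build_preview_email() -> bytes:
    """Build a synthetic message shaped like the preview emails described above."""
    msg = EmailMessage()
    msg["Subject"] = "[Preview] Report Domain: example.com"
    msg["From"] = "sender@example.com"  # hypothetical address
    msg["To"] = "rua@example.com"       # hypothetical address
    # HTML-only body, as in the preview messages
    msg.set_content("<div>This is a DMARC aggregate report ...</div>", subtype="html")
    # The actual report travels as a gzip-compressed attachment
    msg.add_attachment(
        gzip.compress(SAMPLE_XML),
        maintype="application",
        subtype="gzip",
        filename="report.xml.gz",
    )
    return msg.as_bytes()


def list_part_types(raw: bytes) -> list[str]:
    """Return the content type of every non-container MIME part, in order."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [p.get_content_type() for p in msg.walk() if not p.is_multipart()]


print(list_part_types(build_preview_email()))
# ['text/html', 'application/gzip']
```

The text/html part comes first, which matches the observation that the HTML body is reached before the attachment. The bytes returned by build_preview_email() could then be passed to parsedmarc.parse_report_email to check whether the same error appears.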

Expected Behavior

  • parse_report_email should ignore text/html parts of multipart emails.
  • It should keep scanning MIME parts until it finds the real DMARC XML or .xml.gz file.
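
The expected scanning behavior can be sketched outside parsedmarc with the standard library alone. This is not parsedmarc's implementation: extract_report_xml is a hypothetical helper, and the gzip magic-byte check and the "starts with <" test are simplifying assumptions made for the example.

```python
import gzip
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_report_xml(raw: bytes) -> list[str]:
    """Return XML candidates from a message, skipping text/html parts.

    Hypothetical helper for illustration; not part of parsedmarc.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container part, no payload of its own
        if part.get_content_type() == "text/html":
            continue  # human-readable summary, keep scanning
        payload = part.get_payload(decode=True) or b""
        if payload.startswith(b"\x1f\x8b"):  # gzip magic bytes
            payload = gzip.decompress(payload)
        if payload.lstrip().startswith(b"<"):  # crude XML check (assumption)
            found.append(payload.decode("utf-8", errors="replace"))
    return found


# Synthetic message: HTML body plus a gzip-compressed placeholder report
demo = EmailMessage()
demo.set_content("<div>summary</div>", subtype="html")
demo.add_attachment(
    gzip.compress(b"<feedback></feedback>"),
    maintype="application",
    subtype="gzip",
    filename="report.xml.gz",
)
print(extract_report_xml(demo.as_bytes()))
# ['<feedback></feedback>']
```

Because the HTML part is skipped rather than treated as a failure, the loop reaches the attachment, and a message with only an HTML body yields an empty list instead of an XML parsing error.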

Actual Behavior

  • The HTML preview body is treated as potential DMARC XML.
  • parse_aggregate_report_xml raises:
InvalidAggregateReport: Missing field: 'feedback'
  • The whole email is incorrectly rejected as invalid.

Root Cause

parse_report_email() is missing a condition for:

elif content_type == "text/html":
    pass

Therefore HTML is sent into the fallback XML/JSON parsing path.

Proposed Fix

Add explicit handling of text/html:

elif content_type == "text/html":
    # HTML bodies (e.g. Microsoft preview messages) are not DMARC reports
    logger.debug("Skipping HTML body in DMARC email preview")
    continue

Why This Fix Is Necessary

Microsoft widely sends DMARC preview emails that contain:

  • HTML summary
  • a gzip-compressed DMARC XML report (.xml.gz)

Without ignoring HTML, parsedmarc rejects valid DMARC report emails.
