Skip to content

Use MathML attributes for PDFs read in Adobe Acrobat #17984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 6, 2025

Conversation

NSoiffer
Copy link
Contributor

Link to issue number:

This fixes #17980

Summary of the issue:

For PDF, the code was not grabbing MathML attributes needed for speech.

Description of user facing changes

The speech for math, and rarely the braille (e.g., bevelled fractions in Nemeth), was not always correct due to attributes not being picked up and the defaults being used.

Description of development approach

Unfortunately, the PDF interface does not allow grabbing all the attributes. Instead, one must ask for each attribute individually. Most attributes don't affect speech or braille, so it is not necessary to get them. I looked at what MathCAT used and added those. In particular, the following are picked up:

		id = node.GetID()
		if id:
			yield f' id="{id}"'
		yield getMathMLAttributes(node, ["intent", "arg"])
		if tag == "mi" or tag == "mn" or tag == "mo" or tag == "mtext":
			yield getMathMLAttributes(node, ["mathvariant"])
		elif tag == "mfenced":
			yield getMathMLAttributes(node, ["open", "close", "separators"])
		elif tag == "menclose":
			yield getMathMLAttributes(node, ["notation"])
		elif tag == "annotation-xml" or tag == "annotation":
			yield getMathMLAttributes(node, ["encoding"])
		elif tag == "ms":
			yield getMathMLAttributes(node, ["open", "close"])

Note: intent and arg are new to MathML 4 and are aimed at improving speech for math.

Testing strategy:

There is a test file in the issue. It tests intent. I asked David Carlisle, a LaTeX developer, to generate some (fully tagged) PDF examples that use more than intent (which is what pdftex will generate on its own in some cases. He gave me a sample that displays poorly, but has some attrs hacked into it. There is a log statement (at debug level) that shows the MathML that is gathered up. The MathML picked was correct. The speech was also as expected given the MathML.

I don't know how one would create a unit or system test for this. The issue contains a PDF example which can be tested (issue describes the expected result).

Known issues with pull request:

None.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

@NSoiffer NSoiffer requested a review from a team as a code owner April 18, 2025 04:50
@NSoiffer NSoiffer requested a review from seanbudd April 18, 2025 04:50
@AppVeyorBot
Copy link

See test results for failed build of commit d7c86a33bf

@codeofdusk
Copy link
Contributor

codeofdusk commented Apr 20, 2025

@NSoiffer Should this target beta (i.e. is it a release-blocking bug in #17276)?

@NSoiffer
Copy link
Contributor Author

It is a bug that results in incorrect speech and possibly incorrect braille if one of the attributes for the elements listed above is present. For example, if someone uses

<mi mathvariant="double-struck">C</mi>

then instead of hearing either "the complex numbers" or "double struck cap C", they will just hear "cap C". Similarly, the braille will be wrong as it will be missing a script indicator. Or as in the bug report, they won't hear the equation label as a label ("line 2 with label 2 ...") and I think they will hear the label as just data which is very confusing.

My feeling this is a pretty serious bug and should be part of the beta.

@codeofdusk
Copy link
Contributor

In which case, please change the base branch on GitHub and rebase your branch accordingly.

@SaschaCowley SaschaCowley added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Apr 21, 2025
@seanbudd seanbudd changed the title Pdf math Use MathML attributes for Pdf's read by adobe reader Apr 22, 2025
@seanbudd seanbudd marked this pull request as draft April 22, 2025 22:57
@NSoiffer NSoiffer changed the base branch from master to beta April 22, 2025 23:07
@NSoiffer
Copy link
Contributor Author

I think I have this rebased to the beta and made all the changes suggested. This is at the edge of my github knowledge, so hopefully it is ok.

@AppVeyorBot
Copy link

See test results for failed build of commit e82e2d78b3

@seanbudd
Copy link
Member

This has been retargeted to beta, but not rebased off beta. It includes commits from master which cannot be merged. You need to drop all the commits from master in the rebase

@seanbudd seanbudd added this to the 2025.1 milestone Apr 24, 2025
@SaschaCowley
Copy link
Member

@NSoiffer do you need help correctly rebasing this?

@davidcarlisle
Copy link

davidcarlisle commented Apr 29, 2025

@NSoiffer do you need help correctly rebasing this?

Neil is traveling and mostly offline for a while, if you can handle this at your side I'm sure it would be appreciated.

@seanbudd please see #17984 (comment) where any changes are pre-authorized.

I have tested Neil's change and confirm it works but don't have an nvda build environment set up so can't easily help clear any remaining issues on this PR.

NSoiffer added 2 commits May 2, 2025 14:19
Updated adobeAcrobat.py with suggestions as per the PR
Updated changes.md as per the PR

Fingers crossed I got this right...
… a static class method.

Added a few more comments.
@SaschaCowley SaschaCowley changed the title Use MathML attributes for Pdf's read by adobe reader Use MathML attributes for PDFs read in Adobe Acrobat May 2, 2025
@SaschaCowley SaschaCowley marked this pull request as ready for review May 2, 2025 04:23
@SaschaCowley SaschaCowley requested a review from seanbudd May 2, 2025 04:23
@seanbudd seanbudd merged commit 182c0c7 into nvaccess:beta May 6, 2025
5 checks passed
@davidcarlisle
Copy link

@SaschaCowley @seanbudd thanks for picking this up in Neil's absence. With 2025.1 we will be able to make mathematical PDF that may be read in acrobat and foxit with the mathematical content read to the same level as web pages, this PR is a last missing piece for acrobat. I'm so happy to see it merged.

seanbudd pushed a commit that referenced this pull request May 6, 2025
Blocked by #17984

Summary of the issue:
In #17276, NVDA now treats the value of formular nodes in Adobe Acrobat as mathml, with out any real validation.
In PDF 2.0 documents, this is no doubt an okay assumption, but for PDF 1.7 documents generated by Microsoft Word, this is now causing Microsoft word generated math speech alternative text to be processed by mathCAT, resulting in broken or junk navigation, as Microsoft Word is exposing its math speech text as the value of the node.
However, at the same time Microsoft has also introduced a new custom mathml attribute it is exposing on formula nodes in PDFs generated from Microsoft Word, that contains real mathMl which is suitable for MathCAT.
NVDA should make use of this new custom attribute if it exists.

Description of user facing changes
In Adobe Acrobat, NVDA can now read and interact with Math equations in PDF documents generated by Microsoft word.

Description of development approach
AcrobatNode NVDAObject's mathml property: first try and fetch Microsoft Office's custom mathMl custom attribute if it exists. Otherwise fallback to using the node's value or descendants.
NSoiffer added a commit to NSoiffer/nvda that referenced this pull request Jul 19, 2025
…ce`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space.

I also added the elementary math attributes used in MathPlayer. Neither MathCAT nor Access8Math currently support the elementary math notations, but it is on the list of things to implement for MathCAT.

Note: potentially this could go into the beta, but I can't get the beta to build on my machine so I can't test the fix there.
NSoiffer added a commit to NSoiffer/nvda that referenced this pull request Jul 22, 2025
…on `mspace`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space."

This branch was based on the wrong branch

This reverts commit 6a6a2a8.
seanbudd pushed a commit that referenced this pull request Jul 24, 2025
Link to issue number:

Closes #17984 (again)
Summary of the issue:

Acrobat's interface doesn't allow code to get all the attributes; they need to be queried individually. I missed an attribute that is relevant for some braille math codes.
Description of user facing changes:

Affects some math braille code output, such as Nemeth code.
Description of developer facing changes:

None.
Description of development approach:

Added some more cases for other math elements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Math in PDF is not picking up attributes that affect speech
6 participants