Use MathML attributes for PDFs read in Adobe Acrobat #17984

NSoiffer · 2025-04-18T04:50:35Z

Link to issue number:

This fixes #17980

Summary of the issue:

For PDFs, the code was not retrieving the MathML attributes needed for correct speech.

Description of user facing changes

The speech for math, and rarely the braille (e.g., bevelled fractions in Nemeth), was not always correct due to attributes not being picked up and the defaults being used.

Description of development approach

Unfortunately, the PDF interface does not allow grabbing all the attributes. Instead, one must ask for each attribute individually. Most attributes don't affect speech or braille, so it is not necessary to get them. I looked at what MathCAT used and added those. In particular, the following are picked up:

		id = node.GetID()
		if id:
			yield f' id="{id}"'
		yield getMathMLAttributes(node, ["intent", "arg"])
		if tag == "mi" or tag == "mn" or tag == "mo" or tag == "mtext":
			yield getMathMLAttributes(node, ["mathvariant"])
		elif tag == "mfenced":
			yield getMathMLAttributes(node, ["open", "close", "separators"])
		elif tag == "menclose":
			yield getMathMLAttributes(node, ["notation"])
		elif tag == "annotation-xml" or tag == "annotation":
			yield getMathMLAttributes(node, ["encoding"])
		elif tag == "ms":
			yield getMathMLAttributes(node, ["open", "close"])

Note: intent and arg are new to MathML 4 and are aimed at improving speech for math.
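
As a rough illustration of the per-attribute querying described above, here is a minimal, self-contained sketch. It is not the code in this PR: `FakeNode`, `get_attribute`, `ATTRS_BY_TAG`, and `collect_attributes` are hypothetical names made up for the example, and the real Acrobat interface is not modelled (the `id` handling is also left out). It only shows the idea of asking for a short, tag-specific list of attributes one at a time and emitting those that are present.

```python
from xml.sax.saxutils import quoteattr


class FakeNode:
    """Hypothetical stand-in for a PDF structure node (not NVDA's real class)."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        # Hypothetical accessor: one attribute per call, None when absent.
        return self._attrs.get(name)


# Attributes queried for every element, and extra ones per tag,
# following the list in the description above.
COMMON_ATTRS = ["intent", "arg"]
ATTRS_BY_TAG = {
    "mi": ["mathvariant"],
    "mn": ["mathvariant"],
    "mo": ["mathvariant"],
    "mtext": ["mathvariant"],
    "mfenced": ["open", "close", "separators"],
    "menclose": ["notation"],
    "annotation-xml": ["encoding"],
    "annotation": ["encoding"],
    "ms": ["open", "close"],
}


def collect_attributes(tag, node):
    """Return the attributes found on node as text to splice into a start tag."""
    parts = []
    for name in COMMON_ATTRS + ATTRS_BY_TAG.get(tag, []):
        value = node.get_attribute(name)
        if value:
            # quoteattr adds the surrounding quotes and escapes &, <, >.
            parts.append(f" {name}={quoteattr(value)}")
    return "".join(parts)


node = FakeNode({"mathvariant": "double-struck", "open": "("})
print(f"<mi{collect_attributes('mi', node)}>C</mi>")
```

In this sketch the `open` attribute on the `mi` node is never queried, because it is not in the list for that tag; only attributes that can affect speech or braille for a given element are requested.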

Testing strategy:

There is a test file in the issue. It tests intent. I asked David Carlisle, a LaTeX developer, to generate some (fully tagged) PDF examples that use more than intent (which is what pdftex will generate on its own in some cases). He gave me a sample that displays poorly, but has some attrs hacked into it. There is a log statement (at debug level) that shows the MathML that is gathered up. The MathML picked up was correct. The speech was also as expected given the MathML.

I don't know how one would create a unit or system test for this. The issue contains a PDF example which can be tested (issue describes the expected result).

Known issues with pull request:

None.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

@coderabbitai summary

AppVeyorBot · 2025-04-18T05:31:00Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/84xae4qigf6drxwq/artifacts/output/nvda_snapshot_pr17984-36103,d7c86a33.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.2,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 18.0,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 18.5,
FINISH_END 0.1

See test results for failed build of commit d7c86a33bf

codeofdusk · 2025-04-20T09:21:14Z

@NSoiffer Should this target beta (i.e. is it a release-blocking bug in #17276)?

NSoiffer · 2025-04-21T05:13:13Z

It is a bug that results in incorrect speech and possibly incorrect braille if one of the attributes for the elements listed above is present. For example, if someone uses

<mi mathvariant="double-struck">C</mi>

then instead of hearing either "the complex numbers" or "double struck cap C", they will just hear "cap C". Similarly, the braille will be wrong as it will be missing a script indicator. Or as in the bug report, they won't hear the equation label as a label ("line 2 with label 2 ...") and I think they will hear the label as just data which is very confusing.

My feeling is that this is a pretty serious bug and should be part of the beta.

codeofdusk · 2025-04-21T05:35:06Z

In which case, please change the base branch on GitHub and rebase your branch accordingly.

user_docs/en/changes.md

source/NVDAObjects/IAccessible/adobeAcrobat.py

NSoiffer · 2025-04-22T23:42:43Z

I think I have this rebased to the beta and made all the changes suggested. This is at the edge of my GitHub knowledge, so hopefully it is OK.

AppVeyorBot · 2025-04-23T00:30:22Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/0tjadl64aup472s2/artifacts/output/nvda_snapshot_pr17984-36160,e82e2d78.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.3,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 19.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 19.2,
FINISH_END 0.2

See test results for failed build of commit e82e2d78b3

seanbudd · 2025-04-24T05:41:04Z

This has been retargeted to beta, but not rebased onto beta. It includes commits from master which cannot be merged. You need to drop all the commits from master in the rebase.

SaschaCowley · 2025-04-29T06:40:27Z

@NSoiffer do you need help correctly rebasing this?

davidcarlisle · 2025-04-29T18:26:41Z

> @NSoiffer do you need help correctly rebasing this?

Neil is traveling and mostly offline for a while. If you can handle this at your side, I'm sure it would be appreciated.

@seanbudd please see #17984 (comment) where any changes are pre-authorized.

I have tested Neil's change and confirm it works, but I don't have an NVDA build environment set up, so I can't easily help clear any remaining issues on this PR.

davidcarlisle · 2025-05-06T00:54:03Z

@SaschaCowley @seanbudd thanks for picking this up in Neil's absence. With 2025.1 we will be able to make mathematical PDFs that can be read in Acrobat and Foxit, with the mathematical content read to the same level as on web pages. This PR is the last missing piece for Acrobat. I'm so happy to see it merged.

Blocked by #17984

Summary of the issue:

In #17276, NVDA began treating the value of formula nodes in Adobe Acrobat as MathML, without any real validation. For PDF 2.0 documents this is no doubt an okay assumption, but for PDF 1.7 documents generated by Microsoft Word, it causes Word's math speech alternative text to be processed by MathCAT, resulting in broken or junk navigation, because Word exposes its math speech text as the value of the node. At the same time, Microsoft has introduced a new custom MathML attribute, exposed on formula nodes in PDFs generated from Microsoft Word, that contains real MathML suitable for MathCAT. NVDA should make use of this new custom attribute if it exists.

Description of user facing changes

In Adobe Acrobat, NVDA can now read and interact with math equations in PDF documents generated by Microsoft Word.

Description of development approach

AcrobatNode NVDAObject's mathml property: first try to fetch Microsoft Office's custom MathML attribute if it exists. Otherwise fall back to using the node's value or descendants.
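
The fallback order in that development approach could be sketched roughly as follows. This is only an illustration under stated assumptions, not NVDA's implementation: `pick_mathml` and its parameters are hypothetical names, and the `<math` prefix check on the node's value is an assumption added here to show one way of avoiding plain speech text being treated as MathML.

```python
from typing import Callable, Optional


def pick_mathml(
    custom_attr: Optional[str],
    node_value: Optional[str],
    build_from_descendants: Callable[[], str],
) -> str:
    """Choose a MathML source in priority order (hypothetical helper)."""
    # 1. Prefer the custom attribute when the producer supplied one.
    if custom_attr:
        return custom_attr
    # 2. Otherwise use the node's value, but only if it looks like MathML.
    #    (Assumption for this sketch: a simple prefix check.)
    if node_value and node_value.lstrip().startswith("<math"):
        return node_value
    # 3. Finally, assemble MathML from the node's descendants.
    return build_from_descendants()


# A Word-style node: speech text as the value, real MathML in the attribute.
print(pick_mathml("<math><mi>x</mi></math>", "x squared", lambda: ""))
```

Passing the descendant walk as a callable keeps it lazy, so the more expensive traversal only runs when neither of the cheaper sources is usable.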

…ce`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space. I also added the elementary math attributes used in MathPlayer. Neither MathCAT nor Access8Math currently support the elementary math notations, but it is on the list of things to implement for MathCAT. Note: potentially this could go into the beta, but I can't get the beta to build on my machine so I can't test the fix there.

…on `mspace`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space." This branch was based on the wrong branch. This reverts commit 6a6a2a8.

Link to issue number:

Closes #17984 (again)

Summary of the issue:

Acrobat's interface doesn't allow code to get all the attributes; they need to be queried individually. I missed an attribute that is relevant for some braille math codes.

Description of user facing changes:

Affects some math braille code output, such as Nemeth code.

Description of developer facing changes:

None.

Description of development approach:

Added some more cases for other math elements.

NSoiffer requested a review from a team as a code owner April 18, 2025 04:50

NSoiffer requested a review from seanbudd April 18, 2025 04:50

SaschaCowley added the conceptApproved label (similar to 'triaged' for issues: PR accepted in theory, implementation needs review) Apr 21, 2025

seanbudd changed the title ~~Pdf math~~ Use MathML attributes for Pdf's read by adobe reader Apr 22, 2025

seanbudd reviewed Apr 22, 2025

seanbudd marked this pull request as draft April 22, 2025 22:57

NSoiffer changed the base branch from master to beta April 22, 2025 23:07

NSoiffer force-pushed the pdf-math branch from 6d39a56 to 9fdfdf2 Compare April 22, 2025 23:40

seanbudd added this to the 2025.1 milestone Apr 24, 2025

NSoiffer added 2 commits May 2, 2025 14:19

Rebased file on beta branch.

0c440fe

Updated adobeAcrobat.py with suggestions as per the PR. Updated changes.md as per the PR. Fingers crossed I got this right...

Moved getMathMLAttributes out from being a nested function to being a static class method. Added a few more comments.

85a7335

SaschaCowley force-pushed the pdf-math branch from d7031be to 85a7335 Compare May 2, 2025 04:20

Update source/NVDAObjects/IAccessible/adobeAcrobat.py

3a75019

Co-authored-by: Sean Budd <[email protected]>

SaschaCowley changed the title ~~Use MathML attributes for Pdf's read by adobe reader~~ Use MathML attributes for PDFs read in Adobe Acrobat May 2, 2025

SaschaCowley marked this pull request as ready for review May 2, 2025 04:23

SaschaCowley requested a review from seanbudd May 2, 2025 04:23

seanbudd reviewed May 2, 2025

Apply suggestions from code review

1577b3e

seanbudd mentioned this pull request May 5, 2025

AdobeAcrobat: support custom Microsoft Office mathml attribute. #18056

Merged

5 tasks

seanbudd approved these changes May 6, 2025

seanbudd merged commit 182c0c7 into nvaccess:beta May 6, 2025
5 checks passed

NSoiffer mentioned this pull request Jul 19, 2025

additional fix to #17984 (MathML attrs in PDF) #18508

Merged

5 tasks
