-
-
Notifications
You must be signed in to change notification settings - Fork 710
Use MathML attributes for PDFs read in Adobe Acrobat #17984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See test results for failed build of commit d7c86a33bf |
It is a bug that results in incorrect speech and possibly incorrect braille if one of the attributes for the elements listed above is present. For example, if someone uses
then instead of hearing either "the complex numbers" or "double struck cap C", they will just hear "cap C". Similarly, the braille will be wrong as it will be missing a script indicator. Or as in the bug report, they won't hear the equation label as a label ("line 2 with label 2 ...") and I think they will hear the label as just data which is very confusing. My feeling this is a pretty serious bug and should be part of the beta. |
In which case, please change the base branch on GitHub and rebase your branch accordingly. |
I think I have this rebased to the beta and made all the changes suggested. This is at the edge of my github knowledge, so hopefully it is ok. |
See test results for failed build of commit e82e2d78b3 |
This has been retargeted to beta, but not rebased off beta. It includes commits from master which cannot be merged. You need to drop all the commits from master in the rebase |
@NSoiffer do you need help correctly rebasing this? |
Neil is traveling and mostly offline for a while, if you can handle this at your side I'm sure it would be appreciated. @seanbudd please see #17984 (comment) where any changes are pre-authorized. I have tested Neil's change and confirm it works but don't have an nvda build environment set up so can't easily help clear any remaining issues on this PR. |
Updated adobeAcrobat.py with suggestions as per the PR Updated changes.md as per the PR Fingers crossed I got this right...
… a static class method. Added a few more comments.
Co-authored-by: Sean Budd <[email protected]>
@SaschaCowley @seanbudd thanks for picking this up in Neil's absence. With 2025.1 we will be able to make mathematical PDF that may be read in acrobat and foxit with the mathematical content read to the same level as web pages, this PR is a last missing piece for acrobat. I'm so happy to see it merged. |
Blocked by #17984 Summary of the issue: In #17276, NVDA now treats the value of formular nodes in Adobe Acrobat as mathml, with out any real validation. In PDF 2.0 documents, this is no doubt an okay assumption, but for PDF 1.7 documents generated by Microsoft Word, this is now causing Microsoft word generated math speech alternative text to be processed by mathCAT, resulting in broken or junk navigation, as Microsoft Word is exposing its math speech text as the value of the node. However, at the same time Microsoft has also introduced a new custom mathml attribute it is exposing on formula nodes in PDFs generated from Microsoft Word, that contains real mathMl which is suitable for MathCAT. NVDA should make use of this new custom attribute if it exists. Description of user facing changes In Adobe Acrobat, NVDA can now read and interact with Math equations in PDF documents generated by Microsoft word. Description of development approach AcrobatNode NVDAObject's mathml property: first try and fetch Microsoft Office's custom mathMl custom attribute if it exists. Otherwise fallback to using the node's value or descendants.
…ce`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space. I also added the elementary math attributes used in MathPlayer. Neither MathCAT nor Access8Math currently support the elementary math notations, but it is on the list of things to implement for MathCAT. Note: potentially this could go into the beta, but I can't get the beta to build on my machine so I can't test the fix there.
…on `mspace`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space." This branch was based on the wrong branch This reverts commit 6a6a2a8.
Link to issue number: Closes #17984 (again) Summary of the issue: Acrobat's interface doesn't allow code to get all the attributes; they need to be queried individually. I missed an attribute that is relevant for some braille math codes. Description of user facing changes: Affects some math braille code output, such as Nemeth code. Description of developer facing changes: None. Description of development approach: Added some more cases for other math elements
Link to issue number:
This fixes #17980
Summary of the issue:
For PDF, the code was not grabbing MathML attributes needed for speech.
Description of user facing changes
The speech for math, and rarely the braille (e.g., bevelled fractions in Nemeth), was not always correct due to attributes not being picked up and the defaults being used.
Description of development approach
Unfortunately, the PDF interface does not allow grabbing all the attributes. Instead, one must ask for each attribute individually. Most attributes don't affect speech or braille, so it is not necessary to get them. I looked at what MathCAT used and added those. In particular, the following are picked up:
Note:
intent
andarg
are new to MathML 4 and are aimed at improving speech for math.Testing strategy:
There is a test file in the issue. It tests
intent
. I asked David Carlisle, a LaTeX developer, to generate some (fully tagged) PDF examples that use more thanintent
(which is what pdftex will generate on its own in some cases. He gave me a sample that displays poorly, but has some attrs hacked into it. There is a log statement (at debug level) that shows the MathML that is gathered up. The MathML picked was correct. The speech was also as expected given the MathML.I don't know how one would create a unit or system test for this. The issue contains a PDF example which can be tested (issue describes the expected result).
Known issues with pull request:
None.
Code Review Checklist:
@coderabbitai summary