Html module producing unusable PDFs #3343

Closed

DanLopess opened this issue Dec 17, 2021 · 1 comment

Comments

DanLopess commented Dec 17, 2021

I have read and understood the contribution guidelines

I'm trying to generate a PDF from a React component. In my scenario it is a table, but it could be anything. The problem I'm facing is not with generating the PDF, but with the produced PDF itself, which is too heavy to be usable.

I have a function like this:

import { jsPDF } from "jspdf";

const createPdf = async (html: HTMLElement, pdfName: string = "report.pdf") => {
  // The fourth constructor argument enables stream compression.
  const doc = new jsPDF("p", "pt", "a4", true);

  await doc.html(html, {
    margin: 10,
    html2canvas: {
      scale: 0.65,
    },
  });

  doc.save(pdfName);
  return doc.output("blob");
};
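For context, a call site might look like the sketch below; the element id "report-root" is an assumption for illustration, not something from my actual app.

// Illustrative usage: grab the rendered component's root node and hand
// it to createPdf. The id "report-root" is hypothetical.
const element = document.getElementById("report-root");
if (element) {
  const blob = await createPdf(element, "report.pdf");
  window.open(URL.createObjectURL(blob)); // e.g. preview the result
}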

I pass in an HTMLElement, which is the DOM content of my React component, and it generates the PDF correctly. However, the resulting PDF is very large and extremely slow to load. Keep in mind that I have an i9 with 32 GB of RAM, and it easily takes 15 seconds to render one page of the PDF in the browser.

Initially it was generating a 150 MB PDF, but after I set compression to true it is now around 600 KB. That changed the file size, but somehow it didn't improve the performance. I've tried multiple computers and browsers, and I've tinkered with the html2canvas options, but nothing seems to fix this.
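For reference, this is how I enable compression; the positional form in the function above and the options-object form below should be equivalent as far as I can tell:

import { jsPDF } from "jspdf";

// Options-object form; compress deflates the content streams, which
// shrinks the file but does not reduce the number of drawing operators.
const doc = new jsPDF({
  orientation: "portrait",
  unit: "pt",
  format: "a4",
  compress: true,
});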

I did notice, however, that PDF readers like Adobe Acrobat seem to handle it fine.

Here is the PDF (DocDroid seems to be able to read it fine, but the browser's PDF viewer can't handle it).

After some analysis of the produced document, I noticed a couple of things:

  • Once decompressed, the overly high-precision content becomes a hog at 1,175,023 bytes of plain text. One duplicated box entry looks like this: 10. 1653.7799999999999727 m 569. 1653.7799999999999727 l 569. 588.6534374999998818 l 10. 588.6534374999998818 l 10. 1653.7799999999999727 l 569. 1653.7799999999999727 l
  • Every letter carries masses of detail. Here is the first letter "I" on my first page: BT /F1 9.1 Tf 10.4649999999999981 TL 0. 0. 0. rg 20.3999999999941792 794.1899999999999409 Td (I) Tj ET 1. w 0. 0. 0. rg 1. G 0. w 0 j 0. 0. 0. rg 10. 824.0900000000000318 m 565.0999999999985448 824.0900000000000318 l 565.0999999999985448 -225.4365625000001501 l 10. -225.4365625000001501 l 10. 824.0900000000000318 l 565.0999999999985448 824.0900000000000318 l W n 0. w. If this were being generated correctly, the text would be described as plain text (ASCII or UTF), but it's not.
  • Something is seriously wrong with the PDF generation engine if it isn't combining letters into words; but even if lines were proper lines of text, that .999999999 coordinate structure is massive bloat that requires extra recalculation time (see the sketch after this list).
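To illustrate how much of that bloat is just floating-point noise, here is a minimal sketch of rounding coordinates before they are written out; the round2 helper is hypothetical and not part of jsPDF:

// Hypothetical helper, not jsPDF API: round a coordinate to two decimal
// places and strip trailing zeros before emitting it into the stream.
const round2 = (n: number): string => n.toFixed(2).replace(/\.?0+$/, "");

console.log(round2(1653.7799999999999727)); // "1653.78" (7 chars vs 21)
console.log(round2(588.6534374999998818));  // "588.65"
console.log(round2(10));                    // "10"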

If someone has any insight on this, it would be extremely helpful.

@HackbrettXXX (Collaborator)

We are aware of this issue, see #3137. Closing this as a duplicate.
