
Command-line pdftotext works fine, but pdf-text-extract fails on the same PDF file #30

@markbobick

OS is Fedora 20.
I'm testing with a file of "good provenance", https://www.edge.org/documents/life/Life.pdf (downloaded with wget without any issues), using this code:
```javascript
// ...
var pdfToTextCommand = '/usr/bin/pdftotext'; // same binary that works from the shell
var extract = require('pdf-text-extract');
// ...
extract(filePath, { splitPages: false, eol: "unix" }, pdfToTextCommand, function (err, text) {
  if (err) {
    var message = { message: err + " could not convert requested PDF file to text" };
    console.log(JSON.stringify(message));
    res.json(message);
    return; // stop here so we don't also log undefined text after an error
  }
  console.log(text);
});
```

Result:

```
Error: pdf-text-extract command failed: Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
```
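The first line of that error is pdftotext's own output, so the binary does get invoked; the warning suggests the bytes it reads don't look like a PDF. A minimal sketch for checking that from Node, assuming (as I understand poppler's header check) that pdftotext looks for the `%PDF-` marker within roughly the first 1024 bytes. `looksLikePdf` is a hypothetical helper, not part of pdf-text-extract:

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper: returns true if the first 1024 bytes of the file
// contain the "%PDF-" header marker. The 1024-byte window is an assumption
// meant to approximate pdftotext's "May not be a PDF file" check.
function looksLikePdf(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(1024);
    // Read from the very start of the file (position 0).
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    return buf.subarray(0, bytesRead).includes('%PDF-');
  } finally {
    fs.closeSync(fd);
  }
}

// Usage, with the same filePath that is handed to extract():
// if (!looksLikePdf(filePath)) console.error(filePath + ' does not look like a PDF');
```

A `false` for the path passed to `extract()` would point at the path itself (a relative path is resolved against the Node process's working directory, which may differ from the shell's) or at a file whose contents are not the PDF.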

Every file I've tried, from any source, fails with the same error. I've done a careful stare-and-compare and can't see a mistake. Running pdftotext from the command line on the same files works great. Where is my error? Thanks!
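To take the shell out of the comparison, the same binary can be run from Node on the exact `filePath` handed to `extract()`. A minimal sketch using Node's built-in `child_process.execFile`; `pdftotextArgs` and `runPdftotext` are hypothetical helpers, and the flags assume poppler's pdftotext (`-eol unix` for Unix line endings, `-` as the output file so the text goes to stdout):

```javascript
'use strict';
const { execFile } = require('child_process');

// Hypothetical helper: argument list for a direct pdftotext call.
// "-eol unix" selects Unix line endings; "-" writes the text to stdout
// instead of creating a .txt file next to the PDF.
function pdftotextArgs(filePath) {
  return ['-eol', 'unix', filePath, '-'];
}

// Hypothetical helper: run the given pdftotext binary on filePath and pass
// either an Error (including pdftotext's stderr) or the extracted text.
function runPdftotext(binary, filePath, callback) {
  execFile(binary, pdftotextArgs(filePath), { maxBuffer: 10 * 1024 * 1024 },
    function (err, stdout, stderr) {
      if (err) {
        return callback(new Error('pdftotext failed: ' + (stderr || err.message)));
      }
      callback(null, stdout);
    });
}

// Usage:
// runPdftotext('/usr/bin/pdftotext', filePath, function (err, text) {
//   if (err) return console.error(err.message);
//   console.log(text);
// });
```

If this direct call fails the same way, the file at that path is the more likely culprit than pdf-text-extract; if it succeeds, the difference lies in how the library invokes the binary.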
