I wonder: if you converted it to a PostScript file (.ps), would that remove it?
I think there is a program called pdf2ps.
"Convert it from PDF to PS and back to PDF. Basically, it prints the file, which (I believe) will remove all meta information. I imagine that pdf2ps doesn’t have a built-in JavaScript library, so I think it is safe to assume that any malicious JavaScript will be securely removed in this process."
Yes that seems OK
Another idea. You can use the qpdf program to inspect a pdf and remove things manually.
Note:
There is a program called diffpdf which compares 2 pdf files… like diff compares 2 txt files. You could use it to check if things have been removed.
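For the manual inspection with qpdf, a minimal sketch could look like the following. It assumes qpdf is installed and uses its `--qdf` mode to write an uncompressed, text-editable copy, then counts a few PDF name tokens that usually point at scripts or auto-run actions. The function names and the marker list are made up for illustration, and a count of zero is only a hint, not proof that the file is clean.

```python
import re
import subprocess

# PDF name tokens that usually point at scripts or auto-run actions.
MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA")


def to_qdf(src, dst):
    """Ask qpdf for an uncompressed, text-editable (QDF) copy of src."""
    subprocess.run(
        ["qpdf", "--qdf", "--object-streams=disable", src, dst],
        check=True,
    )


def count_markers(data):
    """Count whole-token occurrences of each marker in raw PDF bytes."""
    counts = {}
    for name in MARKERS:
        # The lookahead keeps /JS from matching longer names such as /JSFoo.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts
```

One possible use: call `to_qdf("suspect.pdf", "suspect-qdf.pdf")`, read the second file in binary mode, and pass its bytes to `count_markers`. Running the same count on a cleaned copy gives a rough before/after comparison even when only one original file exists.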
I wonder: if you converted it to a PostScript file (.ps), would that remove it?
I think there is a program called pdf2ps.
I have no idea whether that is possible … @Rosika, any thoughts about this? Perhaps it is related to the pdfinfo command you suggested at:
Continuing
"Convert it from PDF to PS and back to PDF. Basically, it prints the file, which (I believe) will remove all meta information. I imagine that pdf2ps doesn’t have a built-in JavaScript library, so I think it is safe to assume that any malicious JavaScript will be securely removed in this process."
Where did you take that statement from? It is in quotation marks.
Yes that seems OK
If some member of this network can confirm that, it would be great.
Another idea. You can use the qpdf program to inspect a pdf and remove things manually.
Have you tested it?
Note:
There is a program called diffpdf which compares 2 pdf files… like diff compares 2 txt files. You could use it to check if things have been removed.
Huge thanks for that command. In this scenario there is only one file, so there is no other file to compare it against, but the command is now in my toolbox.
As usual, huge thanks for your polite support, friend!
Go ahead and use pdf2ps… then convert the .ps file back to .pdf with ps2pdf
The javascript should not survive the conversion.
Compare the result with the original using diffpdf
I was always of the opinion that flattening would do the job the easiest way.
I’m not totally sure about it, so you’d have to try it out for yourself and then check the outcome to see whether the JS was removed.
Printing the PDF to a new PDF using the “Print to PDF” feature (available in most Linux PDF viewers or web browsers) effectively flattens the document.
This process should remove JavaScript, as the printer driver does not transfer code, only the visual representation. At least that’s my understanding of it.
Another method could be using the ghostscript command.
Personally I use it a lot for converting PDFs to monochrome or to combine PDFs and the like.
But you might use it for getting rid of JS, too.
Just try the following and then check the result:
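As a hedged sketch (not necessarily the exact command meant here), a typical Ghostscript rewrite through the `pdfwrite` device could look like this, wrapped in Python. The function names and file names are made up for illustration. The options used are standard Ghostscript switches, but whether every script is dropped should still be checked on the output file.

```python
import subprocess


def gs_rewrite_command(src, dst):
    """Build a Ghostscript call that re-creates src as a new PDF at dst."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",      # write the output as a PDF
        "-dNOPAUSE",              # do not pause between pages
        "-dBATCH",                # exit once the input is processed
        "-dQUIET",                # suppress routine messages
        f"-sOutputFile={dst}",
        src,
    ]


def gs_rewrite(src, dst):
    """Run the rewrite; raises CalledProcessError if gs reports a failure."""
    subprocess.run(gs_rewrite_command(src, dst), check=True)
```

Calling `gs_rewrite("suspect.pdf", "rewritten.pdf")` needs Ghostscript installed and leaves the original file untouched.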
Converting a PDF to PostScript using pdf2ps and then back to PDF with ps2pdf will also remove JavaScript and most interactive content. The PostScript format does not support embedded JavaScript, so the conversion process strips it out.
This is a reliable method, but it may degrade some aspects of the PDF, such as hyperlinks, bookmarks, or advanced layout features.
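The pdf2ps and ps2pdf round trip can be sketched as below, assuming both tools (they ship with Ghostscript) are on the PATH. The helper names and the intermediate file name are hypothetical. The sketch only chains the two conversions and does not verify the result.

```python
import os
import subprocess
import tempfile


def roundtrip_commands(src, ps_path, dst):
    """Return the two conversions in order: PDF to PS, then PS back to PDF."""
    return [["pdf2ps", src, ps_path], ["ps2pdf", ps_path, dst]]


def roundtrip(src, dst):
    """Convert src to PostScript in a temporary directory, then back to dst."""
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = os.path.join(tmp, "intermediate.ps")
        for cmd in roundtrip_commands(src, ps_path, dst):
            # check=True stops the chain if either tool exits with an error.
            subprocess.run(cmd, check=True)
```

For example, `roundtrip("suspect.pdf", "flattened.pdf")` keeps the original and writes a new file, so the two can afterwards be compared with diffpdf.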