I downloaded it via the browser (Chromium, BTW) and via wget. And just as @01101111 reported, the md5sums are exactly the same.
So I suppose there's little point in reporting a “bug” to the wget maintainers, as wget doesn't seem to be the culprit.
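For reference, the comparison boils down to something like this (the file names and the link are placeholders, not the actual ones):

```
# "book-browser.pdf" is the copy saved via the browser,
# "book-wget.pdf" the one fetched on the command line
wget -O book-wget.pdf "<download-link>"
md5sum book-browser.pdf book-wget.pdf
```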
BUT:
As an additional step I thought: give the comparison another try by downloading another book from tradepub in order to see if I could replicate the initial behaviour.
So I downloaded the book “Linux from Scratch” (version 7.4) which seems to be an earlier version of what @01101111 downloaded himself.
i resisted downloading the pdf you previously mentioned for the same reason as @daniel.m.tripp: the registration questions just asked for a bit more than i wanted to part with. since it seemed possible the pdf was licensed or copyrighted, i didn't ask for the link you used, but it seems unlikely that the lfs 7.4 pdf you downloaded would be. if you don't mind sharing that link, i would give the process a go with firefox and wget.
of course it also gets interesting (or so it would seem) if you add in curl, as suggested previously by @daniel.m.tripp, as well as other browsers.
I was actually suggesting to “misuse” the bug tracker for a question regarding this topic. The problem is that if you asked in a normal forum, as is the case now, the probability is almost zero that you would find someone who knows wget deeply enough to answer the question well.
thanks for the clarification.
I see.
Taking the latest findings into account, it seems that doing the same with curl might (or might not) yield results as well.
Because it turned out that wget and curl actually got the exact same PDFs.
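The curl side of that test looks roughly like this (again with placeholder name and link):

```
curl -o book-curl.pdf '<download-link>'
# both sums came out identical in this case
md5sum book-wget.pdf book-curl.pdf
```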
looks like there is a timestamp or timeout included in the link/email. i was able to get a copy with firefox, but epiphany took me to another download page where it asked me to register, and wget grabbed an html doc that led to the same or similar. i tried the initial link again and got:
The link to ‘Linux from Scratch’ has expired. Please register below to get your free download.
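a quick way to check what wget actually grabbed, in case anyone wants to reproduce this (placeholder file name):

```
# a genuine pdf starts with the magic bytes "%PDF-"
file download.pdf
head -c 8 download.pdf
```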
no apologies necessary. it is just an interesting diversion to look into.
i am getting a similar html doc from wget. there is no mention of link expiration this time, just an offer to download another copy. i admit that i hadn't used it or curl before. are you adding any options or just running wget with the link?
Of course the firejail part isn't really necessary. I tend to firejail almost everything. A bit paranoid, perhaps.
Yet even the developer provides a dedicated profile for it (wget.profile). So why not use it…
For your purposes I think this command should do (hopefully):
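```
# the quoted link is a placeholder; use the download link from the e-mail
firejail wget "<download-link-from-the-email>"
```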
i was able to grab another copy with epiphany (the browser, aka gnome web) when wget wouldn't work. after i got your response i tried to copy and paste with the quotes you used. i believe that yielded the same html doc. curl refused to work with those quotes. eventually wget with single quotes gave different terminal output, but the same html doc. curl also ran with single quotes, but by that point it appears the link had timed out again.
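for the record, the single-quote forms that eventually ran looked like this (placeholder link; the curly quotes the forum produces have to be replaced with plain ones):

```
# single quotes keep the shell away from the ? and & in the link
wget -O lfs-wget.pdf '<download-link>'
curl -o lfs-curl.pdf '<download-link>'
```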
i remember sending something to myself with firefox send and one of the options was to set either a time limit or a number of tries. it is possible this link is using a similar method or methods. while frustrating in this particular instance, it makes sense not to give away bandwidth when the same user should just be able to make copies of the document.
in spite of all of that, i did run md5sum on the copies i got from firefox and epiphany with an interesting difference:
the file sizes from pdfinfo also match across those two different sets.
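the size check itself is just this (placeholder file names):

```
# pdfinfo prints a "File size" field among the document metadata
pdfinfo firefox-copy.pdf | grep 'File size'
pdfinfo epiphany-copy.pdf | grep 'File size'
```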
it would have been interesting to see what my wget and curl sums were, but i think that throws at least a minor monkey wrench into the working hypothesis that it is just a difference between the browser- and terminal-fetched documents.
I'm so sorry that neither wget nor curl would work the way we intended.
After all, you put so much effort into trying. It's a real shame.
O.K. Some success at last. Phew.
Your findings are very interesting indeed.
First of all: there's a difference when downloading with different devices/browsers. That confirms my findings. Great.
And then:
Wow. Not sure what to make of that, but it's very interesting.
It gets better and better.
O.K. We have verified that the PDFs are (slightly) different with your method as well. That's really great.
At least it's safe to say we have done as much as we could.
Thank you so much for your time and effort in investigating that matter.
And: so sorry that my links didn't work the way they should have.
I forgot that you can actually look at it from a Base16 (hex) perspective.
Since it is almost certain that something gets either pre- or appended to the PDF file by wget and curl, you can look at the differences in the first and last 16 bytes of each PDF. There should be your answer.
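A minimal sketch with xxd, assuming the two copies are saved under placeholder names:

```
# first 16 bytes of each copy
xxd -l 16 browser.pdf
xxd -l 16 wget.pdf

# last 16 bytes (a negative -s offset seeks from the end of the file)
xxd -s -16 browser.pdf
xxd -s -16 wget.pdf

# or let cmp report the first byte at which the two copies differ
cmp browser.pdf wget.pdf
```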
Hi @Akito,
what a wonderful idea.
Lacking the underlying knowledge (I haven't used a hex viewer so far), it would never have occurred to me to use one.
Yes, I thought so, too.
Thanks a lot for the hint.