Minutely different results when downloading PDF via wget and browser

OFFTOPIC:

Hi once more,

Does anyone have a clue as to how to mark this thread as “solved”? I was just looking around and couldn't find a hint. :blush:

some of the categories don’t have a solution option for whatever reason. “discussion” and “no category” happen to fit that description, unfortunately :slight_smile:

Oh, I didn't know that.

That's OK then.

Many thanks.
Greetings.
Rosika :slightly_smiling_face:

both “application” and “general linux question” offer the option of choosing a solution if you had your heart set on one and wanted to switch. i think you should be able to switch categories since you started the thread…

I see.
Thanks. But I think I'll leave this thread here under “discussion”, as that's really the best fit.

We discussed a lot after all. :wink:

Greetings.
Rosika :slightly_smiling_face:

@all:

Hi once more,

just wanted to let you know that I've now posted a question about the topic we've been discussing at
https://answers.launchpad.net/ubuntu/+source/wget/+question/692180 .

As soon as I get an answer, I'll let you know.

Many greetings.
Rosika :slightly_smiling_face:

Hi all:

Manfred Hampl (m-hampl) just replied with an answer.
I didn't think we would get one so soon. :wink:

[Manfred Hampl (m-hampl)](https://launchpad.net/~m-hampl) said (answer #1):

Interesting problem.

To me it seems that either wget adds that "</body>.</html>" trailer,
or maybe it is even the server.

You have to be aware that the URL is not directly pointing to the file, but the URL has to be interpreted on the server first.
Possibly the server provides slightly different data depending on the requesting application, differing between wget and browsers.

My suggestion:
Enable --debug for the wget command and do some testing with different "User-Agent" values for wget.

Greetings.
Rosika :slightly_smiling_face:
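Manfred's suggestion can also be scripted. Below is a minimal Python sketch (not something used in the thread) that fetches the same URL several times, each with a different User-Agent header, and prints an md5sum-style line for each response. The User-Agent strings are assumed examples; the real PDF link is not reproduced here.

```python
import hashlib
import urllib.request

# Example User-Agent strings (assumed for illustration; adjust to your own browser versions).
USER_AGENTS = {
    "chromium": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
    "firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0",
    "wget": "Wget/1.20.3 (linux-gnu)",
}


def fetch_md5(url: str, user_agent: str) -> tuple[str, int]:
    """Download `url` sending the given User-Agent header; return (MD5 hex digest, size in bytes)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return hashlib.md5(body).hexdigest(), len(body)


def compare_user_agents(url: str) -> None:
    """Print one md5sum-style line per User-Agent so differing responses stand out."""
    for name, user_agent in USER_AGENTS.items():
        digest, size = fetch_md5(url, user_agent)
        print(f"{digest}  {size:>8} bytes  {name}")
```

Call `compare_user_agents(url)` with the PDF's download link. From the command line, wget's own `--user-agent` (`-U`) option does the same job.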

Hi again,

following Manfred Hampl's suggestion, I can now present my findings:

1.) detailed output of wget --debug can be found here:
https://gist.github.com/Rosika2/4507557b4fbf12b481f216851dafdf36

The logs cover wget runs with the user-agents of chromium, falkon and firefox.

2.) md5sum *.pdf (results):

a.) 6ec4ff88e8884c61587e124af2e6181d browser_Linux_from_Scratch.pdf # from within chromium
b.) 6ec4ff88e8884c61587e124af2e6181d wget_user-agent_chromium_Linux_from_Scratch.pdf # NEW
c.) 6ec4ff88e8884c61587e124af2e6181d wget_user-agent_falkon_Linux_from_Scratch.pdf # NEW
d.) da65d66d0dfd995d7fd4f7e7327506b3 wget_user-agent_firefox_Linux_from_Scratch.pdf # NEW
e.) da65d66d0dfd995d7fd4f7e7327506b3 wget_Linux_from_Scratch.pdf # without explicitly setting user-agent
f.) da65d66d0dfd995d7fd4f7e7327506b3 curl_Linux_from_Scratch.pdf

So that's it.

a.) and b.) shouldn't come as a surprise, as the user-agent should be the same.

c.) is the same, i.e. user-agent falkon behaves like a.) and b.).

The rest behave like user-agent firefox.

All this adds up to:

a.), b.) and c.) : File size: 959314 bytes
d.), e.) and f.) : File size: 959330 bytes (the extra 16 bytes discussed above)

These are my results.

Many greetings.
Rosika :slightly_smiling_face:
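The same comparison can be done in one step. Here is a small Python sketch (an illustration, not part of the original post) that groups files by MD5 hash, like `md5sum *.pdf`, and reports the size of each group. Looking for `*.pdf` files in one directory is an assumption.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(paths):
    """Map each MD5 digest to (file size, sorted names of the files with that digest)."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        groups[md5_of(p)].append(p)
    return {
        digest: (files[0].stat().st_size, sorted(f.name for f in files))
        for digest, files in groups.items()
    }


def report(directory: str = ".") -> None:
    """Print one line per distinct content: digest, size in bytes, and matching files."""
    for digest, (size, names) in group_by_hash(Path(directory).glob("*.pdf")).items():
        print(f"{digest}  {size} bytes  {', '.join(names)}")
```

Run in the download directory, `report()` would print two groups for the six files listed above, one per distinct file content.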

@all:

Hi everybody,

after I downloaded the PDF in question (Linux from Scratch) directly with the Firefox browser and discussed the results with Manfred Hampl (m-hampl), he arrived at a slightly modified explanation.

The following may be regarded as the definitive answer to the original question of why the 16-byte difference exists:

You have to be aware that the URL is not directly pointing to the file, but the URL has to be interpreted on the server first.
Possibly the server provides slightly different data depending on the requesting application, differing by User-Agent value, e.g. between Chromium and Firefox.

So we've done it. :+1:

Thanks a lot to all of you for your very persistent help.
And let's also send our regards to Manfred Hampl at Launchpad (Question #692180, “difference when downloading PDF via wget and b...”) for assisting us in our quest.

Many greetings.
Rosika :slightly_smiling_face:
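To make the explanation concrete, here is a toy model in Python. The hypothetical `serve_document` function returns the same base bytes to every client but appends a 16-byte trailer for some User-Agents. The rule (plain file only when the User-Agent contains a "Chrome/" token), the User-Agent strings and the exact trailer bytes are all assumptions, chosen only to reproduce the thread's observations: Chromium and Falkon got 959314 bytes, while Firefox, wget and curl got 959330. The real server's logic is unknown.

```python
# Hypothetical 16-byte trailer; the thread saw it displayed as "</body>.</html>".
TRAILER = b"</body>\n</html>\n"

# Example User-Agent strings (assumed). Falkon is QtWebEngine-based and is assumed
# here to include a "Chrome/" token, as Chromium does.
CLIENTS = {
    "chromium": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
    "falkon": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Falkon/3.1.0 Chrome/83.0.4103.122 Safari/537.36",
    "firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0",
    "wget": "Wget/1.20.3 (linux-gnu)",
    "curl": "curl/7.68.0",
}


def serve_document(pdf_bytes: bytes, user_agent: str) -> bytes:
    """Toy server: plain PDF for Chrome-like clients, PDF plus trailer for all others (assumed rule)."""
    if "Chrome/" in user_agent:
        return pdf_bytes
    return pdf_bytes + TRAILER


def sizes_by_client(pdf_bytes: bytes) -> dict[str, int]:
    """Size of the response each client would receive from the toy server."""
    return {name: len(serve_document(pdf_bytes, ua)) for name, ua in CLIENTS.items()}
```

With a 959314-byte body, `sizes_by_client` gives 959314 for chromium and falkon and 959330 for firefox, wget and curl, matching the grouping reported in the thread.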

thank you for continuing to share your results as well as the launchpad replies. i had never used wget before, so it shouldn’t come as a surprise that i didn’t know the user agent could be changed, but that was interesting to learn. i also took a moment to stop by and thank Manfred Hampl for providing that much sought-after (but often enough unobtainable) why :slight_smile:

@01101111:

Thanks for the message.
I'm glad this topic turned out to be of such (rather unexpected) interest to others as well.

Many greetings.
Rosika :slightly_smiling_face:

Just a guess, but maybe one sends 32-bit words and the other sends 16-bit words, with the buffer padded out to 32 bits. It could depend on the hardware or on hardware buffering, done to speed up sending: why send 16 bits at a time when it is just as fast to send 32?
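A quick arithmetic check of the padding idea, using the two file sizes reported earlier in the thread. If padding to whole k-byte words ended up in the saved file, its size would be a multiple of k. The sketch below (an illustration, not from the thread) rounds 959314 up to 2-, 4-, 8-, 16- and 32-byte words and compares the result with 959330.

```python
def padded_size(n: int, word: int) -> int:
    """Round n up to the next multiple of `word` bytes (ceiling division)."""
    return -(-n // word) * word


SMALL, LARGE = 959314, 959330  # file sizes reported in the thread

for word in (2, 4, 8, 16, 32):
    print(f"{word:>2}-byte words: {padded_size(SMALL, word)}")
print("larger file size mod 4:", LARGE % 4)
```

The padded sizes come out as 959314, 959316, 959320, 959328 and 959328, and 959330 leaves a remainder of 2 when divided by 4, so simple word padding does not account for the extra 16 bytes.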

Very good point. However, the contents were actually different, so it wasn’t just null bytes acting as filler.
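Whether the extra bytes are filler can be checked directly when one file is the other plus something appended at the end. A minimal Python sketch, not from the thread; it only handles the case where the smaller file is an exact prefix of the larger one.

```python
from pathlib import Path
from typing import Optional


def extra_bytes(smaller: bytes, larger: bytes) -> Optional[bytes]:
    """Return the bytes appended to `smaller` to form `larger`, or None if it is not a prefix."""
    if larger.startswith(smaller):
        return larger[len(smaller):]
    return None


def describe(smaller_path: str, larger_path: str) -> str:
    """Classify the difference between two files as identical, null padding, or a real trailer."""
    extra = extra_bytes(Path(smaller_path).read_bytes(), Path(larger_path).read_bytes())
    if extra is None:
        return "files differ before the end (not a simple trailer)"
    if not extra:
        return "files are identical"
    if all(b == 0 for b in extra):
        return f"{len(extra)} null padding bytes"
    return f"{len(extra)} non-null trailer bytes: {extra!r}"
```

For example, `describe("browser_Linux_from_Scratch.pdf", "wget_Linux_from_Scratch.pdf")` would show whether the 16 extra bytes are zeros or readable text such as the `</body>` / `</html>` trailer.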

Something I've noticed over the last few months: whenever I download a PDF, I get an error message saying the download failed, but the file has always been exactly where I told it to go (my PDF books folder, not the default download folder).
It doesn’t bother me; it just seems a bit strange.