Download password-protected pages with wget

Hi everyone,

I am faced with a peculiar problem.

To download an HTML page with wget, I normally use a command like this:

wget "https://itsfoss.community/t/i-like-conky-quite-a-lot-actually/5148"

That works well, but when it comes to password-protected pages I run into difficulties.
I looked around a bit and found a syntax that is supposed to work. Here's an example:

wget --user=vivek --ask-password http://192.168.1.10/docs/foo.pdf
(taken from: https://www.cyberciti.biz/faq/wget-command-with-username-password/ )

Yet in my case it doesn't.

I wanted to save an HTML page from itsfoss.community which is private.
So I tried the command

wget --user=Rosika --ask-password 'https://itsfoss.community/t/[...]'

I was asked for my password and entered it.
Yet I got this response:

Password for user ‘Rosika’: 
--2020-08-11 18:42:20--  https://itsfoss.community/t/[...]
Resolving itsfoss.community (itsfoss.community)... 178.128.172.26
Connecting to itsfoss.community (itsfoss.community)|178.128.172.26|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-11 18:42:25 ERROR 404: Not Found.

Theoretically this command should work, but it doesn't.

Can anybody help?

Many thanks in advance.
Greetings.
Rosika :slightly_smiling_face:

1 Like

Could you create a private post for testing purposes, so you are able to share the whole command including the whole URL? Then we can test it, too.

2 Likes

I tried the same with a page from my personal messages. I tried my login email as well as my username, and even http instead of https, but always with the same result: 404 Not Found.

2 Likes

Never send credentials over HTTP. :laughing:

1 Like

Hi @Akito:

thanks.
I created a private post here: https://itsfoss.community/t/test-file-for-wget/5209 .

So the respective command would have to be:

wget --user=Rosika --ask-password 'https://itsfoss.community/t/test-file-for-wget/5209'

Hi cord,

thanks a lot for the confirmation. So I'm not alone with that problem.

1 Like

additional info:

I also tried to solve the problem with the lynx browser and posted about it on the lynx mailing list.
Here's the original message:

I often use lynx for documentation purposes by saving forum pages as .txt files using the -dump option, like so:

lynx -dump "https://itsfoss.community/t/save-changes-on-ubuntu-live-usb/5189/7" > file.txt

Normally that works very well, but I can't use this command if the page is password-protected.
The resulting .txt file contains, among other info:
“Oops! That page doesn’t exist or is private.”
I have the credentials (user name and password), so that should be no problem.
But I'm at a loss as to how to include them in the command. I'm not sure about the syntax.

Up until now they couldn't help me either.

Greetings
Rosika :blush:

1 Like

The problem is that you need a more specific request: --user and --ask-password perform HTTP Basic authentication, while this forum logs you in via a form and then identifies you by a session cookie, which is why you get the 404. Here is what works for me, but won’t work for you, because you need to gather your own HTTP request for this to work:

curl 'https://itsfoss.community/t/test-file-for-wget/5209' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Accept-Language: en,en-US;q=0.5' --compressed -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Cookie: _t=41d5de726ab37ca9f8e1801c3c713227; _forum_session=UWZFM1RvUFBIekRYUE4yRGRNZlhOdjZCclNEZERHSHJpSFJwLzJ0M2FiZmZlWjNQb3dKTFFYZTgzWG1CY3pmbmZRUVczMGhNRXhOdTRnYjNsME5YVGNVNTZuTzJ4KzFlbStJZ09WSkdtd0Z4bG93UHFrdnAzb0NzYmF3QW1rekZYdlI1dHBJQ0QrdzV4SGlibDlGSnowdXpBbDdxTFhQSkJSUVpXMVpGQmxvVVdWZ1B2aVEvdGhtUWdYekZBWmh0L3RJNzl6cUVVRTFQQ3dJRTQ2T2tJcDRFUzA3SnJwdHRGRC9DUHIwL1hnN3BLZ3cwRzdYK0hpTy9KQzVCejR4SklhekwxQkltblFtSVB3NVFaMlE5NEJPRnJXVlpWdmd3a3M1cnpKU0NLcWh4QVg1QmdlY2s3ckxvdjBYK0lRYklaVytKb3YyZHlscCt4WFhSOUk4YTBLYkptRHRoR3JNSGlQeW4reTRKbUNubW5HSVBFRHI2TVVUMGV2YU43YWZjWnU5VFFxQ29ocGFOTm1EdUxheFl1YSszdE9ROEo1cVkzUUdJWUoyQ0REQWRtK0lxL1k0cE1uREZMQ0tHYnJVUy0tUTZqSGVjdWRMK0lETEJURFJQaXFqdz09--c7dba8478aab6a0acc5d1e5f15280747bc52c797' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'

Easier to read:

curl 'https://itsfoss.community/t/test-file-for-wget/5209' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: en,en-US;q=0.5' \
  --compressed \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: _t=41d5de726ab37ca9f8e1801c3c713227; _forum_session=UWZFM1RvUFBIekRYUE4yRGRNZlhOdjZCclNEZERHSHJpSFJwLzJ0M2FiZmZlWjNQb3dKTFFYZTgzWG1CY3pmbmZRUVczMGhNRXhOdTRnYjNsME5YVGNVNTZuTzJ4KzFlbStJZ09WSkdtd0Z4bG93UHFrdnAzb0NzYmF3QW1rekZYdlI1dHBJQ0QrdzV4SGlibDlGSnowdXpBbDdxTFhQSkJSUVpXMVpGQmxvVVdWZ1B2aVEvdGhtUWdYekZBWmh0L3RJNzl6cUVVRTFQQ3dJRTQ2T2tJcDRFUzA3SnJwdHRGRC9DUHIwL1hnN3BLZ3cwRzdYK0hpTy9KQzVCejR4SklhekwxQkltblFtSVB3NVFaMlE5NEJPRnJXVlpWdmd3a3M1cnpKU0NLcWh4QVg1QmdlY2s3ckxvdjBYK0lRYklaVytKb3YyZHlscCt4WFhSOUk4YTBLYkptRHRoR3JNSGlQeW4reTRKbUNubW5HSVBFRHI2TVVUMGV2YU43YWZjWnU5VFFxQ29ocGFOTm1EdUxheFl1YSszdE9ROEo1cVkzUUdJWUoyQ0REQWRtK0lxL1k0cE1uREZMQ0tHYnJVUy0tUTZqSGVjdWRMK0lETEJURFJQaXFqdz09--c7dba8478aab6a0acc5d1e5f15280747bc52c797' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache'

2 Likes

Hi @Akito:

Impressive command.
How did you come by it?

I don't quite understand.
The URL is https://itsfoss.community/t/test-file-for-wget/5209, or am I wrong here?

Greetings.
Rosika :blush:

1 Like

That’s the URL. But technically you are making a GET request, which can include, and may even require, more information than just the destination URL.

Press F12 when you are on the private message. Then go to the Network tab and look for the request initiated by BrowserTabChild*. Right-click the request and choose Copy as cURL (POSIX). Then paste and execute the copied command in your terminal.
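
If you would rather keep using wget, the Cookie header from the copied command can in principle be passed along there as well. A rough sketch, where the cookie values are just placeholders for your own session:

wget --header='Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE' 'https://itsfoss.community/t/test-file-for-wget/5209'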

1 Like

Thanks @Akito,

currently I'm using Chromium and cannot follow your procedure.
I'll give Firefox a try.

1 Like

Hi again,

I've been trying to follow your instructions but wasn't really successful (neither with Chromium nor with Firefox).

Never mind.
I think this, even if I could manage it, wouldn't really serve my purpose, as I would have to log in via another browser (like Chromium) first in order to obtain the command parameters with “Copy as cURL”.

In that case I could just as well copy the contents of the page manually.

Perhaps I'll still get something from the lynx people…

But thanks a lot for your help anyway.

Greetings.
Rosika :slightly_smiling_face:

1 Like

There are probably programmatic ways of achieving what you want.

https://docs.discourse.org/
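
For instance, Discourse exposes most content as JSON. Assuming you can get an API key (those are usually generated by the site admins), the request would look roughly like this; the key and the availability of the endpoint are assumptions on my part:

curl -H "Api-Key: YOUR_API_KEY" -H "Api-Username: Rosika" 'https://itsfoss.community/t/test-file-for-wget/5209.json'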

1 Like

Hi @Akito,

thanks a lot for the link. I´ll look into it.

Yet it keeps bugging me that I cannot get your curl command working the way you did.
Despite what I said previously, I'd like to be able to use that as well.

Therefore let me ask a bit further, if you don't mind. :blush:

  • What browser did you use to procure that long list of options for curl?
  • Why wouldn't your exact command work for me?
    I.e.: why would I have to use an HTTP request of my own?

Thanks for your help.

Greetings.
Rosika :slightly_smiling_face:

1 Like

(animated GIF “itsfoss_curl”: a screen recording showing how to copy the request as cURL from the browser's Network tab)

Because you need your own session_id to get permission to view the requested page. Every user connected here has their own ID and is permitted to view certain parts of this forum, depending on the permissions associated with their account and therefore with their session_id.
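
By the way, the Cookie header is the part that actually authenticates you; the other headers can most likely be dropped. A stripped-down sketch (cookie values are placeholders for your own):

curl 'https://itsfoss.community/t/test-file-for-wget/5209' -H 'Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE'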

3 Likes

Hi @Akito,

first of all: thank you very much for that wonderful explanatory gif you created. What an excellent idea. :+1:

Finally I managed to follow your instructions. Yet I needed Firefox to do this; I couldn't reproduce it with Chromium.

O.K. Thanks. Really didn't know that.

So now that I managed to download the page as per your instructions, I put a redirect at the end of the command (“> output.html”) to save it as an HTML file.
Yet opening that one in a browser simply displayed an empty page!

So I tried a redirect to text (“> output.txt”).
Here the contents of the page are displayed, but the output is pretty messed up, as the entire raw HTML source of the page is shown.
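
(Perhaps I could run the saved page through lynx afterwards to get clean text, something like

lynx -dump ./output.html > output.txt

assuming lynx accepts a local file path here; I haven't tried that yet.)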

But at least I got the command working, which is a success indeed.

Thanks for your great effort and help.

Greetings.
Rosika :slightly_smiling_face:

1 Like

Instead of redirection, you should try the -o or -O option of curl.

https://linux.die.net/man/1/curl
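
For example, where [copied curl command] stands for the full command you copied from the browser:

[copied curl command] -o page.html

writes the page to a file name of your choice, and

[copied curl command] -O

keeps the remote name (here it would save as 5209).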

1 Like

Thanks @Akito,

I tried the -o and -O options.
[command] -o "file.txt" yielded the same cluttered text output as the redirect did. So no improvement there.

And [command] -O produced an HTML file that, instead of displaying nothing (like my redirect did), showed the following:

Sign Up Log In

Oops! That page doesn’t exist or is private.

Popular […]

Strange…
Greetings.
Rosika :slightly_smiling_face:

1 Like

It means you weren’t authenticated properly to view the message. Make sure your session_id is valid by getting the newest one in the way shown in the GIF.
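
One way to check whether the cookie is still accepted might be Discourse's current-session endpoint (I am assuming it is enabled here; the cookie values are placeholders):

curl -H 'Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE' 'https://itsfoss.community/session/current.json'

If it returns your user data as JSON, the session is valid.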

2 Likes

Hi again,

O.K. I did that.
Applying [command] -O resulted in an HTML file named “5209”. It amounts to 86.3 kB of data according to my file manager.
But opening it in the browser presents an empty page once again. :slightly_frowning_face:

Still thanks a lot.

Greetings.
Rosika :slightly_smiling_face: