Download password-protected pages with wget

Hi everyone,

I am faced with a peculiar problem.

To download an HTML page with wget, I normally use a command like this:

wget "https://itsfoss.community/t/i-like-conky-quite-a-lot-actually/5148"

That works well, but when it comes to password-protected pages I run into difficulties.
I looked around a bit and found a syntax that is supposed to work. Here's an example:

wget --user=vivek --ask-password http://192.168.1.10/docs/foo.pdf
(taken from: https://www.cyberciti.biz/faq/wget-command-with-username-password/ )

Yet in my case it doesn't.

I wanted to save an HTML page from itsfoss.community which is private.
So I tried the command

wget --user=Rosika --ask-password 'https://itsfoss.community/t/[...]'

I was asked for my password and entered it.
Yet I got this response:

Password for user ‘Rosika’: 
--2020-08-11 18:42:20--  https://itsfoss.community/t/[...]
Resolving itsfoss.community (itsfoss.community)... 178.128.172.26
Connecting to itsfoss.community (itsfoss.community)|178.128.172.26|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-08-11 18:42:25 ERROR 404: Not Found.

Theoretically this command should work, but it doesn't.

Can anybody help?

Many thanks in advance.
Greetings.
Rosika :slightly_smiling_face:

1 Like

Could you create a private post for testing purposes, so you are able to share the whole command including the whole URL? Then we can test it, too.

2 Likes

I tried the same with a page from my personal messages. I tried my login email as well as my username, and even http instead of https, but always with the same result: 404 Not Found.

2 Likes

Never send credentials over HTTP. :laughing:

1 Like

Hi @Akito:

thanks.
I created a private post here: https://itsfoss.community/t/test-file-for-wget/5209 .

So the respective command would have to be:

wget --user=Rosika --ask-password 'https://itsfoss.community/t/test-file-for-wget/5209'

Hi cord,

thanks a lot for the confirmation. So I'm not alone with that problem.

1 Like

additional info:

I also tried to solve the problem with the lynx browser and posted about it on the lynx mailing list.
Here's the original message:

I often use lynx for documentation purposes by saving forum pages as .txt files using the -dump option, like so:

lynx -dump "https://itsfoss.community/t/save-changes-on-ubuntu-live-usb/5189/7" > file.txt

Normally that works very well, but I can't use this command if the page is password-protected.
The resulting .txt file contains, among other info:
“Oops! That page doesn’t exist or is private.”
I have the credentials (user name and password), so that should be no problem.
But I'm at a loss as to how to include them in the command. I'm not sure about the syntax.

Up until now they couldn't help me either.

Greetings
Rosika :blush:

1 Like

The problem is that you need a more specific request: --user and --ask-password perform HTTP Basic authentication, while this forum logs you in via a form and then identifies you by a session cookie, which is why you get the 404. Here is what works for me, but won’t work for you, because you need to gather your own HTTP request for this to work:

curl 'https://itsfoss.community/t/test-file-for-wget/5209' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Accept-Language: en,en-US;q=0.5' --compressed -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Cookie: _t=41d5de726ab37ca9f8e1801c3c713227; _forum_session=UWZFM1RvUFBIekRYUE4yRGRNZlhOdjZCclNEZERHSHJpSFJwLzJ0M2FiZmZlWjNQb3dKTFFYZTgzWG1CY3pmbmZRUVczMGhNRXhOdTRnYjNsME5YVGNVNTZuTzJ4KzFlbStJZ09WSkdtd0Z4bG93UHFrdnAzb0NzYmF3QW1rekZYdlI1dHBJQ0QrdzV4SGlibDlGSnowdXpBbDdxTFhQSkJSUVpXMVpGQmxvVVdWZ1B2aVEvdGhtUWdYekZBWmh0L3RJNzl6cUVVRTFQQ3dJRTQ2T2tJcDRFUzA3SnJwdHRGRC9DUHIwL1hnN3BLZ3cwRzdYK0hpTy9KQzVCejR4SklhekwxQkltblFtSVB3NVFaMlE5NEJPRnJXVlpWdmd3a3M1cnpKU0NLcWh4QVg1QmdlY2s3ckxvdjBYK0lRYklaVytKb3YyZHlscCt4WFhSOUk4YTBLYkptRHRoR3JNSGlQeW4reTRKbUNubW5HSVBFRHI2TVVUMGV2YU43YWZjWnU5VFFxQ29ocGFOTm1EdUxheFl1YSszdE9ROEo1cVkzUUdJWUoyQ0REQWRtK0lxL1k0cE1uREZMQ0tHYnJVUy0tUTZqSGVjdWRMK0lETEJURFJQaXFqdz09--c7dba8478aab6a0acc5d1e5f15280747bc52c797' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'

Easier to read:

curl 'https://itsfoss.community/t/test-file-for-wget/5209' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: en,en-US;q=0.5' \
  --compressed \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: _t=41d5de726ab37ca9f8e1801c3c713227; _forum_session=UWZFM1RvUFBIekRYUE4yRGRNZlhOdjZCclNEZERHSHJpSFJwLzJ0M2FiZmZlWjNQb3dKTFFYZTgzWG1CY3pmbmZRUVczMGhNRXhOdTRnYjNsME5YVGNVNTZuTzJ4KzFlbStJZ09WSkdtd0Z4bG93UHFrdnAzb0NzYmF3QW1rekZYdlI1dHBJQ0QrdzV4SGlibDlGSnowdXpBbDdxTFhQSkJSUVpXMVpGQmxvVVdWZ1B2aVEvdGhtUWdYekZBWmh0L3RJNzl6cUVVRTFQQ3dJRTQ2T2tJcDRFUzA3SnJwdHRGRC9DUHIwL1hnN3BLZ3cwRzdYK0hpTy9KQzVCejR4SklhekwxQkltblFtSVB3NVFaMlE5NEJPRnJXVlpWdmd3a3M1cnpKU0NLcWh4QVg1QmdlY2s3ckxvdjBYK0lRYklaVytKb3YyZHlscCt4WFhSOUk4YTBLYkptRHRoR3JNSGlQeW4reTRKbUNubW5HSVBFRHI2TVVUMGV2YU43YWZjWnU5VFFxQ29ocGFOTm1EdUxheFl1YSszdE9ROEo1cVkzUUdJWUoyQ0REQWRtK0lxL1k0cE1uREZMQ0tHYnJVUy0tUTZqSGVjdWRMK0lETEJURFJQaXFqdz09--c7dba8478aab6a0acc5d1e5f15280747bc52c797' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache'

2 Likes

Hi @Akito:

Impressive command.
How did you come by it?

I don't quite understand.
The URL is https://itsfoss.community/t/test-file-for-wget/5209, or am I wrong here?

Greetings.
Rosika :blush:

1 Like

That’s the URL. But technically you are making a GET request, which can include, and may even require, more information than just the destination URL.

Press F12 when you are on the private message. Then go to the Network tab and look for the request initiated by BrowserTabChild*. Right-click the request and choose Copy as cURL (POSIX). Then paste and execute the copied command in your terminal.
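
If you would rather keep using wget, the Cookie header from the copied command can in principle be passed along there as well. A rough sketch, where the cookie values are just placeholders for your own session:

wget --header='Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE' 'https://itsfoss.community/t/test-file-for-wget/5209'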

1 Like

Thanks @Akito,

currently I'm using Chromium and cannot follow your procedure.
I'll give Firefox a try.

1 Like

Hi again,

I've been trying to follow your instructions but wasn't really successful (neither with Chromium nor with Firefox).

Never mind.
I think this, even if I could manage it, wouldn't really serve my purpose, as I would have to log in via another browser (like Chromium) first in order to obtain the command parameters with “Copy as cURL”.

In that case I could just as well copy the contents of the page manually.

Perhaps I'll still get something from the lynx people…

But thanks a lot for your help anyway.

Greetings.
Rosika :slightly_smiling_face:

1 Like

There are probably programmatic ways of achieving what you want.

https://docs.discourse.org/
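
For instance, Discourse exposes most content as JSON. Assuming you can get an API key (those are usually generated by the site admins), the request would look roughly like this; the key and the availability of the endpoint are assumptions on my part:

curl -H "Api-Key: YOUR_API_KEY" -H "Api-Username: Rosika" 'https://itsfoss.community/t/test-file-for-wget/5209.json'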

1 Like

Hi @Akito,

thanks a lot for the link. I´ll look into it.

Yet it keeps bugging me that I cannot get your curl command working the way you did.
Despite what I said previously, I'd like to be able to use that as well.

Therefore let me ask a bit further, if you don't mind. :blush:

  • What browser did you use to procure that long list of options for curl?
  • Why wouldn't your exact command work for me?
    I.e.: why would I have to use an HTTP request of my own?

Thanks for your help.

Greetings.
Rosika :slightly_smiling_face:

1 Like

(animated GIF “itsfoss_curl”: a screen recording showing how to copy the request as cURL from the browser's Network tab)

Because you need your own session_id to get permission to view the requested page. Every user connected here has their own ID and is permitted to view certain parts of this forum, depending on the permissions associated with their account and therefore with their session_id.
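
By the way, the Cookie header is the part that actually authenticates you; the other headers can most likely be dropped. A stripped-down sketch (cookie values are placeholders for your own):

curl 'https://itsfoss.community/t/test-file-for-wget/5209' -H 'Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE'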

3 Likes

Hi @Akito,

first of all: thank you very much for that wonderful explanatory gif you created. What an excellent idea. :+1:

Finally I managed to follow your instructions. Yet I needed Firefox to do this; I couldn't reproduce it with Chromium.

O.K. Thanks. Really didn't know that.

So now that I managed to download the page as per your instructions, I put a redirect at the end of the command (“> output.html”) to save it as an HTML file.
Yet opening that one in a browser simply displayed an empty page!

So I tried a redirect to text (“> output.txt”).
Here the contents of the page are displayed, but the output is pretty messed up, as the entire raw HTML source of the page is shown.
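
(Perhaps I could run the saved page through lynx afterwards to get clean text, something like

lynx -dump ./output.html > output.txt

assuming lynx accepts a local file path here; I haven't tried that yet.)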

But at least I got the command working, which is a success indeed.

Thanks for your great effort and help.

Greetings.
Rosika :slightly_smiling_face:

1 Like

Instead of redirection, you should try the -o or -O option of curl.

https://linux.die.net/man/1/curl
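
For example, where [copied curl command] stands for the full command you copied from the browser:

[copied curl command] -o page.html

writes the page to a file name of your choice, and

[copied curl command] -O

keeps the remote name (here it would save as 5209).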

1 Like

Thanks @Akito,

I tried the -o and -O options.
[command] -o "file.txt" yielded the same cluttered text output as the redirect did. So no improvement there.

And [command] -O produced an HTML file that, instead of displaying nothing (like my redirect did), showed the following:

Sign Up Log In

Oops! That page doesn’t exist or is private.

Popular […]

Strange…
Greetings.
Rosika :slightly_smiling_face:

1 Like

It means you weren’t authenticated properly to view the message. Make sure your session_id is valid by getting the newest one in the way shown in the GIF.
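
One way to check whether the cookie is still accepted might be Discourse's current-session endpoint (I am assuming it is enabled here; the cookie values are placeholders):

curl -H 'Cookie: _t=YOUR_T_COOKIE; _forum_session=YOUR_SESSION_COOKIE' 'https://itsfoss.community/session/current.json'

If it returns your user data as JSON, the session is valid.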

2 Likes

Hi again,

O.K. I did that.
Applying [command] -O resulted in an HTML file named “5209”. It amounts to 86.3 kB of data according to my file manager.
But opening it in the browser presents an empty page once again. :slightly_frowning_face:

Still thanks a lot.

Greetings.
Rosika :slightly_smiling_face: