Well, I then employed lynx for that (I'd like to stay within the terminal), and all I needed to do was hit the “d” key from the lynx history page;
I immediately got a nice HTML document of the respective page which I could easily save.
So lynx doesn't seem to have any problems downloading the page, but wget apparently doesn't have the right permissions.
That's odd, as I normally employ wget for downloading almost anything, and to the best of my recollection I've never run into any difficulties.
I'm not up to speed on user-agents… could you explain what they do?
From your output it seems they handle negotiation between wget (or a browser) and the website?
What is the default user-agent for wget?
They are there to introduce your browser to the server. Based on that, the server may decide to serve different content: say there's a feature in Mozilla which Internet Explorer does not have; the server will then render a page for Mozilla that takes advantage of that feature, but will not try to use it for Internet Explorer.
This requires proper cooperation between the website's developers, the hosting server's maintenance team, and of course the developers of that browser…
Sometimes all this is misused, and website owners sniff the user agent string just to restrict access to content for no real reason. For example, my bank considered my browser at the time, Firefox ESR, too old, insecure and incompatible (which is a wrong statement), so they served me a page reminding me to update my browser. Of course, after I changed the user agent string everything worked smoothly, which clearly proves that calling it “incompatible” was wrong. They sniffed the agent string but checked only the version number, and indeed that was way lower than a current Firefox version; they made no further checks, such as whether it runs on Windows, or whether it is a “normal” release or the ESR, etc.
Sometimes the website owner tries to sniff the OS from the user agent string and serves the good content only to Windows users. I can't tell now which site this was (something entertaining/streaming), but it worked well with Windows browsers and not on Linux: changing the user agent string to lie about the browser (pretending to run on Windows) solved the problem, and it then worked without issue.
Of course, if there's a real reason (for example the site requires a Widevine security level which is currently not supported on Linux), the site will not work in a Linux-based browser regardless of the user agent string.
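To tie this back to the wget question: overriding the agent string is a one-flag affair. A minimal sketch (the Firefox identifier is just an example string I made up, and the throwaway local server is only there so the commands work without internet access; wget's real default agent is "Wget/&lt;version&gt;" and curl's is "curl/&lt;version&gt;"):

```shell
# Throwaway local web server so this example needs no internet access;
# any real URL would work the same way.
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
sleep 1

# Example Firefox user-agent string (illustrative, not a magic value):
UA='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'

# wget normally announces itself as "Wget/<version>"; override it like this:
wget -q --user-agent="$UA" -O page.html 'http://127.0.0.1:8099/'

# curl's equivalent flag is -A (its default agent is "curl/<version>"):
curl -s -A "$UA" -o page2.html 'http://127.0.0.1:8099/'

kill $SRV
```

If a site serves different content to "Wget/…" than to a browser string, comparing the two downloads makes the sniffing visible immediately.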
You are all welcome!
…and meanwhile I managed to dig up from memory that it was Disney+ which committed that user-agent abuse. There was a debate on a Hungarian forum where I got a little angry at D+, and looking at the “solution” worldwide proved I wasn't the only one thinking the same way:
“It’s looking more like a high level political problem than a technical one.”
Looking at possible implementations, I could do this sort of thing myself with my NGINX instance:
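As a sketch of what that server-side sniffing could look like (the patterns and paths below are my own illustration of the technique, not a recommendation or anyone's actual config):

```nginx
# Hypothetical NGINX fragment: classify clients by User-Agent, deny one group.
# map must live in the http context; the patterns are illustrative only.
map $http_user_agent $blocked_ua {
    default        0;
    ~*Wget         1;   # matches wget's default agent "Wget/<version>"
    ~*curl         1;   # matches curl's default agent "curl/<version>"
}

server {
    listen 80;
    server_name example.com;

    location / {
        if ($blocked_ua) {
            return 403;   # the kind of UA-based gatekeeping discussed above
        }
        root /var/www/html;
    }
}
```

Which also shows why the trick from the earlier posts works: the check sees only whatever string the client chooses to send.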
It's interesting to look at our web server's logs and see the mix of OSes, browsers and versions. There are lots and lots of bots out there too: Googlebot, Bingbot, Slurp (Yahoo) and Yandex all identify themselves with a user agent string. We use Pingdom to monitor our web sites, and it has a user agent string as well.
Some people will be freaked out that companies are gathering information but I really think this is justified. You have to build websites with your audience in mind. The freaky thing would be tying it to you personally.
Utilities like curl and wget have their own user agent, but that can be customized, obviously.