Problems with txt2html

Hi all, :wave:

As I need to convert some text files to html files (and keep the original structure of the text file) I installed the package txt2html.

It should work but I guess I´m not doing it right. Here´s an example:

Original text file:

cat test.txt 
Hi, this is a test.

I´m going to perform a test with txt2html.

Hope it´ll be working.

Then I performed the command

txt2html --infile test.txt --outfile output.html

on it and got the following result:

(screenshot of the txt2html output)

Clearly there´s something wrong with the character set. :thinking:
Any ideas how I could get it right :question:

Many thanks in advance and many greetings from Rosika. :slightly_smiling_face:

Update:

Seems I found the correct parameter: --eight_bit_clean.

The correct command would be:

txt2html --eight_bit_clean --infile test.txt --outfile output.html

This should produce the correct output:

(screenshot of the corrected output)

Yes, it worked :wink:

Cheers from Rosika :slightly_smiling_face:


That is a strange name.
Do you know what it does?

--eight_bit_clean | -8
    If false, convert Latin-1 characters to HTML entities. If true, this conversion is disabled. (default: false) 

Are you using Latin1? Most computers today use UTF-8
Why would HTML treat an apostrophe differently?

’ is apostrophe
` is grave
I can not type left quote or right quote… not on my KB
What did you use to type that text?
It is non-ASCII.
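
One possible reading of the character-set problem, as a minimal sketch only: on a UTF-8 system the acute accent ´ (U+00B4) is stored as the two bytes C2 B4, and a tool that assumes Latin-1 input sees two separate characters, Â and ´. The variable names are made up for illustration, the sketch assumes `od` and `iconv` are available, and it does not show what txt2html itself prints.

```shell
# The acute accent U+00B4 as UTF-8 bytes, written in octal so any POSIX printf accepts it.
acute=$(printf '\302\264')

# Hex dump of those bytes (spaces and newlines removed).
acute_hex=$(printf '%s' "$acute" | od -An -tx1 | tr -d ' \n')
echo "$acute_hex"      # c2b4

# Read the same bytes as Latin-1 and re-encode as UTF-8: two characters come out.
misread=$(printf '%s' "$acute" | iconv -f ISO-8859-1 -t UTF-8)
misread_hex=$(printf '%s' "$misread" | od -An -tx1 | tr -d ' \n')
echo "$misread_hex"    # c382c2b4, i.e. "Â" followed by "´"
```

That would fit the kind of extra character showing up in front of the accent, but it is only an illustration of the mismatch, not a trace of the program.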


Is that the left single quote? Mine is on the same key with tilde. As opposed to the right single quote being on the same key as double quote.

I never thought about it before. It must be the software that formats the double quote since there is only one of them.

Sheila


No, grave is not the single left quote.
Discourse changed what I typed.

` is grave
' is apostrophe

Curly quote marks are not in the ASCII character set, and not on my keyboard.

linux - Keyboard types diacritics (¨) instead of double quotes - Super User

If you are using HTML there are codes for curly quotes

Don't put curly quotes in code or scripts … use the straight quote (apostrophe).
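
As a small sketch of the HTML codes mentioned above: the standard named entities for curly quotes are `&lsquo;`, `&rsquo;`, `&ldquo;` and `&rdquo;`. The snippet replaces a right single curly quote with its entity using `sed`; the sample text and variable names are made up for illustration.

```shell
# Sample text with a right single curly quote (U+2019), written as octal UTF-8 bytes.
sample=$(printf 'it\342\200\231s')

# The curly quote itself, kept in a variable so the script stays pure ASCII.
rsquo=$(printf '\342\200\231')

# Replace it with the HTML entity. The & is escaped because sed treats
# a bare & in the replacement as "the matched text".
converted=$(printf '%s\n' "$sample" | sed "s/$rsquo/\\&rsquo;/g")
echo "$converted"   # it&rsquo;s
```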

This does not answer @Rosika’s question, sorry


Grave accent: what is this used for?

I don’t think I have ever used it, although it does look like a single quote from the left.

Thanks,
Sheila


Rather than using a conversion program that has these character-set issues, why not use Word and simply select “Save as HTML”? This gets around the problem, but it introduces others in the form of excess HTML code.


It is not an accent, it is a character. Also called a backquote.
Used

  • in shell scripts to indicate the result of a command is to be substituted
  eg     HOME=`pwd`
            ls ` cat filename`

In shellscriptspeak it is called a command substitution operator.

  • in markdown single ‘`’ and triple ‘```’ used to keep stuff verbatim
  • in maths used as a superscript to indicate transpose of a matrix
  • there must be other uses

The various types of quotes are a big problem in typesetting


And also “backtick”… In Discourse - it’s used to format monospace text (e.g. when you select a paragraph - something you might copy and paste from a terminal) and press the “</>” button - if it’s a single line - it will prefix and suffix that line with the backtick / backquote char " ` " - but a whole paragraph with " ``` "…

And as @nevj mentioned - you can use it to set the output of a command into a variable - e.g. in the old KSH / Kornshell and C shell

PROG=`basename $0`

But that’s “deprecated” - in modern shells we don’t really even use the backtick (that I can think of anyway) - above now would be

PROG=$(basename $0)
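
One practical difference, sketched here for a POSIX-style shell: `$(...)` nests without extra escaping, while nested backticks need backslashes. The path is a made-up example.

```shell
# Hypothetical path, used only for illustration.
path=/usr/local/bin/tool

# Nested $(...): the inner dirname runs first, then basename.
with_dollar=$(basename "$(dirname "$path")")

# The same with backticks: the inner pair has to be escaped.
with_backticks=`basename \`dirname $path\``

echo "$with_dollar $with_backticks"   # bin bin
```

Both forms give the same result; the `$(...)` version is just easier to read and to quote correctly.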

Right, showing my age again… I still use csh.
I think it still works in bash? Does in sh.

Yes, tried this in Termux:

~ $ x=`pwd`
~ $ echo $x
/data/data/com.termux/files/home

I think termux is bash?


Yeah - it still works - I didn’t say it didn’t work - just that in “common practice” that’s considered “deprecated” usage - using `backticks` is showing your age mate :smiley:

I think the default in Termux is bash - yes… I always switch all my stuff to ZSH… And macOS these days defaults to ZSH…


Hi all, :wave:

thank you very much for so many replies. :heart:

@nevj :

Not exactly, but you looked it up in the man pages. Thanks.

Well I got inspired by the remark of a member of the ubuntuusers forum.
Another user seemed to have encountered a similar problem.

The solution/workaround was:

Try using the option --eight_bit_clean, […] However, this would still not be HTML-compliant.

Seems to have worked well enough for me. :wink:

No, Neville. My system is set to UTF-8 as well.

The background to having to resort to text —> html is the phenomenon I encountered in the topic discussed here.

I needed to export a bunch of e-mails from thunderbird.

The addon “ImportExportTools NG”, which always worked perfectly in the past, gave me some trouble as of late. The exported e-mails (in html format) displayed a lot of weird characters.
It was pretty annoying.

Yet I can export the e-mail in plain text format too, which works well.
But I need the final result to be displayed in html format.

txt2html does the trick, but I need the parameter --eight_bit_clean to be set. Otherwise it would introduce some weirdness itself.
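
For a whole folder of exported mails, a loop along these lines might help. It is only a sketch: it reuses the options from the command above, assumes the exports end in `.txt`, and the helper `html_name` is made up for illustration.

```shell
# Hypothetical helper: derive the output name, e.g. mail.txt -> mail.html
html_name() {
    printf '%s\n' "${1%.txt}.html"
}

for f in *.txt; do
    # Skip the literal pattern when no .txt files exist.
    [ -e "$f" ] || continue
    # Only call txt2html if it is installed.
    if command -v txt2html >/dev/null 2>&1; then
        txt2html --eight_bit_clean --infile "$f" --outfile "$(html_name "$f")"
    fi
done
```

Each `.txt` file gets an `.html` file of the same name next to it; the originals are left untouched.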

I used my “Lenovo Lenovo Black Silk USB Keyboard” with QWERTZ layout, as always:


(from Tastaturbelegung – Wikipedia)

Thanks also for the links, Neville.

I never knew that.

Seems to be a difference between ` and ´.
I could produce an “é” but not an “e” with grave.

Also: to produce a grave you need just one click. To produce an aigu you need two clicks.

@Sheila_Flanagan

I used the “accent aigu” from my KB:

… in the original text.

@callpaul.eu :

Thanks for the suggestion.

I just tried it with libreoffice writer. It was a short test but it could handle the “accent aigu” issue well. :+1:

@daniel.m.tripp :

Thanks, Dan, for the additional interesting info.

Many greetings to you all.

Rosika :slightly_smiling_face:


Hi Rosika
I had a QWERTZ keyboard once… a machine built by Zeiss/Kontron.
Your keyboard has a key for left and right curly quotes… Now I can see how you were able to type it.
Normal US keyboards don't have that key. You would have to use hex codes.

You certainly started some discussion
Regards
Neville


Hi Neville, :wave:

thanks for your comments.

It´s interesting to see how different keyboards from different countries/cultures are laid out differently.

I´m still pondering about how to set a grave accent on top of an “a”. :thinking:

I guess I could install/activate the French language plugin in my “Keyboard layout extension”, accessible from the taskbar.
I do have German, Swedish and Hungarian activated so far.

But I still cannot believe an accent grave cannot be gotten hold of in a simpler way.

Perhaps this way…

Many greetings from Rosika :slightly_smiling_face:

Update:

Linux:
Press and hold the Ctrl, Shift, and U keys simultaneously. Then type the
code “0300” (without quotes), which is the Unicode value for the grave accent.

(How to Type a Grave Accent Mark on Any Keyboard - The Tech Edvocate)

Type the vowel first.

a, then: Ctrl+Shift+U, then 0300

gives you:

à


I never knew how to do that. Thank you
I think that would work regardless of which languages you have activated?

People should note that the above is different from the grave character, which has a key on most keyboards and is a full ASCII character, not an accent.
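
The difference can be made visible in the shell. A minimal sketch, assuming a UTF-8 system and the standard `od` tool, with illustrative variable names: the grave character is a single ASCII byte, while the combining grave accent U+0300 takes two bytes and attaches to the letter typed before it.

```shell
# ASCII grave / backtick, U+0060: one byte.
grave_hex=$(printf '\140' | od -An -tx1 | tr -d ' \n')

# "a" followed by the combining grave accent U+0300 (UTF-8 bytes CC 80).
combined_hex=$(printf 'a\314\200' | od -An -tx1 | tr -d ' \n')

# The precomposed letter U+00E0 looks the same on screen but is stored differently.
precomposed_hex=$(printf '\303\240' | od -An -tx1 | tr -d ' \n')

echo "$grave_hex $combined_hex $precomposed_hex"   # 60 61cc80 c3a0
```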


Hi Neville, :wave:

You´re welcome.

But it has to be noted that this KB shortcut method unfortunately depends on the editor you´re using.

As you can see it works on discourse. But it doesn´t work in my terminal.
It would be the preferred way to access it directly via the keyboard. But this works only with accent aigu, like: á (Pressing ´ first, then “a”).

That´s certainly the explanation of it.

Many greetings from Rosika :slightly_smiling_face:

P.S.:

Yes, I should think so as well.


Hi Rosika,
I am confused
I think the key your pink arrow points to is curly bracket… it is not a grave character.
You can see a grave on @Sheila_Flanagan's keyboard at the top left…
I don't think your KB has a grave key.
Don't bother, it is not important.
Regards
Neville


Hi Neville, :wave:

No, no. Actually the arrow points to the key which depicts the grave/aigu accents:

(image of the key with the grave and aigu accents)

Yes, it does. Grave and aigu are on the same key, as you can see. But they behave very differently.

Cheers from Rosika :slightly_smiling_face:


Couple of things

The insertion of special accented characters only works if you have an extended keyboard like on a desktop. If it's a laptop, then it does not work on every system.

If you have another language set installed, then yes, typing the character and then the letter will work, but again not on all.

Further option: if you use an on-screen keyboard, holding down a key such as e will display the possible options for e with accents.

Ę ě ĕ ə ẹ è é ê ë ē ė

Same with a, i, o, u, and of course
C ç ć ĉ č

But if you then convert it to html … it screws it up completely, and if you are using a browser to look at the html then it depends on the browser font and its support.

Typography is a 3 year course at some universities!
Especially if you get into design.

In France everybody has a funny accent, except me, but I am from Yorkshire where we don't need accents!

That last bit was a joke in case you did not follow !


Hi Paul, :wave:

thank you very much for some extended information. :heart:

It´s pretty embarrassing. I haven´t thought of that although I use onboard quite frequently (for my online-banking).
Just tried it out and, of course you´re right. I got all of these characters in a jiffy:

é è ê

Just a few examples.
Thanks for bringing it up.

Right. That´s the case with me.
Still: Accent aigu and grave behave differently. But @nevj
already provided a good explanation for that.

That´s o.k.
For demonstration purposes I just switched to Swedish:
Typing the Swedish “O”, that´s an “a” with a little circle on the top:

å, Å

… I had to hit “ü”, or “Ü” on my KB (the key to the right of “P”).

The only “problem” with that approach is you have to know the Swedish KB layout (or any other language layout for that matter) preferably by heart. :wink:

(from: Swedish - Keyboard Layout Info)

Many greetings from Rosika :slightly_smiling_face:


OK, I get it now.
In your original post, when you typed
I'm
you used an aigu (I don't have one, I used a single quote).
I thought you used a single curly quote.
Sorry for the confusion

I tried Paul’s key hold. It works on Android tablet.
äöü
Thanks Paul
