Counting of characters in a text-file

Hi all, :wave:

I'm getting a bit confused about word count, or rather letter count. :blush:

Example:

I've created the following text file:

Hi, this is an experiment.

and saved it as versuch.txt.

This file consists of 20 letters plus a comma and a full-stop (in total 22 characters without the blank spaces).

I want to count the characters without the blank spaces.

On https://www.lostsaloon.com/technology/how-to-count-characters-words-and-lines-in-a-file-in-linux/ I found some help for achieving this goal.

Yet there still seems to be one issue left: :thinking:

  • wc -m versuch.txt
    and
    wc -c versuch.txt

give me 27 as a result. Why might that be? I know here the blank spaces are counted as well; but there are just 4 of them. So that should be 26 instead of 27, right :question:

  • Here I get the same discrepancy:

cat versuch.txt | tr -d '[:blank:]' | wc -m

Without the blank spaces, i.e. counting only the letters, the comma and the full stop, there should be 22 visible characters, but the result is 23.

Does any of you have an idea why the count doesn't seem to be correct (off by one) :question:

Thanks a lot in advance.

Many greetings from Rosika :slightly_smiling_face:

It may be counting the linefeed at the end of each line.

If you want to see all the characters, use od -a
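
A minimal sketch of how the linefeed would explain both numbers, assuming the file holds exactly the sentence from the question plus one trailing newline. The variable names are only for illustration, and `printf` stands in for the file on disk:

```shell
# The sentence from the question; printf '%s\n' adds the newline an editor appends on save.
line='Hi, this is an experiment.'

# Every character, newline included.
total=$(printf '%s\n' "$line" | wc -m | tr -d ' ')

# [:blank:] covers only spaces and tabs, so the newline is still counted.
no_blanks=$(printf '%s\n' "$line" | tr -d '[:blank:]' | wc -m | tr -d ' ')

# [:space:] also covers newlines, leaving only the visible characters.
visible=$(printf '%s\n' "$line" | tr -d '[:space:]' | wc -m | tr -d ' ')

echo "total=$total no_blanks=$no_blanks visible=$visible"
# total=27 no_blanks=23 visible=22
```

The trailing `tr -d ' '` only strips the padding that some `wc` implementations print in front of the number.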

Linefeed looks likely to me. I used xxd to display the file in hex and there is a trailing linefeed.

$ xxd versuch.txt 
00000000: 4869 2c20 7468 6973 2069 7320 616e 2065  Hi, this is an e
00000010: 7870 6572 696d 656e 742e 0a              xperiment..

Hi again, :wave:

@nevj:

Thanks.

Well, I tried od -a. According to the man page that should be it. :thinking: I hope it's alright this way…

Here's the result:


0000000   H   i   ,  sp   t   h   i   s  sp   i   s  sp   a   n  sp   e
0000020   x   p   e   r   i   m   e   n   t   .  nl
0000033

I guess the “nl” part means newline :question:
That would count as another character then, right?
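
The offsets in the left column of the od output can serve as a cross-check, since od prints them in octal by default and the last line is the total length. A small sketch, assuming a POSIX-style shell such as bash or dash, where a leading 0 marks an octal constant in arithmetic (zsh treats this differently unless configured otherwise):

```shell
# Offsets copied from the od output above; the variable names are just for illustration.
first_row=$((0000020))    # offset at which the second output row starts
end_offset=$((0000033))   # final offset, i.e. the total number of bytes

echo "second row starts at byte $first_row"   # 16
echo "file length in bytes: $end_offset"      # 27
```

So the final offset agrees with the 27 reported by wc: 26 characters of text plus the nl.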

@pdecker:

Thanks to you as well.

Right, I got the same result.
I noticed two full stops at the end here. Sure, this way it's 23 instead of 22 visible characters.
Again a sign of newline perhaps… :question:

Thanks a lot to both of you. :heart:

Many greetings from Rosika :slightly_smiling_face:

The xxd command shows the hex representation on the left and text on the right. What looks like two dots at the end on the right is really 2e 0a in hex. That last character is a newline. In Windows you’d normally see a carriage return and line feed pair (13 10 decimal or 0D 0A in hex).
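
If xxd is not at hand, the last byte can also be inspected with tail and od. This is only a sketch: it writes the sentence to a temporary file (a stand-in for versuch.txt) so that it can run anywhere.

```shell
# Hypothetical scratch file standing in for versuch.txt.
tmp=$(mktemp)
printf 'Hi, this is an experiment.\n' > "$tmp"

# tail -c 1 keeps only the final byte; od -An -tx1 prints it as hex without the offset column.
last_byte=$(tail -c 1 "$tmp" | od -An -tx1 | tr -d ' \n')

echo "last byte: $last_byte"   # 0a, i.e. a linefeed
rm -f "$tmp"
```

A file saved with Windows line endings would show 0a here as well, with 0d as the byte before it.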

I thought it was kind of odd that vi added the newline, since I didn’t hit Enter in the editor. I just typed in the text and saved it.

That is odd. I want to look at that

@Rosika Yes, nl is a newline character, also called linefeed or \n.
It counts as a character.
Files made in Windows have carriage return and linefeed: two characters per line ending instead of one.
It dates back to the days of teletypes.

Cheers
Neville
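
A quick sketch of what the two line-ending styles do to the count, using printf to produce both variants. The sample text and variable names are made up for the demonstration:

```shell
# Unix style: the line ends with a single linefeed.
unix_len=$(printf 'abc def\n' | wc -c | tr -d ' ')

# Windows style: carriage return plus linefeed.
dos_len=$(printf 'abc def\r\n' | wc -c | tr -d ' ')

# Deleting the carriage returns brings the count back to the Unix value.
stripped_len=$(printf 'abc def\r\n' | tr -d '\r' | wc -c | tr -d ' ')

echo "unix=$unix_len dos=$dos_len stripped=$stripped_len"
# unix=8 dos=9 stripped=8
```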

Tried it. You are right, it adds a newline automatically.
If I type in a ‘return’ (i.e. the Enter key), it adds another newline, and I get this:

nevj@trinity:~$ od -a test2.txt
0000000   a   b   c  sp   d   e   f  nl  nl
0000011

I don't like that. Doing things behind my back.

Hi all, :wave:

thanks for your latest comments.

Quite.
I used FeatherPad to create the text but otherwise did it just like you. Basically there shouldn't have been a newline character, I guess. :thinking:

Right you are. :wink:

Thanks for the confirmation. :+1:

I've now tried out the xxd command and indeed arrived at this:

I typed “Hallo” (with variations) 4 times:

  • 1.) “Hallo” with no carriage return. # Had to press Ctrl-D twice to quit though
  • 2.) “Hallo” with carriage return # Here I got the dot for the newline, thanks to @pdecker's explanation
  • 3.) “Hallo.” # Had to press Ctrl-D twice to quit
  • 4.) “Hallo.” with carriage return # two dots, the last one for the newline

Well, I think all's clear now.

Thanks so much to @nevj and @pdecker for teaching me something new again. :wink:

Many greetings from Rosika :slightly_smiling_face:

The most precise solution would be to explicitly whitelist the characters that are supposed to be counted. For example, whitelist all German characters, plus punctuation, etc. Then you will get precisely the character count you are looking for.
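
One possible way to sketch the whitelist idea is with tr, whose -c option complements the set and -d deletes, so everything outside the whitelist is dropped before counting. The whitelist below is only an assumption for the example sentence (ASCII letters, comma, full stop). As far as I know, GNU tr works on bytes, so umlauts and ß in a UTF-8 file would need a different tool.

```shell
# Hypothetical whitelist: ASCII letters plus comma and full stop. Extend it as needed.
allowed='A-Za-z.,'

# tr -cd deletes every character that is NOT in the whitelist, including spaces and the newline.
count=$(printf 'Hi, this is an experiment.\n' | tr -cd "$allowed" | wc -c | tr -d ' ')

echo "$count"   # 22
```

For a real file, the printf part would be replaced by reading the file, e.g. `tr -cd "$allowed" < versuch.txt | wc -c`.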

Thanks @Akito for the suggestion.

Great idea. I'll still have to find out how to do it. But this should be the perfect solution. :+1:

Many greetings
Rosika :slightly_smiling_face: