Question regarding potential length of input into a variable

Hi all, :wave:

I once wrote a tts-script (text-to-speech) which covers the languages English and German (with variations).
For that I was making use of the command pico2wave.

Now I´d like to implement Swedish in my script as well but pico2wave doesn´t cover that language. It´s a bit of a shame though as pico2wave
delivers really high quality of natural sound reproduction; not mechanical at all. :+1:

So for Swedish I´d like to implement espeak.
The voices here (f/m) sound mechanical indeed but may be tolerated due to the fact
that I tend to read along the respective text parallel to the output of the script. :blush:

For testing purposes I wrote this reduced simple script:

#!/bin/bash

read -p "Please enter text:   " input
echo $input > /tmp/kgw.txt
firejail espeak -s 160 -a 198 -vsv+f2 -f /tmp/kgw.txt

It works well, yet I have one question left to ask:

Is there a limit on how long (i.e. how many words or how many characters) the text written into the variable “input” may be? :thinking:

I tried it with a short text:

Välkommen till ett exklusivt event som samlar beslutsfattare och strateger inom it-säkerhet.
Ta del av erfarenhet och råd från experter, var med och diskutera med andra som har liknande utmaningar och
inspireras av nya insikter. På Security Day ryms såväl strategiska frågor som praktiska utmaningar.

and it worked without any hiccups. But that´s still a pretty short text.

Many thanks in advance for your help.

Many greetings.
Rosika :slightly_smiling_face:
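
On the length question itself: bash sets no fixed limit on the size of a variable; in practice the limit is available memory. The limit people usually run into is the kernel's cap on arguments passed to external commands (see `getconf ARG_MAX`), which does not affect shell builtins such as echo or printf. A minimal sketch to check this on your own machine; the variable names and the test size of 950,000 characters are made up for illustration:

```bash
# Build a test string: 50,000 copies of a 19-character chunk = 950,000 characters.
big=$(awk 'BEGIN { for (i = 0; i < 50000; i++) printf "%s", "Jag tycker om dig. " }')

# ${#big} is the length of the value stored in the variable.
echo "characters in variable: ${#big}"

# printf is a shell builtin, so the argument-size limit for external
# commands does not apply when the variable is written to a file.
tmpfile=$(mktemp)
printf '%s\n' "$big" > "$tmpfile"
bytes=$(wc -c < "$tmpfile")
echo "bytes in file: $bytes"
rm -f "$tmpfile"
```

Whether espeak copes with a file of that size is a separate question that this sketch does not test.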

EDIT:

Seems I spoke too soon. :frowning_face:

There seems to be an additional problem with the variable:

Writing into the variable by copy-and-paste works well as long as there are no empty lines and in fact even no new lines.
So from the following text

Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.

Only the first two instances of “Jag tycker om dig.” are written into the text-file:

cat /tmp/kgw.txt
Jag tycker om dig. Jag tycker om dig.

and thus played back. :frowning_face:

So it seems there´s still some work to be done as far as variable input is concerned.

Many greetings.
Rosika :slightly_smiling_face:
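
A likely explanation for the truncation: the read builtin consumes input only up to the first newline, so everything after the first line never reaches the variable. The variable itself can hold line breaks and empty lines without trouble. A minimal sketch of the difference, with a here-document standing in for the pasted text (all variable names are made up for illustration):

```bash
# Sample text with a line break and an empty line, as in the post above.
sample='Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.'

# read stops at the first newline, so only line one is stored.
read -r first_only <<EOF
$sample
EOF

# $(cat) captures everything up to end-of-input (CTRL+D at a terminal).
whole=$(printf '%s\n' "$sample" | cat)

# Quote the variable when writing it out; unquoted, the shell splits it
# into words and echo joins them again with single spaces.
tmpfile=$(mktemp)
printf '%s\n' "$whole" > "$tmpfile"
lines=$(wc -l < "$tmpfile")
rm -f "$tmpfile"

printf 'read stored: %s\n' "$first_only"
printf 'cat stored %s lines\n' "$lines"
```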

Hmm,

I tried the following command:

cat prov1.txt | tr -d '\n' > oneline.txt

whereby prov1.txt consists of:

Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.

oneline.txt now consists of

cat /tmp/kgw.txt
Jag tycker om dig. Jag tycker om dig.Jag tycker om dig.Jag tycker om dig.

That´s still not good enough as the tr-command doesn´t leave a whitespace after the preceding period. That still has to be accounted for as otherwise the period is spelled out by espeak. :thinking:

Many greetings
Rosika :slightly_smiling_face:

Perhaps I´ve got it now:

  • original prov1.txt:

Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.

  • changing it into oneline.txt:
    cat prov1.txt | tr -d '\n' > oneline.txt
    yields:

Jag tycker om dig. Jag tycker om dig.Jag tycker om dig.Jag tycker om dig.

  • inserting an additional space after period:
    sed 's/\./\. /g' oneline.txt > oneline2.txt

The result would be:

cat /tmp/oneline2.txt
Jag tycker om dig. Jag tycker om dig. Jag tycker om dig. Jag tycker om dig.

Seems to work now. :blush:

Many greetings from Rosika :slightly_smiling_face:
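
An alternative to deleting the newlines and then repairing the spacing with sed is to let tr turn each newline into a space. Note that the sed step above adds a space after every period, including those that already had one. A minimal sketch, assuming it is acceptable that `-s` also squeezes any other runs of spaces in the text down to one:

```bash
# Same content as prov1.txt above.
sample='Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.'

# Translate every newline to a space, then squeeze repeated spaces (-s),
# so the empty line does not leave a double space behind.
oneline=$(printf '%s\n' "$sample" | tr -s '\n' ' ')

# Brackets make the trailing space visible.
printf '[%s]\n' "$oneline"
```

The final newline also becomes a space, so the result ends with one trailing blank.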

Uh,
there still seems to be a problem when there is an empty line in the original text.
In this case only this very first line is processed and all the subsequent ones are discarded. :frowning_face:

Perhaps writing a longer text into a variable turns out to be a bit too complicated for this project of mine. :thinking:

Many greetings.
Rosika :slightly_smiling_face:

I think it would depend on buffers in memory of the espeak program.

It is possible for a program to read an almost infinite input stream, as long as it processes each bit of data on the fly then discards it. However if it stores the stream, there are limits.

File size limit is hardly likely to enter into it, but that is set in ulimit and is probably set to unlimited, so it would just depend on disk space.

Have you heard of DSP (digital signal processing)? It does stuff like that on endless input streams in real time.

Interesting issue
Neville


Hi Neville, :wave:

thanks a lot for your valued opinion once again. :heart:

First of all:

I think I found a good - and very simple - solution to my problem. I guess I was thinking “around too many corners”. :blush:

Making use of the cat-command takes care of my input in the script. No need to put a lengthy string into a variable.

My script looks like this now:

#!/bin/bash

echo "Please enter text:"
echo
cat > /tmp/kgw.txt
firejail espeak -s 160 -a 198 -vsv+f2 -f /tmp/kgw.txt

So I just have to copy-and-paste any (even longer) text into the terminal.
To signal an end to the input and proceed with the rest of the script: press CTRL+D.

The rest is this:

The input is redirected to the text-file /tmp/kgw.txt.
From there espeak takes over and reads the text with a female Swedish voice (though somewhat “metallic”), a bit slower than standard speed (160 instead of 175) and a bit louder than standard (198 instead of 100). :slightly_smiling_face:

No need to cater for whitespaces or empty lines in the text. Everything´s processed perfectly by default.
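
The same behaviour can be checked without a terminal or espeak. In this sketch, capture_input is a hypothetical helper that wraps the `cat > file` step, and a here-document plays the part of the pasted text:

```bash
# capture_input (hypothetical name) copies standard input to the file
# named in $1 until end-of-input, which is CTRL+D at a terminal.
capture_input() {
    cat > "$1"
}

tmpfile=$(mktemp)

# The here-document stands in for text pasted into the terminal.
capture_input "$tmpfile" <<'EOF'
Jag tycker om dig. Jag tycker om dig.
Jag tycker om dig.

Jag tycker om dig.
EOF

total_lines=$(wc -l < "$tmpfile")
empty_lines=$(grep -c '^$' "$tmpfile")
echo "lines stored: $total_lines, of which empty: $empty_lines"
rm -f "$tmpfile"
```

All four lines arrive in the file, the empty one included, because cat copies its input unchanged.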

Secondly:

I see. Thanks for the info.

O.K.

The original problem seems to have been that a variable isn´t well suited for my purposes here: as soon as an empty line enters the stage there will be problems (the rest of the text is discarded, it seems).

Well, I´m glad that cat seems to be better (if not perfectly) suited for the purpose. :blush:

I guess at some point in the past I heard of it but surely never looked into the matter.
Out of interest (and time permitting) I will endeavour to do so.

Thanks a lot for the hint and thanks for your help in general. :+1:

Many greetings
Rosika :slightly_smiling_face:

You need a modified cat that deletes empty lines
Maybe you could use awk or sed … they are filters like cat but they can modify the input stream before passing it on.

If you are interested I can rake up some awk examples … old stuff, I can't remember it, will have to search.

What you are doing is basically DSP. That is what it is about … modifying a continuous input stream with software. It is mostly concerned with communications and filtering out noise.

Your scripting skills are quite good. You would cope with awk or sed.

Regards
Neville

PS This came from wikipedia

 For example, the following uses the d command to filter out lines that only contain spaces, or only contain the end of line character:

sed ‘/^ *$/d’ inputFileName


Hi again Neville, :wave:

thanks for your latest post.

O.K. That got me thinking.

For experiment´s sake I went to the site Security Day 2022 - Computer Sweden and provided the following text to the script:

En dag om it-säkerhet – hot, lösningar och strategier

Välkommen till ett exklusivt event som samlar beslutsfattare och strateger inom it-säkerhet. Ta del av erfarenhet och råd från experter, var med och diskutera med andra som har liknande utmaningar och inspireras av nya insikter. På Security Day ryms såväl strategiska frågor som praktiska utmaningar.

Det här är ett måste-event för dig som är CISO, CSO, CIO eller liknande, där du får höra mer om de senaste hotbilderna och får ta del av råd och andras erfarenheter kring säkerhetsstrategier och skydd.

Bland ämnena vi vill diskutera finns:

Aktuella hotbilder i en osäker omvärld
Ransomware angår alla – strategier, lärdomar och praktiska råd
Resilience – så rustar du en motståndskraftig organisation
Att säkra ett land – Sveriges cybersäkerhet i fokus
Desinformation och social ingenjörskonst – mjuka delar som kräver hårt motstånd
Efter Log4j: Open source – mindre säkert än du tror?
Kompetensbristen – så lockar du säkerhetsproffsen
Komplexitetspusslet – säkerhet när både data och medarbetare sprids ut
Krishantering – så agerar du som ledare när det värsta händer
Security by design – bygg säkert från början
Zero trust på riktigt – vem litar du på?
Så kan artificiell intelligens öka säkerheten

As can be seen there are several empty lines within the text which are accordingly taken over by the script.
A look at cat /tmp/kgw.txt confirmed it.

The script read out the text in Swedish without any hiccups and omissions.
I guess espeak can deal with whitespaces. :slightly_smiling_face:
It seems it was just the variable-variant which couldn´t work well with them (at least not without some processing first) … :thinking:

So normal cat seems to work with espeak.

Yes, I´m still interested in those. Thanks so much. :heart:

Up to now I´ve sometimes used sed, so awk examples would be quite interesting. :smiley:

I see. Good to know. Thanks for providing a concise definition.

Many greetings.
Rosika :slightly_smiling_face:


Some examples I created:

@nevj and @Akito

Hi again, :wave:

sed ‘/^ *$/d’ kgw.txt # kgw.txt holding my example from above

I tried it but for the life of me couldn´t get it to work. It kept failing :slightly_frowning_face: .

env LANG=en_US.UTF-8  sed ‘/^ *$/d’ kgw.txt
sed: -e expression #1, char 1: unknown command: `�'

I have no idea why that is… :thinking:

@Akito:

thanks for this great link of yours.
Yet it seems I might not be up to the challenge. That´s certainly much too complicated for me. :blush:

Many greetings to you all.
Rosika :slightly_smiling_face:

That space between caret and star may be the problem
or try <kgw.txt

I will have a go on the PC tomorrow
Cheers
neville


Hi Neville, :wave:

Thanks a lot. But no need to hurry.

Seems likely I won´t be online tomorrow anyway. Sorry. :frowning_face:

Yet - just to let you know - I tried the other variants you suggested. But they kept failing as well:

env LANG=en_US.UTF-8  sed ‘/^*$/d’ kgw.txt
sed: -e expression #1, char 1: unknown command: `�'

env LANG=en_US.UTF-8  sed ‘/^ *$/d’ <kgw.txt
sed: -e expression #1, char 1: unknown command: `�'

Many greetings from Rosika :slightly_smiling_face:

OK, let me sleep on it.
Something to do with utf8 maybe?
Our internet may be out tomorrow too. Phone line repairs after floods.

Neville


There are some really useful things there.
Never used bash functions, but I like it
Neville

It is a shell problem.
Take the file

[nevj@trinity ~]$ cat txt
bs

dfgh
gh

Then in the bash shell try

[nevj@trinity ~]$ sed "/^ *$/d" txt
bs
dfgh
gh
[nevj@trinity ~]$ sed ‘/^ *$/d’ txt
sed: -e expression #1, char 1: unknown command: `�'

So in bash you need double quotes
But go to the Bourne shell

[nevj@trinity ~]$ sh
$ sed "/^ *$/d" txt
bs
dfgh
gh
$ sed '/^ *$/d' txt        
bs
dfgh
gh

and you can use either single or double quotes
The space between ^ and * is needed.

Regards
Neville
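
A closer look at the failing commands suggests a different cause: they use typographic quotes (‘ ’), which forum software often substitutes for the ASCII apostrophe, and the command that succeeded in sh was typed with straight quotes. No shell treats ‘ as a quote character, so sed receives it as the first character of its script, which matches the "char 1: unknown command" message. Under that assumption bash accepts straight single or double quotes equally. A minimal sketch, using a temporary file with the same kind of content as txt:

```bash
txt=$(mktemp)
printf 'bs\n\ndfgh\ngh\n\n' > "$txt"

# Straight single quotes: the pattern reaches sed unchanged.
with_single=$(sed '/^ *$/d' "$txt")

# Straight double quotes work too, because $ followed by / is not
# a variable reference and is passed through literally.
with_double=$(sed "/^ *$/d" "$txt")

# Without the space, * directly after ^ is a literal asterisk,
# so empty lines no longer match and are kept.
no_space=$(sed '/^*$/d' "$txt")
rm -f "$txt"

printf '%s\n' "$with_single"
```

Single quotes are the safer habit for sed scripts, since the shell leaves everything between them untouched.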

Awk examples

The philosophy of awk is “give me a command and I will do it for every line of the input file”.
Elementary example

[nevj@trinity ~]$ cat rmnull.awk
{
if ($1 != NULL)
  print
}

That is an awk script. It says “if the first field of a line is not NULL print the line”
To run the script do

[nevj@trinity ~]$ awk -f rmnull.awk txt
bs
dfgh
gh

So it removes blank lines too. Works in either shell
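
A side note on rmnull.awk: NULL is not a keyword in awk. It is simply a variable that was never assigned, so it compares equal to the empty string, which is why the script works. A shorter and more common way to write the same filter is a bare pattern. A minimal sketch, assuming a temporary file similar to txt with one extra line that holds only spaces:

```bash
txt=$(mktemp)
printf 'bs\n\ndfgh\ngh\n   \n' > "$txt"

# NF is the number of fields on the current line. A pattern without an
# action prints the line whenever the pattern is non-zero.
by_nf=$(awk 'NF' "$txt")

# The explicit form of the comparison used in rmnull.awk.
by_empty=$(awk '$1 != ""' "$txt")
rm -f "$txt"

printf '%s\n' "$by_nf"
```

Both forms also drop lines that contain only spaces or tabs, since such lines have no fields.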

Another example

[nevj@trinity Awk]$ cat sum.awk
        { s = s + $1 }
END     { print s } 

This sums the first data field over all lines and prints just the sum. The END part defines something that is done after the file is processed. There is also a BEGIN.
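
For a quick try without a script file, the same program can be given inline. This sketch adds a BEGIN block only to show where it runs; the three input numbers are made up:

```bash
total=$(printf '3\n4\n5\n' | awk '
BEGIN { s = 0 }        # runs once, before the first line is read
      { s = s + $1 }   # runs for every input line
END   { print s }      # runs once, after the last line
')
echo "$total"
```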

Another case

[nevj@trinity Awk]$ cat cut.awk
{print $4,$3,$10,$11,$12,$13,$14,$15,$16,$17 }

This selects some fields and reorders them.
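
A smaller illustration of the same idea, with made-up input. By default awk splits fields on whitespace; the -F option sets a different separator:

```bash
# Pick fields 4, 3 and 1 from a whitespace-separated line.
reordered=$(printf 'one two three four\n' | awk '{ print $4, $3, $1 }')

# The same with a colon as field separator.
by_colon=$(printf 'a:b:c\n' | awk -F: '{ print $3, $1 }')

echo "$reordered"
echo "$by_colon"
```

The comma in print inserts the output field separator, a single space by default.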

Awk scripts can be more complicated than these examples.

Regards
Neville

I personally wouldn’t recommend this classic shell. Bash is already a masochist’s dream and the classic shell is pretty much the same, except its whips are made out of steel rather than leather.

The old recommendation used to be… use csh for interactive shell and sh for scripts. What is it today?
I rather like csh

Neville

Everyone uses Bash, unless they are not able to use a recent Bash, or any Bash at all (Mac OSX, *BSD, etc…).

well Void is peculiar.
It uses bash for users, but drops back to sh for superuser?