Text-to-speech script

Hi all, :wave:

This is a follow-up post to Accessibility in Linux.

We talked about accessibility in Linux distros and how certain implementations could be of some help.

@Mina commented:

Scripts are always appreciated.

So I'll post the script tts_script_7_deLuxe_focal.sh, which I wrote some time ago…
… in case anyone might be interested in a TTS script that produces (hopefully) non-robotic-sounding output. :wink:

#!/bin/bash

# TTS program with a choice of ENG and GER texts and a faster GER speed
# Tue 26_Feb_2019, version 4
# new function: fastd deletes full stops, commas and square brackets (for Wikipedia)
# new function: equalizer for crisper audio
# new function: eng_mod deletes square brackets for ENG

# Function definitions

schnellerd() {
	# German at 1.15x tempo; text comes from the file in $1 or, if none is given, from stdin (end with CTRL+D)
	pico2wave -l=de-DE -w=/tmp/test.wav "$(cat "${1:--}")"
	firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}

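deu() {
	# German at normal tempo; text comes from the file in $1 or, if none is given, from stdin (end with CTRL+D)
	pico2wave -l=de-DE -w=/tmp/test.wav "$(cat "${1:--}")"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}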

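eng() {
	# English at normal tempo; text comes from the file in $1 or, if none is given, from stdin (end with CTRL+D)
	pico2wave -l=en-US -w=/tmp/test.wav "$(cat "${1:--}")"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}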

fastd() {
	# German at 1.15x tempo; strips full stops, commas and square brackets
	# from stdin (end with CTRL+D), so pico2wave makes fewer pauses
	local text
	text=$(sed 's/\.//g;s/,//g;s/\[//g;s/\]//g')
	pico2wave -l=de-DE -w=/tmp/test.wav "$text"
	firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}

eng_mod() {
	# English at normal tempo; strips square brackets from stdin (end with CTRL+D)
	local text
	text=$(sed 's/\[//g;s/\]//g')
	pico2wave -l=en-US -w=/tmp/test.wav "$text"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}

Hauptskript() {
	echo "please choose desired language:"
	read -r -p "English=1, German=2, faster German=3, very fast German=4, eng_mod=5    " auswahl
	case "$auswahl" in
		1) eng ;;
		2) deu ;;
		3) schnellerd ;;
		4) fastd ;;
		5) eng_mod ;;
		# any other input: nothing is played and the main loop asks again
		*) echo "invalid choice, returning to start." ;;
	esac
	sleep 2.0
}

# Main part
while true
do
	# "j" = German "ja" (yes); any other input quits
	read -r -p "start programme? Input:  j, n     " a
	if [ "$a" = "j" ]; then
		Hauptskript
	else
		echo "programme will shut down."
		sleep 2.0; exit
	fi
done
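The five player functions differ only in the pico2wave language, the tempo factor and the characters that get stripped. A minimal sketch of how they could share one helper, assuming the same pico2wave, mplayer and firejail options as the script; `af_chain` and `speak` are hypothetical names, not part of the original script:

```shell
#!/bin/bash
# Hypothetical refactoring sketch; all player options are copied from the script.

# Build the mplayer -af filter chain. A tempo of 1 skips scaletempo.
af_chain() {
	local tempo=$1
	local chain="lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7"
	if [ "$tempo" != "1" ]; then
		chain="scaletempo=scale=${tempo}:speed=pitch,${chain}"
	fi
	printf '%s\n' "$chain"
}

# speak LANG TEMPO SED_SCRIPT
# Reads text from stdin (end with CTRL+D), filters it with sed,
# synthesizes it with pico2wave and plays it in the firejail sandbox.
speak() {
	local lang=$1 tempo=$2 filter=$3 text
	text=$(sed "$filter")
	pico2wave -l="$lang" -w=/tmp/test.wav "$text"
	firejail --whitelist='/tmp/*.wav' mplayer -af "$(af_chain "$tempo")" -srate 44100 /tmp/test.wav
	rm -f /tmp/test.wav
}
```

Called as `speak en-US 1 's/\[//g;s/\]//g'` it would behave like eng_mod, and `speak de-DE 1.15 ''` like schnellerd (an empty sed script passes the text through unchanged).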

Manual:

After starting the script in a terminal:

  • Choose “j” to start the script, or “n” to abort.

  • Choose “5” (eng_mod). This is the best option for English speakers.

  • Paste any text into the terminal with “CTRL+SHIFT+V”, then press “Enter”.

  • Then press “CTRL+D” to end the input.

  • The text will be played.

  • At the “start programme?” prompt, “n” shuts the script down.
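Option 5 (eng_mod) deletes only the square brackets themselves, so a Wikipedia reference marker such as [1] is still read out as a number. A possible tweak, assuming a sed that supports `-E` (GNU sed does), is to drop whole numeric markers first and then any leftover brackets; `strip_refs` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical filter: remove Wikipedia-style reference markers such as
# [1], [23] or [citation needed], then any remaining square brackets.
strip_refs() {
	sed -E 's/\[([0-9]+|citation needed)\]//g; s/[][]//g'
}
```

It reads stdin and writes stdout, so it could stand in for the sed call in eng_mod.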

Dependencies:

  • pico2wave:
    sudo apt-get install libttspico-utils sox

  • mplayer

  • firejail (optional). If you don't want or need the sandbox, please rewrite the script :wink: …
    … e.g. delete every occurrence of firejail --whitelist='/tmp/*.wav'
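Instead of editing the firejail prefix out by hand, the script could fall back to plain mplayer when firejail is not installed. A minimal sketch using the bash builtin `command -v`; `missing_deps` and the `player` array are hypothetical names:

```shell
#!/bin/bash
# Print every command from the argument list that is not installed.
missing_deps() {
	local cmd
	for cmd in "$@"; do
		command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
	done
}

# Use the firejail sandbox only if firejail is available.
player=(mplayer)
if command -v firejail >/dev/null 2>&1; then
	player=(firejail --whitelist='/tmp/*.wav' mplayer)
fi

# Possible use before the main loop (pico2wave and mplayer are required):
#   missing=$(missing_deps pico2wave mplayer)
#   [ -n "$missing" ] && echo "please install: $missing"
# and for playback:
#   "${player[@]}" -af lavcresample=44100 -srate 44100 /tmp/test.wav
```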

Many greetings from Rosika :slightly_smiling_face:

This is nice!

I will try it out later today or tomorrow and let you know what I think!

This is nice and dead easy to handle (good job, Rosika), and I like the integration of different speeds and the equalizer. Yet to my ears, the output of pico2wave is still very unconvincing.

About 20 years ago, I already thought the technology for a computer to read texts aloud in a natural-sounding voice, like the one in Star Trek: The Next Generation, wasn't too far in the future. I actually would have loved to have it talk like Iris Lettieri, the world-famous voice of Rio de Janeiro's international airport for many years.

As it looks now, this is still far in the future.

There must be someone who can pass a voice stream through DSP filters and turn it into a particular type of voice.
It's just a math problem.

Of course, the technology exists and is used in films and, more and more, by scammers and fraudsters¹.

Still, it remains out of reach for the normal user. There is also the question of personal data:
our voice uniquely identifies us. Replicating it is like replicating our fingerprints.

¹That's why, if you get a phone call nowadays from a family member, a relative or a friend asking you for personal information or money, always contact them by other means or find another way to make sure it's them. Phone numbers can be faked, and so can voices.

So if it is everywhere except on a normal user's desktop, the damage is already done; they may as well let everyone have it.
How much effort would it take to re-engineer it?

Hi Mina, :wave:

thanks for trying it out, and thanks for the praise indeed. :heart:

O.K., it may be a matter of opinion and personal taste.
To me, however, pico2wave is a vast improvement over espeak and orca.

In the past I've used espeak and gespeaker for Swedish texts (pico2wave doesn't cover Swedish), and their output is really hard to understand.

So, at least to my ears, pico2wave is way better, and I considered it worth integrating into my script. :wink:

Thanks again, Mina.

Many greetings from Rosika :slightly_smiling_face:

Oh, you're absolutely right about this. The improvement is indeed huge. The speech is very clear and understandable. No doubt!

Probably, my expectations are just too high.

Thanks, Mina, for your reply.

… and thanks indeed for confirming this.

Well, one's expectations can never be too high in this technologically advanced world of ours, I guess.
Just think of what else is already possible… :wink:

Many greetings from Rosika :slightly_smiling_face:

Actually: A lot.

Good speech models need a huge amount of training data (and computing power), which is hard to come by unless, of course, you're Amazon, Apple, Google, Microsoft, etc., and you've got millions of users happily giving you their voices for free.

If they are doing it that way, it is not really scientific at all.
We need a theory of speech that explains how to construct a voice signal and how to apply variations to it.
