Text-to-speech script

Rosika · August 28, 2023, 4:45pm

Hi all,

this is a follow-up post to Accessibility in Linux.

We talked about accessibility in Linux distros and how certain implementations could be of some help.

@Mina commented:

Scripts are always appreciated.

So I'll post the script tts_script_7_deLuxe_focal.sh, which I wrote some time ago…
… in case anyone is interested in a TTS script that produces (hopefully) non-robotic output.

#!/bin/bash

# TTS program with a choice of ENG and GER texts, plus faster GER playback
# Tue 26 Feb 2019, version 4
# new function: fastd deletes full stops, commas and square brackets (for Wikipedia)
# new function: equalizer added for crisper audio
# new function: eng_mod deletes square brackets for ENG

# Function definitions

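# German, faster playback (scaletempo scale=1.15)
schnellerd() {
	# Called without an argument, so cat reads the pasted text from stdin until CTRL+D
	pico2wave -l=de-DE -w=/tmp/test.wav "$(cat ${1})"
	firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm /tmp/test.wav
}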

# German, normal speed
deu() {
	pico2wave -l=de-DE -w=/tmp/test.wav "$(cat ${1})"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm /tmp/test.wav
}

# English, normal speed
eng() {
	pico2wave -l=en-US -w=/tmp/test.wav "$(cat ${1})"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm /tmp/test.wav
}

# German, same tempo as schnellerd, with full stops, commas and square brackets removed
fastd() {
	# Save the pasted text (stdin, ended with CTRL+D), then filter it with sed
	cat > /tmp/kgw1.txt
	sed 's/\.//g;s/\,//g;s/\[//g;s/\]//g' /tmp/kgw1.txt > /tmp/kgw2.txt
	pico2wave -l=de-DE -w=/tmp/test.wav "$(cat /tmp/kgw2.txt)"
	firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm /tmp/test.wav
	rm /tmp/kgw1.txt
	rm /tmp/kgw2.txt
}

# English, with square brackets removed from the text
eng_mod() {
	cat > /tmp/kgw1.txt
	sed 's/\[//g;s/\]//g' /tmp/kgw1.txt > /tmp/kgw2.txt
	pico2wave -l=en-US -w=/tmp/test.wav "$(cat /tmp/kgw2.txt)"
	firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
	rm /tmp/test.wav
	rm /tmp/kgw1.txt
	rm /tmp/kgw2.txt
}

Hauptskript() {
	echo "please choose desired language:"
	read -p "English=1, German=2, faster German=3, very fast German=4, eng_mod=5    " auswahl
	case "$auswahl" in
		1) eng
		;;
		2) deu
		;;
		3) schnellerd
		;;
		4) fastd
		;;
		5) eng_mod
		;;
		*) echo "programme will shut down."
		;;
	esac
	sleep 2.0
}

# Main part
while true
	do
		read -p "start programme? Input:  j, n     "   a
		if [ "$a" == "j" ]
			then
				Hauptskript				
		else 
			echo "programme will shut down." 
			sleep 2.0; exit
		fi
	done
sleep 2.0

Manual:

After starting the script in a terminal…

Enter “j” to start the script, or “n” to abort.
Choose “5” (eng_mod). This will be the best option for English-speaking folks.
Paste any text into the terminal with “CTRL+SHIFT+V”, then press “Enter”,
then “CTRL+D”.
The text will be read aloud.
At the next “start programme?” prompt, “n” will shut the script down.
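
To illustrate what options 4 and 5 do to the pasted text before it reaches pico2wave, here is a minimal sketch. The first two functions apply the same filtering as fastd and eng_mod (the function names are made up for this example). strip_footnotes is an assumed variant, not part of the script, that removes numeric Wikipedia-style markers such as [1] entirely instead of leaving the bare number behind.

```shell
#!/bin/bash

# Same effect as the sed call in eng_mod: drops only the bracket characters.
strip_brackets() {
    sed 's/\[//g;s/\]//g'
}

# Same effect as the sed call in fastd: also drops full stops and commas.
strip_punct_and_brackets() {
    sed 's/\.//g;s/,//g;s/\[//g;s/\]//g'
}

# Hypothetical variant (an assumption, not in the original script):
# drops whole numeric footnote markers such as [12].
strip_footnotes() {
    sed 's/\[[0-9][0-9]*\]//g'
}

# Usage:
#   printf 'Linux[1] is a kernel, not an OS.[2]\n' | strip_brackets
#     -> Linux1 is a kernel, not an OS.2
#   printf 'Linux[1] is a kernel, not an OS.[2]\n' | strip_punct_and_brackets
#     -> Linux1 is a kernel not an OS2
#   printf 'Linux[1] is a kernel, not an OS.[2]\n' | strip_footnotes
#     -> Linux is a kernel, not an OS.
```

With the script's own filters the footnote numbers are still spoken, so the third variant may be worth trying for Wikipedia text.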

Dependencies:

pico2wave:
sudo apt-get install libttspico-utils sox
mplayer
firejail (optional). If the sandbox is not wanted or needed, please edit the script and delete the firejail --whitelist='/tmp/*.wav' prefix in front of each mplayer call.
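
One way to make the sandbox optional without editing every function is sketched below. It assumes a hypothetical helper named play_wav and a made-up TTS_DRY_RUN variable (neither exists in the original script). firejail is put in front of mplayer only if it is installed, and with TTS_DRY_RUN=1 the command is printed instead of executed.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): play a WAV file with the
# same mplayer filters as eng/deu, sandboxed only when firejail is available.
play_wav() {
    local wav="$1"

    # Build the player command as positional parameters to keep quoting intact.
    set -- mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 "$wav"

    # Prepend the sandbox only if firejail is installed.
    if command -v firejail >/dev/null 2>&1; then
        set -- firejail '--whitelist=/tmp/*.wav' "$@"
    fi

    # TTS_DRY_RUN is an assumed variable name used for illustration only.
    if [ "${TTS_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage:
#   TTS_DRY_RUN=1 play_wav /tmp/test.wav   # print the command only
#   play_wav /tmp/test.wav                 # actually play the file
```

Each function in the script could then call play_wav /tmp/test.wav in place of its firejail/mplayer line; the faster variants would need the scaletempo filter added as well.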

Many greetings from Rosika

Mina · August 28, 2023, 7:24pm

This is nice!

I will try it out later today or tomorrow and will let you know what I think!

Mina · August 29, 2023, 10:13am

This is nice and dead easy to handle (good job, Rosika) and I like the integration of different speeds and the equalizer, yet to my ears, the output of pico2wave is still very unconvincing.

About 20 years ago, I already thought that the tech for having a computer read texts in a natural-sounding voice, like the one in Star Trek: The Next Generation, wasn’t too far in the future. I actually would have loved to have it talk like Iris Lettieri, the world-famous voice of Rio de Janeiro’s international airport for many years.

As it looks now, this is still far in the future.

nevj · August 29, 2023, 12:41pm

There must be someone who can pass a voice stream through DSP filters and turn it into a particular type of voice.
It's just a math problem.

Mina · August 29, 2023, 12:53pm

Of course, the technology exists and is used in films and more and more by scammers and fraudsters¹.

Still, it remains outside the realm of the normal user. There is also the question of personal data:
Our voice is a unique identifier of our person. Replicating it is like replicating our fingerprints.

¹That’s why, if nowadays you get a phone call from a family member, a relative or a friend asking you for personal information or money, always contact them by other means or find another way to make sure it’s them. Phone numbers can be faked, and voices too.

nevj · August 29, 2023, 1:20pm

So if it is everywhere except on a normal user's desktop, all the damage is done; they may as well let everyone have it.
How much effort would it take to re-engineer it?

Rosika · August 29, 2023, 1:33pm

Hi Mina,

thanks for trying it out, and thanks for the praise indeed.

O.K., it may be a matter of opinion and personal taste.
To me, however, pico2wave is a vast improvement over espeak and orca.

I've been using espeak and gespeaker for Swedish texts in the past (pico2wave doesn't cover that) and these are really hard to understand.

So, at least to my ears, pico2wave is way better. Accordingly, I considered it worth integrating into my script.

Thanks again, Mina.

Many greetings from Rosika

Mina · August 29, 2023, 3:26pm

Oh, you’re absolutely right on this. The improvement is indeed huge. The speech is very clear and understandable. No doubt!

Probably, my expectations are just too high.

Rosika · August 29, 2023, 3:31pm

Thanks, Mina, for your reply.

… and thanks indeed for your confirmation of this.

Well, one's expectations can never be too high in this technologically advanced world of ours, I guess.
Just think of what else is already possible…

Many greetings from Rosika

Mina · August 29, 2023, 3:36pm

Actually: A lot.

Good speech models need a huge amount of training data (and computing power), which is hard to come by, unless of course, you’re Amazon, Apple, Google, Microsoft, etc. and you’ve got millions of users happily giving you their voices for free.

nevj · August 30, 2023, 12:07am

If they are doing it that way, it is not really scientific at all.
We need a theory of speech that explains how to construct a voice signal and how to apply variations