So I'll post the script tts_script_7_deLuxe_focal.sh, which I wrote some time ago…
… in case anyone might be interested in a TTS script that produces (hopefully) non-robotic-sounding output.
#!/bin/bash
# TTS program with a choice of ENG and GER language texts and a faster GER speed
# Tue 26_Feb_2019 version 4
# new function: fastd deletes full stops, commas and square brackets (for Wikipedia)
# new function: addition of equalizer function for crisper audio
# new function: addition of eng_mod: deletes square brackets for ENG
# Function definitions
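# German at 1.15x tempo; reads the text from file $1, or from stdin (end with CTRL+D) if no file is given
schnellerd() {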
pico2wave -l=de-DE -w=/tmp/test.wav "$(cat ${1})"
firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
rm /tmp/test.wav
}
# German at normal speed; text from file $1 or from stdin
deu() {
pico2wave -l=de-DE -w=/tmp/test.wav "$(cat ${1})"
firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
rm /tmp/test.wav
}
# US English at normal speed; text from file $1 or from stdin
eng() {
pico2wave -l=en-US -w=/tmp/test.wav "$(cat ${1})"
firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
rm /tmp/test.wav
}
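# German at 1.15x tempo with full stops, commas and square brackets removed, so fewer pauses; text from stdin
fastd() {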
cat > /tmp/kgw1.txt
sed 's/\.//g;s/\,//g;s/\[//g;s/\]//g' /tmp/kgw1.txt > /tmp/kgw2.txt
pico2wave -l=de-DE -w=/tmp/test.wav "$(cat /tmp/kgw2.txt)"
firejail --whitelist='/tmp/*.wav' mplayer -af scaletempo=scale=1.15:speed=pitch,lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
rm /tmp/test.wav
rm /tmp/kgw1.txt
rm /tmp/kgw2.txt
}
# US English with square brackets removed (e.g. for Wikipedia text); text from stdin
eng_mod() {
cat > /tmp/kgw1.txt
sed 's/\[//g;s/\]//g' /tmp/kgw1.txt > /tmp/kgw2.txt
pico2wave -l=en-US -w=/tmp/test.wav "$(cat /tmp/kgw2.txt)"
firejail --whitelist='/tmp/*.wav' mplayer -af lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7 -srate 44100 /tmp/test.wav
rm /tmp/test.wav
rm /tmp/kgw1.txt
rm /tmp/kgw2.txt
}
Hauptskript() {
echo "please choose desired language:"
read -p "English=1, German=2, faster German=3, very fast German=4, eng_mod=5 " auswahl
case "$auswahl" in
1) eng
;;
2) deu
;;
3) schnellerd
;;
4) fastd
;;
5) eng_mod
;;
*) echo "programme will shut down."
esac
sleep 2.0
}
# Main part
while true
do
read -p "start programme? Input: j, n " a
if [ "$a" == "j" ]
then
Hauptskript
else
echo "programme will shut down."
sleep 2.0; exit
fi
done
sleep 2.0
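To see what the bracket filter in fastd and eng_mod actually does to Wikipedia-style text, it can be tried on its own. The following is only a small sketch: `strip_brackets` wraps the same sed expression as eng_mod, while `strip_citations` is a hypothetical variant (not part of the script above) that drops numeric markers such as [12] completely, so the digits are not read aloud. Both function names and the sample sentence are made up for illustration.

```shell
#!/bin/bash
# Same filter as in eng_mod: deletes only the bracket characters themselves.
strip_brackets() {
    sed 's/\[//g;s/\]//g'
}

# Hypothetical variant (an assumption, not in the original script):
# deletes numeric citation markers like [12] including their digits.
strip_citations() {
    sed 's/\[[0-9][0-9]*]//g'
}

sample='Bash is a Unix shell[1] written by Brian Fox.[23]'

printf '%s\n' "$sample" | strip_brackets
# -> Bash is a Unix shell1 written by Brian Fox.23

printf '%s\n' "$sample" | strip_citations
# -> Bash is a Unix shell written by Brian Fox.
```

With the original filter the footnote numbers stay in the text and would be spoken; whether that matters depends on the kind of text being pasted in.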
Manual:
After starting the script in a terminal…
Just choose "j" to start the script, or "n" to abort.
Choose "5" (eng_mod). This will be the best option for English-speaking folks.
Copy any text into the terminal with "CTRL+SHIFT+V", then: "Enter"
then: "CTRL+D"
The text will be played.
"start programme": "n" will shut the script down.
firejail (optional). If the sandbox is not wanted or needed, then please rewrite the script …
… e.g. delete entries like firejail --whitelist='/tmp/*.wav'
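Instead of editing every function by hand, the sandbox could be made switchable. What follows is a rough sketch under a few assumptions, not a drop-in replacement: `play_wav`, `USE_FIREJAIL` and `DRY_RUN` are hypothetical names that do not appear in the script above. The mplayer filter string and the firejail whitelist are copied from the script; the optional second argument is the tempo used by schnellerd and fastd.

```shell
#!/bin/bash
# Hypothetical helper: plays a WAV file with the script's mplayer filters.
# USE_FIREJAIL (assumed name): yes | no | auto (auto = only if firejail is installed)
# DRY_RUN (assumed name): yes = print the command instead of running it
play_wav() {
    local wav="$1" tempo="${2:-}"
    local filters="lavcresample=44100,equalizer=0:0:0:0:0:0:0:0:7:7"

    # Optional speed-up, as in schnellerd and fastd (e.g. tempo 1.15).
    if [ -n "$tempo" ]; then
        filters="scaletempo=scale=${tempo}:speed=pitch,${filters}"
    fi

    # Build the command in the positional parameters.
    set -- mplayer -af "$filters" -srate 44100 "$wav"

    # Prepend the sandbox only when requested or available.
    case "${USE_FIREJAIL:-auto}" in
        yes)
            set -- firejail "--whitelist=/tmp/*.wav" "$@"
            ;;
        auto)
            if command -v firejail >/dev/null 2>&1; then
                set -- firejail "--whitelist=/tmp/*.wav" "$@"
            fi
            ;;
    esac

    if [ "${DRY_RUN:-no}" = "yes" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Dry run without the sandbox: only shows what would be executed.
DRY_RUN=yes
USE_FIREJAIL=no
play_wav /tmp/test.wav 1.15
```

In the script itself, each firejail/mplayer line could then become a call such as `play_wav /tmp/test.wav` or `play_wav /tmp/test.wav 1.15`, with `DRY_RUN` left unset so that playback really starts.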
This is nice and dead easy to handle (good job, Rosika) and I like the integration of different speeds and the equalizer, yet to my ears, the output of pico2wave is still very unconvincing.
Already about 20 years ago, I thought the technology to have a computer read texts in a natural-sounding voice, like the one in Star Trek: The Next Generation, wasn't too far in the future. I actually would have loved to have it talk like Iris Lettieri, for many years the world-famous voice of Rio de Janeiro's international airport.
Of course, the technology exists and is used in films and, more and more, by scammers and fraudsters¹.
Still, it remains outside the realm of the normal user. There is also the question of personal data:
Our voice is a unique identifier of our person. Replicating it is like replicating your fingerprints.
¹That's why, if nowadays you get a phone call from a family member, a relative or a friend asking you for personal information or money, always contact them by other means or find another way to make sure it's them. Phone numbers can be faked, and voices too.
So if it is everywhere except on a normal user's desktop, all the damage is already done; they may as well let everyone have it.
How much effort to reengineer it?
… and thanks indeed for your confirmation of this.
Well, one's expectations can never be too high in this technologically advanced world of ours, I guess.
Just think of what else is already possible…
Good speech models need a huge amount of training data (and computing power), which is hard to come by unless, of course, you're Amazon, Apple, Google, Microsoft, etc. and you've got millions of users happily giving you their voices for free.
If they are doing it that way, it is not really scientific at all.
We need a theory of speech that explains how to construct a voice signal and how to apply variations