So - I had an un-DRM'd copy of an ebook I own…
But it was annoying - the author, an English-language author writing in English, chose to put accents on vowels and consonants in character names - I find this nearly as annoying as science fiction / fantasy authors using apostrophes in character or place names to make them sound more “alien”…
Not just umlauts either - things like a circumflex (caret) above vowels… lots of it - too much… I suppose I’m disrespecting the author - but I dunno… But I wouldn’t continue turning pages if I had to suffer that annoyance. AFAIK names are often just made-up gobbledy-gook phonetic syllables smashed together to sound foreign or alien… I don’t like it… Names are actually words and have meanings…
I guess I could even - if bothered enough - use sed to replace every gobbledy-gook name with Frederick or Bartholomew - but that would be going a tad too far I reckon
So - thought - how do I get rid of these accents?
DRM-free epub files are just zip files containing HTML.
HTML is just text.
So - unzip them into a folder - then use the “recode” utility (sudo apt install recode) :
recode -f utf8..flat < input.html > output.html
So - create a loop
for F in *.html ; do recode -f utf8..flat < "$F" > "$F.recoded" ; done
rm *.html
rename 's/\.recoded$//' *.recoded
zip -r ../BookTitle.epub *
That “rename” above is perl rename, sometimes called “prename” to avoid confusion with the rename utility that RedHat based distros default to.
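For anyone who'd rather do it in one go, here's a rough sketch of the steps above as a single shell function. It's only a sketch under a few assumptions: the function name and the file names are made up for illustration, it assumes recode, unzip and zip are installed, and it walks subfolders because many epubs keep their pages in one (that's also why it skips rename). It also adds the "mimetype" file first and uncompressed, which is what the EPUB container format asks for and what stricter readers check.

```shell
#!/bin/sh
# Sketch only: strip accents from the HTML inside an epub and repack it.
# strip_epub_accents is a hypothetical helper name, not part of any tool.

strip_epub_accents() {
    src=$1      # input epub, e.g. BookTitle.epub (placeholder name)
    out=$2      # output epub to create

    # Bail out early if recode is not installed
    command -v recode >/dev/null 2>&1 || { echo "recode not found" >&2; return 1; }

    # Make the output path absolute, because we cd into a temp folder later
    case $out in
        /*) ;;
        *) out=$PWD/$out ;;
    esac

    work=$(mktemp -d) || return 1

    if ! unzip -q "$src" -d "$work"; then
        rm -rf "$work"
        return 1
    fi

    # Recode every HTML-ish file, wherever it sits in the tree
    find "$work" -type f \( -name '*.html' -o -name '*.xhtml' -o -name '*.htm' \) |
    while IFS= read -r f; do
        recode -f utf8..flat < "$f" > "$f.recoded"
        # Only replace the original if recode actually produced something
        if [ -s "$f.recoded" ]; then
            mv "$f.recoded" "$f"
        else
            rm -f "$f.recoded"
        fi
    done

    # Repack: mimetype first and stored uncompressed, then everything else
    rm -f "$out"
    (
        cd "$work" || exit 1
        zip -q -X0 "$out" mimetype &&
        zip -q -Xr9D "$out" * -x mimetype
    )
    status=$?
    rm -rf "$work"
    return $status
}
```

Usage would be something like: strip_epub_accents BookTitle.epub BookTitle-flat.epub (both names are placeholders). It writes a new file rather than overwriting the original, so you can compare the two before deleting anything.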
Love the shell!
Note : if it was a historical novel and was using the correct spelling with accents, of e.g. French people, or Germans - it wouldn’t bother me. Heck - even Tolkien did it sometimes and that annoyed me - but - he was a linguist or language scholar…
P.S. I learned about “recode” here : https://unix.stackexchange.com/questions/631652/remove-accents-from-characters