I have several tapes (yes, actual cassette tapes) of my grandfather reading a novel.
Unfortunately, a few of the tapes have degraded to the point that I cannot play them back.
I would love to recreate his voice, to “rerecord” the missing bits.
The recordings are in Danish.
Is this possible?
If it is, how can I go about it?
The term you are looking for may be “AI voice cloning”. ElevenLabs (https://elevenlabs.io/voice-cloning) claims its engine can understand and reproduce Danish as well.
Edit: They seem to require some voice verification to make sure the voice is yours, which is awkward in your case.
https://speechify.com/da should let you recreate the voice of “your beloved one”; at least they mention it on their German page.
I did sign up for ElevenLabs. Unfortunately, they cannot allow me to clone a dead person’s voice, as per their FAQ:
You may only clone your own voice or a voice you have the rights to clone. For added security, when creating a Professional Voice Clone we require users to complete a Voice Captcha mechanism by reading a text prompt within a specific time to confirm your voice matches the training samples you upload for training. If there’s a match, your request is sent for fine-tuning. If not, you’ll have to reach out via our help center to have your voice verified manually.
Now I’m sure it wouldn’t be an issue to get the legal rights, but when I spoke to their support, they did not have any way to verify beyond the captcha.
Maybe https://speechify.com/da/ works. At least they mention the recreation of the voice of “your beloved one” on their German page.
I can’t find this. Where is it on the German page?
I’ve been able to generate very good results with this open source project. You need a pretty good NVIDIA GPU, and it takes some time and tedious work to get it working the way you want it to:
https://github.com/neonbjb/tortoise-tts
Some voices sound exactly right. Others sound like a broken robot. The main reason I like it is that I can run it locally without having to sign up for some stupid cloud service.
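In case it helps with the tedious part: tools like this clone a voice from short reference clips, so a long digitized tape side has to be cut up first. Below is a rough Python sketch of that step, not anything taken from the tortoise-tts project itself. It assumes the tape has already been captured as an uncompressed PCM WAV file, and the function name `split_wav`, the 10 second clip length, and the 3 second minimum are my own illustrative choices; check the project's README for the clip length and sample format it actually wants.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, clip_seconds=10.0, min_seconds=3.0):
    """Cut a PCM WAV file into consecutive clips of about clip_seconds each.

    Returns the list of clip paths written. A trailing remainder shorter
    than min_seconds is dropped. The defaults are illustrative assumptions,
    not requirements of any particular voice-cloning tool.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        bytes_per_frame = params.sampwidth * params.nchannels
        frames_per_clip = int(clip_seconds * rate)
        min_frames = int(min_seconds * rate)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            n_frames = len(frames) // bytes_per_frame
            # Stop at end of file or when the leftover is too short to be useful.
            if n_frames == 0 or n_frames < min_frames:
                break
            clip_path = out_dir / f"clip_{index:03d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written
```

Calling something like `split_wav("tape_side_a.wav", "clips")` (the file name is made up) would write `clip_000.wav`, `clip_001.wav` and so on into `clips`. This cuts blindly on time, so clips can start or end mid-word; you would still want to listen through them and throw out the ones with heavy tape noise.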
Looks very cool. I was unable to see anything regarding languages. Is it completely language independent somehow, or is it English only?
ElevenLabs is currently the best, but you can get some very good results by running XTTS first and then RVC as a second pass. It involves fine-tuning models and running things with Python and notebooks, so it requires some know-how.
You can explore more models on the Hugging Face page https://huggingface.co/models?pipeline_tag=text-to-speech&sort=trending
Most have a Hugging Face space dedicated to them where you can try them. Here is the XTTS space, for example: https://huggingface.co/spaces/coqui/xtts
The language adds another layer of difficulty. I would try their demo first to see if it gives anything workable, but Danish isn’t a language current TTS software caters to, and sadly it doesn’t seem to be an available option on XTTS.
Thank you for the tips. As I see it currently, I expect the language to be the biggest hurdle. It doesn’t look like something I can add myself, even if I had the data for a model. So as far as I can tell, it involves two steps that are currently more or less impossible: get the model data, and teach the language to the model.
If you can get them into a digital format: I’ve personally used ElevenLabs to clone voices and make narrations for missions I created for a video game. I tried different open source projects and tried getting them to run on my own, to no avail, but ElevenLabs has been solid (it is unfortunately paid software, something like $5 or $10 a month).
Was this with the “Instant voice clone”?