The OP mentioned that so-called "high-fidelity voice cloning" would take 20 minutes of training. I think a book author would want the best possible quality when reproducing their voice.
Many people prefer an audiobook version of a book to be read by the original author, but the author isn't always the one who narrates it. If an author could make that version happen with 20 minutes of their time plus text-to-speech for the whole book, that would be a very strong value proposition for this company.
But I'm not sure. Part of why I'd prefer the original author to read a book is that they vocally emphasize certain parts of the book, and I don't think these models could do that at this point.
> Many people prefer an audiobook version of a book to be read by the original author
Right, but having AI read the book in the author's voice is definitely not the author reading the work.
As you mention, people like to hear the author read it because it's the author reading it, theoretically emphasizing and acting things out according to what was intended. It's not just to hear the author's voice.