OpenAI Unveils Early Results of Human-Like Text-to-Speech Feature, Raising Deepfake Concerns
OpenAI is unveiling preliminary results from a test of a feature that can speak text in a convincingly human voice, a significant advance in artificial intelligence that also raises concerns about potential deepfake abuse. The company is sharing initial demonstrations and use cases from a limited preview of its text-to-speech model, dubbed Voice Engine, which has been made available to roughly 10 developers so far, according to a spokesperson.
The decision to hold off on a broader rollout follows feedback from policymakers, industry experts, educators, and creatives, an OpenAI spokesperson said. The company had initially planned to offer the tool to as many as 100 developers through an application process, as outlined in an earlier press briefing.
In a blog post published Friday, OpenAI acknowledged the substantial risks associated with generating speech that closely resembles human voices, particularly in light of concerns heightened during an election year. The company emphasized its engagement with domestic and international partners from governmental, media, entertainment, educational, and civil society sectors to integrate their insights into the development process.
While previous AI technologies have demonstrated the ability to simulate voices in certain contexts, OpenAI’s Voice Engine represents a notable advancement by producing speech that mimics specific individuals, including their unique cadence and intonation, using only a short sample of recorded audio.
During a demonstration, Bloomberg observed a clip of OpenAI CEO Sam Altman delivering a brief explanation of the technology in a voice virtually indistinguishable from his natural speech, yet entirely generated by AI. Jeff Harris, a product lead at OpenAI, described the quality of the generated voice as comparable to a human's, given the right audio setup, though he cautioned about the ethical considerations of accurately mimicking a person's speech.
Among OpenAI’s current development partners using the tool is the Norman Prince Neurosciences Institute at the not-for-profit health system Lifespan, which employs the technology to help patients regain their voice. For instance, the tool helped restore the voice of a young patient whose speech was impaired by a brain tumor, by replicating her voice from an earlier recording she had made for a school project, according to OpenAI’s blog post.
Additionally, OpenAI’s custom speech model can translate generated audio into various languages, which could be useful for companies such as Spotify Technology SA in podcast translation efforts. The company also highlighted educational benefits, such as offering a wider range of voices for children’s educational content.
In the testing phase, OpenAI mandates that partners adhere to usage policies, obtain consent from original speakers before utilizing their voices, and disclose to listeners that the voices heard are AI-generated. Furthermore, an inaudible audio watermark is being implemented to differentiate between content created by the tool and authentic recordings.
Prior to a broader release, OpenAI is seeking feedback from external experts, stressing that it is important for people worldwide to understand where the technology is headed, whether or not the company ultimately deploys it widely. OpenAI also urges society to prepare for the challenges posed by increasingly capable AI, including by encouraging banks to phase out voice-based authentication, educating the public about deceptive AI-generated content, and developing techniques to distinguish real audio from AI-generated audio.