Voice Changer
Transform any voice recording with a new voice using our voice conversion technology.
Murf’s Voice Changer API lets you transform your existing audio into high-quality, lifelike AI voices from Murf. You can change who’s speaking, adjust the speed, pitch, and tone, and even apply different speaking styles. It doesn’t just swap voices — it understands how something is said. By analyzing your original audio, it captures natural elements like rhythm, intonation, pacing, word-level emphasis, and pauses. With advanced controls like Retain Prosody and Retain Accent, the output voice preserves the original speech’s flow, delivery style, and regional accent—just in a different voice.
Quickstart
You can either use the REST API directly or use one of our official SDKs to interact with the API.
A link to the audio file will be returned in the response. You can use this link to download the audio file and use it wherever you need it. The audio file will be available for download for 24 hours after generation.
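Because the link expires, one reasonable approach is to download the file as soon as the response arrives and keep your own copy. The sketch below is illustrative only: it assumes you have already read the link out of the response (the response field name is not specified here, so the URL is passed in as a plain argument), and the helper names `download_deadline`, `link_is_live`, and `save_audio` are hypothetical, not part of any Murf SDK.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# The documentation states the file stays available for 24 hours after generation.
DOWNLOAD_WINDOW = timedelta(hours=24)


def download_deadline(generated_at: datetime) -> datetime:
    """Return the moment after which the audio link should be treated as expired."""
    return generated_at + DOWNLOAD_WINDOW


def link_is_live(generated_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check whether the 24-hour download window is still open."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current < download_deadline(generated_at)


def save_audio(audio_url: str, destination: str) -> str:
    """Stream the generated audio to a local file (hypothetical helper)."""
    with urllib.request.urlopen(audio_url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Recording the generation time next to the link lets a later job decide whether to download the file or regenerate it.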
Speech Customization
The Voice Changer endpoint supports all the customization features available in the Speech Synthesis endpoint: style, pitch, speed, audio duration, variation, and multi-native locale. It also adds advanced voice transformation features for greater flexibility and control.
The Voice Changer endpoint additionally supports the following features:
Retain Prosody
Prosody refers to the natural way someone speaks—their rhythm, tone, and emphasis. When Retain Prosody is turned on, the new voice keeps the same way of speaking as the original voice in your audio file, so it still sounds natural and close to how the person originally spoke.
- The default value is True.
- Features such as variation, style, and audio duration have no effect when Retain Prosody is enabled.
- Disabling prosody retention also disables accent retention.
- Background sound is retained only if both prosody retention and accent retention are enabled.
Retain Accent
Retain Accent helps keep the original accent from your input audio when changing the voice. This means the new voice will still sound like it has the same regional or native accent as the original speaker, making the audio feel more natural and familiar.
- The default value is True.
- If only accent retention is disabled (with prosody retention still enabled), the maximum allowed input length is 35 seconds. This limit keeps processing efficient and output quality high by preventing excessive distortion in longer segments.
Enabling both Retain Prosody and Retain Accent
When both Retain Prosody and Retain Accent are enabled, the Voice Changer endpoint delivers the most natural and expressive audio output, closely mirroring the original input in terms of rhythm, intonation, and accent. Additionally, this combination ensures that background sounds present in the original audio are also retained.
Return Transcription
When this feature is enabled, the system provides a transcript of the input audio. This is useful for users who need both voice transformation and textual data extraction from the input speech.
- The default value is False.
- When enabled, the user receives both the modified audio output and a text version of the original speech.
Transcription
This feature lets you provide your own transcription of the audio clip, which is then used as input for the voice changer.
Summary of Feature Interactions
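The rules stated in the sections above and in the FAQ can be collected into one small function. This is a sketch of the documented behavior, not the service's implementation: the `Interaction` class, its field names, and `resolve` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Interaction:
    effective_accent: bool         # accent retention after the prosody rule is applied
    background_retained: bool      # background sound is kept only when both are on
    style_controls_apply: bool     # variation, style, and audio duration take effect
    manual_transcription_ok: bool  # a user-supplied transcription is usable


def resolve(retain_prosody: bool = True, retain_accent: bool = True) -> Interaction:
    """Derive the effective behavior from the two retention flags."""
    # Disabling prosody retention also disables accent retention.
    accent = retain_accent and retain_prosody
    both = retain_prosody and accent
    return Interaction(
        effective_accent=accent,
        background_retained=both,
        # Variation, style, and audio duration are irrelevant with prosody retained.
        style_controls_apply=not retain_prosody,
        # Manual transcription does not work when both flags are true.
        manual_transcription_ok=not both,
    )


if __name__ == "__main__":
    for prosody, accent in product([True, False], repeat=2):
        print(f"prosody={prosody!s:5} accent={accent!s:5} -> {resolve(prosody, accent)}")
```

Running the module prints one line per flag combination, which can serve as a quick reference when choosing settings.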
FAQ
What are the supported input formats?
The API accepts the following input audio formats: WAV, MP3, ALAW, ULAW, FLAC.
What are the supported output formats?
Our system supports the following output formats: WAV (Default), MP3, FLAC, ALAW, and ULAW. The Voice Changer endpoint offers the same range of sample rates and channel types as the Speech Synthesis endpoint, allowing users to optimize output quality based on their specific needs.
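A client can validate a requested output format before sending anything. The sketch below only encodes the format names and the WAV default listed above. The function name `normalize_output_format` is hypothetical, and treating names as case-insensitive is a convenience assumed here, not something the API is documented to accept.

```python
from typing import Optional

# Output formats listed in the documentation; WAV is the default.
SUPPORTED_OUTPUT_FORMATS = ("WAV", "MP3", "FLAC", "ALAW", "ULAW")
DEFAULT_OUTPUT_FORMAT = "WAV"


def normalize_output_format(name: Optional[str] = None) -> str:
    """Return a canonical format name, falling back to the documented default."""
    if name is None:
        return DEFAULT_OUTPUT_FORMAT
    candidate = name.strip().upper()
    if candidate not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(
            f"Unsupported output format {name!r}; "
            f"expected one of {', '.join(SUPPORTED_OUTPUT_FORMATS)}"
        )
    return candidate
```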
What is the input file limit?
The maximum input file length is 1 minute. If “retain prosody” is set to true and “retain accent” is set to false, the limit is 35 seconds.
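A client-side length check can catch oversized input before upload. The following is a minimal sketch that handles only uncompressed WAV input, using Python's standard `wave` module. The function names are hypothetical, and `make_silent_wav` exists purely so the example can run without a real recording.

```python
import io
import wave

MAX_SECONDS = 60.0               # general limit: 1 minute
MAX_SECONDS_PROSODY_ONLY = 35.0  # retain prosody = true, retain accent = false


def max_input_seconds(retain_prosody: bool = True, retain_accent: bool = True) -> float:
    """Return the documented input length limit for the given flags."""
    if retain_prosody and not retain_accent:
        return MAX_SECONDS_PROSODY_ONLY
    return MAX_SECONDS


def wav_duration_seconds(source) -> float:
    """Duration of a WAV file, given a path or a binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_input_length(source, retain_prosody: bool = True, retain_accent: bool = True) -> float:
    """Raise ValueError if the clip exceeds the limit; otherwise return its duration."""
    duration = wav_duration_seconds(source)
    limit = max_input_seconds(retain_prosody, retain_accent)
    if duration > limit:
        raise ValueError(f"Input is {duration:.1f}s long; the limit is {limit:.0f}s")
    return duration


def make_silent_wav(seconds: int, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demo helper only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer
```

For example, a 40-second clip passes with the default flags but is rejected when accent retention alone is turned off.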
Can I get a transcript of the audio?
Yes! If you enable the “return transcription” option, a transcript will be generated for you.
Can I edit the transcription?
Yes, you can manually edit the transcription. However, manual transcription input will not work if both “retain prosody” and “retain accent” are set to true.