Voice Changer | Murf API

Murf’s Voice Changer API transforms your existing audio into high-quality, lifelike AI voices from Murf. You can change who is speaking, adjust speed, pitch, and tone, and apply different speaking styles. It does more than swap voices: by analyzing your original audio, it captures how something is said, including rhythm, intonation, pacing, word-level emphasis, and pauses. With controls such as Retain Prosody and Retain Accent, the output preserves the original speech’s flow, delivery style, and regional accent, only in a different voice.


Quickstart

You can call the API directly over REST or use one of our official SDKs. The following example uses the Python SDK.

from murf import Murf

client = Murf(
    api_key="YOUR_API_KEY",  # Not required if you have set the MURF_API_KEY environment variable
)

file_path = "PATH_TO_YOUR_FILE"  # Path to the audio file you want to convert

# Open the file in a context manager so the handle is closed after the upload
with open(file_path, "rb") as audio:
    response = client.voice_changer.convert(
        voice_id="en-US-terrell",
        file=audio,
        # file_url="URL_TO_YOUR_FILE",  # Optional: use `file_url` instead of `file` for a publicly accessible file
        retain_prosody=True,
        retain_accent=True,
    )

# Link to the converted audio file
print(response.audio_file)

The response includes a link to the converted audio file, available as response.audio_file in the example above. Download the file from this link to use it wherever you need; it remains available for 24 hours after generation.
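Because the link expires after 24 hours, you will usually want to save the file locally right away. The following is a minimal sketch using only Python's standard library. download_audio is a hypothetical helper, not part of the Murf SDK, and it assumes response.audio_file is a plain URL string, as printed in the Quickstart.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url: str, dest: str) -> Path:
    """Stream the file at `url` to `dest` and return the local path.

    Hypothetical helper (not part of the Murf SDK). Call it soon after
    conversion, since the generated link expires after 24 hours.
    """
    dest_path = Path(dest)
    # Create parent directories if they do not exist yet
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the body to disk instead of loading it all into memory
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

For example, download_audio(response.audio_file, "output/converted.wav") saves the result next to your script; WAV is the default output format.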

Speech Customization

The Voice Changer endpoint supports the customization features of the Speech Synthesis endpoint, including style, pitch, speed, audio duration, variation, and multi-native locale, subject to the feature interactions summarized at the end of this section. It also adds voice transformation features for finer control.

The Voice Changer endpoint also supports the following features:

Retain Prosody

Prosody is the natural rhythm, tone, and emphasis of someone's speech. When Retain Prosody is enabled, the new voice follows the same delivery as the speaker in your input audio, so the output sounds natural and close to how the person originally spoke.

The default value is True.
Variation, style, and audio duration have no effect when Retain Prosody is enabled.
Disabling Retain Prosody also disables Retain Accent.
Background sound is retained only if both Retain Prosody and Retain Accent are enabled.
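The rules above can be modeled locally, for example to warn users before a request is sent. This is a minimal sketch: resolve_retention and EffectiveSettings are hypothetical names, not part of the Murf SDK, and the snake_case feature labels are illustrative rather than confirmed API parameter names.

```python
from dataclasses import dataclass
from typing import Tuple

# Features documented as having no effect when Retain Prosody is enabled
# (illustrative labels, not confirmed parameter names)
PROSODY_IGNORED = ("variation", "style", "audio_duration")


@dataclass(frozen=True)
class EffectiveSettings:
    retain_prosody: bool
    retain_accent: bool
    background_retained: bool
    ignored_features: Tuple[str, ...]


def resolve_retention(retain_prosody: bool = True, retain_accent: bool = True) -> EffectiveSettings:
    """Apply the documented interactions; both flags default to True."""
    # Disabling prosody retention also disables accent retention
    accent = retain_accent and retain_prosody
    return EffectiveSettings(
        retain_prosody=retain_prosody,
        retain_accent=accent,
        # Background sound survives only when both retentions are on
        background_retained=retain_prosody and accent,
        ignored_features=PROSODY_IGNORED if retain_prosody else (),
    )
```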

Audio samples: input sample, output without Retain Prosody, and output with Retain Prosody.

Retain Accent

Retain Accent keeps the accent of your input audio when changing the voice. The new voice keeps the same regional or native accent as the original speaker, which makes the audio sound more natural and familiar.

The default value is True.
If Retain Accent is disabled while Retain Prosody stays enabled, the maximum input length is 35 seconds. This limit keeps processing efficient and prevents excessive distortion in longer segments.
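To avoid a rejected request, you can check the input length locally before uploading. This sketch assumes an uncompressed PCM WAV input, since Python's wave module cannot read MP3 or FLAC; the limits (35 seconds with only Retain Prosody enabled, 1 minute otherwise) come from this page, and the function names are hypothetical.

```python
import wave

# Documented input-length limits, in seconds
DEFAULT_LIMIT_S = 60.0       # 1 minute
PROSODY_ONLY_LIMIT_S = 35.0  # Retain Prosody on, Retain Accent off


def max_input_seconds(retain_prosody: bool, retain_accent: bool) -> float:
    """Return the documented maximum input length for a flag combination."""
    if retain_prosody and not retain_accent:
        return PROSODY_ONLY_LIMIT_S
    return DEFAULT_LIMIT_S


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_input_length(path: str, retain_prosody: bool = True, retain_accent: bool = True) -> float:
    """Raise ValueError if the file is too long; otherwise return its duration."""
    duration = wav_duration_seconds(path)
    limit = max_input_seconds(retain_prosody, retain_accent)
    if duration > limit:
        raise ValueError(f"{path}: {duration:.1f}s exceeds the {limit:.0f}s limit for this configuration")
    return duration
```

A 40-second clip passes with the defaults but is rejected when retain_accent=False.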

Enabling both Retain Prosody and Retain Accent

When both Retain Prosody and Retain Accent are enabled, the Voice Changer endpoint delivers the most natural and expressive audio output, closely mirroring the original input in terms of rhythm, intonation, and accent. Additionally, this combination ensures that background sounds present in the original audio are also retained.

Audio samples: input sample, output without Retain Prosody and Retain Accent, and output with Retain Prosody and Retain Accent.

Return Transcription

When Return Transcription is enabled, the response includes a transcript of the input audio alongside the converted audio. This is useful when you need both the voice transformation and the text of the original speech.

The default value is False.

Transcription

Transcription lets you supply your own transcript of the input audio, which the Voice Changer then uses as input. Manual transcription is not supported when both Retain Prosody and Retain Accent are enabled.
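A request that asks for a transcript, and optionally supplies one, might be wrapped as follows. This is a sketch under stated assumptions: convert_with_transcript is a hypothetical helper, and the keyword names return_transcription and transcription are assumed snake_case forms of the features above, so confirm them in the API reference. The client is passed in so the function can be exercised with a stand-in object.

```python
from typing import Any, Optional


def convert_with_transcript(
    client: Any,
    file_path: str,
    voice_id: str,
    transcription: Optional[str] = None,
    retain_prosody: bool = True,
    retain_accent: bool = True,
) -> Any:
    """Call client.voice_changer.convert and request a transcript of the input."""
    # Manual transcription input is not supported when both retentions are enabled
    if transcription is not None and retain_prosody and retain_accent:
        raise ValueError("transcription cannot be combined with retain_prosody=True and retain_accent=True")
    kwargs = {
        "voice_id": voice_id,
        "retain_prosody": retain_prosody,
        "retain_accent": retain_accent,
        "return_transcription": True,  # assumed parameter name
    }
    if transcription is not None:
        kwargs["transcription"] = transcription  # assumed parameter name
    with open(file_path, "rb") as audio:
        return client.voice_changer.convert(file=audio, **kwargs)
```

With the Quickstart client, convert_with_transcript(client, file_path, "en-US-terrell", transcription="...", retain_accent=False) sends your own transcript.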

Summary of Feature Interactions

Configuration	Supported Features	Maximum Allowed Length
Retain Prosody and Retain Accent both enabled	Speed, Pitch, Pauses, Return Transcription	1 minute
Retain Prosody enabled, Retain Accent disabled	Multi-native locale, Speed, Pitch, Pronunciation, Pauses, Transcription, Return Transcription	35 seconds
Retain Prosody and Retain Accent both disabled	All features	1 minute
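The Supported Features column of the table above can be encoded as a lookup to flag options that a configuration will not honor. This is a minimal sketch; unsupported_features is a hypothetical helper, and the snake_case labels are illustrative names for the table's features, not confirmed API parameters.

```python
from typing import Iterable, List

# Supported features per (retain_prosody, retain_accent), from the table.
# None means "all features" (both retentions disabled).
SUPPORTED = {
    (True, True): {"speed", "pitch", "pauses", "return_transcription"},
    (True, False): {
        "multi_native_locale", "speed", "pitch", "pronunciation",
        "pauses", "transcription", "return_transcription",
    },
    (False, False): None,
}


def unsupported_features(requested: Iterable[str], retain_prosody: bool = True, retain_accent: bool = True) -> List[str]:
    """Return the requested features that the configuration does not support."""
    # Disabling prosody retention also disables accent retention
    key = (retain_prosody, retain_accent and retain_prosody)
    allowed = SUPPORTED[key]
    if allowed is None:
        return []
    return sorted(set(requested) - allowed)
```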

FAQ

What are the supported input formats?

The API accepts the following input audio formats: WAV, MP3, ALAW, ULAW, FLAC.
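A cheap pre-upload check can reject obviously unsupported files. This sketch checks only the file extension; the mapping from the listed formats to the extensions .wav, .mp3, .alaw, .ulaw, and .flac is an assumption, since raw A-law and μ-law audio often uses other extensions or none, and is_supported_input is a hypothetical helper.

```python
from pathlib import Path

# Assumed extensions for the documented input formats (WAV, MP3, ALAW, ULAW, FLAC)
SUPPORTED_INPUT_EXTENSIONS = {".wav", ".mp3", ".alaw", ".ulaw", ".flac"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension matches a documented input format."""
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```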

What are the supported output formats?

The Voice Changer endpoint supports the following output formats: WAV (default), MP3, FLAC, ALAW, and ULAW. It also offers the same sample rates and channel types as the Speech Synthesis endpoint, so you can tune output quality to your needs.

What is the input file limit?

The maximum input file length is 1 minute. If “retain prosody” is set to true and “retain accent” is set to false, the limit is 35 seconds.

Can I get a transcript of the audio?

Yes! If you enable the “return transcription” option, a transcript will be generated for you.

Can I edit the transcription?

Yes, you can manually edit the transcription. However, manual transcription input will not work if both “retain prosody” and “retain accent” are set to true.