Text-to-Speech

| Service | Free Tier (characters) | Pricing Model |
| --- | --- | --- |
| Amazon Polly | 5M standard + 1M neural | ~$4 per 1M (standard), ~$16 per 1M (neural) after free tier |
| Google Cloud TTS | 4M standard + 1M WaveNet | ~$4 per 1M (standard), ~$16 per 1M (WaveNet), pay-as-you-go |
| Azure TTS | 500K neural, ongoing | ~$15 per 1M (neural), discounts at higher volumes |
| IBM Watson TTS | 10K (Lite plan) | ~$0.02 per 1K (i.e. ~$20 per 1M); Enterprise options available |
| ElevenLabs | 10K monthly | From ~$5/mo (30K chars) up to $330/mo (2M chars); Enterprise |
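
To make the per-million rates in the table above easier to compare, here is a minimal sketch of a monthly cost estimate. It assumes the free allowance is simply subtracted from one month's usage and the remainder is billed at a flat rate. The names `Tier`, `TIERS`, and `estimate_monthly_cost` are hypothetical, the figures are the approximate ones from the table, and IBM Watson and ElevenLabs are left out because their plans do not fit this simple model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    free_chars: int         # free characters per month (approximate, from the table)
    usd_per_million: float  # approximate rate per 1M characters after the free tier


# Approximate figures copied from the table; verify against each provider's pricing page.
TIERS = {
    "polly_standard": Tier(5_000_000, 4.0),
    "polly_neural": Tier(1_000_000, 16.0),
    "google_standard": Tier(4_000_000, 4.0),
    "google_wavenet": Tier(1_000_000, 16.0),
    "azure_neural": Tier(500_000, 15.0),
}


def estimate_monthly_cost(tier_name: str, chars: int) -> float:
    """Estimate one month's cost in USD, assuming a flat rate above the free allowance."""
    tier = TIERS[tier_name]
    billable = max(0, chars - tier.free_chars)
    return round(billable * tier.usd_per_million / 1_000_000, 2)


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: ~${estimate_monthly_cost(name, 3_000_000):.2f} for 3M chars")
```

Under these assumptions, 3M neural characters on Amazon Polly leave 2M billable characters, so `estimate_monthly_cost("polly_neural", 3_000_000)` returns `32.0`.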

Example Code

1. Amazon Polly

# Requires: pip install boto3 (the "str | None" annotation needs Python 3.10+)
import os
from contextlib import closing

import boto3
from botocore.exceptions import BotoCoreError, ClientError

def synthesize_polly(text: str, output_filename: str = "polly_output.mp3", region: str | None = None):
    """Synthesizes speech using AWS Polly and saves it as an MP3 file."""
    # Assumes AWS credentials are configured (e.g., via env vars, ~/.aws/credentials)
    aws_region = region or os.environ.get("AWS_REGION", "us-east-1")
    try:
        polly = boto3.client("polly", region_name=aws_region)
        response = polly.synthesize_speech(
            Text=text,
            OutputFormat="mp3",
            VoiceId="Joanna",  # Example voice
        )

        # AudioStream is a streaming body: read it once and close it afterwards
        if "AudioStream" in response:
            with closing(response["AudioStream"]) as stream, open(output_filename, "wb") as f:
                f.write(stream.read())
            print(f"Audio saved to {output_filename}")
        else:
            print("Error: Could not stream audio from Polly.")

    except (BotoCoreError, ClientError, OSError) as e:
        # BotoCoreError/ClientError cover SDK and service failures; OSError covers file writes
        print(f"Error synthesizing with AWS Polly: {e}")

# Example:
# synthesize_polly("Hello from AWS Polly!")

2. Google Cloud TTS
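
A minimal sketch for Google Cloud TTS, calling the REST `text:synthesize` endpoint with only the standard library so that it runs without an SDK (the official `google-cloud-texttospeech` client is the usual alternative). The `GOOGLE_API_KEY` environment variable, the function names, and the example voice `en-US-Wavenet-D` are assumptions made for illustration; API-key access must be enabled for the project.

```python
import base64
import json
import os
import urllib.request
from typing import Optional

GOOGLE_TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_google_request(text: str, api_key: str,
                         voice_name: str = "en-US-Wavenet-D") -> urllib.request.Request:
    """Build the POST request; no network traffic happens here."""
    payload = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},  # Example voice
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        GOOGLE_TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Goog-Api-Key": api_key},
        method="POST",
    )


def synthesize_google(text: str, output_filename: str = "google_output.mp3",
                      api_key: Optional[str] = None) -> None:
    """Synthesizes speech using Google Cloud TTS and saves it as an MP3 file."""
    key = api_key or os.environ.get("GOOGLE_API_KEY")  # hypothetical variable name
    if not key:
        print("Error: pass api_key or set GOOGLE_API_KEY.")
        return
    try:
        with urllib.request.urlopen(build_google_request(text, key), timeout=30) as resp:
            body = json.load(resp)
        # The audio comes back base64-encoded in the "audioContent" field
        with open(output_filename, "wb") as f:
            f.write(base64.b64decode(body["audioContent"]))
        print(f"Audio saved to {output_filename}")
    except (OSError, KeyError, ValueError) as e:
        print(f"Error calling Google Cloud TTS: {e}")

# Example:
# synthesize_google("Hello from Google Cloud TTS!")
```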

3. Azure TTS
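
A minimal sketch for Azure TTS, using the Speech service REST endpoint with only the standard library (the `azure-cognitiveservices-speech` SDK is the usual alternative). The `SPEECH_KEY` and `SPEECH_REGION` environment variables, the function names, the `User-Agent` value, and the example voice `en-US-JennyNeural` are assumptions made for illustration.

```python
import os
import urllib.request
from typing import Optional
from xml.sax.saxutils import escape


def build_azure_request(text: str, key: str, region: str,
                        voice: str = "en-US-JennyNeural") -> urllib.request.Request:
    """Build the POST request; the text is sent as SSML, so it is XML-escaped."""
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"  # Example voice
        "</speak>"
    )
    return urllib.request.Request(
        f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
            "User-Agent": "tts-example",  # arbitrary client name
        },
        method="POST",
    )


def synthesize_azure(text: str, output_filename: str = "azure_output.mp3",
                     key: Optional[str] = None, region: Optional[str] = None) -> None:
    """Synthesizes speech using Azure TTS and saves it as an MP3 file."""
    key = key or os.environ.get("SPEECH_KEY")
    region = region or os.environ.get("SPEECH_REGION")
    if not key or not region:
        print("Error: set SPEECH_KEY and SPEECH_REGION or pass key and region.")
        return
    try:
        with urllib.request.urlopen(build_azure_request(text, key, region), timeout=30) as resp:
            audio = resp.read()  # raw MP3 bytes
        with open(output_filename, "wb") as f:
            f.write(audio)
        print(f"Audio saved to {output_filename}")
    except OSError as e:  # includes urllib's URLError and HTTPError
        print(f"Error calling Azure TTS: {e}")

# Example:
# synthesize_azure("Hello from Azure TTS!")
```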

4. IBM Watson TTS
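
A minimal sketch for IBM Watson TTS, posting to the `/v1/synthesize` endpoint with only the standard library (the `ibm-watson` SDK is the usual alternative). It assumes IAM API-key access via HTTP basic auth with the user name `apikey`. The `TEXT_TO_SPEECH_APIKEY` and `TEXT_TO_SPEECH_URL` environment variables, the function names, and the example voice `en-US_AllisonV3Voice` are assumptions; the service URL is the instance-specific one shown in the IBM Cloud console.

```python
import base64
import json
import os
import urllib.parse
import urllib.request
from typing import Optional


def build_watson_request(text: str, api_key: str, service_url: str,
                         voice: str = "en-US_AllisonV3Voice") -> urllib.request.Request:
    """Build the POST request for the instance-specific service URL."""
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    query = urllib.parse.urlencode({"voice": voice})  # Example voice
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/v1/synthesize?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "audio/mp3",
        },
        method="POST",
    )


def synthesize_watson(text: str, output_filename: str = "watson_output.mp3",
                      api_key: Optional[str] = None,
                      service_url: Optional[str] = None) -> None:
    """Synthesizes speech using IBM Watson TTS and saves it as an MP3 file."""
    api_key = api_key or os.environ.get("TEXT_TO_SPEECH_APIKEY")
    service_url = service_url or os.environ.get("TEXT_TO_SPEECH_URL")
    if not api_key or not service_url:
        print("Error: set TEXT_TO_SPEECH_APIKEY and TEXT_TO_SPEECH_URL.")
        return
    try:
        request = build_watson_request(text, api_key, service_url)
        with urllib.request.urlopen(request, timeout=30) as resp:
            audio = resp.read()  # raw MP3 bytes
        with open(output_filename, "wb") as f:
            f.write(audio)
        print(f"Audio saved to {output_filename}")
    except OSError as e:  # includes urllib's URLError and HTTPError
        print(f"Error calling IBM Watson TTS: {e}")

# Example:
# synthesize_watson("Hello from IBM Watson TTS!")
```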

5. ElevenLabs
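
A minimal sketch for ElevenLabs, posting to the `/v1/text-to-speech/{voice_id}` REST endpoint with only the standard library. The `ELEVENLABS_API_KEY` environment variable, the function names, and the default `model_id` are assumptions made for illustration; `voice_id` must be a voice ID from your own ElevenLabs account, so no default is given.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

ELEVENLABS_BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_elevenlabs_request(text: str, api_key: str, voice_id: str,
                             model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build the POST request; no network traffic happens here."""
    url = f"{ELEVENLABS_BASE_URL}/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text, "model_id": model_id}).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_elevenlabs(text: str, voice_id: str,
                          output_filename: str = "elevenlabs_output.mp3",
                          api_key: Optional[str] = None) -> None:
    """Synthesizes speech using ElevenLabs and saves it as an MP3 file."""
    key = api_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        print("Error: pass api_key or set ELEVENLABS_API_KEY.")
        return
    try:
        request = build_elevenlabs_request(text, key, voice_id)
        with urllib.request.urlopen(request, timeout=60) as resp:
            audio = resp.read()  # raw MP3 bytes
        with open(output_filename, "wb") as f:
            f.write(audio)
        print(f"Audio saved to {output_filename}")
    except OSError as e:  # includes urllib's URLError and HTTPError
        print(f"Error calling ElevenLabs: {e}")

# Example (replace the placeholder with a real voice ID):
# synthesize_elevenlabs("Hello from ElevenLabs!", voice_id="YOUR_VOICE_ID")
```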
