Text-to-Speech
Caskada does NOT provide built-in utilities.
Instead, we offer examples that you can implement yourself. This approach gives you more flexibility and control over your project's dependencies and functionality.
| Service | Free Tier | Pricing Model | Docs |
| --- | --- | --- | --- |
| IBM Watson TTS | 10K chars (Lite plan) | ~$0.02 per 1K chars (~$20 per 1M chars). Enterprise options available | |
| ElevenLabs | 10K chars monthly | From ~$5/mo (30K chars) up to $330/mo (2M chars). Enterprise | |
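To compare per-character pricing against your expected volume, the arithmetic in the table can be sketched as a small helper. This is only an illustration: `estimate_cost_usd` is a hypothetical name, and the rate you pass in should come from the provider's current price list, not from this page.

```python
def estimate_cost_usd(num_chars: int, rate_per_1k_chars: float) -> float:
    """Estimate the synthesis cost for a flat per-1K-character rate.

    Assumes simple linear pricing; free tiers, plan minimums, and
    volume discounts are deliberately ignored.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * rate_per_1k_chars


# Example: one million characters at ~$0.02 per 1K characters
# print(f"${estimate_cost_usd(1_000_000, 0.02):.2f}")
```

At ~$0.02 per 1K characters, one million characters comes to roughly $20, which matches the "~$20 per 1M" figure in the table.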
Example Code
1. Amazon Polly
# Requires: pip install boto3
import boto3
import os

def synthesize_polly(text: str, output_filename: str = "polly_output.mp3", region: str | None = None):
    """Synthesizes speech using AWS Polly."""
    # Assumes AWS credentials are configured (e.g., via env vars, ~/.aws/credentials)
    aws_region = region or os.environ.get("AWS_REGION", "us-east-1")
    try:
        polly = boto3.client("polly", region_name=aws_region)
        response = polly.synthesize_speech(
            Text=text,
            OutputFormat="mp3",
            VoiceId="Joanna"  # Example voice
        )
        # Check if AudioStream is present
        if "AudioStream" in response:
            with open(output_filename, "wb") as f:
                f.write(response["AudioStream"].read())
            print(f"Audio saved to {output_filename}")
        else:
            print("Error: Could not stream audio from Polly.")
    except Exception as e:
        print(f"Error calling AWS Polly: {e}")

# Example:
# synthesize_polly("Hello from AWS Polly!")
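Polly's `synthesize_speech` call limits how much text a single request may carry, so longer input has to be split first. The following is a minimal sketch of one way to do that, using only the standard library. `split_text` is a hypothetical helper (not part of boto3), and the 3,000-character default is an assumption that should be checked against the current AWS quota.

```python
import re


def split_text(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    max_chars=3000 is an assumed per-request limit; verify it for your service.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Example:
# for i, chunk in enumerate(split_text(long_text)):
#     synthesize_polly(chunk, output_filename=f"polly_part_{i}.mp3")
```

Each chunk produces its own audio file; joining the files afterwards is left to whatever audio tooling your project already uses.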
2. Google Cloud TTS
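A minimal sketch using the official `google-cloud-texttospeech` client, assuming Application Default Credentials are already configured. The function name `synthesize_google`, the default filename, and the voice settings are illustrative choices, not something this page prescribes.

```python
# Requires: pip install google-cloud-texttospeech
def synthesize_google(text: str, output_filename: str = "google_output.mp3",
                      language_code: str = "en-US") -> str:
    """Synthesizes speech using Google Cloud Text-to-Speech and saves it as MP3."""
    # Imported lazily so the dependency is only needed when this function runs.
    from google.cloud import texttospeech

    # Assumes credentials are configured
    # (e.g., GOOGLE_APPLICATION_CREDENTIALS points at a service account key).
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=language_code,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,  # Example voice choice
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    # The response carries the encoded audio as raw bytes.
    with open(output_filename, "wb") as f:
        f.write(response.audio_content)
    print(f"Audio saved to {output_filename}")
    return output_filename

# Example:
# synthesize_google("Hello from Google Cloud TTS!")
```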
3. Azure TTS
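A minimal sketch using the Azure Speech SDK. The environment variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION`, the function name, and the example voice are assumptions made for illustration; adapt them to however your project stores credentials.

```python
# Requires: pip install azure-cognitiveservices-speech
import os

def synthesize_azure(text: str, output_filename: str = "azure_output.wav",
                     voice: str = "en-US-JennyNeural") -> bool:
    """Synthesizes speech using Azure AI Speech and writes it to a WAV file."""
    # Imported lazily so the dependency is only needed when this function runs.
    import azure.cognitiveservices.speech as speechsdk

    # Assumed variable names; set them to your Speech resource key and region.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],
        region=os.environ["AZURE_SPEECH_REGION"],
    )
    speech_config.speech_synthesis_voice_name = voice  # Example voice
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_filename)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # speak_text_async returns a future; .get() blocks until synthesis finishes.
    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print(f"Audio saved to {output_filename}")
        return True
    print(f"Error: Azure synthesis did not complete ({result.reason}).")
    return False

# Example:
# synthesize_azure("Hello from Azure TTS!")
```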
4. IBM Watson TTS
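A minimal sketch using the `ibm-watson` SDK with IAM authentication. The environment variable names `IBM_TTS_API_KEY` and `IBM_TTS_URL`, the function name, and the example voice are assumptions for illustration; use the API key and service URL shown for your own service instance.

```python
# Requires: pip install ibm-watson
import os

def synthesize_watson(text: str, output_filename: str = "watson_output.mp3",
                      voice: str = "en-US_AllisonV3Voice") -> str:
    """Synthesizes speech using IBM Watson Text to Speech and saves it as MP3."""
    # Imported lazily so the dependency is only needed when this function runs.
    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Assumed variable names; both values come from your service credentials.
    authenticator = IAMAuthenticator(os.environ["IBM_TTS_API_KEY"])
    tts = TextToSpeechV1(authenticator=authenticator)
    tts.set_service_url(os.environ["IBM_TTS_URL"])

    # get_result() returns the HTTP response; its content holds the audio bytes.
    response = tts.synthesize(text, voice=voice, accept="audio/mp3").get_result()
    with open(output_filename, "wb") as f:
        f.write(response.content)
    print(f"Audio saved to {output_filename}")
    return output_filename

# Example:
# synthesize_watson("Hello from IBM Watson TTS!")
```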
5. ElevenLabs
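A minimal sketch that calls the ElevenLabs REST endpoint directly with the standard library, so no extra package is needed. The function names, the `ELEVENLABS_API_KEY` variable name, and the default `model_id` are assumptions for illustration; `voice_id` must be a voice available to your account.

```python
import json
import os
import urllib.request

ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_elevenlabs_request(text: str, api_key: str, voice_id: str,
                             model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Builds the POST request for the text-to-speech endpoint (no network call)."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )

def synthesize_elevenlabs(text: str, voice_id: str,
                          output_filename: str = "elevenlabs_output.mp3") -> str:
    """Synthesizes speech using ElevenLabs and saves the returned MP3 bytes."""
    # Assumed variable name for the API key.
    request = build_elevenlabs_request(text, os.environ["ELEVENLABS_API_KEY"], voice_id)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(output_filename, "wb") as f:
        f.write(audio)
    print(f"Audio saved to {output_filename}")
    return output_filename

# Example:
# synthesize_elevenlabs("Hello from ElevenLabs!", voice_id="<your-voice-id>")
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without spending characters from your quota.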