Amazon Polly · Specialized · since 2016

A text-to-speech service that converts text into natural-sounding audio

About 2 min read · Last updated: 2026-03-24

What It Does

Amazon Polly is a text-to-speech (TTS) service that converts text into realistic speech. It offers dozens of voices in over 30 languages, with natural-sounding output powered by a neural TTS engine. SSML (Speech Synthesis Markup Language) lets you adjust speech rate, pitch, and pauses.

Use Cases

Improving website and app accessibility (screen reader support), generating e-learning narration, audio delivery of news articles, voice generation for IVR (interactive voice response) systems, and audio output for IoT devices.

Everyday Analogy

Think of a professional narrator. Hand over a script (text), and they read it naturally in the voice and language you specify. You can even give direction (SSML) like "slow down here" or "emphasize this part."

What Is Polly?

Amazon Polly is an AI service that converts text to speech. Its engines include Standard and Neural, with the Neural engine producing more natural, human-like speech. For Japanese, voices like Mizuki (female) and Takumi (male) are available. Generated audio can be downloaded or streamed in MP3, Ogg Vorbis, or PCM format.

SSML and Voice Customization

SSML tags give you fine-grained control over speech. Use <break> to insert pauses, <prosody> to change speed or pitch, <emphasis> to stress words, and <phoneme> to specify pronunciation. You can also choose speaking styles like newscaster or conversational, depending on the use case. Long texts can be processed with asynchronous synthesis tasks, with results saved to S3.
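As a rough illustration of the four tags named in this section, the sketch below assembles a small SSML document and checks that it is well-formed XML before it would be sent anywhere. The sentence text, the IPA string, and the helper name ssml_tags are made up for this example, and tag support can differ between engines and voices, so treat it as a starting point rather than a reference.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML document; the wording and the IPA string are example values.
SSML = (
    "<speak>"
    "Welcome to the daily briefing."
    '<break time="500ms"/>'                                             # pause
    '<prosody rate="slow">This sentence is read slowly.</prosody> '    # speed
    'This point is <emphasis level="strong">important</emphasis>. '    # stress
    'The word <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme> '   # pronunciation
    "is pronounced as specified."
    "</speak>"
)


def ssml_tags(ssml: str) -> list[str]:
    """Parse the SSML as XML and return its tag names in document order.

    Hypothetical helper: it only checks well-formedness, not whether
    a given Polly voice accepts every tag.
    """
    root = ET.fromstring(ssml)  # raises ParseError if a tag is left unclosed
    return [element.tag for element in root.iter()]


print(ssml_tags(SSML))
```

Validating the markup locally catches unclosed tags early, before any characters are billed.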

Getting Started

In the Polly console, go to the "Text-to-Speech" tab, enter your text, select a voice, and click "Listen." To use the API, pass the text, a voice ID, and an output format to the SynthesizeSpeech API. The free tier includes 5 million characters (Standard) / 1 million characters (Neural) per month for the first 12 months.
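A minimal sketch of the API route using the AWS SDK for Python (boto3) might look like the following. It assumes boto3 is installed and that AWS credentials and a region are already configured. The helper names build_request and synthesize_to_file are hypothetical, and the Takumi voice with the Neural engine is just one possible choice, since Neural availability depends on the region.

```python
def build_request(text: str, voice_id: str = "Takumi", engine: str = "neural") -> dict:
    """Build the keyword arguments for a SynthesizeSpeech call.

    Hypothetical helper; the defaults are example choices, not recommendations.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "mp3",  # other documented values include "ogg_vorbis" and "pcm"
    }


def synthesize_to_file(text: str, path: str) -> None:
    """Call SynthesizeSpeech and write the returned audio stream to a file."""
    import boto3  # imported here so build_request stays usable without the SDK

    polly = boto3.client("polly")  # assumes credentials and region are configured
    response = polly.synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling synthesize_to_file("こんにちは、Amazon Polly です。", "hello.mp3") would then write the spoken sentence to hello.mp3, and every character sent counts toward the monthly usage.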

Things to Watch Out For

The Neural engine is higher quality but costs roughly 4x more per character than Standard - choose based on your use case
Redistributing generated audio is allowed within the terms of service, but presenting Polly-generated speech as a human voice is prohibited
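
To make the cost trade-off concrete, here is a back-of-the-envelope estimate. The per-million-character rates are assumptions chosen to match the roughly 4x ratio above, not figures quoted in this article, so check the current pricing page before relying on them. The free-tier allowances come from the Getting Started section, and the function name monthly_cost is hypothetical.

```python
# Assumed USD rates per one million characters (illustrative only).
RATE_PER_MILLION = {"standard": 4.00, "neural": 16.00}

# Monthly free-tier allowance in characters, as stated in Getting Started.
FREE_TIER_CHARS = {"standard": 5_000_000, "neural": 1_000_000}


def monthly_cost(chars: int, engine: str, free_tier: bool = True) -> float:
    """Estimate one month's cost in USD for a given character volume."""
    allowance = FREE_TIER_CHARS[engine] if free_tier else 0
    billable = max(0, chars - allowance)  # characters beyond the free allowance
    return billable * RATE_PER_MILLION[engine] / 1_000_000


for engine in ("standard", "neural"):
    print(engine, monthly_cost(10_000_000, engine))
```

Under these assumed rates, the gap between engines widens during the free-tier period, because the Neural allowance is smaller as well as the rate being higher.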
