Last reviewed on May 13, 2026

The mental model

Speech engines do not "understand" text the way a reader does. They run two passes: a text-normalization step that decides how to pronounce numbers, dates, abbreviations, and symbols, and a prosody step that decides pacing, pitch, and emphasis. Both passes lean heavily on punctuation. The biggest lever you have over the audio is the punctuation in the text you give the engine.

Everything on this page is an edit you make in the input box of the converter, not a setting on the engine. Pick a voice you like once (see installing system voices if your list is short), then spend the rest of your time on the words.

Punctuation: the cheapest improvement

Periods, commas, semicolons, em-dashes, and ellipses become pauses of different lengths. Question marks change intonation; exclamation points change emphasis. A lot of rough-sounding TTS playback comes from text written as one long sentence with too few commas: the engine sprints through and gives up on phrasing.

Reads poorly
If you are not sure which voice to pick start with the default voice for your system language and only change it if the rhythm of the speech feels wrong because most modern voices handle plain prose well.
Reads well
If you are not sure which voice to pick, start with the default voice for your system language. Only change it if the rhythm of the speech feels wrong. Most modern voices handle plain prose well.

The rule is: if you would breathe there when reading aloud, put a comma there. If you would stop, put a period. The engine is not subtle — give it explicit cues.
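In a long document, a short script can find the worst offenders before you listen. The sketch below assumes English prose and an arbitrary threshold of 20 words, which you can tune. It flags any sentence longer than that with no comma, semicolon, colon, dash, or opening parenthesis. The sentence split is a naive regex rather than a real tokenizer, so treat the output as a list of places to reread aloud, not as automatic fixes.

```python
import re

# Arbitrary threshold (an assumption): longer sentences with no internal
# pause marks tend to be read in one breathless run.
MAX_WORDS_WITHOUT_PAUSE = 20

# Comma, semicolon, colon, em-dash, en-dash, opening parenthesis.
PAUSE_MARKS = ",;:\u2014\u2013("


def flag_breathless_sentences(text: str, limit: int = MAX_WORDS_WITHOUT_PAUSE) -> list:
    """Return sentences with more than `limit` words and no internal pause mark."""
    # Naive split after ., ? or ! when followed by whitespace.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        too_long = len(sentence.split()) > limit
        has_pause = any(mark in sentence for mark in PAUSE_MARKS)
        if too_long and not has_pause:
            flagged.append(sentence)
    return flagged
```

When run on the "Reads poorly" example above, it flags the whole sentence. The "Reads well" version returns an empty list.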

Em-dashes and parentheses

Em-dashes (—) and parenthetical asides shape pacing differently. In many engines, an em-dash produces a noticeable pause and a slight pitch reset, while a parenthetical aside is read at a lower pitch that recovers after the closing parenthesis. If a sentence carries too much information to deliver in one breath, an em-dash is usually cleaner than a comma.

Ellipses

An ellipsis ("…") usually produces a longer trailing pause than a period. That makes it useful for deliberate hesitation, but misleading if you use it to mean "and so on": the engine dwells on it as if more is coming.

Abbreviations and acronyms

Engines normalize abbreviations using a dictionary. Common ones such as Mr., Dr., St., and U.S.A. are usually pronounced correctly. Less common ones get spelled out letter by letter. Sometimes that is right (ID, FBI), and sometimes it is wrong (NATO, read as "N-A-T-O" instead of "nay-toe").

If a particular abbreviation comes out wrong, respell it in the text. Either write it phonetically or change its capitalization so it looks like an ordinary word. The listener never sees the spelling; they only hear the result.

Reads poorly
The NATO summit ended Friday.
Reads well
The Nato summit ended Friday.

The same trick works for technical terms that get tortured by the normalizer: write "SQL" as "sequel" if the voice says "ess-cue-ell" and that bothers you. You are editing for the ear, not the page.
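If you convert a lot of text with the same voice, it can help to keep your respellings in one table and apply them before pasting. Here is a minimal sketch that assumes a hand-maintained Python dictionary; the two entries are simply the examples from this page. It replaces only whole-word, case-sensitive matches, so a standalone "SQL" is respelled but "MySQL" is left alone. Each key should begin and end with a letter or digit, because the match relies on regex word boundaries. A key like "U.S.A." would need a different pattern.

```python
import re

# Hand-maintained respelling table (hypothetical entries taken from the
# examples on this page). Keys are matched case-sensitively as whole words.
RESPELLINGS = {
    "NATO": "Nato",
    "SQL": "sequel",
}


def apply_respellings(text: str, table: dict = RESPELLINGS) -> str:
    """Replace whole-word occurrences of each key with its respelling."""
    if not table:
        return text
    # Try longer keys first, so a key such as "AT&T" is not pre-empted
    # by a shorter key such as "AT".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)
```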

Numbers, dates, and currencies

Numbers are where engines differ the most. "1984" might be read as "nineteen eighty-four" in a sentence about a book and "one thousand nine hundred eighty-four" in a sentence about a quantity. Some engines look at surrounding words; some do not. If the result is wrong, write the number the way you want it spoken.

Reads poorly: "the year one thousand nine hundred eighty-four"
The book was published in 1984.
Reads well
The book was published in nineteen eighty-four.
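If you prepare text in bulk, you can spell years out in code instead of by hand. This is a minimal sketch that assumes the common English way of reading years from 1100 to 2099 ("nineteen eighty-four", "nineteen oh five", "two thousand five", "twenty twenty-five"). It only converts a number you already know is a year. Deciding which numbers in a sentence are years is the same ambiguity that trips up the engine, so that choice stays with you.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell 1-99 in words (returns an empty string for 0)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def year_to_words(year: int) -> str:
    """Spell a year between 1100 and 2099 the way it is commonly spoken."""
    if not 1100 <= year <= 2099:
        raise ValueError("this sketch only handles years 1100-2099")
    century, rest = divmod(year, 100)
    if 2000 <= year <= 2009:
        # 2000-2009 are usually said "two thousand", "two thousand five".
        return "two thousand" + (" " + ONES[rest] if rest else "")
    if rest == 0:
        return two_digits(century) + " hundred"            # 1900 -> nineteen hundred
    if rest < 10:
        return two_digits(century) + " oh " + ONES[rest]   # 1905 -> nineteen oh five
    return two_digits(century) + " " + two_digits(rest)    # 1984 -> nineteen eighty-four
```

For example, f"The book was published in {year_to_words(1984)}." produces the "Reads well" sentence above.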

Dates and currencies have similar pitfalls:

  • "3/4/2025" is read differently in US-English (March 4th) and UK-English (3rd of April) voices. Write the month out: "March 4, 2025."
  • "$1.5M" is sometimes "one point five M", sometimes "one and a half million dollars". Write "one and a half million dollars" if precision matters.
  • "4–6 PM" can come out as "four six PM", because the engine does not read the en-dash as "to". "From four to six in the afternoon" is unambiguous.
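Once you know which convention the source text uses, you can apply the date advice mechanically. The code cannot guess the convention for you. The sketch below assumes numeric dates in the form 3/4/2025 and uses a day_first flag that you set yourself. It rewrites each valid date as "Month D, YYYY", the form recommended above, and leaves anything that is not a real calendar date untouched.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# One or two digits, slash, one or two digits, slash, four-digit year.
NUMERIC_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def spell_out_dates(text: str, day_first: bool = False) -> str:
    """Rewrite n/n/yyyy dates as 'Month D, YYYY'.

    day_first=False reads 3/4/2025 in US order (March 4);
    day_first=True reads it in UK order (3 April).
    """
    def replace(m: re.Match) -> str:
        a, b, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
        day, month = (a, b) if day_first else (b, a)
        try:
            date(year, month, day)  # rejects impossible dates such as 2/30/2025
        except ValueError:
            return m.group(0)       # leave anything invalid exactly as written
        return f"{MONTHS[month - 1]} {day}, {year}"

    return NUMERIC_DATE.sub(replace, text)
```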

Homographs: same spelling, different sound

English is full of words whose pronunciation changes with part of speech or tense: "lead" (the verb) versus "lead" (the metal), "read" (present tense) versus "read" (past tense), "live" (the verb) versus "live" (the adjective). The engine has to pick one pronunciation, and if it picks wrong, the easiest fix is to rewrite around the word.

Reads poorly: ambiguous "read"
I read the manual last night.
Reads well
Last night, I finished reading the manual.

This is one of the few areas where rewriting beats every other trick.
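Because the fix here is a rewrite rather than a substitution, code can only point you at candidates. The sketch below assumes a short, hand-picked watch list. No such list is complete, and many occurrences will be read correctly anyway. It reports the position of each watched word so you can listen to those sentences first.

```python
import re

# Hypothetical, deliberately short watch list; extend it with words your voice trips on.
HOMOGRAPHS = {"lead", "read", "live", "record", "object", "wind", "tear", "bass", "close"}


def find_homographs(text: str, watch: set = HOMOGRAPHS) -> list:
    """Return (character offset, word) for each watched word, case-insensitively."""
    hits = []
    for m in re.finditer(r"[A-Za-z]+", text):
        if m.group(0).lower() in watch:
            hits.append((m.start(), m.group(0)))
    return hits
```

It flags "read" in the "Reads poorly" example above and nothing in the rewritten version, because "reading" is not on the list.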

Proper nouns and unusual names

Names are the hardest part of TTS. If a name matters and is mispronounced, spell it phonetically in the text. There is no shame in writing "Siobhan" as "Shiv-awn" if the alternative is the engine guessing.

For company names, brand names, or anything with idiosyncratic capitalization, the engine usually reads each letter unless the result looks like a word. "GIMP" becomes "gimp"; "GIF" becomes either "gif" or "jif" depending on the voice. If consistency matters, write the pronounced form.

URLs, code, and symbols

Most engines read URLs character-by-character, including the slashes and dots. If the URL is incidental, write "see our contact page" and link to it visually for sighted readers; the listener does not need the raw address. If the URL must be spoken, write the domain in plain words: "text into audio dot com" instead of https://textintoaudio.com.
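The following sketch covers both options from the paragraph above. It either replaces each URL with a descriptive phrase you choose or falls back to reading only the host name, with "dot" between its parts. It assumes that URLs start with http:// or https:// and end at whitespace, and that trailing sentence punctuation is not part of the URL.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumes URLs start with http:// or https:// and end at whitespace.
URL_PATTERN = re.compile(r"https?://\S+")

# Sentence punctuation that usually follows a URL rather than belonging to it.
TRAILING = ".,;:!?)"


def speakable_host(url: str) -> str:
    """Reduce a URL to its host name, with 'dot' between the labels."""
    host = urlsplit(url).hostname or url  # hostname is lowercased by urlsplit
    if host.startswith("www."):
        host = host[len("www."):]
    return " dot ".join(host.split("."))


def replace_urls(text: str, placeholder: Optional[str] = None) -> str:
    """Replace each URL with `placeholder`, or with its spoken host if none is given."""
    def replace(m: re.Match) -> str:
        raw = m.group(0)
        url = raw.rstrip(TRAILING)
        trailing = raw[len(url):]  # keep the sentence punctuation after the URL
        spoken = placeholder if placeholder is not None else speakable_host(url)
        return spoken + trailing

    return URL_PATTERN.sub(replace, text)
```

The host comes out as a single run, such as "textintoaudio dot com". Breaking it into "text into audio" still has to be done by hand.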

Code is similar. Many engines read "()" as "open parenthesis close parenthesis" and "{" as "open brace". For programming tutorials, write the surrounding prose so the listener still gets the meaning when the code itself is skipped.

Pacing: speed and pauses

The Speed slider in the converter scales the entire utterance uniformly. That is fine for casual listening but blunt for emphasis. If you want a specific phrase to land more slowly, set it off with commas and periods in the text. Engines slow down around punctuation reliably, which gives you local control that the slider cannot.

For deliberate dramatic pauses, an em-dash or an explicit "(pause)" in parentheses works in some engines — but the most portable approach is to put the pause on its own line:

Reads well
There was only one question left.

Did they make it home?

A blank line usually produces a longer pause than a period alone.

A short editing checklist

  1. Read it aloud yourself first. Where you naturally breathe, put a comma. Where you stop, put a period.
  2. Spell tricky names and acronyms phonetically.
  3. Write numbers as words wherever the engine could confuse a year, a quantity, or a date.
  4. Spot homographs and rewrite around them.
  5. Replace URLs with descriptive text unless the URL itself must be spoken.
  6. Use blank lines for big pauses; em-dashes for medium ones; commas for small ones.
  7. Play the result back at the speed you actually plan to listen at, not at 1.0x. Pacing problems often appear only at 1.5x or above.

What you can't fix with text editing

The engine and the voice you chose impose a ceiling. A flat, low-quality voice will not sound expressive no matter how carefully the text is punctuated. If you have done a clean edit and the audio still feels mechanical, two things tend to help: switch to a higher-quality voice on your OS (see installing system voices), or consider whether a cloud TTS service is the right tool for what you are trying to produce.

For accessibility-oriented workflows — dyslexia support, language learning, proofreading — the most useful text edits are smaller than you might expect. See the accessibility and learning page for setups that pair this editing approach with the converter's voice and speed controls.