Browser TTS vs cloud TTS: when each makes sense

The two options, in one sentence each

Browser TTS uses the operating system's built-in speech engine through your browser. It is free, fast, private, and capped at whatever quality your OS provides. The converter on this site is browser TTS.

Cloud TTS sends text to a hosted service that returns a synthesized audio file or audio stream, typically generated by a neural model that is much larger than anything that fits inside a phone or laptop. It is usually paid per character, requires an account, and produces output that is often hard to distinguish from a human reader.

The honest summary

For listening, proofreading, language practice, and accessibility on your own device: browser TTS is almost always enough. For producing audio that someone else will listen to — audiobooks, training videos, IVR systems, podcasts — cloud TTS is almost always the right choice.

Most "is the converter good enough?" questions can be answered with one sub-question: does the audio leave your device? If yes, you probably want cloud TTS. If no, browser TTS is the simpler tool.

Comparison across the dimensions that matter

Dimension	Browser TTS	Cloud TTS
Voice quality ceiling	What your OS ships, plus optional Enhanced/Premium downloads	State-of-the-art neural voices; some indistinguishable from human in short clips
Voice variety	Tens of voices, usually a few per language	Hundreds of voices, often dozens per language and accent
Latency	Effectively zero after the first user gesture; runs on-device	Network round-trip plus model inference time, usually a few hundred ms to seconds
Offline use	Works fully offline once voices are installed	Requires connectivity unless you self-host an open model
Privacy	Text stays on the device when an on-device voice is selected	Text is sent to the provider; retention and training-use depend on terms
Downloadable audio file	Not exposed by the API; you can capture playback with a recorder	Returns a standard audio file (MP3, WAV, OGG) ready to share
Cost model	Free; no metering, no quota	Per-character pricing, often free up to a small monthly tier
Commercial-use licensing	Depends on your OS vendor's terms; often acceptable for incidental use	Usually explicit "use in commercial output" clauses with clear limits
Cross-device consistency	Different voices and quality on every visitor's device	Same voice, same quality, anywhere
Programmatic control	Rate, pitch, volume; partial SSML; limited word-boundary events	Full SSML, fine-grained timing marks, neural prosody controls
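
To make the "rate, pitch, volume" row concrete, here is a minimal sketch that clamps settings to the ranges the Web Speech API defines for an utterance (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1). The names `VoiceSettings` and `clampSettings` are illustrative and are not part of the API or of our converter.

```typescript
// Illustrative helper: keeps settings inside the ranges the Web Speech API
// defines for SpeechSynthesisUtterance (rate 0.1..10, pitch 0..2, volume 0..1).
interface VoiceSettings {
  rate: number;
  pitch: number;
  volume: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function clampSettings(s: VoiceSettings): VoiceSettings {
  return {
    rate: clamp(s.rate, 0.1, 10),
    pitch: clamp(s.pitch, 0, 2),
    volume: clamp(s.volume, 0, 1),
  };
}

// In a browser (not executed here), the result would be applied like this:
//   const u = new SpeechSynthesisUtterance("Hello");
//   const s = clampSettings({ rate: 1.2, pitch: 1, volume: 0.8 });
//   u.rate = s.rate; u.pitch = s.pitch; u.volume = s.volume;
//   speechSynthesis.speak(u);
```

Individual engines may honour a narrower range than the API allows, so treat the clamp as a floor of correctness, not a promise about how a given voice will sound.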

Decision criteria

1. Who is the listener?

If the listener is you (listening to articles, hearing your own draft, learning a language, studying notes), browser TTS is plenty. Whatever quality gap exists between your OS voice and a cloud voice matters less than your ability to adjust pacing and re-listen.

If the listener is someone else, on a device you do not control, cloud TTS makes far more sense. You get the same output on every visitor's device, and you can ship the audio as a file.

2. Do you need a downloadable file?

Browser TTS does not expose the synthesized audio to JavaScript as a stream or buffer, so there is no portable way to obtain an MP3 from it. The "Save text" button on our converter saves the source text and the voice settings; to keep the audio itself, you must capture it with your OS recorder while playback is running. If you need an MP3 or WAV directly, cloud TTS is the right tool.
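
For contrast, a cloud request typically names the output format up front and gets audio bytes back. The sketch below only builds such a request. The endpoint `https://tts.example.com/v1/synthesize`, the JSON field names, and the voice id are placeholders invented for illustration; every real provider defines its own, so check its documentation.

```typescript
// Hypothetical provider: the endpoint, field names, and voice id below are
// placeholders, not any real service's API.
type AudioFormat = "mp3" | "wav" | "ogg";

interface SynthesisRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSynthesisRequest(
  text: string,
  voiceId: string,
  format: AudioFormat,
  apiKey: string,
): SynthesisRequest {
  return {
    url: "https://tts.example.com/v1/synthesize", // placeholder endpoint
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // The requested file format travels with the text.
    body: JSON.stringify({ text, voice: voiceId, format }),
  };
}
```

Sending it would be an ordinary `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with the response body saved as the audio file. The Web Speech API has no equivalent step, which is the whole point of this criterion.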

3. How sensitive is the text?

Browser TTS keeps text on the device as long as the selected voice is an on-device one (some browsers also list network voices, which do not). Cloud TTS sends the text to a provider, who may log requests and, depending on the contract, may use the text for product improvement. For confidential drafts, personal medical notes, or anything you would not paste into a search engine, browser TTS with an on-device voice is the privacy-preserving choice.
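
If you want to check this per voice, the Web Speech API marks each voice with a `localService` flag, and a voice reporting `false` is supplied by a remote synthesizer. A minimal sketch of filtering on that flag follows. `VoiceInfo` mirrors only the fields needed here, and the voice names in the usage comment are made up.

```typescript
// Mirrors the fields of SpeechSynthesisVoice that matter for this check.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean; // true: synthesized on the device
}

// Keep only voices that are synthesized on the device.
function onDeviceVoices(voices: readonly VoiceInfo[]): VoiceInfo[] {
  return voices.filter((v) => v.localService);
}

// In a browser (not executed here):
//   const safe = onDeviceVoices(speechSynthesis.getVoices());
//   utterance.voice = speechSynthesis.getVoices()
//     .find((v) => v.localService && v.lang === "en-US") ?? null;
```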

4. How much text, how often?

Browser TTS is free and unmetered; you can run it for hours without thinking about cost. Cloud TTS bills per character. Long-form audiobook production is a different cost equation from a single voicemail greeting. If your usage is bursty or curiosity-driven, the absence of metering is browser TTS's quiet superpower.
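
A rough way to see that difference is to put numbers on it. The sketch below assumes about 6 characters per word (including the trailing space) and a placeholder rate of 16 USD per million characters with no free tier. Both figures are assumptions for illustration, not any provider's actual pricing.

```typescript
interface PricingAssumption {
  usdPerMillionChars: number; // placeholder rate, not a real provider's price
  freeCharsPerMonth: number; // placeholder free tier
}

// Rough character count from a word count; 6 chars per word is an assumption.
function estimateCharacters(words: number, charsPerWord: number = 6): number {
  return words * charsPerWord;
}

// Cost of one month's usage after subtracting the free tier.
function estimateMonthlyCostUsd(characters: number, p: PricingAssumption): number {
  const billable = Math.max(0, characters - p.freeCharsPerMonth);
  return (billable / 1e6) * p.usdPerMillionChars;
}

const pricing: PricingAssumption = { usdPerMillionChars: 16, freeCharsPerMonth: 0 };

const manuscriptChars = estimateCharacters(70000); // 420,000 characters
const greetingChars = estimateCharacters(40); // 240 characters

console.log(estimateMonthlyCostUsd(manuscriptChars, pricing).toFixed(2)); // "6.72"
console.log(estimateMonthlyCostUsd(greetingChars, pricing).toFixed(4)); // "0.0038"
```

Swap in your provider's published rate and free tier before relying on the result. Re-renders after edits count again, so a manuscript that goes through several drafts costs a multiple of the single-pass figure.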

5. How much do prosody and SSML matter?

If you need precise control — a specific pause length, an inserted breath, a phonetic spelling, a particular emphasis — cloud TTS engines respond to SSML in ways the browser cannot consistently match. Browser engines accept some SSML, but support varies (see browser support). For most plain-prose reading, the difference is invisible.
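
As an illustration of the kind of markup involved, here is a small sketch that assembles an SSML string with a timed pause and an emphasized phrase. `<speak>`, `<break>`, and `<emphasis>` are standard SSML elements; the `Part` type and `toSsml` helper are hypothetical, and how faithfully a given engine renders each element still varies.

```typescript
// Escape characters that would otherwise break the XML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

type Part =
  | { kind: "text"; value: string }
  | { kind: "pause"; ms: number }
  | { kind: "emphasis"; value: string };

// Turn a list of parts into a single SSML document.
function toSsml(parts: Part[]): string {
  const body = parts
    .map((p) => {
      switch (p.kind) {
        case "text":
          return escapeXml(p.value);
        case "pause":
          return `<break time="${Math.round(p.ms)}ms"/>`;
        case "emphasis":
          return `<emphasis level="strong">${escapeXml(p.value)}</emphasis>`;
      }
    })
    .join("");
  return `<speak>${body}</speak>`;
}

const ssml = toSsml([
  { kind: "text", value: "Wait & see." },
  { kind: "pause", ms: 500 },
  { kind: "emphasis", value: "Now" },
]);
// <speak>Wait &amp; see.<break time="500ms"/><emphasis level="strong">Now</emphasis></speak>
```

A cloud engine that supports these elements should insert the half-second pause. A browser engine may honour the markup, strip it, or read the tags aloud, which is why plain text is the safer input there.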

6. Do you need a consistent voice across users?

A page that loads in our converter will sound different on a MacBook running Premium voices than on a five-year-old Windows laptop with the default voice only. For internal tools and personal use, that is fine. For a published podcast, training video, or accessibility version distributed to many users, the consistency that cloud TTS provides is hard to beat.

Worked examples

Listening to a long article

Browser TTS. Open the article, paste a passage into the converter, hit Play, and continue. No upload, no account, no quota. Switch passages when you reach the end of one.

Hearing your draft before publishing

Browser TTS. The point is to catch awkward phrasing, not to produce broadcast-quality audio. The neural voices on macOS and Windows are smooth enough that a stumble in playback usually points to the sentence, not the voice, so subtle problems still surface. See the writing for TTS page for editing tricks that make the playback land.

Recording a 30-minute training video voiceover

Cloud TTS. You want a consistent voice across re-recordings, an MP3 you can drop into a video editor, and word-level timing marks if you plan to add captions. The cost is small relative to the time you'd otherwise spend recording yourself.

Producing an audiobook from a 70,000-word manuscript

Cloud TTS — but think carefully about voice license terms. Some providers permit commercial publication of synthesized audio explicitly; others require a higher-tier plan or restrict re-distribution. Verify before you record.

Practicing Spanish pronunciation for an hour a day

Browser TTS. The voice you want is "consistent, free, available on demand". A premium voice on iOS or macOS is more than enough; the install voices page covers downloading the right ones.
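
Once a Spanish voice is installed, choosing it in code is a matter of matching language tags. The sketch below is one way to do that under stated assumptions: `VoiceInfo` mirrors a few fields of `SpeechSynthesisVoice`, `pickVoice` is a hypothetical helper, and the underscore handling is a defensive guess for platforms that report tags like `es_ES`.

```typescript
// Mirrors the fields of SpeechSynthesisVoice used for selection.
interface VoiceInfo {
  name: string;
  lang: string; // BCP 47 tag such as "es-ES"
  localService: boolean;
}

// Pick a voice for "es" (any Spanish) or "es-MX" (a specific region).
function pickVoice(
  voices: readonly VoiceInfo[],
  language: string,
): VoiceInfo | undefined {
  const wanted = language.toLowerCase();
  const matches = voices.filter((v) => {
    // Normalise es_ES-style tags defensively before comparing.
    const tag = v.lang.toLowerCase().replace("_", "-");
    return tag === wanted || tag.startsWith(wanted + "-");
  });
  // Prefer an on-device voice, then fall back to any match.
  return matches.find((v) => v.localService) ?? matches[0];
}

// In a browser (not executed here):
//   const voice = pickVoice(speechSynthesis.getVoices(), "es");
//   if (voice) utterance.lang = voice.lang;
```

In some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired, so run the selection after that event if the first call comes back empty.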

Building a public website that reads articles aloud

Cloud TTS, with browser TTS as a free fallback if the visitor's browser supports it. Cloud TTS gives every visitor the same experience; the browser fallback covers visitors who decline the cloud audio or cannot load it.
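
The fallback order can be sketched as a small decision function. Everything here is hypothetical scaffolding: `PlaybackContext`, `cloudAllowed`, and `choosePlaybackPath` are illustrative names, and the only real API involved is the `"speechSynthesis" in window` feature check.

```typescript
type PlaybackPath = "cloud" | "browser" | "none";

interface PlaybackContext {
  cloudAudioUrl?: string; // pre-rendered file from the cloud provider, if any
  cloudAllowed: boolean; // false if the visitor opted out or the fetch failed
  scope: object; // usually `window`; injected so the logic is testable
}

// Feature detection for the Web Speech API's synthesis side.
function hasSpeechSynthesis(scope: object): boolean {
  return "speechSynthesis" in scope;
}

function choosePlaybackPath(ctx: PlaybackContext): PlaybackPath {
  if (ctx.cloudAllowed && ctx.cloudAudioUrl) return "cloud";
  if (hasSpeechSynthesis(ctx.scope)) return "browser";
  return "none"; // hide the listen button instead of failing silently
}

// In a browser (not executed here):
//   const path = choosePlaybackPath({
//     cloudAudioUrl: "/audio/article-1.mp3",
//     cloudAllowed: true,
//     scope: window,
//   });
```

Keeping the decision separate from playback means the same function can be re-run when a cloud request fails mid-session, flipping `cloudAllowed` to `false` and dropping to the browser voice.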

Common mistakes

Choosing cloud TTS for personal listening. Most personal use cases do not need a downloadable file; you pay for features you do not consume.
Choosing browser TTS for production audio. Inconsistent output across devices and the inability to export a file create downstream pain that the up-front saving never recovers.
Pasting confidential text into cloud TTS without checking terms. Some free tiers retain text for model improvement; some paid tiers do not. Read the data-use clauses before pasting.
Comparing voices on a single device. Cloud voices sound better on a quiet evening in headphones than they do over a phone speaker, and browser voices differ from one device to the next. Test in the real conditions the audio will be heard in.

A small note on terminology

"Browser TTS" in this article means the Web Speech API as implemented in mainstream browsers. There is also a category of browser extensions that route text through cloud services and return audio inside the browser; those are cloud TTS wearing a browser-extension coat. The distinction that matters is where the audio is synthesized, not where the playback happens.

For more on the in-browser side, see the browser support page; for the editing side, the writing for TTS page; and for accessibility-specific workflows, the accessibility setups page. The Terms of Service covers our position on commercial reuse of audio generated through our converter.