Generate speech audio from text using Microsoft Edge's text-to-speech API.
Heavily inspired by rany2/edge-tts and SchneeHertz/node-edge-tts
- Uses standard web APIs, so it should work in all modern JS environments
- Provides subtitle/caption data
Using npm:
npm install @echristian/edge-tts
Using pnpm:
pnpm install @echristian/edge-tts
// Web
// Assumes `generate` is a named export of the package
import { generate } from "@echristian/edge-tts";

const { audio, subtitle } = await generate({
  text: "Hello, world!",
  voice: "en-US-JennyNeural",
  language: "en-US",
});

// Create an audio element and play the generated audio
const audioElement = new Audio(URL.createObjectURL(audio));
audioElement.play();

// Access subtitle data
console.log(subtitle);
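Outside the browser there is no Audio element, so one option is to write the result to disk. The following is a minimal sketch, assuming that `audio` is a Blob (it is passed to URL.createObjectURL above) and that the code runs on Node.js 18 or later, where Blob is a global. `saveBlob` is a hypothetical helper and not part of this package.

```javascript
// Hypothetical helper (not provided by the package): writes a Blob to a file.
// Assumes Node.js 18+, where Blob is available globally.
async function saveBlob(blob, filePath) {
  // A dynamic import keeps the snippet usable from both ESM and CommonJS files.
  const { writeFile } = await import("node:fs/promises");

  // Read the Blob's bytes and write them out unchanged.
  const bytes = new Uint8Array(await blob.arrayBuffer());
  await writeFile(filePath, bytes);

  // Return the number of bytes written, which is handy for logging.
  return bytes.byteLength;
}
```

With the example above, this could be called as `await saveBlob(audio, "hello.mp3")`.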
Options that will be sent alongside the websocket request:
- text (required): The text that will be generated as audio
- voice (optional): Voice persona used to read the message. Defaults to "en-US-AvaNeural"
  - Please refer to Language and voice support for the Speech service
- language (optional): Language of the message. Defaults to "en-US"
  - Please refer to Language and voice support for the Speech service
- outputFormat (optional): Format of the audio output. Defaults to "audio-24khz-96kbitrate-mono-mp3"
  - Please refer to SpeechSynthesisOutputFormat Enum
- rate (optional): Indicates the speaking rate of the text. Defaults to "default"
  - Please refer to Customize voice and sound with SSML
- pitch (optional): Indicates the baseline pitch for the text. Defaults to "default"
  - Please refer to Customize voice and sound with SSML
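To make the defaults listed above concrete, here is a small sketch of how they might be merged with caller-supplied options. `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names used for illustration only; the package's internal handling may differ.

```javascript
// Default values as documented in the option list above.
const DEFAULT_OPTIONS = {
  voice: "en-US-AvaNeural",
  language: "en-US",
  outputFormat: "audio-24khz-96kbitrate-mono-mp3",
  rate: "default",
  pitch: "default",
};

// Hypothetical helper (not part of the package): validates the required
// `text` field and fills every omitted option with its documented default.
function resolveOptions(options) {
  if (typeof options?.text !== "string" || options.text.length === 0) {
    throw new TypeError("`text` is required and must be a non-empty string");
  }

  // Drop keys explicitly set to undefined so they do not overwrite defaults.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined),
  );

  return { ...DEFAULT_OPTIONS, ...provided };
}
```

For example, `resolveOptions({ text: "Hi", rate: "+10%" })` keeps the supplied rate and falls back to the documented defaults for voice, language, outputFormat, and pitch.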
Options for parsing the subtitle:
- splitBy (required): The function will split the cues based on this option
  - "sentence": will split the text using Intl.Segmenter
  - "word": will split the text to X count of words for each cue
  - "duration": will split the text to X duration of milliseconds for each cue
- count (optional): Used when splitting by "words" or "duration"
  - When splitting by "words", count means the amount of words for each cue
  - When splitting by "duration", count means the duration in milliseconds for each cue
- metadata (required): Array of metadata received throughout the websocket connection
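The three splitting strategies above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: each timing entry is assumed to look like `{ text, offset, duration }` with times in milliseconds, and the function names are hypothetical. Only Intl.Segmenter is a standard API.

```javascript
// Assumed entry shape (hypothetical): { text: string, offset: ms, duration: ms }

// Merge a non-empty group of word entries into a single cue.
function toCue(group) {
  const first = group[0];
  const last = group[group.length - 1];
  return {
    text: group.map((word) => word.text).join(" "),
    start: first.offset,
    end: last.offset + last.duration,
  };
}

// By word count: every cue holds at most `count` words.
function splitByWordCount(words, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  const cues = [];
  for (let i = 0; i < words.length; i += count) {
    cues.push(toCue(words.slice(i, i + count)));
  }
  return cues;
}

// By duration: close the current cue when adding the next word would make
// it span more than `maxMs`. A single over-long word still gets its own cue.
function splitByDuration(words, maxMs) {
  const cues = [];
  let group = [];
  for (const word of words) {
    const spanWithWord =
      group.length > 0 ? word.offset + word.duration - group[0].offset : 0;
    if (group.length > 0 && spanWithWord > maxMs) {
      cues.push(toCue(group));
      group = [];
    }
    group.push(word);
  }
  if (group.length > 0) cues.push(toCue(group));
  return cues;
}

// By sentence: Intl.Segmenter finds sentence boundaries in the raw text.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (part) => part.segment.trim())
    .filter((sentence) => sentence.length > 0);
}
```

Mapping sentences back to timings would additionally require matching each sentence to its word entries, which is omitted here for brevity.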