@ericc/edge-tts

Generate speech audio from text using Microsoft Edge's text-to-speech API.

Heavily inspired by rany2/edge-tts and SchneeHertz/node-edge-tts

Features

Uses standard web APIs, so it should work in all modern JS environments
Provides subtitle/caption data

Installation

Using npm:

npm install @echristian/edge-tts

Using pnpm:

pnpm install @echristian/edge-tts

Basic Usage

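// Web
// Package name as used in the install commands above
import { generate } from "@echristian/edge-tts";

const { audio, subtitle } = await generate({
  text: "Hello, world!",
  voice: "en-US-JennyNeural",
  language: "en-US",
});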

// Create an audio element and play the generated audio
const audioElement = new Audio(URL.createObjectURL(audio));
audioElement.play();

// Access subtitle data
console.log(subtitle);
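The example above targets the browser. Outside the browser there is no Audio element, so a common alternative is to write the result to disk. The following is a minimal sketch, assuming that `audio` is a standard Blob (which the `URL.createObjectURL(audio)` call above suggests) and that the runtime is Node.js 18 or later. `saveAudio` is a hypothetical helper, not part of this package.

```javascript
// Hypothetical helper (not exported by this package): writes a Blob to disk.
// Assumes `audio` is a standard Blob, as implied by URL.createObjectURL(audio).
async function saveAudio(audio, filePath) {
  // Dynamic import keeps the snippet usable from both ESM and CommonJS files.
  const { writeFile } = await import("node:fs/promises");

  // Blob -> ArrayBuffer -> Uint8Array, which writeFile accepts directly.
  const bytes = new Uint8Array(await audio.arrayBuffer());
  await writeFile(filePath, bytes);

  // Return the number of bytes written so callers can log or verify it.
  return bytes.byteLength;
}
```

With the default output format, the call would look like `await saveAudio(audio, "hello.mp3")`.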

Options

GenerateOptions

Options that will be sent alongside the websocket request:

text (required): The text to convert to audio
voice (optional): Voice used to read the text. Defaults to "en-US-AvaNeural"
- See Language and voice support for the Speech service
language (optional): Language of the text. Defaults to "en-US"
- See Language and voice support for the Speech service
outputFormat (optional): Format of the audio output. Defaults to "audio-24khz-96kbitrate-mono-mp3"
- See SpeechSynthesisOutputFormat Enum
rate (optional): Speaking rate of the text. Defaults to "default"
- See Customize voice and sound with SSML
pitch (optional): Baseline pitch of the text. Defaults to "default"
- See Customize voice and sound with SSML
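To make the defaults concrete, here is a minimal sketch of how omitted options could be filled in and how `voice`, `language`, `rate`, and `pitch` map onto an SSML document. The default values come from the list above. The helper names (`withDefaults`, `escapeXml`, `buildSsml`) are hypothetical, and the exact SSML this package sends is an assumption; only the general `speak` / `voice` / `prosody` structure is taken from the Speech service SSML documentation.

```javascript
// Defaults as documented in the GenerateOptions list.
const DEFAULTS = {
  voice: "en-US-AvaNeural",
  language: "en-US",
  outputFormat: "audio-24khz-96kbitrate-mono-mp3",
  rate: "default",
  pitch: "default",
};

// Hypothetical helper: fills in the documented default for every omitted option.
function withDefaults(options) {
  if (typeof options.text !== "string" || options.text.length === 0) {
    throw new TypeError("`text` is required");
  }
  return { ...DEFAULTS, ...options };
}

// Escapes characters that are not allowed verbatim in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: shows where each option would land in an SSML document.
// The markup actually produced by this package may differ.
function buildSsml(options) {
  const { text, voice, language, rate, pitch } = withDefaults(options);
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${language}">` +
    `<voice name="${voice}">` +
    `<prosody rate="${rate}" pitch="${pitch}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`
  );
}
```

For example, `buildSsml({ text: "Hello", rate: "+10%" })` keeps the default voice and pitch and only changes the speaking rate.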

ParseSubtitleOptions

Options for parsing the subtitle:

splitBy (required): How the cues are split
- "sentence": splits the text into sentences using Intl.Segmenter
- "word": splits the text into cues of `count` words each
- "duration": splits the text into cues of `count` milliseconds each
count (optional): Used when splitting by "word" or "duration"
- When splitting by "word", count is the number of words in each cue
- When splitting by "duration", count is the duration of each cue in milliseconds
metadata (required): Array of metadata received over the websocket connection
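The three split modes can be illustrated with a small sketch. It is not the package's implementation: the shape of the metadata entries is not documented here, so the code assumes a hypothetical word-boundary shape of `{ text, offset, duration }` with times in milliseconds, and the function names are made up for illustration. Only `Intl.Segmenter` is named in the option list above.

```javascript
// Assumed (hypothetical) input: [{ text, offset, duration }, ...] in milliseconds.

// "word" mode: group every `count` consecutive words into one cue.
function splitByWordCount(words, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  const cues = [];
  for (let i = 0; i < words.length; i += count) {
    const group = words.slice(i, i + count);
    const last = group[group.length - 1];
    cues.push({
      text: group.map((w) => w.text).join(" "),
      start: group[0].offset,
      end: last.offset + last.duration,
    });
  }
  return cues;
}

// "duration" mode: start a new cue once adding a word would exceed maxMs.
// A single word longer than maxMs still gets a cue of its own.
function splitByDuration(words, maxMs) {
  const cues = [];
  let current = null;
  for (const word of words) {
    const end = word.offset + word.duration;
    if (current && end - current.start > maxMs) {
      cues.push(current);
      current = null;
    }
    if (current) {
      current.text += " " + word.text;
      current.end = end;
    } else {
      current = { text: word.text, start: word.offset, end };
    }
  }
  if (current) cues.push(current);
  return cues;
}

// "sentence" mode: Intl.Segmenter finds sentence boundaries in plain text.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (s) => s.segment.trim()).filter(Boolean);
}
```

Grouping by word count gives cues of uniform length in words, while grouping by duration keeps on-screen time roughly constant; which one reads better depends on the speaking rate.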
