This repository provides resources and tools for Automatic Speech Recognition (ASR) in Persian, aimed at advancing Persian ASR technology. It contains:
- Persian ASR Dataset and Preparation Scripts
- Pretrained Persian ASR Models
- CTC-Based Speech Segmentation Tool
The dataset component provides information and scripts for preparing and using Persian ASR datasets. Detailed instructions are in the README in the `data` directory.
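We provide the following pretrained Persian ASR models and plan to release more. Word error rate (WER) is reported for greedy decoding, beam search with a beam width of 5, and beam search with a beam width of 5 combined with a language model (LM).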
Model | Hugging Face Repository | WER (Greedy Decoding) | WER (Beam=5) | WER (Beam=5 + LM) |
---|---|---|---|---|
Wav2Vec2 XLS-R 300M | wav2vec2-xls-r-300m-fa | 27.92% | 27.89% | 22.63% |
Conformer Medium | nemo-conformer-medium-fa | 32.08% | 31.94% | 27.47% |
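
WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below illustrates greedy decoding for a model with a CTC output layer (as fine-tuned wav2vec2 models use): take the most likely token at each frame, merge repeated tokens, and drop blanks. It then scores a single utterance with WER. The function names, the blank ID of 0, the `|` word separator, and the toy vocabulary are illustrative assumptions, not the repository's code.

```python
from typing import Dict, List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Greedy CTC decoding of per-frame argmax IDs: merge repeats, then drop blanks."""
    decoded: List[int] = []
    prev = None
    for tok in frame_ids:
        # Keep a token only when it starts a new run and is not the blank symbol
        if tok != prev and tok != blank_id:
            decoded.append(tok)
        prev = tok
    return decoded


def ids_to_text(ids: Sequence[int], vocab: Dict[int, str], word_sep: str = "|") -> str:
    """Map token IDs to characters; the (assumed) word-separator symbol becomes a space."""
    return "".join(vocab[i] for i in ids).replace(word_sep, " ").strip()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-wise DP: prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion (reference word missing)
                         cur[j - 1] + 1,      # insertion (extra hypothesis word)
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical toy vocabulary: 0 = CTC blank, 1 = word separator
    vocab = {0: "<blank>", 1: "|", 2: "a", 3: "b"}
    frames = [2, 2, 0, 2, 3, 1, 1, 3, 0]  # per-frame argmax IDs
    ids = ctc_greedy_decode(frames)
    print(ids)                      # [2, 2, 3, 1, 3]
    print(ids_to_text(ids, vocab))  # aab b
    print(round(word_error_rate("the cat sat", "the bat sat"), 3))  # 0.333
```

Note that the blank between the two `2` frames is what lets the decoder emit `a` twice. Reported WERs are usually aggregated over a whole test set (total errors divided by total reference words) rather than averaged per utterance; the repository's evaluation code may differ from this sketch.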

Before running the model scripts, the dataset must be prepared in the JSON Lines (`.jsonl`) format. Preparation scripts for common datasets can be found in the `data` directory. To learn how to generate the necessary `.jsonl` files, refer to the data README.
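JSON Lines stores one JSON object per line. The sketch below writes and reads such a manifest with Python's standard `json` module. The field names `audio_filepath` and `text` are hypothetical placeholders (they follow the NeMo manifest convention but are not confirmed for this repository); the data README defines the actual schema.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Union


def dumps_jsonl(records: Iterable[Dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps Persian text readable instead of \uXXXX escapes
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def loads_jsonl(text: str) -> List[Dict]:
    """Parse JSON Lines text, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_manifest(records: Iterable[Dict], path: Union[str, Path]) -> None:
    """Write records to a UTF-8 encoded .jsonl file."""
    Path(path).write_text(dumps_jsonl(records), encoding="utf-8")


def read_manifest(path: Union[str, Path]) -> List[Dict]:
    """Read a UTF-8 encoded .jsonl file back into a list of dicts."""
    return loads_jsonl(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical schema: field names are placeholders, see the data README
    records = [
        {"audio_filepath": "clips/0001.wav", "text": "سلام دنیا"},
        {"audio_filepath": "clips/0002.wav", "text": "صبح بخیر"},
    ]
    print(dumps_jsonl(records), end="")
```

To produce a file on disk, call `write_manifest(records, "train.jsonl")`; `read_manifest` loads it back for inspection.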
To use a pretrained model, navigate to the corresponding model folder under `models`. Each model folder contains two scripts: `train.py` and `inference.py`.
You can get detailed usage instructions by running:
```bash
python train.py --help
python inference.py --help
```
The segmentation tool performs speech segmentation based on Connectionist Temporal Classification (CTC). For detailed usage instructions, refer to the README in the segmentation tool's folder.
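One common way to segment speech with a CTC model is forced alignment: given frame-level log-probabilities and a known transcript, find the most likely path through the CTC lattice (the transcript with blanks interleaved) and read off the frames where each token starts and ends. The sketch below implements plain Viterbi forced alignment in pure Python. It is a generic illustration of the technique, not necessarily the algorithm this repository's tool uses; the function name `ctc_forced_align` and the blank ID of 0 are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int],
                     blank_id: int = 0) -> List[Tuple[int, int]]:
    """Viterbi forced alignment of `targets` to frame-level CTC log-probabilities.

    log_probs[t][v] is the log-probability of token v at frame t.
    Returns one (first_frame, last_frame) pair per target token.
    """
    if not targets:
        raise ValueError("targets must not be empty")
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    ext = [blank_id]
    for tok in targets:
        ext += [tok, blank_id]
    T, S = len(log_probs), len(ext)
    if T == 0:
        raise ValueError("log_probs must contain at least one frame")

    score = [[-math.inf] * S for _ in range(T)]  # best path log-prob ending in state s at frame t
    back = [[-1] * S for _ in range(T)]          # predecessor state for backtracking
    # A path may start with the leading blank or directly with the first token
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay in s, or advance from s-1
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            # Skipping a blank (from s-2) is only allowed between two different tokens
            if s >= 2 and ext[s] != blank_id and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            if best == -math.inf:
                continue  # state unreachable at this frame
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path ends in the final token or the trailing blank
    end = max((S - 1, S - 2), key=lambda st: score[T - 1][st])
    if score[T - 1][end] == -math.inf:
        raise ValueError("no valid alignment: too few frames for the transcript")

    # Backtrack to recover the state visited at every frame
    states = [0] * T
    s = end
    for t in range(T - 1, -1, -1):
        states[t] = s
        s = back[t][s]

    # Token i occupies extended state 2*i + 1; every valid path visits it at least once
    spans = []
    for i in range(len(targets)):
        frames = [t for t, st in enumerate(states) if st == 2 * i + 1]
        spans.append((frames[0], frames[-1]))
    return spans


if __name__ == "__main__":
    # Toy posteriors: 3-token vocabulary (0 = blank), 5 frames whose argmax is [1, 1, 0, 2, 2]
    lp = [[math.log(0.9) if v == a else math.log(0.05) for v in range(3)]
          for a in [1, 1, 0, 2, 2]]
    print(ctc_forced_align(lp, [1, 2]))  # [(0, 1), (3, 4)]
```

Multiplying frame indices by the model's output frame stride converts them to time; for wav2vec2 models one output frame covers roughly 20 ms of audio. The sketch runs in O(T·S) time and assumes the transcript matches the audio exactly; it does not handle missing or extra words.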
To use the tools and models in this repository, clone the repository and install the necessary dependencies:
```bash
git clone https://github.com/alifarrokh/persian-asr.git
cd persian-asr
pip install -r requirements.txt
```
We welcome contributions! Please submit a pull request or open an issue to suggest improvements or report bugs.