Persian Digits Audio Dataset
The Persian Digits Audio Dataset is a collection of audio recordings of the spoken digits 0 through 9 in the Persian language. It was gathered to support research and development in speech recognition, machine learning, and other applications that require spoken-digit recognition.
Dataset Overview
The dataset consists of recordings from multiple speakers and covers ten classes, each corresponding to a Persian digit from zero to nine. These audio clips were recorded under supervised conditions to ensure clarity and consistency, making this dataset a valuable resource for training and testing speech recognition models.
Augmentation Process
To enhance the dataset’s robustness and applicability in varied acoustic conditions, we have applied a series of augmentations:
- Noise Additions: Gaussian noise and background noises are added to simulate different listening environments.
- Temporal and Pitch Modifications: Time stretching and pitch shifting help model different speech rates and vocal pitches.
- Artificial Distortions: MP3 compression and bit crushing simulate lower-quality audio inputs.
- Signal Alterations: Time shifting and polarity inversion introduce additional variability.
These augmentations are intended to make models trained on this dataset more resilient across a variety of audio scenarios.
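To make a few of these augmentations concrete, here is a minimal sketch of Gaussian noise addition, time shifting, polarity inversion, and bit crushing. It is an illustration under stated assumptions, not the pipeline used to build the augmented dataset: clips are assumed to be mono lists of floats in [-1, 1], all function names and parameters (`snr_db`, `shift`, `bits`) are hypothetical, and plain Python lists are used only to keep the example dependency-free. Time stretching, pitch shifting, background noise mixing, and MP3 compression need a dedicated audio library and are left out.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def time_shift(samples, shift):
    """Circularly shift the clip by `shift` samples (positive moves audio later)."""
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def invert_polarity(samples):
    """Flip the sign of every sample; the clip sounds the same to a listener."""
    return [-s for s in samples]


def bit_crush(samples, bits):
    """Quantize samples to a coarse grid to mimic low bit-depth audio."""
    levels = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(s * levels) / levels)) for s in samples]


# Example: a 0.1 s, 440 Hz tone at 16 kHz stands in for a recorded digit.
# The sample rate and tone are assumptions chosen only for this demo.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
augmented = bit_crush(
    invert_polarity(time_shift(add_gaussian_noise(tone, 20, rng), 160)), 8
)
print(len(augmented), min(augmented), max(augmented))
```

Each function returns a new list and leaves its input untouched, so the steps can be chained in any order or applied at random per clip when generating training variants.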
Contributors
This project was led and supervised by Alireza Akhavanpour. The dataset was compiled with the help of the following students:
- Alireza Kamiab
- Reyhaneh Zare
- Negar Baghaei Nejad
- Mobina Shafiei
- Seyed Mohammadreza Daryabak
- Mohammad Takht Firooze
- Mahtareh Moghaddam
- Aida Farqani
- Mehdi Sheikh Ansari
- Mohammadreza Ghaderi
- Mojtaba Shafie Hosseini
- Soroush Mirzavandi
- Reza Cheshmesimab
- Reza Ghanbarzadeh
- Mohammadamin Kianfar
- Mostafa Madbari
- Mohammad Abdoli
- Fatemeh Tabsi
How to Cite This Dataset
If you use the Persian Digits Audio Dataset in your research or project, please cite it using the following format:
Akhavanpour, A., Kamiab, A., Zare, R., Baghaei Nejad, N., Shafiei, M., Daryabak, S. M., Takht Firooze, M., Moghaddam, M., Farqani, A., Sheikh Ansari, M., Ghaderi, M., Shafie Hosseini, M., Mirzavandi, S., Cheshmesimab, R., Ghanbarzadeh, R., Kianfar, M., Madbari, M., Abdoli, M., & Tabsi, F. (2024). Persian Digits Audio Dataset. Retrieved from https://class.vision/persian-audio-digits
You Can Download the Original Dataset:
And the Augmented One: