Persian Digits Audio Dataset

We present the Persian Digits Audio Dataset, a collection of audio recordings of the spoken digits 0 through 9 in the Persian language. The dataset was gathered to support research and development in speech recognition, machine learning, and other applications that require spoken-digit recognition.

Dataset Overview

The dataset consists of recordings from multiple speakers and covers ten classes, each corresponding to a Persian digit from zero to nine. These audio clips were recorded under supervised conditions to ensure clarity and consistency, making this dataset a valuable resource for training and testing speech recognition models.

Augmentation Process

To enhance the dataset’s robustness and applicability in varied acoustic conditions, we have applied a series of augmentations:

Noise Additions: Gaussian noise and background noises are added to simulate different listening environments.
Temporal and Pitch Modifications: Time stretching and pitch shifting help model different speech rates and vocal pitches.
Artificial Distortions: MP3 compression and bit crushing simulate lower-quality audio inputs.
Signal Alterations: Time shifting and polarity inversion introduce additional variability.

These augmentations are intended to make models trained on this dataset more resilient across a variety of audio scenarios.
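
Some of the augmentations listed above can be illustrated with a minimal NumPy sketch. This is not the pipeline used to build the dataset: the function names, the SNR-based noise level, the zero-padded shift, and the bit depth are all assumptions chosen for illustration. Time stretching, pitch shifting, background-noise mixing, and MP3 compression are left out because they typically rely on external audio libraries.

```python
import numpy as np


def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB (hypothetical helper)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift):
    """Shift the clip by `shift` samples, padding with zeros instead of wrapping around."""
    out = np.zeros_like(signal)
    if shift > 0:
        out[shift:] = signal[:-shift]
    elif shift < 0:
        out[:shift] = signal[-shift:]
    else:
        out[:] = signal
    return out


def invert_polarity(signal):
    """Flip the sign of every sample."""
    return -signal


def bit_crush(signal, bits):
    """Quantize samples assumed to lie in [-1, 1] to at most 2**bits levels."""
    half_levels = 2 ** (bits - 1)
    quantized = np.clip(np.round(signal * half_levels), -half_levels, half_levels - 1)
    return quantized / half_levels


if __name__ == "__main__":
    # Synthetic stand-in for a recording: one second of a 220 Hz tone at 16 kHz.
    rng = np.random.default_rng(0)
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    clip = 0.5 * np.sin(2 * np.pi * 220.0 * t)

    augmented = bit_crush(
        invert_polarity(time_shift(add_gaussian_noise(clip, 20.0, rng), 800)), 8
    )
    print(augmented.shape)
```

In practice each transform would usually be applied with some probability and with randomly drawn parameters, so that every training epoch sees a slightly different version of each clip.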

Contributors

This project was led and supervised by Alireza Akhavanpour. The dataset was compiled with the help of the following students:

Alireza Kamiab
Reyhaneh Zare
Negar Baghaei Nejad
Mobina Shafiei
Seyed Mohammadreza Daryabak
Mohammad Takht Firooze
Mahtareh Moghaddam
Aida Farqani
Mehdi Sheikh Ansari
Mohammadreza Ghaderi
Mojtaba Shafie Hosseini
Soroush Mirzavandi
Reza Cheshmesimab
Reza Ghanbarzadeh
Mohammadamin Kianfar
Mostafa Madbari
Mohammad Abdoli
Fatemeh Tabsi

How to Cite This Dataset

If you use the Persian Digits Audio Dataset in your research or project, please cite it using the following format:

Akhavanpour, A., Kamiab, A., Zare, R., Baghaei Nejad, N., Shafiei, M., Daryabak, S. M., Takht Firooze, M., Moghaddam, M., Farqani, A., Sheikh Ansari, M., Ghaderi, M., Shafie Hosseini, M., Mirzavandi, S., Cheshmesimab, R., Ghanbarzadeh, R., Kianfar, M., Madbari, M., Abdoli, M., & Tabsi, F. (2024). Persian Digits Audio Dataset. Retrieved from https://class.vision/persian-audio-digits
