A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques

Amarildo Rista; Arbana Kadriu

doi:10.56345/ijrdv9n301

Authors

Amarildo Rista South East European University, Faculty of Contemporary Sciences and Technologies, Tetovo, North Macedonia
Arbana Kadriu South East European University, Faculty of Contemporary Sciences and Technologies, Tetovo, North Macedonia

DOI:

https://doi.org/10.56345/ijrdv9n301

Keywords:

Deep learning, Albanian language, End-to-end ASR, Speech Recognition, Corpus

Abstract

An end-to-end automatic speech recognition (ASR) system folds the acoustic model (AM), language model (LM), and pronunciation model (PM) into a single neural network. Jointly optimizing all of these components improves the performance of the model. In this paper, we introduce a model for Albanian speech recognition (SR) using end-to-end deep learning techniques. The model is built from two main modules: Residual Convolutional Neural Networks (ResCNN), which learn the relevant audio features, and Bidirectional Recurrent Neural Networks (BiRNN), which leverage the features learned by the ResCNN. To train and evaluate the model, we built a corpus for Albanian Speech Recognition (CASR), which contains 100 hours of audio data along with their transcripts. When designing the corpus, we took into account speaker attributes such as age, gender, accent, speed of utterance, and dialect, so that the corpus is as heterogeneous as possible. The model is evaluated using the word error rate (WER) and character error rate (CER) metrics. It achieves 5% WER and 1% CER.
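
The abstract reports WER and CER without stating how they are computed. As a minimal sketch, assuming the standard definition (Levenshtein edit distance between reference and hypothesis, divided by the reference length), the two metrics could be calculated as below. The function names `edit_distance`, `wer`, and `cer` are illustrative, the sample sentences are made up for this example, and counting spaces as characters in CER is an assumption; none of this is taken from the paper's implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up transcript pair, used only to exercise the functions.
    reference = "unë jam student"
    hypothesis = "unë jam studente"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this made-up pair, one of three reference words is wrong, while only one of fifteen reference characters differs, which illustrates why CER is typically lower than WER for the same output.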
