FFSTC: Fongbe to French speech translation corpus
Publication Type
Conference Proceeding
Publication Date (Issue Year)
2024
Journal Name
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstract
In this paper, we introduce the Fongbe to French Speech Translation Corpus (FFSTC). The corpus comprises approximately 31 hours of collected Fongbe language content, pairing Fongbe voice recordings with their corresponding French transcriptions. FFSTC is a comprehensive dataset compiled through various collection methods and the efforts of dedicated individuals. Furthermore, we conduct baseline experiments using Fairseq’s transformer_s and conformer models to evaluate data quality and validity. Our results indicate a BLEU score of 8.96 for the transformer_s model and 8.14 for the conformer model, establishing a baseline for the FFSTC corpus.
Keywords
Speech translation corpus, spoken language translation, low-resource language, Fongbe-French, Fongbe
Rsif Scholar Name
Dèdjro Fortuné Kponou
Thematic Area
ICTs Including Big Data and Artificial Intelligence
Africa Host University (AHU)
Université d'Abomey-Calavi, Benin
Funding Statement
We would like to thank the Partnership for Skills in Applied Sciences, Engineering, and Technology (PASET) for supporting this research through the Regional Scholarship and Innovation Fund (RSIF).
Recommended Citation
Kponou, D. F., Laleye, F. A., & Ezin, E. C. (2024). FFSTC: Fongbe to French speech translation corpus. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). https://aclanthology.org/2024.lrec-main.638