FFSTC: Fongbe to French speech translation corpus

Publication Type

Conference Proceeding

Publication Date (Issue Year)

2024

Journal Name

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Abstract

In this paper, we introduce the Fongbe to French Speech Translation Corpus (FFSTC). This corpus encompasses approximately 31 hours of collected Fongbe language content, featuring both French transcriptions and corresponding Fongbe voice recordings. FFSTC represents a comprehensive dataset compiled through various collection methods and the efforts of dedicated individuals. Furthermore, we conduct baseline experiments using Fairseq’s transformer_s and conformer models to evaluate data quality and validity. Our results indicate a BLEU score of 8.96 for the transformer_s model and 8.14 for the conformer model, establishing a baseline for the FFSTC corpus.

Keywords

Speech translation corpus, spoken language translation, low-resource language, Fongbe-French, Fongbe

Rsif Scholar Name

Dèdjro Fortuné Kponou

Rsif Scholar Nationality

Benin

Cohort

Cohort 4

Thematic Area

ICTs Including Big Data and Artificial Intelligence

Africa Host University (AHU)

Université d'Abomey-Calavi, Benin

Funding Statement

We would like to thank the Partnership for Skills in Applied Sciences, Engineering, and Technology (PASET), through the Regional Scholarship and Innovation Fund (RSIF), for supporting this research.
