2nd Swiss German Speech to Standard German Text Shared Task
The goal of this task is to build a system able to translate Swiss German speech to Standard German text and optimize it for the Graubünden dialect.
We provide the dataset SDS-200 [1] with 200 hours of Swiss German recordings from all dialects with Standard German transcriptions, including 6 hours of Graubünden dialect. Participants are also allowed to use the SwissDial [2] dataset (Swiss German recordings, Standard German text, all dialects, 34 hours total, 11 hours Graubünden) as well as the Standard German, French, and Italian datasets of the Common Voice [3] project. No additional data is allowed.
The team with the best BLEU score on a 5-hour test set with Graubünden speakers wins the contest. The data in the test set was collected in a similar fashion to SDS-200.
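To make the evaluation metric concrete, the following is a minimal sketch of corpus-level BLEU with a single reference per utterance. It is not the organizers' scoring script: the whitespace tokenization, the absence of case or punctuation normalization, and the function names `ngram_counts` and `corpus_bleu` are assumptions made for illustration, so scores from this sketch may differ from the official ones.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis.

    Assumption: plain whitespace tokenization, no normalization.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")

    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-grams per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["wir fahren morgen mit dem zug nach chur"]
    refs = ["wir fahren heute mit dem zug nach chur"]
    print(round(corpus_bleu(hyps, refs), 2))  # prints 59.46
```

Because the score is computed over the whole test set rather than averaged per sentence, a few very short or empty predictions can lower it through the brevity penalty.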
We encourage participants to explore suitable transfer learning and finetuning approaches based on the Swiss German, Standard German, French, and Italian data provided.
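As a small illustration of the data imbalance that fine-tuning has to deal with, the sketch below computes the share of Graubünden recordings in the permitted Swiss German data, using only the hours stated above. It also computes how often the Graubünden portion would have to be repeated to reach a chosen share of a training mix. The helper names and the 50% target are hypothetical, and whether such oversampling actually improves results is an open empirical question, not a recommendation from the organizers.

```python
# Hours as stated in the task description (Swiss German datasets only).
HOURS = {
    "SDS-200": {"total": 200.0, "graubuenden": 6.0},
    "SwissDial": {"total": 34.0, "graubuenden": 11.0},
}


def graubuenden_share(hours):
    """Fraction of all Swiss German hours spoken in the Graubünden dialect."""
    total = sum(d["total"] for d in hours.values())
    graubuenden = sum(d["graubuenden"] for d in hours.values())
    return graubuenden / total


def oversampling_factor(hours, target_share):
    """Repetition factor k for Graubünden data to reach target_share.

    Solves k * g / (k * g + other) = target_share for k, where g is the
    number of Graubünden hours and other is the number of remaining hours.
    """
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be strictly between 0 and 1")
    total = sum(d["total"] for d in hours.values())
    graubuenden = sum(d["graubuenden"] for d in hours.values())
    other = total - graubuenden
    return target_share * other / (graubuenden * (1.0 - target_share))


if __name__ == "__main__":
    print(f"Graubünden share: {graubuenden_share(HOURS):.1%}")
    # Hypothetical target: half of the fine-tuning mix is Graubünden speech.
    print(f"Oversampling factor for 50%: {oversampling_factor(HOURS, 0.5):.1f}")
```

With 17 of 234 hours, Graubünden speech makes up roughly 7% of the Swiss German data, which is why the choice of fine-tuning strategy matters for this task.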
Please register here for detailed information and to submit your test set predictions.
Workshop Important Dates
- 01.04.22: Task description and data published
- 05.05.22: Open for submissions
- 30.05.22 08:59 CEST: Submission deadline
- 08.06.22: SwissText workshop with presentations by participants
Workshop Schedule
Note: This is a tentative schedule
- Shared task introduction, results overview (15 min)
- Participants present their approaches (60 min)
- Discussion, future directions (15 min)
Workshop Resources
The data can be downloaded as follows:
- SDS-200: This dataset is available under a Creative Commons BY-NC license (commercial use not allowed). To obtain a download link, please send an email to michel.pluess@fhnw.ch with a short statement that you will not use the data for commercial purposes.
- SwissDial: https://mtc.ethz.ch/publications/open-source/swiss-dial.html
- Common Voice:
- Test set (audio only): https://drive.switch.ch/index.php/s/qmOgf9yI3RP72Hp
Note: No additional data is allowed.
Organizers
- Michel Plüss, FHNW, michel.pluess@fhnw.ch
- Christian Scheller, FHNW, christian.scheller@fhnw.ch
- Yanick Schraner, FHNW, yanick.schraner@fhnw.ch
- Manfred Vogel, FHNW, manfred.vogel@fhnw.ch
References
[1] Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel. 2022. SDS-200 – A Swiss German Speech to Standard German Text Corpus. Submitted to LREC 2022.
[2] Pelin Dogan-Schönberger, Julian Mäder, Thomas Hofmann. 2021. SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German. arXiv:2103.11401 [cs.CL].
[3] R. Ardila, M. Branson, K. Davis, M. Henretty, M. Kohler, J. Meyer, R. Morais, L. Saunders, F. M. Tyers, G. Weber. 2020. Common Voice: A Massively-Multilingual Speech Corpus. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020).