2nd Swiss German Speech to Standard German Text Shared Task

The goal of this task is to build a system that translates Swiss German speech into Standard German text and is optimized for the Graubünden dialect.

We provide the dataset SDS-200 [1] with 200 hours of Swiss German recordings from all dialects with Standard German transcriptions, including 6 hours of Graubünden dialect. Participants are also allowed to use the SwissDial [2] dataset (Swiss German recordings, Standard German text, all dialects, 34 hours total, 11 hours Graubünden) as well as the Standard German, French, and Italian datasets of the Common Voice [3] project. No additional data is allowed.

The team with the best BLEU score on a 5-hour test set of Graubünden speakers wins the contest. The test set data was collected in a similar fashion to SDS-200.
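To make the evaluation metric concrete, the following is a minimal sketch of corpus-level BLEU with one reference per prediction. It is not the official scoring script: whitespace tokenization, the absence of smoothing, and the function names `ngrams` and `corpus_bleu` are all assumptions made for illustration, so scores may differ from those reported by the organizers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumption: whitespace tokenization and no smoothing; the official
    scoring script may tokenize and normalize differently.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize system output that is shorter than the reference overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Made-up sentences, for illustration only.
    predictions = ["das ist ein kleiner test heute"]
    targets = ["das ist ein kleiner test"]
    print(f"BLEU: {corpus_bleu(predictions, targets):.2f}")
```

Because the counts are pooled over all sentences before the precisions are computed, this is a corpus-level score and not an average of per-sentence scores.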

We encourage participants to explore suitable transfer learning and finetuning approaches based on the Swiss German, Standard German, French, and Italian data provided.
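One consideration when fine-tuning for Graubünden is how small that dialect's share of the Swiss German data is. The sketch below uses only the hour counts given in the task description to compute that share and the factor by which Graubünden audio would need to be oversampled to reach a chosen share of the sampled hours. The helper names and the 50% target are hypothetical; the task does not prescribe any sampling strategy.

```python
# Hours per corpus from the task description: (total hours, Graubünden hours).
CORPORA = {
    "SDS-200": (200.0, 6.0),
    "SwissDial": (34.0, 11.0),
}


def graubuenden_share(corpora):
    """Fraction of the Swiss German training hours in Graubünden dialect."""
    total = sum(t for t, _ in corpora.values())
    target = sum(g for _, g in corpora.values())
    return target / total


def oversampling_factor(corpora, desired_share):
    """Relative sampling weight for Graubünden audio (hypothetical helper).

    Returns how many times more often Graubünden audio must be sampled than
    other audio so that it makes up `desired_share` of the sampled hours.
    """
    if not 0.0 < desired_share < 1.0:
        raise ValueError("desired_share must be strictly between 0 and 1")
    total = sum(t for t, _ in corpora.values())
    target = sum(g for _, g in corpora.values())
    other = total - target
    # Solve w * target / (w * target + other) = desired_share for w.
    return desired_share * other / ((1.0 - desired_share) * target)


if __name__ == "__main__":
    print(f"Graubünden share: {graubuenden_share(CORPORA):.1%}")
    # The 50% target is an arbitrary illustrative choice.
    print(f"Oversampling factor for 50%: {oversampling_factor(CORPORA, 0.5):.2f}")
```

The same calculation extends to the Standard German, French, and Italian Common Voice data once their hour counts are known; they are left out here because the task description does not state them.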

Please register here for detailed information and to submit your test set predictions.

Workshop Important Dates

  • 01.04.22: Task description and data published
  • 05.05.22: Open for submissions
  • 30.05.22 08:59 CEST: Submission deadline
  • 08.06.22: SwissText workshop with presentations by participants

Workshop Schedule

Note: This is a tentative schedule.

  • Shared task introduction, results overview (15 min)
  • Participants present their approaches (60 min)
  • Discussion, future directions (15 min)

Workshop Resources

The data can be downloaded as follows:

Note: No additional data is allowed.

Organizers

References

[1] Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel. 2022. SDS-200 – A Swiss German Speech to Standard German Text Corpus. Submitted to LREC 2022.

[2] Pelin Dogan-Schönberger, Julian Mäder, Thomas Hofmann. 2021. SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German. arXiv:2103.11401 [cs.CL].

[3] R. Ardila, M. Branson, K. Davis, M. Henretty, M. Kohler, J. Meyer, R. Morais, L. Saunders, F. M. Tyers, G. Weber. 2020. Common Voice: A Massively-Multilingual Speech Corpus. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020).

Important Dates

  • February 15, 2022: Workshop proposal
  • April 1, 2022: Paper submission
  • May 20, 2022: Early registration ends
  • June 8 – 10, 2022: Main Conference