Swissdox Hackathon
The goal of this workshop is to find changes, trends and correlations in Swiss newspapers, from a political, historical, social or linguistic perspective.
We provide a dataset from the Swissdox@LiRI service, a collaboration between LiRI at the University of Zurich and SMD/Swissdox, containing 100,000 Swiss news articles from each newspaper (German- and French-speaking). This large dataset allows you to address a wide range of questions, from the perspectives of politics, media science, linguistics, history and others. Possible research questions could be, for instance:
• How do Swiss newspapers report on Swiss votes? Can the outcome of a vote be predicted from the newspaper reporting and stance? Are certain newspapers better suited to this task than others? For instance, [1] showed a correlation between the frequency of a candidate’s name in the news and electoral success in France.
• Can language use give us an indication of the political leaning of a newspaper? For instance, [2] quantify the political leanings of newspapers in South Korea.
• How can one model and measure semantic and linguistic change in newspapers over the last 25 years? For instance, [3] describes quantitative change in British spoken language.
• Can gender stereotypes and their change be traced? Are there differences specific to some papers?
• How has the stance on debated topics, such as global warming, migration, environment, Russia, health, changed over the past 30 years?
• Is a possible political leaning (left/right) more strongly discernible in domestic than in foreign news?
• Which newspapers give more prominence to which topics?
• Which associations do neologisms provoke? Do they reflect a political orientation?
• How does the choice of topics correlate with external indices, such as people’s worries as expressed in the Worry Barometer?
You can find the slides presented during the session here.
Workshop Important Dates
- 22.04.23: Task description and data published
- 01.05.23: Open for submissions
- 31.05.23 08:59 CEST: Submission deadline
- 12.06.23: SwissText workshop with presentations by participants
Workshop Schedule
Note: This is a tentative schedule
- Hackathon introduction, results overview (15 min)
- Participants present their approaches (60-240 min, 15-20 min per participant)
- Discussion, future directions (15 min)
Workshop Resources and Rewards
The data (100,000 articles per newspaper) will be made available after registration and signing the license form. The data contains the year of publication, the newspaper, the original XML representation of the article, and a syntactically annotated version. The winner will be decided by a vote of all participants at the workshop.
For copyright reasons, the data will need to be deleted a month after the end of the workshop.
- Gerold Schneider
- Johannes Graën
- Jonathan Schaber
- Noah Bubenhofer
- [1] Véronis, Jean. 2007. La presse a fait mieux que les sondeurs. la-presse-fait-mieux-que-les.html.
- [2] Hyungsuc Kang & Janghoon Yang. 2022. Quantifying Perceived Political Bias of Newspapers through a Document Classification Technique. Journal of Quantitative Linguistics, 29:2, 127-150, DOI: 10.1080/09296174.2020.1771136
- [3] Schneider, Gerold. 2022. “Recent changes in spoken British English according to spoken BNC2014”. In Susanne Flach & Martin Hilpert (eds.). Broadening the spectrum of corpus linguistics: New approaches to variability and change. [Studies in Corpus Linguistics.] Amsterdam: John Benjamins. 173-195.