Call for Hackathon Participation

Hackashop on news media content analysis and automated report generation

1 – 21 February (online hackathon) + 19/20 April, 2021 (presentations)
Hackathon in conjunction with EACL 2021

News media content, including news articles and the comments that readers post, is a rich potential source of insight into current events and opinions. This hackashop, a combination of hackathon and workshop, aims to bring together an audience from computer science, social science and industry to target multi-disciplinary challenges in news and comment analysis and reporting.

In order to encourage scientific and technological advances, the hackashop has a dual format: (1) a traditional workshop format for discussing scientific advances is combined with (2) a preceding hackathon-type event in which participants are provided with news media data and a suite of tools to experiment with novel approaches and solutions. 

Hackashop participants are welcome to participate in one or both activities. The workshop day on 19 or 20 April will then bring participants together to share new insights from both research- and experimentation-based work. This call for participation deals with the hackathon track only (see the hackathon website for the call for workshop papers).

Why participate in the hackathon

Would you like to tackle exciting challenges in news media analysis and generation? Or do you like developing and experimenting with solutions in a multidisciplinary setting? Or are you interested in multilingual settings, smaller languages, or low-resource languages? Or would you like to learn about NLP hands-on and by working with others?

We will provide access to relevant tools and models from ongoing research, datasets you can experiment with, as well as support from technical experts who know the tools and the models, from social scientists who have insight into media research, and from media professionals who know the practice and needs of the media industry.

The hackathon targets anyone interested in Natural Language Processing and Machine Learning (doctoral and graduate students, researchers and professionals). We especially welcome media researchers from social sciences/media studies and private sector representatives to join the cross-disciplinary teams.

If you are a student, you may be eligible to earn 3 ECTS for your participation. (The University of Helsinki, the organizer, will grant its students 3 ECTS. We will issue other students a certificate of participation, which you can use to negotiate credit at your own university.)

Hackathon format

Due to Covid-19, the hackathon is organised as a virtual event over a period of three weeks, Feb 1 – Feb 21, 2021. We do not expect full-time participation during this period, but regular and substantial effort will likely be required for a satisfying experience for you and your collaborators.

We will organise structured activities to support participants: matchmaking before the event, tutorials on tools, datasets and challenges at the beginning, expert consultancy and coaching throughout the hackathon, forums for communication and peer support, and joint events for sharing experiences and hints and seeing what others are working on.

A special feature of the hackashop is that teams with completed hackathon projects are invited to submit a brief report (approx. 2 pages) to the hackashop workshop proceedings, to be published by EACL, and to present their project briefly (5-10 min) in the workshop event on 19 or 20 April in conjunction with EACL. This gives hackathon participants the opportunity both to publish their project and to present it to a wider audience.

Rules of the hackathon

Do cool stuff related to news media analysis and generation. More specifically:

  • identify a relevant challenge (feel free to use our example challenges as seeds), 
  • develop a solution to the challenge using at least some of the tools or models we provide, 
  • experiment, e.g., with the data that we provide,
  • write a brief report and pitch/demo your results in the workshop event.

Example challenges

Automated content analysis of news media, including news articles and users’ comments on them, can provide unparalleled insight into current events, interests and opinions, as well as trends and changes in them. The needs are varied, from readers who consume news of personal interest to them to journalists who keep track of what is going on in the world, try to understand what their readers think of various topics, or want to automate routine reporting.

You and your team are free to choose a specific problem. To help you get started, we will provide example challenges (as well as tools and datasets for them). For instance: 

  • Detect changes in reporting about a certain political party over time, in terms of contents, viewpoints or attitudes.
  • Describe news stories concisely, e.g., by extracting or inferring keywords or phrases automatically from each story.
  • Summarise or visualise readers’ comments and viewpoints, e.g., by extracting and representing meaningful/interesting information from comment threads.
  • Detect user comments that provide useful additional information that supplements the article, e.g., by correcting it or adding constructive information to the discussion.
  • Improve a given, automatically generated news report by making the language more fluent, varied or colorful through post-processing.
  • Generate informative and/or creative titles for a given news story.

We encourage solutions for multilingual and cross-lingual problems and for smaller languages – and will provide tools and data for them. 

Examples of tools and models

We will provide a large collection of tools that you can use to attack the above challenges, especially for some smaller languages and in multilingual settings. Most of the tools are individual components and pre-trained models for various languages, but the collection also includes selected integrated toolkits and workflows. The tools and models typically cover some subset of Croatian, Estonian, Slovene, Finnish, Swedish, Lithuanian, Russian, Latvian and English; some tools are more general, and some allow training for new languages.

We expect participants to make use of at least some of the provided tools/models. They include the following:

  • Multilingual BERT models
  • Keyword extraction
  • Sentiment analysis
  • Named entity recognition
  • Topic modeling
  • Diachronic analysis
  • Temporal metadata visualisation
  • Detection of offensive language
  • Comment filtering
  • Author profiling
  • Automated report generator for EuroStat statistics
  • Base code for generation of creative expressions
  • Texta Toolkit for building text analytics applications, with several machine learning components
  • ClowdFlows workflow system with example workflows

Examples of datasets

We will also provide news datasets that can be used in experimentation. Some of the datasets are from our partner media companies (in Finnish, Croatian, Estonian, and Swedish), some are from public sources (in English and other languages). You are also welcome to use your own data! Datasets we provide include the following:

  • News archives
  • Reader comment datasets
  • Headline datasets

Important dates for the hackathon

Jan 18 – Jan 29: Matchmaking period for those seeking teammates
Jan 29: Registration deadline
Feb 1 – Feb 21: Hackathon
Feb 21: Hackathon reports due
Mar 1: Camera-ready versions of reports due
Apr 19 or 20: Workshop event

Quick feedback will be provided on hackathon reports for preparing the camera-ready versions. All hackathon reports will be included in the proceedings (but we reserve the right to reject reports of low quality).

The registration deadline is Jan 29, 2021. Registration is free of charge.

You are free to register as a team or as an individual (but we ask every team member to register individually). We will help with matchmaking between participants who are looking for teammates.

Organizing committee

Hannu Toivonen (University of Helsinki, Finland), Hackashop Chair
Michele Boggia (University of Helsinki, Finland), Interaction Chair
Marko Robnik-Šikonja (University of Ljubljana, Slovenia), Tool Chair
Matthew Purver (Queen Mary University of London, UK), Data Chair
Carl-Gustav Linden (University of Bergen, Norway), Challenge Chair
Senja Pollak (Jozef Stefan Institute, Slovenia)



The hackashop is supported by the Horizon2020 project EMBEDDIA (“Cross-Lingual Embeddings for Less-Represented Languages in European News Media”, project number 825153, 2020-2022).