EMBEDDIA at the Naprej/Forward media festival

EMBEDDIA was presented at the 8th Naprej/Forward media festival in Ljubljana, Slovenia on November 21. Naprej/Forward is an annual festival organized by the Slovene Association of Journalists (Društvo novinarjev Slovenije). Each year the festival program contains a section called the Media harvest, which presents projects, stories, and other media content that stood out in the country over the past year. EMBEDDIA, which is coordinated by the Slovene Jožef Stefan Institute (Institut Jožef Stefan), was among this year’s chosen projects.

Our project’s coordinator Senja Pollak presented some general facts about EMBEDDIA, our team and partners, and spoke about cross-lingual embeddings and how we are working to use them. She also showed a demo of a multilingual offensive speech detector, which was trained only on English data but worked for over 90 languages thanks to cross-lingual embeddings. After the presentation, she answered questions from the audience, which consisted mainly of journalists and media content creators.

Senja Pollak (JSI) presenting EMBEDDIA at the Naprej/Forward media festival. Photo: Kaja Brezočnik.
After the presentation, Senja answered questions from the audience, consisting mainly of journalists and media content creators. Photo: Kaja Brezočnik.

EMBEDDIA at the META-FORUM 2019

Author: Marko Robnik-Šikonja (UL)

META-FORUM is a series of annual conferences on language technologies with a focus on EU languages. The 2019 edition of the conference focused on the emerging European Language Grid (ELG) platform, which aims to become a kind of yellow pages for language resources and technologies.

The two-day conference (8th and 9th of October) gathered representatives of EU language-related projects, companies, and public institutions. Besides panel presentations and discussion forums, many projects prepared demonstrations of their ideas and progress. The EMBEDDIA project was represented by Matt Purver from the Queen Mary University of London, Andraž Pelicon from the Jožef Stefan Institute, and Marko Robnik-Šikonja from the University of Ljubljana. The team showed a demo of a multilingual offensive speech detector, which was trained only on English data but worked for over 90 languages thanks to cross-lingual embeddings. The demo was warmly received by its visitors.

As part of the pre-conference meetings, EU H2020 projects financed under the ICT-29 call discussed closer cooperation with ELG. The ELG team intends to integrate different language services as Docker images. The EMBEDDIA project presented the resources and technologies it will build and the challenges of integrating them into the ELG.

Matthew Purver (QMUL) presenting EMBEDDIA at the META-FORUM 2019. Photo source: European Language Grid (ELG).
Matthew Purver (QMUL) – EMBEDDIA Data Manager. Photo source: European Language Grid (ELG).
Andraž Pelicon (JSI) presenting a demo of a multilingual offensive speech detector. The detector was trained only on English data but worked for over 90 languages thanks to cross-lingual embeddings. Photo source: European Language Grid (ELG).

EMBEDDIA at META-FORUM 2019 (photos)

EMBEDDIA was presented at the META-FORUM 2019 on October 8-9 2019 in Brussels, Belgium. Below are some photos from the event – more impressions to follow.


From left to right: Andraž Pelicon (JSI) and Matthew Purver (QMUL).

EMBEDDIA to be presented at META-FORUM 2019

EMBEDDIA will be presented at the META-FORUM 2019 on October 8-9 2019 in Brussels, Belgium.

META-FORUM is the international conference on innovative Language Technologies for a multilingual information society. The event brings together a diverse audience: Experts from research and industry will meet to discuss the latest news and to learn more about new developments, projects and initiatives from the European Language and Language Technology community. 

EMBEDDIA will be presented by Marko Robnik-Šikonja (technical manager) and Matthew Purver (data manager). The presentation will cover the project's technical and development goals and include a live demo of hate speech detection (for any language!), presented by Andraž Pelicon.

This headline is not creative – or is it?

Authors: Khalid Alnajjar (UH) and Leo Leppänen (UH)

News forms a crucial part of our daily lives, keeping us up to date with the world around us. In the deluge of news presented to us daily, the role of headlines is more important than ever before. Headlines give us a glimpse into the content, allowing us to make split-second judgements about whether a piece of news is important or otherwise interesting. As such, journalists pour significant amounts of creativity into their headlines.

In automated production of news, correctness is usually valued far higher than textual flourish. Most real-world systems for automated news production employ human-written templates as the basis of the natural language, filling them up with the desired information like a journalistic version of Mad Libs. While such approaches provide great control over what is produced, the texts easily end up using repetitious and dull language. 
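The template-filling approach described above can be illustrated with a minimal sketch. It is not the code of any particular news generation system: the template wording, the slot names, and the election figures are all hypothetical and chosen only for illustration.

```python
from string import Template

# Hypothetical human-written headline template; the slot names are assumptions.
HEADLINE_TEMPLATE = Template("$party wins $seats seats in $municipality")


def render_headline(template: Template, facts: dict) -> str:
    """Fill the slots of a human-written template with values from structured data."""
    # substitute() raises KeyError when a slot has no value, so an incomplete
    # record cannot silently produce a headline with an unfilled slot.
    return template.substitute(facts)


# Made-up structured records standing in for real election data.
records = [
    {"party": "Party A", "seats": 12, "municipality": "Helsinki"},
    {"party": "Party B", "seats": 7, "municipality": "Espoo"},
]

headlines = [render_headline(HEADLINE_TEMPLATE, record) for record in records]
for headline in headlines:
    print(headline)
```

Every headline produced this way has exactly the same shape, which shows both the control such templates give and why the resulting text tends to feel repetitive.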

Computational creativity is a subfield of artificial intelligence that looks at computational systems and creativity from various points of view: how can we build creative machines, what does creativity mean in relation to computers, what does it mean when we say something is creative, etc. Building automated journalism systems with computational creativity in mind would allow the systems to produce more natural-looking, diverse and interesting news.

At the same time, combining natural language generation, journalism, and computational creativity is not a problem-free exercise. The production of creative news headlines is a complex problem due to the interplay of various cultural, conventional and stylistic factors. Not only does the headline need to be humorous and factually correct, it also needs to match a very special set of grammatical rules only employed in headline writing, also known as Headlinese. In some cases, going the route of clickbait can be considered correct (“Former celebrity shares their hottest sauna tips!”), while in others it is not (“You’ll never guess how much homelessness increased from last year!”). Furthermore, headlines are often about topics where the line between acceptable and unacceptable humor is extremely thin. A creative system producing headlines would need to actively consider the offensiveness and “clickbaityness” of produced headlines and remove those that go beyond context-dependent boundaries.

Understandably, research on applying computational creativity to news is scarce. As part of the EMBEDDIA project, researchers from the University of Helsinki are studying the effects of introducing existing computational creativity methods into an automated journalism system, with initial results on adding creativity to headlines about municipal elections published earlier this year.

References:

Alnajjar, K., Leppänen, L., & Toivonen, H. (2019). No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines. In K. Grace, M. Cook, D. Ventura, & M. L. Maher (Eds.), Proceedings of the 10th International Conference on Computational Creativity (pp. 258-265). Association for Computational Creativity.

Leppänen, L., Munezero, M., Granroth-Wilding, M., & Toivonen, H. (2017). Data-Driven News Generation for Automated Journalism. In The 10th International Natural Language Generation Conference, Proceedings of the Conference (pp. 188-197). Stroudsburg: Association for Computational Linguistics. https://doi.org/10.18653/v1/w17-3528

Shared task at SemEval 2020 organized by EMBEDDIA

We are happy to announce that our proposal for the shared task “Graded Word Similarity in Context task for evaluating word embeddings in a multi-lingual setting” has been accepted. The shared task will be organized at SemEval 2020.