EMBEDDIA was presented at the META-FORUM 2019 on October 8-9 2019 in Brussels, Belgium.
EMBEDDIA will be presented at the META-FORUM 2019 on October 8-9 2019 in Brussels, Belgium.
META-FORUM is the international conference on innovative Language Technologies for a multilingual information society. The event brings together a diverse audience: Experts from research and industry will meet to discuss the latest news and to learn more about new developments, projects and initiatives from the European Language and Language Technology community.
EMBEDDIA will be presented by Marko Robnik-Šikonja (technical manager) and Matthew Purver (data manager). The presentation will cover the project's technical and development goals and include a live demo of hate speech detection (for any language!), presented by Andraž Pelicon.
Authors: Khalid Alnajjar (UH) and Leo Leppänen (UH)
News forms a crucial part of our daily lives, keeping us up-to-date with the world around us. In the deluge of news presented to us daily, the role of headlines is more important than ever before. Headlines give us a glimpse into the content, allowing us to make split-second judgements about whether a piece of news is important or otherwise interesting. As such, journalists pour significant amounts of creativity into their headlines.
In automated production of news, correctness is usually valued far more highly than textual flourish. Most real-world systems for automated news production employ human-written templates as the basis of the natural language, filling them in with the desired information like a journalistic version of Mad Libs. While such approaches provide great control over what is produced, the texts easily end up using repetitious and dull language.
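To make the Mad Libs comparison concrete, here is a minimal sketch of template filling in Python. It is not the code of any system mentioned in this post: the templates, the slot names (`party`, `seats`, `municipality`) and the example facts are all invented for illustration.

```python
import random

# Hypothetical human-written templates; the slot names are assumptions
# made up for this sketch, not taken from a real system.
TEMPLATES = [
    "{party} wins {seats} seats in {municipality}",
    "{party} takes {seats} seats in {municipality}",
]


def fill_template(template: str, facts: dict) -> str:
    """Insert the given facts into the slots of one template."""
    return template.format(**facts)


def generate_headline(facts: dict, rng: random.Random) -> str:
    """Pick one of the fixed templates at random and fill it in."""
    return fill_template(rng.choice(TEMPLATES), facts)


# Invented example data, not real election results.
facts = {"party": "Party A", "seats": 12, "municipality": "Helsinki"}
print(fill_template(TEMPLATES[0], facts))
print(generate_headline(facts, random.Random(0)))
```

Every output is one of a handful of fixed sentence shapes with different values slotted in. That is what keeps the text under control, and also why it quickly starts to feel repetitive.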
Computational creativity is a subfield of artificial intelligence that looks at computational systems and creativity from various points of view: how can we build creative machines, what does creativity mean in relation to computers, what does it mean when we say something is creative, etc. Building automated journalism systems with computational creativity in mind would allow the systems to produce more natural-looking, diverse and interesting news.
At the same time, combining natural language generation, journalism, and computational creativity is not a problem-free exercise. The production of creative news headlines is a complex problem due to the interplay of various cultural, conventional and stylistic factors. Not only does the headline need to be humorous and factually correct, it also needs to match a very special set of grammatical rules employed only in headline writing, also known as Headlinese. In some cases, going the route of clickbait can be considered correct (“Former celebrity shares their hottest sauna tips!”); in others it is not (“You’ll never guess how much homelessness increased from last year!”). Furthermore, headlines are often about topics where the line between acceptable and unacceptable humor is extremely thin. A creative system producing headlines would need to actively consider the offensiveness and “clickbaityness” of produced headlines and remove those that go beyond context-dependent boundaries.
Understandably, research on applying computational creativity to news is scarce. As part of the EMBEDDIA project, researchers from the University of Helsinki are studying the effects of introducing existing computational creativity methods into an automated journalism system, with initial results on adding creativity to headlines about municipal elections published earlier this year.
Alnajjar, K., Leppänen, L., & Toivonen, H. (2019). No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines. In K. Grace, M. Cook, D. Ventura, & M. L. Maher (Eds.), Proceedings of the 10th International Conference on Computational Creativity (pp. 258-265). Association for Computational Creativity.
Leppänen, L., Munezero, M., Granroth-Wilding, M., & Toivonen, H. (2017). Data-Driven News Generation for Automated Journalism. In The 10th International Natural Language Generation Conference, Proceedings of the Conference (pp. 188-197). Stroudsburg: Association for Computational Linguistics. https://doi.org/10.18653/v1/w17-3528
Professor Marko Robnik-Šikonja from the University of Ljubljana discussed cross-lingual technologies and EMBEDDIA on the Slovenian radio programme Podobe znanja.
We are happy to announce that our proposal for the shared task “Graded Word Similarity in Context task for evaluating word embeddings in a multi-lingual setting” has been accepted. The shared task will be organized at SemEval 2020. For more details, watch the video below.