Software and Datasets
- Software is available from the EMBEDDIA GitHub page:
- Datasets and embeddings:
  - ELMo embeddings, Slovenian: http://hdl.handle.net/11356/1257
Publications
Journal papers
- Stephen McGregor, Kat Agres, Karolina Rataj, Matthew Purver and Geraint Wiggins (2019). Re-Representing Metaphor: Modelling Metaphor Perception Using Dynamically Contextual Distributional Semantics. Frontiers in Psychology, to appear.
- Blaž Škrlj, Jan Kralj, Nada Lavrač and Senja Pollak (2019). Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture. Machine Learning and Knowledge Extraction 1(2): 575-589.
- Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt and Marko Robnik-Šikonja (2019). Predicting Slovene Text Complexity Using Readability Measures. Contributions to Contemporary History 59(1).
- Matej Martinc and Senja Pollak (2019). Combining n-grams and deep convolutional features for language variety classification. Natural Language Engineering: 1-26.
- Andraž Repar, Vid Podpečan, Anže Vavpetič, Nada Lavrač, and Senja Pollak (2019). TermEnsembler: An ensemble learning approach to bilingual term extraction and alignment. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 25(1): 93-120.
- Andraž Repar, Matej Martinc, and Senja Pollak (2019). Replication, analysis and adaptation of a term alignment approach. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09477-1.
Conference papers
- Andraž Pelicon, Matej Martinc and Petra Kralj Novak (2019). Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches. In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval).
- Matej Martinc and Senja Pollak (2019). Pooled LSTM for Dutch cross-genre gender classification. In Proceedings of the Shared Task on Cross-Genre Gender Detection in Dutch at the Computational Linguistics in the Netherlands (CLIN 2019) conference.
- Matej Martinc, Blaž Škrlj and Senja Pollak (2019). Who is hot and who is not? Profiling celebs on Twitter. In the Working Notes of CLEF 2019 – Conference and Labs of the Evaluation Forum.
- Matej Martinc, Blaž Škrlj and Senja Pollak (2019). Fake or not: Distinguishing between bots, males and females. In the Working Notes of CLEF 2019 – Conference and Labs of the Evaluation Forum.
- Khalid Alnajjar, Leo Leppänen, and Hannu Toivonen (2019). No Time Like the Present: Methods for Generating Colourful and Factual Multilingual News Headlines. In Proceedings of the 10th International Conference on Computational Creativity (pp. 258-265). Association for Computational Creativity.
- Jose G. Moreno, Elvys Linhares Pontes, Mickael Coustaty, and Antoine Doucet (2019). TLR at BSNLP2019: A multilingual named entity recognition system. In Proceedings of the BSNLP-2019 Workshop, ACL 2019. pp: 83-88.
- Blaž Škrlj, Andraž Repar, and Senja Pollak (2019). RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation. In Proceedings of the 7th International Conference on Statistical Language and Speech Processing (SLSP2019). pp: 311-323.
- Blaž Škrlj and Senja Pollak (2019). Language comparison via network topology. In Proceedings of the 7th International Conference on Statistical Language and Speech Processing (SLSP2019). pp: 112-123.
- Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, and Marko Robnik-Šikonja (2019). Prediction Uncertainty Estimation for Hate Speech Classification. In Proceedings of the 7th International Conference on Statistical Language and Speech Processing (SLSP2019). pp: 286-298.
- Senja Pollak, Andraž Repar, Matej Martinc, and Vid Podpečan (2019). Karst exploration: Extracting terms and definitions from karst domain corpus. In Proceedings of the 6th biennial conference on electronic lexicography, eLex 2019. pp: 934-956.
- Morteza Rohanian, Julian Hough, and Matthew Purver (2019). Detecting Depression with Word-Level Multimodal Fusion. In Proceedings of Interspeech 2019. pp: 1443-1447.
- Kristian Miok, Dong Nguyen-Doan, Daniela Zaharie, and Marko Robnik-Šikonja (2019). Generating Data using Monte Carlo Dropout. In Proceedings of 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP 2019).
- Jani Marjanen, Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki (2019). Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings. In Proceedings of the 5th International Workshop on Computational History.
- Lidia Pivovarova, Elaine Zosa, and Jussi Kurunmäki (2019). Word Clustering for Historical Newspapers Analysis. In Proceedings of the Workshop on Language Technology for Digital Historical Archives.
Case Studies
To appear.
Deliverables
Below is a list of submitted public deliverables of EMBEDDIA.
Deliverable | Submission date |
D7.1: Project website and social media accounts (T7.1) | 31/03/2019 |
D6.1: Recommendations on avoiding gender and other biases (T6.4) | 30/04/2019 |