Our SemEval 2020 Task 3: Predicting the (Graded) Effect of Context in Word Similarity is open for submissions. For this task, we ask participants to build systems that predict the effect of context on human perception of word similarity. Participants can submit their results until the 12th of March.
To study these effects, we built several datasets in which annotators scored how similar two words are after reading a short paragraph that contains both words. Each pair is scored within two such paragraphs, which lets us look at changes in similarity ratings due to context. We built datasets containing these contextual similarity ratings in four languages:
- Croatian: HR
- English: EN
- Finnish: FI
- Slovenian: SL
The pairs of words come from the well-known SimLex-999 dataset. The contexts are chosen so as to encourage different perceptions of similarity. Polysemy plays a role; however, we are especially interested in more subtle, graded changes in meaning. All data and examples are available at this link: https://competitions.codalab.org/competitions/20905 and more details are here: https://arxiv.org/abs/1912.05320.
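
To make the idea of a context-induced change in similarity concrete, here is a minimal sketch, not the official baseline or scoring script. It assumes a system represents each word in a given context as a vector and uses cosine similarity as a stand-in for the human rating. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_change(pair_in_context1, pair_in_context2):
    """Difference in similarity of one word pair between two contexts.

    Each argument is a (vector_word1, vector_word2) tuple holding the
    hypothetical contextual vectors of the two words in that paragraph.
    """
    sim1 = cosine(*pair_in_context1)
    sim2 = cosine(*pair_in_context2)
    return sim2 - sim1


# Toy vectors (made up): in the first context the two words point the same
# way, in the second context they are orthogonal.
context1 = ([1.0, 0.0], [1.0, 0.0])
context2 = ([1.0, 0.0], [0.0, 1.0])

change = similarity_change(context1, context2)
print(change)  # negative: the pair is less similar in the second context
```

In practice the vectors would come from whatever contextual model a participant chooses, and the predicted change would be compared against the change in the annotators' ratings for the same pair.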