About the ECLADATTA project
Published in EMNLP 2023, Conference on Empirical Methods in Natural Language Processing, 2023
Large language models have recently risen in popularity due to their ability to perform many natural language tasks without requiring any fine-tuning. In this work, we focus on two novel ideas: (1) generating definitions from examples and using them for zero-shot classification, and (2) investigating how an LLM makes use of the definitions. We thoroughly analyze the performance of the GPT-3 model for fine-grained multi-label conspiracy theory classification of tweets using zero-shot labeling. In doing so, we assess how to improve the labeling by providing minimal but meaningful context in the form of the definitions of the labels. We compare descriptive noun phrases and human-crafted definitions, introduce a new method to help the model generate definitions from examples, and propose a method to evaluate GPT-3's understanding of the definitions. We demonstrate that improving the definitions of class labels has a direct effect on the downstream classification results.
Recommended citation: Peskine, Y., Korenčić, D., Grubišić, I., Papotti, P., Troncy, R., & Rosso, P. Definitions Matter: Guiding GPT for Multi-label Classification.
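For readers curious what the zero-shot setup looks like in practice, here is a minimal sketch of definition-guided labeling, assuming the current OpenAI Python SDK; the label names, definitions, and model choice are illustrative placeholders, not the paper's exact prompts or the GPT-3 variant it evaluates.

```python
# A minimal sketch of definition-guided zero-shot labeling, not the paper's
# exact prompts. Labels and definitions below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical label -> definition mapping; the paper compares noun phrases,
# human-crafted definitions, and model-generated definitions.
DEFINITIONS = {
    "suppressed_cures": "Claims that effective treatments are hidden from the public.",
    "behaviour_control": "Claims that an actor seeks to control people's behaviour.",
}

def classify(tweet: str) -> str:
    defs = "\n".join(f"- {label}: {d}" for label, d in DEFINITIONS.items())
    prompt = (
        "Label definitions:\n"
        f"{defs}\n\n"
        f"Tweet: {tweet}\n"
        "Return every label whose definition applies, comma-separated, or 'none'."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in; the paper evaluates GPT-3
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```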
Published in Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1177–1182, 2024
This paper describes the submission of team EURECOM at SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes. We tackled only the first sub-task, which consists of detecting 20 named persuasion techniques in the textual content of memes. We trained multiple BERT-based models (BERT, RoBERTa, and BERT pre-trained on harmful content detection) using different losses (Cross Entropy, Binary Cross Entropy, Focal Loss, and a custom-made hierarchical loss). The best results were obtained by leveraging the hierarchical nature of the data: predicting ancestor classes and training with a hierarchical loss. Our final submission consists of an ensemble of our top-3 models for each persuasion technique. We obtain hierarchical F1 scores of 0.655 (English), 0.345 (Bulgarian), 0.442 (North Macedonian), and 0.178 (Arabic) on the test set.
Recommended citation: Youri Peskine, Raphael Troncy, and Paolo Papotti. 2024. EURECOM at SemEval-2024 Task 4: Hierarchical Loss and Model Ensembling in Detecting Persuasion Techniques. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1177–1182, Mexico City, Mexico. Association for Computational Linguistics.
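To make the hierarchy-aware idea concrete, here is one way a hierarchical BCE loss could be sketched in PyTorch, assuming a label taxonomy given as a child-to-parent map; the taxonomy and labels below are invented, and this is not necessarily the team's exact formulation.

```python
# A sketch of a hierarchy-aware BCE loss: whenever a leaf label is active,
# its ancestors are activated in the target as well. Taxonomy is hypothetical.
import torch
import torch.nn.functional as F

PARENT = {"loaded_language": "pathos", "appeal_to_fear": "pathos"}  # child -> parent
LABELS = ["pathos", "loaded_language", "appeal_to_fear"]
IDX = {label: i for i, label in enumerate(LABELS)}

def expand_with_ancestors(target: torch.Tensor) -> torch.Tensor:
    # target: float multi-hot tensor of shape (batch, num_labels)
    expanded = target.clone()
    for child, parent in PARENT.items():
        expanded[:, IDX[parent]] = torch.maximum(
            expanded[:, IDX[parent]], expanded[:, IDX[child]]
        )
    return expanded

def hierarchical_bce(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Standard BCE-with-logits, but against ancestor-expanded targets.
    return F.binary_cross_entropy_with_logits(logits, expand_with_ancestors(target))
```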
Published in Extraction et Gestion des Connaissances EGC2025 (pp. 279-286), 2025
Relation extraction (RE) is a key task in natural language processing, aiming to identify semantic relations between entities in a text. Traditional supervised methods train models to annotate entities and predict their relations. Recently, this task has been reframed as a sequence-to-sequence problem, where relations are converted into target strings generated from the input text. Language models, increasingly used in this area, have enabled notable advances with various levels of refinement. The goal of the study presented here is to evaluate the contribution of large language models (LLMs) to relation extraction in a specific domain (here, the economic domain), compared to smaller language models. To do so, we considered as a baseline a model based on the BERT architecture and trained in this domain, and four LLMs, namely FinGPT, which is specific to the finance domain, and XLNet, ChatGLM2, and Llama3, which are general-purpose. All these models were evaluated on the same extraction task, with, for the general-purpose LLMs, refinements through few-shot learning and fine-tuning. The experiments showed that the best performance in terms of F-score was obtained with fine-tuned LLMs, Llama3 achieving the best results.
Recommended citation: Mohamed Ettaleb, Mouna Kamel, Véronique Moriceau, Nathalie Aussenac-Gilles. La contribution des LLM à l'extraction de relations dans le domaine financier. Extraction et Gestion des Connaissances EGC2025, Thomas Guyet; Baptiste Lafabrègue; Aurélie Leborgne, Jan 2025, Strasbourg, France. pp.279-286. ⟨hal-04940352⟩
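To illustrate the sequence-to-sequence framing mentioned in the abstract, here is a minimal sketch of how gold triples can be linearized into target strings for fine-tuning; the tag vocabulary and example sentence are hypothetical, not the format used in the paper.

```python
# A sketch of seq2seq relation extraction data preparation: gold
# (head, relation, tail) triples are linearized into a target string that
# the model learns to generate. Tags and example are illustrative only.
def linearize(triples):
    return " ".join(f"<triplet> {h} <rel> {r} <tail> {t}" for h, r, t in triples)

text = "Acme Corp acquired Beta Ltd in 2024."
target = linearize([("Acme Corp", "acquired", "Beta Ltd")])
# target == "<triplet> Acme Corp <rel> acquired <tail> Beta Ltd"
# Pairs (text, target) can then be used to fine-tune a seq2seq or causal LM.
```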
Published in Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal), 2025
Relation Extraction (RE) is a fundamental task in natural language processing, aimed at identifying semantic relationships between entities in a text. Traditional supervised relation extraction methods involve training models to annotate tokens representing entity mentions, followed by predicting the relationship between these entities. However, recent advances have reframed this task as a sequence-to-sequence problem, in which relationships between entities are converted into target strings that are then generated from the input text. Language models thus appear as a natural solution to this task and have already been used in numerous studies, with various levels of refinement, across different domains. The objective of the present study is to evaluate the contribution of large language models (LLMs) to relation extraction in a specific domain (in this case, the economic domain), compared to smaller language models. To do this, we considered as a baseline a model based on the BERT architecture and trained in this domain, and four LLMs, namely FinGPT, which is specific to the financial domain, and XLNet, ChatGLM2, and Llama3, which are general-purpose. All these models were evaluated on the same extraction task, with zero-shot prompting for the general-purpose LLMs, as well as refinements through few-shot learning and fine-tuning. The experiments showed that the best performance in terms of F-score was achieved with fine-tuned LLMs, with Llama3 achieving the highest performance.
Recommended citation: Mohamed Ettaleb, Mouna Kamel, Nathalie Aussenac-Gilles, and Véronique Moriceau. 2025. The contribution of LLMs to relation extraction in the economic field. In Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal), pages 175–183, Abu Dhabi, UAE. Association for Computational Linguistics.
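A minimal sketch of the few-shot prompting setup mentioned in the abstract, assuming a plain text-completion interface; the demonstrations and relation labels are invented for illustration, not drawn from the paper's dataset.

```python
# A sketch of a few-shot prompt builder for relation extraction. The
# demonstrations and relation names below are hypothetical placeholders.
FEW_SHOT = [
    ("Acme Corp acquired Beta Ltd.", "(Acme Corp, acquired, Beta Ltd)"),
    ("Jane Doe is CEO of Acme Corp.", "(Jane Doe, ceo_of, Acme Corp)"),
]

def build_prompt(sentence: str) -> str:
    # Each demonstration pairs an input sentence with its linearized triples.
    demos = "\n\n".join(f"Sentence: {s}\nRelations: {r}" for s, r in FEW_SHOT)
    return (
        "Extract (head, relation, tail) triples from the sentence.\n\n"
        f"{demos}\n\nSentence: {sentence}\nRelations:"
    )

print(build_prompt("Gamma Inc opened an office in Paris."))
```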
Published in Atelier TextMine 2025, EGC2025, 2025
Relation extraction (RE) aims to identify and characterize semantic relations between entities in a text, a key task in natural language processing (NLP). Traditional supervised approaches rely on annotating entities and then predicting the relations between them. Recently, sequence-to-sequence methods have simplified this process by directly generating relations as target strings. Large language models (LLMs) stand out for their ability to handle such complex tasks effectively. In the TextMine 2025 challenge, the objective is to automate relation extraction from complex intelligence and defense reports. This challenge offers a unique opportunity to evaluate the performance of LLMs in realistic scenarios. We propose an approach using the Llama3 model (Dubey et al., 2024) to detect and classify relations between pairs of entities in a text. We combine the power of LLMs with pre-filtering steps based on entity and relation types. The goal is to assess to what extent an LLM can meet the needs of relation extraction in complex settings, while highlighting its limitations and the challenges that remain.
Recommended citation: Mohamed Ettaleb, Mouna Kamel, Véronique Moriceau, Nathalie Aussenac-Gilles. Défi TextMine 2025 : Utilisation des Grands Modèles de Langue pour l'Extraction de Relations dans les Rapports de Renseignement. EGC - Atelier TextMine 2025, Pascal Cuxac; Cédric Lopez; Adrien Guille, Jan 2025, Strasbourg, France. pp.57-58. ⟨hal-04940482⟩
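A small sketch of the type-based pre-filtering step described above: only entity pairs whose types are compatible with at least one relation are passed on to the LLM. The type and relation schema below is hypothetical, not the TextMine 2025 one.

```python
# A sketch of type-based candidate filtering before LLM classification.
# The compatibility schema is invented for illustration.
COMPATIBLE = {
    ("PERSON", "ORGANIZATION"): ["member_of", "leads"],
    ("ORGANIZATION", "PLACE"): ["located_in"],
}

def candidate_pairs(entities):
    # entities: list of (mention, type) tuples from a prior NER step
    for head, h_type in entities:
        for tail, t_type in entities:
            if head == tail:
                continue
            relations = COMPATIBLE.get((h_type, t_type))
            if relations:
                # Only these pairs, with their admissible relations,
                # are inserted into the LLM prompt.
                yield head, tail, relations

ents = [("J. Smith", "PERSON"), ("Acme Corp", "ORGANIZATION"), ("Paris", "PLACE")]
print(list(candidate_pairs(ents)))
```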
Published in COLING 2025, 31st International Conference on Computational Linguistics, 2025
Tropes — recurring narrative elements like the “smoking gun” or the “veil of secrecy” — are often used in movies to convey familiar patterns. However, they also play a significant role in online communication about societal issues, where they can oversimplify complex matters and deteriorate public discourse. Recognizing these tropes can offer insights into the emotional manipulation and potential bias present in online discussions. This paper addresses the challenge of automatically detecting tropes in social media posts. We define the task, distinguish it from previous work, and create a ground-truth dataset of social media posts related to vaccines and immigration, manually labeled with tropes. Using this dataset, we develop a supervised machine learning technique for multi-label classification, fine-tune a model, and demonstrate its effectiveness experimentally. Our results show that tropes are common across domains and that fine-tuned models can detect them with high accuracy.
Recommended citation: Alessandra Flaccavento, Youri Peskine, Paolo Papotti, Riccardo Torlone, and Raphael Troncy. 2025. Automated Detection of Tropes In Short Texts. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5936–5951, Abu Dhabi, UAE. Association for Computational Linguistics.
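For illustration, a minimal sketch of a multi-label fine-tuning setup with Hugging Face Transformers; the checkpoint and trope labels are placeholders rather than the paper's actual configuration.

```python
# A sketch of multi-label trope classification with Transformers. The
# checkpoint and trope inventory are stand-ins, not the paper's setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

TROPES = ["smoking_gun", "veil_of_secrecy", "slippery_slope"]  # illustrative

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(TROPES),
    problem_type="multi_label_classification",  # uses BCE-with-logits loss
)

batch = tok(["They will never show you the real data."], return_tensors="pt")
labels = torch.tensor([[0.0, 1.0, 0.0]])  # multi-hot target (float)
loss = model(**batch, labels=labels).loss  # ready for a standard training loop
```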
Orange attended the WikidataCon2023 hybrid event, where Yann Almeras presented "Supercharging Wikidata with External Aliases and New Entity Types". Abstract: Wikidata plays a crucial role in facilitating Named Entity Linking and Relation Extraction for companies and researchers. However, it also faces certain limitations. Unlike DBpedia, Wikidata lacks a comprehensive taxonomy of entities, and many entities have only a partial list of aliases that could benefit from enrichment. In this talk, we will introduce a database built within Orange that supplements Wikidata with enriched entity information sourced from various external databases using intelligent heuristics. We will then show how this database can be used to highlight inconsistencies and poor-quality data in Wikidata and across various Wikipedia editions. We will also share our plans to develop bots to seamlessly transfer the enhanced data back into the public Wikidata instance, fostering a more robust and accurate knowledge base.
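As a small illustration of the kind of alias data involved, the sketch below queries the public Wikidata SPARQL endpoint for an entity's existing English aliases (using Q42, a standard example item); the internal enrichment database described in the talk is not shown.

```python
# A sketch of retrieving existing aliases for a Wikidata entity via the
# public SPARQL endpoint; this is the data the talk proposes to enrich.
import requests

QUERY = """
SELECT ?alias WHERE {
  wd:Q42 skos:altLabel ?alias .   # Q42 = Douglas Adams, a standard example
  FILTER (lang(?alias) = "en")
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "alias-enrichment-sketch/0.1"},  # polite UA string
)
aliases = [b["alias"]["value"] for b in resp.json()["results"]["bindings"]]
print(aliases)
```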