Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

publications

Definitions matter: Guiding GPT for multi-label classification

Published in EMNLP 2023, Conference on Empirical Methods in Natural Language Processing, 2023

Large language models have recently risen in popularity due to their ability to perform many natural language tasks without requiring any fine-tuning. In this work, we focus on two novel ideas: (1) generating definitions from examples and using them for zero-shot classification, and (2) investigating how an LLM makes use of the definitions. We thoroughly analyze the performance of the GPT-3 model for fine-grained multi-label conspiracy theory classification of tweets using zero-shot labeling. In doing so, we assess how to improve the labeling by providing minimal but meaningful context in the form of the definitions of the labels. We compare descriptive noun phrases and human-crafted definitions, introduce a new method to help the model generate definitions from examples, and propose a method to evaluate GPT-3’s understanding of the definitions. We demonstrate that improving definitions of class labels has a direct consequence on the downstream classification results.

Recommended citation: Peskine, Y., Korencic, D., Grubišic, I., Papotti, P., Troncy, R., & Rosso, P. Definitions Matter: Guiding GPT for Multi-label Classification.

talks

Supercharging Wikidata with External Aliases and New Entity Types

Published:

Orange attended the WikidataCon2023 hybrid event, where Yann Almeras presented "Supercharging Wikidata with External Aliases and New Entity Types". Abstract: Wikidata plays a crucial role in facilitating Named Entity Linking and Relation Extraction for companies and researchers. However, it also faces certain limitations. Unlike DBpedia, Wikidata lacks a comprehensive taxonomy of entities, and many entities have only a partial list of aliases that could benefit from enrichment. In this talk, we will introduce a database built within Orange that supplements Wikidata with enriched entity information sourced from various external databases using intelligent heuristics. Then we will show how this database can be used to highlight inconsistencies and poor-quality data in Wikidata and across various Wikipedia editions. We will also share our plans to develop robots to seamlessly transfer enhanced data back into the public Wikidata instance, fostering a more robust and accurate knowledge base.