ENEOLI Webinar – NeoN: A New Tool for the Detection and Preliminary Analysis of Lexical Innovation
We are pleased to invite you to an upcoming webinar that will be of particular interest to those working on the detection of lexical innovation. This event is part of the activities of our network aimed at presenting new tools and methodologies for the study of neologisms and is open to all ENEOLI members.
We introduce NeoN, a new system designed to detect and conduct preliminary analyses of newly emerging words in Polish. Developed by the Linguistic Engineering Group at the Institute of Computer Science of the Polish Academy of Sciences, NeoN combines corpus and dictionary checks, frequency analysis, contextual lemmatization, and orthographic normalization with a Large Language Model module. This multi-layer architecture effectively processes daily updated text sources (e.g., RSS feeds) to accurately identify lexical innovations while filtering out problematic or erroneous cases.
During our presentation, we will outline the rationale and design principles behind NeoN, demonstrate the interface and its context display features, and illustrate its functionalities, such as automated categorisation and definition generation. We will also present concrete examples of newly detected lexical items and highlight how a human-in-the-loop approach is necessary for making good use of automated outputs. Finally, we will discuss potential future developments, such as extending NeoN to other languages and integrating additional functionalities to further support the automated monitoring and analysis of lexical innovation.
Aleksandra Tomaszewska is a researcher in AI data and corpus linguistics, recognized as one of Poland’s Top 100 Women in Engineering in the Policy and Advocacy category by Perspektywy. She co-created local language models (e.g., coordinating the Polish dataset for PLLuM) and is an active member of the AI Working Group at the Polish Ministry of Digital Affairs. Her work spans innovation in corpus studies, bias mitigation in Polish and Polish-language models, and lexical innovation. She contributes to research projects and promotes corpus linguistics and NLP data in academic and industry venues. She is an ENEOLI member and coordinates the Institutional Gender Fair Language task group.
Maciej Ogrodniczuk is Head of the Department of Language Modelling at the Institute of Computer Science of the Polish Academy of Sciences. He holds a Master’s degree in Computer Science from the Faculty of Mathematics, Informatics and Mechanics, and a PhD in Linguistics from the Faculty of Modern Languages at the University of Warsaw. He is involved in numerous national and international projects related to language processing and digital humanities.
Dr. Dariusz Czerski serves as an Assistant at the Artificial Intelligence Department of the Institute of Computer Science of the Polish Academy of Sciences. With fifteen years of experience in AI research, he specializes in automated web data collection, data clustering, and information retrieval. He played a key role in developing the NEKST search engine and has authored numerous scientific publications in these areas.
Bartosz Żuk is a PhD student at the Institute of Computer Science, Polish Academy of Sciences. In his research, he focuses mainly on the alignment of Large Language Models and discourse relation parsing.
