
BUCC 2025 : 18th Workshop on Building and Using Comparable Corpora at COLING'25


Link: https://comparable.lisn.upsaclay.fr/bucc2025/
 
When Jan 19, 2025 - Jan 20, 2025
Where Abu Dhabi, UAE
Submission Deadline Nov 30, 2024
Notification Due Dec 8, 2024
Final Version Due Dec 12, 2024
Categories    NLP   computational linguistics   artificial intelligence
 

Call For Papers



18th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
WITH SHARED TASK ON MULTILINGUAL TERMINOLOGY EXTRACTION
FROM COMPARABLE CORPORA

Co-located with COLING 2025 (Abu Dhabi)

Paper submission deadline: 30 November, 2024

Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/

Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi

**************************************************************

* Motivation

In the language engineering and linguistics communities, research on
comparable corpora has had two main motivations. In language
engineering, it is chiefly motivated by the need to use comparable
corpora as training data for statistical NLP applications such as
statistical and neural machine translation or cross-lingual
retrieval. In linguistics, comparable corpora are of interest because
they enable cross-language discoveries and comparisons. It is
generally accepted in both communities that comparable corpora consist
of documents that are comparable in content and form to various
degrees and along various dimensions across several languages.
Parallel corpora are at one end of this spectrum, and unrelated
corpora are at the other.

In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications
such as information retrieval, machine translation, and cross-lingual
text classification. The linguistic definitions and observations related
to comparable corpora can help improve both the methods for mining such
corpora and the cross-lingual transfer of LLMs. It is therefore of great
interest to bring together builders and users of such corpora.

* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details.

* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:

Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora

Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora

Comparable Corpora in the Humanities:

- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research

* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)

* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (University of Gent, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
