SDP 2025 : 5th Workshop on Scholarly Document Processing

Link: https://sdproc.org/2025/

When	Jul 31, 2025 - Aug 1, 2025
Where	Vienna, Austria
Submission Deadline	Mar 1, 2025
Notification Due	Apr 17, 2025
Final Version Due	May 16, 2025

Categories NLP computational linguistics artificial intelligence

Call For Papers

5th Workshop on Scholarly Document Processing (SDP 2025) @ ACL 2025

Dear colleagues – you are invited to participate in the 5th Workshop on Scholarly Document Processing (SDP 2025) to be held at ACL 2025 in Vienna, Austria. SDP 2025 will consist of a research track and five shared tasks. The call for research papers is described below, and more details can be found on our website, https://sdproc.org/2025/.

Papers must follow the ACL format and conform to the ACL 2025 Submission Guidelines. Paper submission has to be done through OpenReview: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc

Website: https://sdproc.org/2025/
Submission site: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
X (Twitter): https://twitter.com/sdpworkshop
Shared tasks: https://sdproc.org/2025/sharedtasks.html
Paper submission deadline: March 1 (Saturday), 2025

Call for Research Papers
Scholarly literature is the chief means by which scientists and academics document and communicate their results and is therefore critical to the advancement of knowledge and improvement of human well-being. At the same time, this literature poses challenges to NLP uncommon in other genres, such as specialized language and high background knowledge requirements, long documents and strong structural conventions, multimodal presentation, citation relationships among documents, an emphasis on rational argumentation, and the frequent availability of detailed metadata and experimental data. These challenges necessitate the development of NLP methods and resources optimized for this domain. The Scholarly Document Processing (SDP) workshop provides a venue for discussing these challenges, bringing together stakeholders from different communities including computational linguistics, machine learning, text mining, information retrieval, digital libraries, scientometrics and others, to develop methods, tasks, and resources in support of these goals.

This workshop builds on the success of prior workshops: SDP workshops held at EMNLP 2020, NAACL 2021, COLING 2022, and ACL 2024, and the 1st and 2nd SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal within the NLP community, we hope the SDP workshop will attract researchers from other relevant fields including meta-science, scientometrics, data mining, information retrieval, and digital libraries, bringing together these disparate communities within ACL.

Topics of Interest
We invite submissions from all communities demonstrating usage of and challenges associated with natural language processing, information retrieval, and data mining of scholarly and scientific documents. Relevant topics include (but are not limited to):

Large Language Models (LLMs) for science
Representation learning and language modeling
Information extraction and NER
Document understanding
Summarization and generation
Question-answering
Discourse modeling/argumentation mining
Network analysis
Bibliometrics, scientometrics, and altmetrics
Reproducibility and research integrity, including new challenges posed by generative AI
Peer review tools, principles and technology
Metadata and indexing
Inclusion of datasets and computational resources
Research infrastructures and digital libraries
Increasing the representation in scholarly work of disadvantaged populations
LLM-based interfaces to consume/produce scholarly documents
Impact of scholarly communication on popular discourse

Submission Information
Authors are invited to submit full and short papers with unpublished, original work. Submissions will be subject to a double-blind peer-review process. Accepted papers will be presented by the authors at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings, which will appear in the ACL Anthology (proceedings from previous years can be found here: https://aclanthology.org/venues/sdp/).

The submissions must be in PDF format and anonymized for review. All submissions must be written in English and follow the ACL 2025 formatting requirements:

Long paper submissions: up to 8 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.

Submission Website: Paper submission has to be done through OpenReview:
https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc

Final versions of accepted papers will be allowed 1 additional page of content so that reviewer comments can be taken into account.

Important Dates (Main Research Track)

First call for workshop papers: December 19, 2024
Second call for workshop papers: January 24, 2025
Third call for workshop papers: February 24, 2025
Paper submission deadline: March 1, 2025
Pre-reviewed (ARR) submission deadline: March 25, 2025
Notification of acceptance: April 17, 2025
Camera-ready paper due: May 16, 2025
Workshop dates: July 31 – August 1, 2025

Note: Shared task submission deadlines and other important dates to be announced.

SDP 2025 Keynote Speakers
We are excited to have several keynote speakers at SDP 2025.

Tom Hope, Assistant Professor at Hebrew University of Jerusalem and Research Scientist at Allen Institute for AI.
James A. Evans, Professor and Director of the Knowledge Lab at University of Chicago and External Professor at the Santa Fe Institute.
TBA

SDP 2025 Shared Tasks
SDP 2025 will host five exciting shared tasks. More information about all shared tasks is provided on the workshop website: https://sdproc.org/2025/sharedtasks.html

Detecting automatically generated scientific papers (DAGPap 25)
With Generative AI now ubiquitous, it has become very easy to generate fake scientific papers. This can erode public trust in science and attack the foundations of science: are we standing on the shoulders of robots? The Detecting Automatically Generated Papers (DAGPap) competition aims to encourage the development of robust, reliable AI-generated scientific text detection systems, utilizing a diverse dataset and varied machine learning models in a number of scientific domains.
Organizers: Savvas Chamezopoulos, Dan Li, Anita de Waard (Elsevier).

Contextualizing Scientific Figures and Tables (Context 25)
Interpreting scientific claims in the context of empirical findings is a valuable practice, yet extremely time-consuming for researchers. Such interpretation requires identifying key results (often captured in tables and figures) that provide supporting evidence from research papers, and contextualizing these results with associated methodological details (e.g., measures, sample, etc.). During the previous version of this shared task in 2024, we released datasets to support the development of methods for automatic identification of key result figures or tables as well as additional grounding context to make claim interpretation more efficient. However, the released datasets contained tables and images already extracted from the scientific papers to allow participants to bypass PDF pre-processing issues. In Context 2025, given recent advances in multimodal LLMs, we plan to increase the difficulty of this task by requiring participants to identify key results from paper PDFs directly, and to add a new sub-task on multi-hop reasoning over scientific evidence.
Organizers: Joel Chan, Matthew Akamatsu, Aakanksha Naik

Scientific Visual Question Answering (SciVQA)
Scholarly articles convey valuable information not only through unstructured text but also via (semi-)structured figures such as charts and diagrams. Automatically interpreting the semantics of knowledge encoded in these figures can be beneficial for downstream tasks such as question answering (QA). In the SciVQA challenge, the participants will develop multimodal systems capable of efficiently processing both visual (i.e., addressing attributes such as colour, shape, size, etc.) and non-visual QA pairs based on images of scientific figures and their captions.
Organizers: Ekaterina Borisova, Georg Rehm

Scientific Fact-checking of Social Media Posts on Climate Change (ClimateCheck)
The ClimateCheck shared task focuses on fact-checking claims from social media about climate change against peer-reviewed scholarly articles. Participants will retrieve relevant publications from a corpus of 400,000 climate research articles and classify each abstract as supporting, refuting, or not having enough information about the claim. Training data will include human-annotated claim-publication pairs, and the evaluation will combine nDCG@K and Bpref for retrieval and F1 score for classification. The task aims to develop models that link social media claims to scientific evidence, promoting informed and evidence-based discussions on climate change.
Organizers: Raia Abu Ahmad, Georg Rehm
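
For readers unfamiliar with the retrieval metrics named in the ClimateCheck description, the following is a minimal Python sketch of nDCG@K and Bpref. It is not the organizers' evaluation script: the function names, the binary relevance assumption, the example document IDs, and the choice of K are all hypothetical, and the official definitions and settings are those announced on the shared task page.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@K assuming binary relevance (gain 1 if relevant, else 0)."""
    # Discounted gain of the relevant documents found in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal DCG: all relevant documents placed at the top of the ranking.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def bpref(ranked_ids, relevant_ids, nonrelevant_ids):
    """Bpref: each retrieved relevant document is penalized by the number
    of judged non-relevant documents ranked above it. Unjudged documents
    are ignored."""
    num_rel = len(relevant_ids)
    num_nonrel = len(nonrelevant_ids)
    if num_rel == 0:
        return 0.0
    denom = min(num_rel, num_nonrel)
    nonrel_seen = 0
    total = 0.0
    for doc_id in ranked_ids:
        if doc_id in nonrelevant_ids:
            nonrel_seen += 1
        elif doc_id in relevant_ids:
            if denom == 0:
                total += 1.0  # no judged non-relevant documents at all
            else:
                total += 1.0 - min(nonrel_seen, num_rel) / denom
    return total / num_rel


# Hypothetical example: "d3" is retrieved but has no relevance judgment.
ranking = ["d1", "d7", "d2", "d3"]
relevant = {"d1", "d2"}
nonrelevant = {"d7", "d5"}
print(ndcg_at_k(ranking, relevant, k=3))
print(bpref(ranking, relevant, nonrelevant))
```

Bpref only counts judged documents, which makes it a common companion to nDCG when relevance annotations cover a small part of a large corpus.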

Software Mention Detection in Scholarly Publications (SOMD 2)
Software plays an essential role in computational research methods and is considered one of the crucial entities in scholarly documents. However, software is not always formally cited in academic documents, resulting in various informal mentions of software across a paper. Automatic identification of such software mentions contributes to better understanding, accessibility, and reproducibility of the research work. In addition to detecting the mention itself, understanding the research context requires understanding the purpose of a software mention and its attributes, making software mention detection a comprehensive task.
We are extending the first iteration of the shared task, SOMD 2024, with new challenges. In addition to information extraction techniques, our extended focus will be on Joint Named Entity and Relation Classification techniques.
Organizers: Sharmila Upadhyaya, Frank Krueger, Stefan Dietze

Organizing Committee

Tirthankar Ghosal, Oak Ridge National Laboratory, USA
Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany
Aakanksha Naik, Allen Institute for AI, USA
Amanpreet Singh, Allen Institute for AI, USA
Anita de Waard, Elsevier, Netherlands
Dayne Freitag, SRI International, USA
Georg Rehm, German Research Center for Artificial Intelligence (DFKI), Germany
Sonja Schimmler, Fraunhofer FOKUS, Germany
Dan Li, Elsevier, Netherlands