UZH Shared Task @ ArgMining Workshop 2026

Team	F1 rank	LLM-Judge rank	Final rank
🥇 LLM-Instruct	1	5	1
🥈 Prompteam	5	1	2
🥉 Argchestrators	2	6	3
🥉 HybridArguer	4	3	3
POINTERS	3	9	5
ResolveNow	9	2	7
TypeCoT	6	8	8
Ockham	8	7	9

Overview

United Nations resolutions encode collective reasoning at scale: negotiated positions, implicit premises, and carefully structured conclusions. This shared task evaluates how well modern systems can recover these underlying argumentative structures from text.

What you do: predict paragraph-level labels and argumentative relations on a held-out test set.
Who can participate: anyone (students are very much welcome).
Model policy: systems must rely exclusively on open-weight models with ≤ 8B parameters; closed/commercial models are not permitted.

Tasks

The shared task consists of two subtasks aligned with the workshop theme “Understanding and evaluating arguments in both human and machine reasoning.”

Subtask 1: Argumentative Paragraph Classification

For each paragraph, (a) predict whether it is preambular or operative, and (b) assign a subset of the 141 predefined tags (a multi-label classification problem).

Subtask 2: Argumentative Relation Prediction

Given a paragraph, predict which other paragraphs it is related to (indices), and label each link with one or more relation types: contradictive, supporting, complemental, modifying.

Data

We provide a training set and a held-out test set. Both are distributed in JSON format to enable easy processing and reproducible development. We encourage participants to explore the data and design their systems accordingly. To make the task more accessible to non-French speakers, we provide English translations for the dataset.

The shared task draws on DICED, a structured resource for exploring intergovernmental cooperation in education. The broader DICED curation work currently covers the following institutional sources.

Institutional source	Years	Status
UNESCO International Bureau of Education: Resolutions and Recommendations of the International Conference on Education	1934-2008	100% completed
Council of Europe Standing Conference of Education Ministers	1959-2023	100% completed
United Nations General Assembly Resolutions on Education	1945-2025	80% completed
United Nations Commission on Human Rights and United Nations Human Rights Council Resolutions addressing Education	1946-2025	60% completed
UNESCO Legal Instruments Pertaining to Education	1946-2025	75% completed
UNESCO General Conference resolutions pertaining to Education	1946-2025	20% completed
European Council Conclusions addressing Education	2009-2025	Planned
European Council Recommendations addressing Education	2009-2025	Planned
European Parliament Resolutions addressing Education	1958-2025	Planned
OECD Council Recommendations addressing Education	1961-2025	Planned

Training data: drawn from the UN-RES dataset (Gao et al., 2025): 2,695 parsed UN resolutions as raw text in French (plus machine-generated English translations produced with Helsinki-NLP/opus-mt-fr-en). There is no limit on how the training data may be used in your system, as we encourage approaches with a strong focus on LLM reasoning (e.g., RAG, in-context learning) in keeping with the spirit of this year's workshop.
Test data: 45 parsed documents (resolutions and recommendations) from the UNESCO International Bureau of Education’s International Conference on Education (1934–2008), each containing up to three resolutions and annotated at paragraph level in French (we provide machine-generated English translations for the test set, produced with gpt-4.1-mini). Below is an example from the test set:
{
  "TEXT_ID": "ICPE-25-1962_RES1-FR_res_54",
  "RECOMMENDATION": 54,
  "TITLE": "LA PLANIFICATION DE L'ÉDUCATION",
  "METADATA": {
    "structure": {
      "doc_title": "ICPE-25-1962_RES1-FR",
      "nb_paras": 58,
      "preambular_para": [],
      "operative_para": [],
      "think": ""  % how paragraphs are classified into preambular and operative
    }
  },
  "body": {
    "paras": [
      {
        "para_number": 1,
        "para": "La Conférence internationale de l'instruction publique, Convoquée à...",
        "type": null,
        "tags": [],
        "matched_paras": {},
        "think": "",  % how tags are assigned to the paragraph
        "para_en": "The International Conference on Education, convened in..."
      },
      ...
    ]
  }
}
All submissions must fill in the values for the following fields:

METADATA.structure:
- preambular_paras: list of paragraph indices (int) classified as preambular
- operative_paras: list of paragraph indices (int) classified as operative
- think: string describing the reasoning process (e.g., LLM thinking output)
paras:
- type: "preambular" or "operative"
- tags: list of tag labels (strings); a paragraph may carry more than one tag, drawn from different dimensions and categories.
- matched_paras: dictionary of paragraph indices (int) linked by content or reference as keys, and relation types ("contradictive", "supporting", "complemental", "modifying") as values
- think: string describing the reasoning process (e.g., LLM thinking output)
Participants must enable the thinking mode of their LLMs to reason about the relationships between paragraphs.
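The structure-level lists repeat information that is already carried by each paragraph's type field, so they can be derived rather than filled in separately. The Python sketch below is an illustration only, not an official script. The function name fill_structure is hypothetical, paragraph indices are assumed to be the para_number values, and because the example record spells the keys preambular_para / operative_para while the field list above spells them preambular_paras / operative_paras, the sketch reuses whichever spelling the loaded file already contains.

```python
def fill_structure(doc: dict) -> dict:
    """Derive the structure-level index lists from the per-paragraph types."""
    structure = doc["METADATA"]["structure"]
    by_type = {"preambular": [], "operative": []}

    for para in doc["body"]["paras"]:
        # Paragraphs whose type is still unset (null) are simply skipped.
        if para.get("type") in by_type:
            # Assumption: the "paragraph index" is the para_number value.
            by_type[para["type"]].append(para["para_number"])

    for kind, numbers in by_type.items():
        # Reuse the key spelling found in the file; fall back to the
        # spelling used in the field list if neither key is present.
        key = f"{kind}_para" if f"{kind}_para" in structure else f"{kind}_paras"
        structure[key] = numbers
    return doc
```

Running this once per document after all paragraph types have been predicted keeps the two levels of the record consistent with each other.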

Tags are provided in a separate CSV file named evaluation_dimensions_updated.csv, along with dimensional and categorical metadata that participants may use in their systems.

Important clarification: the "complemental" relation indicates that one paragraph adds information to the themes discussed in another paragraph. The "modifying" relation indicates that one paragraph adjusts or changes the themes expressed in another paragraph.
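Before submitting, it can help to check each filled-in paragraph against the field descriptions above. The following Python sketch is a self-check under stated assumptions, not the organizers' validator. The name validate_paragraph is hypothetical, indices are assumed to be 1-based para_number values no larger than nb_paras, and a matched_paras value is accepted either as a single relation string or as a list of relation strings, since the page does not pin down which form is required.

```python
RELATION_TYPES = {"contradictive", "supporting", "complemental", "modifying"}
PARA_TYPES = {"preambular", "operative"}


def validate_paragraph(para: dict, nb_paras: int) -> list:
    """Return a list of problems found in one filled-in paragraph (empty if none)."""
    problems = []
    number = para.get("para_number")

    ptype = para.get("type")
    if not isinstance(ptype, str) or ptype not in PARA_TYPES:
        problems.append(f"para {number}: type must be 'preambular' or 'operative'")

    tags = para.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append(f"para {number}: tags must be a list of strings")

    if not isinstance(para.get("think"), str):
        problems.append(f"para {number}: think must be a string")

    matched = para.get("matched_paras")
    if not isinstance(matched, dict):
        problems.append(f"para {number}: matched_paras must be a dictionary")
        return problems

    for target, relations in matched.items():
        # JSON object keys are always strings, so accept "3" as well as 3.
        try:
            index = int(target)
        except (TypeError, ValueError):
            problems.append(f"para {number}: key {target!r} is not a paragraph index")
            continue
        # Assumption: indices are 1-based para_number values.
        if not 1 <= index <= nb_paras:
            problems.append(f"para {number}: index {index} is out of range")
        # Assumption: a value is one relation string or a list of them.
        labels = [relations] if isinstance(relations, str) else relations
        if (not isinstance(labels, list) or not labels
                or any(not isinstance(r, str) or r not in RELATION_TYPES for r in labels)):
            problems.append(f"para {number}: invalid relation(s) for index {index}")
    return problems
```

An empty return value means the paragraph passed these checks; it does not guarantee that the official schema check will accept the file.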

University of Zurich, Department of Computational Linguistics

www.cl.uzh.ch cl_uzh ZurichNLP

Train and Test Set

Download on Hugging Face

Licensing note: training data follow a restricted UN license; by participating, teams agree not to redistribute the training data publicly.

Evaluation

Systems are evaluated using a combination of automated metrics and empirical auditing.

Automated Metric: F1 scores (computed with scikit-learn) for classification quality.
Empirical Metric: LLM-as-a-Judge using an open-weight LLM with a fixed prompt (0-100 scale) to assess reasoning quality.
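For local development, an F1 score over multi-label tag predictions can be approximated without any dependencies. The sketch below is an assumption-laden illustration: the page does not state which averaging mode the official scikit-learn scoring uses, so micro-averaging is chosen here only as an example, and the function name micro_f1 and the tag names are hypothetical. On binary indicator matrices this computation corresponds to what scikit-learn's f1_score returns with average="micro".

```python
def micro_f1(gold: list, pred: list) -> float:
    """Micro-averaged F1 over per-paragraph tag sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same paragraphs")

    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_tags, pred_tags = set(gold_tags), set(pred_tags)
        tp += len(gold_tags & pred_tags)   # tags predicted and correct
        fp += len(pred_tags - gold_tags)   # tags predicted but wrong
        fn += len(gold_tags - pred_tags)   # tags missed

    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Two paragraphs with hypothetical tag names.
gold = [{"a", "b"}, {"c"}]
pred = [{"a"}, {"c", "d"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Scores from this sketch are only a development signal and may differ from the official numbers if a different averaging mode is used.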

Final ranking is based on the average of both metrics. We will update the leaderboard live during the evaluation phase.

Submission

Participants submit predictions for the test set in the required JSON format.

Submission package

Predictions: strict JSON outputs conforming to the schema
System paper: non-anonymous 4-page paper (ACL format), excluding references; appendices are optional and unlimited in length
Code: include a link to a public repository (e.g., GitHub) in the paper

Compress your filled-out JSON test set and system paper into a single ZIP file for upload.
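Packaging can be scripted with the Python standard library. The sketch below is one possible way to do it, not a required tool: the function name build_submission and any file names are hypothetical, and the JSON check only confirms that each .json file parses, which is weaker than the official schema check.

```python
import json
import zipfile
from pathlib import Path


def build_submission(zip_path, files):
    """Bundle prediction files and the system paper into a single ZIP archive."""
    zip_path = Path(zip_path)
    paths = [Path(f) for f in files]

    # Check everything before creating the archive so a failure leaves no partial ZIP.
    for path in paths:
        if not path.is_file():
            raise FileNotFoundError(path)
        if path.suffix.lower() == ".json":
            # Raises json.JSONDecodeError if the file does not parse as JSON.
            json.loads(path.read_text(encoding="utf-8"))

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store files flat, without their local directory structure.
            archive.write(path, arcname=path.name)
    return zip_path
```

A call such as build_submission("submission.zip", ["predictions.json", "paper.pdf"]) (file names chosen here purely for illustration) would produce one archive ready for upload.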

Allowed techniques are flexible (e.g., in-context learning, retrieval-augmented generation), but only open-weight LLMs with ≤ 8B parameters may be used. Please also include a team name in your system paper for the leaderboard announcement.

Important dates

All deadlines are 11:59 PM UTC-12:00 (“anywhere on Earth”).

1 Feb 2026

Train and test data release

18 March 2026

Evaluation and submission start

1 April 2026

Submission ends

15 April 2026

Evaluation ends; results notification

24 April 2026

Paper submission due

1 May 2026

Reviews to authors

12 May 2026

Camera-ready version due

July 2026

ArgMining 2026 Workshop
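Because the deadlines above use the "anywhere on Earth" convention (UTC-12:00), the corresponding UTC time falls on the following calendar day. A small Python sketch of the conversion, with the hypothetical helper name deadline_utc:

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 PM AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Submission ends on 1 April 2026 (AoE).
print(deadline_utc(2026, 4, 1))  # 2026-04-02 11:59:00+00:00
```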

Organizers

University of Zurich, Zurich, Switzerland.

Yingqiang Gao — Postdoctoral Researcher, Linguistic Research Infrastructure
Anastassia Shaitarova — Postdoctoral Researcher, Department of Computational Linguistics
Reto Gubelmann — Research Group Leader, Digital Society Initiative & Department of Computational Linguistics
Patrick Montjouridès — Postdoctoral Researcher, Institute of Education

FAQ

Can I use closed-source or commercial models? No. Submissions must rely exclusively on open-weight models with ≤ 8B parameters.
Can I use external data? The task does not impose strict constraints on the use of unsupervised training data, but please document what you use in your system paper.
How do I submit? Submit JSON predictions for the test set via the evaluation platform (link above) and upload your system paper by the paper deadline.
What if my JSON does not match the schema? Non-conforming submissions will not be evaluated.
Where do I ask questions? Email any of the organizers.

Citation

If you use the shared task data or results, please cite the overview paper.

@inproceedings{shaitarova2026overview,
title={Overview of the {UZH} Shared Task 2026 on Reconstructing the Reasoning in {United Nations} Resolutions},
author={Shaitarova, Anastassia and Gao, Yingqiang and Rezkellah, Fatma-Zohra and Gubelmann, Reto and Montjourid{\`e}s, Patrick},
booktitle={Proceedings of the 13th Workshop on Argument Mining and Reasoning},
year={2026},
publisher={Association for Computational Linguistics}
}

Reconstructing the Reasoning in United Nations Resolutions