
WebNLG Challenge 2023


The challenge has started. The deadline for the submission of system outputs is June 15th, 2023.

The new edition of WebNLG focuses on four under-resourced languages which are severely under-represented in research on text generation, namely Maltese, Irish, Breton and Welsh. In addition, WebNLG 2023 will once again include Russian, which was first featured in WebNLG 2020.

Important Dates

  • 24 February 2023: Release of noisy training data, gold development data, and evaluation scripts.
  • 8 June 2023: Release of test data.
  • 15 June 2023: Deadline for submission of system outputs.
  • 30 June 2023: Automatic evaluation results released to participants.
  • 1 August 2023: Deadline for submission of short papers describing systems.


The challenge focuses on RDF-to-text generation, as in WebNLG 2017, but targets Breton, Irish, Maltese, Welsh, and Russian.

Given the four RDF triples shown in (a), the aim is to generate a text such as (b) or (c).



(a) Set of RDF triples

<entry category="Company" eid="Id21" shape="(X (X) (X) (X) (X))" shape_type="sibling" size="4">
        <mtriple>Trane | foundingDate | 1913-01-01</mtriple>
        <mtriple>Trane | location | Ireland</mtriple>
        <mtriple>Trane | foundationPlace | La_Crosse,_Wisconsin</mtriple>
        <mtriple>Trane | numberOfEmployees | 29000</mtriple>
</entry>

(b) English text

Trane, which was founded on January 1st 1913 in La Crosse, Wisconsin, is based in Ireland. It has 29,000 employees.

(c) Russian text

Компания "Тране", основанная 1 января 1913 года в Ла-Кроссе в штате Висконсин, находится в Ирландии. В компании работают 29 тысяч человек.
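Triple sets in this format can be read with any XML parser. As an illustration only (a sketch, not part of the official WebNLG tooling), the entry above can be turned into (subject, property, object) tuples with the Python standard library:

```python
# Sketch: parse one WebNLG <entry> into (subject, property, object) triples.
# Uses only the standard library; the XML below mirrors example (a) above.
import xml.etree.ElementTree as ET

ENTRY = """
<entry category="Company" eid="Id21" size="4">
    <mtriple>Trane | foundingDate | 1913-01-01</mtriple>
    <mtriple>Trane | location | Ireland</mtriple>
    <mtriple>Trane | foundationPlace | La_Crosse,_Wisconsin</mtriple>
    <mtriple>Trane | numberOfEmployees | 29000</mtriple>
</entry>
"""

def parse_entry(xml_text):
    """Return a list of (subject, property, object) tuples for one entry."""
    root = ET.fromstring(xml_text)
    triples = []
    for mt in root.iter("mtriple"):
        # Fields are separated by " | "; objects may themselves contain commas.
        subj, prop, obj = (part.strip() for part in mt.text.split(" | "))
        triples.append((subj, prop, obj))
    return triples

print(parse_entry(ENTRY)[0])  # ('Trane', 'foundingDate', '1913-01-01')
```

Note that splitting on `" | "` rather than on `"|"` keeps objects such as `La_Crosse,_Wisconsin` intact even though they contain punctuation.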


If you download the data, please fill in this form 👍


The WebNLG 2023 dataset for training comprises 1,399 data-text pairs for Breton and 1,665 for Welsh, Irish and Maltese. The Russian data includes all data made available for the WebNLG 2020 Challenge.

See corpus documentation for the WebNLG format.


The challenge overview and results report can be found here.

Evaluation


System outputs are assessed with automatic and human evaluation.

Automatic Evaluation

Generation is evaluated with automatic metrics: BLEU, METEOR, chrF++, TER, and BERT-Score. The evaluation scripts can be found here.
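For scoring, participants should rely on the official evaluation scripts. Purely as a toy illustration of the character n-gram matching idea behind chrF-style metrics (this is not the official chrF++ implementation, and `beta=2` and `n=2` are illustrative defaults), one can sketch a character bigram F-score like this:

```python
# Toy sketch of a character n-gram F-score, illustrating the matching idea
# behind chrF-style metrics. NOT the official chrF++ implementation.
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_fscore(hyp, ref, n=2, beta=2.0):
    """F-beta score over character n-gram overlap (beta > 1 favours recall)."""
    h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
    if not h or not r:
        return 0.0
    overlap = sum((h & r).values())  # clipped n-gram matches
    prec = overlap / sum(h.values())
    rec = overlap / sum(r.values())
    if prec + rec == 0:
        return 0.0
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

print(char_fscore("Trane is based in Ireland.", "Trane is based in Ireland."))  # 1.0
```

An identical hypothesis and reference score 1.0; disjoint strings score 0.0. The real chrF++ additionally mixes in word n-grams and averages over several n-gram orders.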

Human Evaluation

System outputs are assessed according to criteria such as grammaticality/correctness, appropriateness/adequacy, fluency/naturalness, etc., by native speakers.

Submission Format

System Output

Your submission file must be a .txt file (UTF-8 encoding) where each text is true-cased and detokenised. Example for English.

Each line should correspond to the verbalisation of one DBpedia triple set: line 1 to the triple set with ID=1, line 2 to the triple set with ID=2, and so on.
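A minimal sketch of producing such a file is shown below. The `outputs` dictionary is a hypothetical placeholder for a system's predictions, keyed by triple-set ID; the point is simply one text per line, in ascending ID order, UTF-8 encoded:

```python
# Sketch: write a submission file with one detokenised, true-cased text per
# line, ordered by triple-set ID. `outputs` is a hypothetical placeholder
# for a system's predictions.
outputs = {
    2: "Trane has 29,000 employees.",
    1: "Trane, which was founded on January 1st 1913 in La Crosse, "
       "Wisconsin, is based in Ireland.",
}

def write_submission(outputs, path):
    """Write texts to `path` in ascending ID order, one per line, UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for i in sorted(outputs):
            # Collapse any internal newlines so IDs and lines stay aligned.
            f.write(outputs[i].replace("\n", " ").strip() + "\n")

write_submission(outputs, "submission.txt")
```

Sorting by ID before writing guards against dictionaries or prediction files that arrive out of order, which would silently misalign outputs and references.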

System Descriptions

  • System descriptions are due on August 1st, 2023.
  • System Descriptions (2 to 4 pages) should use the ACL 2023 templates and should include a link to the system code, ideally including a checkpoint in case of a neural model.
  • System descriptions are mandatory and should clearly highlight the key elements of the approach. Submissions lacking a system description will not be included in the Challenge Results and Report.
  • Submissions should be made to the MM-NLG START website.


The challenge results will be presented at the MM-NLG 2023 workshop to take place at INLG 2023 on September 12th, 2023 in Prague.


Organising Committee

  • Enrico Aquilina, University of Malta, Malta
  • Anya Belz, Dublin City University, Ireland
  • Claudia Borg, University of Malta, Malta
  • Liam Cripwell, CNRS/LORIA and Lorraine University, France
  • Anna Nikiforovskaja, CNRS/LORIA and Lorraine University, France
  • Claire Gardent, CNRS/LORIA, France
  • Albert Gatt, Utrecht University, The Netherlands
  • John Judge, Dublin City University, Ireland
  • William Soto-Martinez, CNRS/LORIA and Lorraine University, France

Participant FAQ

Which resources are allowed?

There are no restrictions for any task. E.g., you may use a pre-trained language model, external corpora, etc.

Can I submit multiple outputs?

Yes, provided that they stem from substantially different systems. However, for human assessment we may ask you to designate a primary system to be evaluated.

Can I participate for one language only?

Yes. You can participate only in, say, RDF-to-text generation for Breton.

Can I download the data without participating in the challenge?


Will it be possible to withdraw my results if my team's performance is unsatisfactory?

Yes. We will first announce the results to participants anonymously, and you will have an opportunity to withdraw your results.