WebNLG Challenge 2017

Info

The WebNLG Challenge 2017 is over. It was held in April-October 2017.

General Information

Task

The WebNLG challenge consists of mapping data to text. The training data comprises Data/Text pairs, where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For instance, given the three DBpedia triples shown in (a), the aim is to generate a text such as (b).

a. (John_E_Blaha birthDate 1942_08_26) (John_E_Blaha birthPlace San_Antonio) (John_E_Blaha occupation Fighter_pilot)
b. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot

As the example illustrates, the task involves specific NLG subtasks such as sentence segmentation (how to chunk the input data into sentences), lexicalisation (of the DBpedia properties), aggregation (how to avoid repetitions) and surface realisation (how to build a syntactically correct and natural sounding text).

Data

The WebNLG Challenge dataset consists of 21,855 data/text pairs with a total of 8,372 distinct data inputs. The inputs describe entities belonging to nine distinct DBpedia categories, namely Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam and WrittenWork. The WebNLG data is licensed under CC Attribution-Noncommercial-Share Alike 4.0 International. For a more detailed description of the dataset, see here.

After the challenge had finished, a larger dataset was released, describing 15 DBpedia categories. The new categories include CelestialBody, MeanOfTransportation, City, Athlete, Politician and Artist.

References

Creating Training Corpora for NLG Micro-Planners. Claire Gardent, Anastasia Shimorina, Shashi Narayan and Laura Perez-Beltrachini. Proceedings of ACL 2017. PDF
Building RDF Content for Data-to-Text Generation. Laura Perez-Beltrachini, Rania Sayed and Claire Gardent. Proceedings of COLING 2016, Osaka, Japan. PDF
The WebNLG Challenge: Generating Text from DBpedia Data. Emilie Colin, Claire Gardent, Yassine Mrabet, Shashi Narayan and Laura Perez-Beltrachini. Proceedings of INLG 2016. PDF

To cite the dataset and/or challenge, use:

@inproceedings{gardent2017creating,
    title = "Creating Training Corpora for {NLG} Micro-Planners",
    author = "Gardent, Claire  and
      Shimorina, Anastasia  and
      Narayan, Shashi  and
      Perez-Beltrachini, Laura",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/P17-1017",
    pages = "179--188",
    url = "https://www.aclweb.org/anthology/P17-1017.pdf"
}

Important Dates

  • 14 April 2017: Release of Training and Development Data
  • 30 April 2017: Release of Baseline System
  • 18 August 2017: Release of Test Data
  • 18 - 22 August 2017: Test data submission period (originally 1 July - 22 August)
    • Fill in the form and retrieve data
    • Submit test data outputs at the latest 48 hours after download and no later than 22 August.
  • 22 August 2017: Entry submission deadline (originally 25 August)
  • 5 September 2017: WebNLG meeting at INLG 2017 and presentation of the results of the automatic evaluation
  • October 2017: Results of human evaluation (originally 30 September)

Organising Committee

  • Claire Gardent, CNRS/LORIA, Nancy, France
  • Anastasia Shimorina, CNRS/LORIA, Nancy, France
  • Shashi Narayan, School of Informatics, University of Edinburgh, UK
  • Laura Perez-Beltrachini, School of Informatics, University of Edinburgh, UK

Contacts

webnlg2017@inria.fr

Acknowledgments

The WebNLG challenge is funded by the WebNLG ANR Project.

Participation in the Challenge

If you plan to participate in the WebNLG challenge, the procedure is as follows. All requests should be sent to webnlg2017@inria.fr.

Registration

Please register using the following form.

Test Data

The test data will consist of around 1,700 meaning representations (sets of DBpedia triples), equally distributed in terms of size (1 to 7 triples) and divided into two halves. The first half will contain inputs from DBpedia categories seen in the training data (Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam, City, and WrittenWork); the second half will contain inputs extracted for entities belonging to 5 unseen categories.

Submitting Results

The results must be submitted to the organisers (by email to webnlg2017@inria.fr) within 48 hours after the organisers have sent the data. To allow for a fair comparison, late submissions will be rejected.

In addition to system outputs, participants are requested to email the organisers (webnlg2017@inria.fr) a two-page description of their system. This description will be made available on the WebNLG challenge portal.

Data Format

  • Test Data

    Test data will be in the same format as the training data (see the documentation), but without <lex> sections. Each set of DBpedia triples has an ID.

    An example of the test data is here.

  • Submission Entry

    Your submission file must be in plain text, lowercased and tokenised. Multiple verbalisations per set of DBpedia triples are not allowed.

    An example submission file is here.

    Each line corresponds to the verbalisation of a DBpedia triple set: line 1 must contain the verbalisation of the triple set with ID=1, line 2 that of the set with ID=2, and so on.
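
    To illustrate the expected ordering, here is a minimal Python sketch; the file name and the predictions dictionary are hypothetical, not part of the challenge tooling:

        # Minimal sketch (hypothetical): one lowercased, tokenised verbalisation per
        # line, where line N corresponds to the DBpedia triple set with ID=N.
        predictions = {
            2: "bakso is a food containing noodles ; it is found in indonesia .",
            1: "john e blaha , born in san antonio , worked as a fighter pilot .",
        }

        with open("submission.txt", "w", encoding="utf-8") as out:
            for triple_set_id in sorted(predictions):  # sort so line order matches IDs
                out.write(predictions[triple_set_id] + "\n")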

Evaluation

Evaluation will proceed in two steps.

First, the results of automatic metrics (BLEU, TER, METEOR) will be provided. We will provide global and detailed results (per DBpedia category, per input size, per category and input size, etc.). These results will be presented at the INLG conference in Santiago de Compostela, Spain, on September 5th.

Second, the results of a human evaluation will be provided. The human evaluation will assess criteria such as fluency, grammaticality and appropriateness (does the text correctly verbalise the input?).

WebNLG Baseline System

For the WebNLG challenge, we provide a baseline system which can serve as a starting point for your experiments.

Scripts to reproduce our experiments are available on GitLab.

Preparing data

Linearisation, tokenisation, delexicalisation

  • Unpack the archive with the WebNLG dataset into a data-directory folder.
  • Run a preprocessing script.

    python3 webnlg_baseline_input.py -i <data-directory>
    

The script extracts tripleset-lexicalisation pairs, linearises the triples, performs tokenisation and delexicalisation using exact match, and writes source and target files.

After the preprocessing, an original tripleset-lexicalisation pair [1] is transformed into a pair of source and target sequences [2].

Original [1]

<modifiedtripleset>
    <mtriple>Indonesia | leaderName | Jusuf_Kalla</mtriple>
    <mtriple>Bakso | region | Indonesia</mtriple>
    <mtriple>Bakso | ingredient | Noodle</mtriple>
    <mtriple>Bakso | country | Indonesia</mtriple>
</modifiedtripleset>
<lex>
Bakso is a food containing noodles;it is found in Indonesia where Jusuf Kalla is the leader.
</lex>

Modified [2]

source files (*.triple):

COUNTRY leaderName LEADERNAME FOOD region COUNTRY FOOD ingredient INGREDIENT FOOD country COUNTRY

target files (*.lex):

FOOD is a food containing noodles ; it is found in COUNTRY where LEADERNAME is the leader .

The script writes training and validation files which are used as input to neural generation, as well as reference files for evaluation.
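
As an illustration of the exact-match delexicalisation step, here is a minimal Python sketch; the function, the placeholder map and the tokenisation-free string handling are simplifications, not the actual baseline code:

    # Minimal sketch of exact-match delexicalisation (illustrative, not the baseline script).
    def delexicalise(triples, lexicalisation, placeholders):
        """Replace entities with category placeholders in both triples and text."""
        source_tokens = []
        for subj, prop, obj in triples:
            source_tokens += [placeholders.get(subj, subj), prop, placeholders.get(obj, obj)]
        target = lexicalisation
        for entity, placeholder in placeholders.items():
            # Exact match on the DBpedia form and on its underscore-free surface form.
            target = target.replace(entity, placeholder)
            target = target.replace(entity.replace("_", " "), placeholder)
        return " ".join(source_tokens), target

    triples = [("Indonesia", "leaderName", "Jusuf_Kalla"), ("Bakso", "country", "Indonesia")]
    placeholders = {"Bakso": "FOOD", "Indonesia": "COUNTRY", "Jusuf_Kalla": "LEADERNAME"}
    source, target = delexicalise(
        triples, "Bakso is found in Indonesia where Jusuf Kalla is the leader.", placeholders
    )
    print(source)  # COUNTRY leaderName LEADERNAME FOOD country COUNTRY
    print(target)  # FOOD is found in COUNTRY where LEADERNAME is the leader.

Note that, as in example [2] above, a mention that does not exactly match an entity name (e.g., "noodles" vs. Noodle) is left unchanged.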

Training a model and generating verbalisations

A simple sequence-to-sequence model with an attention mechanism was trained with the OpenNMT toolkit, using the default parameters for training and translating.

  1. Install OpenNMT.

  2. Navigate to the OpenNMT directory.

  3. Process data files and convert them to the OpenNMT format.

    th preprocess.lua \
    -train_src <data-directory>/train-webnlg-all-delex.triple \
    -train_tgt <data-directory>/train-webnlg-all-delex.lex \
    -valid_src <data-directory>/dev-webnlg-all-delex.triple \
    -valid_tgt <data-directory>/dev-webnlg-all-delex.lex \
    -src_seq_length 70 \
    -tgt_seq_length 70 \
    -save_data baseline
    

    A baseline-train.t7 file will be generated; it is used in the training phase.

  4. Train the model.

    th train.lua -data baseline-train.t7 -save_model baseline
    

    After training for 13 epochs, the script outputs the model file baseline_epoch13_*.t7. Training takes several hours on a GPU.

  5. Translate.

    th translate.lua -model baseline_epoch13_*.t7 -src <data-directory>/dev-webnlg-all-delex.triple -output baseline_predictions.txt
    

    The script generates the file baseline_predictions.txt.

Relexicalisation

  • Relexicalise data.

    python3 webnlg_relexicalise.py -i <data-directory> -f <OpenNMT-directory>/baseline_predictions.txt
    

    The script generates the file relexicalised_predictions.txt, restoring the original RDF subjects and objects.
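
    Relexicalisation is the inverse mapping. A minimal illustrative Python sketch (the placeholder map is assumed; this is not the actual script):

        # Minimal sketch of relexicalisation (illustrative, not the actual script).
        def relexicalise(prediction, placeholders):
            """Replace category placeholders with the original RDF subjects and objects."""
            for entity, placeholder in placeholders.items():
                # Underscores in DBpedia entity names become spaces in the surface text.
                prediction = prediction.replace(placeholder, entity.replace("_", " "))
            return prediction

        placeholders = {"Bakso": "FOOD", "Indonesia": "COUNTRY", "Jusuf_Kalla": "LEADERNAME"}
        print(relexicalise("FOOD is found in COUNTRY where LEADERNAME is the leader .", placeholders))
        # -> Bakso is found in Indonesia where Jusuf Kalla is the leader .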

Evaluating on a development set

  • BLEU-score

    Calculate BLEU on the development set. We use multi-bleu.pl from Moses SMT. (Note that the official script for MT evaluations is mteval-v13a.pl.)

    ./calculate_bleu_dev.sh
    

    BLEU = 54.03

    Additional note about BLEU scoring: multi-bleu.pl does not handle a varying number of references per instance properly (e.g., one test instance having 3 references and another 5); for this reason, the challenge evaluation was done with three references only.

    Consider using other scripts to calculate BLEU; a sketch using NLTK is shown below.
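
    For instance, NLTK's corpus-level BLEU accepts a different number of references per instance. The data below is illustrative, and note that NLTK's defaults (e.g., smoothing) differ from mteval-v13a.pl:

        # Illustrative alternative: corpus-level BLEU with NLTK, which allows a
        # varying number of references per hypothesis.
        from nltk.translate.bleu_score import corpus_bleu

        hypotheses = [
            "bakso is a food containing noodles .".split(),
        ]
        references = [
            [  # any number of references for this instance
                "bakso is a food containing noodles .".split(),
                "bakso is a dish that contains noodles .".split(),
            ],
        ]
        print(corpus_bleu(references, hypotheses))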

  • Prepare input files for other evaluation metrics.

    python3 metrics.py
    
  • METEOR

    Download and install METEOR.

    Navigate to the METEOR directory (cd meteor-1.5/).

    java -Xmx2G -jar meteor-1.5.jar <data-directory>/relexicalised_predictions.txt <data-directory>/all-notdelex-refs-meteor.txt -l en -norm -r 8
    

    METEOR = 0.39

  • TER

    Download and install TER.

    Navigate to the TER directory (cd tercom-0.7.25/).

    java -jar tercom.7.25.jar -h <data-directory>/relexicalised_predictions-ter.txt -r <data-directory>/all-notdelex-refs-ter.txt
    

    TER = 0.40

Challenge Results

Participant Submissions

Download a .zip archive with all the submissions, the teams' reports, and the baseline output; or download the same data per team.

Automatic Evaluation Results

  • The WebNLG Challenge: Generating Text from RDF Data. C. Gardent, A. Shimorina, S. Narayan, L. Perez-Beltrachini. Proceedings of INLG 2017. PDF
  • Scripts to reproduce results are here.
  • The Jupyter notebook with automatic results can be found here. (credit: Abelardo Vieira Mota)

Human Evaluation Results

  • WebNLG Challenge: Human Evaluation Results. A. Shimorina, C. Gardent, S. Narayan, L. Perez-Beltrachini. Technical report. 2018. PDF-v1 PDF-v3 (corrected version)
  • Human scores, references and scripts are here.