What is the WebNLG Challenge?

Generating Text from RDF Data

The WebNLG challenge consists in mapping data to text. The training data consists of Data/Text pairs where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For instance, given the 3 DBpedia triples shown in (a), the aim is to generate a text such as (b).

a. (John_E_Blaha birthDate 1942_08_26) (John_E_Blaha birthPlace San_Antonio) (John_E_Blaha occupation Fighter_pilot)
b. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot

As the example illustrates, the task involves specific NLG subtasks such as sentence segmentation (how to chunk the input data into sentences), lexicalisation (of the DBpedia properties), aggregation (how to avoid repetitions) and surface realisation (how to build a syntactically correct and natural-sounding text).
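
To make the input format and the subtasks concrete, here is a minimal Python sketch of a naive template-based baseline. It is not part of the challenge or its tooling: the function names (`parse_triples`, `group_by_subject`, `verbalise`), the `TEMPLATES` table and the parenthesised input syntax copied from (a) are all assumptions made for illustration. It parses the triples, groups them by subject, and lexicalises each property with a hand-written phrase.

```python
import re
from collections import defaultdict

# The three triples from example (a), in the same parenthesised notation.
EXAMPLE = (
    "(John_E_Blaha birthDate 1942_08_26) "
    "(John_E_Blaha birthPlace San_Antonio) "
    "(John_E_Blaha occupation Fighter_pilot)"
)

# Hypothetical hand-written lexicalisations, one per DBpedia property.
TEMPLATES = {
    "birthDate": "was born on {obj}",
    "birthPlace": "was born in {obj}",
    "occupation": "worked as a {obj}",
}


def parse_triples(text):
    """Turn '(subject property object)' groups into 3-tuples."""
    return [tuple(group.split()) for group in re.findall(r"\(([^()]*)\)", text)]


def group_by_subject(triples):
    """Collect the (property, object) pairs that share a subject."""
    grouped = defaultdict(list)
    for subject, prop, obj in triples:
        grouped[subject].append((prop, obj))
    return dict(grouped)


def label(name):
    """Make an identifier readable: dates get hyphens, other names get spaces."""
    if re.fullmatch(r"\d{4}_\d{2}_\d{2}", name):
        return name.replace("_", "-")
    return name.replace("_", " ")


def verbalise(triples):
    """Emit one sentence per subject by chaining the property templates."""
    sentences = []
    for subject, facts in group_by_subject(triples).items():
        phrases = [TEMPLATES[prop].format(obj=label(obj)) for prop, obj in facts]
        sentences.append(label(subject) + " " + ", ".join(phrases) + ".")
    return " ".join(sentences)


if __name__ == "__main__":
    print(verbalise(parse_triples(EXAMPLE)))
    # John E Blaha was born on 1942-08-26, was born in San Antonio, worked as a Fighter pilot.
```

The output is grammatical but repeats "was born" and keeps the capitalised "Fighter pilot", whereas the reference text (b) merges the two birth facts into a single clause. Closing that gap is the job of the aggregation and surface realisation subtasks described above.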

Motivations

The WebNLG data (Gardent et al., 2017) was created to promote the development of (i) RDF verbalisers and (ii) microplanners able to handle a wide range of linguistic constructions.

KB Verbalisers. The RDF language in which DBpedia is encoded is widely used within the Linked Data framework. Many large-scale datasets are encoded in this language (e.g., MusicBrainz, FOAF, LinkedGeoData), and official institutions increasingly publish their data in this format. Being able to generate good-quality text from RDF data would make it possible, for example, to make this data more accessible to lay users, to enrich existing text with information drawn from knowledge bases such as DBpedia, or to describe, compare and relate entities present in these knowledge bases.

Microplanning. While many recent datasets for generation take as input dialogue act meaning representations, which can be viewed as trees of depth one, the WebNLG data was carefully constructed to allow for input trees of various shapes and depths and thereby for greater syntactic diversity in the corresponding texts (see Gardent et al., 2017). We hope that the WebNLG challenge will drive the deep learning community to take up this new challenge and work on the development of neural generators that can handle the generation of linguistically rich texts.