
WORKSHOP

We are happy to introduce the workshop Wikipedia meets NLP on September 27th as part of this year's conference!

 

The workshop is organised by the Miljon+ initiative and will focus on Wikipedia as an open language resource. We will address how Wikipedia is being linked to lexical ontologies and how natural language processing is being used in the world's biggest encyclopedia. The practical part will show how to access the information stored in Wikimedia projects via APIs.
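As a small taste of the practical part: the public MediaWiki Action API is reachable over plain HTTP. The sketch below only builds a query URL for the plain-text intro extract of a page (the endpoint, parameters, and the example title "Tartu" follow the documented Action API; actually fetching the URL is left to any HTTP client):

```python
from urllib.parse import urlencode

# Base endpoint of the English Wikipedia Action API
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title):
    """Build a MediaWiki Action API URL that requests the plain-text
    intro extract of a single page, as JSON."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,        # only the lead section
        "explaintext": 1,    # strip HTML, return plain text
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)

url = build_extract_url("Tartu")
# The URL can then be fetched with urllib.request or any HTTP client;
# in the JSON response, extracts are nested under query -> pages -> <pageid>.
```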

The workshop takes place at the Tartu Environmental Education Center (Loodusmaja), Lille 10, 51010 Tartu.

SPEAKERS

EDUARD BARBU

University of Tartu / Estonia

Wikipedia Concept Linking

With the advent of the Linked Data initiative and the increasing speed of globalization, it has become urgent to link the unconnected data sets in the possession of public and private institutions. The last ten years have seen an expanding effort to connect data sets to the world's biggest encyclopedia, Wikipedia (e.g., YAGO). This linking effort is not new; it is part of continuing work to interlink lexical ontologies like WordNet with more formal ones like SUMO or Cyc. In this talk, we will discuss the techniques used for automatically connecting lexical ontologies such as WordNet to Wikipedia, and for linking entities in text documents to their corresponding Wikipedia concepts. Moreover, we will present methods to evaluate the accuracy of the linking process. Knowing the linking methods and their pitfalls will enable data custodians to choose the right tools and techniques for connecting heterogeneous data sets to Wikipedia.

NIKLAS LAXSTRÖM

University of Helsinki and Wikimedia Foundation / Finland

NLP in Wikipedia

Wikipedia and the other Wikimedia projects are rich in natural language. Their strength is multilinguality: Wikipedia is available in close to 300 languages. Even though the data is publicly available, it is not always easy to figure out the best format and way to use it. In this presentation we will explore the relation between wikitext markup and structured data, and how to access them via APIs or through database dumps. We will do this mainly through examples of how natural language processing is used in Wikimedia projects. Special emphasis will be on translation technology. After this presentation you will have a better understanding of the various kinds of data and how to use them.
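The gap between wikitext markup and the natural-language text underneath it can be illustrated with a toy example. The regular expressions below handle only two common constructs (internal links and quote markup) and are nowhere near a full parser; real pipelines use dedicated wikitext-parsing libraries:

```python
import re

def strip_wikitext(text):
    """Very simplified wikitext-to-plain-text conversion:
    [[Target|label]] -> label, [[Target]] -> Target, '''bold''' -> bold."""
    # Piped internal links: keep only the visible label
    text = re.sub(r"\[\[[^\]|]*\|([^\]]+)\]\]", r"\1", text)
    # Plain internal links: keep the link target
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)
    # Bold/italic quote markup
    text = re.sub(r"'{2,}", "", text)
    return text

sample = "'''Tartu''' is a city in [[Estonia|the Republic of Estonia]]."
print(strip_wikitext(sample))
# → Tartu is a city in the Republic of Estonia.
```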

WORKSHOP PROGRAM

14:00 Opening

14:15 Invited speaker: Eduard Barbu

15:00 Coffee break

15:30 Invited speaker: Niklas Laxström

16:15 Discussion

16:30 Demos 

17:30 Discussion

19:00 Welcome reception of BHLT18
