WORKSHOP
We are happy to announce the workshop Wikipedia meets NLP on September 27th as part of this year's conference!
The workshop is organised by the Miljon+ initiative and will focus on Wikipedia as an open language resource. We will address how Wikipedia is being linked to lexical ontologies and how natural language processing is used in the world's biggest encyclopedia. The practical part will show how to access the information stored on Wikimedia projects via APIs.
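To give a first taste of that practical part, here is a minimal sketch, assuming Python and the requests library, that fetches the plain-text introduction of an article through the public MediaWiki Action API (the article title is just an example):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    params = {
        "action": "query",
        "titles": "Natural language processing",  # example article
        "prop": "extracts",     # plain-text extracts (TextExtracts extension)
        "exintro": 1,           # only the lead section
        "explaintext": 1,       # strip wikitext/HTML markup
        "format": "json",
    }

    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    print(page["extract"][:300])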
The workshop takes place at the Tartu Environmental Education Center (Loodusmaja), Lille 10, Tartu 51010.
SPEAKERS
EDUARD BARBU
University of Tartu / Estonia
Wikipedia Concept Linking
With the advent of the Linked Data initiative and the accelerating pace of globalization, it has become urgent to link the unconnected data sets held by public and private institutions. The last ten years have seen a growing effort to connect data sets to the world's biggest encyclopedia, Wikipedia (e.g., YAGO). This linking effort is not new; it continues earlier work interlinking lexical ontologies such as WordNet with more formal ones such as SUMO or CYC. In this talk, we will discuss the techniques used to automatically link lexical ontologies such as WordNet to Wikipedia and to link entities in text documents to the corresponding Wikipedia concepts. Moreover, we will present methods to evaluate the accuracy of the linking process. Knowing the linking methods and their pitfalls will enable data custodians to choose the right tools and techniques for connecting heterogeneous data sets to Wikipedia.
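To make the evaluation part concrete: the simplest scores compare predicted mention-to-Wikipedia links against a gold standard. The toy Python sketch below, with entirely made-up data, computes precision, recall and F1 in this way; the evaluation methods covered in the talk are of course more involved:

    # Toy evaluation of an entity linker: compare predicted
    # mention -> Wikipedia title links against a gold standard.
    gold = {
        "Tartu": "Tartu",
        "Estonia": "Estonia",
        "the university": "University_of_Tartu",
    }
    predicted = {
        "Tartu": "Tartu",
        "Estonia": "Estonia",
        "the university": "Tartu",  # a wrong link
    }

    correct = sum(1 for m, t in predicted.items() if gold.get(m) == t)
    precision = correct / len(predicted)   # how many predictions are right
    recall = correct / len(gold)           # how many gold links are found
    f1 = 2 * precision * recall / (precision + recall)
    print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")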
NIKLAS LAXSTRÖM
University of Helsinki and Wikimedia Foundation / Finland
NLP in Wikipedia
Wikipedia and the other Wikimedia projects are rich in natural language. Their strength is multilinguality: Wikipedia alone is available in close to 300 languages. Even though the data is publicly available, it is not always easy to figure out the best format and the best way to use it. In this presentation we will explore the relation between wikitext markup and structured data, and how to access both via APIs or through database dumps. We will do this mainly through examples of how natural language processing is used in Wikimedia projects, with special emphasis on translation technology. After this presentation you will have a better understanding of the various kinds of data and how to use them.
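As a small illustration of that multilinguality, the following sketch, again assuming Python and the requests library, lists the interlanguage links of a single article via the public MediaWiki Action API:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    params = {
        "action": "query",
        "titles": "Tartu",        # example article
        "prop": "langlinks",      # interlanguage links for the page
        "lllimit": "max",         # as many links as one request allows
        "format": "json",
    }

    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    links = page.get("langlinks", [])
    print(f"'{page['title']}' exists in {len(links)} other languages")
    for ll in links[:5]:
        print(ll["lang"], "->", ll["*"])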