

Digital tools for researching Wikipedia

Centre for Translation (December 16, 2021)


Digital research methods to identify, analyze and visualize conflicts in ...
Corpus-based Wikipedia studies: Theoretical and methodological challenges ...
Tales of the “fish in your ear”: How does Wikipedia help to shape a ...
Scraping Wikipedia articles
MAJOR SPEAKER : Shuttleworth, Mark
LENGTH : 84 min.
ACCESS : Open to all
SUMMARY : Wikipedia is the world's largest online encyclopaedia. It has 303 active language editions, which were accessed from 1.7bn unique devices during October 2020. Now over twenty years old, the encyclopaedia has been studied by academics working within a range of disciplines since the mid-2000s, although it is only relatively recently that it has started attracting the attention of translation scholars too. During a short space of time we have learnt a considerable amount about topics such as translation quality, translation and cultural remembrance, multilingual knowledge production and point of view, the prominent role played by narratives in articles reporting on news stories, and how translation is portrayed in multiple language versions of the Wikipedia article on the term itself. However, translation largely remains Wikipedia's "dark matter": not only is it difficult to locate, but researchers have so far struggled to map out the full extent of its contribution to this multilingual resource. Our aim in organising this international event is to allow the research community to take stock of the progress made so far and to identify new avenues for future work.

Unlike other Wikipedia research that focuses on big data analytics, research on the “dark matter” of Wikipedia attaches importance to the distinctive features and evolution of one or several articles across their different language versions. This implies that the scraping method should not overlook any fragment or detail of the article, while keeping the text clean and readable for further processing. In this workshop, several implementations of scraping Wikipedia articles will be introduced for a wide variety of research scenarios. Most of these methods are supported by official documentation. With the help of interfaces and parsers provided by Wikipedia and other developers, users are able to control exactly what Wikipedia content they want to retrieve, such as tables, quotations, illustrations, etc. The basics of programming and data science will also be introduced in this workshop. Using Google's Colab platform and Python's rich libraries, participants will be able to learn the essentials of scraping Wikipedia without installing any additional software on their computer.
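As an illustration of the kind of approach described above, the following is a minimal sketch, not the workshop's own material. It assumes the MediaWiki Action API (`action=parse` with `prop=wikitext`) as the interface and uses only the Python standard library, so it should run in a Colab notebook as-is. The function names and the User-Agent string are hypothetical, chosen for this example only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_url(title, lang="en"):
    """Build a MediaWiki Action API URL that requests a page's raw wikitext."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # version 2 returns the wikitext as a plain string
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_wikitext(payload):
    """Pull the title and wikitext out of a decoded API response."""
    if "error" in payload:
        # The API reports problems (e.g. a missing page) in an "error" object.
        raise ValueError(payload["error"].get("info", "unknown API error"))
    parsed = payload["parse"]
    return parsed["title"], parsed["wikitext"]


def fetch_wikitext(title, lang="en"):
    """Download one article's wikitext (requires network access)."""
    request = Request(
        build_parse_url(title, lang),
        # Hypothetical identifier; replace with your own project and contact.
        headers={"User-Agent": "wikipedia-research-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return extract_wikitext(json.load(response))
```

Calling `fetch_wikitext("Translation", lang="fr")` would request the French-language article, so the same title lookup can be repeated per language edition when comparing versions. The raw wikitext keeps templates, tables and references intact; turning it into clean running text would need a separate wikitext parser, which this sketch does not cover.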


This video is presented here with the permission of the speakers. Any downloading, storage, reproduction, and redistribution are strictly prohibited without the prior permission of the respective speakers.
