Wikipedia is the world's largest online encyclopaedia. It has 303 active language editions, which were accessed from 1.7bn unique devices during October 2020. Now over twenty years old, the encyclopaedia has been studied by academics working within a range of disciplines since the mid-2000s, although it has only relatively recently begun to attract the attention of translation scholars as well. Within a short space of time we have learnt a considerable amount about topics such as translation quality, translation and cultural remembrance, multilingual knowledge production and point of view, the prominent role played by narratives in articles reporting on news stories, and how translation is portrayed in multiple language versions of the Wikipedia article on the term itself. However, translation largely remains Wikipedia's "dark matter": not only is it difficult to locate, but researchers have so far struggled to map out the full extent of its contribution to this multilingual resource. Our aim in organising this international event is to allow the research community to take stock of the progress made so far and to identify new avenues for future work.
Unlike other Wikipedia research that focuses on big data analytics, research on the “dark matter” of Wikipedia attends to the distinctive features and evolution of one or a few articles across their interlingual versions. This means that the scraping method should not overlook any fragment or detail of an article, while keeping the text clean and readable for further processing. In this workshop, several approaches to scraping Wikipedia articles will be introduced for a wide variety of research scenarios, most of them supported by official documentation. With the help of the interfaces and parsers provided by Wikipedia and other developers, users can control exactly which Wikipedia content they retrieve, such as tables, quotations and illustrations. The basics of programming and data science will also be introduced in this workshop. Working on Google's Colab platform with Python's rich libraries, participants will get a feel for scraping Wikipedia without installing any additional software on their own computers.
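The abstract does not name the specific tools used in the workshop, so the following is only an illustrative sketch of the kind of retrieval it describes: it assumes the MediaWiki Action API (api.php) and the mwparserfromhell parser, and the article title "Translation" and the language codes are placeholder examples rather than material from the workshop itself.

# Illustrative sketch (assumed tooling): fetch an article's raw wikitext
# through the MediaWiki Action API, parse it with mwparserfromhell, and
# list the article's counterparts in other language editions.
# Setup (e.g. in a Colab cell): pip install requests mwparserfromhell
import requests
import mwparserfromhell

API = "https://{lang}.wikipedia.org/w/api.php"

def fetch_wikitext(title, lang="en"):
    """Return the raw wikitext of one article in one language edition."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    resp = requests.get(API.format(lang=lang), params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["parse"]["wikitext"]

def interlanguage_titles(title, lang="en"):
    """Map language codes to the article's titles in other Wikipedia editions."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    resp = requests.get(API.format(lang=lang), params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return {link["lang"]: link["title"] for link in page.get("langlinks", [])}

# "Translation" is a placeholder article chosen only for illustration.
wikitext = fetch_wikitext("Translation", lang="en")
parsed = mwparserfromhell.parse(wikitext)
print(parsed.strip_code()[:500])            # clean, readable plain text
print(len(parsed.filter_templates()))       # templates (infoboxes, quotations) remain accessible
print(interlanguage_titles("Translation"))  # the same article across language editions

On Google Colab, the only setup such a sketch would need is installing mwparserfromhell in a notebook cell, since requests is typically preinstalled; nothing has to be installed on the participant's own computer.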
Full Disclaimer
This video is archived and disseminated for educational purposes only. It is presented here with the permission of the speakers, who have mandated the means of dissemination.
Statements of fact and opinions expressed are those of the individual participants. HKBU and its Library assume no responsibility for the accuracy, validity, or completeness of the information presented.
Any downloading, storage, reproduction, and redistribution, in part or in whole, are strictly prohibited without the prior permission of the respective speakers. Please strictly observe copyright law.