Digital technology has had a profound impact on traditional humanities research, giving rise to the field of Digital Humanities. One notable application of digital humanities in linguistics is the construction of linguistic corpora containing massive amounts of authentic, natural language data. Corpora offer quantitative and qualitative data on language use, enabling detailed analysis of language structures and providing valuable insights into language variation, change over time, and (new) linguistic patterns.
This talk is about corpus-based studies of Cantonese, a language spoken by nearly 90% of Hong Kong’s population as their first language. In the past, Cantonese was mainly studied as a Chinese dialect under the Chinese dialectological framework. Since the 1990s, various Cantonese corpora of different kinds have been constructed for linguistic and humanities research.
In 2013, with the support of RGC’s ECS, I developed The Corpus of Mid-20th Century Hong Kong Cantonese (https://hkcc.eduhk.hk/). The corpus, with a size of nearly 900,000 characters, was constructed by transcribing the dialogues from 80 black-and-white Cantonese movies produced in Hong Kong between 1940 and 1970. The corpus provides real-time language data for documenting, preserving, and revitalizing the Cantonese language and the culture of Hong Kong in that period.
This video is archived and disseminated for educational purposes only. It is presented here with the permission of the speakers, who have mandated the means of dissemination.
Statements of fact and opinions expressed are those of the individual participants. HKBU and its Library assume no responsibility for the accuracy, validity, or completeness of the information presented.
Any downloading, storage, reproduction, and redistribution, in part or in whole, are strictly prohibited without the prior permission of the respective speakers. Please strictly observe the copyright law.