Transforming Humanities Research into Digital Humanities Research: The Corpus of Mid-20th Century Hong Kong Cantonese

University Library (October 27, 2023)

CONFERENCE / SYMPOSIUM : Fall Symposium on Digital Scholarship 2023

Fall Symposium on Digital Scholarship 2023@HKBU
AI as the Publics’ Shadow in an Inquiry-Driven Society
Transforming Humanities Research into Digital Humanities Research: The ...
100 Chinese Protestant Christian Hymns from Qing China (天使歌聲: ...
Synergy between Tradition and the Contemporary – Brush-and-Ink, ...
HKBU Digital Project Series on Sun Yat-sen Studies
Bridging Art Archive to the World Largest Linked Open Database - The ...
Legislative Council Archives: Digital Initiatives in preserving and ...
Creating a Digital Database on Daoist Literati and Gentry in Guangdong
LENGTH : 43 min.
ACCESS : Open to all
SUMMARY : Digital technology has had a profound impact on traditional humanities research, giving rise to the field of Digital Humanities. One notable application of digital humanities in linguistics is the construction of linguistic corpora containing large amounts of authentic, natural language data. Corpora offer quantitative and qualitative data on language use, enabling detailed analysis of language structures and providing valuable insights into language variation, change over time, and new linguistic patterns.

This talk is about corpus-based studies of Cantonese, a language spoken as a first language by nearly 90% of Hong Kong's population. In the past, Cantonese was mainly studied as a Chinese dialect within the framework of Chinese dialectology. Since the 1990s, Cantonese corpora of various kinds have been constructed for linguistic and humanities research.

In 2013, with the support of the RGC's ECS, I developed The Corpus of Mid-20th Century Hong Kong Cantonese. The corpus, with a size of nearly 900,000 characters, was constructed by transcribing the dialogues from 80 black-and-white Cantonese movies produced in Hong Kong between 1940 and 1970. The corpus provides real-time language data for documenting, preserving, and revitalizing the Cantonese language and culture of Hong Kong in that period.


This video is presented here with the permission of the speakers. Any downloading, storage, reproduction, and redistribution are strictly prohibited without the prior permission of the respective speakers.
