LIVAC Synchronous Corpus

Type: software
Inception: July 1995
Use: text corpus
Publication date: July 1995
Operating system: cross-platform
Website: livac.org

LIVAC is an uncommon language corpus that has been dynamically maintained since 1995. Unlike other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach to processing and filtering large volumes of media texts from representative Chinese speech communities such as Beijing, Hong Kong, Macau, Taipei, Singapore, Shanghai, Guangzhou, and Shenzhen. [1] The contents are therefore deliberately repetitive in most cases, represented by text samples drawn from editorials, local and international news, cross-Taiwan Strait news, and news on finance, sports and entertainment. [2] As of 2023, more than 3 billion characters of news media text had been filtered, of which 700 million characters had been processed and analyzed, yielding an expanding Pan-Chinese dictionary of 2.5 million words from the Pan-Chinese media. Through rigorous analysis based on computational linguistic methodology, LIVAC has also accumulated accurate and meaningful statistical data on the Chinese language and its various speech communities in the Pan-Chinese region, and the results show considerable and important variation, both longitudinal and newly emerging. [3] [4]

The "Windows" approach is the most representative feature of LIVAC and has allowed Pan-Chinese media texts to be analyzed quantitatively according to attributes such as location, time and subject domain. This has made possible various kinds of comparative studies and information technology applications, including the development of new, innovative applications. [5] [6] In addition, LIVAC allows longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) searches and comprehensive study of target words, their underlying concepts and their linguistic structures over the past 25 years, based on the variables mentioned above: location, time and subject. Results from the extensive, cumulative data analysis in LIVAC have enabled the cultivation of textual databases of proper names, place names, organization names and new words, as well as weekly and annual rosters of media figures. Related applications include the establishment of verb and adjective databases, the formulation of sentiment indices, and related opinion mining to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed Pan-Chinese Newsmaker Rosters), [7] [8] [9] as well as the compilation of new word lexicons (LIVAC Annual Pan-Chinese New Word Rosters). [10] [11] [12] On this basis, the emergence, diffusion and transformation of new words have been analyzed, and dictionaries of neologisms have been published. [13] [14]
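As a rough illustration of a Key Word in Context (KWIC) search restricted by location, time and subject, the following Python sketch may help. It is not LIVAC's implementation: the `Record` type, the `kwic` function, its parameters and the toy `SAMPLE` data are all hypothetical, and the texts are assumed to be already segmented into words.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One media text, tagged with the three attributes described above."""
    region: str        # hypothetical label, e.g. "Hong Kong"
    year: int
    domain: str        # hypothetical label, e.g. "finance"
    tokens: list[str]  # text already segmented into words


def kwic(records, keyword, width=2, region=None, years=None, domain=None):
    """Return (region, year, left, keyword, right) for every hit.

    `years` is an inclusive (start, end) pair; any filter left as None
    is not applied.
    """
    hits = []
    for rec in records:
        if region is not None and rec.region != region:
            continue
        if years is not None and not (years[0] <= rec.year <= years[1]):
            continue
        if domain is not None and rec.domain != domain:
            continue
        for i, tok in enumerate(rec.tokens):
            if tok == keyword:
                left = rec.tokens[max(0, i - width):i]
                right = rec.tokens[i + 1:i + 1 + width]
                hits.append((rec.region, rec.year, left, tok, right))
    return hits


# Toy data invented for this example; not taken from LIVAC.
SAMPLE = [
    Record("Hong Kong", 1997, "finance", ["香港", "經濟", "增長"]),
    Record("Taipei", 2005, "finance", ["經濟", "放緩"]),
    Record("Hong Kong", 2005, "sports", ["足球", "比賽"]),
]

for hit in kwic(SAMPLE, "經濟", width=1, region="Hong Kong"):
    print(hit)
```

Keeping the region, year and domain on every record is what lets the same query be rerun over a different window simply by changing the filter arguments.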

A recent focus has been on the balance between disyllabic words and the growing number of trisyllabic words in the Chinese language, on the comparative study of light verbs in three Chinese speech communities, and on language use as a reflection of epochal change in China. A new version, LIVAC 3.1, was launched in February 2024.

Corpus data processing

  1. Accessing media texts, manual input, etc.
  2. Text unification including conversion from simplified to traditional Chinese characters, stored as Big5 and Unicode versions
  3. Automatic word segmentation
  4. Automatic alignment of parallel texts
  5. Manual verification, part-of-speech tagging
  6. Extraction of words and addition to regional sub-corpora
  7. Combination of regional sub-corpora to update the LIVAC corpus, and master lexical database
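The automatic parts of the steps above (text unification, word segmentation, and the building and merging of regional sub-corpora) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not LIVAC's actual pipeline: the `S2T` table and `LEXICON` are toy data, the function names are hypothetical, and forward maximum matching is used only as one simple, well-known segmentation technique. Manual verification, part-of-speech tagging and parallel-text alignment are omitted.

```python
from collections import Counter

# Toy simplified-to-traditional table (assumption: real conversion is not
# purely character-by-character and needs a much larger table).
S2T = {"经": "經", "济": "濟", "长": "長"}

# Toy lexicon used by the segmenter (assumption).
LEXICON = {"香港", "經濟", "增長"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def unify(text):
    """Step 2: map simplified characters to traditional ones."""
    return "".join(S2T.get(ch, ch) for ch in text)


def segment(text):
    """Step 3: forward maximum matching against LEXICON.

    At each position the longest lexicon word is taken; an unknown
    character falls back to a one-character word.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def build_master(raw_by_region):
    """Steps 6 and 7: count words per region, then merge the counts."""
    sub_corpora = {region: Counter() for region in raw_by_region}
    for region, texts in raw_by_region.items():
        for text in texts:
            sub_corpora[region].update(segment(unify(text)))
    master = Counter()
    for counts in sub_corpora.values():
        master.update(counts)
    return sub_corpora, master


sub_corpora, master = build_master({
    "Beijing": ["经济增长"],
    "Hong Kong": ["香港經濟"],
})
print(sub_corpora["Beijing"])
print(master.most_common(1))
```

Unifying the script before segmentation means the same word from different regions ends up under a single key in the master counts, while the per-region counters remain available for comparison.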

Labeling for data processing

  1. Categories used include general terms and proper names, such as: general names, surnames, semi-titles; geographical names, organizations and commercial entities, etc.; time, prepositions, locations, etc.; stack-words; loanwords; case-words; numerals, etc.
  2. Construction of databases of proper names, place names, specific terms, etc.
  3. Generation of rosters: "new word rosters", "celebrity or media personality rosters", "place name rosters", compound words and matched words
  4. Other part-of-speech tagging for sub-databases, such as common nouns, numerals, numeral classifiers, different types of verbs and adjectives, pronouns, adverbs, conjunctions, mood-marking particles, onomatopoeia, interjections, etc.

Applications

  1. Compilation of Pan-Chinese dictionaries or local dictionaries
  2. Information technology research, such as predictive Chinese text input for mobile phones, automatic speech-to-text conversion, and opinion mining
  3. Comparative studies of linguistic and cultural developments in Pan-Chinese regions, especially during an important historical period in modern China
  4. Language teaching and learning research, and speech-to-text conversion
  5. Customized services for linguistic research and lexical search for international organizations and government agencies


The applications above are supported by the following functions:

  • Word Distribution Search
  • Word Search
  • Sample Sentence Selection
  • Multi-word Comparison
  • Word Cloud

See also

  • British National Corpus
  • Oxford English Corpus
  • Corpus of Contemporary American English (COCA)
  • 語料庫

References

  1. Tsou, Benjamin; Lai, Tom; Chan, Samuel; and Wang, William S.-Y. (Eds). (1998). Quantitative and Computational Studies on the Chinese Language 《漢語計量與計算研究》. Language Information Sciences Research Centre, City University Press.
  2. Tsou, B. K., Kwong, O.Y. (Eds). (2015). Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series Number 25), Hong Kong: Chinese University Press.
  3. Tsou, Benjamin. (2004). "Chinese Language Processing at the Dawn of the 21st Century", in C R Huang and W Lenders (eds) Language and Linguistics Monograph Series B: Frontiers in Linguistics I, pp.189–207. Institute of Linguistics, Academia Sinica.
  4. Tsou, B. K. (2017). Loanwords in Mandarin Through Other Chinese Dialects. In R. Sybesma, W. Behr, Y. Gu, Z. Handel, C.-T. Huang & J. Myers (Eds.), The Encyclopaedia of Chinese Language and Linguistics (Vol. 2, pp. 641-647). Leiden; Boston: BRILL
  5. Tsou, Benjamin, and Kwong, Olivia. (2015). LIVAC as a Monitoring Corpus for Tracking Trends beyond Linguistics. In Tsou, Benjamin, and Kwong, Olivia., (eds.), Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series No.25). Hong Kong: The Chinese University Press, pp. 447-471.
  6. Tsou, Benjamin. (2016). Skipantism Revisited: Along with Neologisms and Terminological Truncation. In Chin, Chi-on Andy and Kwok, Bit-chee and Tsou, Benjamin K., (eds.), Commemorative Essays for Professor Yuen-Ren Chao: Father of Modern Chinese Linguistics. Taiwan: Crane Publishing. pp. 343-357.
  7. CityU releases 2015 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 28 December 2015.
  8. CityU releases 2016 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 02 January 2017.
  9. CityU releases 2019 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 07 January 2019.
  10. CityU releases 2014 Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 12 February 2015.
  11. CityU releases 2015 LIVAC Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 04 February 2016.
  12. CityU releases 2019 LIVAC Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 09 January 2019.
  13. 鄒嘉彥、游汝杰(編)(2007),《21世紀華語新詞語詞典》(簡體字版),上海,復旦大學出版社。
  14. 鄒嘉彥、游汝杰(編)(2010),《全球華語新詞語詞典》,北京,商務印書館。

External links