Arabic Speech Corpus

From Wikipedia, the free encyclopedia.
The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech, aligned with the recorded speech at the phoneme level. The annotations include word stress marks on the individual phonemes. [1]

The corpus was built primarily for speech synthesis, but it has also been used to build HMM-based voices in Arabic. It has further been used to automatically align other speech corpora with their phonetic transcripts, and it could serve as part of a larger corpus for training speech recognition systems.

Contents

The package contains the following:

  • 1813 .wav files containing the spoken utterances.
  • 1813 .lab files containing the text of the utterances.
  • 1813 .TextGrid files containing the phoneme labels, with time stamps of the boundaries where they occur in the .wav files.
  • phonetic-transcript.txt, which has the form "[wav_filename]" "[Phoneme Sequence]" on each line.
  • orthographic-transcript.txt, which has the form "[wav_filename]" "[Orthographic Transcript]" on each line. The orthography is in Buckwalter format, which is easier to handle with software that cannot read Arabic script. It can easily be converted back to Arabic.
  • An additional 18 minutes of fully annotated corpus (separate from the above but with the same structure), which was used to evaluate the corpus (see the PhD thesis).
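
The transcript line format and the Buckwalter conversion described above can be sketched in Python. This is a minimal illustration, not code shipped with the corpus: the function names `parse_transcript_line` and `buckwalter_to_arabic` are hypothetical, the sample inputs in the usage note are invented, and the mapping is the standard Buckwalter table, which is an assumption since the corpus may use a variant of it.

```python
import re

# Standard Buckwalter transliteration, built from contiguous Unicode ranges.
# Assumption: the corpus follows this standard table without modifications.
_BW_RANGES = [
    ("'|>&<}AbptvjHxd*rzs$SDTZEg", 0x0621),  # hamza ... ghain
    ("_fqklmnhwYyFNKaui~o", 0x0640),         # tatweel, letters, diacritics
    ("`{", 0x0670),                          # dagger alif, alif wasla
]
BW_TO_ARABIC = {
    ord(ch): chr(start + offset)
    for chars, start in _BW_RANGES
    for offset, ch in enumerate(chars)
}

# One transcript line: "[wav_filename]" "[transcript]"
_LINE = re.compile(r'^\s*"([^"]+)"\s+"(.*)"\s*$')


def parse_transcript_line(line):
    """Split one transcript line into (wav_filename, transcript)."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"unexpected transcript line: {line!r}")
    return match.group(1), match.group(2)


def buckwalter_to_arabic(text):
    """Map Buckwalter characters to Arabic script; others pass through."""
    return text.translate(BW_TO_ARABIC)
```

For example, `parse_transcript_line('"utt0001.wav" "kitAb"')` returns the pair `("utt0001.wav", "kitAb")`, and passing the second field to `buckwalter_to_arabic` yields the Arabic-script word. Characters outside the table, such as spaces and digits, are left unchanged, so any corpus-specific symbols would need to be added to the mapping.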

The corpus was also used to show that automatically extracted, orthography-based stress marks improve the quality of speech synthesis in MSA.

  • Comparison of datasets in machine learning

  1. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PDF) (PhD Thesis). University of Southampton, School of Electronics and Computer Science.