Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi

Aryaman Arora, Luke Gessler, Nathan Schneider


Phonology, Morphology and Word Segmentation Short Paper

Session 13B: Jul 8 (13:00-14:00 GMT)
Session 14A: Jul 8 (17:00-18:00 GMT)
Abstract: Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted). Previous work has attempted to predict schwa deletion in a rule-based fashion using prosodic or phonetic analysis. We present the first statistical schwa deletion classifier for Hindi, which relies solely on the orthography as input and outperforms previous approaches. We trained our model on a newly compiled pronunciation lexicon extracted from various online dictionaries. Our best Hindi model achieves state-of-the-art performance, and it also performs well on a closely related language, Punjabi, without modification.
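To make the task framing concrete, here is a minimal sketch of how orthography-only inputs for a schwa deletion classifier might be prepared. It is an illustration under stated assumptions, not the authors' feature set or model, which the abstract does not specify. The function names `schwa_candidates` and `window_features` are hypothetical, and the handling of Devanagari is simplified (it considers only the common consonant ranges, the nukta, the virama, and the dependent vowel signs U+093E to U+094C).

```python
# Simplified, assumed treatment of Devanagari; not the paper's actual pipeline.
VIRAMA = "\u094d"  # explicitly suppresses the inherent vowel
NUKTA = "\u093c"   # diacritic that may directly follow a consonant
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}  # dependent vowel signs U+093E..U+094C


def is_consonant(ch):
    """True for Devanagari consonant letters in the common ranges."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"


def schwa_candidates(word):
    """Indices of consonants that carry an inherent (orthographic) schwa.

    A consonant carries an inherent schwa when it is not followed by a
    virama or a dependent vowel sign. Each index is one binary decision
    for a classifier: is this schwa pronounced or deleted?
    """
    indices = []
    for i, ch in enumerate(word):
        if not is_consonant(ch):
            continue
        j = i + 1
        if j < len(word) and word[j] == NUKTA:
            j += 1  # look past the nukta to the next sign
        nxt = word[j] if j < len(word) else ""
        if nxt != VIRAMA and nxt not in MATRAS:
            indices.append(i)
    return indices


def window_features(word, i, width=2):
    """Characters in a fixed window around position i, padded with '#'."""
    padded = "#" * width + word + "#" * width
    center = i + width
    return {f"ch[{k}]": padded[center + k] for k in range(-width, width + 1)}


if __name__ == "__main__":
    word = "\u0915\u092e\u0932"  # the Hindi word "kamal" (ka, ma, la)
    for i in schwa_candidates(word):
        print(i, window_features(word, i, width=1))
```

Each feature dictionary could then be paired with a pronounced/deleted label from a pronunciation lexicon and passed to any standard classifier; the choice of window width and classifier here is left open.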

Similar Papers

SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check
Xingyi Cheng, Weidi Xu, Kunlong Chen, Shaohua Jiang, Feng Wang, Taifeng Wang, Wei Chu, Yuan Qi
Unsupervised Paraphasia Classification in Aphasic Speech
Sharan Pai, Nikhil Sachdeva, Prince Sachdeva, Rajiv Ratn Shah
Building a Japanese Typo Dataset from Wikipedia's Revision History
Yu Tanaka, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi