Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi
Aryaman Arora, Luke Gessler, Nathan Schneider
Phonology, Morphology and Word Segmentation Short Paper
Session 13B: Jul 8
(13:00-14:00 GMT)
Session 14A: Jul 8
(17:00-18:00 GMT)
Abstract:
Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted). Previous work has attempted to predict schwa deletion in a rule-based fashion using prosodic or phonetic analysis. We present the first statistical schwa deletion classifier for Hindi, which relies solely on the orthography as the input and outperforms previous approaches. We trained our model on a newly-compiled pronunciation lexicon extracted from various online dictionaries. Our best Hindi model achieves state of the art performance, and also achieves good performance on a closely related language, Punjabi, without modification.
You can open the
pre-recorded video
in a separate window.
NOTE: The SlidesLive video may display a random order of the authors.
The correct author list is shown at the top of this webpage.