From 25 to 27 October 2019, Devin Hoesen (Prosa.ai Lead of Speech Engineers) and Dessi Puji Lestari (Prosa.ai Chief Scientist of Speech) attended the 22nd Oriental COCOSDA, held at the University of San Carlos (Talamban Campus), Cebu City, Philippines. Devin Hoesen attended the conference as a presenter, while Dessi Puji Lestari also represented Indonesia as a lecturer from Institut Teknologi Bandung.
COCOSDA (The International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques) aims to promote international cooperation in developing speech corpora and in coordinating assessment methods for speech input/output systems in East and Southeast Asia. It also provides a forum for participants to exchange ideas, share information, and discuss regional matters concerning the creation, utilization, and dissemination of spoken language corpora of oriental languages and the assessment methods of speech recognition and synthesis systems, with the broader goal of promoting speech research on oriental languages.
As one of the presenters, Devin Hoesen presented a paper titled “Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model”, written by Devin Hoesen, Fanda Yuliana Putri, and Dessi Puji Lestari. The paper proposed a sequence-to-sequence (seq2seq) approach for automatically generating the pronunciation of each Indonesian word in an Automatic Speech Recognition (ASR) lexicon. This approach could replace the manual method, which requires considerable resources and time.
In a cross-validation experiment, the generated phoneme sequences achieved a phone error rate (PER) of 4.15–6.24%. An ASR system using the automatically produced lexicon showed only a minor degradation of 2.22 percentage points in word accuracy compared with one using the manually produced lexicon. The proposed model can therefore efficiently create a new, large pronunciation dictionary for Bahasa Indonesia ASR without significantly degrading recognition accuracy.
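For readers unfamiliar with the metric, PER is conventionally computed as the phone-level edit distance (substitutions + deletions + insertions) between the generated and reference phoneme sequences, divided by the total number of reference phones. The sketch below illustrates that standard definition only; it is not the paper's scoring code, and the phoneme transcriptions in the usage example are hypothetical, not taken from the paper's phone set or data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))  # cost of building hyp[:j] from an empty ref
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of an extra phone
                prev[j - 1] + (r != h),  # substitution (free if phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """PER in percent: total edit errors over total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# Hypothetical transcriptions: one substituted final phone in the first word.
refs = [["m", "a", "k", "a", "n"], ["s", "a", "j", "a"]]
hyps = [["m", "a", "k", "a", "ŋ"], ["s", "a", "j", "a"]]
print(f"PER = {phone_error_rate(refs, hyps):.2f}%")
```

Here one error over nine reference phones gives a PER of about 11.11%; the paper's reported 4.15–6.24% range corresponds to far fewer errors per reference phone across its cross-validation folds.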
Dessi Puji Lestari also presented a paper titled “Indonesian-English Code-switching Handling Using Polyglot Technique on Indonesian Text-to-Speech System”, written by Gisela Supardi, Guntario Sukma Cahyani, and Dessi Puji Lestari. The paper used the polyglot technique to address code-switching (CS), a phenomenon in which a speaker switches between the main language (L1) and one or more foreign languages (L2). CS degrades system performance because an Indonesian text-to-speech (TTS) system cannot correctly pronounce foreign words.
For this research, English was chosen as the L2 because it is one of the most frequently used foreign languages in Indonesia. The English speech database was obtained using the polyglot speaker inventory approach. The polyglot TTS system was evaluated against a monolingual Indonesian TTS system as the baseline. The subjective test results show that the polyglot technique successfully handles code-switching. However, the technique also reduces system performance in terms of prosody quality and efficiency.
Both papers focus on speech processing for Bahasa Indonesia, one of Prosa.ai’s research and business focuses, along with natural language processing for Bahasa Indonesia and computer vision.