Paris (6/12) - Dessi Puji Lestari (Chief Scientist of Speech at Prosa.ai), Teguh Eko Budiarto (CEO of Prosa.ai), and Totok Suhardijanto (Chief Scientist of Linguistics at Prosa.ai) attended the International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, held by UNESCO at its headquarters at 7 place de Fontenoy, 75007 Paris, France. At the conference, Dessi represented Institut Teknologi Bandung (ITB) and Prosa.ai, Teguh represented Prosa.ai, and Totok represented Universitas Indonesia (UI). LT4All is a three-day event held within the framework of the 2019 International Year of Indigenous Languages.
Dessi presented two works at the conference. The first is a paper titled “Indonesian Phoneme Set, Vocabulary, and Pronunciation for Automatic Speech Recognition and Speech Synthesizer”, written by Dessi Puji Lestari, Roland Hartanto, Devin Hoesen, Guntario Sukma Cahyani, and Sakriani Sakti. The paper describes the design of an Indonesian phoneme set, vocabulary, and pronunciation dictionary for two main areas of speech technology research for Bahasa Indonesia: automatic speech recognition and speech synthesis. The second is a poster titled “InaNLP: Indonesian Natural Language Processing Tools API”, prepared by Ayu Purwarianti, Dessi Puji Lestari, Teguh Eko Budiarto, and the Prosa NLP team. InaNLP is a natural language processing tools API for Bahasa Indonesia developed by Indonesian scientists, engineers, and linguists. It consists of several NLP tools covering lexical analysis, syntactic analysis, and text classification. It is built with deep learning algorithms, is easy to integrate and combine, and relies on data annotated manually by linguists who specialize in Indonesian language and literature.
Totok, in turn, presented a project titled “Building Corpora for Under-Resourced Languages in Indonesia”, written by Totok Suhardijanto and Arawinda Dinakaramani. The project is an effort to compile and develop resources for local languages in Indonesia, funded through research grants from various sources. The corpus management application currently has three main functions: storing corpus text data of regional languages in digital form; processing and storing corpus metadata so that it can be accessed by other software; and analyzing corpus text data with corpus methods such as keyword lists, concordances, and n-grams.
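To make the corpus methods named above more concrete, here is a minimal Python sketch of n-grams, a concordance (keyword in context), and a simple word-frequency list. It is an illustration only, not the project's actual application: the function names, the whitespace-style tokenizer, and the sample sentence are all assumptions made for this example, and a plain frequency count stands in for a true keyword list, which would normally compare frequencies against a reference corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, illustrative)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def frequency_list(tokens, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


# Hypothetical sample text, invented for this sketch (not from the project's corpus).
SAMPLE = "Saya makan nasi dan dia makan roti"
tokens = tokenize(SAMPLE)

print(ngrams(tokens, 2))
print(concordance(tokens, "makan", width=1))
print(frequency_list(tokens, top=3))
```

A real corpus tool for regional languages would likely need language-specific tokenization and metadata handling, which this sketch leaves out.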
The works presented by Dessi and Totok contribute to empowering indigenous languages, in line with the purpose of LT4All: to take concrete measures to promote linguistic diversity, a truly multilingual internet, and language technology, with a focus on indigenous languages. UNESCO believes that the development of Language Technologies (LT) should provide opportunities to improve the free flow of ideas by word and image in all languages, leaving no one behind regardless of age, gender, abilities, language, or location. All language users should have the right to access and use LT in their own languages. The conference itself aims to identify recommendations on how best to harness technology to preserve, support, and promote languages, including lesser-used ones, and to increase and facilitate communication between them.