Cybertools and Sinhala archive will improve analysis of world’s 7,000 languages

By Karene Booker
Reprinted from the Cornell Chronicle, October 31, 2011

Kamal de Abrew, Cornell Ph.D. '81 and a professor at the American National College in Sri Lanka, tests a child on acquisition of the Sinhala language. Photo by James Gair

A new generation of cybertools developed at Cornell will help researchers share and analyze rare Sri Lankan language recordings important for studying language acquisition in children.

The Sinhala language, spoken only on the island of Sri Lanka, is "very precious" because of the unique way it is structured, said project leader Barbara Lust, professor of human development in the College of Human Ecology and director of the Cornell Language Acquisition Lab. It provides an invaluable opportunity to research which aspects of language acquisition are universal or biologically programmed and which are culturally determined, she said.

The electronic tools, which were developed in collaboration with Cornell's language acquisition program and other U.S. and international researchers, will be particularly helpful to researchers studying the world's nearly 7,000 languages, Lust said. "They may facilitate widespread archiving and research collaboration across languages and teach a new generation of students the collaborative process of data collection and data management, aided by these tools."

Lust's research team has partnered with Cornell's Albert R. Mann Library, where the Sinhala child language data archive is being used to test Internet-based data sharing tools. Using specialized software, semantic Web technologies and sound metadata systems and standards, the library's DataStaR project is designed to help researchers store, share and combine their data more easily, making it available for further analysis.

Of interest not only to language scholars but also to psychologists, anthropologists and scholars of South Asian studies, the Sinhala child language data include more than 150 hours of audio recordings from about 450 Sri Lankan children aged 2 to 6, collected between 1980 and 1989. The audio samples, including both natural speech and experimentally elicited sentences, are accompanied by transcriptions in Sinhala and English.

"This project would never have developed without the world-renowned program of Sinhala linguistic study developed here at Cornell by Professor Emeritus James Gair and all the students who were part of that project," said Lust. She also credited Cornell undergraduates as critical to the archiving process, along with Maria Blume, Ph.D. '02, now assistant professor at the University of Texas at El Paso, who led much of the cybertool development and is now developing a similar Spanish database.

The project was supported by the National Science Foundation and the American Institute of Sri Lankan Studies.

Karene Booker is an extension support specialist in the Department of Human Development.

Related Links:
Barbara Lust
Virtual Center for Language Acquisition
Albert R. Mann Library DataStaR project
College of Human Ecology