The Berkeley Restaurant Project (BeRP) Transcripts
I recently updated the BeRP Transcripts repository with the latest transcripts of the audio files in the Berkeley Restaurant Project (BeRP) corpus.
What is BeRP?
The Berkeley Restaurant Project (BeRP) was a testbed for a speech recognition system developed by the International Computer Science Institute (ICSI) in Berkeley, CA, in the 1990s. The system was designed to act as an automated consultant, answering spoken questions about restaurants in the city of Berkeley.
It served as a platform for research into robust feature extraction, neural-network-based phonetic likelihood estimation, automatic induction of multiple-pronunciation lexicons, and more.
You can also see a video demonstration of the system in action.
The Corpus
The BeRP corpus contains approximately 8566 utterances, comprising about 7 hours of speech with around 1900 unique words.
You can download the entire corpus (audio and transcripts, ~443 MB) here.
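If you want to check the utterance and vocabulary counts yourself once you have the transcripts, a short script is enough. The sketch below is only an illustration: it assumes one utterance per line with words separated by whitespace, which may not match the actual transcript layout (the real files may carry utterance IDs or annotation markers that you would need to strip first). The function name `corpus_stats` and the sample sentences are made up for this example.

```python
from collections import Counter


def corpus_stats(lines):
    """Count utterances and unique words in an iterable of transcript lines.

    Assumes one utterance per line and whitespace-separated words;
    blank lines are skipped. This format is an assumption, not the
    documented layout of the BeRP transcripts.
    """
    word_counts = Counter()
    utterances = 0
    for line in lines:
        words = line.lower().split()
        if not words:
            continue  # skip empty lines
        utterances += 1
        word_counts.update(words)
    return utterances, len(word_counts), word_counts


# Made-up sample utterances, not taken from the corpus.
sample = [
    "i'd like to eat thai food",
    "",
    "tell me about thai restaurants",
]
n_utts, vocab_size, counts = corpus_stats(sample)
print(n_utts, vocab_size, counts.most_common(1))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `corpus_stats(open("transcript.txt", encoding="utf-8"))`, where `transcript.txt` is a placeholder path.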
Probabilistic Context-free Grammar Rules
The repository also includes a set of 1412 hand-written context-free rules dated August 1995 (amer_trn2456_1412rules_aug95.cfg), which were used in the original system.
References
For more history, check out the original papers:
- D. Jurafsky, C. Wooters, et al., “The Berkeley Restaurant Project”, Proceedings ICSLP, 1994.
- C. Wooters, “Lexical Modeling in a Speaker Independent Speech Understanding System”, PhD thesis, 1993.