RWTH ASR (short "RASR") is a software package containing a speech recognition decoder together with tools for developing the acoustic models used in speech recognition systems. It has been developed since 2001 by the Human Language Technology and Pattern Recognition Group at RWTH Aachen University. Speech recognition systems built with the framework have been applied successfully in several international research projects and the corresponding evaluations.
RASR consists of several libraries and tools written in C++. Currently, Linux (x86 and x86-64) and Mac OS X (Intel) platforms are supported.
Features
* decoder for large vocabulary continuous speech recognition
  * word-conditioned tree search (supporting across-word models)
  * optimized HMM emission probability calculation using SIMD instructions
  * refined acoustic pruning using language model lookahead
  * word lattice generation
</gr-replace>
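The acoustic pruning mentioned above can be illustrated with a toy frame-synchronous Viterbi beam search: after each frame, hypotheses scoring more than a fixed beam below the current best are discarded. The `beam_search` function, state names, and probabilities below are invented for illustration; RASR's word-conditioned tree search with lookahead is far more elaborate.

```python
import math

def beam_search(emission_logprobs, transitions, beam=5.0):
    """Toy frame-synchronous Viterbi search with beam pruning.

    emission_logprobs: list of {state: log p(obs_t | state)} per frame.
    transitions: {state: [(next_state, log transition prob), ...]}.
    Returns the best final (state, score).
    """
    hyps = {"<s>": 0.0}                      # active hypotheses: state -> score
    for frame in emission_logprobs:
        new_hyps = {}
        for state, score in hyps.items():
            for nxt, trans_lp in transitions.get(state, []):
                if nxt not in frame:
                    continue
                cand = score + trans_lp + frame[nxt]
                if cand > new_hyps.get(nxt, -math.inf):
                    new_hyps[nxt] = cand     # Viterbi: keep best path per state
        best = max(new_hyps.values())
        # Beam pruning: drop hypotheses far below the current best score.
        hyps = {s: sc for s, sc in new_hyps.items() if sc > best - beam}
    return max(hyps.items(), key=lambda kv: kv[1])

# Invented two-frame example with two states.
transitions = {"<s>": [("a", math.log(0.5)), ("b", math.log(0.5))],
               "a":   [("a", math.log(0.5)), ("b", math.log(0.5))],
               "b":   [("a", math.log(0.5)), ("b", math.log(0.5))]}
frames = [{"a": math.log(0.9), "b": math.log(0.1)},
          {"a": math.log(0.2), "b": math.log(0.8)}]
state, score = beam_search(frames, transitions)
```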
* feature extraction
  * a flexible framework for data processing: Flow
  * MFCC features
  * PLP features
  * Gammatone features
  * voicedness feature
  * vocal tract length normalization (VTLN)
  * support for several feature dimension reduction methods (e.g. LDA, PCA)
  * easy implementation of new features as well as easy integration of external features using Flow networks
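A Flow network wires processing nodes into a feature pipeline. The sketch below mimics that dataflow idea in plain Python; the `Node`/`pipeline` API and the two example nodes are invented for illustration and are not RASR's actual Flow configuration format.

```python
class Node:
    """One processing step in a (hypothetical) dataflow pipeline."""
    def process(self, frames):
        raise NotImplementedError

class PreEmphasis(Node):
    """First-order high-pass filter applied sample-wise within each frame."""
    def __init__(self, alpha=0.97):
        self.alpha = alpha
    def process(self, frames):
        for frame in frames:
            yield [frame[0]] + [s - self.alpha * p
                                for p, s in zip(frame, frame[1:])]

class MeanNormalize(Node):
    """Subtract the per-frame mean (a stand-in for mean normalization)."""
    def process(self, frames):
        for frame in frames:
            mu = sum(frame) / len(frame)
            yield [s - mu for s in frame]

def pipeline(source, *nodes):
    """Chain nodes lazily: each consumes the previous node's output stream."""
    stream = iter(source)
    for node in nodes:
        stream = node.process(stream)
    return stream

frames = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
out = list(pipeline(frames, PreEmphasis(0.97), MeanNormalize()))
```

Because each node only consumes a stream and yields a stream, new feature types can be added by writing one node and splicing it into the chain, which is the design idea behind Flow networks.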
* acoustic modeling
  * Gaussian mixture distributions for HMM emission probabilities
  * phonemes in triphone context (or shorter contexts)
  * across-word context dependency of phonemes
  * allophone parameter tying using phonetic decision trees (classification and regression trees, CART)
  * globally pooled diagonal covariance matrix (other types of covariance modeling are possible, but not fully tested)
  * maximum likelihood training
  * discriminative training (minimum phone error (MPE) criterion)
  * linear algebra support using LAPACK and BLAS
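A Gaussian mixture emission score with a globally pooled diagonal covariance, as listed above, reduces to a log-sum-exp over components that all share one variance vector. The functions and numbers below are a minimal illustration, not RASR's implementation.

```python
import math

def log_gaussian_diag(x, mean, var):
    """log N(x | mean, diag(var)) for one mixture component."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + maha)

def gmm_log_likelihood(x, weights, means, pooled_var):
    """log sum_k w_k N(x | mu_k, diag(pooled_var)), via log-sum-exp.

    Every component uses the same pooled_var, mirroring the globally
    pooled diagonal covariance described above.
    """
    comps = [math.log(w) + log_gaussian_diag(x, m, pooled_var)
             for w, m in zip(weights, means)]
    mx = max(comps)
    return mx + math.log(sum(math.exp(c - mx) for c in comps))

x = [0.5, -0.2]
ll = gmm_log_likelihood(x,
                        weights=[0.6, 0.4],
                        means=[[0.0, 0.0], [1.0, -1.0]],
                        pooled_var=[1.0, 1.0])
```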
* language modeling
  * support for language models in ARPA format
  * weighted grammars (weighted finite state automaton)
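The ARPA text format stores one `log10-probability  n-gram  [log10-backoff]` line per entry, grouped into sections by n-gram order. A toy reader for the unigram section (the model text itself is fabricated):

```python
# Fabricated three-word ARPA-format language model.
ARPA = """\
\\data\\
ngram 1=3

\\1-grams:
-0.3010 hello -0.2
-0.4771 world -0.1
-1.0000 <unk>

\\end\\
"""

def parse_unigrams(text):
    """Collect {word: log10 prob} from the \\1-grams: section only."""
    probs, in_section = {}, False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_section = True
            continue
        if in_section:
            if not line or line.startswith("\\"):
                break                       # end of the 1-gram section
            fields = line.split()           # log10 prob, word, optional backoff
            probs[fields[1]] = float(fields[0])
    return probs

unis = parse_unigrams(ARPA)
```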
* neural networks (new in v0.6)
  * training of arbitrarily deep feed-forward networks
  * CUDA support for running on GPUs
  * OpenMP support for running on CPUs
  * variety of activation functions, training criteria and optimization algorithms
  * sequence-discriminative training, e.g. MMI or MPE (new in v0.7)
  * integration in the feature extraction pipeline ("Tandem approach")
  * integration in the search and lattice processing pipeline ("Hybrid NN/HMM approach")
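In the hybrid NN/HMM approach, the network's state posteriors p(s|x) are commonly turned into scaled likelihoods p(x|s) ∝ p(s|x)/p(s) before they replace the Gaussian emission scores in the search. A minimal sketch of that conversion; the state names and probabilities are invented:

```python
import math

def scaled_log_likelihoods(log_posteriors, log_priors):
    """log p(x|s) + const = log p(s|x) - log p(s), per HMM state."""
    return {s: log_posteriors[s] - log_priors[s] for s in log_posteriors}

# Invented NN output for one frame and state priors (e.g. from alignment counts).
log_post = {"a": math.log(0.7), "b": math.log(0.3)}
log_prior = {"a": math.log(0.5), "b": math.log(0.5)}
scores = scaled_log_likelihoods(log_post, log_prior)
```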
* speaker adaptation
  * constrained MLLR (CMLLR, "feature space MLLR", fMLLR)
  * unsupervised maximum likelihood linear regression mean adaptation (MLLR)
  * speaker/segment clustering using the Bayesian Information Criterion (BIC) as stopping criterion
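The BIC stopping criterion compares modeling two segments separately against merging them into one model: the likelihood gain of the separate models is weighed against a penalty for their extra parameters. A one-dimensional single-Gaussian sketch with the usual penalty weight lambda = 1; the data and the `delta_bic` helper are invented for illustration:

```python
import math

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def delta_bic(seg1, seg2, lam=1.0):
    """Positive value -> keep segments separate; negative -> merge them."""
    n1, n2 = len(seg1), len(seg2)
    n = n1 + n2
    # Log-likelihood gain of modeling the segments with two Gaussians
    # instead of one pooled Gaussian:
    gain = 0.5 * (n * math.log(variance(seg1 + seg2))
                  - n1 * math.log(variance(seg1))
                  - n2 * math.log(variance(seg2)))
    # Two Gaussians need 2 extra parameters (one mean, one variance) in 1-D.
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty

same = delta_bic([0.1, -0.2, 0.3, 0.0], [0.2, -0.1, 0.1, -0.3])   # similar data
diff = delta_bic([0.1, -0.2, 0.3, 0.0], [5.2, 4.9, 5.1, 4.8])     # shifted data
```

Clustering proceeds by greedily merging the pair with the lowest delta-BIC and stops once every remaining pair has a positive value.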
* lattice processing
  * n-best list generation
  * confusion network generation and decoding
  * lattice rescoring
  * lattice-based system combination
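N-best list generation reads the top-scoring word sequences out of a lattice, here modeled as a DAG of `(next_node, word, log_prob)` arcs. The exhaustive enumeration below is only for illustration (the lattice and `n_best` function are invented); real decoders use A*-style n-best algorithms that avoid expanding every path.

```python
import heapq

def n_best(lattice, start, end, n):
    """Return the n highest-scoring (score, words) paths from start to end."""
    paths = []
    def walk(node, words, score):
        if node == end:
            paths.append((score, words))
            return
        for nxt, word, lp in lattice.get(node, []):
            walk(nxt, words + [word], score + lp)
    walk(start, [], 0.0)
    return heapq.nlargest(n, paths)         # best paths by total log score

# Invented four-path lattice: two word choices at each of two positions.
lattice = {
    0: [(1, "a", -0.2), (1, "the", -0.4)],
    1: [(2, "cat", -0.1), (2, "cap", -0.9)],
}
best = n_best(lattice, start=0, end=2, n=2)
```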
* input/output formats
  * nearly all input and output data is in easily processable XML or plain-text formats
  * converter tools for generating NIST file formats are included