speech recognition 2018-technology

Recurrent neural network language model adaptation for conversational speech recognition

We propose two adaptation models for recurrent neural network language models (RNNLMs) to capture topic effects and long distance triggers for conversational automatic speech recognition (ASR). We use a fast marginal adaptation (FMA) framework to adapt a

Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition

Recurrent neural networks (NN) with long short-term memory (LSTM) are the current state of the art to model long term dependencies. However, recent studies indicate that NN language models (LM) need only limited length of history to achieve excellent performance

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

In this work, we propose two improvements to attention based sequence-to-sequence models for end-to-end speech recognition systems. For the first improvement, we propose to use an input-feeding architecture which feeds not only the previous context vector but also

Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant

Recent interest in intelligent assistants has increased demand for Automatic Speech Recognition (ASR) systems that can utilize contextual information to adapt to the user's preferences or the current device state. For example, a user might be more likely to refer to

Neural network vs. HMM speech recognition systems as models of human cross-linguistic phonetic perception

The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as a child. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition

Lattice-rescoring is a common approach to take advantage of recurrent neural language models in ASR, where a word lattice is generated from 1st-pass decoding and the lattice is then rescored with a neural model, and an n-gram approximation method is usually adopted

Data Augmentation Improves Recognition of Foreign Accented Speech

Speech recognition of foreign accented (non-native or L2) speech remains a challenge to the state-of-the-art. The most common approach to address this scenario involves the collection and transcription of accented speech, and incorporating this into the training data

Zero-shot keyword spotting for visual speech recognition in-the-wild

Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information. This paper focuses on visual KWS for words unseen during training, a real-world, practical setting which so far has received no attention

Output-Gate Projected Gated Recurrent Unit for Speech Recognition

In this paper, we describe the work on accelerating decoding speed while improving the decoding accuracy. Firstly, we propose an architecture which we call Projected Gated Recurrent Unit (PGRU) for automatic speech recognition (ASR) tasks, and show that the

The combination of Sparse Principle Component Analysis and Kernel Ridge Regression methods applied to speech recognition problem.

Speech recognition is an important problem in the pattern recognition research field. In this paper, the combination of the sparse principal component analysis method and the kernel ridge regression method will be applied to the MFCC feature vectors of the speech dataset

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition

Domain robustness is a challenging problem for automatic speech recognition (ASR). In this paper, we consider speech data collected for different applications as separate domains and investigate the robustness of acoustic models trained on multidomain data on

Combined Speaker Clustering and Role Recognition in Conversational Speech

Speaker Role Recognition (SRR) is usually addressed either as an independent classification task, or as a subsequent step after a speaker clustering module. However, the first approach does not take speaker-specific variabilities into account, while the second one

Temporal Sensitivity Measured Shortly After Cochlear Implantation Predicts 6-Month Speech Recognition Outcome

Objectives: Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech

Domain-Adversarial Training for Session Independent EMG-based Speech Recognition

We present our research on continuous speech recognition based on Surface Electromyography (EMG), where speech information is captured by electrodes attached to the speaker's face. This method allows speech processing without requiring that an acoustic

End-to-end speech recognition using lattice-free MMI

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage

The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays

Speed: 0.9, 1.0, 1.1
Volume: 0.125 to 2.0
Reverberation: Generate impulse responses of simulated rooms by image method. Follow the settings of {small, medium}-size rooms in .
Noise: Add non-speech region of array data with SNR of {20, 15, 10, 5, 0}
Bandpass
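
The noise and volume settings listed above can be illustrated with a minimal sketch. This is not the Hitachi/JHU implementation: the function names (`power`, `mix_at_snr`, `scale_volume`) are hypothetical, signals are plain Python lists of samples, and the SNR values are assumed to be in dB. The sketch scales a noise segment so that the speech-to-noise power ratio matches a target SNR, and draws a random volume factor from the listed range.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the target SNR (assumed dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    # Desired noise power follows from SNR = 10 * log10(P_speech / P_noise).
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def scale_volume(signal, low=0.125, high=2.0, rng=random):
    """Multiply the signal by one factor drawn uniformly from [low, high]."""
    factor = rng.uniform(low, high)
    return [factor * x for x in signal]


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(0.01 * i) for i in range(1000)]  # stand-in for speech
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for array noise
    for snr_db in (20, 15, 10, 5, 0):
        noisy = mix_at_snr(speech, noise, snr_db)
        louder_or_quieter = scale_volume(noisy, rng=rng)
        print(snr_db, len(louder_or_quieter))
```

Uniform sampling of the volume factor is an assumption made for illustration; the listing gives only the range.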

Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17

This paper presents our newest language recognition systems developed for NIST LRE17. For this challenging limited data multidomain task, we were able to get very good performance with our state-of-the-art DNN senone and bottleneck joint ivector systems by

Automatic speech recognition system development in the wild

The standard framework for developing an automatic speech recognition (ASR) system is to generate training and development data for building the system, and evaluation data for the final performance analysis. All the data is assumed to come from the domain of interest

Gaussian Process Neural Networks for Speech Recognition

Deep neural networks (DNNs) play an important role in state-of-the-art speech recognition systems. One important issue associated with DNNs and artificial neural networks in general is the selection of suitable model structures, for example, the form of hidden node activation

Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning

One of the major remaining challenges in modern automatic speech recognition (ASR) systems for English is to be able to handle speech from users with a diverse set of accents. ASR systems that are trained on speech from multiple English accents still underperform

