Trans-dimensional Random Fields for Language Modeling
Description
Paper by Bin Wang, Zhijian Ou, Zhiqiang Tan
random fields
language modeling
probability
Mind Map by Ivan Zapreev, updated more than 1 year ago
Created by Ivan Zapreev over 8 years ago
Resource summary
Trans-dimensional Random Fields for Language Modeling
Introduction
Language modeling (LM)
Joint probability of words
Dominant
Conditional approach
Represent joint probability in terms of conditionals
Alternatives
Random field (RF)
Used in
Whole-sentence maximum entropy LMs (WSME LMs)
Is a Markov Random Field
Challenge in Fitting
Evaluating gradient of log likelihood
Approximate
Sampling methods used
Gibbs
Independence Metropolis-Hastings
Importance sampling
Cannot work efficiently with complex, high-dimensional distributions
Empirical results
Not satisfactory
Poorly fitted to the data
Exact
Requires high-dimensional integration
Poorly fitted to the data
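The independence Metropolis-Hastings sampler listed among the approximate methods can be sketched on a toy discrete target. This is only an illustration of the general technique, not the setup used in the paper: the function name `independence_mh`, the two-state target, and the uniform proposal are all assumptions made for the example.

```python
import random

def independence_mh(unnorm, proposal_probs, n_steps, seed=0):
    """Independence Metropolis-Hastings on a finite state space.

    unnorm: dict state -> unnormalized target weight (must be > 0 here).
    proposal_probs: dict state -> proposal probability; the proposal does
    not depend on the current state, which is what "independence" means.
    """
    rng = random.Random(seed)
    states = list(proposal_probs)
    q = [proposal_probs[s] for s in states]
    current = rng.choices(states, weights=q)[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choices(states, weights=q)[0]
        # Acceptance ratio w(candidate) / w(current), with w = target / proposal.
        ratio = (unnorm[candidate] * proposal_probs[current]) / (
            unnorm[current] * proposal_probs[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        samples.append(current)
    return samples

# Hypothetical toy target: "y" should be visited about three times as often as "x".
samples = independence_mh({"x": 1.0, "y": 3.0}, {"x": 0.5, "y": 0.5}, n_steps=20000)
fraction_y = samples.count("y") / len(samples)
```

With only two states the chain mixes quickly. The outline's point is that a fixed proposal like this one matches the target poorly once the space is high-dimensional, so most candidates are rejected.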
Potential benefits
Naturally express sentence level phenomena
Integrate features from a variety of knowledge sources
Crucial for
Computational linguistics
Speech recognition
Information retrieval
Etc.
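The conditional approach named in the introduction writes the joint probability of a sentence as a product of per-word conditionals. A minimal sketch, assuming a hypothetical toy bigram table (`BIGRAM`) and a helper name (`sentence_log_prob`) that do not come from the paper:

```python
import math

# Hypothetical toy bigram table: P(next word | previous word).
# "<s>" and "</s>" mark the sentence start and end.
BIGRAM = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}

def sentence_log_prob(words):
    """Chain rule with a bigram approximation: sum of log P(w_i | w_{i-1})."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = BIGRAM.get(prev, {}).get(cur, 0.0)
        if p == 0.0:
            return float("-inf")  # unseen transition: zero probability
        logp += math.log(p)
    return logp

prob = math.exp(sentence_log_prob(["the", "cat", "sat"]))
```

Each factor is normalized locally over the next word. A random field model instead scores the whole sentence at once and normalizes globally, which is where the fitting challenge in the outline comes from.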
Research
Revisit
Random field (RF)
Innovations
Propose
Trans-dimensional RF model (TDRF)
Idea
Take account of empirical distributions of lengths
Makes it possible to develop
Markov Chain Monte Carlo technique
Trans-dimensional mixture sampling
Develop
Training Algorithm
Using
Trans-dimensional mixture sampling
Stochastic Approximation (SA) framework
Additional innovations
Estimation of the diagonal elements of the Hessian matrix
Estimated during SA iterations to rescale the gradient
Improves convergence
Word classing
Accelerate sampling
Improve the smoothing behavior
Sharing statistical strength between similar words
Using multiple CPUs
Parallelize training of RF model
Fitting to the data
Simultaneously update
Model parameters
Normalizing constants
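The idea of taking the empirical length distribution into account, with one normalizing constant per sentence length, can be illustrated on a toy model. The sketch assumes the joint probability of a length and a sentence has the form (length probability) × exp(feature score) / (normalizer for that length); that form is a reading of the paper and should be checked against it. The vocabulary, length probabilities, feature weights, and function names are hypothetical. The normalizers are computed by brute-force enumeration here, which is feasible only for a toy vocabulary and stands in for the sampling-based estimation named in the outline.

```python
import itertools
import math

VOCAB = ["a", "b"]                         # hypothetical toy vocabulary
LENGTH_PROBS = {1: 0.25, 2: 0.75}          # assumed empirical length distribution
WEIGHTS = {("a",): 0.5, ("a", "b"): 1.0}   # hypothetical n-gram feature weights

def score(sentence):
    """Unnormalized log-score: total weight of the unigram/bigram features that fire."""
    total = 0.0
    for n in (1, 2):
        for i in range(len(sentence) - n + 1):
            total += WEIGHTS.get(tuple(sentence[i:i + n]), 0.0)
    return total

def log_normalizer(length):
    """Log normalizing constant for one length, by enumerating every sentence."""
    return math.log(
        sum(math.exp(score(s)) for s in itertools.product(VOCAB, repeat=length))
    )

def joint_log_prob(sentence):
    """log p(length, sentence) under the assumed trans-dimensional form."""
    length = len(sentence)
    return math.log(LENGTH_PROBS[length]) + score(sentence) - log_normalizer(length)

total_mass = sum(
    math.exp(joint_log_prob(s))
    for length in LENGTH_PROBS
    for s in itertools.product(VOCAB, repeat=length)
)
```

Because each length has its own normalizer, the probability mass assigned to length-2 sentences equals the assumed length probability (0.75) whatever the feature weights are.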
Experiments
Wall Street Journal 92 data
1000-best lists
Oracle WER is 3.4%
Kaldi toolkit, DNN acoustic model
More details in the paper
Comparison
TDRF
Performance
As good as
Recurrent neural networks
Computational
More efficient than
Recurrent neural networks
Computing sentence probability
Recurrent neural networks