Trans-dimensional Random Fields for Language Modeling

Description

Paper by Bin Wang, Zhijian Ou, Zhiqiang Tan
Mind Map by Ivan Zapreev, updated more than 1 year ago
Created by Ivan Zapreev over 8 years ago

Resource summary

Trans-dimensional Random Fields for Language Modeling
  1. Introduction
    1. Language Modeling (LM)
      1. Joint probability of words
        1. Dominant
          1. Conditional approach
            1. Represent joint probability in terms of conditionals
          2. Alternatives
            1. Random field (RF)
              1. Used in
                1. Whole-sentence maximum entropy LMs (WSME LMs)
                  1. Is a Markov Random Field
                    1. ?
                    2. Challenge in Fitting
                      1. Evaluating gradient of log likelihood
                        1. ?
                          1. Approximate
                            1. Sampling methods used
                              1. Gibbs
                                1. ?
                                2. Independent Metropolis-Hastings
                                  1. ?
                                  2. Importance
                                    1. ?
                                    2. Cannot work efficiently with complex high-dimensional distributions
                                      1. Empirical results
                                        1. Not satisfactory
                                          1. Poorly fitted to the data
                                  3. Exact
                                    1. Requires high-dimensional integration
                                      1. ?
                                  4. Poorly fitted to the data
                              2. Potential benefits
                                1. Naturally express sentence level phenomena
                                  1. ?
                                  2. Integrate features from a variety of knowledge sources
                                    1. ?
                            2. Crucial for
                              1. Computational linguistics
                                1. Speech recognition
                                  1. Information retrieval
                                    1. Etc.
                              2. Research
                                1. Revisit
                                  1. Random field (RF)
                                    1. Innovations
                                      1. Propose
                                        1. Trans-dimensional RF model (TDRF)
                                          1. Idea
                                            1. Take account of the empirical distribution of sentence lengths
                                              1. ?
                                              2. Allows development of
                                                1. Markov Chain Monte Carlo technique
                                                  1. Trans-dimensional mixture sampling
                                          2. Develop
                                            1. Training Algorithm
                                              1. Using
                                                1. Trans-dimensional mixture sampling
                                                  1. ?
                                                  2. Stochastic Approximation (SA) framework
                                                    1. ?
                                                    2. Additional innovations
                                                      1. Estimation of diagonal elements of the Hessian matrix
                                                        1. Estimated during SA iterations to rescale the gradient
                                                          1. Improves convergence
                                                        2. Word classing
                                                          1. ?
                                                            1. Accelerate sampling
                                                              1. Improve the smoothing behavior
                                                                1. Sharing statistical strength between similar words
                                                              2. Using multiple CPUs
                                                                1. Parallelize training of RF model
                                                            2. Fitting to the data
                                                              1. Simultaneously update
                                                                1. Model parameters
                                                                  1. Normalizing constants
                                                            3. Experiments
                                                              1. Wall Street Journal 92 data
                                                                1. 1000-best lists
                                                                  1. Oracle WER is 3.4%
                                                                    1. Kaldi toolkit, DNN acoustic model
                                                                  2. More details in the paper
                                                                  3. Comparison
                                                                    1. TDRF
                                                                      1. Performance
                                                                        1. As Good As
                                                                          1. Recurrent neural networks
                                                                        2. Computational
                                                                          1. More efficient than
                                                                            1. Recurrent neural networks
                                                                              1. Computing sentence probability
                                                                          2. Recurrent neural networks
                                                                            1. ?
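
The outline lists independent Metropolis-Hastings among the sampling methods used to approximate the log-likelihood gradient of a random field model. The following is a minimal sketch of that sampler on a toy four-state target, not the sampler or the model from the paper. The function names (`independence_mh`, `empirical_freqs`), the toy weights, and the uniform proposal are all assumptions made for illustration. The point it shows is that only unnormalized target weights are needed, so the normalizing constant never has to be computed.

```python
import random


def independence_mh(unnorm_target, proposal_probs, n_steps, seed=0):
    """Independence Metropolis-Hastings over the states 0..K-1.

    unnorm_target[i]  -- unnormalized target weight of state i (must be > 0)
    proposal_probs[i] -- probability that the fixed proposal draws state i (must be > 0)
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    states = list(range(len(unnorm_target)))
    # Start from a draw of the proposal distribution.
    current = rng.choices(states, weights=proposal_probs)[0]
    samples = []
    for _ in range(n_steps):
        # The proposal ignores the current state (hence "independence").
        candidate = rng.choices(states, weights=proposal_probs)[0]
        # Acceptance ratio: importance weight of candidate over that of current.
        ratio = (unnorm_target[candidate] * proposal_probs[current]) / (
            unnorm_target[current] * proposal_probs[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        samples.append(current)
    return samples


def empirical_freqs(samples, n_states):
    """Fraction of samples that landed in each state."""
    counts = [0] * n_states
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]


if __name__ == "__main__":
    # Toy target proportional to [1, 2, 4, 1]; uniform proposal (both assumed).
    chain = independence_mh([1.0, 2.0, 4.0, 1.0], [0.25] * 4, n_steps=20000)
    print(empirical_freqs(chain, 4))
```

With a proposal close to the target, most candidates are accepted and the chain mixes quickly. When the proposal is a poor match, as is likely for a complex high-dimensional distribution over sentences, the chain tends to reject most candidates, which is consistent with the outline's remark that such samplers do not work efficiently there.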