Trans-dimensional Random Fields for Language Modeling

Introduction
1. Language Modeling (LM)
  1. Joint probability of words
    1. Dominant
      1. Conditional approach
        Represent joint probability in terms of conditionals
    2. Alternatives
      1. Random field (RF)
        Used in
        Whole-sentence maximum entropy LMs (WSME LMs)
        Is a Markov Random Field
        Challenge in fitting
        Evaluating the gradient of the log-likelihood
        Approximate
        Sampling methods used
        Gibbs sampling
        Independence Metropolis-Hastings
        Importance sampling
        Cannot work efficiently with complex high-dimensional distributions
        Empirical results
        Not satisfactory
        Poorly fitted to the data
        Exact
        Requires high-dimensional integration
        Poorly fitted to the data
        Potential benefits
        Naturally express sentence level phenomena
        Integrate features from a variety of knowledge sources
  2. Crucial for
    1. Computational linguistics
      1. Speech recognition
      2. Information retrieval
      3. Etc.
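
The outline above contrasts two ways of writing the joint probability and notes that the likelihood gradient is the hard part of fitting a random field. A sketch of the standard formulas follows, assuming a feature vector f, weights \lambda, and n training sentences x_1, ..., x_n. This notation is chosen here for illustration and is not taken from the source.

```latex
% Conditional (chain-rule) factorization of a sentence x = (w_1, ..., w_l)
p(w_1, \dots, w_l) = \prod_{i=1}^{l} p(w_i \mid w_1, \dots, w_{i-1})

% Random-field (whole-sentence) form with features f and weights \lambda
p(x; \lambda) = \frac{1}{Z(\lambda)} \exp\!\big(\lambda^{\top} f(x)\big),
\qquad
Z(\lambda) = \sum_{x'} \exp\!\big(\lambda^{\top} f(x')\big)

% Gradient of the average log-likelihood: empirical minus model expectation
\frac{\partial}{\partial \lambda} \, \frac{1}{n} \sum_{k=1}^{n} \log p(x_k; \lambda)
  = \frac{1}{n} \sum_{k=1}^{n} f(x_k) \;-\; E_{p(\cdot\,;\lambda)}\big[ f(x) \big]
```

The model expectation in the last line runs over all possible sentences, which is why exact evaluation is impractical and sampling-based approximations are used instead.
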
Research
1. Revisit
  1. Random field (RF)
  2. Innovations
    1. Propose
      1. Trans-dimensional RF model (TDRF)
        Idea
        Takes the empirical distribution of sentence lengths into account
        Makes it possible to develop
        Markov Chain Monte Carlo technique
        Trans-dimensional mixture sampling
    2. Develop
      1. Training Algorithm
        Using
        Trans-dimensional mixture sampling
        Stochastic Approximation (SA) framework
        Additional innovations
        Estimation of diagonal elements of the Hessian matrix
        Estimated during SA iterations to rescale the gradient
        Improves convergence
        Word classing
        Accelerates sampling
        Improves the smoothing behavior
        Shares statistical strength between similar words
        Using multiple CPUs
        Parallelize training of RF model
        Fitting to the data
        Simultaneously update
        Model parameters
        Normalizing constants
  3. Experiments
    1. Wall Street Journal 92 data
      1. 1000-best lists
        Oracle WER is 3.4%
        Kaldi toolkit, DNN acoustic model
      2. More: see the paper
    2. Comparison
      1. TDRF
        Performance
        As good as
        Recurrent neural networks
        Computational cost
        More efficient than
        Recurrent neural networks
        In computing sentence probability
      2. Recurrent neural networks
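
To make the TDRF idea from the outline concrete, here is a minimal Python sketch. It assumes the joint model over a length l and a sentence x of that length has the form p(l, x) = pi_l * exp(lambda . f(x)) / Z_l, where pi_l is the empirical length probability and Z_l is a per-length normalizing constant. That form is an assumption about the approach, and the vocabulary, length probabilities, features, and weights below are all hypothetical. On a toy vocabulary the constants Z_l can be enumerated exactly, which is not feasible at real scale and is where sampling and stochastic approximation come in.

```python
import itertools
import math

# All values below are illustrative assumptions, not taken from the paper.
VOCAB = ["a", "b"]                    # toy vocabulary
MAX_LEN = 3                           # sentences of length 1..MAX_LEN
PI = {1: 0.2, 2: 0.5, 3: 0.3}         # assumed empirical length probabilities
LAMBDA = {"uni:a": 0.4, "bi:a b": 1.1}  # hypothetical feature weights


def features(sentence):
    """Count unigram and bigram features in a sentence (a tuple of words)."""
    counts = {}
    for w in sentence:
        key = "uni:" + w
        counts[key] = counts.get(key, 0) + 1
    for w1, w2 in zip(sentence, sentence[1:]):
        key = "bi:" + w1 + " " + w2
        counts[key] = counts.get(key, 0) + 1
    return counts


def potential(sentence):
    """Unnormalized score exp(lambda . f(x)); unknown features weigh 0."""
    f = features(sentence)
    return math.exp(sum(LAMBDA.get(k, 0.0) * v for k, v in f.items()))


def all_sentences(length):
    """Every sentence of the given length over the toy vocabulary."""
    return itertools.product(VOCAB, repeat=length)


# Per-length normalizing constants Z_l, computed by brute-force enumeration.
Z = {l: sum(potential(x) for x in all_sentences(l))
     for l in range(1, MAX_LEN + 1)}


def joint_prob(sentence):
    """p(l, x) = pi_l * exp(lambda . f(x)) / Z_l for a sentence of length l."""
    l = len(sentence)
    return PI[l] * potential(sentence) / Z[l]


def length_marginal(length):
    """Total probability mass the model assigns to a given length."""
    return sum(joint_prob(x) for x in all_sentences(length))
```

Because each length is normalized separately in this sketch, the model's length marginal matches pi_l by construction: `length_marginal(2)` equals 0.5 up to floating-point error, whatever the feature weights are.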

	Created by Ivan Zapreev over 8 years ago