Probabilistic parsing

Note by Doria Šarić, created 06/06/2018.

Statistical Parsing
====================================================================================
-- uses statistical models to resolve AMBIGUITY (e.g. PP-attachment), guide parsing, and select the most likely parse
-- the grammar is extracted from corpora
-- parses free, unrestricted text with >90% accuracy, and efficiently
-- requires POS-tagged corpora and syntactically analyzed corpora (treebanks) -- e.g. PTB, AnCora
-- lexical approaches: context-free (unigrams), context-dependent (n-grams, HMMs)
-- syntactic approaches: SCFG = Stochastic Context-Free Grammar (inside, outside and Viterbi algorithms, learning models)
-- hybrid approaches: stochastic lexicalized TAGs
-- computing the most probable parse: the Viterbi algorithm (a sketch follows this list)
-- parameter learning:
   -- Supervised: from tagged corpora (annotated by linguists), i.e. from treebanks
   -- Unsupervised: Baum-Welch (Forward-Backward) for HMMs, Inside-Outside for SCFGs
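Below is a minimal sketch of Viterbi parsing over a probabilistic CKY chart, assuming a toy PCFG in Chomsky Normal Form; BINARY, LEXICON, viterbi_cky and every probability are invented for illustration, not taken from any treebank.

# binary rules: (left child, right child) -> [(parent, rule probability), ...]
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
    ("V", "NP"): [("VP", 1.0)],
}
# lexical (unary) rules: word -> [(nonterminal, rule probability), ...]
LEXICON = {
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5), ("NP", 0.2)],
    "cat": [("N", 0.5), ("NP", 0.2)],
    "saw": [("V", 1.0)],
}

def viterbi_cky(words, root="S"):
    """Return (probability, tree) of the most probable parse rooted in `root`."""
    n = len(words)
    # chart[i][j]: nonterminal -> (best probability, best subtree) for span i..j
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                      # lexical rules on the diagonal
        for nt, p in LEXICON.get(w, []):
            chart[i][i + 1][nt] = (p, (nt, w))
    for span in range(2, n + 1):                       # combine adjacent spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for lhs_l, (p_l, t_l) in chart[i][k].items():
                    for lhs_r, (p_r, t_r) in chart[k][j].items():
                        for parent, p_rule in BINARY.get((lhs_l, lhs_r), []):
                            p = p_rule * p_l * p_r     # max, not sum: Viterbi
                            if p > chart[i][j].get(parent, (0.0, None))[0]:
                                chart[i][j][parent] = (p, (parent, t_l, t_r))
    return chart[0][n].get(root, (0.0, None))

print(viterbi_cky("the dog saw the cat".split()))      # (0.09, ('S', ...)) for this toy grammar

Replacing the max with a sum over split points and rules turns the same chart into the inside computation used for sentence probabilities (sketched in the HMM vs. PCFG section below).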

SCFG (Stochastic Context-Free Grammar)
====================================================================================
-- associate a probability p with each rule and each lexical entry
-- restriction to CHOMSKY NORMAL FORM (CNF):
   binary rules: A_p -> A_q A_r   (parameter matrix B_pqr)
   unary rules:  A_p -> b_m       (parameter matrix U_pm)
-- assign a probability to each left-most derivation (parse tree) allowed by the underlying CFG; argmax_t p(t) is the most likely parse tree
-- probability of a sentence: the sum of p(t) over all parse trees t of the sentence; the best parse is the one with maximum p(t)
-- how to obtain a PCFG from a treebank? see MLE below

PROS: gives some idea of the probability of a parse (though not a very good one)
      can be learned without negative examples
      provides a language model for the language

CONS: provides a worse language model than a 3-gram (though SCFGs are robust and can be combined with 3-grams)
      assigns too much probability to short sentences: small trees are more probable
      parameter estimation (probabilities) suffers from sparseness and data volume
      probabilities are attached to rules, so information about where in the derivation tree a rule is applied is lost
      low-frequency constructions are penalized
      the probability of a derivation assumes contextual independence (from the CF grammar, but also from the probability assignment)

-- conditional independence can be relaxed -> sensitivity to structure, lexicalization; node expansion then depends on its position in the tree

-- 2 models
   Conditional/Discriminative model: the probability of a parse tree is estimated directly;
      probabilities are conditioned on a concrete sentence; no probability distribution over sentences is assumed;
      the probabilities sum to 1
   Generative/Joint model: assigns probabilities to all the trees generated by the grammar;
      the probabilities sum to 1

-- the probability of a sentence is the sum of the probabilities of all its valid parse trees
-- the probability of a subtree is independent of its position in the derivation tree -- positional invariance
-- context-free: independence from ancestors

MLE -- Maximum Likelihood Estimation, e.g. treebank grammars (a sketch follows this section):
   P(A -> alpha) = count(A -> alpha) / sum over alpha' of count(A -> alpha')
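A minimal sketch of the MLE formula above as a treebank-grammar estimator; trees are assumed to be nested (label, child, ...) tuples, and the two toy trees at the bottom are invented for illustration.

from collections import Counter

def rules(tree):
    """Yield one (lhs, rhs) pair for every internal node of a tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))                  # lexical rule A -> word
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)

def estimate_pcfg(treebank):
    """P(A -> alpha) = count(A -> alpha) / sum over alpha' of count(A -> alpha')."""
    rule_count, lhs_count = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in rules(tree):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    return {(lhs, rhs): c / lhs_count[lhs] for (lhs, rhs), c in rule_count.items()}

treebank = [
    ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "barked"))),
    ("S", ("NP", ("Det", "the"), ("N", "cat")),
          ("VP", ("V", "saw"), ("NP", ("Det", "a"), ("N", "dog")))),
]
for rule, p in sorted(estimate_pcfg(treebank).items()):
    print(rule, round(p, 3))                           # e.g. ('Det', ('the',)) 0.667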

HMM vs. PCFG
===============================================================================
HMM:  probability distribution over strings of a certain length
PCFG: probability distribution over the set of strings in the language L

HMM:  Forward/Backward
      Forward:  α_i(t) = P(w_1 ... w_(t-1), X_t = i)
      Backward: β_i(t) = P(w_t ... w_T | X_t = i)
PCFG: Inside/Outside
      Outside: α_j(p,q) = P(w_1 ... w_(p-1), N^j_pq, w_(q+1) ... w_m | G)
      Inside:  β_j(p,q) = P(w_pq | N^j_pq, G)
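Just as the forward probability α_i(t) sums over state sequences of an HMM, the inside probability β_j(p,q) sums over all subtrees of a span; the sketch below computes a sentence probability this way, using the same style of toy CNF grammar as the CKY sketch above (all names, rules and numbers are invented for illustration).

from collections import defaultdict

BINARY = {("NP", "VP"): [("S", 1.0)],
          ("Det", "N"): [("NP", 0.6)],
          ("V", "NP"): [("VP", 1.0)]}
LEXICON = {"the": [("Det", 1.0)],
           "dog": [("N", 0.5), ("NP", 0.2)],
           "cat": [("N", 0.5), ("NP", 0.2)],
           "saw": [("V", 1.0)]}

def inside(words, root="S"):
    """P(sentence) = inside probability of `root` spanning the full string."""
    n = len(words)
    beta = defaultdict(float)                          # (i, j, nonterminal) -> probability
    for i, w in enumerate(words):
        for nt, p in LEXICON.get(w, []):
            beta[(i, i + 1, nt)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # sum over split points ...
                for (l, r), parents in BINARY.items():
                    for parent, p_rule in parents:     # ... and over rules (sum, not max as in Viterbi)
                        beta[(i, j, parent)] += p_rule * beta[(i, k, l)] * beta[(k, j, r)]
    return beta[(0, n, root)]

print(inside("the dog saw the cat".split()))           # 0.09 for this toy grammar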
