Vector Model & TF/IDF

Description

Lecture 4
Sam Wells
Flashcards by Sam Wells, updated more than 1 year ago
Created by Sam Wells over 9 years ago

Resource summary

Question Answer
Vector Space Model for words in Shakespeare plays
Problems with Boolean matching? - Documents either match or don't - Not suitable for most users (only experts; most users can't write Boolean queries)
What is the main issue with Boolean searches? Feast or famine: a query returns either too many or too few results
What is used to assign a score to a query/document pair? A proximity measure, which can be defined in terms of similarity or dissimilarity
What is a commonly used proximity measure and what are 3 example equations? The Jaccard coefficient: a measure of the overlap of 2 sets A and B. jaccard(A,B) = |A intersect B| / |A union B|; jaccard(A,A) = 1; jaccard(A,B) = 0 if A intersect B is empty
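The three Jaccard properties above can be checked with a minimal Python sketch. The function name, the example query and document, and the handling of two empty sets are illustrative assumptions, not part of the lecture.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections of terms, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical query and document, split on whitespace.
query = "ides of march".split()
doc = "caesar died in march".split()

score = jaccard(query, doc)  # one shared term out of six distinct terms
```

Because the inputs are reduced to sets, repeating a term in the document does not change the score.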
Issues with Jaccard Coefficient - Doesn't consider term frequency - Doesn't account for rare terms being more informative than frequent terms
Term Document Count Matrix Table
Term Frequency TF Definition tf(t,d) of term t in document d is the number of times that t occurs in d
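As a rough illustration of term frequency and a term-document count matrix, the sketch below counts terms in two made-up documents. The document names, texts, and whitespace tokenization are assumptions for the example only.

```python
from collections import Counter

# Hypothetical two-document collection.
docs = {
    "d1": "to be or not to be",
    "d2": "to err is human",
}

# tf[d][t] = number of times term t occurs in document d
tf = {name: Counter(text.split()) for name, text in docs.items()}

# Vocabulary: every distinct term in the collection, sorted for stable rows.
vocab = sorted(set().union(*tf.values()))

# Term-document count matrix: one row per term, one column per document.
matrix = {term: [tf[name][term] for name in docs] for term in vocab}
```

Here `matrix["to"]` holds the counts of "to" in d1 and d2, in that order; a `Counter` returns 0 for terms a document does not contain.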
Log-frequency weighting formula w(t,d) = 1 + log10(tf(t,d)) if tf(t,d) > 0, otherwise 0
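A small sketch of log-frequency weighting, assuming the usual convention that a term with zero occurrences gets weight 0 (the logarithm is undefined there). The function name is hypothetical.

```python
import math


def log_tf_weight(tf):
    """Log-frequency weight: 1 + log10(tf) for tf > 0, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


# Weights grow much more slowly than raw counts.
weights = {tf: log_tf_weight(tf) for tf in (0, 1, 10, 1000)}
```

The dampening is the point: a term that occurs 1000 times is weighted only a few times higher than one that occurs once.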
Document Frequency DF Definition The number of documents that contain term t
idf of term t formula idf(t) = log10(N/df(t)) N = number of documents in the collection
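The idf formula can be sketched as follows. The collection size of 1,000,000 documents and the function name are illustrative assumptions.

```python
import math


def idf(n_docs, df):
    """Inverse document frequency: log10(N / df(t))."""
    if df <= 0:
        raise ValueError("df must be positive: the term must occur in some document")
    return math.log10(n_docs / df)


N = 1_000_000  # assumed collection size for the example

rare = idf(N, 1_000)          # term found in 1,000 documents
everywhere = idf(N, N)        # term found in every document
```

A term that appears in every document gets an idf of 0, so it contributes nothing to distinguishing documents.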
How many idf values are there for each term in a collection? 1
Does idf affect the ranking of one-term queries? No. idf only affects the ranking of documents for queries with at least 2 terms
Collection Frequency Definition Number of occurrences of a term in a collection, counting multiple occurrences
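To make the difference between document frequency and collection frequency concrete, here is a sketch over a made-up three-document collection. The texts and function names are assumptions.

```python
# Hypothetical collection, tokenized on whitespace.
docs = ["to be or not to be", "to err is human", "be kind"]
tokenized = [text.split() for text in docs]


def document_frequency(term):
    """Number of documents that contain the term at least once."""
    return sum(term in tokens for tokens in tokenized)


def collection_frequency(term):
    """Total number of occurrences of the term across all documents."""
    return sum(tokens.count(term) for tokens in tokenized)


df_to = document_frequency("to")    # "to" appears in 2 documents
cf_to = collection_frequency("to")  # "to" occurs 3 times in total
```

Collection frequency is always at least as large as document frequency, since each containing document contributes at least one occurrence.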
tf-idf formula for weight W(t,d) W(t,d) = (1 + log10(tf(t,d))) × log10(N/df(t))
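One common form of the tf-idf weight multiplies the log-frequency weight by the idf, using the definitions from the cards above. The sketch below assumes that form; the function name and the example numbers are hypothetical.

```python
import math


def tf_idf(tf, df, n_docs):
    """tf-idf weight: (1 + log10(tf)) * log10(N / df), or 0 if the term is absent."""
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# Assumed example: a term occurring 10 times in a document,
# found in 100 of 1,000 documents in the collection.
weight = tf_idf(tf=10, df=100, n_docs=1000)
```

The weight rises with the number of occurrences in the document and with the rarity of the term in the collection.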
We need a proximity measure between 2 vectors; why is Euclidean distance a bad idea for this? What can we use instead? Euclidean distance is large for vectors of different lengths, even when their term distributions are similar. Use the angle between the vectors instead
How do we find the angle between 2 vectors? The Dot Product
What is the Dot Product formula? a•b=|a||b|cosθ Where θ is the angle between the vectors a and b
What is the formula to get the angle between vectors a and b? cosθ = (a • b) / (|a||b|), so θ = arccos((a • b) / (|a||b|))
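The sketch below computes the cosine and the angle between two vectors from the dot product, and compares this with Euclidean distance for a vector and a doubled copy of it. The helper names and the example vector are assumptions; `math.dist` requires Python 3.8 or later.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length |a| of a vector."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """cos(theta) = (a . b) / (|a| |b|)."""
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(a, b) / denom


def angle(a, b):
    """Angle between a and b in radians."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cosine(a, b))))


# Hypothetical term-count vector, and the same document appended to itself.
d = [3.0, 1.0, 0.0]
d_twice = [2 * x for x in d]

euclidean = math.dist(d, d_twice)  # grows with the difference in length
similarity = cosine(d, d_twice)    # close to 1: same direction
```

The doubled document has the same term distribution, so the angle between the two vectors is essentially zero even though the Euclidean distance between them is not.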
