| Question | Answer |
| --- | --- |
| Vector Space Model for words in Shakespeare plays | |
| Problems with Boolean matching? | Documents either match or they don't, so results are not ranked. Not good for the majority of users: only experts can use it well, since most users can't write Boolean queries. |
| What is the main issue with Boolean searches? | Feast or famine: a query returns either too many results or too few |
| What is used to assign a score to a query/document pair? | A proximity measure, which can be defined in terms of similarity or dissimilarity |
| What is a commonly used proximity measure and what are 3 example equations? | The Jaccard coefficient, a measure of the overlap of two sets $A$ and $B$: $\mathrm{jaccard}(A,B) = \lvert A \cap B \rvert / \lvert A \cup B \rvert$; $\mathrm{jaccard}(A,A) = 1$; $\mathrm{jaccard}(A,B) = 0$ if $A \cap B = \emptyset$ |
| Issues with the Jaccard coefficient | It doesn't consider term frequency (how often a term occurs in a document). It doesn't account for rare terms being more informative than frequent terms. |
| Term Document Count Matrix Table | |
| Term Frequency TF Definition | tf(t,d) of term t in document d is the number of times that t occurs in d |
| Log-frequency weighting formula | $1 + \log_{10} \mathrm{tf}(t,d)$ if $\mathrm{tf}(t,d) > 0$, otherwise $0$ |
| Document Frequency DF Definition | The number of documents that contain term t |
| idf of term t formula | $\mathrm{idf}(t) = \log_{10}(N / \mathrm{df}(t))$, where $N$ is the number of documents in the collection |
| How many idf values are there for each term in a collection? | 1 |
| Does idf affect the ranking of one-term queries? | No. idf only affects the ranking of documents for queries with at least two terms. |
| Collection Frequency Definition | Number of occurrences of a term in a collection, counting multiple occurrences |
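| tf-idf formula for weight W(t,d) | $W(t,d) = (1 + \log_{10} \mathrm{tf}(t,d)) \cdot \log_{10}(N / \mathrm{df}(t))$, the product of the log-frequency weight and the idf |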
| We need a proximity measure between 2 vectors. Why is Euclidean distance a bad idea for this? What can we use instead? | Euclidean distance is large for vectors of different lengths, even when they point in a similar direction. Use the angle between the vectors instead of the distance. |
| How do we find the angle between 2 vectors? | Using the dot product |
| What is the dot product formula? | $a \cdot b = \lvert a \rvert \lvert b \rvert \cos\theta$, where $\theta$ is the angle between the vectors $a$ and $b$ |
| What is the formula to get the angle between vectors a and b? | $\cos\theta = \dfrac{a \cdot b}{\lvert a \rvert \lvert b \rvert}$, so $\theta$ is the arccosine of that ratio |
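
The cards above can be tied together in a small Python sketch: Jaccard overlap, log-frequency weighting, idf, tf-idf weights, and cosine similarity for ranking. This is only an illustration under stated assumptions, not a reference implementation. The function names (`jaccard`, `log_tf`, `idf`, `tfidf_vector`, `cosine`) and the three toy documents are made up for this example, and treating two empty sets as identical is a choice the cards do not cover.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def log_tf(tf):
    """Log-frequency weight: 1 + log10(tf) if tf > 0, otherwise 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def idf(term, docs):
    """Inverse document frequency: log10(N / df(t))."""
    df = sum(1 for d in docs if term in d)
    # assumption: a term that appears in no document gets weight 0
    return math.log10(len(docs) / df) if df else 0.0


def tfidf_vector(tokens, docs, vocab):
    """tf-idf weight for every vocabulary term, in vocab order."""
    counts = Counter(tokens)
    return [log_tf(counts[t]) * idf(t, docs) for t in vocab]


def cosine(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| |v|)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Toy collection (hypothetical data), already tokenised.
docs = [
    "caesar brutus caesar".split(),
    "brutus calpurnia".split(),
    "mercy worser mercy mercy".split(),
]
vocab = sorted({t for d in docs for t in d})
vectors = [tfidf_vector(d, docs, vocab) for d in docs]

# Score each document against a two-term query and rank by cosine.
query = tfidf_vector("caesar brutus".split(), docs, vocab)
scores = [cosine(query, v) for v in vectors]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

Because cosine similarity depends only on the angle, a document that repeats the query terms many times is not penalised for being longer, which is the motivation given for preferring the angle over Euclidean distance.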