notes 1.0 KB

1 - "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow."
2 - "Professor Plumb has a green plant in his study."
3 - "Miss Scarlett watered Professor Plumb's green plant while he was away from his office last week."
l1 = 19
l2 = 9
l3 = 16
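The l1, l2 and l3 values appear to be the word counts of the three documents. A minimal sketch that reproduces them, assuming a simple lowercase tokenizer (the `tokenize` helper and its regex are illustrative and not taken from the notes):

```python
import re

docs = {
    1: "Mr. Green killed Colonel Mustard in the study with the candlestick. "
       "Mr. Green is not a very nice fellow.",
    2: "Professor Plumb has a green plant in his study.",
    3: "Miss Scarlett watered Professor Plumb's green plant while he was away "
       "from his office last week.",
}

def tokenize(text):
    # Assumption: lowercase, and keep runs of letters and apostrophes,
    # so "Plumb's" counts as a single token.
    return re.findall(r"[a-z']+", text.lower())

lengths = {doc_id: len(tokenize(text)) for doc_id, text in docs.items()}
print(lengths)  # {1: 19, 2: 9, 3: 16}
```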
q1 = "green"
q1 = [0.0, 0.71]    (components: [mr, green])
1 = [0.0, 0.0747]
2 = [0.0, 0.0789]
3 = [0.0, 0.0444]
green : total count = 4, idf = 0.71
mr : total count = 2, idf = 1.40
the : total count = 2, idf = 1.40
plant : total count = 2, idf = 1.40
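The idf values in this table are consistent, to within rounding, with idf = 1 + ln(N / total count) for N = 3 documents. That formula is inferred from the numbers and is not stated in the notes, so the sketch below is an assumption rather than the definitive calculation. The `tokenize` helper is also illustrative.

```python
import math
import re
from collections import Counter

docs = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. "
    "Mr. Green is not a very nice fellow.",
    "Professor Plumb has a green plant in his study.",
    "Miss Scarlett watered Professor Plumb's green plant while he was away "
    "from his office last week.",
]

def tokenize(text):
    # Assumption: lowercase word tokens, apostrophes kept inside words.
    return re.findall(r"[a-z']+", text.lower())

N = len(docs)
# Total occurrences of each term across the whole corpus
# (not the number of documents containing it).
total_counts = Counter(tok for text in docs for tok in tokenize(text))

def idf(term):
    # Assumed formula: 1 + ln(N / total count of the term).
    return 1 + math.log(N / total_counts[term])

for term in ("green", "mr", "the", "plant"):
    print(f"{term}: total count = {total_counts[term]}, idf = {idf(term):.3f}")
# green: total count = 4, idf = 0.712
# mr: total count = 2, idf = 1.405
# the: total count = 2, idf = 1.405
# plant: total count = 2, idf = 1.405
```

Because the count is over all occurrences, a term that appears more often than there are documents (such as "green") gets an idf below 1.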
q2 = "Mr. Green"
q2 = [1.4, 0.71]    (components: [mr, green])
1 = [0.147, 0.0747]
2 = [0.0, 0.0789]
3 = [0.0, 0.0444]
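A sketch of how these query and document vectors could be built, assuming tf is the term count divided by the document length (which is how the doc 1 entries come out: 2/19 × 0.71 ≈ 0.0747). The idf values are copied from the notes; the function names are hypothetical.

```python
# Document lengths and idf values as given in the notes.
lengths = {1: 19, 2: 9, 3: 16}
idf = {"mr": 1.40, "green": 0.71}

# Raw term counts per document for the two terms of interest.
counts = {
    1: {"mr": 2, "green": 2},
    2: {"green": 1},
    3: {"green": 1},
}

def query_vector(query_terms, terms):
    # One component per term: its idf if the query contains it, else 0.
    return [idf[t] if t in query_terms else 0.0 for t in terms]

def doc_vector(doc_id, terms):
    # Assumption: tf = count / document length, then tf * idf.
    return [counts[doc_id].get(t, 0) / lengths[doc_id] * idf[t] for t in terms]

terms = ["mr", "green"]
print(query_vector({"green"}, terms))        # [0.0, 0.71]
print(query_vector({"mr", "green"}, terms))  # [1.4, 0.71]
print([round(x, 4) for x in doc_vector(1, terms)])  # [0.1474, 0.0747]
```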
q3 = "the green plant"
q3 = [0.5, 0.25, 0.5]    (components: [the, green, plant])
1 = [1, 0.5, 0]
2 = [0, 0.25, 0.5]
3 = [0, 0.25, 0.5]
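The notes do not say how a query vector is compared with the document vectors. Cosine similarity is a common choice for tf-idf vectors and is assumed here purely for illustration; the `cosine` helper is not from the notes.

```python
import math

def cosine(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)

# Vectors copied from the notes.
q3 = [0.5, 0.25, 0.5]
doc_vectors = {
    1: [1, 0.5, 0],
    2: [0, 0.25, 0.5],
    3: [0, 0.25, 0.5],
}

scores = {doc_id: cosine(q3, vec) for doc_id, vec in doc_vectors.items()}
for doc_id, score in scores.items():
    print(doc_id, round(score, 3))
```

With the rounded vectors above, each document scores roughly 0.745 under this measure, so unrounded tf*idf values would be needed to separate them.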
Inverted index as a trie
values are {docId: score}, where score is the sum of tf across fields with the field multipliers applied
when querying, calculate the idf and multiply it by the tf
for a multi-term query, generate a vector using the idf of each term
find all the documents that match every query term, and generate a tf*idf score for each
word: {
  totalCount: 123,
  docs: {docId: score}
}
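The design above could be sketched as follows. This is one possible reading, not a definitive implementation: the class and method names are hypothetical, the idf formula 1 + ln(N / totalCount) is inferred from the numbers earlier in the notes, and summing tf*idf over the query terms is an assumption about how a multi-term score is combined.

```python
import math
import re
from dataclasses import dataclass, field

@dataclass
class Entry:
    total_count: int = 0                      # totalCount in the notes
    docs: dict = field(default_factory=dict)  # {docId: score}

class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.entry = None   # set only where a complete word ends

class InvertedIndex:
    def __init__(self):
        self.root = TrieNode()
        self.doc_count = 0

    def _entry(self, word, create=False):
        # Walk the trie one character at a time.
        node = self.root
        for ch in word:
            if ch not in node.children:
                if not create:
                    return None
                node.children[ch] = TrieNode()
            node = node.children[ch]
        if node.entry is None and create:
            node.entry = Entry()
        return node.entry

    def add(self, doc_id, fields, multipliers=None):
        # fields: {field name: list of tokens}; multipliers: {field name: weight}
        multipliers = multipliers or {}
        self.doc_count += 1
        for name, tokens in fields.items():
            weight = multipliers.get(name, 1.0)
            for tok in tokens:
                entry = self._entry(tok, create=True)
                entry.total_count += 1
                # Each occurrence adds weight / field length, so the stored
                # score is the sum of weighted tf across fields.
                entry.docs[doc_id] = entry.docs.get(doc_id, 0.0) + weight / len(tokens)

    def idf(self, entry):
        # Assumed formula, inferred from the figures in the notes.
        return 1 + math.log(self.doc_count / entry.total_count)

    def search(self, terms):
        if not terms:
            return {}
        entries = [self._entry(t) for t in terms]
        if any(e is None for e in entries):
            return {}
        # Only documents that contain every query term.
        matching = set.intersection(*(set(e.docs) for e in entries))
        # Assumption: combine terms by summing tf * idf.
        return {d: sum(e.docs[d] * self.idf(e) for e in entries) for d in matching}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

index = InvertedIndex()
index.add(1, {"body": tokenize("Mr. Green killed Colonel Mustard in the study with the "
                               "candlestick. Mr. Green is not a very nice fellow.")})
index.add(2, {"body": tokenize("Professor Plumb has a green plant in his study.")})
index.add(3, {"body": tokenize("Miss Scarlett watered Professor Plumb's green plant while "
                               "he was away from his office last week.")})

print(sorted(index.search(["mr", "green"])))     # [1]
print(sorted(index.search(["green", "plant"])))  # [2, 3]
```

Storing the weighted tf at index time means a query only needs the entry's `total_count` and the current document count to compute idf, so idf stays correct as documents are added.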