March 22, 2015

Resnik's WordNet Similarity Measure

Resnik Measure

An information-content-based relatedness measure. Concepts with higher information content are specific to particular topics; concepts with lower information content are more general.

Semantic similarity measures are used for tasks such as term disambiguation, text segmentation, and checking ontologies for consistency or coherence.

Motivating Resnik's measure: the hypernymy (is-a) hierarchy

Sense 1
lock -- (a fastener fitted to a door or drawer to keep it firmly closed)
       => fastener, fastening, holdfast, fixing -- (restraint that attaches to something or holds something in place)
           => restraint, constraint -- (a device that retards something's motion; "the car did not have proper restraints fitted")
               => device -- (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")
                   => instrumentality, instrumentation -- (an artifact (or system of artifacts) that is instrumental in accomplishing some end)
                       => artifact, artefact -- (a man-made object taken as a whole)
                           => whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
                               => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
                                   => physical entity -- (an entity that has physical existence)
                                       => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
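
The chain above can be treated as a set of child-to-parent links. The following is a minimal sketch, not WordNet's own API: the `HYPERNYM` table and the `hypernym_path` helper are names made up for illustration, and each synset is represented only by its first synonym.

```python
# Child -> parent links copied from the "lock" sense listed above.
# Each synset is named by its first synonym (an illustrative simplification).
HYPERNYM = {
    "lock": "fastener",
    "fastener": "restraint",
    "restraint": "device",
    "device": "instrumentality",
    "instrumentality": "artifact",
    "artifact": "whole",
    "whole": "object",
    "object": "physical entity",
    "physical entity": "entity",
}


def hypernym_path(concept):
    """Return the is-a chain from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


path = hypernym_path("lock")
print(" => ".join(path))
print("edges to root:", len(path) - 1)
```

Every concept on the path subsumes "lock", and the concepts get more general (and so carry less information) toward "entity".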

Resnik similarity: information-content word similarity

  • Relies on the structure of the thesaurus
  • Refines the path-based approach using normalizations based on hierarchy depth
  • Represents the distance associated with each edge
  • Adds probabilistic information derived from a corpus
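
One common way to add the corpus-derived probabilities is to count every occurrence of a concept as an occurrence of all its ancestors too, then divide by the total. The sketch below assumes exactly that scheme; the toy taxonomy, the counts, and the function names are hypothetical and do not come from WordNet or a real corpus.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT = {
    "lock": "fastener",
    "bolt": "fastener",
    "fastener": "device",
    "clock": "device",
    "device": "entity",
}
DIRECT_COUNT = {
    "lock": 10, "bolt": 5, "fastener": 5,
    "clock": 20, "device": 10, "entity": 50,
}


def concept_probabilities(parent, direct_count):
    """P(c) = occurrences of c or anything below it / all occurrences."""
    total = dict.fromkeys(direct_count, 0)
    for concept, n in direct_count.items():
        node = concept
        while node is not None:  # credit the concept and every ancestor
            total[node] += n
            node = parent.get(node)
    n_all = sum(direct_count.values())
    return {c: t / n_all for c, t in total.items()}


def information_content(p):
    """IC = -log p, written as log(1/p)."""
    return math.log(1.0 / p)


P = concept_probabilities(PARENT, DIRECT_COUNT)
for c in ("entity", "device", "fastener", "lock"):
    print(f"{c:9s} P={P[c]:.2f}  IC={information_content(P[c]):.3f}")
```

With these counts the root has probability 1 and information content 0, and information content grows as concepts become more specific.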

Resnik similarity measure:

simResnik(c1, c2) = − log P(LCS(c1, c2))

• Estimates the amount of information two words share by the information content of their lowest common subsumer (LCS), the most specific concept in the hierarchy that subsumes both
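
The formula can be sketched on a small tree. This is an illustration under stated assumptions rather than a reference implementation: the hierarchy and the probabilities in `PROB` are hypothetical, the helper names are made up, and every concept has a single parent, so the first shared ancestor is the LCS. In a hierarchy with multiple parents, the common subsumer with the highest information content would be used instead.

```python
import math

# Hypothetical toy hierarchy and concept probabilities (assumed values,
# not estimated from a real corpus).
PARENT = {
    "lock": "fastener",
    "bolt": "fastener",
    "fastener": "device",
    "clock": "device",
    "device": "entity",
}
PROB = {
    "entity": 1.0, "device": 0.5, "fastener": 0.2,
    "clock": 0.2, "lock": 0.1, "bolt": 0.05,
}


def ancestors(concept):
    """The concept followed by its subsumers, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lowest_common_subsumer(c1, c2):
    """Most specific concept that subsumes both c1 and c2 (tree only)."""
    above_c1 = set(ancestors(c1))
    return next(a for a in ancestors(c2) if a in above_c1)


def sim_resnik(c1, c2):
    """simResnik(c1, c2) = -log P(LCS(c1, c2))."""
    return -math.log(PROB[lowest_common_subsumer(c1, c2)])


print(lowest_common_subsumer("lock", "bolt"), sim_resnik("lock", "bolt"))
print(lowest_common_subsumer("lock", "clock"), sim_resnik("lock", "clock"))
```

Here "lock" and "bolt" meet at "fastener", while "lock" and "clock" only meet at the more general "device", so the first pair gets the higher score.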
