An Empirical Study of Chinese Name Matching and Applications

paper by Nanyun Peng, Mo Yu, Mark Dredze

1 Introduction
1.1 Name matching
1.1.1 Important
  Downstream tasks
    Entity linking
      Includes context of mentions
    Entity clustering
      Includes context of mentions ?
    Entity coreference ?
    Name transliteration
    Identifying names for mining paraphrases ?
  Standalone name matching
    Context independent
  Entity disambiguation
    Determine if two mentioned strings refer to the same entity
1.1.2 Methods
  Language type
    Alphabetic languages
      Focus of most existing work
      Example
        English
        Indo-European
    Logogram languages
      Example
        Chinese
          Hanzi
      Challenge
        A small number of hanzi represents an entire name
        Tens of thousands of hanzi are in use
      Current methods
        Largely UNTESTED
      Coreference resolution errors
        Caused by Chinese name matching errors
  Focus on person names
1.1.3 Challenge
  Issue: Name variations
    Nicknames
    Aliases
    Acronyms
    Differences in translation
  Exact string matching
    POOR results!
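
Why exact string matching does poorly on name variations can be shown with a small sketch. It compares exact equality against a normalized edit-distance score on two written forms of the same name. The function names and the example pair are illustrative assumptions, not taken from the paper, and edit distance is only one of several possible string similarity measures.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def exact_match(a: str, b: str) -> bool:
    """Exact string matching: any variation at all counts as a mismatch."""
    return a == b


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Two written variants of the same person name (illustrative pair).
name_a, name_b = "奥巴马", "欧巴马"
print(exact_match(name_a, name_b))  # exact matching rejects the pair
print(similarity(name_a, name_b))   # a graded score keeps partial evidence
```

A graded score leaves room for a threshold or a learned model to decide, whereas exact matching discards every variant outright.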
1.1.4 Determine whether two strings refer to the same entity based on the strings alone.
2 Research
2.1 Evaluate Name Matching methods
2.1.1 In Chinese
2.1.2 Approaches
  Existing
    String matching ?
    Learning ?
  New
    New representation for Chinese
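
These notes do not say what the new representation is. As a hedged illustration of why the representation matters, the sketch below assumes two plausible choices, conversion to simplified characters and conversion to pinyin, and scores the same name pair under each. The lookup tables are toy data covering only the characters used here, and all function names are hypothetical; none of this should be read as the paper's actual method.

```python
from difflib import SequenceMatcher

# Toy tables (assumption): a real system would load full dictionaries.
TRAD_TO_SIMP = {"澤": "泽", "東": "东", "歐": "欧", "馬": "马"}
PINYIN = {"毛": "mao", "泽": "ze", "东": "dong",
          "奥": "ao", "欧": "ou", "巴": "ba", "马": "ma"}


def to_simplified(name: str) -> str:
    """Map traditional characters to simplified ones; others pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in name)


def to_pinyin(name: str) -> str:
    """Toneless pinyin, one syllable per character, space separated."""
    return " ".join(PINYIN.get(ch, ch) for ch in to_simplified(name))


def ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_by_representation(a: str, b: str) -> dict:
    """Score one name pair under each representation."""
    return {
        "raw": ratio(a, b),
        "simplified": ratio(to_simplified(a), to_simplified(b)),
        "pinyin": ratio(to_pinyin(a), to_pinyin(b)),
    }


# Simplified vs. traditional spelling of the same name.
print(similarity_by_representation("毛泽东", "毛澤東"))
# Two written variants that differ in one character but sound alike.
print(similarity_by_representation("奥巴马", "歐巴馬"))
```

In this toy setting the raw character strings share little, while the converted forms agree much more closely, which is the kind of effect a representation tailored to Chinese is meant to capture.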
2.1.3 Experiments
  New representation for Chinese
    Improves name matching
  Entity clustering
    No details?!
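
Since the notes record no details on the entity clustering experiment, the following is only a generic sketch of how a pairwise name matcher can drive clustering. It assumes single-link clustering with a union-find structure and a fixed similarity threshold; the threshold value, function names, and example names are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_match(a: str, b: str, threshold: float = 0.6) -> bool:
    """Hypothetical pairwise matcher: similarity at or above a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def cluster_names(names, match=is_match):
    """Single-link clustering: names joined by any chain of matches
    end up in the same cluster."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if match(names[i], names[j]):
            parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


mentions = ["奥巴马", "欧巴马", "毛泽东", "毛泽东主席"]
for group in cluster_names(mentions):
    print(group)
```

Passing the matcher as a parameter means a stronger matcher, for example one that compares names in a different representation, can be swapped in without changing the clustering code.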
2.2 Newly developed data sets
2.2.1 Matched Chinese name pairs
2.3 Mingpipe
2.3.1 Name matching tool
  Python package
  Usage
    Standalone
    Integrated in a larger system
