Ideogram-Based Sentiment Analysis in Japanese Text
Tyler Thornblade
Introduction
Many papers apply similar techniques across different languages.
Two papers in this class introduced a novel technique: assigning sentiment at the character (sub-word) level.
Opinion Extraction, Summarization and Tracking in News and Blog Corpora. Ku, L., Liang, Y. & Chen, H. AAAI 2006 Symposium. (John)
Experiments with sentimental dictionary based classifier and CRF model. Huang, R., Sun, L. & Pan, L. Sixth NTCIR Workshop, 2007. (Cem)
Why are ideograms different?
Unlike phonetic characters, ideograms have innate meanings.
Are they sentiment-bearing?
Example: 気 = spirit, mind, air, mood
Note that ideograms seldom have just one meaning; more typically, a character carries a synset or a group of related synsets.
Ku et al.
Create a sentiment dictionary from the General Inquirer and the Chinese Network Sentiment Dictionary
Expand the dictionary using thesauri:
Tong2yi4ci2ci2lin2 (Mei et al. 1982)
Academia Sinica Bilingual Ontological Wordnet (Huang et al. 2008)
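The dictionary expansion step can be pictured as propagating polarity from seed words to their thesaurus synonyms. The sketch below is one simple way to do that, not necessarily the procedure Ku et al. used; the thesaurus format (a plain dict from word to synonyms), the function name `expand_lexicon`, the toy entries, and the rule of dropping synonyms that receive conflicting polarities are all assumptions made for illustration.

```python
from collections import defaultdict


def expand_lexicon(seed, thesaurus):
    """Propagate seed polarities one hop through a synonym thesaurus.

    seed:      dict mapping word -> +1 (positive) or -1 (negative)
    thesaurus: dict mapping word -> iterable of synonyms (hypothetical format)
    Synonyms that receive conflicting polarities are left out.
    """
    votes = defaultdict(set)
    for word, polarity in seed.items():
        for synonym in thesaurus.get(word, ()):
            if synonym not in seed:  # never overwrite a seed entry
                votes[synonym].add(polarity)

    expanded = dict(seed)
    for synonym, polarities in votes.items():
        if len(polarities) == 1:  # keep only unambiguous propagations
            expanded[synonym] = next(iter(polarities))
    return expanded


# Toy data: "excellent" and "terrible" as seeds; 一般 ("ordinary") is a
# hypothetical entry listed under both, so it is dropped as ambiguous.
seed = {"优秀": 1, "糟糕": -1}
thesaurus = {"优秀": ["卓越", "一般"], "糟糕": ["恶劣", "一般"]}
print(expand_lexicon(seed, thesaurus))
```

Dropping conflicting synonyms trades coverage for precision; a real system might instead weight votes or follow more than one hop.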
Performance of Ku et al. & Huang et al.
Results were fair but not impressive.
Neither paper reported results at the word level.
Hypothesis
These techniques will not be as effective in Japanese as in Chinese.
Why?
A bag-of-words-style approach (here, a bag of characters) ignores how characters compose into words.
Japanese uses script (kana) in addition to ideograms.
A Short Background on Japanese
Although linguistically unrelated, Chinese and Japanese both use Chinese characters extensively.
Many multi-character compounds in Japanese are borrowings from Chinese.
Writing systems:
Chinese characters (kanji)
Script (hiragana, katakana)
Words that mix characters with script (okurigana)
Words that are entirely script (kana)
Japanese Compound Composition
Five classes:
1. Both characters have the same meaning.
2. The characters have opposite meanings.
3. The top character modifies the bottom character.
4. The bottom character is the target, direct object, or complement of the top character.
5. The top character negates (“flips”) the meaning of the bottom character.
The first two are unproblematic; the last three could present problems for the method of Ku et al.
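To see why class 5 is hard for a bag-of-characters score, consider 無事 (“safe, without incident”), whose meaning is positive even though 無 (“without”) leans negative. The sketch below uses made-up character scores (not values from the experiment) to contrast plain averaging, which ignores character order and cannot flip a sign, with a toy negation rule. The `NEGATORS` set and both function names are hypothetical.

```python
# Hypothetical character scores in [-1, 1]; not taken from the experiment.
CHAR_SCORES = {"無": -0.8, "事": -0.1}

# Illustrative (incomplete) set of class-5 negating prefixes.
NEGATORS = {"不", "無", "非", "未"}


def average_score(word, char_scores):
    """Bag-of-characters score: the mean of the known character scores."""
    values = [char_scores[ch] for ch in word if ch in char_scores]
    return sum(values) / len(values) if values else None


def negation_aware_score(word, char_scores):
    """Toy class-5 rule: a two-character word that starts with a negator
    takes the opposite sign of its second character."""
    if len(word) == 2 and word[0] in NEGATORS and word[1] in char_scores:
        return -char_scores[word[1]]
    return average_score(word, char_scores)


print(average_score("無事", CHAR_SCORES))         # negative: wrong for "safe"
print(average_score("事無", CHAR_SCORES))         # same value: order is ignored
print(negation_aware_score("無事", CHAR_SCORES))  # 0.1: flipped to positive
```

Classes 3 and 4 suffer from the same order-blindness, but they need richer rules than a single sign flip.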
Experiment
Start with the sentiment dictionary of Kaji and Kitsuregawa (2007) (presented by Tyler): 10,000 words.
Clean it to remove bigrams and trigrams: 2,386 words remain.
Apply the method of Ku et al.:
Generate sentiment scores for the 954 Chinese characters.
Generate sentiment scores for the words in the dictionary.
Ignore magnitude: score each result by comparing the sign given by Kaji & Kitsuregawa with the sign of the program output.
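The scoring step can be sketched as follows. This follows our reading of the Ku et al. (2006) formulation: a character's score compares its normalized frequency in positive words with its normalized frequency in negative words, and a word's score is the mean of its character scores. Treat the exact normalization, the restriction to the CJK Unified Ideographs block, the toy gold scores, and every function name here as assumptions rather than the original implementation.

```python
from collections import Counter


def is_kanji(ch):
    """True for characters in the CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def character_scores(pos_words, neg_words):
    """Score each kanji by its normalized frequency in positive words
    versus in negative words; the result lies in [-1, 1]."""
    fp = Counter(ch for w in pos_words for ch in w if is_kanji(ch))
    fn = Counter(ch for w in neg_words for ch in w if is_kanji(ch))
    total_p = sum(fp.values()) or 1  # guard against an empty word list
    total_n = sum(fn.values()) or 1
    scores = {}
    for ch in fp.keys() | fn.keys():
        p = fp[ch] / total_p
        n = fn[ch] / total_n
        scores[ch] = (p - n) / (p + n)  # p + n > 0: ch occurs somewhere
    return scores


def word_score(word, scores):
    """Mean of the character scores; None if the word has no scored kanji."""
    values = [scores[ch] for ch in word if ch in scores]
    return sum(values) / len(values) if values else None


def sign(x):
    return (x > 0) - (x < 0)


def sign_accuracy(gold, scores):
    """Fraction of analyzable words whose predicted sign matches the gold sign."""
    hits = total = 0
    for word, gold_score in gold.items():
        predicted = word_score(word, scores)
        if predicted is None:  # e.g. words written entirely in kana
            continue
        total += 1
        hits += sign(predicted) == sign(gold_score)
    return hits / total if total else 0.0


# Toy gold scores (hypothetical); train and test on the same words.
gold = {"安心": 1.0, "好意": 0.8, "不安": -1.0, "悪意": -0.6}
pos = [w for w, s in gold.items() if s > 0]
neg = [w for w, s in gold.items() if s < 0]
scores = character_scores(pos, neg)
print(word_score("安心", scores))   # 0.5
print(sign_accuracy(gold, scores))  # 1.0
```

Because the character scores are estimated from the same words they are then used to score, the accuracy behaves like an upper bound for the method rather than a held-out result.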
Caveats
This is a proof of concept; there was insufficient time (and resources) to develop a new sentiment dictionary and/or perform an annotation study.
Training and testing use the same data.
Results are not comparable to other systems.
They should be interpreted as an upper bound on the performance of this method: we start with essentially perfect knowledge of the sentiment values of words, so our results should be near optimal for this method.
Results
Oh no! Weren’t we expecting poor results?
Detailed results for characters
Error Analysis
20% of the errors were selected for detailed analysis: 50 false positives and 50 false negatives.
These were further pruned so that only multi-character compounds were considered.
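The sampling and pruning step might look like the sketch below. The fixed random seed, the function names, the example words, and the reading of “multi-character compound” as “two or more kanji” are assumptions for illustration.

```python
import random


def is_kanji(ch):
    """True for characters in the CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def is_multi_kanji_compound(word):
    return len(word) >= 2 and all(is_kanji(ch) for ch in word)


def sample_for_analysis(false_pos, false_neg, k=50, seed=0):
    """Draw up to k errors of each kind, then keep only multi-kanji compounds."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    fp = rng.sample(false_pos, min(k, len(false_pos)))
    fn = rng.sample(false_neg, min(k, len(false_neg)))
    return ([w for w in fp if is_multi_kanji_compound(w)],
            [w for w in fn if is_multi_kanji_compound(w)])


fp, fn = sample_for_analysis(["無用", "不用", "すごい"], ["積極的", "楽しい"])
print(sorted(fp), fn)
```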
Error Analysis, False Positives
33% of errors are explained by a lack of compositional knowledge:
6.7% class 5 (negation)
27% class 3 (modification)
Error Analysis, False Negatives
54.8% of errors are explained by a lack of compositional knowledge:
3.2% class 5 (negation)
51.6% due to the suffix “的” (-teki, roughly “-ic” or “-al”)
Other errors
Script characters:
We can't analyze words made up entirely of script; 34.7% of all errors were due to this.
Words that mix script with characters may introduce additional noise.
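Deciding whether a word can be analyzed at all comes down to which scripts it uses. A minimal sketch, assuming the basic Unicode blocks for hiragana (U+3040-U+309F), katakana (U+30A0-U+30FF), and CJK Unified Ideographs (U+4E00-U+9FFF) are enough; it ignores rarer blocks such as the CJK extensions and the iteration mark 々, and the function names are hypothetical.

```python
def char_script(ch):
    """Classify one character by Unicode block."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def word_script(word):
    """Return 'kanji', 'kana', 'mixed', or 'other' for a whole word."""
    scripts = {char_script(ch) for ch in word}
    if not scripts or "other" in scripts:
        return "other"
    if scripts == {"kanji"}:
        return "kanji"
    if "kanji" in scripts:
        return "mixed"  # e.g. okurigana: a kanji stem plus a kana ending
    return "kana"       # hiragana and/or katakana only


for w in ["安心", "ありがとう", "美しい", "コーヒー"]:
    print(w, word_script(w))
```

Words classified as `kana` receive no character-based score at all, while `mixed` words are scored only on their kanji, which is one way the extra noise mentioned above can arise.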
Problems with source data
After cleaning, the dictionary still contained 4-5% bigrams.
Some data from Kaji & Kitsuregawa is unintuitive.
E.g. 無用 and 不用, both of which mean “useless”, yet
received high positive sentiment scores and showed
Evaluation of lexicon
500 adjective phrases were randomly selected from the Web.
After removing parse errors and duplicates, 405 unique phrases remained, with no overlap with the development set.
Balance (based on human annotation): 158 positive, 150 negative, 97 neutral.
Two annotators, kappa 0.73.
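Cohen's kappa corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is computed from each annotator's label distribution. The sketch below shows the computation on a toy pair of label lists; it does not reproduce the annotation behind the 0.73 figure, and the function name is our own.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


a = ["pos", "pos", "neg", "neu"]
b = ["pos", "neg", "neg", "neu"]
print(cohens_kappa(a, b))  # 7/11, about 0.636
```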
Baseline: Turney (2002), co-occurrence in a window
Turney used “excellent” and “poor”; they use 最高 (“best”) and 最低 (“worst”).
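Turney's (2002) semantic orientation of a phrase is PMI(phrase, positive anchor) minus PMI(phrase, negative anchor), which reduces to log2 of a ratio of co-occurrence counts. The sketch below counts co-occurrences within a token window in a corpus instead of querying a search engine, and uses 最高/最低 as the anchors. The default window of ten tokens (Turney's NEAR queries covered about ten words), the additive smoothing of 0.01, treating each phrase as a single token, the toy corpus, and the function names are assumptions.

```python
import math


def near_count(tokens, target, anchor, window):
    """Occurrences of target with anchor within `window` tokens on either side."""
    anchor_positions = [i for i, t in enumerate(tokens) if t == anchor]
    return sum(
        1
        for i, t in enumerate(tokens)
        if t == target and any(abs(i - j) <= window for j in anchor_positions)
    )


def semantic_orientation(tokens, phrase, pos="最高", neg="最低",
                         window=10, smooth=0.01):
    """Positive result: the phrase keeps company with `pos` more than `neg`."""
    near_pos = near_count(tokens, phrase, pos, window) + smooth
    near_neg = near_count(tokens, phrase, neg, window) + smooth
    hits_pos = tokens.count(pos) + smooth
    hits_neg = tokens.count(neg) + smooth
    return math.log2((near_pos * hits_neg) / (near_neg * hits_pos))


tokens = ["料理", "最高", "x", "x", "x", "サービス", "最低"]
print(semantic_orientation(tokens, "料理", window=1))      # positive
print(semantic_orientation(tokens, "サービス", window=1))  # negative
```

Dividing by the anchors' own counts keeps a frequent anchor from dominating; the smoothing constant only prevents division by zero and taking the log of zero.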
Conclusions
Overall, the results were good. As a proof of concept, this supports additional work in this area.
The hypothesis held in that approximately 60% of the errors were explainable in terms of missing linguistic knowledge.
Next steps
Perform a more rigorous study of this nature:
Use the Kaji & Kitsuregawa dictionary and do an annotation study to show the true performance of this approach.
Create a better sentiment dictionary and do the same; Kobayashi's Evaldic might be one resource.
Apply compositional features (it is unclear whether lexical data of this nature is available).
Apply word-based techniques to script characters.
Questions