Sentiment Analysis
Use this tool to determine the opinion expressed in an English document, using the Syuzhet package. Available in Excel with the XLSTAT software.
Description of the Sentiment Analysis
Sentiment analysis is the process of extracting an author's emotional intent from text (Ted Kwartler, 2017). Sentiment analysis allows you to label a comment, a book, or, more generally, any document. The document can be labeled as expressing a positive, negative, or neutral opinion.
When to use sentiment analysis?
Sentiment analysis helps companies understand customer reviews and feedback, and analyze comments on the web (such as tweets or posts) as well as political discussions. In general, sentiment analysis answers the question "How do people feel about something?".
What does sentiment analysis use?
Sentiment analysis uses a dictionary in which terms are scored or categorized by polarity (positive, negative, or neutral). Dictionaries use different scales, which is why XLSTAT offers four sentiment dictionaries to assign sentiment values to terms:
Sentiment analysis with Bing dictionary: 6789 English terms are labeled as "negative", "neutral" or "positive" in the Bing dictionary. A term labeled as "negative" gets a score of -1, a term labeled as "neutral" gets a score of 0, and a term labeled as "positive" gets a score of 1.
Sentiment analysis with Syuzhet dictionary: 10748 English terms are rated between -1 and 1 in the Syuzhet dictionary. A term is labeled as "negative" if its score is lower than 0, and as "positive" if its score is greater than 0.
Sentiment analysis with AFINN dictionary: 3382 English terms are rated between -5 and 5 (integers only) in the AFINN dictionary. A term is labeled as "negative" if its score is lower than 0, and as "positive" if its score is greater than 0.
Sentiment analysis with NRC dictionary (emotion scale): This dictionary labels 13901 English terms with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
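As a rough illustration of the scales described above, the following Python sketch converts a Bing-style label into a score and labels a Syuzhet or AFINN score by its sign. The function and variable names are hypothetical and are not part of XLSTAT or of the Syuzhet package; treating a score of exactly 0 as "neutral" is an assumption.

```python
# Hypothetical helpers for illustration only (not XLSTAT or Syuzhet code).
BING_LABEL_SCORES = {"negative": -1, "neutral": 0, "positive": 1}


def bing_score(label: str) -> int:
    """Convert a Bing-style label into its numeric score."""
    return BING_LABEL_SCORES[label.lower()]


def polarity(score: float) -> str:
    """Label a numeric score (Syuzhet or AFINN scale) by its sign."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"  # assumption: a score of exactly 0 is neutral


print(bing_score("positive"))  # 1
print(polarity(-0.75))         # negative
print(polarity(3))             # positive
```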
Besides a sentiment dictionary, sentiment analysis needs tokenized documents. XLSTAT suggests running the Feature extraction tool before the sentiment analysis to obtain the document-term matrix.
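To make the idea of a document-term matrix concrete, here is a minimal Python sketch. It is not the Feature extraction tool: the function name is hypothetical, and the tokenization (lowercasing and splitting on whitespace) is a deliberately naive assumption.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (terms, matrix): one row per document, one column per term."""
    # Naive tokenization (assumption): lowercase, then split on whitespace.
    # A real feature extraction step would also handle punctuation,
    # stop words, stemming, and so on.
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for term in terms] for c in counts]
    return terms, matrix


docs = ["good service good price", "bad service"]
terms, matrix = document_term_matrix(docs)
print(terms)   # ['bad', 'good', 'price', 'service']
print(matrix)  # [[0, 2, 1, 1], [1, 0, 0, 1]]
```

Each column holds the frequencies of one term in each document, which is the layout expected by the term frequency selection.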
How is the document score computed?
The score of each term present in the document is multiplied by its frequency in that document; these products are then summed to give the document score.
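The computation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not XLSTAT's implementation: the function name is hypothetical, the term scores are made-up values on an AFINN-like integer scale, and terms absent from the dictionary are assumed to score 0.

```python
def document_scores(terms, matrix, term_scores):
    """Sum (term score x term frequency) over the terms of each document."""
    scores = []
    for row in matrix:
        # Assumption: a term missing from the dictionary is neutral (score 0).
        total = sum(freq * term_scores.get(term, 0)
                    for term, freq in zip(terms, row))
        scores.append(total)
    return scores


terms = ["bad", "good", "price", "service"]
matrix = [[0, 2, 1, 1],   # document 1: "good" appears twice
          [1, 0, 0, 1]]   # document 2: "bad" appears once
term_scores = {"good": 3, "bad": -3}  # illustrative values only

print(document_scores(terms, matrix, term_scores))  # [6, -3]
```

In this example the first document scores 2 x 3 = 6 and the second scores 1 x (-3) = -3; the unscored terms contribute nothing.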
OPTIONS OF THE SENTIMENT ANALYSIS IN XLSTAT
Term frequencies: Select in this field the term frequency matrix. One column corresponds to the frequencies of one term in each document. If the "Column labels" option is activated, you need to include a header in the selection.
Sentiment dictionary: Choose among four sentiment dictionaries.
Custom scores: Select in this field two columns containing the terms and their scores. If you choose the Bing dictionary as the sentiment dictionary, the score must be entered as "negative", "neutral" or "positive". This option allows you to define the sentiment of a term independently of the dictionary previously selected. If the "Column labels" option is activated, you need to include a header in the selection. For this field, missing values are read as "neutral" or zero. Note: Not available for the NRC dictionary.
Term frequencies and scores: Activate this option to display a table showing the total frequency and the score of each term included in the term frequency selection. Note: Not available for the NRC dictionary.
Term frequencies and associated emotion: Activate this option to display a table showing the total frequency and the associated emotion of each term included in the term frequency selection. Note: Only available for the NRC dictionary.
- Display sentiment terms only: Activate this option to display only the sentiment terms. Terms with a neutral sentiment, which means their score is equal to zero or they are not associated with an emotion, are not displayed.
Overall emotion frequencies: Activate this option to display the total frequency of each emotion present in all documents. Note: Only available for the NRC dictionary.
Document scores: Activate this option to display a table showing the score of each document (row) according to the sentiment dictionary chosen in the General tab. Note: Not available for the NRC dictionary.
- Sort by score (descending): Activate this option to sort the document scores in descending order.
Emotion frequencies by document: Activate this option to display a table that indicates the frequency of each emotion in each document. Note: Only available for the NRC dictionary.
Result interpretations: Activate this option to display a short interpretation under the result tables.
Term frequencies: Activate this option to display a bar chart showing the total term frequencies.
- Minimum frequency: Enter the minimum frequency a term should have to be displayed in the term frequencies bar chart. We suggest increasing the minimum frequency when the number of terms increases.
Term scores: Activate this option to display a bar chart showing the term score.
Document scores: Activate this option to display a bar chart showing the document scores. If the Sort by score (descending) option is activated, the bar chart is also sorted.
Document scores distribution: Activate this option to display a histogram showing the distribution of the document scores.
Overall emotion frequencies: Activate this option to display a bar chart showing the total emotion frequencies. Note: Only available for the NRC dictionary.
Sentiment-based word cloud: Activate this option to display a word cloud where terms are colored according to their sentiment (positive, negative, or the associated emotion).
- Maximum terms: Enter the maximum number of terms to include in the sentiment-based word cloud.
Result interpretations: Activate this option to display a short interpretation under the charts.
RESULTS OF THE SENTIMENT ANALYSIS IN XLSTAT
Results regarding the document scores: The table and the chart associated with the document scores are displayed to give you a view of the sentiment of each document according to the sentiment dictionary scale. If the Sort by score (descending) option is not activated, you can see the evolution of the document scores, especially if the documents are entered chronologically.
Results regarding the document and the associated emotion: With the emotion scale (NRC), a table is displayed showing the frequency of each emotion present in a document. This table can be complemented with the document scores obtained with another sentiment dictionary, and lets you describe in plain words the sentiment and intensity of the opinion expressed in a document.
Results regarding the document scores distribution: The histogram shows how frequently each score value occurs. If the scores are centered at 0, it means that on average the documents contain many neutral words. On the other hand, if the scores are centered at a value higher (resp. lower) than 0, it means that on average each document contains at least one positive (resp. negative) word.
Results regarding the term frequencies: The table and the chart associated with the term frequencies are displayed to give you a view of the total frequency of each term, that is, the number of occurrences of the term across all documents.
Results regarding the term scores: The table and the chart associated with the term scores are displayed to give you a view of the sentiment of each term according to the sentiment dictionary scale. In the case of the emotion scale, a term can be associated with zero, one, or several emotions. Neutral terms have a blank cell in the "Score" column. Custom scores are shown in bold.
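For the emotion scale (NRC), the emotion frequency tables described above can be approximated with the following Python sketch. It assumes that the frequency of an emotion is the sum of the frequencies of the terms associated with it; this counting rule, the function name, and the term-to-emotion mapping are illustrative assumptions, not XLSTAT's documented behavior or actual NRC entries.

```python
from collections import Counter


def emotion_frequencies(terms, matrix, term_emotions):
    """Return (per_document, overall) emotion counts.

    Assumption: each occurrence of a term counts once for every
    emotion associated with that term.
    """
    per_document = []
    for row in matrix:
        counts = Counter()
        for term, freq in zip(terms, row):
            if freq == 0:
                continue  # term absent from this document
            for emotion in term_emotions.get(term, ()):
                counts[emotion] += freq
        per_document.append(counts)
    overall = sum(per_document, Counter())
    return per_document, overall


terms = ["happy", "afraid"]
matrix = [[2, 0],
          [1, 1]]
# Made-up mapping for illustration; not taken from the NRC dictionary.
term_emotions = {"happy": ["joy", "positive"],
                 "afraid": ["fear", "negative"]}

per_document, overall = emotion_frequencies(terms, matrix, term_emotions)
print(dict(per_document[0]))  # {'joy': 2, 'positive': 2}
print(overall["joy"])         # 3
```

A term associated with several emotions contributes to each of them, and a term with no associated emotion contributes to none.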