Sentiment Analysis — Supervised and Unsupervised Learning Comparison
Hi there, geeks! I hope you are all enjoying yourselves and learning something new every day. Today I will walk you through a project I recently did on sentiment analysis using both supervised and unsupervised machine learning techniques; later on I may follow up with some deep learning techniques as well.
First, I applied the following unsupervised learning techniques:
- AFINN Lexicon https://github.com/fnielsen/afinn/blob/master/afinn/data/
- SentiWordNet Lexicon http://sentiwordnet.isti.cnr.it
- VADER Lexicon https://github.com/cjhutto/vaderSentiment
I used the IMDB movie reviews dataset for this purpose, which can be found here: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?select=IMDB+Dataset.csv
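Assuming the CSV downloaded from Kaggle, loading it with pandas is straightforward (the column names below match the Kaggle file as I remember it, but double-check them against your download):

import pandas as pd

# the Kaggle CSV has two columns: 'review' (text) and 'sentiment' (positive/negative)
dataset = pd.read_csv('IMDB Dataset.csv')
reviews = dataset['review'].values
sentiments = dataset['sentiment'].values
print(dataset.shape)  # expected: (50000, 2)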
So, first things first: I cleaned the dataset by applying tokenization, lemmatization, and some other data-cleaning steps.
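To give an idea of what that cleaning looked like, here is a minimal sketch with NLTK. The function below is illustrative, not the exact project code; the real pipeline also strips HTML tags, accented characters, and contractions.

import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

# requires: nltk.download('punkt'), nltk.download('wordnet'), nltk.download('stopwords')
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def clean_review(text):
    text = re.sub(r'<[^>]+>', ' ', text)      # strip leftover HTML tags
    text = re.sub(r'[^a-zA-Z\s]', ' ', text)  # keep letters only
    tokens = nltk.word_tokenize(text.lower())
    # lemmatize and drop stopwords
    tokens = [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]
    return ' '.join(tokens)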
Sample reviews from the dataset are as follows —
The AFINN lexicon ships as the afinn Python library, which can be installed with pip and then applied easily.
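A minimal usage sketch (the cut-off of 1.0 for calling a review positive is my assumption here, not necessarily the exact threshold used in the project):

from afinn import Afinn

afn = Afinn(emoticons=True)

# afn.score returns a signed valence score for the whole text
print(afn.score('This movie was wonderful, I loved it :)'))  # positive score
print(afn.score('A dull, terrible waste of time.'))          # negative score

# turning raw scores into polarity labels (the 1.0 threshold is an assumption)
sample_reviews = ['This movie was wonderful, I loved it :)',
                  'A dull, terrible waste of time.']
predicted = ['positive' if afn.score(r) >= 1.0 else 'negative'
             for r in sample_reviews]
print(predicted)  # ['positive', 'negative']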
The performance of the AFINN lexicon is as follows —
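All the models in this post are compared on the same metrics. A minimal sketch of how such metrics can be computed with scikit-learn (not necessarily the exact evaluation code from the project; the toy label lists below stand in for the real ground truth and predictions):

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# stand-ins for the real dataset labels and the model's predictions
true_labels = ['positive', 'negative', 'positive', 'negative']
predicted_labels = ['positive', 'negative', 'negative', 'negative']

print('Accuracy:', accuracy_score(true_labels, predicted_labels))
print(classification_report(true_labels, predicted_labels))
print(confusion_matrix(true_labels, predicted_labels, labels=['positive', 'negative']))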
The model and performance of SentiWordNet are as follows —
import pandas as pd
from nltk.corpus import sentiwordnet as swn
import text_normalizer as tn  # the project's pre-processing helper module (provides nlp, strip_html_tags, etc.)

def analyze_sentiment_sentiwordnet_lexicon(review, verbose=False):
    # tokenize and POS tag text tokens
    tagged_text = [(token.text, token.tag_) for token in tn.nlp(review)]
    pos_score = neg_score = token_count = obj_score = 0
    # get wordnet synsets based on POS tags
    # get sentiment scores if synsets are found
    for word, tag in tagged_text:
        ss_set = None
        if 'NN' in tag and list(swn.senti_synsets(word, 'n')):
            ss_set = list(swn.senti_synsets(word, 'n'))[0]
        elif 'VB' in tag and list(swn.senti_synsets(word, 'v')):
            ss_set = list(swn.senti_synsets(word, 'v'))[0]
        elif 'JJ' in tag and list(swn.senti_synsets(word, 'a')):
            ss_set = list(swn.senti_synsets(word, 'a'))[0]
        elif 'RB' in tag and list(swn.senti_synsets(word, 'r')):
            ss_set = list(swn.senti_synsets(word, 'r'))[0]
        # if a senti-synset is found
        if ss_set:
            # add scores for the first matching synset
            pos_score += ss_set.pos_score()
            neg_score += ss_set.neg_score()
            obj_score += ss_set.obj_score()
            token_count += 1
    # aggregate final scores (guard against division by zero
    # for reviews where no synsets were matched at all)
    token_count = max(token_count, 1)
    final_score = pos_score - neg_score
    norm_final_score = round(float(final_score) / token_count, 2)
    final_sentiment = 'positive' if norm_final_score >= 0 else 'negative'
    if verbose:
        norm_obj_score = round(float(obj_score) / token_count, 2)
        norm_pos_score = round(float(pos_score) / token_count, 2)
        norm_neg_score = round(float(neg_score) / token_count, 2)
        # display results in a nice table
        # (codes= replaces the labels= argument removed in newer pandas)
        sentiment_frame = pd.DataFrame([[final_sentiment, norm_obj_score, norm_pos_score,
                                         norm_neg_score, norm_final_score]],
                                       columns=pd.MultiIndex(levels=[['SENTIMENT STATS:'],
                                                                     ['Predicted Sentiment', 'Objectivity',
                                                                      'Positive', 'Negative', 'Overall']],
                                                             codes=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]]))
        print(sentiment_frame)
    return final_sentiment
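For example, calling it on a single review with verbose output prints the score breakdown:

sample_review = 'The acting was superb, but the plot felt painfully slow.'
print(analyze_sentiment_sentiwordnet_lexicon(sample_review, verbose=True))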
The model and performance of VADER are as follows —
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def analyze_sentiment_vader_lexicon(review, threshold=0.1, verbose=False):
    # pre-process text
    review = tn.strip_html_tags(review)
    review = tn.remove_accented_chars(review)
    review = tn.expand_contractions(review)
    # analyze the sentiment of the review
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(review)
    # get aggregate scores and final sentiment
    agg_score = scores['compound']
    final_sentiment = 'positive' if agg_score >= threshold else 'negative'
    if verbose:
        # display detailed sentiment statistics
        positive = str(round(scores['pos'], 2) * 100) + '%'
        final = round(agg_score, 2)
        negative = str(round(scores['neg'], 2) * 100) + '%'
        neutral = str(round(scores['neu'], 2) * 100) + '%'
        sentiment_frame = pd.DataFrame([[final_sentiment, final, positive,
                                         negative, neutral]],
                                       columns=pd.MultiIndex(levels=[['SENTIMENT STATS:'],
                                                                     ['Predicted Sentiment', 'Polarity Score',
                                                                      'Positive', 'Negative', 'Neutral']],
                                                             codes=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]]))
        print(sentiment_frame)
    return final_sentiment
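Called on the same sample review:

sample_review = 'The acting was superb, but the plot felt painfully slow.'
print(analyze_sentiment_vader_lexicon(sample_review, threshold=0.1, verbose=True))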
Here we can easily see that the F1 score of VADER is superior to that of the other two models. It does better because VADER uses a sentiment lexicon, i.e. a list of lexical features (e.g., words) labelled according to their semantic orientation as positive or negative, in combination with rules that account for grammar and intensity.
Now we will look at the metrics of the various supervised learning models (a training sketch follows the list) —
1. Logistic Regression Evaluation Metrics —
2. SVM on BOW Evaluation Metrics —
3. SVM on TF-IDF Evaluation Metrics —
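A minimal sketch of how these three models can be set up with scikit-learn. The tiny train/test lists are placeholders for the real cleaned IMDB splits, and the hyperparameters shown are illustrative defaults, not the tuned ones.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# placeholders for the real cleaned IMDB train/test splits
train_reviews = ['a wonderful heartfelt film', 'a dull terrible bore']
train_labels = ['positive', 'negative']
test_reviews = ['truly wonderful acting', 'what a boring mess']
test_labels = ['positive', 'negative']

# Bag-of-Words and TF-IDF features
cv = CountVectorizer(ngram_range=(1, 2))
tv = TfidfVectorizer(ngram_range=(1, 2))
cv_train, cv_test = cv.fit_transform(train_reviews), cv.transform(test_reviews)
tv_train, tv_test = tv.fit_transform(train_reviews), tv.transform(test_reviews)

# 1. Logistic Regression on BOW
lr = LogisticRegression(max_iter=1000)
lr.fit(cv_train, train_labels)
print(classification_report(test_labels, lr.predict(cv_test)))

# 2. Linear SVM on BOW
svm_bow = LinearSVC()
svm_bow.fit(cv_train, train_labels)
print(classification_report(test_labels, svm_bow.predict(cv_test)))

# 3. Linear SVM on TF-IDF
svm_tfidf = LinearSVC()
svm_tfidf.fit(tv_train, train_labels)
print(classification_report(test_labels, svm_tfidf.predict(tv_test)))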
Results and Conclusion —
So here we can clearly see that the supervised learning techniques surpassed the unsupervised ones, achieving a higher F1-score.
The reason is clear: in supervised learning we train on real, human-assigned labels, whereas the unsupervised lexicon-based approaches never see the ground truth and must infer the sentiment labels from the text itself before we can score them against the dataset.
Thank you all for reading the article.
You can contact me at uditdeo1670@gmail.com for the code.