Sentiment Analysis — Supervised and Unsupervised Learning Comparison

Hi there, geeks! I hope you are all enjoying yourselves and learning something new every day. Today I will demonstrate a project I recently completed on sentiment analysis, using both supervised and unsupervised machine learning techniques; in a later post I may also cover some deep learning approaches.

First, I applied unsupervised learning techniques. They are as follows:

  1. AFINN Lexicon
  2. SentiWordNet Lexicon
  3. VADER Lexicon

I used the movies_reviews dataset for this purpose, which can be found here.

So, first things first: I cleaned the dataset, applying tokenization, lemmatization, and other text pre-processing steps.
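As a rough, minimal sketch of the kind of cleaning involved (a hypothetical helper with a tiny stop-word set; my actual pipeline used fuller NLTK/spaCy utilities, including lemmatization, which is omitted here):

```python
import re

def clean_review(text, stopwords=frozenset({"the", "a", "an", "is", "it"})):
    """Minimal text-cleaning sketch: strip HTML tags, lowercase,
    keep alphabetic tokens, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)           # remove HTML tags
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase + tokenize
    return [t for t in tokens if t not in stopwords]

print(clean_review("<br />The movie is brilliant, it was a delight!"))
# ['movie', 'brilliant', 'was', 'delight']
```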

Sample reviews are as follows:

Movies Reviews

The AFINN library can be installed using pip and then applied easily.
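For intuition, AFINN simply assigns each word an integer valence from -5 to +5 and sums the scores over the text. A toy version of that idea (tiny made-up sample lexicon; the real library exposes `Afinn().score(text)` over a lexicon of a few thousand scored words):

```python
# Toy AFINN-style scorer with a tiny hand-picked sample of word valences.
AFINN_SAMPLE = {"wonderful": 4, "great": 3, "bad": -3, "terrible": -3, "boring": -2}

def afinn_style_score(text):
    """Sum the valence of every known word; unknown words score 0."""
    return sum(AFINN_SAMPLE.get(w, 0) for w in text.lower().split())

score = afinn_style_score("a wonderful but slightly boring film")
sentiment = "positive" if score >= 0 else "negative"
print(score, sentiment)  # 2 positive
```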

The performance of the AFINN lexicon is as follows —

Afinn Metrics

The model and performance of SentiWordNet are as follows —

(Here `tn` is my text-normalization helper module with a loaded spaCy pipeline `tn.nlp`, `swn` is NLTK's SentiWordNet interface, and `pd` is pandas. The function signature and the MultiIndex call were truncated in the original post; I have completed them in the natural way.)

```python
import pandas as pd
from nltk.corpus import sentiwordnet as swn
import text_normalizer as tn  # helper module; tn.nlp is a loaded spaCy pipeline


def analyze_sentiment_sentiwordnet_lexicon(review, verbose=False):
    # tokenize and POS-tag text tokens
    tagged_text = [(token.text, token.tag_) for token in tn.nlp(review)]
    pos_score = neg_score = token_count = obj_score = 0
    # get WordNet synsets based on POS tags
    # get sentiment scores if synsets are found
    for word, tag in tagged_text:
        ss_set = None
        if 'NN' in tag and list(swn.senti_synsets(word, 'n')):
            ss_set = list(swn.senti_synsets(word, 'n'))[0]
        elif 'VB' in tag and list(swn.senti_synsets(word, 'v')):
            ss_set = list(swn.senti_synsets(word, 'v'))[0]
        elif 'JJ' in tag and list(swn.senti_synsets(word, 'a')):
            ss_set = list(swn.senti_synsets(word, 'a'))[0]
        elif 'RB' in tag and list(swn.senti_synsets(word, 'r')):
            ss_set = list(swn.senti_synsets(word, 'r'))[0]
        # if a senti-synset is found
        if ss_set:
            # add scores for all found synsets
            pos_score += ss_set.pos_score()
            neg_score += ss_set.neg_score()
            obj_score += ss_set.obj_score()
            token_count += 1
    # aggregate final scores
    final_score = pos_score - neg_score
    norm_final_score = round(float(final_score) / token_count, 2)
    final_sentiment = 'positive' if norm_final_score >= 0 else 'negative'
    if verbose:
        norm_obj_score = round(float(obj_score) / token_count, 2)
        norm_pos_score = round(float(pos_score) / token_count, 2)
        norm_neg_score = round(float(neg_score) / token_count, 2)
        # display results in a nice table
        sentiment_frame = pd.DataFrame(
            [[final_sentiment, norm_obj_score, norm_pos_score,
              norm_neg_score, norm_final_score]],
            columns=pd.MultiIndex(
                levels=[['SENTIMENT STATS:'],
                        ['Predicted Sentiment', 'Objectivity',
                         'Positive', 'Negative', 'Overall']],
                codes=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]]))
        print(sentiment_frame)
    return final_sentiment
```


The model and performance of VADER are as follows —

(As above, `tn` is my text-normalization helper module. The signature was truncated in the original post; I assume a compound-score `threshold` parameter, since it is referenced in the body, with an illustrative default of 0.1.)

```python
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import text_normalizer as tn  # helper module for text cleanup


def analyze_sentiment_vader_lexicon(review, threshold=0.1, verbose=False):
    # pre-process text
    review = tn.strip_html_tags(review)
    review = tn.remove_accented_chars(review)
    review = tn.expand_contractions(review)
    # analyze the sentiment for review
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(review)
    # get aggregate scores and final sentiment
    agg_score = scores['compound']
    final_sentiment = 'positive' if agg_score >= threshold \
                      else 'negative'
    if verbose:
        # display detailed sentiment statistics
        positive = str(round(scores['pos'], 2) * 100) + '%'
        final = round(agg_score, 2)
        negative = str(round(scores['neg'], 2) * 100) + '%'
        neutral = str(round(scores['neu'], 2) * 100) + '%'
        sentiment_frame = pd.DataFrame(
            [[final_sentiment, final, positive,
              negative, neutral]],
            columns=pd.MultiIndex(
                levels=[['SENTIMENT STATS:'],
                        ['Predicted Sentiment', 'Polarity Score',
                         'Positive', 'Negative', 'Neutral']],
                codes=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]]))
        print(sentiment_frame)
    return final_sentiment
```

Vader Metrics

Here we can easily see that the F1-score of VADER is superior to that of the other two lexicons. It does better because VADER combines a sentiment lexicon (a list of lexical features, e.g., words, each scored for sentiment intensity) with rules that account for punctuation, capitalization, degree modifiers, and negation.

Now we will look at the metrics of the various supervised learning models —

  1. Logistic Regression Evaluation Metrics —
Logistic Metrics

  2. SVM on BOW Evaluation Metrics —

  3. SVM on TF-IDF Evaluation Metrics —
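For reference, the three supervised models above can be sketched as scikit-learn pipelines. This is a toy illustration with a made-up four-review corpus and assumed hyperparameters; the real project trained on the full movie reviews dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# toy stand-in corpus; the real models were trained on the movie reviews
train_texts = ["a wonderful film", "truly terrible acting",
               "great plot", "boring and bad"]
train_labels = ["positive", "negative", "positive", "negative"]

# the three supervised setups: LR on BOW, SVM on BOW, SVM on TF-IDF
lr_bow = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
svm_bow = make_pipeline(CountVectorizer(), LinearSVC())
svm_tfidf = make_pipeline(TfidfVectorizer(), LinearSVC())

for model in (lr_bow, svm_bow, svm_tfidf):
    model.fit(train_texts, train_labels)

print(svm_tfidf.predict(["wonderful plot"])[0])  # positive
```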


Results and Conclusion —

So here we can easily see that the supervised learning techniques surpassed the unsupervised ones, achieving noticeably higher F1-scores.

The reason is clear: in supervised learning we train on real, human-annotated labels, whereas in unsupervised learning the models must predict sentiment without ever seeing labeled examples.

Thank you all for reading the article.

You can contact me for the code.

Hi there, I am Udit, and I love to deep-dive into CP, ML, DL, and NLP.
