
Sentiment Analysis of Tweets Using Power BI



Guest Abisola_Agboola
Posted

[Image: the finished dashboard]

 

 

 

By following the step-by-step guide in this article, you will be able to build the dashboard shown in the image above.

 

 

 

Outline

 

  • Introduction
  • The Workflow
  • Data Gathering and Transformation
  • Sentiment Analysis
  • Data Modelling in Power BI
  • Data Analysis
  • Recommendation
  • Conclusion

 

 

 

Introduction

 

 

As a data analyst, you will sometimes work with data from secondary sources such as social media (for example, Twitter). Data like this is typically obtained through web scraping. A simple use case: a business wants to understand how customers perceive and feel about its brand, based on their activity on Twitter. To get the data for the analysis, you first have to scrape it, then clean it, analyze it, and finally use a visualization tool to present it to the business.

 

 

 

This project is a collaboration between myself and Flora Oladipupo (@Flora_Oladipupo). We are both Beta Microsoft Learn Student Ambassadors. This article contains embedded links to Part 1 of this work (scraping and transforming the Twitter data using Python).

 

 

The Workflow

 

 

This analysis is not meant to predict the outcome of the Nigeria 2023 presidential election. It only shows details and sentiments surrounding the election, based solely on people's tweets about the top 3 presidential candidates.

 

 

 

Identifying the process and the steps taken at each stage is important for producing a meaningful, useful report that delivers interactive insights.

 

The Documentation includes:

 

  • Data Gathering and Transformation
  • Data Modelling
  • Recommendation
  • Conclusion

 

 

 

Data Gathering and Transformation

 

 

This was accomplished by my collaboration partner, @Flora_Oladipupo.

 

 

 

Check out the article How to Scrape Twitter Data for Sentiment Analysis with Python and Power BI for a full guide on how to scrape and clean the Twitter data using Python. You will also learn how to export the data to Excel/CSV for visualization.

 

 

 

The cleaned data has 58,633 records. The columns include Date, ID, URL, Username, Location, Tweet, Source, number of likes, number of retweets, processed tweets, Sentiment, candidate name (Peter Obi, Tinubu and Atiku) and party name (PDP, APC and LP), which allow me to properly visualize and slice the data.

 

 

 

Because the data had been pre-processed with Python, I only had to remove duplicate tweets, create some additional columns, and connect the data to a date table.

 

 

 

 

Sentiment Analysis

 

 

My partner @Flora_Oladipupo had already done this in Python using the TextBlob library. Check it out in How to Scrape Twitter Data for Sentiment Analysis with Python and Power BI. The dataset I received has a column called Sentiment, which holds the type of sentiment expressed in each tweet: positive, negative, or neutral. Positive represents a good sentiment, negative represents a bad sentiment, and neutral indicates no interest. A donut chart was plotted to represent the sentiment analysis.
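
As a rough illustration of how a polarity score can become the labels in the Sentiment column, here is a minimal Python sketch. It assumes a polarity score between -1 and 1 has already been computed for each tweet (TextBlob exposes one as `sentiment.polarity`). The zero cut-off, the `label_sentiment` helper and the sample scores are assumptions for illustration, not necessarily the exact rule used in Part 1.

```python
from collections import Counter


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (assumed cut-off at 0)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


# Hypothetical polarity scores, one per tweet.
scores = [0.8, -0.4, 0.0, 0.25, -0.1]
labels = [label_sentiment(score) for score in scores]

# Counts per label: the numbers a donut chart would be built from.
sentiment_counts = Counter(labels)
print(sentiment_counts)
```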

 

 

 

Importing the Data to Power BI Desktop

 

 

Watch this video to import the data into Power BI Desktop

 

 

 

 

 

 

 

 

 

 

Summary of the Transformations Carried Out in the Video

 

After the data set was extracted, a lot of work went into removing outliers and duplicates so that the visualizations and insights would be reliable. Here are some of the steps taken.

 

  • Replacing blank locations with Unknown: A total of 16,603 empty values were found in the Location column. These were replaced with Unknown, since those users did not provide their location.
  • Extracting new columns: New columns were derived from the existing columns. The extracted columns include:
  • Year and day: These were extracted from the original Date column into new columns.
  • Month: The month was also extracted from the Date column, but it came out as a number. A new column was created to map each month number to its name, using a conditional column.
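
For readers who prefer to see the clean-up steps as code, the sketch below mirrors them in plain Python rather than in Power Query. The column names, the timestamp format and the sample rows are assumptions made for illustration. The real steps were done through the Power BI interface.

```python
from datetime import datetime

# Hypothetical rows; the real column names and date format may differ.
rows = [
    {"Date": "2022-09-14 15:02:11", "Location": "Lagos", "Tweet": "tweet a"},
    {"Date": "2022-09-14 15:02:11", "Location": "Lagos", "Tweet": "tweet a"},
    {"Date": "2022-10-01 08:30:00", "Location": "  ", "Tweet": "tweet b"},
]


def clean(rows):
    """Drop duplicate tweets, fill blank locations, and add date-part columns."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["Tweet"] in seen:  # skip tweets we have already kept
            continue
        seen.add(row["Tweet"])
        parsed = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
        cleaned.append({
            **row,
            # Blank or whitespace-only locations become "Unknown".
            "Location": row["Location"].strip() or "Unknown",
            "Year": parsed.year,
            "Month": parsed.month,  # still a number at this point
            "Day": parsed.day,
        })
    return cleaned


cleaned_rows = clean(rows)
print(cleaned_rows)
```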

 

 

 

Conditional Column

 

 

 

 

 

 

 

 

 

Steps Taken in the Video

 

 

 

 

[Image: Step 1 screenshot]

 

 

 

STEP 1: The same steps used above were also used to extract the month. The month number then needs to be renamed, which was done using a conditional column, as shown below.

 

 

 

[Image: Step 2 screenshot]

 

 

 

STEP 2: An if statement was used to map each month number to its name, as shown above.

 

 

 

[Image: Step 3 screenshot]

 

 

 

STEP 3: The outcome of the conditional column is shown above.
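
The month renaming in Steps 1 to 3 amounts to a lookup from month number to month name. A minimal Python sketch of the same idea follows. The `month_name` helper is hypothetical, and it uses a tuple lookup in place of the chain of if conditions built in the Power BI dialog.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_name(month_number: int) -> str:
    """Return the English month name for a month number from 1 to 12."""
    if not 1 <= month_number <= 12:
        raise ValueError(f"month number out of range: {month_number}")
    return MONTH_NAMES[month_number - 1]


print(month_name(9))
```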

 

 

 

  • Time: The time was also extracted from the Date column by creating a new column.
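
To make the time extraction concrete, here is a small Python sketch that turns a timestamp into an hour label and counts tweets per hour. The timestamp format, the `hour_label` helper and the sample values are assumptions for illustration. In the article this step was done in Power BI.

```python
from collections import Counter
from datetime import datetime


def hour_label(timestamp: str) -> str:
    """Return a 12-hour label such as '3pm' for a 'YYYY-MM-DD HH:MM:SS' string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    hour12 = parsed.hour % 12 or 12  # 0 and 12 both display as 12
    suffix = "am" if parsed.hour < 12 else "pm"
    return f"{hour12}{suffix}"


# Hypothetical timestamps.
timestamps = [
    "2022-09-14 15:02:11",
    "2022-09-14 15:45:00",
    "2022-09-15 08:05:30",
]
tweets_per_hour = Counter(hour_label(ts) for ts in timestamps)
print(tweets_per_hour.most_common())
```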

 

 

 

Data Modelling

 

 

After the data was transformed and loaded into Power BI Desktop, the design assets needed for the visualizations were downloaded. A new date table was created in order to build a relationship with the available data set. After creating the relationship, I proceeded to build my dashboard. Green and white are Nigeria's national colors, which is why they were used on the dashboard.

 

The total number of likes, retweets and tweets, and the number of tweets mentioning the Labour Party, PDP, and APC, were all visualized using cards.

 

 

 

Steps taken for data modelling are as follows:

 

 

 

Step One: Create a new table in the Data view.

 

[Image: Step One screenshot]

 

 

 

Step Two: Rename the table and write out your formula (Date = CALENDARAUTO()). It automatically fills in the dates for you, from the start date to the end date.

 

[Image: Step Two screenshot]

 

 

 

Step Three: Enter the formula. The result is shown below.

 

[Image: Step Three screenshot]

 

 

 

Step Four: Create the model and relationship. The model looks like this:

 

[Image: the data model]
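
To show what the CALENDARAUTO formula from Step Two produces, here is a rough Python equivalent. With its default settings, CALENDARAUTO returns one contiguous run of dates covering whole calendar years, from 1 January of the earliest year found in the model to 31 December of the latest. The `calendar_auto` function and the sample dates are illustrative assumptions. The sketch only mimics that behavior and is not how Power BI implements it.

```python
from datetime import date, timedelta


def calendar_auto(dates):
    """Return every date from 1 Jan of the earliest year to 31 Dec of the latest."""
    start = date(min(dates).year, 1, 1)
    end = date(max(dates).year, 12, 31)
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


# Hypothetical tweet dates.
tweet_dates = [date(2022, 9, 14), date(2022, 10, 1), date(2022, 11, 20)]
date_table = calendar_auto(tweet_dates)
print(date_table[0], date_table[-1], len(date_table))
```

Because the date table has exactly one row per day, it can sit on the "one" side of a one-to-many relationship with the tweet table.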

 

 

 

 

 

Visualizing the Data in Power BI

 

 

Below are the steps taken to visualize the data.

 

 

 

Sentiment Analysis

 

 

 

 

 

 

 

 

 

Word Cloud

 

 

 

 

 

 

 

 

 

Tweet by Month

 

 

 

 

 

 

 

 

 

Results of the Analysis

 

 

 

 

A visualization of the top 3 tweet sources shows that 69.68% of tweets came from Android users, followed by iPhone users with 20.24%, while Twitter for web accounted for 10.09%.

 

 

 

[Image: tweets by source]
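
The percentages for the top 3 sources can be reproduced with a short calculation. The sketch below assumes the shares are computed among the top 3 sources only, which would explain why the three figures add up to roughly 100%. The `top_share` helper, the source labels and the sample counts are made up for illustration.

```python
from collections import Counter


def top_share(values, n=3):
    """Return (value, percent) for the n most common values, as a share of those n."""
    top = Counter(values).most_common(n)
    total = sum(count for _, count in top)
    return [(value, round(100 * count / total, 2)) for value, count in top]


# Hypothetical source column.
sources = (
    ["Twitter for Android"] * 7
    + ["Twitter for iPhone"] * 2
    + ["Twitter Web App"] * 1
)
print(top_share(sources))
```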

 

 

 

 

 

Most talked about words: This visualization shows the words that appeared most often in the tweets. Peter Obi and Labour Party had the highest counts. Stop words were used to filter out some words so that the meaningful ones were visible.
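
A word cloud is driven by word frequencies with stop words filtered out. The following Python sketch shows that idea on two made-up tweets. The tiny stop-word list and the `word_frequencies` helper are assumptions for illustration. In the dashboard, the Power BI word cloud visual handles this.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "and", "for", "to", "of", "in"}


def word_frequencies(tweets, stop_words=STOP_WORDS):
    """Count lowercase words across tweets, skipping stop words."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


# Hypothetical tweets.
tweets = [
    "Peter Obi is the candidate for the Labour Party",
    "Obi and the Labour Party in Lagos",
]
frequencies = word_frequencies(tweets)
print(frequencies.most_common(3))
```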

 

Also, a visual of the most talked about candidates shows that Peter Obi had the highest number with a total of 67k, followed by Atiku with 7k and Tinubu with 2k.

 

An analysis of tweets by time was also done to find out when users tweet the most. The result shows that most users tweet at 3pm, 9pm and 8am.

 

 

 

Visualization of the top 5 locations the tweets came from: Unknown had the highest count but was excluded because it gives no precise location. Among the remaining values, Lagos had the highest count. Next came users who entered Nigeria as their location, which was also excluded. Abuja was second, followed by Port Harcourt. United Kingdom came next but was also excluded. Enugu was fourth and Ibadan was fifth.

 

 

 

[Image: top 5 tweet locations]
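
The location ranking combines a count with an exclusion list. A minimal Python sketch of that logic follows. The excluded values come from the description above, while the `top_locations` helper and the sample counts are invented for illustration.

```python
from collections import Counter

# Values excluded because they do not name a precise location.
EXCLUDED = {"unknown", "nigeria", "united kingdom"}


def top_locations(locations, n=5, excluded=EXCLUDED):
    """Return the n most common locations, ignoring excluded values."""
    counts = Counter(
        location.strip()
        for location in locations
        if location.strip().lower() not in excluded
    )
    return [location for location, _ in counts.most_common(n)]


# Hypothetical location column.
locations = (
    ["Unknown"] * 9 + ["Lagos"] * 6 + ["Nigeria"] * 8 + ["Abuja"] * 5
    + ["Port Harcourt"] * 4 + ["United Kingdom"] * 7 + ["Enugu"] * 3
    + ["Ibadan"] * 2 + ["Kano"] * 1
)
print(top_locations(locations))
```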

 

 

 

Recommendation

 

 

Based on the results of the sentiment analysis, people are encouraged to talk more positively about the election, and they should not be indifferent to it, since the election will affect them. The sentiment analysis shows that 51.95% of tweets were positive, 20.69% were neutral and 17.35% were negative.

 

 

 

Conclusion

 

 

This tutorial shows the impact of sentiment analysis in politics. The use case extends beyond politics, though: businesses can apply it to customer reviews to determine customer sentiment, letting the business owner know how customers perceive the business. Using the right tool to analyze sentiment is as important as getting the intended result, and combining Python with Power BI is a good way to accomplish that.

 

 

 

Resources

Power BI Learning Overview | Microsoft Power BI

Azure for Students – Free Account Credit | Microsoft Azure

 
