H
HadilBENAmor
Data Analytics
Analytics can transform raw data into an extensive collection of information that categorizes data to identify and analyze behavioral data
and patterns. Data analytics consists of converting raw data into actionable insights.
Although having data analytics competencies may be required for some job and optional for others, it makes all data related work easier. If you are a data scientist, for example, a quick and informative analysis could give you an idea related to preprocessing and to the modeling algorithm.
Data analytics has several benefits for different segments such as:
- Organization: By examining historical data and recognizing trends, businesses can gain a better understanding of their customers, market conditions and products. Analytics has capacity to help with informed decision-making leading to cost effectiveness.
- Developers: data analytics help in getting insights into how to develop the business.
- Product: data analytics may help in product design and feature development.
Previously, data analytics was a complex task. However, thanks to technological advancement, new tools were developed to support accessibility and its comprehension by, many people across all teams despite their technical backgrounds. One of these tools is Power BI developed by Microsoft.
Power BI for Data Analysis
Power BI is an exceptional tool for quickly pulling actionable insights from data. It allows you to build visuals (as well as specifying new measure metrics) for your data in reports and dashboards which could be shared to gather insights at a high level to get more detailed information. We have:
- Power BI Desktop: Desktop is a complete data analysis and report creation tool that is used to connect to, transform, visualize, and analyze data.
- Power BI service: a cloud-based service, or software as a service (SaaS). Teams and organizations should use it because it facilitates report editing and collaboration. You can connect to data sources in the Power BI service, too, but modeling is limited.
In this blog, we are going to work with Power BI Desktop: Download Power BI Desktop
The data can be downloaded from: Financial Data or imported directly from Power BI Desktop available samples.
Analyzing data has been always correlated with statistics that show distribution or helps in detecting outliers, for example. Exploring the statistical summary gives you a high-level view of the available data, where you can remarque clusters, discover patterns on behavioral data, calculate data averages, min, max and more. Based on this need, Power BI has many functions that guide in conducting a statistical analysis, such as Data Analysis Expressions (DAX) functions and visuals (histograms and bell curves…).
The list below presents some types of visualizations:
- Histograms can be used to depict the frequency distribution of variables in a dataset. For example, we use the column chart visual to present a histogram that determines the sum of sales per country.
2. Charts Bar or column chart visual in Power BI relates two data points: a measure and a dimension. It only visualizes a single data point and to compare discrete or categorical variables in a graphical format.
Histograms and bell curves (distribution charts) are the most common way to display statistics about the semantic models. In Power BI, you can represent a histogram with one of the bar or column chart visuals and represent a bell curve with an area chart visual, as illustrated in the following image.
3. Statistical functions| Data Analysis Expressions (DAX): calculate values related to statistical distributions and probability, such as standard deviation (StdevP) and max (Max) ( statistical_functions and function-aggregates)
TOPN: one of the most known DAX functions. It returns the top N rows of a specified table(dataset). The Top N analysis is a great way to present data that might be important, such as the top 10 selling products, top 3 employees in an organization, or top 1 dangerous contamination. On the other hand, it may present the bottom N items in a list such as the worst sellers. It depends on your perspective and business requirements. In this example, we visualized respectively the top 10 countries by sales and top 10 countries by discounts
Also, you can apply a new customized filter in the Filter section. In this example, we visualized the countries with Sum of Gross Sales (Variable) greater than (Operation) 25M (threshold).
Outliers Identification with Power BI Visuals
We define an outlier as a type of anomaly in the data, something unexpected or surprising, based on historical averages or previous normal results. It is important to identify outliers to isolate data points that significantly differ from other data points to not bias the future model and insights. Then, we need to take action to investigate the reasons for the presence of those outliers. The results of this analysis can make a significant impact on business decision making.
Let’s consider our scenario, where we are analyzing units sold by country. The countries that stand out in terms of units sold are the ones we want to take note of.
To that point, Power BI allows you to identify outliers in your data. The process involves:
- Segmenting data into two groups: the outlier data | normal data
- Use calculated columns to identify outliers à results would be static until à refresh the data.
Solution: use a visualization or DAX function. These methods will ensure that your results are dynamic.
- After identifying outliers, you can use slicers or filters to highlight them.
- Add a legend to your visuals so that the outliers can be identified among the other data.
- Dive deeper into the reasons for outlier’s presence to gain more insights.
For example, in our case sum of units sold and discounts by country. Why has the sum of units sold in the country been different from others? What are the reasons behind the difference? Was there any inflation or economic perturbation during a specific period in this country? Why, despite discounts, was the sum of units sold not important? What is the specificity that may impact the value of the units sold?
We can also use a DAX function to add information related to variance.
Clustering Techniques in Power BI
Clustering is used to identify groups of similar objects in datasets with two or more variable quantities. It outputs a segment (cluster) of data that is like each other but dissimilar to the rest of the data.
Spoiler (Highlight to read)
There is always confusion between Grouping and Clustering. Clusters are formed by similarity (unsupervised technique). However, Groups are organized based on common characteristics (supervised method)
There is always confusion between Grouping and Clustering. Clusters are formed by similarity (unsupervised technique). However, Groups are organized based on common characteristics (supervised method)
The Power BI clustering feature allows you to analyze your semantic model to identify similarities and dissimilarities in the attribute values, and then it separates the data that has similarities into a subset of the data. These subsets of data are referred to as clusters.
In our example, we look for patterns in our financial data, such as sales overview. We segmented the countries into clusters according to their similarities: Sum of units sold by segment.
- Start by adding the scatter chart visualization to the report.
- Add the required fields to the visual. In this example: sum of units sold field to the x-axis, the sum of sales Sales field to the y-axis and Segment to the Legend field. The following image shows clusters in the scatter chart, so it is difficult to discern any natural groups. Here we plot sum of units sold and sum of sales by segment.
Time Series Analysis with Power BI
For as long as we have been recording data, time has been a crucial factor. In time series analysis, time is a significant variable of the data. Times series analysis helps us study our inputs and learn how we progress within it
Dynamic data.
Time series analysis often involves visuals like Gantt charts, project planning, and stock movement semantic models. In Power BI, you can use visuals to view how the data is progressing over time, which in turn allows you to make observations like whether any significant occurrences affected your data.
Suitable visualizations in Power BI for Time Series analysis: line chart, area chart, or scatter chart because they are particularly useful for representing cumulative data over time and can be customized to highlight specific aspects of the time series.
Additionally, Microsoft AppSource has an animation custom visual called Play Axis that works like a dynamic slicer and is a compelling way to display time trends and patterns in your data without user interaction. In our example we:
- Add a scatter visual to the report page to show the sales data by product during the months.
- Import the animation custom visual from Appsource to use with the visuals.
Visualizations pane
Get more visuals icon
On the Power BI Visuals window search Play axis Add Play Axis (Dynamic Slicer) visual.
- Select the field Quarter (e.g., Month) that you want to use as the slicer in the Play Axis animation.
- Animation controls become available on the visual. An animation will be displayed such showing in our examples:
Analyze Feature in Power BI
The Analyze feature provides you with additional analysis that is generated by Power BI for a selected data point. This feature is useful to discover the insights provided by PowerBI that you may miss. It may be considered as a starting point for analyzing why your data distribution looks the way that it does.
Instead of exploring the data manually, you can use the Analyze feature to get fast, automated, insightful analysis of your data. To use the Analyze feature:
Click a data point on the visual à hover over the Analyze option to display two further options depending on the data point selected:
- Explain the increase: when your focus is on understanding the reasons behind a change in a specific metric. This is especially relevant when a single data point has changed noticeably, and you want to know why. for exploring reasons behind specific changes in data points
- Find where the distribution is different: when your focus is on comparing how data is distributed across different categories or groups. This is more about understanding differences in patterns or behaviors across subsets of your data, rather than focusing on a single change for analyzing and comparing distributions across categories or groups. In our example where we analyze sum of sales by country:
If you find any of the provided analysis useful, you can add it to your report so that other users can view it. Here, we found that segment analysis is useful, because it demonstrated the sum of sales of government per country.
What to Explore Next?
AI and Power BI
Power BI includes several specialized visuals that provide a considerable interactive experience for report consumers. Often, these specialized visuals are called AI visuals.
Why?
Because Power BI uses machine learning (ML) to discover and display insights from data. These visuals provide a simple way to deliver an interactive experience to your report.
For further reading, the three main AI visuals are:
Key influencers
Decomposition tree
Q&A
References:
Benefits-of-data-analytics.
What is Clustering?.
How-To-create-Distribution-Chart-Bell-chart-in-Power-BI
Grouping-and-binning-PowerBI
PowerBI Service vs Desktop
Continue reading...