From Data to Insight: Visualizing Quantities, Proportions, Relationships, and Distributions with Python’s Matplotlib and Seaborn

python
data visualization
matplotlib
seaborn
Author

Lukman Aliyu Jibril

Published

May 13, 2023

Data visualization is a crucial part of data analysis: it helps us understand complicated relationships, patterns, and insights in data and communicate them to a wider audience. Python has become a popular language for data visualization thanks to powerful libraries such as Matplotlib and Seaborn. In this post, we’ll look at how to visualize quantities, proportions, relationships, and distributions using these libraries and how to draw useful insights from the results.

Visualizing Quantities

Quantitative data is any information that can be measured and expressed numerically. Visualizing it makes it easier to understand the distribution and spread of values, spot outliers, and see how variables relate to one another.

Histograms are one of the most widely used tools for visualizing quantitative data. A histogram is a graph that displays the frequency distribution of a set of continuous data by grouping the values into bins and counting how many fall into each. Both Matplotlib and Seaborn can produce one in just a few lines of code. Let’s look at an example:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

data = np.random.normal(size=1000)

# Create a histogram using Matplotlib
plt.hist(data, bins=30)
plt.title('Histogram of Random Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

# Create a histogram using Seaborn
sns.histplot(data, bins=30)
plt.title('Histogram of Random Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

In this example, we generate random data with NumPy’s random.normal function and then draw the same histogram twice, once with each library. The two results are similar, but Seaborn’s histplot function offers a few extra capabilities, such as overlaying a KDE (Kernel Density Estimate) on the histogram.
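To make the idea of a frequency distribution concrete, the sketch below computes the bin counts directly with NumPy’s histogram function instead of drawing them. The seed value and the variable names are assumptions chosen for illustration; they are not part of the original example.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is an arbitrary choice)
rng = np.random.default_rng(seed=0)
data = rng.normal(size=1000)

# Split the range of the data into 30 equal-width bins and count the values in each
counts, bin_edges = np.histogram(data, bins=30)

print(counts.sum())    # 1000: every sample falls into exactly one bin
print(len(bin_edges))  # 31: 30 bins are delimited by 31 edges
```

The bar heights in a histogram drawn with bins=30 correspond to counts like these, and the bar boundaries correspond to the bin edges.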

Visualizing Proportions

Proportional data is any information that can be stated as a percentage or a fraction of a whole. Visualizing it makes it easier to understand the relative sizes of categories or groups and how each contributes to the whole. Pie charts, donut charts, and waffle charts are common ways to show proportions.

Pie charts are one of the most commonly used tools for representing proportional data. A pie chart is a circular graph that shows how a whole breaks down across a set of categories. Matplotlib provides a pie function for this; Seaborn has no pie chart function of its own, but its styles and color palettes can be applied to a Matplotlib pie chart. Let’s look at an example:

data = [25, 40, 10, 25]
labels = ['A', 'B', 'C', 'D']

# Create a pie chart using Matplotlib
plt.pie(data, labels=labels)
plt.title('Proportional Data')
plt.show()

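# Create a pie chart with Matplotlib, colored with a Seaborn palette
sns.set_style('whitegrid')
# Pass the pastel palette directly; calling color_palette('pastel') on its own
# line only returns the colors and does not change the active palette
plt.pie(data, labels=labels, colors=sns.color_palette('pastel'))
plt.title('Proportional Data')
plt.show()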

In this example, we define a fixed list of values and their labels, then draw two pie charts with Matplotlib’s pie function. The second chart takes its colors from Seaborn’s color_palette function, which lets us color-code each category with a specified palette.
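The donut chart mentioned earlier can be built from the same pie function. The sketch below is one common way to do it, assuming the same values and labels as above: the wedgeprops width leaves a hole in the middle, and autopct prints each category’s share. The width of 0.4 and the percentage format are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

data = [25, 40, 10, 25]
labels = ['A', 'B', 'C', 'D']

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    data,
    labels=labels,
    colors=sns.color_palette('pastel', n_colors=len(data)),
    autopct='%1.0f%%',          # print each share as a whole-number percentage
    wedgeprops={'width': 0.4},  # draw rings instead of full wedges
)
ax.set_title('Proportional Data (Donut Chart)')
plt.show()
```

Printing the percentages on the chart helps because readers find angles harder to compare than numbers, especially when two categories are the same size, as A and D are here.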

Visualizing Relationships

Relationship data is any data that shows the connection between two or more variables. Visualizing it helps us understand how strongly the variables are correlated, the direction of the relationship, and whether it is linear or nonlinear.

Scatter plots are among the most commonly used tools for displaying relationship data. A scatter plot draws one point per observation, with one variable on each axis, so the relationship between the two shows up in the shape of the point cloud. Both Matplotlib and Seaborn include functions for making scatter plots. Let’s look at an example:
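A plot shows the shape of a relationship, and a correlation coefficient can put a number on its strength and direction. As a minimal sketch, assuming two variables where one is twice the other plus standard normal noise, NumPy’s corrcoef function returns the Pearson coefficient. The seed is an arbitrary choice for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the x-y correlation
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2f}")
```

For data built this way the theoretical value is 2 / sqrt(5), roughly 0.89, so the sample value should land close to that: a strong, positive, linear relationship.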

# Create some random data
x = np.random.normal(size=1000)
y = 2 * x + np.random.normal(size=1000)

# Create a scatter plot using Matplotlib
plt.scatter(x, y)
plt.title('Scatter Plot of Random Data')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

# Create a scatter plot using Seaborn
sns.scatterplot(x=x, y=y)
plt.title('Scatter Plot of Random Data')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

In this example, we generate two random data sets, x and y, where y depends linearly on x plus some noise. We then draw the same scatter plot twice, once with each library. Seaborn’s scatterplot function offers a few extra capabilities, such as color-coding points according to a third variable.
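Color-coding by a third variable is done with scatterplot’s hue parameter. The sketch below assumes a hypothetical grouping variable, derived here from the sign of x purely for illustration; in real data it would usually be an existing categorical column.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)

# Hypothetical third variable: a label for each point based on the sign of x
group = np.where(x > 0, 'x > 0', 'x <= 0')

# hue assigns one color per distinct value in group and adds a legend
ax = sns.scatterplot(x=x, y=y, hue=group)
ax.set_title('Scatter Plot Colored by Group')
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.show()
```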

Visualizing Distributions

Distribution data is any data that shows how often various values or ranges of values occur. Visualizing it makes it easier to understand the distribution’s shape, spot outliers, and judge how likely specific values or ranges of values are.

Density plots are one of the most popular tools for displaying distribution data. A density plot is a graph that displays an estimate of the probability density function of a set of data. Matplotlib can draw a normalized histogram, and Seaborn adds kernel density estimation on top of that. Let’s look at an example:

data = np.random.normal(size=1000)

# Create a normalized Matplotlib histogram and overlay Seaborn's KDE curve
plt.hist(data, density=True, alpha=0.5, bins=30)
sns.kdeplot(data, color='red')
plt.title('Density Plot of Random Data')
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.show()

# Create a density plot using Seaborn
sns.histplot(data, kde=True, stat='density', alpha=0.5, bins=30)
plt.title('Density Plot of Random Data')
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.show()

In this example, we again generate random data with NumPy’s random.normal function and produce two density plots. The first overlays Seaborn’s kdeplot curve on a normalized Matplotlib histogram; the second uses Seaborn’s histplot function, which combines the histogram and the density curve in a single call.

Conclusion

In this article, we looked at how to visualize quantities, proportions, relationships, and distributions using Matplotlib and Seaborn. Data visualization is a powerful tool for understanding and communicating complicated relationships, trends, and insights in data. With these libraries, we can draw valuable insights from our data and use them to make well-informed decisions.