Introduction to Data Visualization with Matplotlib

What is Data Visualization?

Data visualization is the representation of data or information in a graph, chart, or other visual format. It produces images that communicate relationships among the represented data to the viewer. This communication is achieved through a systematic mapping between graphic marks and data values.

Why Data Visualization

We need data visualization because a visual summary makes patterns and trends easier to identify than scanning thousands of rows in a spreadsheet; people generally pick up visual patterns faster than patterns in raw numbers. As the volume of data grows, it becomes increasingly difficult to manage and make sense of it all, and no single person could realistically wade through it line by line to spot distinct patterns and make observations.

What is Matplotlib

Matplotlib is one of the most widely used Python packages for data visualization. It provides a quick way to visualize data from Python and to create publication-quality figures in a variety of formats. Matplotlib is a multi-platform data visualization library built on NumPy arrays. It can generate line plots, histograms, power spectra, bar charts, error charts, scatter plots, and more with just a few lines of code. It supports different graphics platforms and toolkits, as well as all the common vector and raster graphics formats. Matplotlib can be used in Python scripts, the IPython REPL, PySpark, and Jupyter notebooks.
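As a minimal sketch of the "few lines of code" claim, the snippet below draws a single line plot from a NumPy array. The data (one period of a sine wave) and all labels are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 100 evenly spaced points over one period of sin(x).
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the line on it.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")

# Label the axes and add a title and legend.
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A first Matplotlib plot")
ax.legend()
```

In a script, calling plt.show() afterwards opens the figure in a window; in a Jupyter notebook the figure is normally rendered automatically. To write it to disk, fig.savefig("plot.png") produces a raster file and fig.savefig("plot.svg") a vector file (the file names here are arbitrary).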

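Types of plots

The short names below are the plot kinds accepted by pandas' DataFrame.plot, which draws with Matplotlib. Where the matching Matplotlib function has a different name, it is given in parentheses.

  • bar for bar charts
  • hist for histograms
  • box for boxplots (boxplot in Matplotlib)
  • density for density plots (drawn by pandas from a kernel density estimate; Matplotlib has no single function for it)
  • area for area plots (stackplot or fill_between in Matplotlib)
  • scatter for scatter plots
  • hexbin for hexagonal bin plots
  • pie for pie charts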

Bar plot

A bar plot (or bar chart) is one of the most common types of plot. It shows the relationship between a numerical variable and a categorical variable. For example, you can display the heights of several individuals using a bar chart.
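The height example could be sketched as follows. The names and heights are hypothetical values chosen only to give the chart something to draw.

```python
import matplotlib.pyplot as plt

# Hypothetical data: one categorical variable (name) and one numerical
# variable (height in centimetres).
names = ["Asha", "Ben", "Chen", "Dana"]
heights_cm = [162, 178, 171, 168]

fig, ax = plt.subplots()

# One bar per person; the bar height encodes the numerical value.
bars = ax.bar(names, heights_cm, color="steelblue")

# Axis labels are free text.
ax.set_xlabel("Person")
ax.set_ylabel("Height (cm)")
ax.set_title("Height of four individuals")
```

For horizontal bars, ax.barh takes the same category and value arguments.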

Note: the x-axis and y-axis labels are free text, so you can use any labels you like (set them with plt.xlabel and plt.ylabel).

Histogram

Matplotlib can be used to create histograms. A histogram is a type of plot that shows how often the values of a continuous or discrete variable occur. The horizontal axis covers the range of the variable, divided into bins, where every bin has a minimum and a maximum value; the vertical axis shows the frequency. Each bin's frequency is the number of values that fall inside it, so it is never negative.
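A short sketch of a histogram, assuming 1,000 randomly generated values (the distribution, seed, and bin count are arbitrary choices for illustration). Axes.hist returns the per-bin counts and the bin edges, which makes the binning easy to inspect.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 1,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()

# bins=20 splits the data range into 20 equal-width intervals.
# counts holds the frequency per bin; bin_edges holds the 21 boundaries.
counts, bin_edges, patches = ax.hist(values, bins=20, edgecolor="black")

ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram with 20 bins")
```

Because the bins span the full data range here, the counts add up to the number of values.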

Scatter plot

A scatter plot is a type of plot that shows the data as a collection of points. The position of each point is determined by a pair of values: one gives its position along the horizontal axis and the other its position along the vertical axis.
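A minimal scatter plot sketch, using synthetic data with a rough linear trend (the relationship, noise level, and seed are assumptions made for the example):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y depends roughly linearly on x, plus random noise.
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)

fig, ax = plt.subplots()

# Each (x[i], y[i]) pair becomes one point on the axes.
points = ax.scatter(x, y)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of 50 points")
```

Axes.scatter also accepts the s and c arguments to vary marker size and color per point, which allows a third or fourth variable to be encoded.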

Pie Charts

Pie charts show the size of items (called wedges) in one data series, proportional to the sum of the items, so each wedge represents a percentage or proportion of the whole. The percentage for each category can be displayed on or next to its slice. In Matplotlib, the slices are plotted counter-clockwise by default, in the order the data is given.
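The following sketch draws a pie chart with a percentage on each wedge. The categories and sizes are invented for illustration; they happen to sum to 100, but Matplotlib normalizes the values either way.

```python
import matplotlib.pyplot as plt

# Hypothetical data series: four categories and their sizes.
labels = ["Python", "Java", "C++", "Other"]
sizes = [45, 25, 20, 10]

fig, ax = plt.subplots()

# autopct formats each wedge's share of the total as a percentage;
# "%%" in the format string prints a literal percent sign.
ax.pie(sizes, labels=labels, autopct="%.1f%%")

ax.set_title("Share by category")
```

Passing startangle=90 would start the first wedge at the top instead of on the positive x-axis, and counterclock=False would reverse the drawing direction.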

Area plot

An area chart fills the region under a line. A stacked area chart extends the basic area chart to show how the values of several groups evolve on the same graphic, and it is typically used to represent cumulative totals, as numbers or percentages, over time. Because the areas are stacked on top of each other, each series should contain either all positive or all negative values.
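One way to sketch a stacked area chart directly in Matplotlib is Axes.stackplot. The yearly figures for three hypothetical products below are invented, and all values are positive so the layers stack cleanly.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly values for three groups (all positive).
years = [2019, 2020, 2021, 2022, 2023]
product_a = [10, 12, 15, 18, 22]
product_b = [5, 7, 8, 12, 15]
product_c = [2, 3, 5, 6, 9]

fig, ax = plt.subplots()

# Each series is drawn on top of the previous one, so the upper edge of
# the top layer shows the cumulative total for each year.
layers = ax.stackplot(
    years, product_a, product_b, product_c,
    labels=["Product A", "Product B", "Product C"],
)

ax.set_xticks(years)  # show whole years on the x-axis
ax.set_xlabel("Year")
ax.set_ylabel("Units sold")
ax.legend(loc="upper left")
```

For a single, unstacked area, ax.fill_between(years, product_a) fills the region between the line and zero.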

Box plot

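A boxplot is a graph that gives you a good indication of how the values in the data are spread out. Boxplots are a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). The minimum and maximum are in quotes because, by default, Matplotlib's whiskers extend only to the most extreme points within 1.5 times the interquartile range of the box; points beyond that are drawn individually as outliers.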
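A minimal sketch of a boxplot for three groups, alongside the quartiles computed with NumPy for comparison. The groups are randomly generated and the labels "A", "B", "C" are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: three groups of 100 values with different centres.
rng = np.random.default_rng(seed=2)
groups = [rng.normal(loc=mu, scale=5, size=100) for mu in (50, 60, 55)]

fig, ax = plt.subplots()

# One box per group; the returned dict exposes the drawn artists
# ("boxes", "medians", "whiskers", "caps", "fliers").
result = ax.boxplot(groups)

# Boxes are placed at x = 1, 2, 3; replace the tick labels with names.
ax.set_xticklabels(["A", "B", "C"])
ax.set_ylabel("Value")

# Quartiles of the first group, computed directly for comparison.
q1, median, q3 = np.percentile(groups[0], [25, 50, 75])
```

The horizontal line inside the first box sits at the median computed above, and the box itself spans Q1 to Q3.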

FAQ

  1. What are bins in a Python histogram?

    A histogram is a kind of bar graph. To construct one, the first step is to “bin” the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable.

  2. What is alpha in a Python plot?

    Matplotlib allows you to adjust the transparency of a plot using the alpha parameter, which ranges from 0 (fully transparent) to 1 (fully opaque). By default, alpha=1. If you want to make the plot more transparent, set alpha to a value less than 1, such as 0.5 or 0.25.

  3. What is autopct in Python?

    autopct lets you label each pie wedge with its percentage using Python string formatting. For example, if autopct='%.2f', then for each pie wedge the format string is '%.2f' and the numerical percent value for that wedge is pct, so the wedge label is set to the string '%.2f' % pct.
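The three parameters covered in this FAQ (bins, alpha, and autopct) can be seen together in one short sketch. The data, bin edges, and labels are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: two overlapping samples.
rng = np.random.default_rng(seed=3)
a = rng.normal(0, 1, 500)
b = rng.normal(1, 1, 500)

fig, (ax_hist, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# bins: explicit edges from -4 to 5 give 18 consecutive,
# non-overlapping intervals of width 0.5, shared by both samples.
edges = np.linspace(-4, 5, 19)

# alpha=0.5 makes the bars half transparent, so the overlap stays visible.
_, _, patches_a = ax_hist.hist(a, bins=edges, alpha=0.5, label="a")
ax_hist.hist(b, bins=edges, alpha=0.5, label="b")
ax_hist.legend()

# autopct="%.2f" labels each wedge with its percentage to two decimals.
ax_pie.pie([3, 1], labels=["yes", "no"], autopct="%.2f")
```

Here the wedge labels read 75.00 and 25.00, since the two values make up three quarters and one quarter of the total.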