All posts by Jeenal Suthar

Python Matplotlib Library with Examples

What Is Python Matplotlib?

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

Pyplot (matplotlib.pyplot) is a Matplotlib module that provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the advantages of using Python and of being free and open source. Pyplot is mostly used for 2D graphics, and it can be used in Python scripts, the Python shell, web application servers, and GUI toolkits.

Several toolkits extend Matplotlib's functionality:

  • Basemap: a map plotting toolkit with various map projections, coastlines, and political boundaries.
  • Cartopy: a mapping library featuring object-oriented map projection definitions, and arbitrary point, line, polygon and image transformation capabilities.
  • Excel tools: utilities for exchanging data with Microsoft Excel.
  • Natgrid: an interface to the natgrid library for gridding irregularly spaced data.
  • GTK tools: mpl_toolkits.gtktools provides some utilities for working with GTK. This toolkit ships with Matplotlib, but requires pygtk.
  • Qt interface
  • Mplot3d: adds simple 3D plotting capabilities to Matplotlib by supplying an Axes object that can create a 2D projection of a 3D scene.
  • matplotlib2tikz: exports figures to PGFPlots for smooth integration into LaTeX documents.

Types of Plots
Matplotlib can create many kinds of plots. Some of them are listed below:

  • Bar Graph
  • Histogram
  • Scatter Plot
  • Line Plot
  • 3D plot
  • Area Plot
  • Pie Plot
  • Image Plot

We will demonstrate some of them in detail.

Before that, here is the minimal code needed to generate a simple graph with Matplotlib.

from matplotlib import pyplot as plt

# Plot the points (5, 8), (6, 9) and (7, 5) joined by a line
plt.plot([5,6,7],[8,9,5])

# Show what we plotted
plt.show()

So, with three lines of code, you can generate a basic graph using python matplotlib.

Let us see how we can add a title and axis labels to a graph created with Matplotlib to give it more meaning. Consider the example below:

from matplotlib import pyplot as plt
 
plt.plot([5,2,7],[2,16,4])
plt.title('Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()

You can also apply styling to create a better graph: change the width or color of a particular line, add grid lines, and so on.

The style package adds support for easy-to-switch plotting “styles” with the same parameters as a matplotlibrc file.

There are a number of pre-defined styles provided by matplotlib. For example, there’s a pre-defined style called “ggplot”, which emulates the aesthetics of ggplot (a popular plotting package for R). To use this style, just add:

import matplotlib.pyplot as plt
plt.style.use('ggplot')

To list all available styles, use:

print(plt.style.available)
----------------------------
o/p (the exact list depends on your Matplotlib version):
['seaborn-darkgrid', 'Solarize_Light2', 'seaborn-notebook', 'classic', 'seaborn-ticks', 'grayscale', 'bmh', 'seaborn-talk', 'dark_background',
'ggplot', 'fivethirtyeight', '_classic_test', 'seaborn-colorblind', 'seaborn-deep', 'seaborn-whitegrid', 'seaborn-bright', 'seaborn-poster',
'seaborn-muted', 'seaborn-paper', 'seaborn-white', 'fast', 'seaborn-pastel', 'seaborn-dark', 'tableau-colorblind10', 'seaborn', 'seaborn-dark-palette']
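The names in this list change between Matplotlib releases: in Matplotlib 3.6 the bundled seaborn styles were renamed to 'seaborn-v0_8-<name>', and the old names were later removed. A small sketch that picks the first style it finds from a list of candidates and otherwise falls back to Matplotlib's defaults; the helper pick_style is hypothetical, not part of Matplotlib:

```python
import matplotlib.pyplot as plt

def pick_style(candidates):
    """Hypothetical helper: return the first name in `candidates` that this Matplotlib knows."""
    available = set(plt.style.available)
    for name in candidates:
        if name in available:
            return name
    # 'default' restores the built-in rcParams; it is accepted by style.use()
    # but is not listed in plt.style.available
    return "default"

# Old and new spellings of the seaborn dark-grid style, then ggplot as a fallback
style_name = pick_style(["seaborn-darkgrid", "seaborn-v0_8-darkgrid", "ggplot"])
plt.style.use(style_name)
print("Using style:", style_name)
```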

So, let me show you how to add style to a graph. First, import the style package from the Matplotlib library, then call its functions as shown in the code below:

from matplotlib import pyplot as plt
from matplotlib import style
 
style.use('dark_background')
x = [5,8,10]
y = [12,16,6]
x2 = [6,9,11]
y2 = [6,15,7]
plt.plot(x,y,'r-o',label='line one', linewidth=5)
plt.plot(x2,y2,'c',label='line two',linewidth=5)
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.legend()
plt.grid(True,color='w')
plt.show()

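You can also apply a style only temporarily with plt.style.context(); the style is active only inside the with block:

import numpy as np
import matplotlib.pyplot as plt

with plt.style.context('dark_background'):
    # This figure is drawn with the dark_background style
    plt.plot(np.sin(np.linspace(0, 2 * np.pi)), 'r-o')
    plt.show()

# Plotting code here uses the default style again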
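To check that the context manager really restores the previous settings, the sketch below reads one rcParams entry before, inside and after the with block. It relies on the dark_background style bundled with Matplotlib setting axes.facecolor to 'black':

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.facecolor"]
with plt.style.context("dark_background"):
    inside = plt.rcParams["axes.facecolor"]   # value set by the dark_background style
after = plt.rcParams["axes.facecolor"]        # restored once the block exits

print(before, inside, after)
```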

Now, we will understand the different kinds of plots. Let’s start with the bar graph!

Matplotlib: Bar Graph
A bar graph uses bars to compare data among different categories. It is also well suited to tracking changes over a period of time. It can be plotted vertically or horizontally. The key thing to keep in mind is that the longer the bar, the greater the value. Now, let us implement one using Matplotlib.

from matplotlib import pyplot as plt

plt.style.use('dark_background')
# The Audi bars are shifted by 0.2 (one bar width) so each day's two bars sit side by side
plt.bar([1,2,3,4,5], [8,7,5,6,4],
        label="BMW", color='b', width=0.2)
plt.bar([1.2,2.2,3.2,4.2,5.2], [2,3,5,10,8],
        label="Audi", color='r', width=0.2)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Bar Plot')
plt.show()

When I run this code, it generates a figure like below:


In the above plot, I have compared the distance covered by two cars, BMW and Audi, over a period of 5 days. Next, let us move on to another kind of plot: the histogram.
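Before that, note that the x offsets in the bar example (1.2, 2.2, ...) were typed by hand. A common alternative, sketched below, computes the offsets with NumPy and puts one label at the centre of each group; the "Day 1" to "Day 5" labels and the 0.35 bar width are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

days = ["Day 1", "Day 2", "Day 3", "Day 4", "Day 5"]
bmw = [8, 7, 5, 6, 4]      # same data as the bar example above
audi = [2, 3, 5, 10, 8]
width = 0.35               # width of each bar

x = np.arange(len(days))   # one group centre per day: 0, 1, 2, 3, 4
fig, ax = plt.subplots()
# Shift each series half a bar width left or right of the group centre
bars_bmw = ax.bar(x - width / 2, bmw, width, label="BMW", color="b")
bars_audi = ax.bar(x + width / 2, audi, width, label="Audi", color="r")
ax.set_xticks(x)
ax.set_xticklabels(days)   # label the group centres instead of the raw positions
ax.set_xlabel("Days")
ax.set_ylabel("Distance (kms)")
ax.legend()
plt.show()
```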

Matplotlib – Histogram
Let me first tell you the difference between a bar graph and a histogram. Histograms are used to show a graphical representation of the distribution of numerical data whereas a bar chart is used to compare different entities.

It is an estimate of the probability distribution of a continuous variable (quantitative variable) and was first introduced by Karl Pearson. It is a kind of bar graph.

To construct a histogram, the first step is to “bin” the range of values — that is, divide the entire range of values into a series of intervals — and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but are not required to be) of equal size.

Histograms are useful when you have a large array or a long list of values and want to see how they are distributed. The X-axis shows the bin ranges and the Y-axis shows the frequency, i.e. how many values fall into each bin. For example, if you want to show a population by age, a histogram tells you how many people fall into each age range (bin).

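In the code below, I have created bins of width 10: the first bin contains ages from 0 to 9, the next from 10 to 19, and so on. Each bin includes its left edge but not its right edge, except the last bin, [90, 100], which also includes 100. Values outside the bin range, such as the ages above 100 in this list, are not counted.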

import matplotlib.pyplot as plt
population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()

When I run this code, it generates a figure like below:

As you can see in the above plot, the Y-axis shows how many people fall into each age group (bin). Our biggest age group is between 40 and 50.
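Matplotlib's hist() computes its bar heights with NumPy's histogram function, so you can print the exact counts behind the plot. A short check with the same data and bins:

```python
import numpy as np

population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,
                  111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]

# counts[i] is the number of ages in the bin [edges[i], edges[i+1])
counts, edges = np.histogram(population_age, bins=bins)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>3}-{right:<3} {n}")

print("counted:", counts.sum(), "of", len(population_age))
```

With these bins the 40 to 50 group holds 7 people, and the five ages above 100 fall outside the last edge, so only 25 of the 30 values are counted.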

Matplotlib: Scatter Plot
A scatter plot shows data as a collection of points. The position of each point is given by its two values: one along the horizontal axis and one along the vertical axis. Scatter plots are typically used to compare two variables, for example to see how much one variable is affected by another and whether there is a relationship between them.
Consider the below example:

import matplotlib.pyplot as plt
x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
 
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
 
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()

As you can see in the above graph, I have plotted two scatter plots based on the inputs specified in the code. The data is displayed as two collections of points, labelled 'high income low saving' and 'low income high savings'.
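To put a number on how strongly one variable follows another, you can compute Pearson's correlation coefficient r with NumPy. Values near 1 or -1 mean a strong linear relationship, values near 0 a weak one; r does not capture non-linear relationships. A sketch for the red group from the example above:

```python
import numpy as np

x = [1, 1.5, 2, 2.5, 3, 3.5, 3.6]      # same "high income low saving" data as above
y = [7.5, 8, 8.5, 9, 9.5, 10, 10.5]

# np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")
```

For this data r is about 0.99, a strong positive linear relationship.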

Scatter plot with groups
Data points can also be split into groups, with each group drawn in its own color. The code below demonstrates this:

import numpy as np
import matplotlib.pyplot as plt

# Create data: three groups of N random (x, y) points, each in its own region
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4 + 0.3 * np.random.rand(N), 0.5 * np.random.rand(N))
g3 = (0.3 * np.random.rand(N), 0.3 * np.random.rand(N))

data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

# Create plot
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(1, 1, 1)

# Draw each group with its own color and legend label
for (x, y), color, group in zip(data, colors, groups):
    ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)

plt.title('Matplot scatter plot')
plt.legend(loc=2)   # loc=2 means 'upper left'
plt.show()

The purpose of plt.figure() is to create a Figure object, the top-level container for all plot elements.

The whole figure is regarded as the Figure object. You need to call plt.figure() explicitly when you want to change the size of the figure or add multiple Axes objects to a single figure.

fig.add_subplot() adds an Axes to the figure at a given position in a grid of subplots. The three-digit argument encodes the number of rows, the number of columns, and the index of the subplot, counted from 1, left to right and top to bottom.
For example, “111” means “1×1 grid, first subplot” and “234” means “2×3 grid, 4th subplot”.
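A sketch that fills a 2×3 grid with add_subplot() and titles each Axes with the arguments that created it; fig.add_subplot(234) is shorthand for fig.add_subplot(2, 3, 4):

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 5))
# The index runs left to right, top to bottom, starting at 1
for index in range(1, 7):
    ax = fig.add_subplot(2, 3, index)
    ax.set_title(f"add_subplot(2, 3, {index})")

fourth = fig.axes[3]   # the Axes created by add_subplot(2, 3, 4), i.e. "234"
# (rows, columns, first cell, last cell), with 0-based cell numbers
print(fourth.get_subplotspec().get_geometry())
fig.tight_layout()
plt.show()
```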

You can easily understand by the following picture:

Next, let us look at the area plot, also known as the stack plot.

Matplotlib: Area Plot
Area plots are similar to line plots. They are also known as stack plots. They can be used to display the evolution of the values of several groups on the same graphic, with the values of each group stacked on top of each other. This lets you follow, in one figure, both the total of a numeric variable and the contribution of each group.

A line chart forms the basis of an area plot, where the region between the axis and the line is represented by colors.

import matplotlib.pyplot as plt

# Your x and y axis
x = range(1, 6)
y = [[1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6]]

# Basic stacked area chart
plt.stackplot(x, y, labels=['A','B','C'], colors=['m','c','r'])
plt.legend(loc='upper left')
plt.show()
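stackplot() draws each series on top of the previous ones, so the upper edge of each layer is a running (cumulative) sum of the series. A sketch computing those edges for the three series above with NumPy:

```python
import numpy as np

y = np.array([[1, 4, 6, 8, 9],
              [2, 2, 7, 10, 12],
              [2, 8, 5, 10, 6]])   # the three series A, B and C from the example above

tops = np.cumsum(y, axis=0)   # tops[i] is the upper edge of layer i after stacking
print(tops)
# The last row is the total of all groups at each x position
print(tops[-1])
```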

import matplotlib.pyplot as plt

days = [1,2,3,4,5]
Enfield = [50,40,70,80,20]
Honda = [80,20,20,50,60]
Yamaha = [70,20,60,40,60]
KTM = [80,20,20,50,60]
plt.stackplot(days, Enfield, Honda, Yamaha, KTM,
              labels=['Enfield', 'Honda', 'Yamaha', 'KTM'], colors=['r','c','y','m'])
plt.xlabel('Days')
plt.ylabel('Distance in kms')
plt.title('Bike details in area plot')
plt.legend()
plt.show()

The graph above shows an area plot for this scenario. Each shaded area represents a particular bike, and its height on each day shows the distance covered by that bike. Next, let us move on to one of the most frequently used plots: the pie chart.

Matplotlib: Pie Chart
A pie chart represents statistical data as a circle divided into slices, where the size of each slice is proportional to the value it represents. This kind of plot is widely used in mass media and business.

import matplotlib.pyplot as plt

slices = [8,5,5,6]
activities = ['Enfield','Honda','Yamaha','KTM']
cols = ['c','g','y','b']
plt.pie(slices,
        labels=activities,
        colors=cols,
        startangle=90,
        shadow=True,
        explode=(0,0.1,0,0),   # pull the Honda slice out slightly
        autopct='%1.1f%%')     # label each slice with its percentage
plt.title('Bike details in Pie Plot')
plt.show()

The pie plot above illustrates the bike scenario: the circle is divided into 4 sectors, and each slice represents a particular bike and its share of the total distance traveled. The values in slices happen to add up to 24, but they could add up to anything: Matplotlib divides each value by the total and calculates the percentage of each slice automatically. This is what makes pie charts convenient, as you don't have to calculate the percentages yourself.
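The percentages that autopct prints are each value divided by the total, times 100, formatted with the given format string. A sketch that reproduces the labels by hand:

```python
slices = [8, 5, 5, 6]
activities = ['Enfield', 'Honda', 'Yamaha', 'KTM']

total = sum(slices)
# Same format string as autopct='%1.1f%%' in the pie example
labels = ['%1.1f%%' % (100 * s / total) for s in slices]
for name, label in zip(activities, labels):
    print(name, label)
```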

Matplotlib: 3D Plot
3D plotting displays data along three axes (x, y, and z). It is an advanced plotting technique that can give a better view of data that varies along three dimensions.

Line Plot 3D

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (optional on Matplotlib 3.2+)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [1,2,3,4,5]
y = [50,40,70,80,20]
y2 = [80,20,20,50,60]
y3 = [70,20,60,40,60]
y4 = [80,20,20,50,60]
# No z values are given, so every line lies in the z = 0 plane
ax.plot(x, y, 'g', label='Enfield', linewidth=5)
ax.plot(x, y2, 'c', label='Honda', linewidth=5)
ax.plot(x, y3, 'r', label='Yamaha', linewidth=5)
ax.plot(x, y4, 'y', label='KTM', linewidth=5)
ax.set_title('bike details in line plot')
ax.set_ylabel('Distance in kms')
ax.set_xlabel('Days')
ax.legend()
plt.show()

In the above 3D graph, a line graph is drawn in a three-dimensional view. 3D graphs are plotted with the mplot3d toolkit that ships with Matplotlib, using the setup shown below.
Syntax for plotting 3D graphs:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

Importing Axes3D registers the '3d' projection; on Matplotlib 3.2 and later this happens automatically, so the import is optional. Passing projection='3d' to add_subplot() then creates a 3D Axes, on which any data can be plotted along three axes.
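The bike example leaves every line in the z = 0 plane. To use the third axis, pass z values as the third argument of ax.plot(); a sketch with a made-up helix:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # on Matplotlib < 3.2, import Axes3D first

t = np.linspace(0, 4 * np.pi, 200)
ax.plot(np.cos(t), np.sin(t), t, label='helix')   # x, y and z coordinates
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.legend()
plt.show()
```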

Surface Plot 3D

Axes3D.plot_surface(X, Y, Z, *args, **kwargs)

By default, it will be colored in shades of a solid color, but it also supports color mapping by supplying the cmap argument.

The rstride and cstride kwargs set the stride (step size) used to sample the rows and columns of the input data when drawing the surface; for example, rstride=10 uses every 10th row. Alternatively, the rcount and ccount kwargs set the maximum number of samples in each direction; by default the data is downsampled to rcount = ccount = 50. Stride and count kwargs cannot be combined: passing both raises a ValueError.

from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # fig.gca(projection='3d') no longer works in recent Matplotlib
ax.set_xlabel('$X-axis$') 
ax.set_ylabel('$Y-axis$') 
ax.set_zlabel('$Z-axis$')
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()
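plot_surface() can also sample the data with rcount and ccount (the maximum number of samples per direction) instead of rstride and cstride, and a colorbar makes the cmap colours readable. A sketch reusing the sin(R) surface from above; rcount=20, ccount=20 and the 'viridis' colormap are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

X, Y = np.meshgrid(np.arange(-5, 5, 0.25), np.arange(-5, 5, 0.25))
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# rcount/ccount cap the number of samples per direction (instead of rstride/cstride)
surf = ax.plot_surface(X, Y, Z, rcount=20, ccount=20, cmap='viridis')
fig.colorbar(surf, shrink=0.6, label='Z value')   # map the colours back to Z values
plt.show()
```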

Matplotlib: Image Plot

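The example below computes the Mandelbrot set on a grid of complex numbers c and uses imshow() to display how many iterations each point takes to escape:

import numpy as np
import matplotlib.pyplot as plt

xmin, xmax, ymin, ymax = -2, 0.8, -1.5, 1.5
max_it = 100    # maximum number of iterations
px     = 1000   # grid steps along the vertical axis (3000 gives finer detail but is much slower)
res    = (ymax - ymin) / px   # grid resolution

X = np.arange(xmin, xmax + res, res)
Y = np.arange(ymin, ymax + res, res)
# Grid of c = x + iy; Y is reversed so that the top row has the largest Im(c)
C = X[np.newaxis, :] + 1j * Y[::-1, np.newaxis]

# Escape-time algorithm: Z holds the iteration at which |z| first exceeds 2, NaN if it never does
Z = np.full(C.shape, np.nan)
z = np.zeros_like(C)
for n in range(1, max_it + 1):
    active = np.isnan(Z)                      # points that have not escaped yet
    z[active] = z[active] ** 2 + C[active]
    Z[active & (np.abs(z) > 2)] = n

np.save("mandel", Z)   # save array to mandel.npy

plt.figure(figsize=(10, 10))
plt.imshow(Z, cmap=plt.cm.prism, interpolation='none',
           extent=(X.min(), X.max(), Y.min(), Y.max()))
plt.xlabel("Re(c)")
plt.ylabel("Im(c)")
plt.savefig("mandelbrot_python.svg")
plt.show()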
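The Mandelbrot example hides the basic mechanics of imshow(): each element of a 2D array becomes one coloured cell. A minimal sketch on a tiny, made-up array, with a colorbar to read the values back; origin='lower' is a choice made here (the default is 'upper') so that row 0 appears at the bottom, as in a normal plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# A small 2D array: value = row index * column index
data = np.outer(np.arange(5), np.arange(8))

fig, ax = plt.subplots()
im = ax.imshow(data, cmap='viridis', origin='lower')   # origin='lower' puts row 0 at the bottom
fig.colorbar(im, ax=ax, label='value')
ax.set_xlabel('column')
ax.set_ylabel('row')
plt.show()
```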

Matplotlib: Working With Multiple Plots
I have discussed multiple types of plots in python matplotlib such as bar plot, scatter plot, pie plot, area plot, etc. Now, let me show you how to handle multiple plots.

import numpy as np
import matplotlib.pyplot as plt
 
def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)
t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
t3 = np.arange(0.0, 5.0, 0.02)

plt.subplot(221)
plt.plot(t1, f(t1), 'bo', t2, f(t2))
plt.subplot(222)
plt.plot(t2, np.cos(2*np.pi*t2))
plt.subplot(223)
plt.plot(t3, np.tan(2*np.pi*t3))
plt.show()
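plt.subplot() draws on whichever Axes is "current". The object-oriented alternative, plt.subplots(), creates all the Axes at once and returns them in an array, which is often easier to manage. A sketch of a 2×2 grid; the sine in place of the tangent and the empty fourth panel are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2 * np.pi * t)

t = np.arange(0.0, 5.0, 0.02)
fig, axes = plt.subplots(2, 2, figsize=(8, 6))   # axes is a 2 x 2 NumPy array of Axes
axes[0, 0].plot(t, f(t))
axes[0, 0].set_title('damped cosine')
axes[0, 1].plot(t, np.cos(2 * np.pi * t))
axes[0, 1].set_title('cosine')
axes[1, 0].plot(t, np.sin(2 * np.pi * t))
axes[1, 0].set_title('sine')
axes[1, 1].axis('off')   # the fourth cell is unused here
fig.tight_layout()
plt.show()
```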

Data Analysis with Pandas & Python

What is Data Analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. In today’s business world, data analysis helps make decisions more scientific and helps businesses operate more effectively.
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
In this article, I use pandas to explore the basics of data analysis.
pandas has three main data structures: Series, DataFrame, and Panel.

Installation
The easiest way to install pandas is to use pip:

pip install pandas

or download it from here.

  • pandas Series

A pandas Series is a one-dimensional labeled array.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = pd.Series([100, 98.7, 98.4, 97.7],index=index_list)
print(a)
output:
test1    100.0
test2     98.7
test3     98.4
test4     97.7
dtype: float64

Labels can be accessed using the index attribute:
print(a.index)

Index(['test1', 'test2', 'test3', 'test4'], dtype='object')

You can use positions or labels to access data in the series.
print(a.iloc[1])
print(a['test4'])

98.7
97.7
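Plain a[1] makes pandas guess whether 1 is a position or a label, and recent pandas versions (2.1 and later) warn about positional access through []. The explicit accessors avoid the ambiguity: .iloc always works by position and .loc always by label. A short sketch using the same Series:

```python
import pandas as pd

index_list = ['test1', 'test2', 'test3', 'test4']
a = pd.Series([100, 98.7, 98.4, 97.7], index=index_list)

print(a.iloc[1])                # by position -> 98.7
print(a.loc['test4'])           # by label    -> 97.7
print(a.loc['test2':'test3'])   # label slices include both ends
print(a.iloc[1:3])              # position slices exclude the end: same two rows here
```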

You can also apply mathematical operations on pandas series.
b = a * 2
c = a ** 1.5
print(b)
print(c)

test1 200.0
test2 197.4
test3 196.8
test4 195.4
dtype: float64

test1 1000.000000
test2 980.563513
test3 976.096258
test4 965.699142
dtype: float64

You can even create a series of heterogeneous data.
s = pd.Series(['test1', 1.2, 3, 'test2'], index=['test3', 'test4', 2, '4.3'])

print(s)

test3   test1
test4   1.2
2       3
4.3     test2
dtype: object
  • pandas DataFrame

A pandas DataFrame is a two-dimensional data structure that can hold heterogeneous data, i.e. data is aligned in a tabular fashion in rows and columns.
Structure
Let us assume that we are creating a DataFrame with the following data about a sales team.

Name   Age  Gender  Rating
Steve  32   Male    3.45
Lia    28   Female  4.6
Vin    45   Male    3.9
Katie  38   Female  2

You can think of it as an SQL table or a spreadsheet data representation.
The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.
The data types of the four columns are as follows −

Column  Type
Name    String
Age     Integer
Gender  String
Rating  Float

Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable

A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

•  data
data can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
•  index
Optional. The row labels to use for the resulting frame. Defaults to a range index, 0 to n-1 (like np.arange(n)), if no index is passed.
•  columns
Optional. The column labels to use. Defaults to np.arange(n) if no column labels are passed and none can be inferred from the data.
•  dtype
The data type of each column.
•  copy
Whether to copy the data from the inputs.

There are many methods to create DataFrames.
• Lists
• dict
• Series
• Numpy ndarrays
• Another DataFrame
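A short sketch of some of these constructors, with small made-up values (the Name and Age columns echo the table above):

```python
import numpy as np
import pandas as pd

# From a list of lists: one inner list per row
df_list = pd.DataFrame([['Steve', 32], ['Lia', 28]], columns=['Name', 'Age'])

# From a dict of lists: one key per column
df_dict = pd.DataFrame({'Name': ['Steve', 'Lia'], 'Age': [32, 28]})

# From a 2D NumPy array: default row and column labels are 0..n-1
df_array = pd.DataFrame(np.zeros((3, 2)))

# From another DataFrame
df_copy = pd.DataFrame(df_dict)

print(df_list.equals(df_dict))   # both describe the same table
print(df_array.shape, list(df_array.columns))
```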

Creating DataFrame from the dictionary of Series
The following method can be used to create DataFrames from a dictionary of pandas series.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = {"column1": pd.Series([100, 98.7, 98.4, 97.7],index=index_list), "column2": pd.Series([100, 100, 100, 85.4], index=index_list)}
df = pd.DataFrame(a)

print(df)

      column1  column2
test1 100.0    100.0
test2 98.7     100.0
test3 98.4     100.0
test4 97.7     85.4

print(df.index)

Index(['test1', 'test2', 'test3', 'test4'], dtype='object')

print(df.columns)

Index(['column1', 'column2'], dtype='object')

Creating DataFrame from list of dictionaries
l = [{'orange': 32, 'apple': 42}, {'banana': 25, 'carrot': 44, 'apple': 34}]
df = pd.DataFrame(l, index=['test1', 'test2'])

print(df)

        apple  banana  carrot  orange
test1     42     NaN     NaN    32.0
test2     34    25.0    44.0     NaN

You might have noticed that the DataFrame contains NaN values. This is because we didn't provide data for those particular row and column combinations.

Creating DataFrame from Text/CSV files
pandas comes in handy when you want to load data from a CSV or text file. It has built-in functions to do this for us.

df = pd.read_csv('happiness.csv')

That's it: we created a DataFrame from a CSV file. This dataset contains the outcome of the European Quality of Life Survey. This dataset is available here. Now that we have stored the DataFrame in df, let's see what's inside, starting with the size of the DataFrame.

print(df.shape)

(105, 4)

It has 105 rows and 4 columns. Instead of printing out all the data, we will look at the first 10 rows.
df.head(10)

   Country  Gender  Mean    N=
0      AT    Male   7.3   471
1     NaN  Female   7.3   570
2     NaN    Both   7.3  1041
3      BE    Male   7.8   468
4     NaN  Female   7.8   542
5     NaN    Both   7.8  1010
6      BG    Male   5.8   416
7     NaN  Female   5.8   555
8     NaN    Both   5.8   971
9      CY    Male   7.8   433
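The NaN values in the Country column suggest that the file writes the country code only on the first row of each group and leaves the following cells empty; read_csv() reads empty fields as NaN. Since happiness.csv is not included here, the sketch below inlines four rows copied from the head() output above, assuming the file uses plain commas as separators:

```python
import io
import pandas as pd

# Rows copied from the head() output above, inlined so the sketch runs without the file
csv_text = """Country,Gender,Mean,N=
AT,Male,7.3,471
,Female,7.3,570
,Both,7.3,1041
BE,Male,7.8,468
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df['Country'].isna().sum())   # empty fields are read as NaN
```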

There are many more ways to create a DataFrame, but now let's look at basic operations on DataFrames.

Operations on DataFrame
Let's recall the DataFrame we made earlier.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = {"column1": pd.Series([100, 98.7, 98.4, 97.7],index=index_list), "column2": pd.Series([100, 100, 100, 85.4], index=index_list)}
df = pd.DataFrame(a)

print(df)

      column1 column2
test1 100.0   100.0
test2 98.7    100.0
test3 98.4    100.0
test4 97.7    85.4

Now we want to create a new column from the current columns. Let’s see how it is done.
df['column3'] = (2 * df['column1'] + 3 * df['column2'])/5
print(df)

        column1  column2  column3
test1    100.0    100.0   100.00
test2     98.7    100.0    99.48
test3     98.4    100.0    99.36
test4     97.7     85.4    90.32

We have created a new column, column3, from column1 and column2. We’ll create one more using a boolean condition.
df['flag'] = df['column1'] > 99.5

We can also remove columns. pop() removes a column from the DataFrame and returns it.
column3 = df.pop('column3')

print(column3)

test1    100.00
test2     99.48
test3     99.36
test4     90.32
Name: column3, dtype: float64

print(df)

       column1  column2   flag
test1    100.0    100.0   True
test2     98.7    100.0  False
test3     98.4    100.0  False
test4     97.7     85.4  False
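pop() modifies df in place. If you prefer to keep the original DataFrame unchanged, assign() and drop() return new DataFrames instead. A sketch that builds the same columns this way:

```python
import pandas as pd

index_list = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'column1': [100, 98.7, 98.4, 97.7],
                   'column2': [100, 100, 100, 85.4]}, index=index_list)

# assign() returns a new DataFrame with the extra columns; df itself is unchanged
df2 = df.assign(column3=(2 * df['column1'] + 3 * df['column2']) / 5,
                flag=df['column1'] > 99.5)

# drop() also returns a new DataFrame, unlike pop(), which removes the column in place
df3 = df2.drop(columns=['column3'])
print(df3)
```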

Descriptive Statistics using pandas
It’s very easy to view descriptive statistics of a dataset using pandas. We are going to use biomass data collected from this source. Let’s load the data first.

url = 'https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/DAAG/biomass.csv'
df = pd.read_csv(url)
df.head()

     Unnamed:0  dbh  wood   bark    root   rootsk  branch species     fac26
0          1    90   5528.0  NaN   460.0   NaN      NaN   E. maculata    z
1          2   106   13650.0 NaN  1500.0   665.0    NaN   E. Pilularis   2
2          3   112   11200.0 NaN  1100.0   680.0    NaN   E. Pilularis   2
3          4    34   1000.0  NaN   430.0    40.0    NaN   E. Pilularis   2
4          5   130   NaN     NaN  3000.0  1030.0    NaN   E. maculata    z

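We are not interested in the unnamed column, so let’s delete it first, for example with df = df.drop(columns=df.columns[0]), which drops the first column whatever its name. Then df.describe() shows the statistics with one line of code.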

          dbh        wood      bark        root        rootsk        branch
count 153.000000 133.000000   17.000000   54.000000   53.000000   76.000000
mean  26.352941  1569.045113  513.235294  334.383333  113.802264  54.065789
std   28.273679  4071.380720  632.467542  654.641245  247.224118  65.606369
min   3.000000   3.000000     7.000000    0.300000    0.050000    4.000000
25%   8.000000   29.000000    59.000000   11.500000   2.000000    10.750000
50%   15.000000  162.000000   328.000000  41.000000   11.000000   35.000000
75%   36.000000  1000.000000  667.000000  235.000000  45.000000   77.750000
max   145.000000 25116.000000 1808.000000 3000.000000 1030.000000 371.000000

It’s as simple as that. We can see the count, mean, standard deviation, and other statistics for each numeric column. Now let’s compute some of these metrics individually, along with one that describe() does not provide.

Mean :
print(df.mean(numeric_only=True))   # skip the text columns (required in pandas 2.0+)

dbh         26.352941
wood      1569.045113
bark       513.235294
root       334.383333
rootsk     113.802264
branch      54.065789
dtype: float64

Min and Max
print(df.min())

dbh                      3
wood                     3
bark                     7
root                   0.3
rootsk                0.05
branch                   4
species    Acacia mabellae
dtype: object

print(df.max())

dbh          145
wood       25116
bark        1808
root         3000
rootsk      1030
branch      371
species    Other
dtype: object

Pairwise Correlation
df.corr(numeric_only=True)

             dbh       wood      bark      root    rootsk    branch
dbh     1.000000   0.905175  0.965413  0.899301  0.934982  0.861660
wood    0.905175   1.000000  0.971700  0.988752  0.967082  0.821731
bark    0.965413   0.971700  1.000000  0.961038  0.971341  0.943383
root    0.899301   0.988752  0.961038  1.000000  0.936935  0.679760
rootsk  0.934982   0.967082  0.971341  0.936935  1.000000  0.621550
branch  0.861660   0.821731  0.943383  0.679760  0.621550  1.000000

Data Cleaning
We need to clean our data. It might contain missing (NaN) values, outliers, etc., which we may need to remove or replace. Otherwise, our analysis might not make sense.
We can find null values using the following method.

print(df.isnull().any())

dbh        False
wood        True
bark        True
root        True
rootsk      True
branch      True
species    False
fac26       True
dtype: bool

We have to remove these null values. This can be done by the method shown below.

newdf = df.dropna()

print(newdf)

     dbh   wood   bark  root  rootsk   branch        species  fac26
123   27  550.0  105.0  44.0     9.0    59.0   B. myrtifolia     z
124   26  414.0   78.0  38.0    13.0    44.0   B. myrtifolia     z
125    9   42.0    8.0   5.0     1.3     7.0   B. myrtifolia     z
126   12   85.0   13.0  17.0     2.2    16.0   B. myrtifolia     z

print(newdf.shape)

(4, 8)
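dropna() drops every row that has at least one missing value, which is why only 4 rows survive here. Two gentler options are to require values only in some columns (subset=) or to fill the gaps, for example with each column's mean. A sketch on a small hypothetical table (the numbers are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table with gaps in different columns
df = pd.DataFrame({'dbh': [90, 106, 27, 26],
                   'wood': [5528.0, np.nan, 550.0, 414.0],
                   'bark': [np.nan, np.nan, 105.0, 78.0]})

print(df.dropna().shape)                  # any NaN drops the row
print(df.dropna(subset=['wood']).shape)   # only rows missing 'wood' are dropped
filled = df.fillna(df.mean())             # or replace NaN with each column's mean
print(filled)
```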

Pandas .Panel()
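A panel is a 3D container of data. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the Panel examples below only run on older pandas versions.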
The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −
• items − axis 0, each item corresponds to a DataFrame contained inside.
• major_axis − axis 1, it is the index (rows) of each of the DataFrames.
• minor_axis − axis 2, it is the columns of each of the DataFrames.

A Panel can be created using the following constructor:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
The parameters of the constructor are as follows:
• data – Data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame
• items – axis=0
• major_axis – axis=1
• minor_axis – axis=2
• dtype – the Data type of each column
• copy – Copy data. Default False

A Panel can be created in multiple ways, for example:
• From a 3D ndarray
• From a dict of DataFrames

From 3D ndarray

# creating a panel from a 3D ndarray
import pandas as pd
import numpy as np
data = np.random.rand(2,4,5)
p = pd.Panel(data)

print(p)

output:
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4

Note: the dimensions of the panel come from the shape of the 3D array, (2, 4, 5): 2 items, 4 rows (major_axis), and 5 columns (minor_axis).

From dict of DataFrame Objects

# creating a panel from a dict of DataFrames
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)

print(p)

output:
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2

Selecting the Data from Panel
Select the data from the panel using −
• Items
• Major_axis
• Minor_axis

Using Items

# creating a panel from a dict of DataFrames
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)

print(p['Item1'])

output:
        0          1          2
0 -0.006795 -1.156193 -0.524367
1 0.025610 1.533741 0.331956
2 1.067671 1.309666 1.304710
3 0.615196 1.348469 -0.410289

We have two items, and we retrieved Item1. The result is a DataFrame with 4 rows and 3 columns, which are the major_axis and minor_axis dimensions.

Using major_axis
Data can be accessed using the method panel.major_xs(index), for example print(p.major_xs(1)):

     Item1     Item2
0 0.027133 -1.078773
1 0.115686 -0.253315
2 -0.473201 NaN

Using minor_axis
Data can be accessed using the method panel.minor_xs(index).

import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)

print(p.minor_xs(1))

Item1      Item2
0 0.092727 -1.633860
1 0.333863 -0.568101
2 0.388890 -0.338230
3 -0.618997 -1.01808
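Because Panel was removed in pandas 0.25, the Panel examples above do not run on current pandas. The pandas deprecation notice pointed to a DataFrame with a MultiIndex (or the xarray package) instead. A sketch of the dict-of-DataFrames example built with pd.concat(), with rough equivalents of p['Item1'], p.major_xs(1) and p.minor_xs(1); the random seed is arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = {'Item1': pd.DataFrame(rng.standard_normal((4, 3))),
        'Item2': pd.DataFrame(rng.standard_normal((4, 2)))}

# Stack the DataFrames; the dict keys become the outer level of a row MultiIndex
stacked = pd.concat(data)
print(stacked.shape)   # Item2 gets NaN in column 2

item1 = stacked.loc['Item1']              # like p['Item1']
row1 = stacked.xs(1, level=1).T           # like p.major_xs(1): items as columns
col1 = stacked[1].unstack(level=0)        # like p.minor_xs(1): items as columns
print(col1)
```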

 

Task Notification Bot for slack with Django

Slack is a great platform for team collaboration, and it also has one of the best APIs for building chatbots.

In this post, I will walk you through building a minimal Slack Bot with a Django backend. The idea is to set up a Slack Bot that posts event notifications sent to it by a backend.

Before we start, let us understand the Slack bot's life cycle:

  • If you are new to Slack: it's a messaging platform focused on team collaboration. Slack lets you create custom applications, including bots (a sort of Messaging Platform as a Service). You run the application backend that processes the business logic on your own server.
  • To start with, you need to be part of a Slack team and have admin privileges to create a new Slack App. If you are not part of a Slack team, you can create one:
      • Give the name of your company or team.
      • Enter a channel name.
      • Click on See your channel in Slack.
  • We will create a Slack App for the Slack team, then add a Bot User to the app.
  • We will create a Django-based backend web application to post messages to Slack.
  • Once the Slack App is set up and the backend is ready, the bot can post event notifications.

Create a Slack App

Start by creating a Slack app here: click Create App, give the app a name, and select your Slack team.

You will then be taken to the app configuration, where you need to do the following to get the bot up and running:

  1. Create a Bot User
  2. Install Slack App to your Team

Create a BOT User

On the left pane click on Bot User then choose a user name for the Bot and set “Always Show My Bot as Online” slider to on. Click on Add Bot User to create our shipment bot.

Install Slack App to Team

Now on the left pane click Install App and install the app to your Slack team.

Once the app is installed, you will get a Bot User OAuth Access Token. Note down this token; we will need it later when configuring the Django app. This token is the passphrase that lets our bot interact with the Slack team.

Slack Client Credentials

Also, note down the App Credentials from Basic Information on the left pane. These credentials let us talk to the Slack API: every time we send a message to Slack, we send our Client ID (CLIENT_ID) and Client Secret (CLIENT_SECRET) to identify ourselves. Similarly, we can verify that an incoming message was sent by Slack by checking that the Verification Token (VERIFICATION_TOKEN) in the message matches the one in App Credentials.

Now we should have four key values:

  1. Client ID: SLACK_CLIENT_ID
  2. Client Secret: SLACK_CLIENT_SECRET
  3. Verification Token: SLACK_VERIFICATION_TOKEN
  4. Bot User Token: SLACK_BOT_USER_TOKEN

Environment Setup

Let us create a new virtual environment named venv for our project, using Python 3.6.x.

virtualenv venv --python=python3.6

Activate the virtual environment before installing the other dependencies.

source venv/bin/activate

Now let’s install the required packages:

pip install django
pip install slacker
pip install slacker-log-handler

Create a Django Application

django-admin startproject shipment_portal
cd shipment_portal
django-admin startapp shipment

Configure Django Settings

We need to add our own application, shipment, to INSTALLED_APPS. Add the line shown below to shipment_portal/settings.py:

# shipment_portal/settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'shipment',  # <== add this line
]

Then add the following configuration to shipment_portal/settings.py, using your own authentication keys from Slack.

# shipment_portal/settings.py

# SLACK API Configurations
# ----------------------------------------------
# use your keys
SLACK_CLIENT_ID = '20xxxxxxxxxx.20xxxxxxxxxx'
SLACK_CLIENT_SECRET = 'd29fe85a95c9xxxxxxxxxxxxxxxxxxxxx'
SLACK_VERIFICATION_TOKEN = 'xpxxxxxxxxxxxxxxxxxxxxxxxxx'
SLACK_BOT_USER_TOKEN = 'xoxb-xxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxx'
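Hard-coding the keys in settings.py makes them easy to commit to version control by accident. A minimal sketch of an alternative, assuming you export the four values as environment variables with the same names; load_slack_settings is a hypothetical helper, not part of Django:

```python
import os

# The four values collected from the Slack App pages. Reading them from
# environment variables with the same names is an assumption of this sketch.
SLACK_KEYS = (
    "SLACK_CLIENT_ID",
    "SLACK_CLIENT_SECRET",
    "SLACK_VERIFICATION_TOKEN",
    "SLACK_BOT_USER_TOKEN",
)


def load_slack_settings(env=None):
    """Return the Slack keys as a dict, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in SLACK_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Slack settings: " + ", ".join(missing))
    values = {key: env[key] for key in SLACK_KEYS}
    # Bot User OAuth Access Tokens start with "xoxb-"; catch a wrongly pasted token early
    if not values["SLACK_BOT_USER_TOKEN"].startswith("xoxb-"):
        raise RuntimeError("SLACK_BOT_USER_TOKEN does not look like a bot token (xoxb-...)")
    return values
```

In settings.py you would then call slack_settings = load_slack_settings() once and assign each entry, for example SLACK_BOT_USER_TOKEN = slack_settings["SLACK_BOT_USER_TOKEN"]. Raising at import time means a misconfigured server fails on startup rather than on the first notification.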

Now start the Django development server:

python manage.py runserver

Once the server starts, it prints something similar to this:

Performing system checks…
System check identified no issues (0 silenced).
You have 13 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
July 03, 2017 - 17:30:32
Django version 1.11.3, using settings 'shipment_portal.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

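You can ignore the migration warnings for now and open the URL in your browser. Run python manage.py migrate before using the login-protected endpoint, though, because authentication relies on the auth and sessions tables.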

Create an API endpoint

Now that the app server is up and running, we need an endpoint that creates a shipment and sends a notification to Slack. We will create an API view with Django as follows:

shipment/views.py

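# shipment/views.py
from django.contrib.auth.mixins import LoginRequiredMixin
from django.http import JsonResponse
from django.views import View

from .slackapi import send_notification


class ShipmentCreate(LoginRequiredMixin, View):

    def post(self, request, *args, **kwargs):
        try:
            # Use the username: concatenating the integer user id to a string raises TypeError
            user = request.user.username
            # ===============
            # Your code goes here............
            # ===============
            response = {
                'status': 200,
                'type': '+OK',
                'message': 'Shipment created',
            }
            message = 'Shipment created by ' + user
            send_notification(message, channel='#general')

        except Exception:
            response = {
                'status': 500,
                'type': '-ERR',
                'message': 'Internal Server Error',
            }
        return JsonResponse(response, status=response.get('status'))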

Configure Django Routes

If you are new to web applications, routing is how you tell the web server which function to invoke when a URL is requested. When a request for that URL arrives, the corresponding function is invoked and receives the request as a parameter.
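To make the routing idea concrete, here is a toy dispatcher in plain Python. It is only an illustration of the concept, not how Django is implemented; the dict-based request, shipment_view, ROUTES and dispatch are all hypothetical names.

```python
import re


def shipment_view(request):
    """Stand-in for a real view: report which path it handled."""
    return {"status": 200, "path": request["path"]}


# Each route is (compiled regex, view function, route name),
# loosely modeled on Django's regex-based url() entries.
ROUTES = [
    (re.compile(r"^shipment/$"), shipment_view, "shipment"),
]


def dispatch(request):
    """Call the view of the first route whose regex matches the request path."""
    # Django-style patterns are written without the leading slash, so strip it first
    path = request["path"].lstrip("/")
    for pattern, view, _name in ROUTES:
        if pattern.match(path):
            return view(request)
    return {"status": 404, "path": request["path"]}


if __name__ == "__main__":
    print(dispatch({"path": "/shipment/"}))
    print(dispatch({"path": "/unknown/"}))
```

The first matching pattern wins, which is also why the order of entries in a real urlpatterns list matters.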
Add the following lines to shipment/urls.py to map the ShipmentCreate view to http://localhost:8000/shipment/

# shipment/urls.py
from django.conf.urls import url

from .views import ShipmentCreate

urlpatterns = [
    url(r'^shipment/$', ShipmentCreate.as_view(), name='shipment'),
]
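For http://localhost:8000/shipment/ to resolve, the project-level URLconf generated by startproject also needs to include the app's urls.py. A sketch of shipment_portal/urls.py, assuming the Django 1.11 url() style used above (the admin entry is the one startproject generates):

```python
# shipment_portal/urls.py
from django.conf.urls import include, url
from django.contrib import admin

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    # Hand every other path to the shipment app's own urls.py
    url(r'^', include('shipment.urls')),
]
```

Django strips the part of the path matched by the outer pattern (here, nothing) before trying the app's patterns, so '^shipment/$' in shipment/urls.py ends up serving /shipment/.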

slackapi.py

The functions in slackapi.py, which lives in the shipment app alongside views.py, post notifications and messages to Slack.

# shipment/slackapi.py
import logging
import os

from django.conf import settings
from slacker import Slacker

slack_logger = logging.getLogger(__name__)

# Authenticate with the Bot User OAuth Access Token configured in settings.py
slack = Slacker(settings.SLACK_BOT_USER_TOKEN)

# If set, every message is routed to this channel instead (useful for testing)
TEST_CHANNEL = os.environ.get('TEST_SLACK_CHANNEL', None)

# Per-channel defaults for the bot's display name and icon.
# ENVIRON is an optional setting (e.g. 'dev', 'prod'); 'dev' is used if it is not defined.
channel_details = {
    "#general": {
        "username": 'shipment-' + getattr(settings, 'ENVIRON', 'dev'),
        "icon_url": None,
        "icon_emoji": ":email:"
    },
    "@Jeenal": {},
}


def post_to_slack(message, channel, username=None, icon_url=None, icon_emoji=None):
    try:
        channel = TEST_CHANNEL or channel
        channel_info = channel_details.get(channel, dict())
        slack.chat.post_message(
            channel=channel,
            text=message,
            username=username or channel_info.get("username"),
            # A channel default icon is used only if the caller did not pass the other icon type
            icon_url=icon_url or ((not icon_emoji) and channel_info.get("icon_url")) or None,
            icon_emoji=icon_emoji or ((not icon_url) and channel_info.get("icon_emoji")) or None,
            as_user=False
        )
    except Exception:
        slack_logger.error('Error posting message to Slack', exc_info=True)


def send_notification(message, channel):
    try:
        post_to_slack(message, channel)
    except Exception:
        slack_logger.error('Error sending notification to Slack', exc_info=True)
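The icon arguments in post_to_slack rely on chained and/or expressions that are easy to misread. The sketch below copies the same two expressions into a hypothetical pure function, resolve_icons, so the fallback rules can be checked without contacting Slack:

```python
def resolve_icons(channel_info, icon_url=None, icon_emoji=None):
    """Return (icon_url, icon_emoji) using the same rules as post_to_slack.

    An icon passed by the caller always wins. A channel default for one icon
    type is applied only when the caller did not pass the other type.
    """
    resolved_url = icon_url or ((not icon_emoji) and channel_info.get("icon_url")) or None
    resolved_emoji = icon_emoji or ((not icon_url) and channel_info.get("icon_emoji")) or None
    return resolved_url, resolved_emoji


if __name__ == "__main__":
    general = {"icon_url": None, "icon_emoji": ":email:"}
    print(resolve_icons(general))
    print(resolve_icons(general, icon_url="https://example.com/truck.png"))
    print(resolve_icons(general, icon_emoji=":truck:"))
```

With the #general defaults, a plain call uses the :email: emoji, while passing an explicit icon_url suppresses the default emoji so the two icon types do not conflict.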