Tag Archives: python

How to make hello world program in wxPython

In this article we will look at creating a simple hello world program using wxPython. The program creates and displays a simple window with a big button on it. Upon clicking the button, the program exits. Use the following code to create the hello world program. You must already have the wxPython library installed.

Hello world program

import wx

class MainWindow(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(self, parent, title="Hello World")
        self.killButton = wx.Button(self, label="Kill Me")
        self.Bind(wx.EVT_BUTTON, self.kill, self.killButton)
        self.Show(True)

    def kill(self, event):
        print("Bye Bye cruel world")
        self.Close()

app = wx.App(False)
frame = MainWindow(None)
app.MainLoop()


We will go through this program step by step and explain what's going on. Most of the lines are self-explanatory, but if you are just getting started in programming, the following explanation will be helpful.

  1. Import the wxPython library.
  2. Inherit from the wx.Frame class. This is a useful pattern for most programs you will build: create one base frame or window and put the rest of the GUI widgets on top of it, like text controls, buttons, images, tables etc.
  3. Instantiate the inherited frame with the desired title. The parent argument is usually None for main windows.
  4. Create a button with the label "Kill Me". The first argument is the parent; here we use "self", which is the main window we have just created.
  5. Bind the button click event (EVT_BUTTON) of killButton to the kill method. Whenever an EVT_BUTTON event is fired, i.e. killButton is clicked, the kill method is called.
  6. Calling Show causes the window to be displayed on screen. It is customary to call this method after the GUI construction is done: create the main window, place widgets, bind events, like we did here.
  7. Create the wxPython application by calling wx.App. Every wxPython program must have exactly one application object.
  8. Start the main loop, which hands control over to the wxPython library. This post explains why the main loop has to be called.


This program launches the following window. The button takes all the available space in the window since there are no other widgets. You need a few more lines of code (a sizer, for example) to make the button look like what users are used to: small and horizontal. You can exit the program by clicking the button.

How to install wxpython

In this post we will go over an easy way to install wxPython. The following command installs wxPython for Python 3.

pip install wxpython

Python 2

Older versions of wxPython can be installed by downloading binaries from SourceForge. The binaries are available for both Windows and macOS.


The pip install command should work on Linux as well. However, if you are stuck on an old Linux version, you can try installing wxPython using your distro's package manager, such as apt-get or yum. You can search for the exact version and package name available on your platform using the following command on Debian variants.

└─$ apt-cache search wxpython
gnumed-client - medical practice management - Client
psychopy - environment for creating psychology stimuli in Python
pyscanfcs - scientific tool for perpendicular line scanning FCS
python3-genx - differential evolution algorithm for fitting
python3-opengl - Python bindings to OpenGL (Python 3)
python3-pyface - traits-capable windowing framework
python3-squaremap - wxPython control to display hierarchic data as nested squares
python3-wxgtk-media4.0 - Python 3 interface to the wxWidgets Cross-platform C++ GUI toolkit (wx.media)
python3-wxgtk-webview4.0 - Python 3 interface to the wxWidgets Cross-platform C++ GUI toolkit (wx.html2)
python3-wxgtk4.0 - Python 3 interface to the wxWidgets Cross-platform C++ GUI toolkit
soundgrain - Graphical interface to control granular sound synthesis modules
wxglade - GUI designer written in Python with wxPython
wxpython-tools - Tools from the wxPython distribution

Successful installation

Use the following commands to check whether the installation was successful.

PS C:\Users\godson> python
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import wx
>>> wx.version()
'4.1.1 msw (phoenix) wxWidgets 3.1.5'

As shown above, it should print the currently installed wxPython version if the installation was successful.

A more extensive tutorial is available here

Manhole service in a Twisted application

What is Manhole?

Manhole is an in-process service that accepts connections and presents an interactive Python prompt, along with stack traces for all threads.

Using it, we can access and modify objects or definitions in the running application, for example adding a method to a class or changing the definition of a method in any class or module.

This allows us to modify a running application without restarting it. That makes work like debugging much easier: you can inspect the values of objects while the program is running.

How to configure it?

from twisted.internet import reactor
from twisted.conch import manhole, manhole_ssh
from twisted.conch.ssh.keys import Key
from twisted.cred import portal, checkers

DATA = {"Service": "Manhole"}

def get_manhole_factory(namespace, **passwords):

    def get_manhole(arg):
        return manhole.ColoredManhole(namespace)

    realm = manhole_ssh.TerminalRealm()
    realm.chainedProtocolFactory.protocolFactory = get_manhole
    p = portal.Portal(realm)
    # register a username/password checker so the SSH login can succeed
    p.registerChecker(checkers.InMemoryUsernamePasswordDatabaseDontUse(**passwords))
    f = manhole_ssh.ConchFactory(p)
    f.publicKeys = {"ssh-rsa": Key.fromFile("keys/manhole.pub")}
    f.privateKeys = {"ssh-rsa": Key.fromFile("keys/manhole")}
    return f

reactor.listenTCP(2222, get_manhole_factory(globals(), admin='admin'))
reactor.run()

Once you run the above snippet, the service starts on TCP port 2222.

You can log into the service with the ssh command.

Here is how it looks:

[lalit : ~]₹ ssh admin@localhost -p 2222
admin@localhost's password:
>>> dir() 
['DATA', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'checkers', 'get_manhole_factory', 'manhole', 'manhole_ssh', 'portal', 'reactor'] 
>>> DATA 
{'Service': 'Manhole'}
>>> DATA['Service'] = "Edited" 
>>> DATA 
{'Service': 'Edited'}
[lalit : ~]₹ ssh admin@localhost -p 2222
admin@localhost's password: 
>>> dir() 
['DATA', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'checkers', 'get_manhole_factory', 'manhole', 'manhole_ssh', 'portal', 'reactor'] 
>>> DATA 
{'Service': 'Edited'} 

Here, in the first login we changed a value in the DATA dictionary of the running application; as we can see, we get the new value in the second login.

Simple port scanner in python

A port scanner is an application designed to probe a server or host for open ports. Such an application may be used by administrators to verify the security policies of their networks, and by attackers to identify network services running on a host and exploit vulnerabilities.


#!/usr/bin/env python3
from socket import socket, gethostbyname, AF_INET, SOCK_STREAM

if __name__ == '__main__':
    target = input('Enter host to scan: ')
    targetIP = gethostbyname(target)
    print('Starting scan on host', targetIP)

    # scan reserved ports
    for i in range(20, 1025):
        s = socket(AF_INET, SOCK_STREAM)
        s.settimeout(1)
        # connect_ex returns 0 on success instead of raising an exception
        result = s.connect_ex((targetIP, i))
        if result == 0:
            print('Port %d: OPEN' % (i,))
        s.close()
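The key call is connect_ex, which returns 0 when the TCP connection succeeds instead of raising an exception. A minimal self-contained sketch of that behaviour, using a throwaway listener on the loopback interface so no real host is scanned:

```python
import socket

# Start a throwaway listener so we have a known-open port to probe.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

# Probe it the same way the scanner does.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.settimeout(1.0)
result = probe.connect_ex(("127.0.0.1", port))
probe.close()
listener.close()

print("open" if result == 0 else "closed")  # open
```

The same probe against a port with no listener returns a non-zero error code rather than raising, which is what lets the scanner loop stay exception-free.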


Python Matplotlib Library with Examples

What Is Python Matplotlib?

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

Pyplot is a Matplotlib module which provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the advantage of being free and open-source. matplotlib.pyplot is used for 2D graphics in the Python programming language. It can be used in Python scripts, shells, web application servers, and other graphical user interface toolkits.

There are several toolkits available that extend Matplotlib's functionality.

  • Basemap: It is a map plotting toolkit with various map projections, coastlines, and political boundaries.
  • Cartopy: It is a mapping library featuring object-oriented map projection definitions, and arbitrary point, line, polygon and image transformation capabilities.
  • Excel tools: Matplotlib provides utilities for exchanging data with Microsoft Excel.
  • Natgrid: It is an interface to the “natgrid” library for irregular gridding of the spaced data.
  • GTK tools: mpl_toolkits.gtktools provides some utilities for working with GTK. This toolkit ships with matplotlib, but requires pygtk.
  • Qt interface
  • Mplot3d: The mplot3d toolkit adds simple 3D plotting capabilities to matplotlib by supplying an axes object that can create a 2D projection of a 3D scene.
  • matplotlib2tikz: export to Pgfplots for smooth integration into LaTeX documents.

Types of Plots
There are various plots which can be created using python Matplotlib. Some of them are listed below:

  • Bar Graph
  • Histogram
  • Scatter Plot
  • Line Plot
  • 3D plot
  • Area Plot
  • Pie Plot
  • Image Plot

We will demonstrate some of them in detail.

But before that, let me show you elementary code in python matplotlib that generates a simple graph.

from matplotlib import pyplot as plt
plt.plot([1, 2, 3], [4, 5, 1])  # Plotting to our canvas (sample values)
plt.show()                      # Showing what we plotted

So, with three lines of code, you can generate a basic graph using python matplotlib.

Let us see how we can add a title and axis labels to a graph created by the python matplotlib library, to bring in more meaning. Consider the below example:

from matplotlib import pyplot as plt
plt.plot([5, 8, 10], [12, 16, 6])
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()

You can also try many styling techniques to create a better graph. What if you want to change the width or color of a particular line, or add some grid lines? That's where styling comes in.

The style package adds support for easy-to-switch plotting “styles” with the same parameters as a matplotlibrc file.

There are a number of pre-defined styles provided by matplotlib. For example, there’s a pre-defined style called “ggplot”, which emulates the aesthetics of ggplot (a popular plotting package for R). To use this style, just add:

import matplotlib.pyplot as plt
plt.style.use('ggplot')

To list all available styles, use:

print(plt.style.available)

['seaborn-darkgrid', 'Solarize_Light2', 'seaborn-notebook', 'classic', 'seaborn-ticks', 'grayscale', 'bmh', 'seaborn-talk', 'dark_background', 'ggplot', 'fivethirtyeight', '_classic_test', 'seaborn-colorblind', 'seaborn-deep', 'seaborn-whitegrid', 'seaborn-bright', 'seaborn-poster', 'seaborn-muted', 'seaborn-paper', 'seaborn-white', 'fast', 'seaborn-pastel', 'seaborn-dark', 'tableau-colorblind10', 'seaborn', 'seaborn-dark-palette']

So, let me show you how to add style to a graph using python matplotlib. First, you need to import the style package from python matplotlib library and then use styling functions as shown in below code:

from matplotlib import pyplot as plt
from matplotlib import style
style.use('ggplot')
x = [5,8,10]
y = [12,16,6]
x2 = [6,9,11]
y2 = [6,15,7]
plt.plot(x,y,'r-o',label='line one', linewidth=5)
plt.plot(x2,y2,'c',label='line two',linewidth=5)
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
# Use the dark_background style for this plot only
with plt.style.context('dark_background'):
  plt.plot(np.sin(np.linspace(0, 2 * np.pi)), 'r-o')
plt.show()
# Plotting code outside the with-block uses the default style

Now, we will understand the different kinds of plots. Let’s start with the bar graph!

Matplotlib: Bar Graph
A bar graph uses bars to compare data among different categories. It is well suited when you want to measure changes over a period of time. It can be plotted vertically or horizontally. The vital thing to keep in mind is that the longer the bar, the greater the value. Now, let us implement it using python matplotlib.

from matplotlib import pyplot as plt
# distances per day for each car (sample data)
plt.bar([0.25,1.25,2.25,3.25,4.25], [50,40,70,80,20], label="BMW", color='b', width=0.2)
plt.bar([0.75,1.75,2.75,3.75,4.75], [80,20,20,50,60], label="Audi", color='r', width=0.2)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Bar Plot')
plt.show()

When I run this code, it generates a figure like below:

In the above plot, I have displayed a comparison between the distance covered by two cars BMW and Audi over a period of 5 days. Next, let us move on to another kind of plot using python matplotlib – Histogram

Matplotlib – Histogram
Let me first tell you the difference between a bar graph and a histogram. Histograms are used to show a graphical representation of the distribution of numerical data whereas a bar chart is used to compare different entities.

It is an estimate of the probability distribution of a continuous variable (quantitative variable) and was first introduced by Karl Pearson. It is a kind of bar graph.

To construct a histogram, the first step is to “bin” the range of values — that is, divide the entire range of values into a series of intervals — and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but are not required to be) of equal size.
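The binning step described above can be sketched with NumPy's histogram function, which is what underlies plt.hist (the ages below are made-up sample values):

```python
import numpy as np

ages = [3, 12, 15, 21, 25, 33, 41, 47, 55]
bins = [0, 10, 20, 30, 40, 50, 60]   # six adjacent, non-overlapping intervals
counts, edges = np.histogram(ages, bins=bins)
print(list(counts))  # [1, 2, 2, 1, 2, 1]
```

Each count is the number of values falling into the corresponding interval; the bar heights in a histogram plot are exactly these counts.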

Basically, histograms are used to represent grouped data, for instance when you have an array or a very long list. The X-axis shows the bin ranges while the Y-axis shows the frequency. So, if you want to represent an age-wise population as a graph, a histogram suits well, as it tells you how many values exist in each group range, or bin, to use the histogram term.

In the below code, I have created the bins in the interval of 10 which means the first bin contains elements from 0 to 9, then 10 to 19 and so on.

import matplotlib.pyplot as plt
population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()

When I run this code, it generates a figure like below:

As you can see in the above plot, the X-axis shows the age groups (bins) and the Y-axis shows how many people fall into each one. Our biggest age group is between 40 and 50.

Matplotlib: Scatter Plot
A scatter plot is a type of plot that shows the data as a collection of points. The position of a point depends on its two-dimensional value, where each value is a position on either the horizontal or vertical dimension. Usually, we need scatter plots in order to compare variables, for example, how much one variable is affected by another variable to build a relation out of it.
Consider the below example:

import matplotlib.pyplot as plt
x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
# sample data for the second group
x1 = [8,8.5,9,9.5,10,10.5,11]
y1 = [3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.title('Scatter Plot')
plt.legend()
plt.show()

As you can see in the above graph, I have plotted two scatter plots based on the inputs specified in the code. The data is displayed as two collections of points, labelled 'high income low saving' and 'low income high savings'.

Scatter plot with groups
Data can be classified into several groups. The code below demonstrates:

import numpy as np
import matplotlib.pyplot as plt

# Create data
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4+0.3 * np.random.rand(N), 0.5*np.random.rand(N))
g3 = (0.3*np.random.rand(N),0.3*np.random.rand(N))

data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

# Create plot
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(1, 1, 1)

for data, color, group in zip(data, colors, groups):
   x, y = data
   ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)

plt.title('Matplot scatter plot')
plt.legend(loc=2)
plt.show()

The purpose of using plt.figure() is to create a figure object, the top-level container for all plot elements.

The whole figure is regarded as the figure object. It is necessary to call plt.figure() explicitly when we want to tweak the size of the figure or add multiple Axes objects to a single figure.

fig.add_subplot() adds an Axes to the figure at a given grid position.
For example, "111" means "1×1 grid, first subplot" and "234" means "2×3 grid, 4th subplot".
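This numbering can be checked quickly; the sketch below uses the non-interactive Agg backend so no window is needed:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend, nothing is displayed
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(2, 3, 4)    # same as add_subplot(234): 2x3 grid, 4th slot
print(len(fig.axes))  # 1
```

Each add_subplot call appends one Axes to fig.axes; calling it again with a different slot number would add a second Axes alongside the first.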


Next, let us understand the area plot or you can also say Stack plot using python matplotlib.

Matplotlib: Area Plot
Area plots are pretty much similar to the line plot. They are also known as stack plots. These plots can be used to display the evolution of the value of several groups on the same graphic. The values of each group are displayed on top of each other. It allows checking on the same figure the evolution of both the total of a numeric variable and the importance of each group.

A line chart forms the basis of an area plot, where the region between the axis and the line is represented by colors.

import numpy as np
import matplotlib.pyplot as plt

# Your x and y axis
x = range(1, 6)
y = [ [1,4,6,8,9], [2,2,7,10,12], [2,8,5,10,6] ]
# Basic stacked area chart.
plt.stackplot(x, y, labels=['A','B','C'], colors=['m','c','r'])
plt.legend(loc='upper left')
plt.show()

import matplotlib.pyplot as plt
days = [1,2,3,4,5]
Enfield =[50,40,70,80,20]
Honda = [80,20,20,50,60]
Yahama =[70,20,60,40,60]
KTM = [80,20,20,50,60]
plt.stackplot(days, Enfield, Honda, Yahama, KTM, labels=['Enfield', 'Honda', 'Yahama', 'KTM'], colors=['r','c','y','m'])
plt.xlabel('Days')
plt.ylabel('Distance in kms')
plt.title('Bikes details in area plot')
plt.legend(loc='upper left')
plt.show()

The above-represented graph shows how an area plot can be plotted for the present scenario. Each shaded area in the graph shows a particular bike with the frequency variations denoting the distance covered by the bike on different days. Next, let us move to our last yet most frequently used plot – Pie chart.

Matplotlib: Pie Chart
In a pie plot, statistical data is represented in a circular graph: the circle is divided into portions, i.e. slices of pie, each denoting a particular piece of data, so that each portion is proportional to a different value in the data. This sort of plot is mainly used in mass media and business.

import matplotlib.pyplot as plt
slices = [8,5,5,6]
activities = ['Enfield','Honda','Yahama','KTM']
cols = ['c','g','y','b']
plt.pie(slices, labels=activities, colors=cols, startangle=90,
        shadow=True, explode=(0,0.1,0,0), autopct='%1.1f%%')
plt.title('Bike details in Pie Plot')
plt.show()

In the above-represented pie plot, the bikes scenario is illustrated. I have divided the circle into 4 sectors, each slice representing a particular bike and its share of the distance traveled. If you have noticed, these slices add up to 24, but the percentage of each pie slice is calculated automatically for you. In this way, pie charts are really useful, as you don't have to be the one who calculates the percentage of each slice of the pie.
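That automatic percentage calculation is nothing more than each slice divided by the total; with the slice values above:

```python
slices = [8, 5, 5, 6]
total = sum(slices)                       # 24
percentages = [round(100 * s / total, 1) for s in slices]
print(percentages)  # [33.3, 20.8, 20.8, 25.0]
```

These are exactly the figures that autopct='%1.1f%%' prints on the chart.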

Matplotlib: 3D Plot
Plotting of data along x, y, and z axes to enhance the display of data represents the 3-dimensional plotting. 3D plotting is an advanced plotting technique that gives us a better view of the data representation along the three axes of the graph.

Line Plot 3D

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [1,2,3,4,5]
y = [50,40,70,80,20]
y2 = [80,20,20,50,60]
y3 = [70,20,60,40,60]
y4 = [80,20,20,50,60]
# plot on the 3D axes: x, y and y2 supply the three coordinates
ax.plot(x, y, y2, 'g', label='Enfield', linewidth=5)
ax.set_title('bike details in line plot')
ax.set_ylabel('Distance in kms')
ax.legend()
plt.show()

In the above-represented 3D graph, a line graph is illustrated in a 3-dimensional manner. We make use of a special toolkit (mpl_toolkits.mplot3d) to plot 3D graphs, with the following boilerplate.
Syntax for plotting 3D graphs:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

The Axes3D import is what enables creating an axis with the projection='3d' keyword. This gives a 3-dimensional view for any data plotted on that axis.

Surface Plot 3D

Axes3D.plot_surface(X, Y, Z, *args, **kwargs)

By default, it will be colored in shades of a solid color, but it also supports color mapping by supplying the cmap argument.

The rstride and cstride kwargs set the stride used to sample the input data to generate the graph. If 1k by 1k arrays are passed in, the default values for the strides will result in a 100×100 grid being plotted. Both default to 10. A ValueError is raised if both stride and count keyword arguments are provided.
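The stride behaviour is easy to see on an array directly: sampling every 10th row and column of a 1000×1000 array leaves a 100×100 grid, which is what the surface plotter actually draws.

```python
import numpy as np

Z = np.zeros((1000, 1000))
sampled = Z[::10, ::10]   # what rstride=10, cstride=10 effectively plots
print(sampled.shape)      # (100, 100)
```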

from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()

Matplotlib: Image Plot

The example below renders the Mandelbrot set as an image with imshow.

import numpy as np
import matplotlib.pyplot as plt

xmin, xmax, ymin, ymax = -2, 0.8, -1.5, 1.5
max_it = 100                  # maximum number of iterations
px     = 3000                 # pixels along the vertical axis
res    = (ymax - ymin) / px   # grid resolution

plt.figure(figsize=(10, 10))

def m(c):
    # escape-time iteration: return the step at which |z| exceeds 2
    z = 0
    for n in range(1, max_it + 1):
        z = z**2 + c
        if abs(z) > 2:
            return n
    return np.nan

X = np.arange(xmin, xmax + res, res)
Y = np.arange(ymin, ymax + res, res)
Z = np.zeros((len(Y), len(X)))

for iy, y in enumerate(Y):
    print(iy + 1, "of", len(Y))
    for ix, x in enumerate(X):
        Z[-iy - 1, ix] = m(x + 1j * y)

np.save("mandel", Z)          # save array to file

plt.imshow(Z, cmap=plt.cm.prism, interpolation='none',
           extent=(X.min(), X.max(), Y.min(), Y.max()))
plt.show()

Matplotlib: Working With Multiple Plots
I have discussed multiple types of plots in python matplotlib such as bar plot, scatter plot, pie plot, area plot, etc. Now, let me show you how to handle multiple plots.

import numpy as np
import matplotlib.pyplot as plt
def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)
t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
t3 = np.arange(0.0, 5.0, 0.02)

plt.subplot(311)            # 3x1 grid, first subplot
plt.plot(t1, f(t1), 'bo', t2, f(t2))
plt.subplot(312)            # second subplot
plt.plot(t2, np.cos(2*np.pi*t2))
plt.subplot(313)            # third subplot
plt.plot(t3, np.tan(2*np.pi*t3))
plt.show()

Data Analysis with Pandas & Python

What is Data Analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
In this article, I use pandas to demonstrate data analysis.
Pandas has three main data structures: Series, DataFrame, and Panel.

The easiest way to install pandas is to use pip:

pip install pandas

or, Download it from here.

  • pandas Series

A pandas Series is a one-dimensional labeled array.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = pd.Series([100, 98.7, 98.4, 97.7], index=index_list)
print(a)

test1    100.0
test2     98.7
test3     98.4
test4     97.7
dtype: float64

Labels can be accessed using the index attribute.

print(a.index)

Index(['test1', 'test2', 'test3', 'test4'], dtype='object')

You can use array indexing or labels to access data in the series.
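For example, both positional indexing and label indexing reach the same element of the Series defined above:

```python
import pandas as pd

index_list = ['test1', 'test2', 'test3', 'test4']
a = pd.Series([100, 98.7, 98.4, 97.7], index=index_list)

print(a['test2'])   # access by label
print(a.iloc[1])    # access by integer position
```

Both lines print 98.7, since 'test2' is the label of the element at position 1.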


You can also apply mathematical operations on a pandas Series.
b = a * 2
c = a ** 1.5
print(b)
print(c)

test1    200.0
test2    197.4
test3    196.8
test4    195.4
dtype: float64

test1    1000.000000
test2     980.563513
test3     976.096258
test4     965.699142
dtype: float64

You can even create a Series of heterogeneous data.
s = pd.Series(['test1', 1.2, 3, 'test2'], index=['test3', 'test4', 2, '4.3'])
print(s)

test3    test1
test4      1.2
2            3
4.3      test2
dtype: object
  • pandas DataFrame

A pandas DataFrame is a two-dimensional array with heterogeneous data, i.e., data aligned in a tabular fashion in rows and columns.
Let us assume that we are creating a DataFrame with some people's data.

Name Age Gender Rating
Steve 32 Male 3.45
Lia 28 Female 4.6
Vin 45 Male 3.9
Katie 38 Female 2

You can think of it as an SQL table or a spreadsheet data representation.
The table represents data for a group of people along with an overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.
The data types of the four columns are as follows −

Column Type
Name String
Age Integer
Gender String
Rating Float
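pandas infers these column types automatically; building the table above and asking for its dtypes shows it (a small sketch, not part of the original post):

```python
import pandas as pd

df = pd.DataFrame({
    "Name":   ["Steve", "Lia", "Vin", "Katie"],
    "Age":    [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.0],
})
print(df.dtypes)   # Name/Gender -> object, Age -> int64, Rating -> float64
```

Strings are stored with the generic object dtype, while the numeric columns get int64 and float64 as listed in the table.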

Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable

A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

•  data
data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame.
•  index
For the row labels, the index to be used for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
•  columns
For column labels. Optional; defaults to np.arange(n) if no column labels are passed.
•  dtype
The data type of each column.
•  copy
Whether to copy the data; defaults to False.
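A small sketch exercising these parameters (the values are arbitrary):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],      # list-of-lists data
    index=["row1", "row2"],     # row labels
    columns=["a", "b"],         # column labels
    dtype=float,                # store every column as float64
)
print(df.loc["row1", "a"])  # 1.0
```

Because dtype=float was passed, the integer input is stored as float64, and the custom index/columns replace the default np.arange(n) labels.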

There are many methods to create DataFrames.
• Lists
• dict
• Series
• Numpy ndarrays
• Another DataFrame

Creating DataFrame from the dictionary of Series
The following method can be used to create DataFrames from a dictionary of pandas series.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = {"column1": pd.Series([100, 98.7, 98.4, 97.7],index=index_list), "column2": pd.Series([100, 100, 100, 85.4], index=index_list)}
df = pd.DataFrame(a)
print(df)


      column1  column2
test1 100.0    100.0
test2 98.7     100.0
test3 98.4     100.0
test4 97.7     85.4


print(df.index)

Index(['test1', 'test2', 'test3', 'test4'], dtype='object')


print(df.columns)

Index(['column1', 'column2'], dtype='object')

Creating a DataFrame from a list of dictionaries
l = [{'orange': 32, 'apple': 42}, {'banana': 25, 'carrot': 44, 'apple': 34}]
df = pd.DataFrame(l, index=['test1', 'test2'])
print(df)


        apple  banana  carrot  orange
test1      42     NaN     NaN    32.0
test2      34    25.0    44.0     NaN

You might have noticed that we got a DataFrame with NaN values in it. This is because we didn't supply the data for those particular rows and columns.
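Those NaN cells can also be replaced instead of left missing; for instance, pandas' fillna substitutes a default value:

```python
import pandas as pd

l = [{'orange': 32, 'apple': 42}, {'banana': 25, 'carrot': 44, 'apple': 34}]
df = pd.DataFrame(l, index=['test1', 'test2'])

filled = df.fillna(0)                  # replace every missing cell with 0
print(filled.loc['test1', 'banana'])   # 0.0
```

The original df is left untouched; fillna returns a new DataFrame unless inplace=True is passed.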

Creating DataFrame from Text/CSV files
Pandas comes in handy when you want to load data from a CSV or a text file. It has built-in functions to do this for us.

df = pd.read_csv('happiness.csv')

Yes, we created a DataFrame from a CSV file. This dataset contains the outcome of the European quality of life survey and is available here. Now that we have stored the DataFrame in df, we want to see what's inside. First, let's check the size of the DataFrame.

print(df.shape)

(105, 4)

It has 105 rows and 4 columns. Instead of printing out all the data, we will look at the first 10 rows.

print(df.head(10))

   Country  Gender  Mean    N=
0      AT    Male   7.3   471
1     NaN  Female   7.3   570
2     NaN    Both   7.3  1041
3      BE    Male   7.8   468
4     NaN  Female   7.8   542
5     NaN    Both   7.8  1010
6      BG    Male   5.8   416
7     NaN  Female   5.8   555
8     NaN    Both   5.8   971
9      CY    Male   7.8   433

There are many more methods to create DataFrames, but now we will look at basic operations on DataFrames.

Operations on DataFrame
We’ll recall the DataFrame we made earlier.

import pandas as pd
index_list = ['test1', 'test2', 'test3', 'test4']
a = {"column1": pd.Series([100, 98.7, 98.4, 97.7],index=index_list), "column2": pd.Series([100, 100, 100, 85.4], index=index_list)}
df = pd.DataFrame(a)
print(df)


      column1 column2
test1 100.0   100.0
test2 98.7    100.0
test3 98.4    100.0
test4 97.7    85.4

Now we want to create a new column from the current columns. Let's see how it is done.
df['column3'] = (2 * df['column1'] + 3 * df['column2']) / 5

        column1  column2  column3
test1    100.0    100.0   100.00
test2     98.7    100.0    99.48
test3     98.4    100.0    99.36
test4     97.7     85.4    90.32

We have created a new column, column3, from column1 and column2. We'll create one more using a boolean condition.
df['flag'] = df['column1'] > 99.5

We can also remove columns.
column3 = df.pop('column3')
print(column3)


test1    100.00
test2     99.48
test3     99.36
test4     90.32
Name: column3, dtype: float64


print(df)

       column1  column2   flag
test1    100.0    100.0   True
test2     98.7    100.0  False
test3     98.4    100.0  False
test4     97.7     85.4  False

Descriptive Statistics using pandas
It's very easy to view descriptive statistics of a dataset using pandas. We are going to use biomass data collected from this source. Let's load the data first.

url = 'https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/DAAG/biomass.csv'
df = pd.read_csv(url)
print(df.head())

   Unnamed: 0  dbh     wood  bark    root  rootsk  branch       species fac26
0           1   90   5528.0   NaN   460.0     NaN     NaN   E. maculata     z
1           2  106  13650.0   NaN  1500.0   665.0     NaN  E. Pilularis     2
2           3  112  11200.0   NaN  1100.0   680.0     NaN  E. Pilularis     2
3           4   34   1000.0   NaN   430.0    40.0     NaN  E. Pilularis     2
4           5  130      NaN   NaN  3000.0  1030.0     NaN   E. maculata     z

We are not interested in the unnamed column, so let's delete it first. Then we'll see the statistics with one line of code.

df = df.drop(df.columns[0], axis=1)
print(df.describe())

          dbh        wood      bark        root        rootsk        branch
count 153.000000 133.000000   17.000000   54.000000   53.000000   76.000000
mean  26.352941  1569.045113  513.235294  334.383333  113.802264  54.065789
std   28.273679  4071.380720  632.467542  654.641245  247.224118  65.606369
min   3.000000   3.000000     7.000000    0.300000    0.050000    4.000000
25%   8.000000   29.000000    59.000000   11.500000   2.000000    10.750000
50%   15.000000  162.000000   328.000000  41.000000   11.000000   35.000000
75%   36.000000  1000.000000  667.000000  235.000000  45.000000   77.750000
max   145.000000 25116.000000 1808.000000 3000.000000 1030.000000 371.000000

It's as simple as that: we can see all the statistics, such as count, mean, standard deviation and the quartiles. Now we are going to compute some other metrics which are not part of the describe() summary.

Mean:

print(df.mean(numeric_only=True))

dbh         26.352941
wood      1569.045113
bark       513.235294
root       334.383333
rootsk     113.802264
branch      54.065789
dtype: float64

Min and max:

print(df.min())

dbh                      3
wood                     3
bark                     7
root                   0.3
rootsk                0.05
branch                   4
species    Acacia mabellae
dtype: object

print(df.max())

dbh          145
wood       25116
bark        1808
root        3000
rootsk      1030
branch       371
species    Other
dtype: object

Pairwise correlation:

print(df.corr(numeric_only=True))

             dbh       wood      bark      root    rootsk    branch
dbh     1.000000   0.905175  0.965413  0.899301  0.934982  0.861660
wood    0.905175   1.000000  0.971700  0.988752  0.967082  0.821731
bark    0.965413   0.971700  1.000000  0.961038  0.971341  0.943383
root    0.899301   0.988752  0.961038  1.000000  0.936935  0.679760
rootsk  0.934982   0.967082  0.971341  0.936935  1.000000  0.621550
branch  0.861660   0.821731  0.943383  0.679760  0.621550  1.000000

Data Cleaning
We need to clean our data: it may contain missing values, NaNs, outliers, and so on, which we may need to remove or replace; otherwise our analysis might not make any sense.
We can check which columns contain null values as follows.
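The per-column boolean summary shown below is most likely produced by isnull().any(); here is a sketch with a tiny frame containing one NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"dbh": [3.0, 15.0], "wood": [3.0, np.nan]})

# True for every column that contains at least one missing value
print(df.isnull().any())
```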


dbh        False
wood        True
bark        True
root        True
rootsk      True
branch      True
species    False
fac26       True
dtype: bool

We have to remove these null values. This can be done by the method shown below.

newdf = df.dropna()


     dbh   wood   bark  root  rootsk   branch        species  fac26
123   27  550.0  105.0  44.0     9.0    59.0   B. myrtifolia     z
124   26  414.0   78.0  38.0    13.0    44.0   B. myrtifolia     z
125    9   42.0    8.0   5.0     1.3     7.0   B. myrtifolia     z
126   12   85.0   13.0  17.0     2.2    16.0   B. myrtifolia     z


(4, 8)
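The (4, 8) above is the shape of the cleaned frame: dropna() kept only the four rows with no missing values. As a self-contained sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
newdf = df.dropna()   # drop every row that has at least one NaN

print(newdf.shape)    # only the first row has no NaN
```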

Pandas .Panel()
A panel is a 3D container of data. The term Panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the examples below require an older pandas.
The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −
• items − axis 0, each item corresponds to a DataFrame contained inside.
• major_axis − axis 1, it is the index (rows) of each of the DataFrames.
• minor_axis − axis 2, it is the columns of each of the DataFrames.

A Panel can be created using the following constructor:

pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

The parameters of the constructor are as follows:
• data – Data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame
• items – axis=0
• major_axis – axis=1
• minor_axis – axis=2
• dtype – the Data type of each column
• copy – Copy data. Default is False

A Panel can be created in multiple ways, such as:
• From a 3D ndarray
• From a dict of DataFrame objects

# creating a panel from a 3D ndarray
import pandas as pd
import numpy as np
data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)


Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4

Note: the dimensions above come from the 2x4x5 ndarray we passed in; an empty panel created with pd.Panel() would show zero-length axes instead.

From dict of DataFrame Objects

# creating a panel from a dict of DataFrame objects
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)


Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2

Selecting the Data from Panel
Select the data from the panel using −
• Items
• Major_axis
• Minor_axis

Using Items

# creating a panel and selecting one item
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)

print(p['Item1'])

          0         1         2
0 -0.006795 -1.156193 -0.524367
1  0.025610  1.533741  0.331956
2  1.067671  1.309666  1.304710
3  0.615196  1.348469 -0.410289

We have two items, and we retrieved Item1. The result is a DataFrame with 4 rows and 3 columns, corresponding to the major_axis and minor_axis dimensions.

Using major_axis
Data can be accessed using the method panel.major_xs(index); for example, p.major_xs(1) returns, for each item, the values at row 1, indexed by the minor axis.

      Item1     Item2
0  0.027133 -1.078773
1  0.115686 -0.253315
2 -0.473201       NaN

Using minor_axis
Data can be accessed using the method panel.minor_xs(index).

import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)

print(p.minor_xs(1))

      Item1     Item2
0  0.092727 -1.633860
1  0.333863 -0.568101
2  0.388890 -0.338230
3 -0.618997 -1.01808

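Since Panel no longer exists in current pandas (removed in 0.25), the same item selection can be approximated with pd.concat over a dict of DataFrames. This is a sketch of the modern equivalent, not part of the original Panel API:

```python
import numpy as np
import pandas as pd

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}

# Stack the frames under a MultiIndex of (item, major_axis) rows
panel_like = pd.concat(data)

# Equivalent of p['Item1'] -- select one item
print(panel_like.loc['Item1'])

# Equivalent of p.major_xs(1) -- one major-axis row across items
print(panel_like.xs(1, level=1))
```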

How to configure django app using gunicorn?


Django is a Python web framework used for developing web applications. It is fast, secure, and scalable. Let us see how to configure a Django app using Gunicorn.

Before proceeding to the actual configuration, here is a short intro to Gunicorn.


Gunicorn (Green Unicorn) is a WSGI (Web Server Gateway Interface) server implementation commonly used to run Python web applications. It implements the PEP 3333 server standard, so it can run any web application that implements the application interface; applications written in Django, Flask or Bottle do.
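To make the “application interface” concrete: a WSGI application is just a callable taking an environ dict and a start_response callback. A minimal framework-free sketch:

```python
def application(environ, start_response):
    # environ: request data (path, headers, ...) as a dict
    # start_response: callback that receives the status and response headers
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, WSGI\n']
```

Django’s generated webapp/wsgi.py exposes exactly such a callable, which is what Gunicorn loads.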


You can install Gunicorn with pip:

pip3 install gunicorn

Gunicorn coupled with Nginx or another web server works as a bridge between the web server and the web framework: the web server (Nginx or Apache) serves static files, while Gunicorn handles requests to and responses from the Django application. I will try to write another blog post detailing how to set up a Django application with Nginx and Gunicorn.


Please make sure you have the below packages installed on your system; a basic understanding of Python, Django and Gunicorn is recommended.

  • Python > 3.5
  • Gunicorn > 15.0
  • Django > 1.11

Configure Django App Using Gunicorn

There are different ways to configure Gunicorn. I am going to demonstrate running the Django app using a Gunicorn configuration file.

First, let us start by creating the Django project, you can do so as follows.

django-admin startproject webapp

After starting the Django project, the directory structure looks like this.

The simplest way to run your Django app with Gunicorn is the following command; you must run it from the folder that contains manage.py.

gunicorn webapp.wsgi

This will run your Django project on port 8000 locally.


Now let’s see how to configure the Django app using a Gunicorn configuration file. A simple configuration with the worker class `sync` looks like this.

import sys

BASE_DIR = "/path/to/base/dir/"
sys.path.append(BASE_DIR)

bind = ''       # fill in an address:port or a unix socket here
backlog = 2048

import multiprocessing
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = 'sync'
worker_connections = 1000
timeout = 300
keepalive = 2

#   spew - Install a trace function that spews every line of Python
#       that is executed when running the server. This is the
#       nuclear option.
#       True or False

spew = False

#errorlog = '-'

accesslog = '/var/log/webapp_access_log'
loglevel = 'debug'
errorlog = '/var/log/webapp_error_log'

def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)

def pre_fork(server, worker):
    pass
def pre_exec(server):
    server.log.info("Forked child, re-executing.")

def when_ready(server):
    server.log.info("Server is ready. Spawning workers")

def worker_int(worker):
    worker.log.info("worker received INT or QUIT signal")

    ## get traceback info
    import threading, sys, traceback
    id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# Thread: %s(%d)" % (id2name.get(threadId, ""), threadId))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                code.append("  %s" % (line.strip()))
    worker.log.debug("\n".join(code))

def worker_abort(worker):
    worker.log.info("worker received SIGABRT signal")

Let us see a few important details in the above configuration file.

  1. Append the base directory path to your system’s path.
  2. You can bind the application to an address or socket using `bind`.
  3. `backlog`: maximum number of pending connections.
  4. `workers`: number of worker processes that handle requests, usually derived from the machine’s CPU count; vary it based on your application’s workload.
  5. `worker_class`: there are different worker types, and you can refer here for the available classes. `sync` is the default and should handle normal loads.
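The worker count from point 4 uses Gunicorn’s usual rule of thumb, (2 x CPU cores) + 1, which can be computed like this:

```python
import multiprocessing

# Gunicorn's suggested starting point: enough workers to keep every
# core busy, plus one extra to pick up requests during worker restarts
workers = multiprocessing.cpu_count() * 2 + 1
print(workers)
```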

You can refer more about the available Gunicorn settings here.

Running Django with Gunicorn as a daemon process

Here is a sample systemd unit file,

[Unit]
Description=webapp daemon
After=network.target

[Service]
ExecStart=/usr/local/bin/gunicorn --config /path/to/gunicorn/config.py --pid /var/run/webapp.pid webapp.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target


Add the file as webapp.service under /etc/systemd/system/. To make systemd pick up the new unit file, execute the following command.

systemctl daemon-reload


Start, Stop and Status of Application using systemctl

Now you can simply execute the following commands for your application.

To start your application

systemctl start webapp

To stop your application.

systemctl stop webapp

To check the status of your application.

systemctl status webapp

A short, complete video tutorial on configuring the Django app is below.

Configure Celery with SQS and Django on Elastic Beanstalk


Have your users complained about loading delays in the web app you developed? That might be because of a long I/O-bound call or a time-consuming process. For example, when a customer signs up on a website, we need to send a confirmation email; in the normal case the email is sent before the 200 OK response to the signup POST. However, we could send the email later, after returning the 200 OK response, right? This is not so straightforward when you are working with a framework like Django, which is tightly bound to the MVC paradigm.

So, how do we do it? The first thought that comes to mind is Python’s threading module. However, Python threads are implemented as pthreads (kernel threads), and because of the global interpreter lock (GIL), a Python process only runs one thread at a time. Besides, thread-based code is hard to manage, maintain, and scale.


This post assumes knowledge of Django and AWS Elastic Beanstalk.


Celery is here to the rescue. It helps when you have a time-consuming task (heavy compute or I/O bound) inside the request-response cycle. Celery is an open source asynchronous task queue, or job queue, based on distributed message passing. In this post I will walk you through the Celery setup procedure with Django and SQS on Elastic Beanstalk.

Why Celery ?   

Celery is very easy to integrate with an existing code base: write a decorator above a function definition to declare a Celery task, then call that function through its .delay method.

from celery import Celery

app = Celery('hello', broker='amqp://guest@localhost//')

@app.task
def hello():
    return 'hello world'

# Calling a celery task
hello.delay()


To work with Celery, we need a message broker. As of writing this blog, Celery supports RabbitMQ, Redis, and Amazon SQS (not fully) as message broker solutions. Unless you want to stick to the AWS ecosystem (as in my case), I recommend going with RabbitMQ or Redis, because SQS does not yet support remote control commands and events. For more info check here. One reason to use SQS is its pricing: one million free SQS requests per month for every user.

Proceeding with SQS, go to the AWS SQS dashboard and create a new SQS queue by clicking the create new queue button.

Depending upon the requirement, we can select any type of queue. We will name the queue dev-celery.


Celery has very nice documentation; installation and configuration are described here. For convenience, here are the steps.

Activate your virtual environment, if you have configured one, and install celery.

pip install celery[sqs]


Celery has built-in support for Django. It will pick up its settings from Django’s settings.py, from parameters prefixed with CELERY_ (the ‘CELERY’ namespace needs to be given while initializing the Celery app). So put the below settings in settings.py.

# Amazon credentials will be taken from environment variable.

AWS login credentials should be present in the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

CELERY_BROKER_TRANSPORT_OPTIONS = {'region': 'us-west-2',
                                   'visibility_timeout': 3600,
                                   'polling_interval': 10,
                                   'queue_name_prefix': '%s-' % {True: 'dev',
                                                                 False: 'production'}[DEBUG],
                                   'CELERYD_PREFETCH_MULTIPLIER': 0,
                                   }

Now let’s configure the Celery app within the Django code. Create a celery.py file beside Django’s settings.py.

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Now put below code in projects __init__.py

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)


Now let’s test the configuration. Open a terminal and start a Celery worker.

Terminal 1

$ celery worker --app=proj --loglevel=INFO
-------------- celery@lintel v4.1.0 (latentcall)
---- **** -----
--- * ***  * -- Linux-4.15.0-24-generic-x86_64-with-Ubuntu-18.04-bionic 2018-07-04 11:18:57
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app:         enq_web:0x7f0ba29fa3d0
- ** ---------- .> transport:   sqs://localhost//
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
               .> celery           exchange=celery(direct) key=celery
 . enq_web._celery.debug_task


All the tasks registered with Celery’s decorators appear here when the worker starts. If your task does not appear, make sure the module containing the task is imported on startup.

Now open django shell in another terminal

Terminal 2

$ python manage.py shell

In [1]: from proj import celery
In [2]: celery.debug_task() # ←← ← Not through celery 
In [3]: celery.debug_task.delay() # ←← ← This is through celery

After executing the task function with the delay method, the task should run in the worker process listening in the other terminal: Celery sent a message to SQS with the details of the task, and the worker process listening to SQS received and executed it. Below is what you should see in terminal 1.

Terminal 1

Request: <Context: {'origin': 'gen14099@lintel', u'args': [], 'chain': None, 'root_id': '041be6c3-419d-4aa0-822f-d50da1b340a0', 'expires': None, u'is_eager': False, u'correlation_id': '041be6c3-419d-4aa0-822f-d50da1b340a0', 'chord': None, u'reply_to': 'd2e76b9b-094b-33b4-a873-db5d2ace8881', 'id': '041be6c3-419d-4aa0-822f-d50da1b340a0', 'kwargsrepr': '{}', 'lang': 'py', 'retries': 0, 'task': 'proj.celery.debug_task', 'group': None, 'timelimit': [None, None], u'delivery_info': {u'priority': 0, u'redelivered': None, u'routing_key': 'celery', u'exchange': u''}, u'hostname': u'celery@lintel', 'called_directly': False, 'parent_id': None, 'argsrepr': '()', 'errbacks': None, 'callbacks': None, u'kwargs': {}, 'eta': None, '_protected': 1}>

Deploy celery worker process on AWS elastic beanstalk

Celery provides a “multi” sub-command to run workers in daemon mode, but it should not be used in production. Celery recommends various daemonization tools: http://docs.celeryproject.org/en/latest/userguide/daemonizing.html

AWS Elastic Beanstalk already uses supervisord to manage the web server process, and Celery can also be run under supervisord. Celery’s official repository has a nice example supervisord config: https://github.com/celery/celery/tree/master/extra/supervisord. Based on that, we write a few commands under the .ebextensions directory.

Create two files under the .ebextensions directory. The celery.sh file extracts the environment variables and forms the Celery configuration, which is copied to /opt/python/etc/celery.conf, after which supervisord is restarted. Here is the main celery command:

celery worker -A PROJECT_NAME -P solo --loglevel=INFO -n worker.%%h.

At the time of writing this blog, Celery had the issue https://github.com/celery/celery/issues/3759. As a workaround we add “-P solo”, which runs tasks sequentially in a single worker process.

#!/usr/bin/env bash

# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`

# Create celery configuration script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A PROJECT_NAME -P solo --loglevel=INFO -n worker.%%h

directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/worker.log
stderr_logfile=/var/log/celery/worker.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv"

# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf

# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
  echo "[include]" | tee -a /opt/python/etc/supervisord.conf
  echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi

# Reread the supervisord config
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread

# Update supervisord in cache without restarting all services
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update

# Start/Restart celeryd through supervisord
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker

Now create an Elastic Beanstalk configuration file as below. Make sure you have pycurl and celery in requirements.txt. To install pycurl, libcurl-devel needs to be installed from the yum package manager.

packages:
  yum:
    libcurl-devel: []

container_commands:
  01_create_celery_dirs:
    command: "mkdir -p /var/log/celery/ /var/run/celery/"
  02_install_celery_hook:
    command: "cp .ebextensions/celery-worker.sh /opt/elasticbeanstalk/hooks/appdeploy/post/ && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/celery-worker.sh"
    cwd: "/opt/python/ondeck/app"
  03_run_celery_worker:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/celery-worker.sh"

Add these files to git and deploy to elastic beanstalk.

Below is the figure describing the architecture with django, celery and elastic beanstalk.

What is a DBF file? How to read it in Linux and Python?

What is a DBF file?

A DBF file is a standard database file used by dBASE, a database management system application. It organises data into multiple records with fields stored in an array data type. DBF files are also compatible with other “xBase” database programs, which became an important feature because of the file format’s popularity.

Tools which can read or open DBF files

Below is a list of programs which can read and open DBF files.

  • Windows
    1. dBase
    2. Microsoft Access
    3. Microsoft Excel
    4. Visual Foxpro
    5. Apache OpenOffice
    6. dbfview
    7. dbf Viewer Plus
  • Linux
    1. Apache OpenOffice
    2. GTK DBF Editor

How to read the file in Linux?

The “dbview” command, available in Linux, can read DBF files.

The code snippet below shows how to use the dbview command.

[lalit : temp]₹ dbview test.dbf 
Name       : John
Surname    : Miller
Initials   : JM
Birthdate  : 19800102

Name       : Andy
Surname    : Larkin
Initials   : AL
Birthdate  : 19810203

Name       : Bill
Surname    : Clinth
Initials   : 
Birthdate  : 19820304

Name       : Bobb
Surname    : McNail
Initials   : 
Birthdate  : 19830405

[lalit : temp]₹ 

How to read it using Python?

“dbfread” is a Python library that can read DBF files and return the data as native Python data types for further processing.

dbfread requires Python 3.2 or 2.7. It is a pure-Python module, so it doesn’t depend on any packages outside the standard library.

You can install library by the command below.

pip install dbfread

The below code snippet reads a DBF file and retrieves the data as Python dictionaries.

>>> from dbfread import DBF

>>> for record in DBF('people.dbf'):
...     print(record)

Output:
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])

You can also use the with statement:

with DBF('people.dbf') as table:
    for record in table:
        print(record)

By default the records are streamed directly from the file. If you have enough memory you can load them into a list instead; this allows random access.

>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
>>> print(table.records[0]['NAME'])

How to write content to a DBF file using Python?

dbfpy is a python-only module for reading and writing DBF files. It can read and write simple DBF files.

You can install it using the command below.

pip install dbfpy

The below example shows how to create a DBF file and write records into it.

import datetime
from mx import DateTime
from dbfpy import dbf

## create empty DBF, set fields

db = dbf.Dbf("test.dbf", new=True)
db.addField(
    ("NAME", "C", 15),
    ("SURNAME", "C", 25),
    ("INITIALS", "C", 10),
    ("BIRTHDATE", "D"),
)

## fill DBF with some records

for name, surname, initials, birthdate in (
    ("John", "Miller", "JM", (1980, 1, 2)),
    ("Andy", "Larkin", "AL", datetime.date(1981, 2, 3)),
    ("Bill", "Clinth", "", DateTime.Date(1982, 3, 4)),
    ("Bobb", "McNail", "", "19830405"),
):
    rec = db.newRecord()
    rec["NAME"] = name
    rec["SURNAME"] = surname
    rec["INITIALS"] = initials
    rec["BIRTHDATE"] = birthdate
    rec.store()
db.close()

Also you can update a dbf file record using dbf module.

The below example shows how to update a record in a .dbf file.

db = dbf.Dbf("test.dbf")
rec = db[2]
rec["INITIALS"] = "BC"
rec.store()
db.close()


What is milter?

Everyone gets tons of email these days, from super-duper offers from Amazon to princes and wealthy businessmen trying to offer you their money from some African country you have never heard of. Among all these emails in your inbox lie the one or two valuable ones: from your friends, bank alerts, work-related messages. Spam is a problem that email service providers have been battling for ages. There are a few open source spam-fighting tools available, like SpamAssassin or SpamBayes.

What is milter ?

Simply put, milter is mail filtering technology. It was designed by the Sendmail project and is now available in other MTAs too. Historically, people used all kinds of solutions for filtering mail on servers, such as procmail or MTA-specific methods. The current scene seems to be moving toward Sieve. But there is a huge difference between milter and Sieve: Sieve comes into the picture after the mail is already accepted by the MTA and has been handed over to the MDA. Milter, on the other hand, springs into action in the mail-receiving part of the MTA. When a remote server makes a new connection to your MTA, your MTA gives you an opportunity to accept or reject the mail at every step of the way: the new connection, the reception of each header, and the reception of the body.

Milter protocol: the various stages

The above picture depicts a simplified version of the milter protocol at work. Full details of the milter protocol can be found here: https://github.com/avar/sendmail-pmilter/blob/master/doc/milter-protocol.txt. Milter is not only for filtering; you can also modify the message or change headers.


If you want to get started in C, you can use libmilter. For Python you have a couple of options:

  1. pymilter –  https://pythonhosted.org/milter/
  2. txmilter – https://github.com/flaviogrossi/txmilter

Postfix supports the milter protocol. You can find everything related to Postfix’s milter support here: http://www.postfix.org/MILTER_README.html


I found Sieve to be rather limited. It doesn’t offer many options for implementing complex logic; it was purposely made that way. Also, Sieve starts at the end of the mail reception process, after the mail is already accepted by the MTA.

Coding a milter program in your favorite programming language gives you full power and lets you implement complex, creative things.


When writing milter programs, take care to return a reply to the MTA quickly. Don’t do long-running tasks in the milter program while the MTA is waiting for a reply; that can have crazy side effects, like remote parties submitting the same mail multiple times and filling up your inbox.