The perceptron - a simple simulation

In [1]:
%matplotlib inline
from pylab import *
from utils import *

Let us start by implementing a very simple network. Our network will have only two input units plus a bias unit, as in the figure. The network will learn to categorize a few different input patterns as belonging or not belonging to a class.
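
The output unit computes the weighted sum of its two inputs plus the bias and passes it through a step activation, $y = \mathrm{step}(w_0 + w_1 p_1 + w_2 p_2)$: the response is 1 for patterns classified as belonging to the class and 0 otherwise.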

Training

Initializing data and parameters

In this example we create the input data with pseudo-random number generation. We start from two points, the centroids, and generate the patterns of each group (belonging/not belonging) by adding noise to the corresponding centroid.
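
The helper build_dataset is defined in utils.py and is not shown here. Below is a minimal sketch of what it might look like, assuming Gaussian noise around two centroids and a third column holding the desired output (0 for the first half of the rows, 1 for the second half); the actual helper and its default centroids may differ.

from numpy import array, hstack, vstack, zeros, ones
from numpy.random import randn

def build_dataset_sketch(n_patterns,
                         centroids1 = [ array([-1.0, -1.0]) ],  # non-class centroids (assumed defaults)
                         centroids2 = [ array([ 1.0,  1.0]) ],  # class centroids (assumed defaults)
                         std_deviation = 0.5):
    half = n_patterns // 2
    # First half of the rows: points around the non-class centroid, desired output 0
    neg = centroids1[0] + std_deviation*randn(half, 2)
    # Second half of the rows: points around the class centroid, desired output 1
    pos = centroids2[0] + std_deviation*randn(n_patterns - half, 2)
    labels = hstack([zeros(half), ones(n_patterns - half)])
    # Each row is [p1, p2, desired output]
    return hstack([vstack([neg, pos]), labels[:, None]])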

In [2]:
#-------------------------------------------------
# Training


# Constants

# Number of input elements
n = 2        

# Learning rate
eta = 0.0001 

# number of training patterns
n_patterns = 2000

# Number of repetitions of 
# the pattern series
epochs = 4

# Number of timesteps
stime = n_patterns*epochs


# Variables

# generate training data (function build_dataset in utils.py)
data = build_dataset(n_patterns)

# Each row of P is an input pattern
P = data[:,:2]

# Each element of o is the desired output 
# relative to an input pattern
o = data[:,2]

# Initialize weights
w = zeros(n+1)

# Initialize the weight history storage
dw = zeros([n+1,stime])

# Initialize the error history storage
squared_errors = zeros(epochs)

Let's plot the input points. Red points belong to the class to be learned, while blue ones do not belong to it.

In [3]:
# limits
upper_bound = P.max(0) + 0.2*(P.max(0)-P.min(0))
lower_bound = P.min(0) - 0.2*(P.max(0)-P.min(0))


# Create the figure
fig = figure(figsize=(4,4))

scatter(*P[(n_patterns/2):,:].T, s = 50,  c = '#ff8888' )
scatter(*P[:(n_patterns/2),:].T, s = 50,  c = '#8888ff' )

xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )

show()

Spreading of the network during training

Here starts the core part: iterating over the timesteps. We divide the training phase into epochs, where each epoch is a single presentation of the whole series of input patterns. The sum of squared errors is accumulated separately for each epoch.
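
At each timestep a pattern $x = (1, p_1, p_2)$ is presented, the activation $y = \mathrm{step}(w \cdot x)$ is computed, and the weights are updated with the perceptron learning rule $w \leftarrow w + \eta \, (o - y) \, x$, where $o$ is the desired output for the current pattern. The per-pattern error $\frac{1}{2}(o - y)^2$ is added to the total of the current epoch.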

In [4]:
# Create a list of pattern indices.
# We will reshuffle it at each 
# repetition of the series
pattern_indices = arange(n_patterns)

# counter of repetitions 
# of the series of patterns
epoch = -1

for t in xrange(stime) :
    
    # Reiterate the input pattern 
    # sequence through timesteps
    
    # Reshuffle at the start of
    # each repetition of the series
    if t%n_patterns == 0:
        shuffle(pattern_indices)
        epoch += 1
        
    # Current pattern 
    k = pattern_indices[t%n_patterns]
    
    # MAIN STEP CALCULATIONS
     
    # Bias-plus-input vector
    x = hstack([1, P[k]])
    
    # Weighted sum - !!dot product!!
    net = dot(w, x)
    
    # Activation
    y = step(net)
    
    # Learning
    w += eta*(o[k] - y)*x
       
    # Store current weights
    dw[:,t] = w
    
    # Current error
    squared_errors[epoch] += 0.5*(o[k] - y)**2

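The activation function step also comes from utils.py. A minimal sketch, assuming a Heaviside threshold that returns 1 for a non-negative net input and 0 otherwise (the actual threshold convention may differ):

def step(net):
    # Heaviside threshold on the weighted sum
    return 1.0 if net >= 0 else 0.0
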
Plotting the results of training

We plot the final decision boundary together with the history of the squared errors through the epochs. As you can see below, the network finds a line that completely separates the points belonging to the class from those not belonging to it. The gray lines are previous attempts to define the boundary, which led to an error greater than zero. The error curve goes to zero at the third epoch.
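
Since the output of the unit switches where the net input crosses zero, the decision boundary is the line $w_0 + w_1 p_1 + w_2 p_2 = 0$, i.e. $p_2 = -(w_1 p_1 + w_0)/w_2$. This is the expression evaluated below, both from the stored weight history and from the final weights.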

In [5]:
# Create the figure
fig = figure(figsize=(10,4))
ax = fig.add_subplot(121)
ax.set_title('Decision boundary')

# Choose the x-axis coords of the
# two points to plot the decision 
# boundary line
x1 = array([lower_bound[0],upper_bound[0]])

# Calculate the y-axis coords of the
# two points to plot the decision 
# boundary line as it changes 
for t in xrange(stime) :
    
    # Show every 10th timestep
    if t%10 == 0:
        
        if dw[2,t] != 0 :
            # Evaluate x2 based on current weights
            x2 = -(dw[1,t]*x1 + dw[0,t])/dw[2,t]
        
            # Plot the changes in the boundary line during learning
            ax.plot(x1,x2, c='#cccccc', linewidth = 1, zorder = 1)

# Evaluate x2 based on the final weights
x2 = -(w[1]*x1 + w[0])/w[2]

# Plot the learned boundary line
plot(x1,x2, c= '#000000', linewidth = 2, zorder = 1)

# Plot in red points belonging to the class
scatter(*P[(n_patterns/2):,:].T, s = 50,  c = '#ff8888', zorder = 2 )       
# Plot in blue points not belonging to the class 
scatter(*P[:(n_patterns/2),:].T, s = 50,  c = '#8888ff', zorder = 2 )

# Limits and labels of the plot
xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )
xlabel("$p_1$", size = 'xx-large')
ylabel("$p_2$", size = 'xx-large')

# Plot squared errors
ax = fig.add_subplot(122)
ax.set_title('Error')
ax.plot(squared_errors)

# Labels and ticks of the plot
xlabel("epochs", size = 'xx-large')
ylabel("SSE", size = 'xx-large')
xticks(range(epochs)) 

show()

Testing

Initializing data and parameters

We now create a new dataset to test the network by generating a cloud of random points all over the input space:

In [6]:
#-------------------------------------------------
# Test

# Number of test patterns
n_patterns = 50000

# Generating test data - we use a single repeated centroid
# so we have a single population of points expanding across
# the decision boundary line  
test_centroid = lower_bound +(upper_bound-lower_bound)/2.0

# Generating test data - build_dataset function from utils.py. 
# We change the standard deviation 
data = build_dataset(n_patterns, 
                      centroids1 = [ test_centroid ], # we use the same centroids for class 
                      centroids2 = [ test_centroid ], # and non-class
                      std_deviation = 2.6 ) # we want the test samples
                                            # to spread widely over the input space

# Each row of P is a test pattern
P = data[:,:2]

# Create the figure
fig = figure(figsize=(5,4))

title('Tests - average error = {}'.format(mean(squared_errors).round(4)))

# Show points
ax = scatter(*P.T, s = 2,  edgecolors='none', zorder = 2  )

xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )
show()

Classifying the input patterns

We feed each test pattern to the network and collect its answer.

In [7]:
y = zeros(n_patterns)

# iterate tests
for t in xrange(n_patterns) :
    
    # Bias-plus-input vector
    x = hstack([1, P[t]])
    
    # Weighted sum - !!dot product!!
    net = dot(w, x)
    
    # Activation
    y[t] = step(net)

Plotting the results of test

We can plot all test patterns using the output of the network to color them. Red and blue dots correspond to patterns classified as belonging or not belonging to the class, respectively. You can see that the network divides all inputs into two groups with a linear separation. This division generalizes the boundary learned from the training samples.

In [8]:
# Create the figure
fig = figure(figsize=(5,4))


title('Tests - average error = {}'.format(mean(squared_errors).round(4)))


# Show points
ax = scatter(*P.T, s = 2,  c = y, edgecolors='none', zorder = 2, cmap = cm.coolwarm )

#limits
xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )
xlabel("$p_1$", size = 'xx-large')
ylabel("$p_2$", size = 'xx-large')
show()

The next cell is just for styling

In [9]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../style/ipybn.css", "r").read()
    return HTML(styles)
css_styling()
Out[9]: