The Perceptron and Friends

Lesson 2

Introduction

We've seen in our previous lesson that we can simulate a neural cell at the very low level of its electro-chemical processes. Indeed, this is quite helpful when trying to understand the functioning of individual cells and their transmission of the neural impulse (i.e. the progression of the action potential).

Our interest in this course lies mainly in the information processing performed by several (sometimes a great many) interconnected neurons. Computationally, we can ignore the process that causes a neuron to fire and concentrate instead on the net result of that firing. Should you protest this simplification along the lines of “Well, couldn't the low-level effects of the electro-chemical process have some slight effect on our result?”, I offer the following observation: simulation and modelling must take place at some scale of analysis. If we fail to fix our scale or scope, we are bound to fail. Should we model the electro-chemical processes? Doing so brings us close to modelling some of the molecular processes, but what about the atomic processes themselves? What of the sub-atomic processes? Will our results hold up in different physical universes, where some of our closely held rules don't apply?

I hope you appreciate that we're mainly interested in the net result of the electro-chemical and synaptic processes: the firing of neurons, their connections to their neighbors, and how those firings communicate information.

The Perceptron

In the late 1950s and early 1960s, Frank Rosenblatt proposed several machines that laid the groundwork for artificial neural networks. These machines, called Perceptrons, were based on the 1940s work of Warren McCulloch and Walter Pitts on the logical calculus of neural mechanisms. [1,2,3]

Rosenblatt's machines were collections of units like those shown below. For this reason, the name perceptron has been used to denote not only the machines but (more commonly) the processing units themselves. Variations of these units go by the names Adaline, Sigma-node, and others.

[Graphics:HTMLFiles/index_1.gif]

Figure 1. A basic perceptron. Inputs, ψ_1…ψ_n, are multiplied by their respective weights, α_1…α_n. These weighted inputs are summed and, if the resulting value exceeds some threshold, θ, the output Ψ is a spike, usually represented by a ‘1’. Otherwise, there is no output, represented by a ‘0’.

The perceptron is an analog of the biological neurons we described in the previous section. In essence, the perceptron is an artificial neuron that processes information from several inputs and, using some decision rule, either activates its output or not. The inputs are analogous to the dendrites and the output represents the axon of a biological neuron. A collection of these units connects the outputs (axons) of some cells to the inputs of others, creating a virtual, artificial assembly.

In short, the perceptron collects several inputs, weights them, sums them, and then makes a decision to ‘fire’ or not based on this weighted sum.

Typically the inputs are binary; that is, they are either 1 or 0, representing the presence or absence of activity respectively. In fact, biological neurons behave largely in this manner. There is no differentiation between ‘strengths’ of firing, only the presence or absence thereof. Since the output of a perceptron is binary, it follows that the resulting input to other cells will also be binary. The weights act to condition these inputs, emphasizing some and de-emphasizing others before they are summed. After this summation a decision is made to fire or not. The most basic type of decision is a so-called threshold decision. We will examine various sorts of decision functions in the upcoming sections.

Notation

There are many ways to notate the various inputs, weights, and other parameters. Here, we use the Greek lowercase psi, ψ, to denote the input n-tuple. The subscript selects one of the n elements from the list, so ψ_3 is the 3rd input. (NB: Some folks refer to a list of this sort as a vector. In point of fact, this is a matter of interpretation, most notably with respect to magnitude. Usually the input of a neuron/perceptron is not interpreted as having a magnitude. Still, the term ‘input vector’ is frequently used and should be understood to be simply the list of inputs to the neuron. Some argue that the general form of these sorts of data structures follows more closely the idea of tensors. You are left to decide for yourself what makes the most sense, and I welcome feedback from any philosophical debates on the subject.)

Similarly, the weights are represented by the Greek lowercase letter alpha, α. Subscripts represent individual elements of the list, α_3 being the third weight.

Summation is, as is typical, represented by the Greek uppercase sigma, Σ.

Finally, the threshold function shown above is represented by the lowercase Greek letter theta, θ.

The resulting state of the cell, on or off, is a binary value represented by the capital psi, Ψ.

Sometimes, for convenience, the input and output are represented by lowercase Roman letters 'i' and 'o' respectively with weights represented by 'w'.

Another common notation uses r_in to represent the input, w to represent the weights, and r_out to represent the output state of the cell.

Traditional mathematical notation for the above process is

Ψ = θ(∑_i ψ_i α_i)          (1)

Where θ represents the Heaviside unit step function. In alternate implementations θ is an adjustable threshold value. We will investigate several types of threshold functions (also called activation functions) below.
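As a rough sketch of how Equation 1 might look as a single Mathematica function, consider the following. The name perceptron is a hypothetical choice of ours (it is not a built-in), the weights in the example are made up, and we assume the step-function reading of θ rather than the adjustable-threshold one.

```mathematica
(* Sketch of Ψ = θ(∑ ψ α); "perceptron" is a hypothetical name, not a built-in. *)
perceptron[ψ_List, α_List] := UnitStep[Plus @@ (ψ α)]

(* Made-up example: 1*0.5 + 0*(-1) + 1*0.25 = 0.75, so the unit fires. *)
perceptron[{1, 0, 1}, {0.5, -1, 0.25}]
(* → 1 *)
```

The sections that follow take this one-liner apart piece by piece: the summation, the weighting, and the threshold.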

First, we'll decompose the process for those of you not totally familiar with this sort of calculation.

Simple Summation

To quote the philosopher Mr. T, “enough jibberjabber.” Let's get on with some computational examples, starting with the process of summation.

[Graphics:HTMLFiles/index_11.gif]

Figure 2. Examining the inputs and summation.

In Mathematica a list or vector or n-tuple is represented as a comma separated list, surrounded by curly-brackets (also known as braces or curly braces).

ψ = {1, 0, 0, 1, 1}

{1, 0, 0, 1, 1}

Above we see a 5-tuple representing an input. The first and last two inputs are firing neurons, while the second and third are inactive.

We can sum the values of the list using this notation-

Plus @@ ψ

3

Here, Plus is said to be applied to the list (the two '@' signs denote application, shorthand for the Mathematica function Apply) and we see the result, 3. So, in some overly simplistic, majority-rules environment, we might say that a 5-input perceptron fires if 3 or more of its inputs are firing.

This may be adequate, but it doesn't allow any fine-tuning of the inputs; it doesn't let us say that some inputs are more important than others. For example:

ψ = {1, 0, 1, 1, 0} ;

(The semicolon at the end of the line suppresses output from Mathematica. ψ is set, we just don't see the results of setting it.)

Plus @@ ψ

3

In this case there are still three inputs firing but we can't differentiate the resulting sum from the previous example.
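The majority-rules idea above can be sketched as a small function. The name majorityFires and the cutoff of 3 are illustrative assumptions of ours; Total is a built-in that gives the same sum as Plus @@.

```mathematica
(* "majorityFires" is a hypothetical helper: fire when 3 or more inputs are on. *)
majorityFires[ψ_List] := Boole[Total[ψ] >= 3]

a = {1, 0, 0, 1, 1};
b = {1, 0, 1, 1, 0};

{Total[a], Total[b]}                  (* → {3, 3} *)
{majorityFires[a], majorityFires[b]}  (* → {1, 1} *)
a === b                               (* → False *)
```

The two input patterns differ, yet the unweighted sum (and so the decision) cannot tell them apart.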

Explore

   1.   Try other combinations and different sizes of input vectors.

   2.    Can you figure out other ways to represent this process in Mathematica?

Example answers

   1.   Try other combinations and different sizes of input vectors.

ψ = {1, 1, 1, 1, 0} ; Plus @@ ψ

4

ψ = {1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1} ; Plus @@ ψ

8

   2.    Can you figure out other ways to represent this process in Mathematica?

Plus @@ {1, 1, 1, 0, 1, 1}

5

Apply[Plus, {1, 0, 1, 0, 1, 1}]

4

ψ = {1, 0, 1, 0, 1, 1} ; Apply[Plus, ψ]

4

webMathematica

These exercises are in file L2E1.js (link!)

Weighted Summation

[Graphics:HTMLFiles/index_29.gif]

Figure 3. Weighting the inputs and summing them

Here we see our input ψ along with a set of weights α —

ψ = {1, 0, 0, 1, 1} ; α = {-.2, 1.2, 3.3, -4.2, 1.3} ;

Multiplying the input vector by the weight vector

t = ψ α

{-0.2, 0, 0, -4.2, 1.3}

results in a new set of values. This is sometimes called an activation vector. We sum the activation just like before —

Plus @@ t

-3.1

This final value (sometimes notated h) is called the activation value or net activation of the perceptron.

In this case, different combinations of inputs will result in different outputs:

ψ = {1, 0, 1, 1, 0} ; Plus @@ (ψ α)

-1.1

ψ = {1, 1, 1, 1, 0} ; Plus @@ (ψ α)

0.1

So, you see here that the result of the weighted summation isn't always binary (0 or 1); in general it takes on a real value.
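To compare several input patterns against one weight vector in a single step, the weighted sum can be wrapped in a function and mapped over a list of inputs. This is only a sketch; netActivation and inputs are names we made up for illustration.

```mathematica
(* "netActivation" is a hypothetical name for the weighted sum. *)
netActivation[ψ_List, α_List] := Total[ψ α]

α = {-.2, 1.2, 3.3, -4.2, 1.3};
inputs = {{1, 0, 0, 1, 1}, {1, 0, 1, 1, 0}, {1, 1, 1, 1, 0}};

(* One net activation per input pattern: roughly -3.1, -1.1 and 0.1,
   the same three values computed one at a time above. *)
netActivation[#, α] & /@ inputs
```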

Explore

   1.   Try other combinations and different sizes of input vectors.

   2.  Try different weight vectors.

   3.   Can you figure out other ways to represent this process in Mathematica?

Example answers

   1.   Try other combinations of input vectors.

ψ = {1, 1, 1, 1, 0} ; α = {- ... 4.2, 1.3} ; t = ψ α ; Plus @@ t

0.1

   2.   Try different weight vectors.

ψ = {1, 1, 1, 1, 0} ; α = {- ... 4.2, 1.3} ; t = ψ α ; Plus @@ t

0.1

α = {-1.2, -1.2, ... .42, 1.3} ; t = ψ α ; Plus @@ t

-2.52

   3.    Can you figure out other ways to represent this process in Mathematica?

Plus @@ ({1, 1, 1, 0, 1} α)

-0.8

Apply[Plus, {1, 0, 1, 0, 1, 1} * {.12, .12, -.2, -.4, 1.1, -0.02}]

1.

ψ = {1, 1, 1, 0, 1, 1} ; α = { ... , 1.1, -0.02} ; Apply[Plus, ψ α]

1.12

If students find the following on their own, sing!

ψ . α

1.12

webMathematica

These exercises are in file L2E2.js (link!)

Weighted Summation with Threshold

[Graphics:HTMLFiles/index_54.gif]

Figure 4. Adding the threshold function.

In a neural network, both physical and artificial, the outputs of neurons (axons) serve as inputs for other neurons (synapses). You may recall that we decided for this experiment to constrain ourselves to an all-or-nothing perceptron; i.e., either the artificial neuron fires or it does not. We represented these two states numerically by 1 and 0 respectively. We have a net activation value of h = -3.1 in this example, but we need to convert it to one of the two states above, 0 or 1.

Mathematica's UnitStep function returns 0 if its input is less than 0 and 1 otherwise. Therefore, we can use it as a simple decision function: if the sum of the weighted inputs is 0 or greater, then we ‘fire’ (i.e. we pass a value of 1 down to other perceptron inputs); otherwise, we do not. As we stated above, the UnitStep function is an implementation of the Heaviside unit step function.

Plot[UnitStep[x], {x, -2, 2}] ;

[Graphics:HTMLFiles/index_57.gif]

Here is the math we're already familiar with —

ψ = {1, 1, 1, 0, 0} ;
α = { ... 2, 1.2, 3.3, -4.2, 1.3} ;

the weighted inputs —

t = ψ α

{-0.2, 1.2, 3.3, 0, 0}

summed —

h = Plus @@ t

4.3

and finally passed through the UnitStep function —

Ψ = UnitStep[h]

1

Thus, for the given input ψ and weights α our perceptron fires!
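With only five binary inputs there are just 32 possible input patterns, so we can check all of them at once. The sketch below assumes the weight vector used earlier in this lesson, α = {-.2, 1.2, 3.3, -4.2, 1.3}; the names patterns and firing are ours.

```mathematica
α = {-.2, 1.2, 3.3, -4.2, 1.3};

(* All 2^5 = 32 binary input patterns of length 5. *)
patterns = Tuples[{0, 1}, 5];

(* Keep only the patterns whose weighted sum passes the unit step. *)
firing = Select[patterns, UnitStep[Total[# α]] == 1 &];

MemberQ[firing, {1, 1, 1, 0, 0}]   (* → True, since h = 4.3 *)
MemberQ[firing, {1, 0, 0, 1, 1}]   (* → False, since h = -3.1 *)
MemberQ[firing, {0, 0, 0, 0, 0}]   (* → True, because UnitStep[0] is 1 *)
```

The last line is worth a second look: with no adjustable threshold, a perceptron with no active inputs still "fires", because UnitStep returns 1 at exactly 0.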

Notationally, as seen in Equation 1, we represent the unit step function with θ(x). Mechanically, this is what the function does (matching UnitStep, which returns 1 at x = 0):

θ(x) = 1   if x ≥ 0
θ(x) = 0   if x < 0

Other neural network texts notate the step function as a special type of output function (we'll look at different output functions below). In this case the ‘generic’ output function is notated g(x) and the specific case of the step function g_step(x).
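To see exactly what the built-in does at the boundary, it can help to write the step out by hand and compare. This is a sketch only; stepByHand is a hypothetical name, and the choice of 1 at x = 0 is made to agree with Mathematica's UnitStep.

```mathematica
(* "stepByHand" is a hypothetical name; it spells out the two cases explicitly. *)
stepByHand[x_] := Piecewise[{{1, x >= 0}}, 0]

(* Compare against the built-in on a few sample points, including x = 0. *)
samples = {-2, -0.5, 0, 0.5, 2};
stepByHand /@ samples      (* → {0, 0, 1, 1, 1} *)
UnitStep /@ samples        (* → {0, 0, 1, 1, 1} *)
```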

Explore

   1.   Try other combinations and different sizes of input vectors, weight vectors.

   2.   Modify the example to have an adjustable threshold, θ.

   3.   Pick a set of random input values and weights. Hand tweak the inputs to get the neuron to fire or not. Then, hand tweak the weights to achieve the same results.

   4.   Can you figure out other ways to represent this process in Mathematica?

Example answers

   1.   Try other combinations and different sizes of input vectors, weight vectors.

ψ = {1, 1, 1, 1, 0} ; α = {1.2, ... , 1.3} ; t = ψ α ; h = Plus @@ t ; Ψ = UnitStep[h]

1

   2.  Modify the example to have an adjustable threshold, θ.

Here, the threshold is θ = 1.1.

ψ = {1, 0, 1, 1, 0} ; α = {1 ... ; θ = 1.1 ; Ψ = UnitStep[h - θ]

1

   3.   Pick a set of random input values and weights. Hand tweak the inputs to get the neuron to fire or not. Then, hand tweak the weights to achieve the same results.

Generate a random table of inputs

ψ = Table[Random[Integer, {0, 1}], {5}]

{0, 1, 0, 0, 1}

and weights

α = Table[Random[Real, {-4, 4}], {5}]

{-3.97306, -3.78723, 0.389626, -3.5145, -0.410137}

and carry out the process-

t = ψ α ; h = Plus @@ t ; Ψ = UnitStep[h]

0

It didn't fire. Tweak the input vector (turn off the two inputs with negative weights and turn on the positively weighted one).

ψ = {0, 0, 1, 0, 0}

{0, 0, 1, 0, 0}

carry out the process —

t = ψ α ; h = Plus @@ t ; Ψ = UnitStep[h]

1

It fires!

Reset ψ to its original value, modify one member of α and carry out the process —

ψ = {0, 1, 0, 0, 1} ; α ... ; t = ψ α ;
h = Plus @@ t ;
Ψ = UnitStep[h]

1

   4.    Can you figure out other ways to represent this process in Mathematica?

ψ = {1, 0, 1, 1, 0} ; α = {1 ... , -0.2, 1.3} ; θ = 0.5 ;

UnitStep[Apply[Plus, ψ α] - θ]

1

UnitStep[ψ . α - θ]

1

NB- To write a Mathematica function to do the thresholding for us we could say

ourThreshold[h_, θ_] := UnitStep[h - θ]

This function takes two arguments, the value to threshold and the threshold value.

Evaluate the following and double-click on a graphic to see an animation of the function as θ changes on the range [-1,1].

Table[Plot[ourThreshold[x, θ], {x, -2, 2}, PlotRange-> {{-2, 2}, {0, 1}}],  {θ, -1, 1, .2}] ;

[Graphics:HTMLFiles/index_102.gif]
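In versions of Mathematica that provide Manipulate (version 6 and later; whether that matches your setup is an assumption), the same animation can be sketched with a slider rather than a table of frames. The definition of ourThreshold is repeated so the cell runs on its own.

```mathematica
(* Same thresholding function as above, repeated so this cell is self-contained. *)
ourThreshold[h_, θ_] := UnitStep[h - θ]

(* Spot checks: the same activation can fire or not, depending on θ. *)
ourThreshold[0.3, 0.5]    (* → 0 *)
ourThreshold[0.3, -0.5]   (* → 1 *)

(* Drag the slider to move the step; θ ranges over [-1, 1] as before. *)
Manipulate[
 Plot[ourThreshold[x, θ], {x, -2, 2}, PlotRange -> {{-2, 2}, {0, 1}}],
 {θ, -1, 1}]
```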

webMathematica

These exercises are in file L2E3.js (link!)

A Little Shortcut

Recall that, based on Equation 1, the net activation, h, is

h = ∑_i ψ_i α_i

There is a convenient mathematical shortcut for generating the net activation according to the above equation. The dot product does exactly what the above equation specifies: it multiplies two vectors together element by element and sums the resulting values. (Can you think of other mathematical / geometric applications where you've seen this process before? That is, multiplying things together and adding them all up?)

In notation, the dot product is denoted

ψ . α

The dot product takes two vectors as input and returns a single value (called a scalar):

ψ = {1, 0, 1} ; α = {.5, .23, -.32} ;  h = ψ . α

0.18
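As a quick sanity check, the three spellings of the net activation used in this lesson can be compared directly. The names h1, h2 and h3 are ours; Total is a built-in alternative to Plus @@.

```mathematica
ψ = {1, 0, 1}; α = {.5, .23, -.32};

(* Three spellings of the same net activation h. *)
h1 = Plus @@ (ψ α);
h2 = Total[ψ α];
h3 = ψ . α;

h1 == h2 == h3   (* → True *)
```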

Explore

   1.   Try other combinations and different sizes of input vectors, weight vectors.

   2.   What happens if the two input vectors have different lengths? Does this even make sense?

   3.   The dot product has an interesting geometric interpretation. Use Mathematica to explore it.

Example answers

   1.   Try other combinations and different sizes of input vectors, weight vectors.

ψ = {1, 1, 1} ; α = {.5, .23, -.32} ; h = ψ . α

0.41

ψ = {0, 1, 1} ; α = {.5, .23, -.32} ;  h = ψ . α

-0.09

ψ = {0, 1, 1, 0, 1} ; α = {.5, .23, -.32, .42, -.22} ; h = ψ . α

-0.31

   2.  What happens if the two input vectors have different lengths? Does this even make sense?

ψ = {0, 1, 1} ; α = {.5, .23, -.32, .42, -.22} ;  h = ψ . α

Dot::dotsh: Tensors {0, 1, 1} and {0.5, 0.23, -0.32, 0.42, -0.22} have incompatible shapes.

{0, 1, 1} . {0.5, 0.23, -0.32, 0.42, -0.22}

No. The dot product pairs each input with exactly one weight, so with vectors of unequal length some elements have nothing to be multiplied with. Mathematica reports an error and returns the expression unevaluated.

   3.  The dot product has an interesting geometric interpretation. Use Mathematica to explore it.

to come.
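Until a full answer is written, here is one possible starting point for the third exercise, offered as a sketch rather than the intended answer. It relies on the standard identity that the dot product of two vectors equals the product of their lengths and the cosine of the angle between them. The vectors a and b are made-up examples; Norm and VectorAngle are built-ins in recent versions of Mathematica.

```mathematica
a = {3, 0}; b = {1, 1};

(* Dot product versus the length-and-angle form |a| |b| cos(angle). *)
a . b                                       (* → 3 *)
N[Norm[a] Norm[b] Cos[VectorAngle[a, b]]]   (* → 3. *)

(* Perpendicular vectors have a dot product of 0. *)
{1, 0} . {0, 1}                             (* → 0 *)
```

Read this way, the net activation measures how closely the input pattern "points along" the weight vector.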

webMathematica

These exercises are in file L2E4.js (link!)

The Output Function

As noted earlier, in a neural network, both physical and artificial, the outputs of neurons (axons) serve as inputs for other neurons (synapses). You may recall that we decided for this experiment to constrain ourselves to an all-or-nothing network: either the neuron fired or it did not. We represented these two states numerically by 1 and 0 respectively. A net activation value such as h = -3.1, from the weighted summation example, must therefore be converted to one of the two states above, 0 or 1.

This transformation is achieved using an output function (also sometimes called a transfer function, or activation function). Notationally, we will generically use g(x) to indicate the output function.

Step Function

We've already looked at the simple unit step function, θ(x) or g_step(x), above.

As an equation the step function is simply —

g_step(x) = θ(x)

In Mathematica, functions are denoted —

g_step[x_] := UnitStep[x]

We are using so-called Traditional Notation in Mathematica, so we can get away with a slightly prettier version:

g_step(x_) := θ(x)

Recall that the step function looks like this —

Plot[g_step(x), {x, -2, 2}] ;

[Graphics:HTMLFiles/index_124.gif]

Linear Function

One frequently used activation function is the simple linear transfer —

g_lin (x) = x

In Mathematica

g_lin(x_) := x

and graphically —

Plot[g_lin(x), {x, -2, 2}] ;

[Graphics:HTMLFiles/index_128.gif]

Notice that this output function simply replicates its input. Notice, too, that we've strayed from our original, biologically motivated all-or-nothing activation. Still, these sorts of activation functions will prove useful down the road.

Theta Function

Confusing the situation a little, there is a sometimes-used function called g_θ(x) that isn't just a unit step; instead, it is a modification of the unit step:

g_θ (x) = x θ (x)

The result is linear when x>0 and zero otherwise.

In Mathematica

g_θ(x_) := x θ(x)

and graphically —

Plot[g_θ(x), {x, -2, 2}] ;

[Graphics:HTMLFiles/index_134.gif]
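Another way to read this function is "the larger of 0 and x", which is the shape known in more recent neural network writing as a rectified linear unit. The sketch below checks that reading on a few sample points. The name gTheta is a hypothetical, typeable stand-in for the subscripted g_θ in the text.

```mathematica
(* "gTheta" is a hypothetical typeable stand-in for the subscripted g_θ. *)
gTheta[x_] := x UnitStep[x]

samples = {-2, -0.5, 0, 0.5, 2};

(* x θ(x) agrees with Max[0, x] at every sample point. *)
(gTheta /@ samples) == (Max[0, #] & /@ samples)   (* → True *)
```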

Sigmoid Function

The sigmoid function is a commonly used activation function in artificial neural networks. Mathematically it is —

g_sig (x) = 1/(1 + e^(-x))

g_sig(x_) := 1/(1 + ⅇ^(-x)) ;

and graphically —

Plot[g_sig(x), {x, -5, 5}] ;

[Graphics:HTMLFiles/index_138.gif]

Gaussian Function

Another statistically interesting function is the Gaussian activation function —

g_gauss (x) = e^(-x^2)

g_gauss(x_) := ⅇ^(-x^2)

Graphically —

Plot[g_gauss(x), {x, -3, 3}] ;

[Graphics:HTMLFiles/index_142.gif]
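For experimenting outside the notebook, the five output functions can be collected in plain, typeable form. The subscripted names shown in this lesson do not survive as plain text (typed literally, g_step would be read by Mathematica as a pattern), so the names gStep, gLin, gTheta, gSig and gGauss below are hypothetical stand-ins, and Exp is used in place of the typeset exponential. Treat this as a sketch.

```mathematica
(* Typeable, hypothetical names standing in for the subscripted g's in the text. *)
gStep[x_]  := UnitStep[x]
gLin[x_]   := x
gTheta[x_] := x UnitStep[x]
gSig[x_]   := 1/(1 + Exp[-x])
gGauss[x_] := Exp[-x^2]

(* Apply every output function to the same net activation. *)
h = 0;
{gStep[h], gLin[h], gTheta[h], gSig[h], gGauss[h]}
(* → {1, 0, 0, 1/2, 1} *)

(* Overlay the curves for a visual comparison. *)
Plot[{gStep[x], gLin[x], gTheta[x], gSig[x], gGauss[x]}, {x, -3, 3}]
```

Even at h = 0 the five functions disagree, which is a useful reminder that the choice of output function is part of the model.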

Explore

   1.   Speculate on the use of each of these functions. What are some of them good at? What are they bad at?

   2.   Speculate on the statistical nature of the last two activation functions (the Sigmoid and Gaussian functions). Why would we want such a thing?

   3.   Take one of the full examples from above and apply the different output functions to it (instead of UnitStep[]). Does it change the resulting value? If so, how?

Example answers

   1.   Speculate on the use of each of these functions. What are some of them good at? What are they bad at?

to come

   2.   Speculate on the statistical nature of the last two activation functions (the Sigmoid and Gaussian functions). Why would we want such a thing?

to come

   3.   Take one of the full examples from above and apply the different output functions to it (instead of UnitStep[]). Does it change the resulting value? If so, how?

ψ = {0, 1, 0, 0, 1} ; α ... ; t = ψ α ;
h = Plus @@ t ;
Ψ = UnitStep[h]

1

Ψ = g_lin(h)

3.39

Ψ = g_step(h)

1

Ψ = g_θ(h)

3.39

Ψ = g_sig(h)

0.967391

Ψ = g_gauss(h)

0.0000102104

webMathematica

These exercises are in file L2E5.js (link!)

References

   [1]   McCulloch, WS, Pitts, W. 1943, ‘A logical calculus of the ideas immanent in nervous activity’, Bulletin of Mathematical Biophysics, vol. 5, pp. 115–133.

   [2]   Rosenblatt, F. 1959, ‘Two theorems of statistical separability in the perceptron’, Proceedings of a Symposium on the Mechanization of Thought Processes, Her Majesty's Stationery Office, London, pp. 421–456.

   [3]   Rosenblatt, F. 1962, Principles of Neurodynamics, Spartan Books, New York.

Revision History

   1.  October 2003 - Initial ramblings version

   2. December 2003 - First release to CSAC


Created by Mathematica  (December 12, 2003)