This database is used as a cannonical example for several topics in data science and machine learning. The Pima indian diabetes dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict if a patient has diabetes based on diagnostic measurements of eight simple features. We will also use this dataset for learning the basics of the Pandas library (which is essential for data science and machine learning).

Let's first start with listing the required Python modules for this project. We use the Keras deep learning library which makes things simple and practical. In a subsequent study unit we will also show how this can be done with Tensorflow.

Also note that we are using a designated module kerutils which contains some useful Keras and Numpy utilities for all our course units. It can be downloaded from: https://github.com/samyzaf/kerutils

In [2]:
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.advanced_activations import PReLU, SReLU, ELU, LeakyReLU, ThresholdedReLU, ParametricSoftplus
from keras.layers.noise import GaussianNoise
from keras.utils.visualize_util import plot
import matplotlib.pyplot as plt
import matplotlib.cm
import pandas as pd
from pandas.tools.plotting import scatter_matrix
from kerutils import *

from matplotlib import rcParams
rcParams['axes.grid'] = True
rcParams['figure.figsize'] = 10,7
%matplotlib inline

# fixed random seed for reproducibility
np.random.seed(0)
from kerutils import *
In [1]:
# These are css/html styles for good looking ipython notebooks
from IPython.core.display import HTML
css = open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
Out[1]:
In [4]:
features = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima.csv', names=features)
In [5]:
# Lets view the first 10 rows of the data set
# See below what these names mean

data.head(10)
Out[5]:
preg plas pres skin test mass pedi age class
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
5 5 116 74 0 0 25.6 0.201 30 0
6 3 78 50 32 88 31.0 0.248 26 1
7 10 115 0 0 0 35.3 0.134 29 0
8 2 197 70 45 543 30.5 0.158 53 1
9 8 125 96 0 0 0.0 0.232 54 1

Medical Features short names

  1. preg = Number of times pregnant
  2. plas = Plasma glucose concentration a 2 hours in an oral glucose tolerance test
  3. pres = Diastolic blood pressure (mm Hg)
  4. skin = Triceps skin fold thickness (mm)
  5. test = 2-Hour serum insulin (mu U/ml)
  6. mass = Body mass index (weight in kg/(height in m)^2)
  7. pedi = Diabetes pedigree function
  8. age = Age (years)
  9. class = Class variable (1:tested positive for diabetes, 0: tested negative for diabetes)

More information: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

In [6]:
# How many rows do we have?

data.index
Out[6]:
RangeIndex(start=0, stop=768, step=1)
In [7]:
# Try2: How many rows do we have?

len(data.index)
Out[7]:
768
In [8]:
# How many columns do we have?

len(data.columns)
Out[8]:
9
In [9]:
# How many records do we have in our data set?

data.size    # 9 * 768
Out[9]:
6912
In [10]:
# You can get everything in one line !

data.shape
Out[10]:
(768, 9)
In [11]:
# View the last 10 records of our data set

data.tail(10)
Out[11]:
preg plas pres skin test mass pedi age class
758 1 106 76 0 0 37.5 0.197 26 0
759 6 190 92 0 0 35.5 0.278 66 1
760 2 88 58 26 16 28.4 0.766 22 0
761 9 170 74 31 0 44.0 0.403 43 1
762 9 89 62 0 0 22.5 0.142 33 0
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
767 1 93 70 31 0 30.4 0.315 23 0
In [12]:
# The plot() method plots all features
# This is TMI (too much information)

data.plot()   
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x18e66a67ef0>
In [13]:
# Better try one at a time
# Here is the plasma concentration level per record distribution
# (less is more!)

data.plot(y='plas')
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x18e67ec9b70>
In [14]:
# The graph above does not look very useful.
# Let's sort the plas values
plt.plot(sorted(data.plas.as_matrix()))
Out[14]:
[<matplotlib.lines.Line2D at 0x18e67fbdb70>]
In [15]:
# This is the age distribution (sorted!)

plt.plot(sorted(data.age.as_matrix()))
Out[15]:
[<matplotlib.lines.Line2D at 0x18e680259b0>]
In [16]:
# Age range

print("Min age = %d, Max age = %d" % (data.age.min(), data.age.max()))
Min age = 21, Max age = 81
In [17]:
# Age average

data.age.mean()
Out[17]:
33.240885416666664
In [18]:
# Age median

data.age.median()
Out[18]:
29.0
In [19]:
top = plt.subplot2grid((4,4), (0, 0), rowspan=2, colspan=4)
top.scatter(data['plas'], data['age'])
plt.title('Age against Plasma')
bottom = plt.subplot2grid((4,4), (2,0), rowspan=2, colspan=4)
bottom.bar(data['preg'], data['plas'])
plt.title('Plasma against Pregnancies')
plt.tight_layout()

Distribution by class=0 and class=1 (negative/positive diabetes test)

In [20]:
# Items with class == 0 (negative diabetes check)

data0 = data[data['class'] == 0]
len(data0.index)
Out[20]:
500
In [21]:
# Items with class == 1 (persons with a positive diabtetes check)

data1 = data[data['class'] == 1]
len(data1.index)
Out[21]:
268
In [22]:
# The Pandas groupby method computes the distribution of one feature
# with respect to the others
# We see 8 histograms distrubuted against a negative diabetes chck
# and other 8 histograms with distribution against a positive diabetes check

data.groupby('class').hist(figsize=(8,8), xlabelsize=7, ylabelsize=7)
Out[22]:
class
0    [[Axes(0.125,0.684722;0.215278x0.215278), Axes...
1    [[Axes(0.125,0.684722;0.215278x0.215278), Axes...
dtype: object
In [23]:
sm = scatter_matrix(data, alpha=0.2, figsize=(7.5, 7.5), diagonal='kde')

[plt.setp(item.yaxis.get_majorticklabels(), 'size', 6) for item in sm.ravel()]
[plt.setp(item.xaxis.get_majorticklabels(), 'size', 6) for item in sm.ravel()]
plt.tight_layout(h_pad=0.15, w_pad=0.15)

Deep Learning with Keras

All the above Pandas coding and statistical charts are nice and helpful but in most practical situations are useless for coming up with an algorithm for predicting if a person is likely to have diabetes based on his 8 medical records.

In recent years, deep neural networks has been suggested as an effective technique for solving such problem, and so far has shown to be successful in many areas. We will try to build a simple neural network for predicting if a person has diabetes (0 or 1), based of the 8 features in his record. This is the full data except the class column! We want to be able to predict the class (positive or negative diabetes check) from the 8 features (as input)

In [24]:
#  Let's first extract the first 8 features from our data (from the 9 we have)
#  We want to be able to predict the class (positive or negative diabetes check)

X = data.ix[:,0:8]
In [25]:
# let's look at the first 10 records sample

X.head(10)
Out[25]:
preg plas pres skin test mass pedi age
0 6 148 72 35 0 33.6 0.627 50
1 1 85 66 29 0 26.6 0.351 31
2 8 183 64 0 0 23.3 0.672 32
3 1 89 66 23 94 28.1 0.167 21
4 0 137 40 35 168 43.1 2.288 33
5 5 116 74 0 0 25.6 0.201 30
6 3 78 50 32 88 31.0 0.248 26
7 10 115 0 0 0 35.3 0.134 29
8 2 197 70 45 543 30.5 0.158 53
9 8 125 96 0 0 0.0 0.232 54
In [26]:
#This is the output class column!
Y = data.ix[:,8]
Y.head(10)
Out[26]:
0    1
1    0
2    1
3    0
4    1
5    0
6    1
7    0
8    1
9    1
Name: class, dtype: int64

Keras Model Defintion

  1. A sequential neural-network in Keras consists of a sequence of layers, starting from the inputs layer up to the output layer (also known as Feedforward Neural Network). The number and breadth of intermediate layers ("deep layers" or "hidden layers") depends on accuracy and performance requirements that we impose. We can start with a single hidden layer and add more neurons to it or add more hidden layers in progressive stages until we meet our accuracy requirements.

  2. The second factor we need to attend is the connectivity type of our network: which neurons from a given layer are connected to which neurons at the next layer? The most straightforward type is full connectivity, which means that every neuron at a given layer is connected to every neuron at the next layer. This type of connectivity is reffered to by the Dense method of Keras layers module.

  3. The third factor is what kind of activation function are we going to use? Each non-input neuron receives many inputs from neurons at the layer below it. How do we determine its output? The function that determines the value that a neuron fires in response to its inputs vector is called the activation function of the model.

  4. Each connection must be assigned an initial weight, which is going to be tuned during the training phase. The usual arrangement is to assign small random weights when the model is defined. It usually has a negligible impact on the training speed.

The most common activation function used is the rectified linear unit (ReLU) function which is simply $$ \textbf{relu}(\vec{x}) = \max(\vec{w} \cdot \vec{x} + b, 0) $$ where $\vec{x}$ is the inputs vector, $\vec{w}$ is the weights vector, and $b$ is the bias value. In many experiments it has been determined that it is fast and leads to more accurate model.

For the output layer, the common practice is to use the sigmoid activation function as its output is always between 0 and 1 (which is useful for classifying). The sigmoid activation function is defined by: $$ \sigma(\vec{x}) = \frac{1}{1 + e^{-(\Sigma\vec{w} \cdot \vec{x} + b)}} \\ \hbox{ } $$ where $\vec{x}$ is the inputs vector, $\vec{w}$ is the weights vector, and $b$ is the bias value.

More information on activation functions is found here: http://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons

In [27]:
# Our first Keras Model
model1 = Sequential()
model1.add(Dense(16, input_shape=(8,), init='uniform', activation='relu'))
model1.add(Dense(1, init='uniform', activation='sigmoid'))
In [28]:
plot(model1, to_file='model1.png', show_shapes=True)

Compiling the model

After defining our model, we need to compile it to an efficient object code. If you own an advanced graphics card (GPU) like NVIDIA Tesla or Titan, then with proper configuration you will be able to run these nn models in a much faster and efficient manner than on a traditional Intel/AMD CPU.

Keras has two backends: Theano and Tensorflow (Google). Just recently, Tensorflow has been released to Microsoft Windows, but requires Python 3.5. Theano and Tensorflow are two important backend deep learning numerical engines that shall be covered in more detail later on ...

The Keras compile method accepts two important parameters: loss and optimizer.

  1. A loss function role is to measure the distance from an actual output to the correct output (the error). It is the main source of feedback in the training process. Each time we run our model on a given input, the loss function measures the error of the output from the target objective. The main purpose of the training phase is to minimize the loss values. The standard choice is the logarithmic loss function which in Keras is reffered to as binary_crossentropy. The formula is: $$ \textbf{cross_entropy} (\vec{t}, \vec{o}) = - ( \vec{t} \cdot \log(\vec{o}) + (1 - \vec{t}) \cdot \log(1 - \vec{o})) $$ where $\vec{t}$ is the target vector and $\vec{o}$ is the model output vector.

  2. The optimizer argument is a function which Keras is using for optimizing the model edge weights during the training process. Each time Keras finds a gap between the model output and the correct output, it tries to tune the weights of the model connections so as to minimize this gap as much as possible. This is a very delicate and highly intricate process that deserves a math course by itself. For our purposes it's enough to be familiar with its simplest form which in traditional math is called Stochastic Gradient Descent . Keras is using a specific variation which it calls adam. If needed, more information on this optimizer can be picked from here: https://arxiv.org/abs/1412.6980

  3. The last parameter for the compile function is the metrics argument. It designates a method to judge the overall performance of our model by looking at large lists of outputs and targets. This is enough for the moment, it will be covered in more depth later on. Keras referrence: https://keras.io/metrics

In [29]:
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Training our Model

After defining and compiling our model, it's time for training!
We will feed our model with many samples as inputs and their correct outputs. Hopefully, after learning many real examples, our model will be capable of solving new cases.

  1. The training phase consists of several cycles (called epochs in Keras jargon), in which our training set is fed to it repeatedly. The number of epochs is up to you, and we may experiment with several options.
  2. In the training phase, our model will tune the weights of its edges. Keras provides an option to decide at which points the weights update is done through the batch_size argument. The weights update will occur after every batch_size training instances. Note that each epoch consists of the training of all test cases. We will try 200 epochs and a batch_size of 16.
  3. Our pima data set consists of 768 cases. We will use 668 of them for training and the remaining 100 cases for validation (testing).
In [30]:
X_train = X[0:668].as_matrix()
Y_train = Y[0:668].as_matrix()
X_test = X[668:].as_matrix()
Y_test = Y[668:].as_matrix()

FitMonitor

The FitMonitor callback is defined in the kerutils module and its purpose is to monitor the progress of the training process, report progress and some intermediate statistics, and save the best model in which the training accuracy and validations accuracy are maximal and are close enough (see 'filename' the 'thresh' parameters below).

In [31]:
fmon = FitMonitor()

And now the actual training is performed by the Keras Model fit method

In [32]:
h = model1.fit(
    X_train, Y_train,
    nb_epoch=100,
    batch_size=10,
    verbose=0,
    validation_data=(X_test, Y_test),
    callbacks=[fmon],
) 
Train begin: 2017-01-29 18:22:40
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
batch_size = 10
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 100
nb_sample = 668
verbose = 0
.....05% epoch=5, acc=0.679641, loss=0.614703, val_acc=0.660000, val_loss=0.628794, time=0.5 seconds
.....10% epoch=10, acc=0.679641, loss=0.591485, val_acc=0.620000, val_loss=0.630049, time=0.9 seconds
.....15% epoch=15, acc=0.703593, loss=0.574162, val_acc=0.640000, val_loss=0.631649, time=1.2 seconds
.....20% epoch=20, acc=0.715569, loss=0.565563, val_acc=0.640000, val_loss=0.613058, time=1.5 seconds
.....25% epoch=25, acc=0.738024, loss=0.557478, val_acc=0.660000, val_loss=0.589605, time=1.8 seconds
.....30% epoch=30, acc=0.729042, loss=0.550090, val_acc=0.660000, val_loss=0.610700, time=2.1 seconds
.....35% epoch=35, acc=0.741018, loss=0.549702, val_acc=0.630000, val_loss=0.623895, time=2.4 seconds
.....40% epoch=40, acc=0.729042, loss=0.543815, val_acc=0.690000, val_loss=0.588585, time=2.8 seconds
.....45% epoch=45, acc=0.742515, loss=0.539460, val_acc=0.680000, val_loss=0.596132, time=3.1 seconds
.....50% epoch=50, acc=0.739521, loss=0.533094, val_acc=0.660000, val_loss=0.589717, time=3.3 seconds
.....55% epoch=55, acc=0.757485, loss=0.526671, val_acc=0.680000, val_loss=0.588590, time=3.6 seconds
.....60% epoch=60, acc=0.738024, loss=0.530631, val_acc=0.680000, val_loss=0.599560, time=3.9 seconds
.....65% epoch=65, acc=0.755988, loss=0.521124, val_acc=0.680000, val_loss=0.583426, time=4.2 seconds
.....70% epoch=70, acc=0.747006, loss=0.518439, val_acc=0.710000, val_loss=0.583134, time=4.5 seconds
.....75% epoch=75, acc=0.751497, loss=0.510267, val_acc=0.690000, val_loss=0.590711, time=4.8 seconds
.....80% epoch=80, acc=0.767964, loss=0.508868, val_acc=0.690000, val_loss=0.589785, time=5.1 seconds
.....85% epoch=85, acc=0.758982, loss=0.504152, val_acc=0.660000, val_loss=0.624368, time=5.4 seconds
.....90% epoch=90, acc=0.760479, loss=0.505475, val_acc=0.690000, val_loss=0.582637, time=5.7 seconds
.....95% epoch=95, acc=0.747006, loss=0.498779, val_acc=0.690000, val_loss=0.574550, time=6.0 seconds
.... 99% epoch=99 acc=0.758982 loss=0.498867
Train end: 2017-01-29 18:22:47
Total run time: 6.3 seconds
max_acc = 0.781437  epoch = 98
max_val_acc = 0.760000  epoch = 99

Saving the model to a file. It can be loaded later with the load_model method. Note that Keras uses the HDF5 file format to save the model.

In [33]:
model1.save("model_1.h5")

We use the show_scores utility from the kerutils module for printing the model scores and plotting the accuracy and loss graphs.

The view_acc and view_loss functions can be used to draw these graphs and are also defined in the kerutils module as well.

In [34]:
show_scores(model1, h, X_train, Y_train, X_test, Y_test)
Training: accuracy   = 0.770958 loss = 0.484062
Validation: accuracy = 0.760000 loss = 0.539452
Over fitting score   = 0.062973
Under fitting score  = 0.058786
Params count: 161
stop epoch = 99
nb_epoch = 100
batch_size = 10
nb_sample = 668
In [35]:
# At any stage we can evaluate the accuracy and loss of our training set

loss, accuracy = model1.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy=%f, loss=%f" % (accuracy, loss))
Training: accuracy=0.770958, loss=0.484062
In [36]:
# It is more interesting to evaluate our validation data
# and see if its scores resembles those of our training data

loss, accuracy = model1.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy=%f, loss=%f" % (accuracy, loss))
Validation: accuracy=0.760000, loss=0.539452

The validation scores came out quite close to the training scores (this gap is usually an indication of an overfitting phenomenon which we will discuss more later on). In order to appreciate this result, we should ask ourselves what is the probability for it to happen? Just by looking at the data there doesn't seem to be any correlation between the validation set to the training set. The model was built by looking at the trainig data alone, and has not seen any of the test data at all. Theoretically, it well may happen that diabetes is a chaotic illness, and that there is no relation between the test data and the trainig data !?

Suppose we have $n$ records in our validation data X_test: ${x_1, x_2, x_3, \cdots, x_n}$. The probabilty for guessing the diabetes check for each record is $0.5$, and therefore the probability to guess more than 75% of the results is given by the formula $$ \textbf{probability} = \sum_{i=k}^{n} \binom{n}{i} \cdot p^{i} \cdot (1-p)^{n-i} $$ where $p = 0.5$ and $k = \textbf{int}(0.75 n)$. In our case, $n = 168$, $k = 117$, and $p = 0.5$.

A simple calculation yields: $$ \textbf{probabilty} = 3.0047541e-11 = 0.000000000030047541 $$

This is a convincing indication that the Pima Indian Diabetes database is indeed governed by some hidden laws which were partially captured by our neural-network (model1) but are not visible to the naked eye, not even for an experienced physician eye (I would even add and say that it would be foolish to trust a human to master the wisdom behind this data).

The amazing fact is that it took Keras 5 seconds to build the model1 neural-network, and it well maybe more trustable than some human trying to capture the database structure by a manual study of it.

Code for computing this probability can be obtained from
http://www.samyzaf.com/cgi-bin/view_file.py?file=ML/lib/prob.py

Trying model2 ...

As model1 has shown an encouraging start. We will try now to add an additional deep layer with 8 neurons (just for curiosity testing) and build a new model2, which hopefully will be more successful than model1.

We will also increase the number of epochs (nb_epoch) to 160 and reduce batch_size to 8. We will try to capture the best model (in training process) whose training accuracy is not below 0.8 (minacc) and its distance to the validation accuracy (thresh) is below 0.05. If a model of this sort is found during the training process, it will be automatically saved in the file model_2_autosave.h5 (can happen several times).

In [40]:
# Our second Keras Model
model2 = Sequential()
model2.add(Dense(8, input_dim=8, init='uniform', activation='relu'))
model2.add(Dense(16, init='uniform', activation='relu'))
model2.add(Dense(1, init='uniform', activation='sigmoid'))

In [41]:
# compilation
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
In [43]:
# training
fmon = FitMonitor(thresh=0.05, minacc=0.8, filename="model_2_autosave.h5")
h = model2.fit(X_train, Y_train,
    nb_epoch=200,
    batch_size=16,
    validation_data=(X_test, Y_test),
    callbacks=[fmon],
    verbose=0,
)

model2.save("model_2.h5")
show_scores(model2, h, X_train, Y_train, X_test, Y_test)
Train begin: 2017-01-29 18:26:32
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
batch_size = 16
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 200
nb_sample = 668
verbose = 0
.....05% epoch=10, acc=0.790419, loss=0.434150, val_acc=0.790000, val_loss=0.550532, time=0.5 seconds
..
Saving model to model_2_autosave.h5: epoch=15, acc=0.802395, val_acc=0.780000
...10% epoch=20, acc=0.784431, loss=0.436291, val_acc=0.780000, val_loss=0.529988, time=0.9 seconds
.....15% epoch=30, acc=0.785928, loss=0.438413, val_acc=0.790000, val_loss=0.517252, time=1.3 seconds
.....20% epoch=40, acc=0.785928, loss=0.431293, val_acc=0.810000, val_loss=0.515973, time=1.8 seconds
.....25% epoch=50, acc=0.784431, loss=0.428734, val_acc=0.750000, val_loss=0.571004, time=2.1 seconds
.....30% epoch=60, acc=0.796407, loss=0.431660, val_acc=0.790000, val_loss=0.551121, time=2.5 seconds
.....35% epoch=70, acc=0.791916, loss=0.428090, val_acc=0.790000, val_loss=0.509764, time=2.9 seconds
.....40% epoch=80, acc=0.797904, loss=0.419425, val_acc=0.800000, val_loss=0.522724, time=3.3 seconds
.....45% epoch=90, acc=0.799401, loss=0.416887, val_acc=0.810000, val_loss=0.536384, time=3.7 seconds
.
Saving model to model_2_autosave.h5: epoch=93, acc=0.806886, val_acc=0.780000
...
Saving model to model_2_autosave.h5: epoch=98, acc=0.811377, val_acc=0.780000
.50% epoch=100, acc=0.806886, loss=0.415441, val_acc=0.810000, val_loss=0.529199, time=4.1 seconds
..
Saving model to model_2_autosave.h5: epoch=104, acc=0.812874, val_acc=0.800000
...55% epoch=110, acc=0.791916, loss=0.420904, val_acc=0.810000, val_loss=0.532436, time=4.5 seconds
.....60% epoch=120, acc=0.791916, loss=0.413900, val_acc=0.800000, val_loss=0.551974, time=4.9 seconds
.....65% epoch=130, acc=0.791916, loss=0.425823, val_acc=0.800000, val_loss=0.562812, time=5.3 seconds
.....70% epoch=140, acc=0.796407, loss=0.413322, val_acc=0.800000, val_loss=0.565352, time=5.7 seconds
.....75% epoch=150, acc=0.799401, loss=0.409299, val_acc=0.790000, val_loss=0.550602, time=6.1 seconds
..
Saving model to model_2_autosave.h5: epoch=155, acc=0.812874, val_acc=0.810000
...80% epoch=160, acc=0.809880, loss=0.416567, val_acc=0.800000, val_loss=0.548553, time=6.5 seconds
.....85% epoch=170, acc=0.806886, loss=0.402324, val_acc=0.810000, val_loss=0.562464, time=6.9 seconds
.....90% epoch=180, acc=0.809880, loss=0.399845, val_acc=0.790000, val_loss=0.601024, time=7.3 seconds
.....95% epoch=190, acc=0.802395, loss=0.407739, val_acc=0.810000, val_loss=0.568395, time=7.7 seconds
.
Saving model to model_2_autosave.h5: epoch=193, acc=0.814371, val_acc=0.810000
... 99% epoch=199 acc=0.806886 loss=0.404510
Train end: 2017-01-29 18:26:40
Total run time: 8.1 seconds
max_acc = 0.814371  epoch = 193
max_val_acc = 0.830000  epoch = 37
Best model saved in file: model_2_autosave.h5
Checkpoint: epoch=193, acc=0.814371, val_acc=0.810000
Training: accuracy   = 0.811377 loss = 0.400858
Validation: accuracy = 0.780000 loss = 0.583821
Over fitting score   = 0.014753
Under fitting score  = 0.014365
Params count: 233
stop epoch = 199
nb_epoch = 200
batch_size = 16
nb_sample = 668

Adding one more hidden layer improved accuracy slightly. The training and validation scores are even closer to each other which adds confidence to our model. However note that during the training process the FitMonitor was able to capture and save a model with a slightly better scores.

In [44]:
# We need to load the saved model which is better than the last one
model2 = load_model("model_2_autosave.h5")
In [45]:
# accuracy and loss of our training set

loss, accuracy = model2.evaluate(X_train, Y_train, verbose=0)
print("Train: accuracy=%f, loss=%f" % (accuracy, loss))
Train: accuracy=0.814371, loss=0.390612
In [46]:
# accuracy and loss of validation set

loss, accuracy = model2.evaluate(X_test, Y_test, verbose=0)
print("Test: accuracy=%f, loss=%f" % (accuracy, loss))
Test: accuracy=0.810000, loss=0.574652

Trying model 3

Model 2 has been a slight imporvement. Let us experiment with lots of neurons, and a different kind of activation function (leaky relu), just to get the hang of it. Note that Keras advanced activation functions are considered as layers (not functions), so they are specified in a separate line.

In [84]:
# Our third Keras Model
model3 = Sequential()
model3.add(Dense(8, input_dim=8, init='uniform'))
model3.add(LeakyReLU(0.2))
model3.add(Dense(64, init='uniform'))
model3.add(LeakyReLU(0.2))
model3.add(Dense(128, init='uniform'))
model3.add(LeakyReLU(0.2))
model3.add(Dense(64, init='uniform'))
model3.add(LeakyReLU(0.2))
model3.add(Dense(1, init='uniform', activation='sigmoid'))
In [85]:
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
In [86]:
fmon = FitMonitor(thresh=0.05, minacc=0.8, filename="model_3_autosave.h5")
h = model3.fit(
    X_train, Y_train,
    nb_epoch=300,
    batch_size=12,
    validation_data=(X_test, Y_test),
    callbacks=[fmon],
    verbose=0,
)

# model3.save("model_3.h5")
show_scores(model3, h, X_train, Y_train, X_test, Y_test)
Train begin: 2017-01-29 18:45:23
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
batch_size = 12
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 300
nb_sample = 668
verbose = 0
.....05% epoch=15, acc=0.742515, loss=0.534947, val_acc=0.720000, val_loss=0.546758, time=1.8 seconds
.....10% epoch=30, acc=0.763473, loss=0.477897, val_acc=0.780000, val_loss=0.490335, time=3.0 seconds
.....15% epoch=45, acc=0.760479, loss=0.459392, val_acc=0.790000, val_loss=0.502018, time=4.2 seconds
.....20% epoch=60, acc=0.781437, loss=0.443577, val_acc=0.750000, val_loss=0.503500, time=5.3 seconds
.....25% epoch=75, acc=0.764970, loss=0.462783, val_acc=0.760000, val_loss=0.533327, time=6.4 seconds
.....30% epoch=90, acc=0.772455, loss=0.452717, val_acc=0.750000, val_loss=0.544288, time=7.5 seconds
.....35% epoch=105, acc=0.782934, loss=0.421693, val_acc=0.790000, val_loss=0.516013, time=8.6 seconds
..
Saving model to model_3_autosave.h5: epoch=111, acc=0.800898, val_acc=0.790000
...40% epoch=120, acc=0.784431, loss=0.408205, val_acc=0.760000, val_loss=0.509900, time=10.1 seconds
.....45% epoch=135, acc=0.790419, loss=0.431271, val_acc=0.770000, val_loss=0.479991, time=11.3 seconds
.
Saving model to model_3_autosave.h5: epoch=140, acc=0.802395, val_acc=0.780000
..
Saving model to model_3_autosave.h5: epoch=144, acc=0.803892, val_acc=0.760000
..50% epoch=150, acc=0.788922, loss=0.404927, val_acc=0.750000, val_loss=0.514634, time=12.5 seconds
..
Saving model to model_3_autosave.h5: epoch=157, acc=0.815868, val_acc=0.800000
...55% epoch=165, acc=0.793413, loss=0.411722, val_acc=0.750000, val_loss=0.560564, time=13.7 seconds
.....60% epoch=180, acc=0.793413, loss=0.392039, val_acc=0.770000, val_loss=0.492842, time=14.8 seconds
.....65% epoch=195, acc=0.799401, loss=0.417078, val_acc=0.660000, val_loss=0.609460, time=15.9 seconds
..
Saving model to model_3_autosave.h5: epoch=203, acc=0.824850, val_acc=0.800000
...70% epoch=210, acc=0.821856, loss=0.371159, val_acc=0.760000, val_loss=0.569873, time=17.1 seconds
.....75% epoch=225, acc=0.820359, loss=0.378846, val_acc=0.750000, val_loss=0.581183, time=18.3 seconds
.....80% epoch=240, acc=0.823353, loss=0.372036, val_acc=0.780000, val_loss=0.540321, time=19.4 seconds
.....85% epoch=255, acc=0.830838, loss=0.348740, val_acc=0.740000, val_loss=0.638012, time=20.6 seconds
.....90% epoch=270, acc=0.839820, loss=0.340097, val_acc=0.770000, val_loss=0.603042, time=21.7 seconds
.....95% epoch=285, acc=0.827844, loss=0.347103, val_acc=0.770000, val_loss=0.706822, time=22.8 seconds
.... 99% epoch=299 acc=0.832335 loss=0.333094
Train end: 2017-01-29 18:45:47
Total run time: 23.9 seconds
max_acc = 0.853293  epoch = 295
max_val_acc = 0.810000  epoch = 123
Best model saved in file: model_3_autosave.h5
Checkpoint: epoch=203, acc=0.824850, val_acc=0.800000
Training: accuracy   = 0.854790 loss = 0.298251
Validation: accuracy = 0.780000 loss = 0.653372
Over fitting score   = 0.052193
Under fitting score  = 0.041904
Params count: 17289
stop epoch = 299
nb_epoch = 300
batch_size = 12
nb_sample = 668
In [89]:
# accuracy and loss of our training set

loss, accuracy = model3.evaluate(X_train, Y_train, verbose=0)
print("Train: accuracy=%f, loss=%f" % (accuracy, loss))
Train: accuracy=0.854790, loss=0.298251
In [90]:
# accuracy and loss of testing set

loss, accuracy = model3.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy=%f, loss=%f" % (accuracy, loss))
Validation: accuracy=0.780000, loss=0.653372

The training accuracy came out better (85.48% !) but the validation accuracy is a bit far behind, which means we are in an overfitting scenario. Our model is memorizing the training samples and is no good for other samples.

Most likely we will not be able to do better than this without more training data. Luckily, real medical databases usually contain larger sets of data that suffice for building efficient neural-networks. This is one of main lessons one learns each time: a neural network needs a lot of training samples. This is why it is important for institutions to collect as much data as possible.

Model 4

Sometimes, switching to a different activation function may help. Here is model 3 with the S-shaped linear rectifier (SReLU).

In [94]:
# Our 4th Keras Model
model4 = Sequential()
model4.add(Dense(8, input_dim=8, init='uniform'))
model4.add(SReLU())
model4.add(Dense(64, init='uniform'))
model4.add(SReLU())
model4.add(Dense(128, init='uniform'))
model4.add(SReLU())
model4.add(Dense(64, init='uniform'))
model4.add(SReLU())
model4.add(Dense(1, init='uniform', activation='sigmoid'))

model4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

fmon = FitMonitor(thresh=0.05, minacc=0.8, filename="model_4_autosave.h5")
h = model4.fit(
    X_train, Y_train,
    nb_epoch=300,
    batch_size=12,
    validation_data=(X_test, Y_test),
    callbacks=[fmon],
    verbose=0,
)

model4.save("model_4.h5")
show_scores(model4, h, X_train, Y_train, X_test, Y_test)
Train begin: 2017-01-29 19:16:57
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
batch_size = 12
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 300
nb_sample = 668
verbose = 0
.....05% epoch=15, acc=0.726048, loss=0.530695, val_acc=0.680000, val_loss=0.586641, time=3.2 seconds
.....10% epoch=30, acc=0.763473, loss=0.481737, val_acc=0.730000, val_loss=0.495109, time=4.9 seconds
.....15% epoch=45, acc=0.758982, loss=0.485945, val_acc=0.790000, val_loss=0.486149, time=6.5 seconds
.....20% epoch=60, acc=0.751497, loss=0.467313, val_acc=0.770000, val_loss=0.488012, time=8.2 seconds
.....25% epoch=75, acc=0.769461, loss=0.459519, val_acc=0.670000, val_loss=0.609363, time=9.9 seconds
.....30% epoch=90, acc=0.775449, loss=0.445670, val_acc=0.730000, val_loss=0.545151, time=11.6 seconds
.....35% epoch=105, acc=0.769461, loss=0.446457, val_acc=0.720000, val_loss=0.573888, time=13.3 seconds
.....40% epoch=120, acc=0.784431, loss=0.424908, val_acc=0.780000, val_loss=0.558506, time=15.0 seconds
....
Saving model to model_4_autosave.h5: epoch=132, acc=0.806886, val_acc=0.780000
.45% epoch=135, acc=0.794910, loss=0.432571, val_acc=0.750000, val_loss=0.584586, time=17.3 seconds
.....50% epoch=150, acc=0.779940, loss=0.424596, val_acc=0.740000, val_loss=0.580437, time=19.0 seconds
.....55% epoch=165, acc=0.818862, loss=0.391982, val_acc=0.730000, val_loss=0.590264, time=20.7 seconds
.....60% epoch=180, acc=0.823353, loss=0.376876, val_acc=0.770000, val_loss=0.605785, time=22.4 seconds
.....65% epoch=195, acc=0.829341, loss=0.346577, val_acc=0.710000, val_loss=0.722939, time=24.1 seconds
.....70% epoch=210, acc=0.802395, loss=0.404480, val_acc=0.740000, val_loss=0.709640, time=25.9 seconds
.....75% epoch=225, acc=0.865269, loss=0.290825, val_acc=0.700000, val_loss=0.791550, time=27.6 seconds
.....80% epoch=240, acc=0.878743, loss=0.255682, val_acc=0.740000, val_loss=0.924942, time=29.3 seconds
.....85% epoch=255, acc=0.893713, loss=0.235522, val_acc=0.720000, val_loss=1.028225, time=31.0 seconds
.....90% epoch=270, acc=0.899701, loss=0.195577, val_acc=0.670000, val_loss=1.093014, time=32.7 seconds
.....95% epoch=285, acc=0.919162, loss=0.170115, val_acc=0.740000, val_loss=1.017229, time=34.4 seconds
.... 99% epoch=299 acc=0.938623 loss=0.150009
Train end: 2017-01-29 19:17:33
Total run time: 36.1 seconds
max_acc = 0.944611  epoch = 298
max_val_acc = 0.820000  epoch = 136
Best model saved in file: model_4_autosave.h5
Checkpoint: epoch=132, acc=0.806886, val_acc=0.780000
Training: accuracy   = 0.934132 loss = 0.151159
Validation: accuracy = 0.710000 loss = 1.334607
Over fitting score   = 0.119973
Under fitting score  = 0.086122
Params count: 18345
stop epoch = 299
nb_epoch = 300
batch_size = 12
nb_sample = 668
In [95]:
# accuracy and loss of our training set

loss, accuracy = model4.evaluate(X_train, Y_train, verbose=0)
print("Train: accuracy=%f, loss=%f" % (accuracy, loss))
Train: accuracy=0.934132, loss=0.151159
In [97]:
# accuracy and loss of testing set

loss, accuracy = model4.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy=%f, loss=%f" % (accuracy, loss))
Validation: accuracy=0.710000, loss=1.334607

The model accuracy graph looks better but the gap is still high. However the FitMonitor was able to capture a better model with a smaller gap.

In [98]:
model4 = load_model("model_4_autosave.h5")
In [99]:
# accuracy and loss of our training set

loss, accuracy = model4.evaluate(X_train, Y_train, verbose=0)
print("Train: accuracy=%f, loss=%f" % (accuracy, loss))
Train: accuracy=0.805389, loss=0.395331
In [100]:
loss, accuracy = model4.evaluate(X_test, Y_test, verbose=0)
print("Test: accuracy=%f, loss=%f" % (accuracy, loss))
Test: accuracy=0.780000, loss=0.566750

Model 5

Using the keras.callbacks.Callback mechanism it is possible to reverse engineer Keras for finding a network architecture with higher success rates. But it does require some programming and a long thorough search. There's always a risk of running into the pit of overfitting, in which the model is too large and can actually memorize the training set in it weights structure. We will conclude this study unit with a model that combines three different activation functions. They were randomly chosen but nevertheless seem to improve accuracy scores. We even get a higher validation score than training (which is sometimes called "underfitting", but is more desirable than overfitting).

In [124]:
model5 = Sequential()
model5.add(Dense(8, input_dim=8, init='uniform', activation='relu'))
model5.add(Dense(64, init='uniform'))
model4.add(SReLU())
model5.add(Dense(64, init='uniform'))
model5.add(LeakyReLU(0.05))
model5.add(Dense(64, init='uniform', activation='sigmoid'))
model5.add(Dense(1, init='uniform', activation='sigmoid'))

model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

fmon = FitMonitor(thresh=0.05, minacc=0.8, filename="model_5_autosave.h5")
h = model5.fit(
    X_train, Y_train,
    nb_epoch=1024,
    batch_size=12,
    validation_data=(X_test, Y_test),
    callbacks=[fmon],
    verbose=0,
)

model5.save("model_5.h5")
show_scores(model5, h, X_train, Y_train, X_test, Y_test)
Train begin: 2017-01-29 20:34:21
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
batch_size = 12
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 1024
nb_sample = 668
verbose = 0
.....05% epoch=52, acc=0.778443, loss=0.462300, val_acc=0.750000, val_loss=0.517249, time=5.0 seconds
.....10% epoch=103, acc=0.781437, loss=0.451038, val_acc=0.780000, val_loss=0.540028, time=9.0 seconds
.....15% epoch=154, acc=0.773952, loss=0.456800, val_acc=0.760000, val_loss=0.506888, time=12.8 seconds
...
Saving model to model_5_autosave.h5: epoch=185, acc=0.802395, val_acc=0.770000
..20% epoch=205, acc=0.782934, loss=0.445194, val_acc=0.800000, val_loss=0.519103, time=17.3 seconds
.....25% epoch=256, acc=0.770958, loss=0.435095, val_acc=0.830000, val_loss=0.481699, time=21.3 seconds
.
Saving model to model_5_autosave.h5: epoch=272, acc=0.802395, val_acc=0.830000
....30% epoch=308, acc=0.776946, loss=0.431295, val_acc=0.760000, val_loss=0.511191, time=25.5 seconds
.
Saving model to model_5_autosave.h5: epoch=323, acc=0.802395, val_acc=0.800000
....35% epoch=359, acc=0.787425, loss=0.430355, val_acc=0.740000, val_loss=0.513462, time=29.4 seconds

Saving model to model_5_autosave.h5: epoch=360, acc=0.812874, val_acc=0.770000
.....40% epoch=410, acc=0.791916, loss=0.434181, val_acc=0.780000, val_loss=0.529752, time=33.4 seconds
.....45% epoch=461, acc=0.782934, loss=0.419729, val_acc=0.760000, val_loss=0.494244, time=37.4 seconds
.....50% epoch=512, acc=0.802395, loss=0.419107, val_acc=0.780000, val_loss=0.536762, time=41.2 seconds
.....55% epoch=564, acc=0.803892, loss=0.393484, val_acc=0.820000, val_loss=0.514464, time=45.1 seconds
.
Saving model to model_5_autosave.h5: epoch=580, acc=0.820359, val_acc=0.780000
.
Saving model to model_5_autosave.h5: epoch=584, acc=0.826347, val_acc=0.800000
...60% epoch=615, acc=0.811377, loss=0.390501, val_acc=0.780000, val_loss=0.559063, time=49.0 seconds
.....65% epoch=666, acc=0.814371, loss=0.388266, val_acc=0.760000, val_loss=0.613043, time=52.9 seconds
.
Saving model to model_5_autosave.h5: epoch=679, acc=0.833832, val_acc=0.800000
....70% epoch=717, acc=0.818862, loss=0.382199, val_acc=0.800000, val_loss=0.544150, time=56.9 seconds
.....75% epoch=768, acc=0.814371, loss=0.394897, val_acc=0.730000, val_loss=0.622401, time=60.8 seconds
.....80% epoch=820, acc=0.839820, loss=0.357324, val_acc=0.760000, val_loss=0.607067, time=64.7 seconds
.....85% epoch=871, acc=0.838323, loss=0.357413, val_acc=0.800000, val_loss=0.598570, time=68.5 seconds
.....90% epoch=922, acc=0.841317, loss=0.361577, val_acc=0.740000, val_loss=0.578249, time=72.4 seconds
.....95% epoch=973, acc=0.838323, loss=0.353113, val_acc=0.780000, val_loss=0.616350, time=76.3 seconds
.... 99% epoch=1023 acc=0.838323 loss=0.359054
Train end: 2017-01-29 20:35:42
Total run time: 80.2 seconds
max_acc = 0.850299  epoch = 937
max_val_acc = 0.850000  epoch = 388
Best model saved in file: model_5_autosave.h5
Checkpoint: epoch=679, acc=0.833832, val_acc=0.800000
Training: accuracy   = 0.839820 loss = 0.345867
Validation: accuracy = 0.790000 loss = 0.664996
Over fitting score   = 0.047365
Under fitting score  = 0.037930
Params count: 9033
stop epoch = 1023
nb_epoch = 1024
batch_size = 12
nb_sample = 668
In [125]:
# accuracy and loss of our training set

loss, accuracy = model5.evaluate(X_train, Y_train, verbose=0)
print("Train: accuracy=%f, loss=%f" % (accuracy, loss))
Train: accuracy=0.839820, loss=0.345867
In [126]:
# accuracy and loss of testing set

loss, accuracy = model5.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy=%f, loss=%f" % (accuracy, loss))
Validation: accuracy=0.790000, loss=0.664996

Summary

So with a 265 neurons network we were able to define a neural network with about 81% accuracy scores. The question is can we do better? and are there systematic methods for improving neural network models? Currently this seems to be a wild area for creativity and imagination, and there are some web sites that conduct public competitions of this type. See for example: https://www.kaggle.com/competitions

Our course projects will follow these type of challenges, from which we believe the students can learn and get a lot of practice. It will also make the grading task much simpler :-) ...

Course Project 1

Find a minimal neural network for the Pima Indians database, which adhers to the following requirements:

  1. Produces at least 85% accuracy on the training and testing data sets
  2. Number of deep layers does not exceed 5
  3. Number of nuerons at each layer does not exceed 256
  4. Number of training epochs does not exceed 5000
  5. Number of parameters (weights) does not exceed 200K.

The solution with the smallest number of parameters will receive the highest grade, and the remaining solutions will receive smaller grades relative to their ranking compared to the best solution.

Course Project 2

Construct a minimal neural-network for predicting breast cancer reocurrence events. Accuracy should be at least 85%. The database can be downloaded from the university of California Irvine at: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer

Course Project 3

Download the song_year database from: http://www.samyzaf.com/ML/song_year/song_year.zip. This database contains more than half million song records (515345) that were composed from 1922 to 2011. Each record consists of 91 features related to its audio attributes (including the year of its composition). Your task is to designed a neural-network that can predict the year from the other 90 musical attributes. Existence of such small network will be an indication that musical composition is strongly related to the time it is composed. Is this the case?