Object Recognition in CIFAR-10 Image Database

CIFAR-10 is by now a classic computer-vision dataset for object-recognition case studies. It is a subset of the 80 Million Tiny Images dataset that was designed and created at the Canadian Institute for Advanced Research (CIFAR, pronounced "see far").

The CIFAR-10 dataset consists of 60,000 32x32x3 color images in 10 equally sized classes (6,000 images per class). Each class of images corresponds to a physical object (airplane, automobile, cat, dog, etc.). It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. We strongly recommend looking at the following two sources before starting work on this notebook:

  1. http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
  2. http://cs231n.github.io/convolutional-networks/

Prerequisites

The code in this IPython notebook was run on Windows 10, Python 2.7, with keras, numpy, matplotlib, and jupyter. We also used an NVIDIA GPU (GeForce GTX 950) with cuDNN 5103. Of course, it can also be run on a CPU, but it will be significantly slower (not recommended!).

To run the code in this notebook, you'll also need to download the following course libraries, which we use in several examples of this course:

  1. http://www.samyzaf.com/cgi-bin/view_file.py?file=ML/lib/kerutils.py
  2. http://www.samyzaf.com/cgi-bin/view_file.py?file=ML/lib/dlutils.py
  3. http://www.samyzaf.com/ML/style-notebook.css (notebook stylesheet)

You can actually download all the course modules from GitHub:
https://github.com/samyzaf/kerutils

In [2]:
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from keras.constraints import maxnorm
from keras.utils import np_utils
from keras.layers.noise import GaussianNoise
from keras.layers.advanced_activations import SReLU
from keras.utils.visualize_util import plot
import pandas as pd
import matplotlib.pyplot as plt
import time, pickle
from kerutils import *
%matplotlib inline
Using Theano backend.

Using gpu device 0: GeForce GTX 950 (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5103)
c:\anaconda2\lib\site-packages\theano\sandbox\cuda\__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
In [1]:
# These are css/html styles for good looking ipython notebooks
from IPython.core.display import HTML
css = open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
Out[1]:

The CIFAR-10 image classes are encoded as integers 0-9, as defined by the following Python dictionary:

In [3]:
nb_classes = 10
class_name = {
    0: 'airplane',
    1: 'automobile',
    2: 'bird',
    3: 'cat',
    4: 'deer',
    5: 'dog',
    6: 'frog',
    7: 'horse',
    8: 'ship',
    9: 'truck',
}

Load training and test data

In [4]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = y_train.reshape(y_train.shape[0])  # y_train arrives as a 2D (n, 1) array; flatten it to a 1D vector
y_test = y_test.reshape(y_test.shape[0])

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'training samples')
print(X_test.shape[0], 'validation samples')
X_train shape: (50000L, 32L, 32L, 3L)
50000 training samples
10000 validation samples

The original data of each image is a 32x32x3 matrix of integers from 0 to 255. We need to scale it down to floats in the unit interval [0, 1]:

In [5]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

As usual, we must convert the y_train and y_test vectors to one-hot format:

0 → [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 → [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 → [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
3 → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
etc...
In [6]:
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

Let's also write two small utilities for drawing samples of images, so we can inspect our results visually.

In [7]:
def draw_img(i):
    im = X_train[i]
    c = y_train[i]
    plt.imshow(im)
    plt.title("Class %d (%s)" % (c, class_name[c]))
    plt.axis('on')

def draw_sample(X, y, n, rows=4, cols=4, imfile=None, fontsize=12):
    for i in range(0, rows*cols):
        plt.subplot(rows, cols, i+1)
        im = X[n+i].reshape(32,32,3)
        plt.imshow(im, cmap='gnuplot2')
        plt.title("{}".format(class_name[y[n+i]]), fontsize=fontsize)
        plt.axis('off')
        plt.subplots_adjust(wspace=0.6, hspace=0.01)
        #plt.subplots_adjust(hspace=0.45, wspace=0.45)
        #plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
    if imfile:
        plt.savefig(imfile)

As an example, let's draw image 7 of X_train:

In [8]:
draw_img(7)

To test the second utility, let's draw the first 15 images in a 3x5 grid:

In [9]:
draw_sample(X_train, y_train, 0, 3, 5)

Building Neural Networks for CIFAR-10

In contrast to previous case studies, here it would be prohibitive to use a fully connected neural network, unless we have good reasons to believe that we can make good progress with a small number of neurons in the layers beyond the input layer. The input layer alone must have size 3072 (as every image is a 32x32x3 matrix). If we add a hidden layer of the same size, we end up with more than 9 million synapses in the first layer alone, and every additional layer of that size adds roughly 9 million more, which is of course impractical.
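To make the arithmetic concrete, here is a quick back-of-the-envelope computation (a small sketch we added for illustration; it is not part of the training code):

input_size = 32 * 32 * 3             # 3072 inputs per image
hidden_size = 3072                   # a hidden layer of the same size
# weights + biases of a single fully connected (Dense) layer
dense_params = input_size * hidden_size + hidden_size
print(dense_params)                  # 9440256 -- over 9 million synapses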

Deep learning frameworks provide special layer types designed for processing images with a minimal number of synapses (compared to a Dense layer). Each output pixel is connected only to a very small 3x3 or 5x5 neighborhood of input pixels. Intuitively, image pixels are mostly influenced by the pixels around them, rather than by pixels in a far-away region of the image.
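For comparison, the parameter count of such a convolutional layer depends only on the kernel size and the number of filters, not on the image size. A small sanity check (the result matches the first Convolution2D layer in the model summaries below):

kernel_h, kernel_w = 3, 3
in_channels, filters = 3, 32
# each filter holds kernel_h * kernel_w * in_channels weights plus one bias
conv_params = (kernel_h * kernel_w * in_channels + 1) * filters
print(conv_params)                   # 896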

These two layer types (Convolution2D and MaxPooling2D) are explained in more detail in the following two articles, which we recommend reading before you approach the following code:

  1. http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
  2. http://cs231n.github.io/convolutional-networks/

We will start with a small Keras model which combines a well-thought-out mix of Convolution2D, MaxPooling2D, and Dense layers. It is mostly based on open-source code examples by François Chollet (the author of Keras, at Google) and other similar sources:

  1. https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py
  2. https://www.kaggle.com/okhan1/state-farm-distracted-driver-detection/testing-keras/run/232911
  3. http://blog.schlerp.net/2016/7/neural-networks-in-python-3-keras

Two Types of Training

We will use two types of training:

  1. Standard training: the usual Keras fit method
  2. Training with augmented data: in this mode, the training data passes through a special Keras generator which applies certain image operations to each data item and generates new items for training. This way we can multiply our training data as much as we wish and thus give our model as much training as we like (though of course we should still watch out for overfitting).

The Keras generator for the second training mode is called ImageDataGenerator and can be understood from the Keras manual page:
https://keras.io/preprocessing/image/#imagedatagenerator
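Under the hood, fit_generator is roughly equivalent to the following manual loop (a simplified sketch along the lines of the Keras 1.x documentation; here datagen stands for an ImageDataGenerator like the ones constructed below, and the real method adds batch queuing and threading):

for e in range(nb_epoch):
    batches = 0
    for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=32):
        model.train_on_batch(X_batch, Y_batch)   # one gradient step
        batches += 1
        if batches >= len(X_train) / 32:
            break   # the generator loops forever; stop after one pass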

Let's Train Model 1 (standard training)

In [21]:
nb_epoch = 50
batch_size = 32

model1 = Sequential()
model1.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu', W_constraint=maxnorm(3)))
model1.add(Dropout(0.2))
model1.add(Convolution2D(32, 3, 3, activation='relu', border_mode='same', W_constraint=maxnorm(3)))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Flatten())
model1.add(Dense(512, activation='relu', W_constraint=maxnorm(3)))
model1.add(Dropout(0.5))
model1.add(Dense(nb_classes, activation='softmax'))
# Compile model
lrate = 0.01
decay = lrate/nb_epoch
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model1.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model1.summary())

print('Standard Training.')

h = model1.fit(
    X_train,
    Y_train,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    validation_data=(X_test, Y_test),
    shuffle=True
)

show_scores(model1, h, X_train, Y_train, X_test, Y_test)
print('Saving model1 to the file "model1.h5"')
model1.save("model1.h5")
Model Summary:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_3 (Convolution2D)  (None, 32, 32, 32)    896         convolution2d_input_2[0][0]      
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 32, 32, 32)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 32, 32, 32)    9248        dropout_3[0][0]                  
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 16, 16, 32)    0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 8192)          0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 512)           4194816     flatten_2[0][0]                  
____________________________________________________________________________________________________
dropout_4 (Dropout)              (None, 512)           0           dense_3[0][0]                    
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 10)            5130        dropout_4[0][0]                  
====================================================================================================
Total params: 4210090
____________________________________________________________________________________________________
None
Standard Training.
Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 34s - loss: 1.7511 - acc: 0.3645 - val_loss: 1.4763 - val_acc: 0.4601
Epoch 2/50
50000/50000 [==============================] - 34s - loss: 1.4102 - acc: 0.4927 - val_loss: 1.2470 - val_acc: 0.5588
Epoch 3/50
50000/50000 [==============================] - 34s - loss: 1.2466 - acc: 0.5534 - val_loss: 1.1544 - val_acc: 0.5939
Epoch 4/50
50000/50000 [==============================] - 34s - loss: 1.1335 - acc: 0.5971 - val_loss: 1.1024 - val_acc: 0.6102
Epoch 5/50
50000/50000 [==============================] - 34s - loss: 1.0310 - acc: 0.6337 - val_loss: 1.0466 - val_acc: 0.6321
Epoch 6/50
50000/50000 [==============================] - 34s - loss: 0.9422 - acc: 0.6659 - val_loss: 1.0070 - val_acc: 0.6489
Epoch 7/50
50000/50000 [==============================] - 34s - loss: 0.8644 - acc: 0.6940 - val_loss: 1.0108 - val_acc: 0.6482
Epoch 8/50
50000/50000 [==============================] - 35s - loss: 0.7860 - acc: 0.7201 - val_loss: 0.9407 - val_acc: 0.6727
Epoch 9/50
50000/50000 [==============================] - 34s - loss: 0.7211 - acc: 0.7420 - val_loss: 0.9388 - val_acc: 0.6803
Epoch 10/50
50000/50000 [==============================] - 34s - loss: 0.6512 - acc: 0.7714 - val_loss: 0.9561 - val_acc: 0.6750
Epoch 11/50
50000/50000 [==============================] - 34s - loss: 0.6022 - acc: 0.7869 - val_loss: 0.9649 - val_acc: 0.6781
Epoch 12/50
50000/50000 [==============================] - 34s - loss: 0.5512 - acc: 0.8027 - val_loss: 0.9646 - val_acc: 0.6828
Epoch 13/50
50000/50000 [==============================] - 34s - loss: 0.5062 - acc: 0.8198 - val_loss: 0.9558 - val_acc: 0.6915
Epoch 14/50
50000/50000 [==============================] - 34s - loss: 0.4622 - acc: 0.8348 - val_loss: 0.9656 - val_acc: 0.6870
Epoch 15/50
50000/50000 [==============================] - 34s - loss: 0.4208 - acc: 0.8499 - val_loss: 0.9934 - val_acc: 0.6902
Epoch 16/50
50000/50000 [==============================] - 34s - loss: 0.3878 - acc: 0.8624 - val_loss: 1.0046 - val_acc: 0.6924
Epoch 17/50
50000/50000 [==============================] - 34s - loss: 0.3639 - acc: 0.8704 - val_loss: 1.0410 - val_acc: 0.6898
Epoch 18/50
50000/50000 [==============================] - 34s - loss: 0.3318 - acc: 0.8824 - val_loss: 1.0516 - val_acc: 0.6917
Epoch 19/50
50000/50000 [==============================] - 34s - loss: 0.3084 - acc: 0.8897 - val_loss: 1.0763 - val_acc: 0.6924
Epoch 20/50
50000/50000 [==============================] - 34s - loss: 0.2865 - acc: 0.8982 - val_loss: 1.1012 - val_acc: 0.6929
Epoch 21/50
50000/50000 [==============================] - 34s - loss: 0.2690 - acc: 0.9031 - val_loss: 1.0899 - val_acc: 0.6914
Epoch 22/50
50000/50000 [==============================] - 34s - loss: 0.2495 - acc: 0.9108 - val_loss: 1.1001 - val_acc: 0.6964
Epoch 23/50
50000/50000 [==============================] - 34s - loss: 0.2342 - acc: 0.9170 - val_loss: 1.1171 - val_acc: 0.7017
Epoch 24/50
50000/50000 [==============================] - 34s - loss: 0.2208 - acc: 0.9220 - val_loss: 1.1430 - val_acc: 0.6965
Epoch 25/50
50000/50000 [==============================] - 34s - loss: 0.2097 - acc: 0.9262 - val_loss: 1.1811 - val_acc: 0.6965
Epoch 26/50
50000/50000 [==============================] - 34s - loss: 0.1950 - acc: 0.9320 - val_loss: 1.1600 - val_acc: 0.7006
Epoch 27/50
50000/50000 [==============================] - 34s - loss: 0.1886 - acc: 0.9337 - val_loss: 1.2166 - val_acc: 0.6974
Epoch 28/50
50000/50000 [==============================] - 34s - loss: 0.1749 - acc: 0.9380 - val_loss: 1.2317 - val_acc: 0.7014
Epoch 29/50
50000/50000 [==============================] - 34s - loss: 0.1692 - acc: 0.9409 - val_loss: 1.2404 - val_acc: 0.6948
Epoch 30/50
50000/50000 [==============================] - 34s - loss: 0.1652 - acc: 0.9427 - val_loss: 1.2466 - val_acc: 0.6998
Epoch 31/50
50000/50000 [==============================] - 34s - loss: 0.1571 - acc: 0.9458 - val_loss: 1.2328 - val_acc: 0.6991
Epoch 32/50
50000/50000 [==============================] - 34s - loss: 0.1461 - acc: 0.9486 - val_loss: 1.2797 - val_acc: 0.6958
Epoch 33/50
50000/50000 [==============================] - 34s - loss: 0.1428 - acc: 0.9498 - val_loss: 1.2880 - val_acc: 0.6939
Epoch 34/50
50000/50000 [==============================] - 34s - loss: 0.1356 - acc: 0.9539 - val_loss: 1.2841 - val_acc: 0.6989
Epoch 35/50
50000/50000 [==============================] - 34s - loss: 0.1327 - acc: 0.9544 - val_loss: 1.2901 - val_acc: 0.6989
Epoch 36/50
50000/50000 [==============================] - 34s - loss: 0.1265 - acc: 0.9569 - val_loss: 1.3254 - val_acc: 0.6971
Epoch 37/50
50000/50000 [==============================] - 34s - loss: 0.1212 - acc: 0.9590 - val_loss: 1.3383 - val_acc: 0.6995
Epoch 38/50
50000/50000 [==============================] - 34s - loss: 0.1177 - acc: 0.9589 - val_loss: 1.3340 - val_acc: 0.7019
Epoch 39/50
50000/50000 [==============================] - 34s - loss: 0.1151 - acc: 0.9600 - val_loss: 1.3548 - val_acc: 0.7027
Epoch 40/50
50000/50000 [==============================] - 34s - loss: 0.1104 - acc: 0.9621 - val_loss: 1.3613 - val_acc: 0.7033
Epoch 41/50
50000/50000 [==============================] - 34s - loss: 0.1088 - acc: 0.9629 - val_loss: 1.3798 - val_acc: 0.6991
Epoch 42/50
50000/50000 [==============================] - 34s - loss: 0.1040 - acc: 0.9653 - val_loss: 1.3835 - val_acc: 0.7005
Epoch 43/50
50000/50000 [==============================] - 34s - loss: 0.1016 - acc: 0.9658 - val_loss: 1.3968 - val_acc: 0.7028
Epoch 44/50
50000/50000 [==============================] - 34s - loss: 0.0954 - acc: 0.9682 - val_loss: 1.3809 - val_acc: 0.7025
Epoch 45/50
50000/50000 [==============================] - 34s - loss: 0.0987 - acc: 0.9662 - val_loss: 1.3955 - val_acc: 0.7025
Epoch 46/50
50000/50000 [==============================] - 34s - loss: 0.0958 - acc: 0.9682 - val_loss: 1.4032 - val_acc: 0.7032
Epoch 47/50
50000/50000 [==============================] - 34s - loss: 0.0896 - acc: 0.9699 - val_loss: 1.4049 - val_acc: 0.7035
Epoch 48/50
50000/50000 [==============================] - 34s - loss: 0.0857 - acc: 0.9713 - val_loss: 1.3944 - val_acc: 0.7009
Epoch 49/50
50000/50000 [==============================] - 34s - loss: 0.0889 - acc: 0.9703 - val_loss: 1.4177 - val_acc: 0.7050
Epoch 50/50
50000/50000 [==============================] - 34s - loss: 0.0870 - acc: 0.9704 - val_loss: 1.4261 - val_acc: 0.7014
Training: accuracy   = 0.999560 loss = 0.005692
Validation: accuracy = 0.701400 loss = 1.426094
Over fitting score   = 0.236352
Under fitting score  = 0.191977
In [22]:
loss, accuracy = model1.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.999560  ;  loss = 0.005692
In [23]:
loss, accuracy = model1.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy1 = %f  ;  loss1 = %f" % (accuracy, loss))
Validation: accuracy1 = 0.701400  ;  loss1 = 1.426094

What we see in the last two graphs is a classic example of the overfitting phenomenon. While the training accuracy has skyrocketed to 99.96% (wow!!), our validation data comes to the rescue and cools down our enthusiasm: only 70.14%. The almost 30% gap between training and validation accuracy is a clear indication of overfitting, and a good reason to abandon model1 and look for a better one. We should also notice the large gap between the training loss and the validation loss. This too is a clear mark of overfitting that should raise a warning sign.
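For reference, graphs like these can be reproduced directly from the history object h returned by fit (a minimal sketch; the show_scores utility from kerutils draws a richer version):

plt.plot(h.history['acc'], label='training')
plt.plot(h.history['val_acc'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend(loc='lower right')
plt.show()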

Inspecting the output

Nevertheless, before we search for a new model, let's take a quick look at some of the cases that our model1 missed. It may give us hints about the strengths and weaknesses of NN models, and about what we can expect from these artificial models.

The predict_classes method is helpful for getting a vector (y_pred) of the predicted classes of model1. We should compare y_pred with the true classes y_test in order to extract the false cases:

In [30]:
y_pred = model1.predict_classes(X_test)
 9984/10000 [============================>.] - ETA: 0s
In [42]:
true_preds = [(x,y) for (x,y,p) in zip(X_test, y_test, y_pred) if y == p]
false_preds = [(x,y,p) for (x,y,p) in zip(X_test, y_test, y_pred) if y != p]
print("Number of true predictions: ", len(true_preds))
print("Number of false predictions:", len(false_preds))
Number of true predictions:  7014
Number of false predictions: 2986

The array false_preds consists of all triples (x, y, p) where x is an image, y is its true class, and p is the incorrect class that model1 predicted.

Let's visualize a sample of 15 items:

In [40]:
for i,(x,y,p) in enumerate(false_preds[0:15]):
    plt.subplot(3, 5, i+1)
    plt.imshow(x, cmap='gnuplot2')
    plt.title("y: %s\np: %s" % (class_name[y], class_name[p]), fontsize=9, loc='left')
    plt.axis('off')
    plt.subplots_adjust(wspace=0.6, hspace=0.2)

Well, we see that model1 confuses airplanes with ships, dogs with cats, etc. But we should not underestimate the fact that it is still correct in 70% of the cases, which is highly nontrivial! (Suppose that, as a programmer, you were assigned to write a traditional computer program that can guess the correct class in 70% of the cases; think how hard that would be...)
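To see exactly which classes get confused with which, a confusion matrix is handy. Here is a small sketch using the pandas import from above (our own addition, not one of the original code cells):

labels = [class_name[i] for i in range(nb_classes)]
cm = pd.crosstab(pd.Series(y_test, name='true'),
                 pd.Series(y_pred, name='predicted'))
cm.index = labels      # rows: true classes
cm.columns = labels    # columns: predicted classes
print(cm)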

Second Keras Model for the CIFAR-10 dataset

Let's try our small model with the aid of augmented data. The Keras ImageDataGenerator is a great tool for generating new training data from the old, so that we have enough training samples to avoid overfitting.

The ImageDataGenerator takes quite a few graphics parameters, which we cannot explain in this tutorial. We recommend reading the Keras documentation page and a short tutorial:

  1. https://keras.io/preprocessing/image/#imagedatagenerator
  2. http://machinelearningmastery.com/image-augmentation-deep-learning-keras/

Let's first take a look at a few samples of images that are generated by ImageDataGenerator:

In [94]:
imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images
    vertical_flip = False,  # randomly flip images
)

imdgen.fit(X_train)
it = imdgen.flow(X_train, Y_train, batch_size=15) # This is a Python iterator
images, categories = it.next()
print("Number of images returned by iterator:", len(images))
for i in range(15):
    plt.subplot(3, 5, i+1)
    im = images[i]
    c = np.where(categories[i] == 1)[0][0] # convert one-hot to regular index
    plt.imshow(im, cmap='gnuplot2')
    plt.title(class_name[c], fontsize=9)
    plt.axis('off')
    plt.subplots_adjust(wspace=0.6, hspace=0.2)
Number of images returned by iterator: 15

The images you see are not from the CIFAR-10 collection. They were generated by the Keras ImageDataGenerator from images in the CIFAR-10 database, by applying various image operators to them. This way we can increase the number of training samples almost indefinitely (in every training epoch we get a completely new set of samples!).

The second important point to note about this iterator is that it does not require extra memory or disk space to keep its images (no matter how many of them we want to generate)! It produces them in small batches (usually 32 or 128 at a time), which are discarded after each training step. So we can train our model on millions of samples while holding only one batch in memory at a time (a batch of 32 float32 images of size 32x32x3 takes about 32 * 3072 * 4 ≈ 400 KB). This is extremely important when our images are full size (like 2048x3072).

Let's now see the second type of Keras training, based on the ImageDataGenerator. Note the new training method name: fit_generator.

Model 2 (with Data Augmentation)

In [14]:
nb_epoch = 100   # this time let's increase the number of epochs to 100
batch_size = 32

model2 = Sequential()
model2.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu', W_constraint=maxnorm(3)))
model2.add(Dropout(0.2))
model2.add(Convolution2D(32, 3, 3, activation='relu', border_mode='same', W_constraint=maxnorm(3)))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(Flatten())
model2.add(Dense(512, activation='relu', W_constraint=maxnorm(3)))
model2.add(Dropout(0.5))
model2.add(Dense(nb_classes, activation='softmax'))
# Compile model with SGD
lrate = 0.01
decay = lrate/nb_epoch
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model2.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model2.summary())

print('Augmented Data Training.')

imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images
    vertical_flip = False,  # randomly flip images
)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
imdgen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
dgen = imdgen.flow(X_train, Y_train, batch_size=batch_size)
fmon = FitMonitor(thresh=0.03, minacc=0.98)  # this is from our kerutils module (see above)
h = model2.fit_generator(
    dgen,
    samples_per_epoch = X_train.shape[0],
    nb_epoch = nb_epoch,
    validation_data = (X_test, Y_test),
    verbose = 0,
    callbacks = [fmon]
)

show_scores(model2, h, X_train, Y_train, X_test, Y_test)
print('Saving model2 to "model2.h5"')
model2.save("model2.h5")
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_3 (Convolution2D)  (None, 32, 32, 32)    896         convolution2d_input_2[0][0]      
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 32, 32, 32)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 32, 32, 32)    9248        dropout_3[0][0]                  
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 16, 16, 32)    0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 8192)          0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 512)           4194816     flatten_2[0][0]                  
____________________________________________________________________________________________________
dropout_4 (Dropout)              (None, 512)           0           dense_3[0][0]                    
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 10)            5130        dropout_4[0][0]                  
====================================================================================================
Total params: 4210090
____________________________________________________________________________________________________
None
Augmented Data Training.
Train begin: 2016-11-26 16:06:54
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 100
nb_sample = 50000
verbose = 0
..... 05% epoch=5 acc=0.594480 loss=1.139003 val_acc=0.652900 val_loss=0.968054
        max_acc=0.594480 max_val_acc=0.652900
..... 10% epoch=10 acc=0.657480 loss=0.969157 val_acc=0.709400 val_loss=0.825718
        max_acc=0.657480 max_val_acc=0.709400
..... 15% epoch=15 acc=0.689060 loss=0.878997 val_acc=0.729600 val_loss=0.773610
        max_acc=0.689060 max_val_acc=0.729600
..... 20% epoch=20 acc=0.710240 loss=0.826102 val_acc=0.741900 val_loss=0.727546
        max_acc=0.710240 max_val_acc=0.743400
..... 25% epoch=25 acc=0.730160 loss=0.777298 val_acc=0.754800 val_loss=0.701633
        max_acc=0.730160 max_val_acc=0.763300
..... 30% epoch=30 acc=0.736260 loss=0.749473 val_acc=0.768900 val_loss=0.665305
        max_acc=0.736260 max_val_acc=0.768900
..... 35% epoch=35 acc=0.745280 loss=0.729686 val_acc=0.764800 val_loss=0.667546
        max_acc=0.745400 max_val_acc=0.773000
..... 40% epoch=40 acc=0.752900 loss=0.706349 val_acc=0.780400 val_loss=0.645095
        max_acc=0.752900 max_val_acc=0.780400
..... 45% epoch=45 acc=0.758080 loss=0.689673 val_acc=0.781100 val_loss=0.638592
        max_acc=0.758080 max_val_acc=0.781200
..... 50% epoch=50 acc=0.763140 loss=0.671631 val_acc=0.781000 val_loss=0.634494
        max_acc=0.763140 max_val_acc=0.784600
..... 55% epoch=55 acc=0.769720 loss=0.662885 val_acc=0.790800 val_loss=0.614884
        max_acc=0.769720 max_val_acc=0.790800
..... 60% epoch=60 acc=0.769640 loss=0.652074 val_acc=0.790100 val_loss=0.609898
        max_acc=0.772760 max_val_acc=0.790800
..... 65% epoch=65 acc=0.774020 loss=0.640879 val_acc=0.787800 val_loss=0.617783
        max_acc=0.774280 max_val_acc=0.790800
..... 70% epoch=70 acc=0.778960 loss=0.624706 val_acc=0.793300 val_loss=0.609352
        max_acc=0.781320 max_val_acc=0.793300
..... 75% epoch=75 acc=0.783760 loss=0.614971 val_acc=0.795500 val_loss=0.608315
        max_acc=0.783760 max_val_acc=0.795500
..... 80% epoch=80 acc=0.786700 loss=0.609307 val_acc=0.794300 val_loss=0.599971
        max_acc=0.786700 max_val_acc=0.795600
..... 85% epoch=85 acc=0.788340 loss=0.602530 val_acc=0.793400 val_loss=0.603756
        max_acc=0.788340 max_val_acc=0.797000
..... 90% epoch=90 acc=0.788760 loss=0.603021 val_acc=0.797100 val_loss=0.599524
        max_acc=0.792060 max_val_acc=0.797100
..... 95% epoch=95 acc=0.792060 loss=0.593095 val_acc=0.798000 val_loss=0.597773
        max_acc=0.792060 max_val_acc=0.799600
.... 99% epoch=99 acc=0.794760 loss=0.584867
Train end: 2016-11-26 17:01:36
Total run time: 3282.28 seconds
max_acc = 0.795400  epoc = 96
max_val_acc = 0.800500  epoc = 97
Training: accuracy   = 0.884280 loss = 0.346051
Validation: accuracy = 0.798400 loss = 0.593385
Over fitting score   = 0.015105
Under fitting score  = 0.025015
Saving model2 to "model2.h5"
In [25]:
loss, accuracy = model2.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.884280  ;  loss = 0.346051
In [24]:
loss, accuracy = model2.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.798400  ;  loss = 0.593385

Indeed, training on augmented data has yielded better validation accuracy (almost 10% more than the previous model). Training accuracy has drastically dropped from 99.96% to about 88.4%, but this is not an indication of an inferior model. On the contrary, the extreme overfitting that we got with model1 was a clear indication of model inadequacy. The overfitting that we see in model2 is mild, and the model is a better fit for practical use than model1.

Still, 80% is not good enough (it would have been exceptional back in the 90's :-) and we should strive for more. Looking at the accuracy and loss graphs, it doesn't look like we are going to squeeze much more out of model2 by adding epochs or tuning other parameters, although we encourage students to try other optimizers and activation functions (Keras has plenty of them) if a fast GPU is available. Otherwise it can take a lot of time: it took us roughly an hour per 100 epochs on an NVIDIA GeForce GTX 950, so trying many more epochs and parameter combinations could take days, unless you have an NVIDIA TITAN or TESLA card.
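For example, swapping the SGD optimizer for Adam is a one-line change at compile time (a sketch; Adam ships with Keras, and lr=0.001 is its default learning rate):

from keras.optimizers import Adam
model2.compile(loss='categorical_crossentropy',
               optimizer=Adam(lr=0.001),
               metrics=['accuracy'])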

In the rest of this tutorial we will continue to experiment with medium and big models that contain more layers and more neurons.

Model 3 (with Data Augmentation)

In [17]:
nb_epoch = 120
batch_size = 32

model3 = Sequential()
model3.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu', W_constraint=maxnorm(3)))
model3.add(Dropout(0.2))
model3.add(Convolution2D(32, 3, 3, activation='relu', border_mode='same', W_constraint=maxnorm(3)))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.2))

model3.add(Convolution2D(64, 3, 3, border_mode='same'))
model3.add(Activation('relu'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.25))

model3.add(Flatten())
model3.add(Dense(512, activation='relu', W_constraint=maxnorm(3)))
model3.add(Dropout(0.5))
model3.add(Dense(nb_classes, activation='softmax'))

# Compile model with SGD (Stochastic Gradient Descent)
lrate = 0.01
decay = lrate/nb_epoch
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model3.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model3.summary())

print('Augmented Data Training.')

imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images
    vertical_flip = False,  # randomly flip images
)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
imdgen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
dgen = imdgen.flow(X_train, Y_train, batch_size=batch_size)
fmon = FitMonitor(thresh=0.03, minacc=0.99)  # this is from our kerutils module (see above)
h = model3.fit_generator(
    dgen,
    samples_per_epoch = X_train.shape[0],
    nb_epoch = nb_epoch,
    validation_data = (X_test, Y_test),
    verbose = 0,
    callbacks = [fmon]
)

show_scores(model3, h, X_train, Y_train, X_test, Y_test)
print('Saving model3 to "model3.h5"')
model3.save("model3.h5")
print('Saving history dict to pickle file: hist3.p')
with open('hist3.p', 'wb') as f:
    pickle.dump(h.history, f)
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_5 (Convolution2D)  (None, 32, 32, 32)    896         convolution2d_input_3[0][0]      
____________________________________________________________________________________________________
dropout_5 (Dropout)              (None, 32, 32, 32)    0           convolution2d_5[0][0]            
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D)  (None, 32, 32, 32)    9248        dropout_5[0][0]                  
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)    (None, 16, 16, 32)    0           convolution2d_6[0][0]            
____________________________________________________________________________________________________
dropout_6 (Dropout)              (None, 16, 16, 32)    0           maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D)  (None, 16, 16, 64)    18496       dropout_6[0][0]                  
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 16, 16, 64)    0           convolution2d_7[0][0]            
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D)    (None, 8, 8, 64)      0           activation_1[0][0]               
____________________________________________________________________________________________________
dropout_7 (Dropout)              (None, 8, 8, 64)      0           maxpooling2d_4[0][0]             
____________________________________________________________________________________________________
flatten_3 (Flatten)              (None, 4096)          0           dropout_7[0][0]                  
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 512)           2097664     flatten_3[0][0]                  
____________________________________________________________________________________________________
dropout_8 (Dropout)              (None, 512)           0           dense_5[0][0]                    
____________________________________________________________________________________________________
dense_6 (Dense)                  (None, 10)            5130        dropout_8[0][0]                  
====================================================================================================
Total params: 2131434
____________________________________________________________________________________________________
None
Augmented Data Training.
Train begin: 2016-11-26 17:26:39
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 120
nb_sample = 50000
verbose = 0
..... 05% epoch=6 acc=0.580440 loss=1.172381 val_acc=0.648000 val_loss=1.006105
        max_acc=0.580440 max_val_acc=0.648000
..... 10% epoch=12 acc=0.665980 loss=0.952033 val_acc=0.734700 val_loss=0.764830
        max_acc=0.665980 max_val_acc=0.734700
..... 15% epoch=18 acc=0.702740 loss=0.852635 val_acc=0.756900 val_loss=0.692230
        max_acc=0.702740 max_val_acc=0.757300
..... 20% epoch=24 acc=0.723760 loss=0.791566 val_acc=0.784800 val_loss=0.630691
        max_acc=0.723760 max_val_acc=0.784800
..... 25% epoch=30 acc=0.737940 loss=0.751319 val_acc=0.787400 val_loss=0.604651
        max_acc=0.738360 max_val_acc=0.790400
..... 30% epoch=36 acc=0.749240 loss=0.715604 val_acc=0.799200 val_loss=0.577126
        max_acc=0.749240 max_val_acc=0.799200
..... 35% epoch=42 acc=0.758320 loss=0.685261 val_acc=0.808400 val_loss=0.556841
        max_acc=0.758320 max_val_acc=0.808400
..... 40% epoch=48 acc=0.764200 loss=0.668627 val_acc=0.807400 val_loss=0.553829
        max_acc=0.767580 max_val_acc=0.812800
..... 45% epoch=54 acc=0.770420 loss=0.651092 val_acc=0.810600 val_loss=0.540425
        max_acc=0.770420 max_val_acc=0.813700
..... 50% epoch=60 acc=0.776600 loss=0.635974 val_acc=0.819000 val_loss=0.527376
        max_acc=0.776600 max_val_acc=0.819000
..... 55% epoch=66 acc=0.780860 loss=0.625670 val_acc=0.818600 val_loss=0.527241
        max_acc=0.782140 max_val_acc=0.822100
..... 60% epoch=72 acc=0.784900 loss=0.613893 val_acc=0.820100 val_loss=0.521272
        max_acc=0.784900 max_val_acc=0.822100
..... 65% epoch=78 acc=0.789640 loss=0.599307 val_acc=0.826000 val_loss=0.502816
        max_acc=0.789640 max_val_acc=0.826000
..... 70% epoch=84 acc=0.789080 loss=0.594389 val_acc=0.826600 val_loss=0.505549
        max_acc=0.791200 max_val_acc=0.827900
..... 75% epoch=90 acc=0.794560 loss=0.586308 val_acc=0.826400 val_loss=0.500715
        max_acc=0.796020 max_val_acc=0.830500
..... 80% epoch=96 acc=0.797140 loss=0.575448 val_acc=0.829700 val_loss=0.495649
        max_acc=0.797140 max_val_acc=0.830800
..... 85% epoch=102 acc=0.798160 loss=0.571394 val_acc=0.832800 val_loss=0.486568
        max_acc=0.799200 max_val_acc=0.833300
..... 90% epoch=108 acc=0.803020 loss=0.556634 val_acc=0.830300 val_loss=0.489828
        max_acc=0.803020 max_val_acc=0.833300
..... 95% epoch=114 acc=0.804200 loss=0.554599 val_acc=0.830200 val_loss=0.490865
        max_acc=0.805200 max_val_acc=0.835300
.... 99% epoch=119 acc=0.805920 loss=0.556924
Train end: 2016-11-26 18:36:51
Total run time: 4211.31 seconds
max_acc = 0.806640  epoc = 118
max_val_acc = 0.837100  epoc = 119
Training: accuracy   = 0.905400 loss = 0.297184
Validation: accuracy = 0.837100 loss = 0.476085
Over fitting score   = 0.036405
Under fitting score  = 0.044304
Saving model3 to "model3.h5"
In [26]:
loss, accuracy = model3.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.905400  ;  loss = 0.297184
In [27]:
loss, accuracy = model3.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.837100  ;  loss = 0.476085

Model 3 has about 2.13M parameters. Training accuracy rose to 90%, and validation accuracy to 83.7%. There's still a 7% overfitting gap, and 83.7% validation accuracy is still not high enough. Let's try harder.

Model 4 (with Data Augmentation)

In [12]:
nb_epoch = 400
batch_size = 32

model4 = Sequential()
model4.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3), border_mode='same', activation='relu'))
model4.add(Dropout(0.2))
model4.add(Convolution2D(32, 3, 3, activation='relu', border_mode='same'))
model4.add(MaxPooling2D(pool_size=(2, 2)))
model4.add(Dropout(0.2))
           
model4.add(Convolution2D(64, 3, 3, border_mode='same'))
model4.add(Activation('relu'))
model4.add(Convolution2D(64, 3, 3))
model4.add(Activation('relu'))
model4.add(MaxPooling2D(pool_size=(2, 2)))
model4.add(Dropout(0.25))

model4.add(Flatten())
model4.add(Dense(512, activation='relu', W_constraint=maxnorm(3)))
model4.add(Dropout(0.25))
model4.add(Dense(nb_classes, activation='softmax'))


# Compile model with SGD (Stochastic Gradient Descent)
lrate = 0.01
decay = lrate/nb_epoch
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=True)
model4.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model4.summary())

print('Augmented Data Training.')

imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images
    vertical_flip = False,  # randomly flip images
)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
imdgen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
dgen = imdgen.flow(X_train, Y_train, batch_size=batch_size)
fmon = FitMonitor(thresh=0.03, minacc=0.99)  # this is from our kerutils module (see above)
h = model4.fit_generator(
    dgen,
    samples_per_epoch = X_train.shape[0],
    nb_epoch = nb_epoch,
    validation_data = (X_test, Y_test),
    verbose = 0,
    callbacks = [fmon]
)

show_scores(model4, h, X_train, Y_train, X_test, Y_test)
print('Saving model4 to "model4.h5"')
model4.save("model4.h5")
print('Saving history dict to pickle file: hist4.p')
with open('hist4.p', 'wb') as f:
    pickle.dump(h.history, f)
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_5 (Convolution2D)  (None, 32, 32, 32)    896         convolution2d_input_2[0][0]      
____________________________________________________________________________________________________
dropout_6 (Dropout)              (None, 32, 32, 32)    0           convolution2d_5[0][0]            
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D)  (None, 32, 32, 32)    9248        dropout_6[0][0]                  
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)    (None, 16, 16, 32)    0           convolution2d_6[0][0]            
____________________________________________________________________________________________________
dropout_7 (Dropout)              (None, 16, 16, 32)    0           maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D)  (None, 16, 16, 64)    18496       dropout_7[0][0]                  
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 16, 16, 64)    0           convolution2d_7[0][0]            
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D)  (None, 14, 14, 64)    36928       activation_3[0][0]               
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 14, 14, 64)    0           convolution2d_8[0][0]            
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D)    (None, 7, 7, 64)      0           activation_4[0][0]               
____________________________________________________________________________________________________
dropout_8 (Dropout)              (None, 7, 7, 64)      0           maxpooling2d_4[0][0]             
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 3136)          0           dropout_8[0][0]                  
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 512)           1606144     flatten_2[0][0]                  
____________________________________________________________________________________________________
dropout_9 (Dropout)              (None, 512)           0           dense_4[0][0]                    
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 10)            5130        dropout_9[0][0]                  
====================================================================================================
Total params: 1676842
____________________________________________________________________________________________________
None
Augmented Data Training.
Train begin: 2016-11-27 14:06:46
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 400
nb_sample = 50000
verbose = 0
..... 05% epoch=20 acc=0.770460 loss=0.655337 val_acc=0.806600 val_loss=0.561261
        max_acc=0.770460 max_val_acc=0.807000
..... 10% epoch=40 acc=0.819600 loss=0.513672 val_acc=0.840100 val_loss=0.472556
        max_acc=0.819600 max_val_acc=0.840100
..... 15% epoch=60 acc=0.849180 loss=0.431731 val_acc=0.851500 val_loss=0.441469
        max_acc=0.849180 max_val_acc=0.857800
..... 20% epoch=80 acc=0.868120 loss=0.377544 val_acc=0.856200 val_loss=0.427496
        max_acc=0.868120 max_val_acc=0.861300
..... 25% epoch=100 acc=0.878660 loss=0.339156 val_acc=0.866900 val_loss=0.407274
        max_acc=0.878860 max_val_acc=0.871000
..... 30% epoch=120 acc=0.889680 loss=0.312401 val_acc=0.868700 val_loss=0.404493
        max_acc=0.889680 max_val_acc=0.872900
..... 35% epoch=140 acc=0.897060 loss=0.291688 val_acc=0.872900 val_loss=0.394843
        max_acc=0.897520 max_val_acc=0.873800
..... 40% epoch=160 acc=0.902080 loss=0.273710 val_acc=0.870600 val_loss=0.403077
        max_acc=0.904440 max_val_acc=0.877500
..... 45% epoch=180 acc=0.907460 loss=0.257895 val_acc=0.878300 val_loss=0.391379
        max_acc=0.910740 max_val_acc=0.878300
..... 50% epoch=200 acc=0.912240 loss=0.245903 val_acc=0.875800 val_loss=0.393326
        max_acc=0.914100 max_val_acc=0.878300
..... 55% epoch=220 acc=0.916540 loss=0.235293 val_acc=0.873400 val_loss=0.398259
        max_acc=0.917620 max_val_acc=0.879000
..... 60% epoch=240 acc=0.920960 loss=0.223276 val_acc=0.876100 val_loss=0.393793
        max_acc=0.920960 max_val_acc=0.881700
..... 65% epoch=260 acc=0.922860 loss=0.219113 val_acc=0.879500 val_loss=0.396560
        max_acc=0.923860 max_val_acc=0.881700
..... 70% epoch=280 acc=0.924940 loss=0.211239 val_acc=0.879700 val_loss=0.396718
        max_acc=0.927340 max_val_acc=0.881700
..... 75% epoch=300 acc=0.929300 loss=0.199700 val_acc=0.880300 val_loss=0.390777
        max_acc=0.929300 max_val_acc=0.882200
..... 80% epoch=320 acc=0.930240 loss=0.197598 val_acc=0.878100 val_loss=0.397210
        max_acc=0.931280 max_val_acc=0.882200
..... 85% epoch=340 acc=0.933120 loss=0.189463 val_acc=0.879200 val_loss=0.401805
        max_acc=0.933120 max_val_acc=0.882200
..... 90% epoch=360 acc=0.931200 loss=0.192346 val_acc=0.881700 val_loss=0.398597
        max_acc=0.934480 max_val_acc=0.883000
..... 95% epoch=380 acc=0.935320 loss=0.182817 val_acc=0.878100 val_loss=0.406267
        max_acc=0.935740 max_val_acc=0.883300
.... 99% epoch=399 acc=0.934620 loss=0.181784
Train end: 2016-11-27 18:33:36
Total run time: 16010.16 seconds
max_acc = 0.938480  epoc = 395
max_val_acc = 0.883300  epoc = 365
Training: accuracy   = 0.997720 loss = 0.023093
Validation: accuracy = 0.879700 loss = 0.408845
Over fitting score   = 0.042562
Under fitting score  = 0.036000
Saving model4 to "model4.h5"
In [13]:
loss, accuracy = model4.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.997720  ;  loss = 0.023093
In [14]:
loss, accuracy = model4.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.879700  ;  loss = 0.408845

Looks like we did it again: validation accuracy has risen to 88%, but training accuracy is too high (99.77%), which is an indication of overfitting, although not on the same scale as in model1 (a 30% gap).

We will take advantage of this situation to demonstrate how you can continue to train a model from its last training point (it took about 4.5 hours to reach the last model state).

To be safe, we'll just copy model4 to model5 by loading the saved file of model4 into a new model. Then we will train model5 exactly as above. We will give it an extra 100 epochs and see where that gets us.

Model 5

Let's first copy model4 to model5:

In [40]:
model5 = load_model("model4.h5")

As a sanity check, let's compute the training accuracy of model5 and verify that we get the same scores as for model4:

In [41]:
loss, accuracy = model5.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.997720  ;  loss = 0.023093
In [42]:
loss, accuracy = model5.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.879700  ;  loss = 0.408845

Looks good. Let's try to train model5 with additional samples generated by the Keras ImageDataGenerator. This time we'll add a small rotation angle of 7 degrees and reduce the shift ranges to 0.05. We will also use a batch size of 24 samples.

In [35]:
lrate = 0.01
decay = 1e-6
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model5.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
In [36]:
loss, accuracy = model5.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.879700  ;  loss = 0.408845
In [37]:
nb_epoch = 100
batch_size = 24

print('Post Training')

imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 7,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.05,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.05,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images
    vertical_flip = False,  # randomly flip images
)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
imdgen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
dgen = imdgen.flow(X_train, Y_train, batch_size=batch_size)
fmon = FitMonitor(thresh=0.03, minacc=0.99)  # this is from our kerutils module (see above)
h = model5.fit_generator(
    dgen,
    samples_per_epoch = X_train.shape[0],
    nb_epoch = nb_epoch,
    validation_data = (X_test, Y_test),
    verbose = 0,
    callbacks = [fmon]
)

show_scores(model5, h, X_train, Y_train, X_test, Y_test)
print('Saving model5 to "model5.h5"')
Post Training
Train begin: 2016-11-27 23:26:26
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 100
nb_sample = 50000
verbose = 0
..... 05% epoch=5 acc=0.718560 loss=0.828089 val_acc=0.747900 val_loss=0.761909
        max_acc=0.729560 max_val_acc=0.785100
..... 10% epoch=10 acc=0.715260 loss=0.836823 val_acc=0.776200 val_loss=0.673832
        max_acc=0.729560 max_val_acc=0.785100
..... 15% epoch=15 acc=0.717440 loss=0.836386 val_acc=0.759900 val_loss=0.713908
        max_acc=0.729560 max_val_acc=0.785100
..... 20% epoch=20 acc=0.722380 loss=0.826159 val_acc=0.758900 val_loss=0.712595
        max_acc=0.729560 max_val_acc=0.785100
.Train end: 2016-11-27 23:42:02
Total run time: 935.89 seconds
max_acc = 0.729560  epoc = 2
max_val_acc = 0.785100  epoc = 4
Training: accuracy   = 0.814780 loss = 0.548113
Validation: accuracy = 0.762200 loss = 0.720553
Over fitting score   = 0.038969
Under fitting score  = 0.038331
Saving model5 to "model5.h5"
In [38]:
loss, accuracy = model5.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.814780  ;  loss = 0.548113
In [39]:
loss, accuracy = model5.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.762200  ;  loss = 0.720553

We lost our patience and stopped the training after about 20 epochs. The graph suggests that if we waited long enough we might cross the 90% barrier. Let's instead try another type of activation function: SReLU (S-shaped Rectified Linear Unit).

Model 6

SReLU is an intriguing activation function as it has four learnable parameters that are tuned during the training process. Here is the main paper on this function: https://arxiv.org/pdf/1512.07030.pdf
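As a rough illustration, SReLU is a piecewise-linear function with learnable parameters t_l, a_l, t_r, a_r: it is the identity between the two thresholds and linear with slopes a_l and a_r outside them. A minimal NumPy sketch with scalar parameters for readability (the Keras layer learns them per unit):

import numpy as np

def srelu(x, t_l, a_l, t_r, a_r):
    # identity for t_l < x < t_r; slope a_r above t_r; slope a_l below t_l
    return np.where(x >= t_r, t_r + a_r * (x - t_r),
           np.where(x <= t_l, t_l + a_l * (x - t_l), x))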

In [8]:
from keras.layers.noise import GaussianNoise
from keras.layers.advanced_activations import SReLU

batch_size = 64
nb_epoch = 400

nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

model6 = Sequential()

# noise input
percent_noise = 0.1
noise = (1.0/255) * percent_noise   # Gaussian noise stddev: 10% of one gray level (inputs are scaled to [0,1])
model6.add(GaussianNoise(noise, input_shape=(32,32,3)))

model6.add(Convolution2D(2*nb_filters, nb_conv, nb_conv))
model6.add(SReLU())

model6.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model6.add(SReLU())
model6.add(Convolution2D(2*nb_filters, nb_conv, nb_conv))
model6.add(SReLU())

model6.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model6.add(SReLU())

model6.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model6.add(Dropout(0.25))

model6.add(Flatten())

model6.add(Dense(512))
model6.add(SReLU())
model6.add(Dropout(0.25))

model6.add(Dense(512))
model6.add(SReLU())
model6.add(Dropout(0.25))

model6.add(Dense(nb_classes))
model6.add(Activation('softmax'))

# Compile model with SGD (Stochastic Gradient Descent)
lrate = 0.01
decay = lrate/nb_epoch
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model6.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
#model6.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
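# Note on the decay setting (an assumption based on the standard Keras 1.x
# SGD rule): the effective learning rate shrinks per gradient update as
#     lr_t = lr / (1 + decay * t)
# With decay = lrate/nb_epoch = 2.5e-5 and ~50000/64 = 782 updates per epoch,
# the rate falls to roughly 0.01 / (1 + 2.5e-5 * 200 * 782) ~ 0.002 by
# epoch 200 -- a gentle schedule over the 400-epoch run.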

imdgen = ImageDataGenerator(
    featurewise_center = False,  # set input mean to 0 over the dataset
    samplewise_center = False,  # set each sample mean to 0
    featurewise_std_normalization = False,  # divide inputs by std of the dataset
    samplewise_std_normalization = False,  # divide each input by its std
    zca_whitening = False,  # apply ZCA whitening
    rotation_range = 4,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip = True,  # randomly flip images horizontally
    vertical_flip = False,  # do not flip images vertically
)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
imdgen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
dgen = imdgen.flow(X_train, Y_train, batch_size=batch_size)
fmon = FitMonitor(thresh=0.03, minacc=0.99)  # this is from our kerutils module (see above)
h = model6.fit_generator(
    dgen,
    samples_per_epoch = X_train.shape[0],
    nb_epoch = nb_epoch,
    validation_data = (X_test, Y_test),
    verbose = 1,
    callbacks = [fmon]
)

show_scores(model6, h, X_train, Y_train, X_test, Y_test)
print('Saving model6 to "model6.h5"')
model6.save("model6.h5")
plot(model6, to_file='model6.png', show_layer_names=False, show_shapes=True)
print('Saving history dict to pickle file: hist6.p')
with open('hist6.p', 'wb') as f:
    pickle.dump(h.history, f)
Train begin: 2016-11-29 13:37:02
Stop file: stop_training_file.keras (create this file to stop training gracefully)
Pause file: pause_training_file.keras (create this file to pause training and view graphs)
do_validation = True
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
nb_epoch = 400
nb_sample = 50000
verbose = 1
Epoch 1/400
50000/50000 [==============================] - 84s - loss: 2.1470 - acc: 0.1866 - val_loss: 1.9589 - val_acc: 0.2875
Epoch 2/400
50000/50000 [==============================] - 85s - loss: 1.8957 - acc: 0.3005 - val_loss: 1.7025 - val_acc: 0.3852
Epoch 3/400
50000/50000 [==============================] - 85s - loss: 1.6713 - acc: 0.3827 - val_loss: 1.4963 - val_acc: 0.4586
Epoch 4/400
50000/50000 [==============================] - 85s - loss: 1.5140 - acc: 0.4431 - val_loss: 1.3191 - val_acc: 0.5211
Epoch 5/400
50000/50000 [==============================] - 85s - loss: 1.4108 - acc: 0.4886 - val_loss: 1.2582 - val_acc: 0.5473
Epoch 6/400
50000/50000 [==============================] - 85s - loss: 1.3383 - acc: 0.5156 - val_loss: 1.1617 - val_acc: 0.5800
Epoch 7/400
50000/50000 [==============================] - 85s - loss: 1.2665 - acc: 0.5455 - val_loss: 1.1381 - val_acc: 0.5887
Epoch 8/400
50000/50000 [==============================] - 85s - loss: 1.2026 - acc: 0.5679 - val_loss: 1.0291 - val_acc: 0.6340
Epoch 9/400
50000/50000 [==============================] - 85s - loss: 1.1433 - acc: 0.5913 - val_loss: 1.0085 - val_acc: 0.6396
Epoch 10/400
50000/50000 [==============================] - 85s - loss: 1.0944 - acc: 0.6086 - val_loss: 0.9696 - val_acc: 0.6527
Epoch 11/400
50000/50000 [==============================] - 85s - loss: 1.0374 - acc: 0.6298 - val_loss: 0.8882 - val_acc: 0.6854
Epoch 12/400
50000/50000 [==============================] - 85s - loss: 0.9989 - acc: 0.6434 - val_loss: 0.8554 - val_acc: 0.6961
Epoch 13/400
50000/50000 [==============================] - 85s - loss: 0.9554 - acc: 0.6606 - val_loss: 0.8345 - val_acc: 0.7043
Epoch 14/400
50000/50000 [==============================] - 85s - loss: 0.9238 - acc: 0.6722 - val_loss: 0.7806 - val_acc: 0.7237
Epoch 15/400
50000/50000 [==============================] - 85s - loss: 0.8940 - acc: 0.6848 - val_loss: 0.7889 - val_acc: 0.7237
Epoch 16/400
50000/50000 [==============================] - 85s - loss: 0.8684 - acc: 0.6925 - val_loss: 0.7406 - val_acc: 0.7433
Epoch 17/400
50000/50000 [==============================] - 85s - loss: 0.8376 - acc: 0.7022 - val_loss: 0.7343 - val_acc: 0.7447
Epoch 18/400
50000/50000 [==============================] - 85s - loss: 0.8228 - acc: 0.7075 - val_loss: 0.6956 - val_acc: 0.7581
Epoch 19/400
50000/50000 [==============================] - 85s - loss: 0.8018 - acc: 0.7151 - val_loss: 0.6861 - val_acc: 0.7594
Epoch 20/400
50000/50000 [==============================] - 85s - loss: 0.7792 - acc: 0.7255 - val_loss: 0.6581 - val_acc: 0.7687
Epoch 21/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.7668 - acc: 0.7289. 05% epoch=20 acc=0.728920 loss=0.766802 val_acc=0.763300 val_loss=0.668365
        max_acc=0.728920 max_val_acc=0.768700
50000/50000 [==============================] - 85s - loss: 0.7668 - acc: 0.7289 - val_loss: 0.6684 - val_acc: 0.7633
Epoch 22/400
50000/50000 [==============================] - 85s - loss: 0.7441 - acc: 0.7369 - val_loss: 0.6543 - val_acc: 0.7704
Epoch 23/400
50000/50000 [==============================] - 85s - loss: 0.7344 - acc: 0.7386 - val_loss: 0.6502 - val_acc: 0.7724
Epoch 24/400
50000/50000 [==============================] - 85s - loss: 0.7203 - acc: 0.7462 - val_loss: 0.6313 - val_acc: 0.7726
Epoch 25/400
50000/50000 [==============================] - 85s - loss: 0.7070 - acc: 0.7510 - val_loss: 0.5965 - val_acc: 0.7873
Epoch 26/400
50000/50000 [==============================] - 85s - loss: 0.6933 - acc: 0.7534 - val_loss: 0.6092 - val_acc: 0.7863
Epoch 27/400
50000/50000 [==============================] - 85s - loss: 0.6875 - acc: 0.7567 - val_loss: 0.6049 - val_acc: 0.7851
Epoch 28/400
50000/50000 [==============================] - 85s - loss: 0.6749 - acc: 0.7599 - val_loss: 0.6245 - val_acc: 0.7819
Epoch 29/400
50000/50000 [==============================] - 85s - loss: 0.6531 - acc: 0.7715 - val_loss: 0.5908 - val_acc: 0.7934
Epoch 30/400
50000/50000 [==============================] - 85s - loss: 0.6530 - acc: 0.7691 - val_loss: 0.5703 - val_acc: 0.7978
Epoch 31/400
50000/50000 [==============================] - 85s - loss: 0.6441 - acc: 0.7710 - val_loss: 0.5635 - val_acc: 0.8024
Epoch 32/400
50000/50000 [==============================] - 85s - loss: 0.6298 - acc: 0.7767 - val_loss: 0.5482 - val_acc: 0.8068
Epoch 33/400
50000/50000 [==============================] - 85s - loss: 0.6200 - acc: 0.7807 - val_loss: 0.5506 - val_acc: 0.8045
Epoch 34/400
50000/50000 [==============================] - 85s - loss: 0.6183 - acc: 0.7817 - val_loss: 0.5386 - val_acc: 0.8121
Epoch 35/400
50000/50000 [==============================] - 85s - loss: 0.6040 - acc: 0.7866 - val_loss: 0.5578 - val_acc: 0.8044
Epoch 36/400
50000/50000 [==============================] - 85s - loss: 0.6002 - acc: 0.7874 - val_loss: 0.5481 - val_acc: 0.8082
Epoch 37/400
50000/50000 [==============================] - 85s - loss: 0.5928 - acc: 0.7886 - val_loss: 0.5445 - val_acc: 0.8085
Epoch 38/400
50000/50000 [==============================] - 85s - loss: 0.5831 - acc: 0.7943 - val_loss: 0.5266 - val_acc: 0.8169
Epoch 39/400
50000/50000 [==============================] - 85s - loss: 0.5743 - acc: 0.7944 - val_loss: 0.5358 - val_acc: 0.8095
Epoch 40/400
50000/50000 [==============================] - 85s - loss: 0.5691 - acc: 0.7986 - val_loss: 0.5268 - val_acc: 0.8131
Epoch 41/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.5616 - acc: 0.7995. 10% epoch=40 acc=0.799480 loss=0.561769 val_acc=0.818400 val_loss=0.517203
        max_acc=0.799480 max_val_acc=0.818400
50000/50000 [==============================] - 85s - loss: 0.5618 - acc: 0.7995 - val_loss: 0.5172 - val_acc: 0.8184
Epoch 42/400
50000/50000 [==============================] - 85s - loss: 0.5567 - acc: 0.8036 - val_loss: 0.5143 - val_acc: 0.8203
Epoch 43/400
50000/50000 [==============================] - 85s - loss: 0.5525 - acc: 0.8025 - val_loss: 0.5107 - val_acc: 0.8213
Epoch 44/400
50000/50000 [==============================] - 85s - loss: 0.5474 - acc: 0.8068 - val_loss: 0.5223 - val_acc: 0.8192
Epoch 45/400
50000/50000 [==============================] - 85s - loss: 0.5390 - acc: 0.8106 - val_loss: 0.5170 - val_acc: 0.8205
Epoch 46/400
50000/50000 [==============================] - 85s - loss: 0.5330 - acc: 0.8122 - val_loss: 0.5102 - val_acc: 0.8224
Epoch 47/400
50000/50000 [==============================] - 85s - loss: 0.5276 - acc: 0.8139 - val_loss: 0.5146 - val_acc: 0.8229
Epoch 48/400
50000/50000 [==============================] - 85s - loss: 0.5234 - acc: 0.8150 - val_loss: 0.5241 - val_acc: 0.8179
Epoch 49/400
50000/50000 [==============================] - 85s - loss: 0.5167 - acc: 0.8168 - val_loss: 0.4943 - val_acc: 0.8272
Epoch 50/400
50000/50000 [==============================] - 85s - loss: 0.5130 - acc: 0.8167 - val_loss: 0.4845 - val_acc: 0.8310
Epoch 51/400
50000/50000 [==============================] - 85s - loss: 0.5091 - acc: 0.8185 - val_loss: 0.4972 - val_acc: 0.8304
Epoch 52/400
50000/50000 [==============================] - 85s - loss: 0.5011 - acc: 0.8236 - val_loss: 0.5138 - val_acc: 0.8225
Epoch 53/400
50000/50000 [==============================] - 85s - loss: 0.4998 - acc: 0.8238 - val_loss: 0.4883 - val_acc: 0.8303
Epoch 54/400
50000/50000 [==============================] - 85s - loss: 0.4965 - acc: 0.8235 - val_loss: 0.4864 - val_acc: 0.8301
Epoch 55/400
50000/50000 [==============================] - 85s - loss: 0.4872 - acc: 0.8264 - val_loss: 0.4875 - val_acc: 0.8283
Epoch 56/400
50000/50000 [==============================] - 85s - loss: 0.4851 - acc: 0.8278 - val_loss: 0.5024 - val_acc: 0.8269
Epoch 57/400
50000/50000 [==============================] - 85s - loss: 0.4817 - acc: 0.8293 - val_loss: 0.4888 - val_acc: 0.8297
Epoch 58/400
50000/50000 [==============================] - 85s - loss: 0.4760 - acc: 0.8321 - val_loss: 0.4810 - val_acc: 0.8335
Epoch 59/400
50000/50000 [==============================] - 85s - loss: 0.4695 - acc: 0.8325 - val_loss: 0.4858 - val_acc: 0.8375
Epoch 60/400
50000/50000 [==============================] - 84s - loss: 0.4745 - acc: 0.8331 - val_loss: 0.4668 - val_acc: 0.8415
Epoch 61/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.4583 - acc: 0.8374. 15% epoch=60 acc=0.837380 loss=0.458295 val_acc=0.835000 val_loss=0.476575
        max_acc=0.837380 max_val_acc=0.841500
50000/50000 [==============================] - 84s - loss: 0.4583 - acc: 0.8374 - val_loss: 0.4766 - val_acc: 0.8350
Epoch 62/400
50000/50000 [==============================] - 84s - loss: 0.4586 - acc: 0.8353 - val_loss: 0.4696 - val_acc: 0.8355
Epoch 63/400
50000/50000 [==============================] - 84s - loss: 0.4599 - acc: 0.8368 - val_loss: 0.4643 - val_acc: 0.8415
Epoch 64/400
50000/50000 [==============================] - 84s - loss: 0.4547 - acc: 0.8382 - val_loss: 0.4662 - val_acc: 0.8389
Epoch 65/400
50000/50000 [==============================] - 84s - loss: 0.4470 - acc: 0.8417 - val_loss: 0.4825 - val_acc: 0.8369
Epoch 66/400
50000/50000 [==============================] - 84s - loss: 0.4480 - acc: 0.8418 - val_loss: 0.4642 - val_acc: 0.8378
Epoch 67/400
50000/50000 [==============================] - 85s - loss: 0.4455 - acc: 0.8421 - val_loss: 0.4698 - val_acc: 0.8367
Epoch 68/400
50000/50000 [==============================] - 85s - loss: 0.4405 - acc: 0.8448 - val_loss: 0.4567 - val_acc: 0.8420
Epoch 69/400
50000/50000 [==============================] - 85s - loss: 0.4368 - acc: 0.8460 - val_loss: 0.4807 - val_acc: 0.8348
Epoch 70/400
50000/50000 [==============================] - 85s - loss: 0.4338 - acc: 0.8464 - val_loss: 0.4570 - val_acc: 0.8426
Epoch 71/400
50000/50000 [==============================] - 85s - loss: 0.4287 - acc: 0.8469 - val_loss: 0.4679 - val_acc: 0.8359
Epoch 72/400
50000/50000 [==============================] - 85s - loss: 0.4253 - acc: 0.8478 - val_loss: 0.4663 - val_acc: 0.8393
Epoch 73/400
50000/50000 [==============================] - 85s - loss: 0.4239 - acc: 0.8503 - val_loss: 0.4691 - val_acc: 0.8363
Epoch 74/400
50000/50000 [==============================] - 85s - loss: 0.4264 - acc: 0.8475 - val_loss: 0.4552 - val_acc: 0.8449
Epoch 75/400
50000/50000 [==============================] - 85s - loss: 0.4183 - acc: 0.8522 - val_loss: 0.4503 - val_acc: 0.8475
Epoch 76/400
50000/50000 [==============================] - 85s - loss: 0.4129 - acc: 0.8536 - val_loss: 0.4740 - val_acc: 0.8403
Epoch 77/400
50000/50000 [==============================] - 85s - loss: 0.4099 - acc: 0.8563 - val_loss: 0.4521 - val_acc: 0.8453
Epoch 78/400
50000/50000 [==============================] - 85s - loss: 0.4074 - acc: 0.8541 - val_loss: 0.4658 - val_acc: 0.8439
Epoch 79/400
50000/50000 [==============================] - 85s - loss: 0.4096 - acc: 0.8555 - val_loss: 0.4719 - val_acc: 0.8415
Epoch 80/400
50000/50000 [==============================] - 85s - loss: 0.4069 - acc: 0.8558 - val_loss: 0.4607 - val_acc: 0.8410
Epoch 81/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.4046 - acc: 0.8559. 20% epoch=80 acc=0.855920 loss=0.404536 val_acc=0.847400 val_loss=0.450646
        max_acc=0.856260 max_val_acc=0.847500
50000/50000 [==============================] - 85s - loss: 0.4045 - acc: 0.8559 - val_loss: 0.4506 - val_acc: 0.8474
Epoch 82/400
50000/50000 [==============================] - 85s - loss: 0.4006 - acc: 0.8572 - val_loss: 0.4721 - val_acc: 0.8405
Epoch 83/400
50000/50000 [==============================] - 85s - loss: 0.3990 - acc: 0.8586 - val_loss: 0.4524 - val_acc: 0.8470
Epoch 84/400
50000/50000 [==============================] - 85s - loss: 0.3904 - acc: 0.8596 - val_loss: 0.4479 - val_acc: 0.8460
Epoch 85/400
50000/50000 [==============================] - 85s - loss: 0.3938 - acc: 0.8596 - val_loss: 0.4591 - val_acc: 0.8452
Epoch 86/400
50000/50000 [==============================] - 85s - loss: 0.3896 - acc: 0.8621 - val_loss: 0.4467 - val_acc: 0.8475
Epoch 87/400
50000/50000 [==============================] - 85s - loss: 0.3869 - acc: 0.8626 - val_loss: 0.4523 - val_acc: 0.8474
Epoch 88/400
50000/50000 [==============================] - 85s - loss: 0.3883 - acc: 0.8616 - val_loss: 0.4385 - val_acc: 0.8493
Epoch 89/400
50000/50000 [==============================] - 85s - loss: 0.3789 - acc: 0.8657 - val_loss: 0.4402 - val_acc: 0.8504
Epoch 90/400
50000/50000 [==============================] - 85s - loss: 0.3813 - acc: 0.8665 - val_loss: 0.4388 - val_acc: 0.8487
Epoch 91/400
50000/50000 [==============================] - 85s - loss: 0.3752 - acc: 0.8660 - val_loss: 0.4404 - val_acc: 0.8494
Epoch 92/400
50000/50000 [==============================] - 85s - loss: 0.3727 - acc: 0.8662 - val_loss: 0.4457 - val_acc: 0.8497
Epoch 93/400
50000/50000 [==============================] - 85s - loss: 0.3750 - acc: 0.8668 - val_loss: 0.4538 - val_acc: 0.8470
Epoch 94/400
50000/50000 [==============================] - 85s - loss: 0.3736 - acc: 0.8678 - val_loss: 0.4446 - val_acc: 0.8480
Epoch 95/400
50000/50000 [==============================] - 85s - loss: 0.3679 - acc: 0.8689 - val_loss: 0.4357 - val_acc: 0.8528
Epoch 96/400
50000/50000 [==============================] - 85s - loss: 0.3656 - acc: 0.8704 - val_loss: 0.4451 - val_acc: 0.8515
Epoch 97/400
50000/50000 [==============================] - 85s - loss: 0.3636 - acc: 0.8707 - val_loss: 0.4348 - val_acc: 0.8530
Epoch 98/400
50000/50000 [==============================] - 85s - loss: 0.3645 - acc: 0.8705 - val_loss: 0.4425 - val_acc: 0.8490
Epoch 99/400
50000/50000 [==============================] - 85s - loss: 0.3624 - acc: 0.8713 - val_loss: 0.4394 - val_acc: 0.8527
Epoch 100/400
50000/50000 [==============================] - 85s - loss: 0.3568 - acc: 0.8722 - val_loss: 0.4442 - val_acc: 0.8503
Epoch 101/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.3556 - acc: 0.8739. 25% epoch=100 acc=0.873780 loss=0.355939 val_acc=0.852300 val_loss=0.436661
        max_acc=0.873780 max_val_acc=0.853000
50000/50000 [==============================] - 85s - loss: 0.3559 - acc: 0.8738 - val_loss: 0.4367 - val_acc: 0.8523
Epoch 102/400
50000/50000 [==============================] - 85s - loss: 0.3556 - acc: 0.8733 - val_loss: 0.4336 - val_acc: 0.8522
Epoch 103/400
50000/50000 [==============================] - 86s - loss: 0.3507 - acc: 0.8761 - val_loss: 0.4424 - val_acc: 0.8489
Epoch 104/400
50000/50000 [==============================] - 86s - loss: 0.3538 - acc: 0.8756 - val_loss: 0.4317 - val_acc: 0.8535
Epoch 105/400
50000/50000 [==============================] - 86s - loss: 0.3457 - acc: 0.8763 - val_loss: 0.4444 - val_acc: 0.8536
Epoch 106/400
50000/50000 [==============================] - 86s - loss: 0.3476 - acc: 0.8761 - val_loss: 0.4352 - val_acc: 0.8518
Epoch 107/400
50000/50000 [==============================] - 85s - loss: 0.3502 - acc: 0.8741 - val_loss: 0.4363 - val_acc: 0.8519
Epoch 108/400
50000/50000 [==============================] - 86s - loss: 0.3499 - acc: 0.8748 - val_loss: 0.4350 - val_acc: 0.8523
Epoch 109/400
50000/50000 [==============================] - 87s - loss: 0.3422 - acc: 0.8779 - val_loss: 0.4392 - val_acc: 0.8513
Epoch 110/400
50000/50000 [==============================] - 86s - loss: 0.3368 - acc: 0.8800 - val_loss: 0.4426 - val_acc: 0.8514
Epoch 111/400
50000/50000 [==============================] - 87s - loss: 0.3398 - acc: 0.8792 - val_loss: 0.4358 - val_acc: 0.8509
Epoch 112/400
50000/50000 [==============================] - 85s - loss: 0.3353 - acc: 0.8804 - val_loss: 0.4475 - val_acc: 0.8521
Epoch 113/400
50000/50000 [==============================] - 85s - loss: 0.3343 - acc: 0.8809 - val_loss: 0.4388 - val_acc: 0.8502
Epoch 114/400
50000/50000 [==============================] - 85s - loss: 0.3359 - acc: 0.8800 - val_loss: 0.4355 - val_acc: 0.8531
Epoch 115/400
50000/50000 [==============================] - 85s - loss: 0.3299 - acc: 0.8826 - val_loss: 0.4225 - val_acc: 0.8544
Epoch 116/400
50000/50000 [==============================] - 85s - loss: 0.3279 - acc: 0.8839 - val_loss: 0.4295 - val_acc: 0.8567
Epoch 117/400
50000/50000 [==============================] - 85s - loss: 0.3303 - acc: 0.8827 - val_loss: 0.4341 - val_acc: 0.8560
Epoch 118/400
50000/50000 [==============================] - 85s - loss: 0.3277 - acc: 0.8837 - val_loss: 0.4476 - val_acc: 0.8502
Epoch 119/400
50000/50000 [==============================] - 85s - loss: 0.3284 - acc: 0.8832 - val_loss: 0.4404 - val_acc: 0.8534
Epoch 120/400
50000/50000 [==============================] - 85s - loss: 0.3245 - acc: 0.8840 - val_loss: 0.4266 - val_acc: 0.8564
Epoch 121/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.3248 - acc: 0.8845. 30% epoch=120 acc=0.884500 loss=0.324790 val_acc=0.856300 val_loss=0.423249
        max_acc=0.884500 max_val_acc=0.856700
50000/50000 [==============================] - 85s - loss: 0.3248 - acc: 0.8845 - val_loss: 0.4232 - val_acc: 0.8563
Epoch 122/400
50000/50000 [==============================] - 85s - loss: 0.3208 - acc: 0.8841 - val_loss: 0.4360 - val_acc: 0.8533
Epoch 123/400
50000/50000 [==============================] - 85s - loss: 0.3161 - acc: 0.8885 - val_loss: 0.4327 - val_acc: 0.8572
Epoch 124/400
50000/50000 [==============================] - 85s - loss: 0.3202 - acc: 0.8846 - val_loss: 0.4240 - val_acc: 0.8586
Epoch 125/400
50000/50000 [==============================] - 85s - loss: 0.3115 - acc: 0.8884 - val_loss: 0.4430 - val_acc: 0.8542
Epoch 126/400
50000/50000 [==============================] - 85s - loss: 0.3162 - acc: 0.8876 - val_loss: 0.4441 - val_acc: 0.8544
Epoch 127/400
50000/50000 [==============================] - 85s - loss: 0.3188 - acc: 0.8857 - val_loss: 0.4289 - val_acc: 0.8551
Epoch 128/400
50000/50000 [==============================] - 85s - loss: 0.3119 - acc: 0.8878 - val_loss: 0.4250 - val_acc: 0.8599
Epoch 129/400
50000/50000 [==============================] - 86s - loss: 0.3080 - acc: 0.8889 - val_loss: 0.4275 - val_acc: 0.8607
Epoch 130/400
50000/50000 [==============================] - 87s - loss: 0.3060 - acc: 0.8907 - val_loss: 0.4326 - val_acc: 0.8570
Epoch 131/400
50000/50000 [==============================] - 86s - loss: 0.3091 - acc: 0.8899 - val_loss: 0.4263 - val_acc: 0.8600
Epoch 132/400
50000/50000 [==============================] - 86s - loss: 0.3073 - acc: 0.8901 - val_loss: 0.4356 - val_acc: 0.8580
Epoch 133/400
50000/50000 [==============================] - 86s - loss: 0.3025 - acc: 0.8916 - val_loss: 0.4276 - val_acc: 0.8578
Epoch 134/400
50000/50000 [==============================] - 86s - loss: 0.3035 - acc: 0.8910 - val_loss: 0.4250 - val_acc: 0.8581
Epoch 135/400
50000/50000 [==============================] - 86s - loss: 0.3030 - acc: 0.8915 - val_loss: 0.4303 - val_acc: 0.8580
Epoch 136/400
50000/50000 [==============================] - 86s - loss: 0.3021 - acc: 0.8928 - val_loss: 0.4421 - val_acc: 0.8574
Epoch 137/400
50000/50000 [==============================] - 86s - loss: 0.3027 - acc: 0.8932 - val_loss: 0.4324 - val_acc: 0.8576
Epoch 138/400
50000/50000 [==============================] - 86s - loss: 0.3004 - acc: 0.8931 - val_loss: 0.4344 - val_acc: 0.8574
Epoch 139/400
50000/50000 [==============================] - 86s - loss: 0.2980 - acc: 0.8934 - val_loss: 0.4371 - val_acc: 0.8574
Epoch 140/400
50000/50000 [==============================] - 86s - loss: 0.2976 - acc: 0.8932 - val_loss: 0.4290 - val_acc: 0.8589
Epoch 141/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2986 - acc: 0.8926. 35% epoch=140 acc=0.892680 loss=0.298518 val_acc=0.857600 val_loss=0.439679
        max_acc=0.893360 max_val_acc=0.860700
50000/50000 [==============================] - 86s - loss: 0.2985 - acc: 0.8927 - val_loss: 0.4397 - val_acc: 0.8576
Epoch 142/400
50000/50000 [==============================] - 86s - loss: 0.2911 - acc: 0.8960 - val_loss: 0.4361 - val_acc: 0.8581
Epoch 143/400
50000/50000 [==============================] - 86s - loss: 0.2901 - acc: 0.8967 - val_loss: 0.4273 - val_acc: 0.8620
Epoch 144/400
50000/50000 [==============================] - 86s - loss: 0.2934 - acc: 0.8951 - val_loss: 0.4227 - val_acc: 0.8602
Epoch 145/400
50000/50000 [==============================] - 86s - loss: 0.2964 - acc: 0.8941 - val_loss: 0.4279 - val_acc: 0.8593
Epoch 146/400
50000/50000 [==============================] - 86s - loss: 0.2878 - acc: 0.8961 - val_loss: 0.4400 - val_acc: 0.8535
Epoch 147/400
50000/50000 [==============================] - 86s - loss: 0.2914 - acc: 0.8959 - val_loss: 0.4270 - val_acc: 0.8609
Epoch 148/400
50000/50000 [==============================] - 86s - loss: 0.2925 - acc: 0.8944 - val_loss: 0.4251 - val_acc: 0.8618
Epoch 149/400
50000/50000 [==============================] - 86s - loss: 0.2846 - acc: 0.8973 - val_loss: 0.4196 - val_acc: 0.8604
Epoch 150/400
50000/50000 [==============================] - 86s - loss: 0.2814 - acc: 0.9012 - val_loss: 0.4307 - val_acc: 0.8586
Epoch 151/400
50000/50000 [==============================] - 86s - loss: 0.2861 - acc: 0.8981 - val_loss: 0.4224 - val_acc: 0.8612
Epoch 152/400
50000/50000 [==============================] - 86s - loss: 0.2850 - acc: 0.8991 - val_loss: 0.4190 - val_acc: 0.8629
Epoch 153/400
50000/50000 [==============================] - 86s - loss: 0.2841 - acc: 0.8988 - val_loss: 0.4238 - val_acc: 0.8598
Epoch 154/400
50000/50000 [==============================] - 86s - loss: 0.2825 - acc: 0.8986 - val_loss: 0.4263 - val_acc: 0.8604
Epoch 155/400
50000/50000 [==============================] - 86s - loss: 0.2813 - acc: 0.8993 - val_loss: 0.4177 - val_acc: 0.8643
Epoch 156/400
50000/50000 [==============================] - 86s - loss: 0.2744 - acc: 0.9020 - val_loss: 0.4375 - val_acc: 0.8585
Epoch 157/400
50000/50000 [==============================] - 86s - loss: 0.2791 - acc: 0.9000 - val_loss: 0.4256 - val_acc: 0.8610
Epoch 158/400
50000/50000 [==============================] - 86s - loss: 0.2772 - acc: 0.9003 - val_loss: 0.4272 - val_acc: 0.8634
Epoch 159/400
50000/50000 [==============================] - 87s - loss: 0.2787 - acc: 0.9004 - val_loss: 0.4254 - val_acc: 0.8599
Epoch 160/400
50000/50000 [==============================] - 86s - loss: 0.2758 - acc: 0.9024 - val_loss: 0.4205 - val_acc: 0.8619
Epoch 161/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2755 - acc: 0.9005. 40% epoch=160 acc=0.900480 loss=0.275488 val_acc=0.859400 val_loss=0.432495
        max_acc=0.902380 max_val_acc=0.864300
50000/50000 [==============================] - 86s - loss: 0.2755 - acc: 0.9005 - val_loss: 0.4325 - val_acc: 0.8594
Epoch 162/400
50000/50000 [==============================] - 86s - loss: 0.2760 - acc: 0.9015 - val_loss: 0.4250 - val_acc: 0.8612
Epoch 163/400
50000/50000 [==============================] - 86s - loss: 0.2717 - acc: 0.9044 - val_loss: 0.4301 - val_acc: 0.8609
Epoch 164/400
50000/50000 [==============================] - 86s - loss: 0.2728 - acc: 0.9029 - val_loss: 0.4324 - val_acc: 0.8608
Epoch 165/400
50000/50000 [==============================] - 86s - loss: 0.2697 - acc: 0.9037 - val_loss: 0.4388 - val_acc: 0.8572
Epoch 166/400
50000/50000 [==============================] - 86s - loss: 0.2712 - acc: 0.9038 - val_loss: 0.4319 - val_acc: 0.8588
Epoch 167/400
50000/50000 [==============================] - 86s - loss: 0.2693 - acc: 0.9035 - val_loss: 0.4209 - val_acc: 0.8664
Epoch 168/400
50000/50000 [==============================] - 86s - loss: 0.2712 - acc: 0.9033 - val_loss: 0.4274 - val_acc: 0.8634
Epoch 169/400
50000/50000 [==============================] - 86s - loss: 0.2645 - acc: 0.9063 - val_loss: 0.4298 - val_acc: 0.8635
Epoch 170/400
50000/50000 [==============================] - 86s - loss: 0.2657 - acc: 0.9052 - val_loss: 0.4167 - val_acc: 0.8669
Epoch 171/400
50000/50000 [==============================] - 86s - loss: 0.2657 - acc: 0.9056 - val_loss: 0.4187 - val_acc: 0.8660
Epoch 172/400
50000/50000 [==============================] - 86s - loss: 0.2639 - acc: 0.9062 - val_loss: 0.4153 - val_acc: 0.8678
Epoch 173/400
50000/50000 [==============================] - 86s - loss: 0.2655 - acc: 0.9064 - val_loss: 0.4221 - val_acc: 0.8653
Epoch 174/400
50000/50000 [==============================] - 86s - loss: 0.2636 - acc: 0.9053 - val_loss: 0.4249 - val_acc: 0.8616
Epoch 175/400
50000/50000 [==============================] - 86s - loss: 0.2657 - acc: 0.9048 - val_loss: 0.4209 - val_acc: 0.8636
Epoch 176/400
50000/50000 [==============================] - 86s - loss: 0.2574 - acc: 0.9074 - val_loss: 0.4284 - val_acc: 0.8642
Epoch 177/400
50000/50000 [==============================] - 86s - loss: 0.2615 - acc: 0.9071 - val_loss: 0.4183 - val_acc: 0.8659
Epoch 178/400
50000/50000 [==============================] - 86s - loss: 0.2647 - acc: 0.9051 - val_loss: 0.4160 - val_acc: 0.8648
Epoch 179/400
50000/50000 [==============================] - 86s - loss: 0.2620 - acc: 0.9054 - val_loss: 0.4230 - val_acc: 0.8641
Epoch 180/400
50000/50000 [==============================] - 86s - loss: 0.2536 - acc: 0.9096 - val_loss: 0.4196 - val_acc: 0.8628
Epoch 181/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2568 - acc: 0.9090. 45% epoch=180 acc=0.909020 loss=0.256842 val_acc=0.861400 val_loss=0.426353
        max_acc=0.909640 max_val_acc=0.867800
50000/50000 [==============================] - 86s - loss: 0.2568 - acc: 0.9090 - val_loss: 0.4264 - val_acc: 0.8614
Epoch 182/400
50000/50000 [==============================] - 86s - loss: 0.2536 - acc: 0.9097 - val_loss: 0.4179 - val_acc: 0.8666
Epoch 183/400
50000/50000 [==============================] - 86s - loss: 0.2555 - acc: 0.9086 - val_loss: 0.4141 - val_acc: 0.8672
Epoch 184/400
50000/50000 [==============================] - 86s - loss: 0.2574 - acc: 0.9086 - val_loss: 0.4215 - val_acc: 0.8637
Epoch 185/400
50000/50000 [==============================] - 86s - loss: 0.2518 - acc: 0.9087 - val_loss: 0.4254 - val_acc: 0.8652
Epoch 186/400
50000/50000 [==============================] - 86s - loss: 0.2464 - acc: 0.9121 - val_loss: 0.4215 - val_acc: 0.8656
Epoch 187/400
50000/50000 [==============================] - 86s - loss: 0.2507 - acc: 0.9106 - val_loss: 0.4232 - val_acc: 0.8650
Epoch 188/400
50000/50000 [==============================] - 86s - loss: 0.2560 - acc: 0.9086 - val_loss: 0.4185 - val_acc: 0.8669
Epoch 189/400
50000/50000 [==============================] - 86s - loss: 0.2469 - acc: 0.9117 - val_loss: 0.4128 - val_acc: 0.8672
Epoch 190/400
50000/50000 [==============================] - 86s - loss: 0.2518 - acc: 0.9106 - val_loss: 0.4165 - val_acc: 0.8683
Epoch 191/400
50000/50000 [==============================] - 86s - loss: 0.2501 - acc: 0.9107 - val_loss: 0.4197 - val_acc: 0.8684
Epoch 192/400
50000/50000 [==============================] - 86s - loss: 0.2477 - acc: 0.9122 - val_loss: 0.4169 - val_acc: 0.8668
Epoch 193/400
50000/50000 [==============================] - 86s - loss: 0.2448 - acc: 0.9122 - val_loss: 0.4302 - val_acc: 0.8645
Epoch 194/400
50000/50000 [==============================] - 86s - loss: 0.2433 - acc: 0.9122 - val_loss: 0.4264 - val_acc: 0.8663
Epoch 195/400
50000/50000 [==============================] - 86s - loss: 0.2433 - acc: 0.9143 - val_loss: 0.4268 - val_acc: 0.8644
Epoch 196/400
50000/50000 [==============================] - 86s - loss: 0.2450 - acc: 0.9114 - val_loss: 0.4191 - val_acc: 0.8693
Epoch 197/400
50000/50000 [==============================] - 86s - loss: 0.2422 - acc: 0.9128 - val_loss: 0.4187 - val_acc: 0.8702
Epoch 198/400
50000/50000 [==============================] - 86s - loss: 0.2450 - acc: 0.9118 - val_loss: 0.4245 - val_acc: 0.8667
Epoch 199/400
50000/50000 [==============================] - 86s - loss: 0.2449 - acc: 0.9138 - val_loss: 0.4169 - val_acc: 0.8668
Epoch 200/400
50000/50000 [==============================] - 86s - loss: 0.2402 - acc: 0.9132 - val_loss: 0.4204 - val_acc: 0.8659
Epoch 201/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2434 - acc: 0.9138. 50% epoch=200 acc=0.913800 loss=0.243381 val_acc=0.866400 val_loss=0.424940
        max_acc=0.914320 max_val_acc=0.870200
50000/50000 [==============================] - 86s - loss: 0.2434 - acc: 0.9138 - val_loss: 0.4249 - val_acc: 0.8664
Epoch 202/400
50000/50000 [==============================] - 86s - loss: 0.2393 - acc: 0.9146 - val_loss: 0.4310 - val_acc: 0.8636
Epoch 203/400
50000/50000 [==============================] - 86s - loss: 0.2432 - acc: 0.9136 - val_loss: 0.4262 - val_acc: 0.8649
Epoch 204/400
50000/50000 [==============================] - 86s - loss: 0.2405 - acc: 0.9138 - val_loss: 0.4224 - val_acc: 0.8676
Epoch 205/400
50000/50000 [==============================] - 86s - loss: 0.2382 - acc: 0.9152 - val_loss: 0.4243 - val_acc: 0.8661
Epoch 206/400
50000/50000 [==============================] - 86s - loss: 0.2406 - acc: 0.9137 - val_loss: 0.4217 - val_acc: 0.8675
Epoch 207/400
50000/50000 [==============================] - 86s - loss: 0.2399 - acc: 0.9140 - val_loss: 0.4194 - val_acc: 0.8681
Epoch 208/400
50000/50000 [==============================] - 86s - loss: 0.2348 - acc: 0.9163 - val_loss: 0.4251 - val_acc: 0.8662
Epoch 209/400
50000/50000 [==============================] - 86s - loss: 0.2359 - acc: 0.9172 - val_loss: 0.4381 - val_acc: 0.8624
Epoch 210/400
50000/50000 [==============================] - 86s - loss: 0.2345 - acc: 0.9165 - val_loss: 0.4180 - val_acc: 0.8696
Epoch 211/400
50000/50000 [==============================] - 86s - loss: 0.2336 - acc: 0.9167 - val_loss: 0.4238 - val_acc: 0.8710
Epoch 212/400
50000/50000 [==============================] - 86s - loss: 0.2348 - acc: 0.9167 - val_loss: 0.4312 - val_acc: 0.8659
Epoch 213/400
50000/50000 [==============================] - 86s - loss: 0.2330 - acc: 0.9165 - val_loss: 0.4158 - val_acc: 0.8683
Epoch 214/400
50000/50000 [==============================] - 86s - loss: 0.2338 - acc: 0.9175 - val_loss: 0.4236 - val_acc: 0.8661
Epoch 215/400
50000/50000 [==============================] - 86s - loss: 0.2363 - acc: 0.9151 - val_loss: 0.4122 - val_acc: 0.8699
Epoch 216/400
50000/50000 [==============================] - 86s - loss: 0.2321 - acc: 0.9179 - val_loss: 0.4183 - val_acc: 0.8679
Epoch 217/400
50000/50000 [==============================] - 86s - loss: 0.2291 - acc: 0.9191 - val_loss: 0.4275 - val_acc: 0.8667
Epoch 218/400
50000/50000 [==============================] - 86s - loss: 0.2326 - acc: 0.9163 - val_loss: 0.4237 - val_acc: 0.8656
Epoch 219/400
50000/50000 [==============================] - 86s - loss: 0.2294 - acc: 0.9176 - val_loss: 0.4305 - val_acc: 0.8659
Epoch 220/400
50000/50000 [==============================] - 86s - loss: 0.2320 - acc: 0.9168 - val_loss: 0.4198 - val_acc: 0.8671
Epoch 221/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2325 - acc: 0.9178. 55% epoch=220 acc=0.917800 loss=0.232463 val_acc=0.866900 val_loss=0.424648
        max_acc=0.919060 max_val_acc=0.871000
50000/50000 [==============================] - 86s - loss: 0.2325 - acc: 0.9178 - val_loss: 0.4246 - val_acc: 0.8669
Epoch 222/400
50000/50000 [==============================] - 86s - loss: 0.2260 - acc: 0.9191 - val_loss: 0.4360 - val_acc: 0.8650
Epoch 223/400
50000/50000 [==============================] - 86s - loss: 0.2264 - acc: 0.9203 - val_loss: 0.4195 - val_acc: 0.8695
Epoch 224/400
50000/50000 [==============================] - 86s - loss: 0.2296 - acc: 0.9165 - val_loss: 0.4162 - val_acc: 0.8702
Epoch 225/400
50000/50000 [==============================] - 86s - loss: 0.2261 - acc: 0.9196 - val_loss: 0.4318 - val_acc: 0.8674
Epoch 226/400
50000/50000 [==============================] - 86s - loss: 0.2267 - acc: 0.9192 - val_loss: 0.4237 - val_acc: 0.8699
Epoch 227/400
50000/50000 [==============================] - 87s - loss: 0.2253 - acc: 0.9184 - val_loss: 0.4249 - val_acc: 0.8673
Epoch 228/400
50000/50000 [==============================] - 86s - loss: 0.2248 - acc: 0.9201 - val_loss: 0.4209 - val_acc: 0.8669
Epoch 229/400
50000/50000 [==============================] - 87s - loss: 0.2246 - acc: 0.9198 - val_loss: 0.4288 - val_acc: 0.8700
Epoch 230/400
50000/50000 [==============================] - 87s - loss: 0.2251 - acc: 0.9185 - val_loss: 0.4194 - val_acc: 0.8702
Epoch 231/400
50000/50000 [==============================] - 86s - loss: 0.2251 - acc: 0.9192 - val_loss: 0.4228 - val_acc: 0.8686
Epoch 232/400
50000/50000 [==============================] - 86s - loss: 0.2184 - acc: 0.9225 - val_loss: 0.4296 - val_acc: 0.8710
Epoch 233/400
50000/50000 [==============================] - 87s - loss: 0.2254 - acc: 0.9194 - val_loss: 0.4278 - val_acc: 0.8679
Epoch 234/400
50000/50000 [==============================] - 87s - loss: 0.2205 - acc: 0.9201 - val_loss: 0.4312 - val_acc: 0.8675
Epoch 235/400
50000/50000 [==============================] - 87s - loss: 0.2215 - acc: 0.9210 - val_loss: 0.4283 - val_acc: 0.8691
Epoch 236/400
50000/50000 [==============================] - 86s - loss: 0.2199 - acc: 0.9226 - val_loss: 0.4327 - val_acc: 0.8680
Epoch 237/400
50000/50000 [==============================] - 87s - loss: 0.2185 - acc: 0.9208 - val_loss: 0.4324 - val_acc: 0.8670
Epoch 238/400
50000/50000 [==============================] - 87s - loss: 0.2195 - acc: 0.9206 - val_loss: 0.4276 - val_acc: 0.8691
Epoch 239/400
50000/50000 [==============================] - 87s - loss: 0.2226 - acc: 0.9203 - val_loss: 0.4287 - val_acc: 0.8695
Epoch 240/400
50000/50000 [==============================] - 87s - loss: 0.2192 - acc: 0.9222 - val_loss: 0.4291 - val_acc: 0.8688
Epoch 241/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2157 - acc: 0.9231. 60% epoch=240 acc=0.923120 loss=0.215661 val_acc=0.869000 val_loss=0.422009
        max_acc=0.923120 max_val_acc=0.871000
50000/50000 [==============================] - 87s - loss: 0.2157 - acc: 0.9231 - val_loss: 0.4220 - val_acc: 0.8690
Epoch 242/400
50000/50000 [==============================] - 87s - loss: 0.2185 - acc: 0.9218 - val_loss: 0.4273 - val_acc: 0.8664
Epoch 243/400
50000/50000 [==============================] - 87s - loss: 0.2110 - acc: 0.9238 - val_loss: 0.4326 - val_acc: 0.8673
Epoch 244/400
50000/50000 [==============================] - 87s - loss: 0.2166 - acc: 0.9225 - val_loss: 0.4259 - val_acc: 0.8679
Epoch 245/400
50000/50000 [==============================] - 87s - loss: 0.2151 - acc: 0.9237 - val_loss: 0.4294 - val_acc: 0.8692
Epoch 246/400
50000/50000 [==============================] - 87s - loss: 0.2164 - acc: 0.9229 - val_loss: 0.4318 - val_acc: 0.8698
Epoch 247/400
50000/50000 [==============================] - 87s - loss: 0.2094 - acc: 0.9264 - val_loss: 0.4260 - val_acc: 0.8685
Epoch 248/400
50000/50000 [==============================] - 87s - loss: 0.2153 - acc: 0.9234 - val_loss: 0.4297 - val_acc: 0.8679
Epoch 249/400
50000/50000 [==============================] - 87s - loss: 0.2141 - acc: 0.9229 - val_loss: 0.4280 - val_acc: 0.8672
Epoch 250/400
50000/50000 [==============================] - 87s - loss: 0.2077 - acc: 0.9256 - val_loss: 0.4247 - val_acc: 0.8682
Epoch 251/400
50000/50000 [==============================] - 87s - loss: 0.2118 - acc: 0.9243 - val_loss: 0.4312 - val_acc: 0.8680
Epoch 252/400
50000/50000 [==============================] - 87s - loss: 0.2135 - acc: 0.9243 - val_loss: 0.4284 - val_acc: 0.8683
Epoch 253/400
50000/50000 [==============================] - 87s - loss: 0.2168 - acc: 0.9220 - val_loss: 0.4291 - val_acc: 0.8685
Epoch 254/400
50000/50000 [==============================] - 87s - loss: 0.2139 - acc: 0.9240 - val_loss: 0.4310 - val_acc: 0.8683
Epoch 255/400
50000/50000 [==============================] - 87s - loss: 0.2101 - acc: 0.9246 - val_loss: 0.4271 - val_acc: 0.8709
Epoch 256/400
50000/50000 [==============================] - 87s - loss: 0.2090 - acc: 0.9255 - val_loss: 0.4268 - val_acc: 0.8709
Epoch 257/400
50000/50000 [==============================] - 87s - loss: 0.2090 - acc: 0.9256 - val_loss: 0.4301 - val_acc: 0.8708
Epoch 258/400
50000/50000 [==============================] - 87s - loss: 0.2092 - acc: 0.9250 - val_loss: 0.4294 - val_acc: 0.8679
Epoch 259/400
50000/50000 [==============================] - 87s - loss: 0.2064 - acc: 0.9263 - val_loss: 0.4240 - val_acc: 0.8709
Epoch 260/400
50000/50000 [==============================] - 87s - loss: 0.2093 - acc: 0.9260 - val_loss: 0.4334 - val_acc: 0.8701
Epoch 261/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.2081 - acc: 0.9257. 65% epoch=260 acc=0.925680 loss=0.208086 val_acc=0.871000 val_loss=0.424436
        max_acc=0.926420 max_val_acc=0.871000
50000/50000 [==============================] - 87s - loss: 0.2081 - acc: 0.9257 - val_loss: 0.4244 - val_acc: 0.8710
Epoch 262/400
50000/50000 [==============================] - 87s - loss: 0.2082 - acc: 0.9259 - val_loss: 0.4234 - val_acc: 0.8695
Epoch 263/400
50000/50000 [==============================] - 87s - loss: 0.2086 - acc: 0.9250 - val_loss: 0.4270 - val_acc: 0.8677
Epoch 264/400
50000/50000 [==============================] - 87s - loss: 0.2022 - acc: 0.9268 - val_loss: 0.4360 - val_acc: 0.8692
Epoch 265/400
50000/50000 [==============================] - 87s - loss: 0.2057 - acc: 0.9251 - val_loss: 0.4402 - val_acc: 0.8648
Epoch 266/400
50000/50000 [==============================] - 87s - loss: 0.2052 - acc: 0.9276 - val_loss: 0.4246 - val_acc: 0.8697
Epoch 267/400
50000/50000 [==============================] - 87s - loss: 0.2061 - acc: 0.9268 - val_loss: 0.4295 - val_acc: 0.8681
Epoch 268/400
50000/50000 [==============================] - 87s - loss: 0.2066 - acc: 0.9273 - val_loss: 0.4281 - val_acc: 0.8688
Epoch 269/400
50000/50000 [==============================] - 87s - loss: 0.2017 - acc: 0.9268 - val_loss: 0.4271 - val_acc: 0.8705
Epoch 270/400
50000/50000 [==============================] - 87s - loss: 0.2090 - acc: 0.9248 - val_loss: 0.4327 - val_acc: 0.8687
Epoch 271/400
50000/50000 [==============================] - 87s - loss: 0.2011 - acc: 0.9284 - val_loss: 0.4298 - val_acc: 0.8649
Epoch 272/400
50000/50000 [==============================] - 87s - loss: 0.2045 - acc: 0.9274 - val_loss: 0.4276 - val_acc: 0.8680
Epoch 273/400
50000/50000 [==============================] - 87s - loss: 0.2050 - acc: 0.9282 - val_loss: 0.4215 - val_acc: 0.8691
Epoch 274/400
50000/50000 [==============================] - 87s - loss: 0.2002 - acc: 0.9280 - val_loss: 0.4313 - val_acc: 0.8671
Epoch 275/400
50000/50000 [==============================] - 87s - loss: 0.2020 - acc: 0.9279 - val_loss: 0.4345 - val_acc: 0.8682
Epoch 276/400
50000/50000 [==============================] - 87s - loss: 0.1987 - acc: 0.9287 - val_loss: 0.4362 - val_acc: 0.8664
Epoch 277/400
50000/50000 [==============================] - 87s - loss: 0.2041 - acc: 0.9273 - val_loss: 0.4261 - val_acc: 0.8721
Epoch 278/400
50000/50000 [==============================] - 87s - loss: 0.2016 - acc: 0.9278 - val_loss: 0.4254 - val_acc: 0.8702
Epoch 279/400
50000/50000 [==============================] - 87s - loss: 0.2011 - acc: 0.9290 - val_loss: 0.4308 - val_acc: 0.8701
Epoch 280/400
50000/50000 [==============================] - 87s - loss: 0.2033 - acc: 0.9272 - val_loss: 0.4273 - val_acc: 0.8698
Epoch 281/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1990 - acc: 0.9295. 70% epoch=280 acc=0.929520 loss=0.199103 val_acc=0.869900 val_loss=0.420189
        max_acc=0.929520 max_val_acc=0.872100
50000/50000 [==============================] - 87s - loss: 0.1991 - acc: 0.9295 - val_loss: 0.4202 - val_acc: 0.8699
Epoch 282/400
50000/50000 [==============================] - 87s - loss: 0.1946 - acc: 0.9302 - val_loss: 0.4308 - val_acc: 0.8674
Epoch 283/400
50000/50000 [==============================] - 87s - loss: 0.1986 - acc: 0.9284 - val_loss: 0.4253 - val_acc: 0.8664
Epoch 284/400
50000/50000 [==============================] - 87s - loss: 0.1978 - acc: 0.9291 - val_loss: 0.4292 - val_acc: 0.8695
Epoch 285/400
50000/50000 [==============================] - 87s - loss: 0.1977 - acc: 0.9286 - val_loss: 0.4294 - val_acc: 0.8688
Epoch 286/400
50000/50000 [==============================] - 87s - loss: 0.1952 - acc: 0.9304 - val_loss: 0.4333 - val_acc: 0.8666
Epoch 287/400
50000/50000 [==============================] - 87s - loss: 0.1962 - acc: 0.9289 - val_loss: 0.4293 - val_acc: 0.8713
Epoch 288/400
50000/50000 [==============================] - 87s - loss: 0.1990 - acc: 0.9286 - val_loss: 0.4351 - val_acc: 0.8671
Epoch 289/400
50000/50000 [==============================] - 87s - loss: 0.1976 - acc: 0.9301 - val_loss: 0.4331 - val_acc: 0.8683
Epoch 290/400
50000/50000 [==============================] - 87s - loss: 0.1947 - acc: 0.9299 - val_loss: 0.4309 - val_acc: 0.8695
Epoch 291/400
50000/50000 [==============================] - 87s - loss: 0.1960 - acc: 0.9310 - val_loss: 0.4316 - val_acc: 0.8677
Epoch 292/400
50000/50000 [==============================] - 87s - loss: 0.1940 - acc: 0.9304 - val_loss: 0.4255 - val_acc: 0.8705
Epoch 293/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1985 - acc: 0.9291
50000/50000 [==============================] - 87s - loss: 0.1985 - acc: 0.9291 - val_loss: 0.4267 - val_acc: 0.8724
Epoch 294/400
50000/50000 [==============================] - 86s - loss: 0.1941 - acc: 0.9299 - val_loss: 0.4378 - val_acc: 0.8695
Epoch 295/400
50000/50000 [==============================] - 86s - loss: 0.1933 - acc: 0.9317 - val_loss: 0.4361 - val_acc: 0.8690
Epoch 296/400
50000/50000 [==============================] - 86s - loss: 0.1945 - acc: 0.9307 - val_loss: 0.4365 - val_acc: 0.8677
Epoch 297/400
50000/50000 [==============================] - 86s - loss: 0.1942 - acc: 0.9300 - val_loss: 0.4290 - val_acc: 0.8734
Epoch 298/400
50000/50000 [==============================] - 86s - loss: 0.1940 - acc: 0.9316 - val_loss: 0.4427 - val_acc: 0.8692
Epoch 299/400
50000/50000 [==============================] - 86s - loss: 0.1890 - acc: 0.9330 - val_loss: 0.4306 - val_acc: 0.8698
Epoch 300/400
50000/50000 [==============================] - 86s - loss: 0.1911 - acc: 0.9315 - val_loss: 0.4343 - val_acc: 0.8714
Epoch 301/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1913 - acc: 0.9317. 75% epoch=300 acc=0.931680 loss=0.191382 val_acc=0.870400 val_loss=0.434529
        max_acc=0.933020 max_val_acc=0.873400
50000/50000 [==============================] - 86s - loss: 0.1914 - acc: 0.9317 - val_loss: 0.4345 - val_acc: 0.8704
Epoch 302/400
50000/50000 [==============================] - 86s - loss: 0.1903 - acc: 0.9327 - val_loss: 0.4285 - val_acc: 0.8703
Epoch 303/400
50000/50000 [==============================] - 86s - loss: 0.1906 - acc: 0.9322 - val_loss: 0.4245 - val_acc: 0.8729
Epoch 304/400
50000/50000 [==============================] - 86s - loss: 0.1921 - acc: 0.9313 - val_loss: 0.4243 - val_acc: 0.8706
Epoch 305/400
50000/50000 [==============================] - 86s - loss: 0.1898 - acc: 0.9317 - val_loss: 0.4467 - val_acc: 0.8665
Epoch 306/400
50000/50000 [==============================] - 86s - loss: 0.1870 - acc: 0.9325 - val_loss: 0.4313 - val_acc: 0.8717
Epoch 307/400
50000/50000 [==============================] - 86s - loss: 0.1931 - acc: 0.9308 - val_loss: 0.4348 - val_acc: 0.8681
Epoch 308/400
50000/50000 [==============================] - 86s - loss: 0.1878 - acc: 0.9326 - val_loss: 0.4316 - val_acc: 0.8695
Epoch 309/400
50000/50000 [==============================] - 86s - loss: 0.1886 - acc: 0.9327 - val_loss: 0.4354 - val_acc: 0.8671
Epoch 310/400
50000/50000 [==============================] - 85s - loss: 0.1911 - acc: 0.9313 - val_loss: 0.4293 - val_acc: 0.8695
Epoch 311/400
50000/50000 [==============================] - 85s - loss: 0.1858 - acc: 0.9344 - val_loss: 0.4336 - val_acc: 0.8694
Epoch 312/400
50000/50000 [==============================] - 84s - loss: 0.1854 - acc: 0.9344 - val_loss: 0.4343 - val_acc: 0.8693
Epoch 313/400
50000/50000 [==============================] - 85s - loss: 0.1885 - acc: 0.9338 - val_loss: 0.4324 - val_acc: 0.8729
Epoch 314/400
50000/50000 [==============================] - 85s - loss: 0.1869 - acc: 0.9332 - val_loss: 0.4346 - val_acc: 0.8705
Epoch 315/400
50000/50000 [==============================] - 85s - loss: 0.1909 - acc: 0.9317 - val_loss: 0.4491 - val_acc: 0.8661
Epoch 316/400
50000/50000 [==============================] - 85s - loss: 0.1910 - acc: 0.9308 - val_loss: 0.4334 - val_acc: 0.8713
Epoch 317/400
50000/50000 [==============================] - 85s - loss: 0.1906 - acc: 0.9329 - val_loss: 0.4303 - val_acc: 0.8695
Epoch 318/400
50000/50000 [==============================] - 85s - loss: 0.1862 - acc: 0.9330 - val_loss: 0.4368 - val_acc: 0.8710
Epoch 319/400
50000/50000 [==============================] - 84s - loss: 0.1814 - acc: 0.9339 - val_loss: 0.4333 - val_acc: 0.8714
Epoch 320/400
50000/50000 [==============================] - 85s - loss: 0.1872 - acc: 0.9338 - val_loss: 0.4353 - val_acc: 0.8725
Epoch 321/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1859 - acc: 0.9344. 80% epoch=320 acc=0.934380 loss=0.185912 val_acc=0.872100 val_loss=0.435961
        max_acc=0.934380 max_val_acc=0.873400
50000/50000 [==============================] - 84s - loss: 0.1859 - acc: 0.9344 - val_loss: 0.4360 - val_acc: 0.8721
Epoch 322/400
50000/50000 [==============================] - 85s - loss: 0.1846 - acc: 0.9351 - val_loss: 0.4304 - val_acc: 0.8712
Epoch 323/400
50000/50000 [==============================] - 84s - loss: 0.1779 - acc: 0.9368 - val_loss: 0.4454 - val_acc: 0.8674
Epoch 324/400
50000/50000 [==============================] - 85s - loss: 0.1849 - acc: 0.9335 - val_loss: 0.4438 - val_acc: 0.8723
Epoch 325/400
50000/50000 [==============================] - 84s - loss: 0.1841 - acc: 0.9342 - val_loss: 0.4497 - val_acc: 0.8693
Epoch 326/400
50000/50000 [==============================] - 85s - loss: 0.1845 - acc: 0.9341 - val_loss: 0.4292 - val_acc: 0.8711
Epoch 327/400
50000/50000 [==============================] - 84s - loss: 0.1811 - acc: 0.9352 - val_loss: 0.4357 - val_acc: 0.8718
Epoch 328/400
50000/50000 [==============================] - 85s - loss: 0.1848 - acc: 0.9329 - val_loss: 0.4306 - val_acc: 0.8712
Epoch 329/400
50000/50000 [==============================] - 85s - loss: 0.1808 - acc: 0.9354 - val_loss: 0.4295 - val_acc: 0.8716
Epoch 330/400
50000/50000 [==============================] - 87s - loss: 0.1775 - acc: 0.9373 - val_loss: 0.4342 - val_acc: 0.8744
Epoch 331/400
50000/50000 [==============================] - 85s - loss: 0.1804 - acc: 0.9356 - val_loss: 0.4427 - val_acc: 0.8707
Epoch 332/400
50000/50000 [==============================] - 84s - loss: 0.1821 - acc: 0.9359 - val_loss: 0.4361 - val_acc: 0.8710
Epoch 333/400
50000/50000 [==============================] - 84s - loss: 0.1825 - acc: 0.9349 - val_loss: 0.4332 - val_acc: 0.8732
Epoch 334/400
50000/50000 [==============================] - 85s - loss: 0.1861 - acc: 0.9341 - val_loss: 0.4367 - val_acc: 0.8706
Epoch 335/400
50000/50000 [==============================] - 84s - loss: 0.1834 - acc: 0.9353 - val_loss: 0.4297 - val_acc: 0.8710
Epoch 336/400
50000/50000 [==============================] - 84s - loss: 0.1822 - acc: 0.9362 - val_loss: 0.4431 - val_acc: 0.8703
Epoch 337/400
50000/50000 [==============================] - 84s - loss: 0.1786 - acc: 0.9354 - val_loss: 0.4439 - val_acc: 0.8725
Epoch 338/400
50000/50000 [==============================] - 85s - loss: 0.1813 - acc: 0.9355 - val_loss: 0.4351 - val_acc: 0.8721
Epoch 339/400
50000/50000 [==============================] - 85s - loss: 0.1761 - acc: 0.9370 - val_loss: 0.4312 - val_acc: 0.8723
Epoch 340/400
50000/50000 [==============================] - 85s - loss: 0.1744 - acc: 0.9375 - val_loss: 0.4369 - val_acc: 0.8698
Epoch 341/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1812 - acc: 0.9354. 85% epoch=340 acc=0.935380 loss=0.181248 val_acc=0.870600 val_loss=0.431464
        max_acc=0.937540 max_val_acc=0.874400
50000/50000 [==============================] - 85s - loss: 0.1812 - acc: 0.9354 - val_loss: 0.4315 - val_acc: 0.8706
Epoch 342/400
50000/50000 [==============================] - 85s - loss: 0.1799 - acc: 0.9351 - val_loss: 0.4351 - val_acc: 0.8716
Epoch 343/400
50000/50000 [==============================] - 85s - loss: 0.1775 - acc: 0.9381 - val_loss: 0.4404 - val_acc: 0.8704
Epoch 344/400
50000/50000 [==============================] - 85s - loss: 0.1750 - acc: 0.9391 - val_loss: 0.4447 - val_acc: 0.8696
Epoch 345/400
50000/50000 [==============================] - 84s - loss: 0.1768 - acc: 0.9371 - val_loss: 0.4423 - val_acc: 0.8722
Epoch 346/400
50000/50000 [==============================] - 85s - loss: 0.1766 - acc: 0.9373 - val_loss: 0.4394 - val_acc: 0.8704
Epoch 347/400
50000/50000 [==============================] - 85s - loss: 0.1784 - acc: 0.9371 - val_loss: 0.4373 - val_acc: 0.8724
Epoch 348/400
50000/50000 [==============================] - 85s - loss: 0.1820 - acc: 0.9352 - val_loss: 0.4442 - val_acc: 0.8688
Epoch 349/400
50000/50000 [==============================] - 85s - loss: 0.1799 - acc: 0.9363 - val_loss: 0.4438 - val_acc: 0.8695
Epoch 350/400
50000/50000 [==============================] - 85s - loss: 0.1775 - acc: 0.9377 - val_loss: 0.4415 - val_acc: 0.8709
Epoch 351/400
50000/50000 [==============================] - 85s - loss: 0.1778 - acc: 0.9367 - val_loss: 0.4342 - val_acc: 0.8732
Epoch 352/400
50000/50000 [==============================] - 85s - loss: 0.1783 - acc: 0.9349 - val_loss: 0.4407 - val_acc: 0.8750
Epoch 353/400
50000/50000 [==============================] - 85s - loss: 0.1732 - acc: 0.9382 - val_loss: 0.4416 - val_acc: 0.8721
Epoch 354/400
50000/50000 [==============================] - 85s - loss: 0.1731 - acc: 0.9379 - val_loss: 0.4302 - val_acc: 0.8729
Epoch 355/400
50000/50000 [==============================] - 85s - loss: 0.1769 - acc: 0.9359 - val_loss: 0.4283 - val_acc: 0.8735
Epoch 356/400
50000/50000 [==============================] - 85s - loss: 0.1755 - acc: 0.9379 - val_loss: 0.4441 - val_acc: 0.8724
Epoch 357/400
50000/50000 [==============================] - 84s - loss: 0.1738 - acc: 0.9375 - val_loss: 0.4281 - val_acc: 0.8755
Epoch 358/400
50000/50000 [==============================] - 85s - loss: 0.1743 - acc: 0.9383 - val_loss: 0.4355 - val_acc: 0.8747
Epoch 359/400
50000/50000 [==============================] - 85s - loss: 0.1703 - acc: 0.9405 - val_loss: 0.4298 - val_acc: 0.8728
Epoch 360/400
50000/50000 [==============================] - 85s - loss: 0.1740 - acc: 0.9373 - val_loss: 0.4394 - val_acc: 0.8741
Epoch 361/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1721 - acc: 0.9385. 90% epoch=360 acc=0.938480 loss=0.172114 val_acc=0.872200 val_loss=0.435809
        max_acc=0.940540 max_val_acc=0.875500
50000/50000 [==============================] - 84s - loss: 0.1721 - acc: 0.9385 - val_loss: 0.4358 - val_acc: 0.8722
Epoch 362/400
50000/50000 [==============================] - 84s - loss: 0.1722 - acc: 0.9380 - val_loss: 0.4347 - val_acc: 0.8737
Epoch 363/400
50000/50000 [==============================] - 85s - loss: 0.1728 - acc: 0.9386 - val_loss: 0.4394 - val_acc: 0.8705
Epoch 364/400
50000/50000 [==============================] - 85s - loss: 0.1730 - acc: 0.9377 - val_loss: 0.4387 - val_acc: 0.8724
Epoch 365/400
50000/50000 [==============================] - 85s - loss: 0.1736 - acc: 0.9375 - val_loss: 0.4335 - val_acc: 0.8741
Epoch 366/400
50000/50000 [==============================] - 84s - loss: 0.1702 - acc: 0.9391 - val_loss: 0.4332 - val_acc: 0.8742
Epoch 367/400
50000/50000 [==============================] - 85s - loss: 0.1705 - acc: 0.9392 - val_loss: 0.4384 - val_acc: 0.8720
Epoch 368/400
50000/50000 [==============================] - 85s - loss: 0.1709 - acc: 0.9401 - val_loss: 0.4403 - val_acc: 0.8722
Epoch 369/400
50000/50000 [==============================] - 84s - loss: 0.1765 - acc: 0.9372 - val_loss: 0.4325 - val_acc: 0.8741
Epoch 370/400
50000/50000 [==============================] - 85s - loss: 0.1739 - acc: 0.9392 - val_loss: 0.4410 - val_acc: 0.8718
Epoch 371/400
50000/50000 [==============================] - 85s - loss: 0.1717 - acc: 0.9385 - val_loss: 0.4361 - val_acc: 0.8733
Epoch 372/400
50000/50000 [==============================] - 85s - loss: 0.1727 - acc: 0.9383 - val_loss: 0.4420 - val_acc: 0.8747
Epoch 373/400
50000/50000 [==============================] - 84s - loss: 0.1745 - acc: 0.9378 - val_loss: 0.4399 - val_acc: 0.8748
Epoch 374/400
50000/50000 [==============================] - 84s - loss: 0.1737 - acc: 0.9385 - val_loss: 0.4409 - val_acc: 0.8688
Epoch 375/400
50000/50000 [==============================] - 85s - loss: 0.1712 - acc: 0.9384 - val_loss: 0.4347 - val_acc: 0.8704
Epoch 376/400
50000/50000 [==============================] - 85s - loss: 0.1714 - acc: 0.9393 - val_loss: 0.4412 - val_acc: 0.8750
Epoch 377/400
50000/50000 [==============================] - 85s - loss: 0.1676 - acc: 0.9400 - val_loss: 0.4397 - val_acc: 0.8735
Epoch 378/400
50000/50000 [==============================] - 86s - loss: 0.1697 - acc: 0.9387 - val_loss: 0.4334 - val_acc: 0.8720
Epoch 379/400
50000/50000 [==============================] - 86s - loss: 0.1662 - acc: 0.9407 - val_loss: 0.4374 - val_acc: 0.8743
Epoch 380/400
50000/50000 [==============================] - 86s - loss: 0.1706 - acc: 0.9396 - val_loss: 0.4367 - val_acc: 0.8719
Epoch 381/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1666 - acc: 0.9404. 95% epoch=380 acc=0.940420 loss=0.166581 val_acc=0.874700 val_loss=0.430916
        max_acc=0.940740 max_val_acc=0.875500
50000/50000 [==============================] - 86s - loss: 0.1666 - acc: 0.9404 - val_loss: 0.4309 - val_acc: 0.8747
Epoch 382/400
50000/50000 [==============================] - 86s - loss: 0.1638 - acc: 0.9423 - val_loss: 0.4522 - val_acc: 0.8698
Epoch 383/400
50000/50000 [==============================] - 86s - loss: 0.1680 - acc: 0.9401 - val_loss: 0.4319 - val_acc: 0.8745
Epoch 384/400
50000/50000 [==============================] - 86s - loss: 0.1639 - acc: 0.9417 - val_loss: 0.4411 - val_acc: 0.8735
Epoch 385/400
50000/50000 [==============================] - 86s - loss: 0.1677 - acc: 0.9405 - val_loss: 0.4382 - val_acc: 0.8750
Epoch 386/400
50000/50000 [==============================] - 86s - loss: 0.1668 - acc: 0.9404 - val_loss: 0.4364 - val_acc: 0.8719
Epoch 387/400
50000/50000 [==============================] - 86s - loss: 0.1662 - acc: 0.9413 - val_loss: 0.4372 - val_acc: 0.8750
Epoch 388/400
50000/50000 [==============================] - 86s - loss: 0.1696 - acc: 0.9396 - val_loss: 0.4420 - val_acc: 0.8740
Epoch 389/400
50000/50000 [==============================] - 86s - loss: 0.1655 - acc: 0.9412 - val_loss: 0.4458 - val_acc: 0.8721
Epoch 390/400
50000/50000 [==============================] - 86s - loss: 0.1672 - acc: 0.9409 - val_loss: 0.4337 - val_acc: 0.8741
Epoch 391/400
50000/50000 [==============================] - 86s - loss: 0.1664 - acc: 0.9416 - val_loss: 0.4384 - val_acc: 0.8735
Epoch 392/400
50000/50000 [==============================] - 86s - loss: 0.1671 - acc: 0.9408 - val_loss: 0.4416 - val_acc: 0.8738
Epoch 393/400
50000/50000 [==============================] - 86s - loss: 0.1615 - acc: 0.9417 - val_loss: 0.4489 - val_acc: 0.8733
Epoch 394/400
50000/50000 [==============================] - 86s - loss: 0.1642 - acc: 0.9411 - val_loss: 0.4409 - val_acc: 0.8721
Epoch 395/400
50000/50000 [==============================] - 86s - loss: 0.1614 - acc: 0.9441 - val_loss: 0.4507 - val_acc: 0.8689
Epoch 396/400
50000/50000 [==============================] - 86s - loss: 0.1684 - acc: 0.9397 - val_loss: 0.4442 - val_acc: 0.8744
Epoch 397/400
50000/50000 [==============================] - 86s - loss: 0.1657 - acc: 0.9410 - val_loss: 0.4436 - val_acc: 0.8737
Epoch 398/400
50000/50000 [==============================] - 86s - loss: 0.1622 - acc: 0.9424 - val_loss: 0.4445 - val_acc: 0.8707
Epoch 399/400
50000/50000 [==============================] - 86s - loss: 0.1653 - acc: 0.9416 - val_loss: 0.4397 - val_acc: 0.8751
Epoch 400/400
49984/50000 [============================>.] - ETA: 0s - loss: 0.1640 - acc: 0.9412 99% epoch=399 acc=0.941200 loss=0.164015
50000/50000 [==============================] - 86s - loss: 0.1640 - acc: 0.9412 - val_loss: 0.4410 - val_acc: 0.8730
Train end: 2016-11-29 23:11:15
Total run time: 34452.96 seconds
max_acc = 0.944140  epoc = 394
max_val_acc = 0.875500  epoc = 356
Training: accuracy   = 0.999160 loss = 0.013974
Validation: accuracy = 0.873000 loss = 0.441008
Over fitting score   = 0.053933
Under fitting score  = 0.044921
Saving model6 to "model6.h5"
Saving history dict to pickle file: hist4.p
In [9]:
loss, accuracy = model6.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Training: accuracy = 0.999160  ;  loss = 0.013974
In [10]:
loss, accuracy = model6.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f  ;  loss = %f" % (accuracy, loss))
Validation: accuracy = 0.873000  ;  loss = 0.441008
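Before reading too much into these two numbers, it is instructive to check how the test errors spread over the 10 classes: does the model struggle uniformly, or does it mainly confuse a few visually similar classes (cats and dogs, say)? Here is a minimal sketch that computes the per-class recognition rate with predict_classes, using the class_name dictionary defined at the top of the notebook:

In [ ]:
import numpy as np

# Predicted class index for every test image
y_pred = model6.predict_classes(X_test, verbose=0)
y_true = y_test.reshape(y_test.shape[0])   # flatten to 1D, as we did for y_train

# Per-class recognition rate on the test set
for i in range(nb_classes):
    mask = (y_true == i)
    acc = np.mean(y_pred[mask] == i)
    print("%-10s: %.3f" % (class_name[i], acc))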

As the numbers above show, the SReLU activation has raised the training accuracy to 99.92%, but the validation accuracy (87.3%) still lags a long way behind. Is overcoming this overfitting an interesting challenge in its own right, or do we simply need more training samples? The 80 million tiny images collection from which CIFAR-10 was drawn contains plenty more, so that can be tried ...
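Short of downloading more data, a cheaper remedy worth trying first is data augmentation: expanding the training set on the fly with random shifts and flips of the existing images, via the ImageDataGenerator class we imported at the top of the notebook. The following cell is only a sketch (the shift ranges, batch size, and epoch count are illustrative choices, not tuned values):

In [ ]:
from keras.preprocessing.image import ImageDataGenerator  # already imported at the top

datagen = ImageDataGenerator(
    width_shift_range=0.1,    # random horizontal shifts of up to 10%
    height_shift_range=0.1,   # random vertical shifts of up to 10%
    horizontal_flip=True,     # random left/right mirroring
)

# Keras 1.x API: train on endless batches of randomly transformed images
model6.fit_generator(
    datagen.flow(X_train, Y_train, batch_size=32),
    samples_per_epoch=X_train.shape[0],
    nb_epoch=100,
    validation_data=(X_test, Y_test),
)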

Project

Find photos of dogs, cats, and automobiles on the internet or in your personal photo albums, and check whether our models recognize them as such. You will have to rescale your pictures to fit the model input shape. Take a look at the following blog post to see how you can feed them to one of the models we built in this tutorial:
https://blog.rescale.com/neural-networks-using-keras-on-rescale/

Here is a simple-minded way to do it (but read the SciPy manual for the finer details):

In [ ]:
import numpy as np
from scipy.misc import imread, imresize
from keras.models import load_model

def rescale_image(image_file):
    # Read the photo as RGB and squash it to the 32x32 CIFAR-10 input size
    im = imresize(imread(image_file, mode='RGB'), (32, 32))
    # Theano dim ordering: channels first, so (32, 32, 3) -> (3, 32, 32);
    # scale to [0,1], assuming X_train was normalized this way before training
    return im.transpose(2, 0, 1).astype('float32') / 255.0

def load_and_scale_imgs(img_files):
    imgs = [rescale_image(img_file) for img_file in img_files]
    return np.array(imgs)

model = load_model("model6.h5")   # the last model we saved above
img_files = ["dog1.jpg", "cat1.jpg", "car1.jpg"]   # replace with your own photo files
imgs = load_and_scale_imgs(img_files)
predictions = model.predict_classes(imgs)
for image_file, pred in zip(img_files, predictions):
    print("%s: %s" % (image_file, class_name[pred]))
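Two caveats: imresize simply squashes a photo to 32x32, so images with an extreme aspect ratio will come out distorted (cropping a square region around the object first usually helps), and the preprocessing must mirror exactly whatever was applied to X_train before training (dtype, scaling, channel ordering), otherwise the predictions will be meaningless.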