# Training a Convolutional Neural Network

Sept. 26, 2018

This is the third part of my image classification project. I have been experimenting with different models, and today I am finally able to drastically improve my model's performance.

In part 1 of the project, we used logistic regression and obtained a precision of 0.72 and a recall of 0.99. This is not bad for such a simple model. A recall of 0.99 means the false negative rate is only 0.01, so a patient who has pneumonia will almost surely be flagged by the model. This is extremely important in medicine. The downside is that the model makes a lot of false positive claims: a precision of 0.72 means that 28% of the patients flagged as having pneumonia are actually healthy. Although a misdiagnosis of this kind is not life-threatening, 0.28 is still too high a proportion of false alarms.
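To make the relationship between these rates concrete, here is a minimal sketch that computes them from confusion-matrix counts. The function name `classification_rates` and the counts are hypothetical, chosen only for illustration; they are not the actual results from part 1. Note that 1 - precision is the false discovery rate, which is not the same quantity as the false positive rate.

```python
def classification_rates(tp, fp, fn, tn):
    """Return common rates computed from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_negative_rate": fn / (fn + tp),   # equals 1 - recall
        "false_discovery_rate": fp / (fp + tp),  # equals 1 - precision
        "false_positive_rate": fp / (fp + tn),   # depends on true negatives
    }

# Hypothetical counts for illustration only (not the results from part 1)
rates = classification_rates(tp=99, fp=38, fn=1, tn=62)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

The false positive rate depends on the number of true negatives, so it cannot be read off from precision and recall alone.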

In part 2 we used a simple feed forward neural network, and obtained the same precision and recall. One major problem with the previous two models is overfitting. The test set accuracy is only 0.75 even though the training set accuracy is as high as 0.96. Clearly the model has a low bias and a high variance. Using a simple Convolutional Neural Network, I am able to drastically improve the model's performance. Let's get started!

## Data Preparation

First let's import some libraries.

In [3]:
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
%matplotlib inline
np.random.seed(1)


We are generating the data using the same method as before, with one exception. Because the network below has two output units, we need the response variable to be encoded as one-hot vectors. So instead of using train_labels.append(0), we are going to use train_labels.append([1, 0]), and instead of train_labels.append(1), we are going to use train_labels.append([0, 1]). If you are wondering how the data preparation is done, revisit part 1 for the code. I will simply read the previously saved data in .npy format.

In [5]:
saved_data = np.load("CNN_data.npy")
X_train = saved_data[0]
Y_train = saved_data[1]
X_test = saved_data[2]
Y_test = saved_data[3]

print("Training data has dimension: {}".format(X_train.shape))
print("Training label has dimension: {}".format(Y_train.shape))
print("Testing data has dimension: {}".format(X_test.shape))
print("Testing label has dimension: {}".format(Y_test.shape))

Training data has dimension: (5216, 64, 64, 3)
Training label has dimension: (5216, 2)
Testing data has dimension: (624, 64, 64, 3)
Testing label has dimension: (624, 2)
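
The two-column label format can be reproduced from integer labels in a couple of lines. This is only a sketch with made-up labels, not the code used to build CNN_data.npy; the variable names are my own.

```python
import numpy as np

# Hypothetical integer labels: 0 for one class, 1 for the other
labels = np.array([0, 1, 1, 0])

# Row i of the 2x2 identity matrix is the one-hot vector for class i,
# so 0 becomes [1, 0] and 1 becomes [0, 1]
one_hot = np.eye(2)[labels]
print(one_hot.shape)

# np.argmax along axis 1 recovers the integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

Taking the argmax of each row is also how we will turn the two network outputs back into a single predicted class later on.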


## TensorFlow Model

For a single data point, the convolutional neural network architecture we are implementing is best summarized in the following figure I drew.

Note that we are doing the convolutions twice, corresponding to weights $W_1$ and $W_2$ whose dimensions are $[4, 4, 3, 8]$ and $[2, 2, 8, 16]$, respectively. We use SAME padding so that each convolution preserves the spatial dimensions of its input. Each convolution is followed by a ReLU and a max-pooling step ($8\times 8$, then $4\times 4$), which shrinks the feature maps from $64\times 64$ to $8\times 8$ and then to $2\times 2$. In the final step we flatten the 3D array into a single layer of $2\times 2\times 16 = 64$ neurons. A fully connected layer maps these neurons to two outputs, and the sigmoid function is applied to them inside the cost function. To make a prediction, we can simply use tf.argmax to find the larger of these two outputs. Let's implement it below.
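As a quick sanity check, we can trace the tensor shape through the layers as they are coded in forward_propagation further down (stride-1 convolutions, then 8x8 and 4x4 max pooling, all with SAME padding). With SAME padding the output size is the input size divided by the stride, rounded up. The helper name `same_output_size` is my own, not a TensorFlow function.

```python
import math

def same_output_size(size, stride):
    """Spatial size after a SAME-padded convolution or pooling op."""
    return math.ceil(size / stride)

size, channels = 64, 3  # input images are 64x64x3

size, channels = same_output_size(size, 1), 8    # conv with W1, stride 1
size = same_output_size(size, 8)                 # 8x8 max pool, stride 8
size, channels = same_output_size(size, 1), 16   # conv with W2, stride 1
size = same_output_size(size, 4)                 # 4x4 max pool, stride 4

flattened = size * size * channels
print(size, channels, flattened)
```

The flattened layer is what gets fed into the final fully connected layer with two outputs.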

First we create placeholders for the data and the labels. We use None as the first dimension because it is reserved for the number of examples, which differs between a mini-batch and the full training or test set.

In [14]:
# Create placeholders for the data and label
def create_placeholders(n_height, n_width, n_channel, n_class):

    X = tf.placeholder(tf.float32, [None, n_height, n_width, n_channel])
    Y = tf.placeholder(tf.float32, [None, n_class])

    return X, Y


Next we will initialize the parameters $W_1$ and $W_2$ using Xavier initialization.

In [8]:
def initialize_parameters():
    """
    The weights have shapes:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    """

    tf.set_random_seed(1)

    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))

    parameters = {"W1": W1, "W2": W2}

    return parameters
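
For intuition, here is a rough NumPy sketch of what Xavier (Glorot) uniform initialization does for a convolution kernel: it samples from a uniform range whose width depends on the number of inputs and outputs of the layer. The function `xavier_uniform` is my own illustration, and it will not reproduce the exact random values that TensorFlow generates.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Sample a kernel of shape [height, width, in_channels, out_channels]."""
    receptive_field = shape[0] * shape[1]
    fan_in = receptive_field * shape[2]    # inputs feeding each output unit
    fan_out = receptive_field * shape[3]   # outputs fed by each input unit
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
W1_sketch = xavier_uniform([4, 4, 3, 8], rng)
W2_sketch = xavier_uniform([2, 2, 8, 16], rng)
print(W1_sketch.shape, W2_sketch.shape)
```

Scaling the range this way keeps the variance of the activations roughly stable from layer to layer at the start of training.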


Forward propagation and the cost function can be easily implemented in TensorFlow.

In [9]:
def forward_propagation(X, parameters, seed=1):

    tf.set_random_seed(seed)

    # Obtain the parameters as tensors
    W1 = parameters['W1']
    W2 = parameters['W2']

    # First convolution layer
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')

    # Second convolution layer
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

    # Final layer: flatten, then a fully connected layer with 2 linear outputs
    P = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(P, 2, activation_fn=None)

    return Z3

In [10]:
def compute_cost(Z3, Y):

    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Z3, labels=Y))

    return cost
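
To see what this cost computes, here is a small NumPy sketch of the sigmoid cross-entropy, written in the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)), where x is a logit and z is its label. The logits below are made up for illustration, and the function name is my own.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Element-wise loss: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

logits = np.array([[2.0, -1.0], [0.0, 0.0]])  # hypothetical network outputs
labels = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot labels
cost = sigmoid_cross_entropy(logits, labels).mean()
print(cost)
```

This is algebraically the same as -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x)), but it avoids overflow for large negative logits. Each of the two outputs is treated as its own binary problem, and the cost is the mean over all outputs and examples.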


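As usual, if we have a large dataset, we should use mini-batch gradient descent instead of computing the gradient on the full training set at every step. The helper below shuffles the data and splits it into mini-batches.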

In [11]:
def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    X: input data, of shape (m, H, W, C)
    Y: label, of shape (m, n_y)
    """

    m = X.shape[0]
    mini_batches = []
    np.random.seed(seed)

    # Shuffle the data and labels with the same permutation
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation,:,:,:]
    shuffled_Y = Y[permutation,:]

    # Divide dataset into minibatches (not including the end case)
    num_complete_minibatches = m // mini_batch_size
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:,:]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Last minibatch (smaller than mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:,:]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches
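
For our training set, the split works out as follows. This is only the arithmetic behind the function above, using the training set size printed earlier and the default batch size.

```python
m, mini_batch_size = 5216, 64  # training set size and default batch size

num_complete = m // mini_batch_size   # number of full mini-batches
remainder = m % mini_batch_size       # examples left for the last mini-batch

batch_sizes = [mini_batch_size] * num_complete
if remainder != 0:
    batch_sizes.append(remainder)

print(num_complete, remainder, len(batch_sizes))
```

Every training example ends up in exactly one mini-batch per epoch, and only the last mini-batch is smaller than the rest.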


We are ready to build our model.

In [23]:
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009, num_epochs=20, minibatch_size=64):

    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3

    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]

    costs = []

    # Create placeholders for the data and labels
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)

    # Initialize the weight parameters W1 and W2
    parameters = initialize_parameters()

    # Build the computation graph using forward propagation
    Z3 = forward_propagation(X, parameters, seed=seed)
    cost = compute_cost(Z3, Y)

    # The original cell used `optimizer` without defining it.
    # Adam is assumed here; substitute the optimizer you actually trained with.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    init = tf.global_variables_initializer()

    # Execute the computation graph
    with tf.Session() as sess:

        sess.run(init)

        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                (minibatch_X, minibatch_Y) = minibatch
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X:minibatch_X, Y:minibatch_Y})
                minibatch_cost += temp_cost / num_minibatches

            if epoch % 1 == 0:

                print ("Epoch %i - Cost: %f" % (epoch, minibatch_cost))
                print ("------------------------")

            if epoch % 1 == 0:

                costs.append(minibatch_cost)

        # Obtain the trained parameters as numpy arrays
        parameters = sess.run(parameters)

        # Plot the cost (one point per epoch)
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epoch')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Predicted and actual classes are the indices of the largest entries
        predict_label = tf.argmax(Z3, 1)
        actual_label = tf.argmax(Y, 1)

        correct_prediction = tf.equal(predict_label, actual_label)

        # Accuracy is the fraction of correct predictions
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        # Accuracy on the training and test sets
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})

        # Predicted and actual labels on the test set
        predict_label_eval = predict_label.eval({X: X_test, Y: Y_test})
        actual_label_eval = actual_label.eval({Y: Y_test})

        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return parameters, predict_label_eval, actual_label_eval

In [27]:
parameters, predicted, actual = model(X_train, Y_train, X_test, Y_test, num_epochs=6)

Epoch 0 - Cost: 0.509968
------------------------
Epoch 1 - Cost: 0.271447
------------------------
Epoch 2 - Cost: 0.191885
------------------------
Epoch 3 - Cost: 0.163414
------------------------
Epoch 4 - Cost: 0.137610
------------------------
Epoch 5 - Cost: 0.149324
------------------------

Train Accuracy: 0.92676383
Test Accuracy: 0.8717949


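Wow! We get a test accuracy of 87%, which is much better than the 75% from our previous models. The gap between training and test accuracy has also shrunk from 0.21 (0.96 vs. 0.75) to about 0.055 (0.927 vs. 0.872), so overfitting is much less of a problem. Let's look at the classification report.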

In [28]:
print(classification_report(np.squeeze(actual), np.squeeze(predicted)))

             precision    recall  f1-score   support

          0       0.88      0.76      0.82       234
          1       0.87      0.94      0.90       390

avg / total       0.87      0.87      0.87       624
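
As a consistency check, the support-weighted average of the per-class recalls should match the test accuracy reported earlier, up to the rounding in the report. The sketch below only redoes that arithmetic with the numbers from the table.

```python
# Per-class recall and support, copied from the classification report
recall = {0: 0.76, 1: 0.94}
support = {0: 234, 1: 390}

# Correct predictions per class are approximately recall * support
correct = sum(recall[c] * support[c] for c in recall)
accuracy = correct / sum(support.values())
print(round(accuracy, 2))
```

The small difference from the printed test accuracy comes from the report rounding recall to two decimals.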



For class 1, we get a 15-point boost in precision (0.72 to 0.87) in exchange for a drop of only 5 points in recall (0.99 to 0.94). This is a huge improvement over our previous models. A convolutional neural network is clearly a much better fit for image classification than the models from parts 1 and 2.