Training a Convolutional Neural Network

Sept. 26, 2018

This is the third part of my image classification project. I have been experimenting with different models, and today I am finally able to drastically improve my model's performance.

In part 1 of the project, we used logistic regression and obtained a precision of 0.72 and a recall of 0.99. This is not bad for such a simple model: a recall of 0.99 means the false negative rate is only 0.01, so given that a patient has pneumonia, the model will almost surely find out! This is extremely important in medicine. The downside, however, is that the model makes a lot of false positive claims: with a precision of 0.72, 28% of the patients flagged as having pneumonia are actually healthy. Although a misdiagnosis in this direction is not life-threatening, a false discovery rate of 0.28 is still too high.
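
To see how these numbers relate, here is a quick sketch with hypothetical confusion-matrix counts chosen only to reproduce the part-1 metrics (the actual counts are in part 1):

```python
# Hypothetical counts chosen to reproduce the part-1 metrics (illustrative only)
tp, fp, fn = 99, 38, 1  # 99/(99+38) ~ 0.72, 99/(99+1) = 0.99

precision = tp / (tp + fp)             # fraction of "pneumonia" calls that are correct
recall = tp / (tp + fn)                # fraction of true pneumonia cases found
false_negative_rate = 1 - recall       # pneumonia cases the model misses
false_discovery_rate = 1 - precision   # healthy patients flagged as sick

print(round(precision, 2), round(recall, 2))  # → 0.72 0.99
```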

In part 2 we used a simple feed-forward neural network and obtained the same precision and recall. One major problem with both previous models is overfitting: the test set accuracy is only 0.75 even though the training set accuracy is as high as 0.96. Clearly the model has low bias and high variance. Using a simple convolutional neural network, I am able to drastically improve the model's performance. Let's get started!

Data Preparation

First let's import some libraries.

In [3]:
import numpy as np
import math
import tensorflow as tf
from tensorflow.python.framework import ops
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
%matplotlib inline
np.random.seed(1)

We are generating the data using the same method as before, with one exception. For the convolutional neural network, we need the response variable to be one-hot encoded as vectors. So instead of using train_labels.append(0), we are going to use train_labels.append([1, 0]), and similarly train_labels.append([0, 1]) in place of train_labels.append(1). If you are wondering how the data preparation is done, revisit part 1 for the code. Here I will simply read the previously saved data in .npy format.
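
As a minimal sketch of this encoding (the raw labels below are made up for illustration):

```python
import numpy as np

raw_labels = [0, 1, 1, 0]  # hypothetical binary labels: 0 = normal, 1 = pneumonia

train_labels = []
for label in raw_labels:
    if label == 0:
        train_labels.append([1, 0])  # "normal" as a one-hot vector
    else:
        train_labels.append([0, 1])  # "pneumonia" as a one-hot vector

Y = np.array(train_labels)
print(Y.shape)  # → (4, 2)
```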

In [5]:
saved_data = np.load("CNN_data.npy")
X_train = saved_data[0]
Y_train = saved_data[1]
X_test = saved_data[2]
Y_test = saved_data[3]

print("Training data has dimension: {}".format(X_train.shape))
print("Training label has dimension: {}".format(Y_train.shape))
print("Testing data has dimension: {}".format(X_test.shape))
print("Testing label has dimension: {}".format(Y_test.shape))
Training data has dimension: (5216, 64, 64, 3)
Training label has dimension: (5216, 2)
Testing data has dimension: (624, 64, 64, 3)
Testing label has dimension: (624, 2)

TensorFlow Model

For a single data point, the convolutional neural network architecture we are implementing is best summarized in the following figure I drew.

Note that we apply two convolutions, with weights $W_1$ and $W_2$ whose dimensions are $[4, 4, 3, 8]$ and $[2, 2, 8, 16]$, respectively. SAME padding preserves the 64x64 spatial dimensions through each convolution, while the two max-pooling steps (strides 8 and 4) reduce them to 8x8 and then 2x2, so flattening the final 3D array yields $2\times 2\times 16 = 64$ neurons. A fully connected layer maps these neurons to two output logits, which form the final layer; during training the logits are fed into the sigmoid cross-entropy cost. To make predictions, we can simply use tf.argmax to find the larger of the two outputs. Let's implement it below.
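
These spatial dimensions are easy to verify: with SAME padding, a pooling layer's output size is the ceiling of the input size divided by the stride. A quick sketch tracing the dimensions through the network:

```python
import math

def same_pool_out(size, stride):
    # With SAME padding, the output size is ceil(input / stride)
    return math.ceil(size / stride)

h = 64                    # input height (width is identical)
h = same_pool_out(h, 8)   # after first max pool (ksize 8, stride 8) -> 8
h = same_pool_out(h, 4)   # after second max pool (ksize 4, stride 4) -> 2
flattened = h * h * 16    # 16 channels produced by W2
print(h, flattened)  # → 2 64
```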

First we create placeholders for the data and labels. We use None as the first dimension because it is reserved for the number of training examples, which can vary.

In [14]:
# Create placeholders for the data and label
def create_placeholders(n_height, n_width, n_channel, n_class):

    X = tf.placeholder(tf.float32, [None, n_height, n_width, n_channel])
    Y = tf.placeholder(tf.float32, [None, n_class])

    return X, Y

Next we will initialize the parameters $W_1$ and $W_2$ using Xavier initialization.

In [8]:
def initialize_parameters():
    """
    The weights have shapes:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    """

    tf.set_random_seed(1)

    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))

    parameters = {"W1": W1, "W2": W2}

    return parameters
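
Xavier (Glorot) initialization scales the random weights by the number of input and output connections so that activation variances stay roughly constant across layers. Below is a minimal NumPy sketch of the uniform variant, which is what tf.contrib.layers.xavier_initializer uses by default; the fan-in/fan-out convention for conv kernels is spelled out in the comments:

```python
import numpy as np

def xavier_uniform(shape, seed=0):
    # Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out))
    # For a conv kernel [h, w, c_in, c_out]: fan_in = h*w*c_in, fan_out = h*w*c_out
    h, w, c_in, c_out = shape
    fan_in, fan_out = h * w * c_in, h * w * c_out
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.RandomState(seed)
    return rng.uniform(-limit, limit, size=shape)

W1 = xavier_uniform([4, 4, 3, 8])
print(W1.shape)  # → (4, 4, 3, 8)
```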

Forward propagation and the cost function can be easily implemented in TensorFlow.

In [9]:
def forward_propagation(X, parameters, seed=1):

    tf.set_random_seed(seed)

    # Obtain the parameter as tensors
    W1 = parameters['W1']
    W2 = parameters['W2']

    # First convolution layer
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize = [1, 8, 8, 1], strides = [1, 8, 8, 1], padding='SAME')

    # Second convolution layer
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize = [1, 4, 4, 1], strides = [1, 4, 4, 1], padding='SAME')

    # Final layer
    P = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(P, 2, activation_fn=None)

    return Z3
In [10]:
def compute_cost(Z3, Y):

    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=Z3, labels=Y))

    return cost
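
tf.nn.sigmoid_cross_entropy_with_logits applies the sigmoid and the cross-entropy in one numerically stable step: per output it computes max(z, 0) - z*y + log(1 + exp(-|z|)), which equals -y*log(sigmoid(z)) - (1-y)*log(1-sigmoid(z)) without overflow for large |z|. A minimal NumPy sketch, checked against the naive computation:

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    z, y = logits, labels
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.array([[2.0, -1.0]])  # example logits
y = np.array([[1.0, 0.0]])   # example one-hot label

cost = sigmoid_cross_entropy_with_logits(z, y).mean()

# Same value computed naively through the sigmoid
p = 1 / (1 + np.exp(-z))
naive = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
print(np.isclose(cost, naive))  # → True
```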

As usual, with a large dataset we should use mini-batch gradient descent instead. We define a helper that shuffles the data and splits it into mini-batches as follows.

In [11]:
def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    X: input data, of shape (m, H, W, C)
    Y: label, of shape (m, n_y)
    """

    m = X.shape[0]
    mini_batches = []
    np.random.seed(seed)

    # Shuffle the data
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation,:,:,:]
    shuffled_Y = Y[permutation,:]

    # Divide dataset into minibatches (not including the end case)
    num_complete_minibatches = math.floor(m/mini_batch_size)
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:,:]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Last minibatch
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:,:]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches
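
As a quick sanity check on the batching arithmetic (using hypothetical sizes): with 100 examples and a mini-batch size of 64, the loop above produces one complete batch plus a remainder batch of 36.

```python
import math

m, batch_size = 100, 64              # hypothetical dataset and batch sizes
num_complete = math.floor(m / batch_size)  # 1 full batch
remainder = m % batch_size                 # 36 examples left over

sizes = [batch_size] * num_complete + ([remainder] if remainder else [])
print(sizes)  # → [64, 36]
```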

We are ready to build our model.

In [23]:
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009, num_epochs=20, minibatch_size=64):

    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3

    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]

    costs = []

    # Create placeholder for data and label
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)

    # Initialize the weight parameters W1 and W2
    parameters = initialize_parameters()

    # Building the computation graph using forward propagation
    Z3 = forward_propagation(X, parameters, seed=seed)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    init = tf.global_variables_initializer()

    # Executing the computational graph
    with tf.Session() as sess:

        sess.run(init)

        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                (minibatch_X, minibatch_Y) = minibatch
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X:minibatch_X, Y:minibatch_Y})
                minibatch_cost += temp_cost / num_minibatches

            # Print and record the cost after every epoch
            print("Epoch %i - Cost: %f" % (epoch, minibatch_cost))
            print("------------------------")
            costs.append(minibatch_cost)

        # Obtaining parameters as numpy array
        parameters = sess.run(parameters)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_label = tf.argmax(Z3, 1)
        actual_label = tf.argmax(Y, 1)

        correct_prediction = tf.equal(predict_label, actual_label)

        # Calculating accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        # Accuracy on the training and test sets
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})

        # Obtaining predicted and actual values
        predict_label_eval = predict_label.eval({X: X_test, Y: Y_test})
        actual_label_eval = actual_label.eval({Y: Y_test})

        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return parameters, predict_label_eval, actual_label_eval
In [27]:
parameters, predicted, actual = model(X_train, Y_train, X_test, Y_test, num_epochs=6)
Epoch 0 - Cost: 0.509968
------------------------
Epoch 1 - Cost: 0.271447
------------------------
Epoch 2 - Cost: 0.191885
------------------------
Epoch 3 - Cost: 0.163414
------------------------
Epoch 4 - Cost: 0.137610
------------------------
Epoch 5 - Cost: 0.149324
------------------------
Train Accuracy: 0.92676383
Test Accuracy: 0.8717949

Wow! We get a test accuracy of 87%, which is much better than the 75% from our previous model. The overfitting problem is also greatly reduced: the gap between training and test accuracy shrinks from about 21 points to about 5. Let's look at the classification report.

In [28]:
print(classification_report(np.squeeze(actual), np.squeeze(predicted)))
             precision    recall  f1-score   support

          0       0.88      0.76      0.82       234
          1       0.87      0.94      0.90       390

avg / total       0.87      0.87      0.87       624

For the pneumonia class, we get a 15-point boost in precision (0.72 to 0.87) in exchange for only a 5-point drop in recall (0.99 to 0.94). This is a huge improvement over our previous models. A convolutional neural network is clearly a strong choice for image classification.