Distant Viewing with Deep Learning: Part 2

In Part 2 of this tutorial, we introduce the concepts of deep learning and show how it yields interesting similarity metrics and is able to extract useful features, such as the presence and location of faces in an image.

Step 9: Python modules for deep learning

We need to reload all of the Python modules we used in Part 1.

In [1]:
%pylab inline
import collections

import numpy as np
import scipy as sp
import pandas as pd

import importlib
import os
from os.path import join
from matplotlib.colors import rgb_to_hsv
Populating the interactive namespace from numpy and matplotlib
In [2]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
plt.rcParams["figure.figsize"] = (8,8)

We also need to reload the wikiart metadata.

In [3]:
wikiart = pd.read_csv("meta/wikiart.csv")

To run the code in this notebook from scratch, you will also need the keras module for working with neural networks. It is not included in the default Anaconda Python installation and needs to be installed separately. The code below checks whether you have keras installed. If you do, it will be loaded; otherwise, a flag is set so that the cells below that require keras fall back to pre-computed data.

In [4]:
if importlib.util.find_spec("keras") is not None:
    from keras.applications.vgg19 import VGG19
    from keras.preprocessing import image
    from keras.applications.vgg19 import preprocess_input, decode_predictions
    from keras.models import Model
    keras_flag = True
else:
    keras_flag = False
Using TensorFlow backend.

If you are struggling to install keras, we are happy to assist. You'll be able to follow along without it, but you won't be able to apply the techniques covered today to new datasets.

Step 10: Applying deep learning with neural networks

We start by loading a particular neural network model called VGG19. It contains 25 layers and over 143 million parameters. The code below reads in the entire model and prints out its structure (unless keras is unavailable, in which case a saved copy of the summary is printed for reference).

In [5]:
if keras_flag:
    vgg19_full = VGG19(weights='imagenet')
    vgg19_full.summary()
else:
    with open('data/vgg19.txt','r') as f:
        for line in f:
            print(line, end='')
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________

The VGG19 model was trained to identify 1000 classes of objects within an image. It was built as part of the ImageNet challenge, one of the most influential computer vision competitions, which has been running since 2010.

We will load a test photo of my dog and see which classes the model predicts for it. We use a slightly different function to read the image, one that rescales it to the 224-by-224 pixels the model requires.

In [6]:
img_path = join("images", "test", "dog.jpg")
if keras_flag:
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
else:
    # Without keras: read the image directly. This assumes the saved
    # test image is already 224-by-224 pixels.
    img = imread(img_path)
    x = img.copy().astype(np.float32)
    x = np.expand_dims(x, axis=0)
    
x.shape
Out[6]:
(1, 224, 224, 3)

Notice that x is now a four-dimensional array: the first axis indexes images in a batch, followed by height, width, and color channels. We will come back to this point in a moment. We can look at the image itself using the imshow function.
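Because the model always operates on batches, several images can be stacked along that first axis. A minimal sketch, using placeholder arrays in place of real photos:

# Hypothetical illustration: stack three 224x224 RGB "images" into one batch.
# These zeros/ones arrays stand in for outputs of image.img_to_array.
img_a = np.zeros((224, 224, 3), dtype=np.float32)
img_b = np.ones((224, 224, 3), dtype=np.float32)
img_c = np.full((224, 224, 3), 0.5, dtype=np.float32)

batch = np.stack([img_a, img_b, img_c], axis=0)
print(batch.shape)  # (3, 224, 224, 3): batch size, height, width, channels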

In [7]:
plt.imshow(img)
Out[7]:
<matplotlib.image.AxesImage at 0xb26f00c50>

Assuming you have keras installed, the code below takes the image x and computes predictions from the model. Notice that the output is a vector of 1000 numbers: the predicted probability that the image belongs to each of the 1000 pre-selected categories. The function decode_predictions converts these into the names of the five most likely categories.

In [8]:
if keras_flag:
    y = vgg19_full.predict(x)
    print(y.shape)
    for pred in decode_predictions(y)[0]:
        print(pred)
else:
    print((1, 1000))
    import pickle
    with open (join('data', 'dog_pred_class.pickle'), 'rb') as fp:
        results = pickle.load(fp)
    for pred in results:
        print(pred)
(1, 1000)
('n02086240', 'Shih-Tzu', 0.13935143)
('n02086079', 'Pekinese', 0.11240619)
('n02102318', 'cocker_spaniel', 0.111678)
('n02085620', 'Chihuahua', 0.100794666)
('n02086646', 'Blenheim_spaniel', 0.08471184)
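To demystify decode_predictions, here is a rough sketch of the top-k lookup it performs; class_names stands in for the index-to-name table that keras bundles with the model (a hypothetical argument here, not the real API):

def top_k_predictions(y, class_names, k=5):
    # y has shape (1, 1000): one row of class probabilities per image
    order = np.argsort(y[0])[::-1][:k]   # indices of the k largest values
    return [(class_names[i], float(y[0][i])) for i in order]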

The highest-scoring class is "Shih-Tzu", incidentally an exact match for his breed! The other predictions are all similarly sized dogs, plausible mistakes for the model to make.

Now, let's compute the category predictions for every image in the corpus. This involves reading in each image in the wikiart corpus and running it through the VGG19 model. This can take some time, particularly on an older machine, so we have created a flag called process_new. Keep it set to False to load pre-computed categories; switch it to True if you want to compute them yourself.

In [9]:
process_new = False

if process_new:
    wikiart_img = np.zeros((wikiart.shape[0], 224, 224, 3))

    for index, row in wikiart.iterrows():
        img_path = join('images', 'wikiart', row['filename'])
        img = image.load_img(img_path, target_size=(224, 224))
        x = image.img_to_array(img)
        wikiart_img[index, :, :, :] = x
        if (index % 50) == 0:
            print("Done with {0:03d}".format(index))
        
    wikiart_img = preprocess_input(wikiart_img)
    wikiart_raw = vgg19_full.predict(wikiart_img, verbose=True)
    # wrap in np.array so the list from decode_predictions has a .shape
    # matching the saved data loaded in the else branch below
    wikiart_vgg19 = np.array(decode_predictions(wikiart_raw, top=20))
    
else:
    wikiart_vgg19 = np.load("data/wikiart_vgg19_categories.npy")

print(wikiart_vgg19.shape)
(644, 20, 3)
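Each row of wikiart_vgg19 holds the top-20 (synset ID, category name, probability) triples for one painting. To peek at the five highest-scoring categories for the first image, for example:

# the first image's five highest-scoring predictions; note that values
# loaded from the .npy file are stored as strings
for synset, name, prob in wikiart_vgg19[0, :5]:
    print(name, prob)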

Which categories most often appear near the top of the predictions? We can use the Python module collections to tally the second-ranked category for each image (index 1 along the middle axis) and list the ten most common:

In [10]:
collections.Counter(wikiart_vgg19[:, 1, 1]).most_common(10)
Out[10]:
[('cliff', 56),
 ('fountain', 39),
 ('jigsaw_puzzle', 34),
 ('lakeside', 29),
 ('book_jacket', 25),
 ('castle', 20),
 ('altar', 17),
 ('valley', 16),
 ('barn', 13),
 ('cliff_dwelling', 13)]

Cliffs and fountains both seem reasonable, but I doubt there are many jigsaw puzzles in the wikiart corpus. Any idea why this might be so common?
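Note that the cell above tallied only each image's second-ranked category. A quick sketch that counts every category appearing anywhere in an image's top-20 list gives a complementary view:

# tally all categories across every image's full top-20 list
all_counts = collections.Counter(
    name for preds in wikiart_vgg19 for (_, name, _) in preds
)
all_counts.most_common(10)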

Step 11: Neural network embedding

The VGG19 model was constructed in order to predict the objects present in an image, but there is a lot more we can do with it. The remarkable property of deep learning is that the intermediate layers of the neural network detect lower-level features of the image: the first few layers detect edges and textures, the next few detect shapes, and the later ones put these together to detect objects. This is incredibly useful because it means the intermediate outputs can tell us something interesting about the images beyond the 1000 predicted categories.
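As a hedged illustration of those lower-level features, we can build a model whose output is an early convolutional layer and plot one of its activation maps; the layer name block1_conv1 comes from the summary printed in Step 10:

if keras_flag:
    # a model that stops at the first convolutional layer
    vgg_block1 = Model(inputs=vgg19_full.input,
                       outputs=vgg19_full.get_layer('block1_conv1').output)
    act = vgg_block1.predict(x)       # shape (1, 224, 224, 64)
    plt.imshow(act[0, :, :, 0])       # one of 64 low-level filter responses
    plt.axis("off")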

Assuming the keras module is installed, we will create a new model whose output is the second-to-last layer of the network (named fc2 in the summary above). Its predictions have 4096 dimensions. These dimensions do not correspond directly to categories, but (in theory) images containing similar objects should map to similar 4096-dimensional vectors.

In [11]:
if keras_flag:
    vgg_fc2 = Model(inputs=vgg19_full.input, outputs=vgg19_full.get_layer('fc2').output)
    y = vgg_fc2.predict(x)
    print(y.shape)
else:
    print((1, 4096))
(1, 4096)
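The same functional-API pattern extends to several layers at once, should you want more than one intermediate representation; a minimal sketch, assuming keras is installed:

if keras_flag:
    # one model, two outputs: the fc1 and fc2 activations
    multi = Model(inputs=vgg19_full.input,
                  outputs=[vgg19_full.get_layer(name).output
                           for name in ('fc1', 'fc2')])
    fc1_vec, fc2_vec = multi.predict(x)
    print(fc1_vec.shape, fc2_vec.shape)   # (1, 4096) (1, 4096)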

We can use this new model to compute fc2 values for the whole set of images in wikiart_img. As above, this can take a few minutes, so you may want to load the pre-saved data by keeping process_new equal to False.

In [12]:
process_new = False

if process_new:
    wikiart_fc2 = vgg_fc2.predict(wikiart_img, verbose=True)
    wikiart_fc2.shape
else:
    wikiart_fc2 = np.load("data/wikiart_vgg19_fc2.npy")

print(wikiart_fc2.shape)
(644, 4096)

Now we can use these values to find which images are most similar to a given image. This parallels the nearest-saturation search from Part 1, but with a much richer numeric representation behind the comparison. Compare the results here with those from saturation alone:

In [13]:
plt.figure(figsize=(14, 14))
plt.subplots_adjust(left=0, right=1, bottom=0, top=1)

# L1 distance between every fc2 vector and that of image 1,
# then take the twelve closest images (including image 1 itself)
dists = np.sum(np.abs(wikiart_fc2 - wikiart_fc2[1, :]), 1)
idx = np.argsort(dists.flatten())[:12]

for ind, i in enumerate(idx):
    try:
        plt.subplot(3, 4, ind + 1)
        img_path = join('images', 'wikiart', wikiart.iloc[i]['filename'])
        img = imread(img_path)
        plt.imshow(img)
        plt.axis("off")
    except OSError:
        # skip any image missing from the local copy of the corpus
        pass
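The cell above ranks neighbors by L1 (Manhattan) distance between fc2 vectors. Cosine similarity, which ignores overall activation magnitude, is a common alternative; a minimal sketch of the same search using it:

def cosine_nearest(features, i, k=12):
    # scale each fc2 vector to unit length, then rank by dot product
    # with the query vector; larger values mean more similar images
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-10)
    sims = unit @ unit[i]
    return np.argsort(-sims)[:k]

idx = cosine_nearest(wikiart_fc2, 1)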