Distant Viewing with Deep Learning: Part 1

This tutorial provides a hands-on introduction to the use of deep learning techniques in the study of large image corpora. The TensorFlow and Keras libraries within the Python programming language are used to facilitate this analysis. No prior programming experience is required.

If you are seeing this text, you must have already downloaded Python and the data files. Currently, you are running an IPython notebook. This is a document that runs Python in your browser and provides space for integrating text (like this) and code (below!). The format of this tutorial involves a mixture of running code that we have provided and copying/writing your own code. Let's get started!

Step 0: Code of conduct

In order for this tutorial to be successful, we ask that participants take note of the following guidelines throughout the session:

  • This is an interactive workshop, and we expect everyone to follow along with the tutorial.
  • At the same time, please do not work ahead in the tutorial. If you are finished with a section ahead of time, you are more than welcome to hack away at our code. We find that staying together through the tutorial works best for everyone involved.
  • Hands on your own computer. Unless otherwise noted, please refrain from writing code on others' computers. You are more than welcome to explain to your neighbors what is going on in their notebooks, but we want everyone to feel comfortable working with the code themselves.

If you have any questions or concerns, please let us know!

Step 1: Goals

We have a busy tutorial planned for today. At the end, we intend for you to be comfortable with the following tasks:

  • Read images into Python and display them
  • Organize a corpus of images and their metadata
  • Extract simple numeric features from images and use these to compare images
  • Apply pre-constructed neural networks to images in order to detect objects
  • Apply pre-constructed neural networks to images to produce more elaborate comparisons
  • Detect and identify faces in images
  • Build a set of webpages to explore a corpus of images and the above metrics

Depending on our pace, we may not have time to finish all of these tasks, in which case you should be able to follow along on your own through the files provided.

Step 2: Running code in an IPython notebook

Below you will see a small snippet of Python code. The first line prints out a welcome message and the second adds together two numbers. You can run them by clicking on the code block and hitting the "Run" button towards the top of this window.

In [1]:
print("Welcome to Python!")
3 + 7
Welcome to Python!
Out[1]:
10

You can edit any of the code in this notebook. After clicking on a block, just type and edit as you would in any other online form, such as an email. Change the code above to add together the numbers 3 and 10; rerun the block to print out the new answer.

Next, we need to load several Python modules that provide functionality used throughout this tutorial. We will also set up some default parameters that make the graphical output easier to read. Make sure you run this block of code before proceeding.

In [2]:
%pylab inline

import numpy as np
import scipy as sp
import pandas as pd

import os
from os.path import join
Populating the interactive namespace from numpy and matplotlib
In [3]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

plt.rcParams["figure.figsize"] = (8,8)

This should run smoothly if you are using Anaconda Python version 3.5 or higher. If there are any errors, please raise your hand and let us know as soon as possible.

Step 3: Python arrays

Much of what we need to do in Python concerning images will involve manipulating collections of numbers, known as arrays. We will not have time to give a full introduction to arrays, but here are some of the most important things to keep in mind.

Here is a one-dimensional array, which is just a linear collection of numbers.

In [4]:
ar = np.array([4, 1, 2, 0, 7, 6, 9, 2, 8, 1, 0, 3])
ar
Out[4]:
array([4, 1, 2, 0, 7, 6, 9, 2, 8, 1, 0, 3])

We can access a particular element in the array using square brackets with the index of the element we want to access. Note, however, that Python starts counting at zero. So to get the first element we need to write ar[0]:

In [5]:
ar[0]
Out[5]:
4

The third element, similarly, can be accessed as:

In [6]:
ar[2]
Out[6]:
2

Write and run the code in the box below that accesses the 7th element of the array (which is equal to 9).

In [7]:
ar[6]
Out[7]:
9

Arrays can also be a two-dimensional grid of numbers. For example, here is an array with three rows and four columns.

In [8]:
ar = np.array([4, 1, 2, 0, 7, 6, 9, 2, 8, 1, 0, 3]).reshape((3,4))
ar
Out[8]:
array([[4, 1, 2, 0],
       [7, 6, 9, 2],
       [8, 1, 0, 3]])

To access an element in two dimensions, we need to specify the row number, a comma, and then the column number. Again, Python starts at zero.

In [9]:
ar[1, 2]
Out[9]:
9

Finally, we can access a full row or column by using a colon :, which is interpreted as selecting every element along that dimension.

In [10]:
print(ar[0, :])
print(ar[:, 2])
[4 1 2 0]
[2 9 0]

Being familiar with this notation will be useful in the following code snippets, but you will not be required to write any array-based code from scratch so do not worry if this notation is new to you.
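As a recap, the indexing and slicing rules above can be collected into one short sketch, using the same array as before:

```python
import numpy as np

ar = np.array([4, 1, 2, 0, 7, 6, 9, 2, 8, 1, 0, 3]).reshape((3, 4))

first_element = ar[0, 0]   # indices start at zero
middle_value = ar[1, 2]    # row 1, column 2
first_row = ar[0, :]       # the colon selects every column in row 0
third_col = ar[:, 2]       # ...or every row in column 2

print(first_element, middle_value)
print(first_row)
print(third_col)
```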

Step 4: Images in Python

We are now ready to read images into Python. We have several corpora that we will be working on in a few minutes, but for now let's just read in a test image I took of a teapot in my kitchen at home. To do this, we need to tell Python where the image is (it's in a directory called 'test', which is inside a directory called 'images', and the file is called 'teapot.jpg'). Once we have the filename, we can read the image into Python with the function imread as follows:

In [11]:
img = imread(join("images", "test", "teapot.jpg"))

There is now an object in Python called img that contains all of the data that describes my image of a teapot. We can have Python display the image itself by calling the function plt.imshow on the image, as follows:

In [12]:
plt.imshow(img)
Out[12]:
<matplotlib.image.AxesImage at 0x116434630>

It may not look like much, but trust me, it makes delicious tea.

How exactly is Python storing the teapot image? Understanding the internal structure of an image object will be very important for today's tutorial. We can get some idea by looking at the shape property of the image object.

In [13]:
img.shape
Out[13]:
(2016, 1512, 3)

It turns out that Python stores images as an array of numbers, but here the array has three dimensions. We can think of it as storing a color image as three grids of numbers. These numbers tell Python the degree to which each pixel should be represented by red, green, and blue light. The shape above tells us that the image is 2016 pixels high and 1512 pixels wide. The third number reminds us that the image contains three color channels: red, green, and blue.

We can print out the actual numbers in the image object, though looking at all of them would be overwhelming. Let's take a slice of the img object from 1000-1010 on the vertical axis and 600-610 on the horizontal axis.

In [14]:
print("Red:")
print(img[1000:1010, 600:610, 0])

print("Green:")
print(img[1000:1010, 600:610, 1])

print("Blue:")
print(img[1000:1010, 600:610, 2])
Red:
[[198 199 199 197 195 196 196 194 194 198]
 [193 195 191 192 198 196 193 198 198 200]
 [196 192 192 196 197 193 191 194 199 199]
 [195 193 196 198 195 192 191 191 191 193]
 [186 194 195 192 195 194 193 197 191 192]
 [190 195 194 191 193 196 197 200 197 193]
 [198 192 191 194 195 196 197 196 194 193]
 [197 189 183 192 202 198 192 196 193 195]
 [190 195 192 190 195 194 189 190 188 193]
 [191 195 195 192 190 188 188 188 189 191]]
Green:
[[21 22 22 20 18 19 19 17 17 21]
 [16 18 14 15 21 19 16 21 21 23]
 [19 15 15 19 20 16 14 17 22 22]
 [18 16 19 21 18 15 14 14 14 16]
 [ 9 17 18 15 18 17 16 20 14 15]
 [13 18 17 14 16 19 20 23 20 16]
 [24 15 14 17 18 19 20 19 17 16]
 [23 12  6 15 25 21 15 19 16 18]
 [16 21 18 16 21 22 17 18 14 19]
 [17 21 21 18 16 16 16 16 15 17]]
Blue:
[[49 50 48 46 44 45 45 43 43 47]
 [44 46 40 41 47 45 42 47 47 49]
 [47 41 41 45 46 42 40 43 48 48]
 [44 42 45 47 44 41 40 40 40 42]
 [35 43 44 41 44 43 42 46 40 41]
 [39 44 43 40 42 45 46 49 46 42]
 [49 41 40 43 44 45 46 45 43 42]
 [48 38 32 41 51 47 41 45 42 44]
 [41 46 43 41 46 46 41 42 39 44]
 [42 46 46 43 41 40 40 40 40 42]]

The numbers in the image object range from 0 to 255. The higher the number the more that color shows up in a given pixel. If all three colors are 255 that would lead to a white pixel; all three equal to 0 gives a black pixel.

Above, we see that the red values are larger than the green and blue ones. Does this make sense given the image of my teapot and the part of the image that we selected above?
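If you would like to verify the white/black rule for yourself, here is a small sketch that builds a made-up two-pixel image by hand (this tiny array is for illustration only, not part of the tutorial data):

```python
import numpy as np

# A 1x2 image: a white pixel next to a black pixel
tiny = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

print(tiny[0, 0])  # all three channels at 255 -> white
print(tiny[0, 1])  # all three channels at 0 -> black
```

Displaying this array with plt.imshow would show a white square on the left and a black square on the right.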

Your turn. Read in a similar photograph I took of a bottle of Dickel Rye whiskey. Dickel Rye is an excellent alternative to tea in the evening hours, particularly if you are trying to avoid caffeine.

In [15]:
img = imread(join("images", "test", "dickel.jpg"))

In the code block below, write and execute the code to plot the new image.

In [16]:
plt.imshow(img)
Out[16]:
<matplotlib.image.AxesImage at 0x117914ba8>

Copy and re-run the code we used to see the amount of red, green, and blue pixels used in the middle of the photo.

In [17]:
print("Red:")
print(img[1000:1010, 600:610, 0])

print("Green:")
print(img[1000:1010, 600:610, 1])

print("Blue:")
print(img[1000:1010, 600:610, 2])
Red:
[[41 43 49 54 57 63 60 48 47 59]
 [41 47 50 50 50 49 48 47 49 61]
 [51 46 48 50 45 43 45 43 51 58]
 [57 46 45 49 48 47 49 48 53 54]
 [51 50 43 44 51 49 51 62 62 52]
 [50 49 44 42 46 51 58 69 62 53]
 [54 49 49 47 42 49 60 60 52 55]
 [49 51 53 46 38 44 52 51 58 59]
 [57 54 59 68 55 37 47 59 62 54]
 [59 58 58 68 62 47 52 64 53 48]]
Green:
[[69 71 77 82 85 91 88 76 75 87]
 [69 75 78 78 78 77 76 75 77 89]
 [79 74 76 78 73 71 73 71 79 86]
 [85 74 73 77 76 75 77 76 81 82]
 [79 78 71 72 79 77 79 90 90 80]
 [78 77 72 70 74 76 83 94 87 78]
 [79 74 74 72 67 74 85 85 77 80]
 [74 76 78 71 63 69 77 76 83 84]
 [82 79 84 93 79 61 71 83 87 79]
 [84 83 83 93 87 71 76 88 78 73]]
Blue:
[[47 49 55 60 63 69 66 54 53 65]
 [47 53 56 56 56 55 54 53 55 67]
 [57 52 54 56 51 49 51 49 57 64]
 [62 51 51 55 54 53 55 54 59 60]
 [56 55 48 50 57 55 57 68 68 58]
 [55 54 49 48 52 55 62 73 66 57]
 [57 52 52 51 46 53 64 64 56 59]
 [52 54 56 50 42 48 56 55 62 63]
 [61 57 62 71 57 39 49 61 65 57]
 [63 62 61 71 65 49 54 66 56 51]]

Which color is the most dominant? Is it the one you would expect? How do these colors compare to the ones we saw for the teapot image?

Step 5: Describing images numerically

As we saw, images in Python are represented by large tables of numbers. However, the values for an individual pixel are not particularly meaningful. It is only the image as a whole that holds a larger meaning to us. At the heart of this tutorial is finding ways to bridge the gap between numeric data and visual meaning. As a warning going forward: this is not an easy or entirely solved problem, so do not get discouraged by our first few attempts.

We have already seen that we can make some sense of an image by looking at the relative amount of red, green, and blue pixels that it uses. We can't look at every single pixel, however. Another strategy would be to take the average of the three color channels. Reading the teapot image back in, we can see this with the following code:

In [18]:
img = imread(join("images", "test", "teapot.jpg"))

print("\nRed mean:")
print(np.mean(img[:, :, 0]))
        
print("\nGreen mean:")
print(np.mean(img[:, :, 1]))
        
print("\nBlue mean:")
print(np.mean(img[:, :, 2]))
Red mean:
195.17275486583523

Green mean:
144.05899070662215

Blue mean:
139.65847328514738

Again, we see that this shows that the teapot has a lot of the color red. How does this work for the bottle of rye whiskey?

In [19]:
img = imread(join("images", "test", "dickel.jpg"))

print("\nRed mean:")
print(np.mean(img[:, :, 0]))
        
print("\nGreen mean:")
print(np.mean(img[:, :, 1]))
        
print("\nBlue mean:")
print(np.mean(img[:, :, 2]))
Red mean:
157.2604580026455

Green mean:
156.9604513101537

Blue mean:
139.72969583280843

Here, the amount of red has decreased and the amount of green has increased. However, we see that there is still a lot of all three colors here. Why is that? For one thing, the image has a lot of white in the background and all three colors need to light up for this to be visible. Because of this, the absolute intensity of each pixel is often not very useful on its own.
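We can check this intuition on a made-up example: a synthetic image that is almost entirely white will show high means on all three channels, no matter what color the remaining pixels are.

```python
import numpy as np

# A made-up 10x10 image: white background with a single dark red pixel
white_img = np.full((10, 10, 3), 255, dtype=np.uint8)
white_img[5, 5] = [150, 10, 10]

red_mean = np.mean(white_img[:, :, 0])
green_mean = np.mean(white_img[:, :, 1])
blue_mean = np.mean(white_img[:, :, 2])

print(red_mean, green_mean, blue_mean)  # all three means stay close to 255
```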

As an alternative, let's compute a new table of numbers img_maxcol showing the maximum value of the three color intensities for each pixel. Similarly, we will compute img_mincol as the minimum color intensity for each pixel. Notice that these have the same number of rows and columns as the original image.

In [20]:
img = imread(join("images", "test", "teapot.jpg"))

img_maxcol = np.amax(img, 2)
img_mincol = np.amin(img, 2)
print(img_maxcol.shape)
print(img_mincol.shape)
(2016, 1512)
(2016, 1512)

We can define a quantity called the saturation, which measures the richness of a color, as the difference between the maximum and minimum pixel intensities divided by the maximum intensity. Applying this to the teapot:

In [21]:
img_sat = (img_maxcol - img_mincol) / img_maxcol
plt.imshow(img_sat, cmap='gray')
Out[21]:
<matplotlib.image.AxesImage at 0x1179e6358>
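To see the saturation formula in action on a single pixel, here is a small sketch with made-up values. Note that NumPy's division of integer arrays yields floats, and a pure black pixel (maximum intensity 0) would trigger a divide-by-zero warning, so code intended for arbitrary images may want to guard against that case:

```python
import numpy as np

# A one-pixel image with a rich, reddish color (values made up)
pixel = np.array([[[200, 50, 100]]], dtype=np.uint8)

maxcol = np.amax(pixel, 2)          # 200
mincol = np.amin(pixel, 2)          # 50
sat = (maxcol - mincol) / maxcol    # (200 - 50) / 200 = 0.75

print(sat[0, 0])
```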