Deep Learning with Caffe

06 Oct 2015

I wanted to play around with some deep learning algorithms. After some reading, I decided to experiment with Caffe, a deep learning framework made with expression, speed, and modularity in mind.

It is fairly easy to start using deep neural nets with Caffe. There are plenty of pre-trained models that you can use right away for experiments. In this post I want to share my experience with Caffe under Ubuntu 14.04 LTS. For the installation I mainly followed this guide.

Installation

First, I installed some dependencies.

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev 
sudo apt-get install libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev gfortran
sudo apt-get install cuda python-dev libatlas-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo pip install numpy

Then I cloned Caffe from the official repository and compiled it.

git clone https://github.com/BVLC/caffe.git
cd caffe
mkdir build
cd build
cmake ..
make

Finally, I had to install some packages required by the Python module…

sudo pip install --upgrade pip
sudo pip install scipy
sudo pip install scikit-image
sudo pip install protobuf
sudo pip install pyyaml

… and I added Caffe’s python folder to the PYTHONPATH environment variable.

export PYTHONPATH=/path/to/caffe/python/
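
A quick way to confirm that the setup worked is to try importing everything from Python. The snippet below is only a minimal sketch: the helper name `missing_modules` is made up for illustration, and the list uses the import names of the packages installed above (`skimage` for scikit-image, `google.protobuf` for protobuf, `yaml` for pyyaml). The `caffe` module should only be found if PYTHONPATH points at Caffe’s python folder.

```python
import importlib

# Import names for the packages installed above; 'caffe' itself is found
# only if PYTHONPATH points at Caffe's python folder.
REQUIRED_MODULES = ['numpy', 'scipy', 'skimage', 'google.protobuf', 'yaml', 'caffe']

def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == '__main__':
    missing = missing_modules(REQUIRED_MODULES)
    print('missing modules: ' + (', '.join(missing) if missing else 'none'))
```

If `caffe` shows up as missing while everything else imports, the PYTHONPATH export is the first thing I would double check.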

Now we are ready to go… let’s run some tests!

Test

I started with this tutorial. The goal is to get a working feature extractor based on the pre-trained model from Krizhevsky et al., trained on the ImageNet dataset. We need to download the model and the labels first.

cd /path/to/caffe
#download the model
./scripts/download_model_binary.py ./models/bvlc_reference_caffenet
#download the labels
./data/ilsvrc12/get_ilsvrc_aux.sh

cd /path/to/sandbox
mkdir images

I downloaded a few pictures and added them to the images folder.
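
Before running the network it can help to confirm that the downloads landed where the classification script expects them. This is a minimal sketch: the relative paths are the ones the script in this post loads, while the helper name `missing_files` and the `/path/to/caffe/` placeholder are assumptions for illustration.

```python
import os

# Relative paths taken from the classification script in this post.
REQUIRED_FILES = [
    'models/bvlc_reference_caffenet/deploy.prototxt',
    'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
    'data/ilsvrc12/synset_words.txt',
    'python/caffe/imagenet/ilsvrc_2012_mean.npy',
]

def missing_files(caffe_root, required=REQUIRED_FILES):
    """Return the required paths (relative to caffe_root) that do not exist."""
    return [rel for rel in required
            if not os.path.isfile(os.path.join(caffe_root, rel))]

if __name__ == '__main__':
    caffe_root = '/path/to/caffe/'  # placeholder, adjust to your checkout
    for rel in missing_files(caffe_root):
        print('missing: ' + rel)
```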

I ran the feature extraction/classification with the following script. The script is really simple; I’m planning to write one that can deal with tons of images ASAP.

import numpy as np
import caffe

caffe_root = '/path/to/caffe/'  

caffe.set_mode_gpu()
#caffe.set_mode_cpu() if you don't have an NVIDIA card

net = caffe.Net(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',
                caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
transformer.set_raw_scale('data', 255)  # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0))  # the reference model has channels in BGR order instead of RGB
imagenet_labels_filename = caffe_root + 'data/ilsvrc12/synset_words.txt'
labels = np.loadtxt(imagenet_labels_filename, str, delimiter='\t')
# use a batch of 50 slots; a single image assigned to data[...] is broadcast to all of them
net.blobs['data'].reshape(50,3,227,227)

# classify each image and print its five most probable labels
for image_path in ['./images/wine.jpg', './images/dog.jpg', './images/old_camera.jpg']:
    net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(image_path))
    out = net.forward()
    # argsort is ascending, so walk backwards from the end to get the top 5
    top_k = net.blobs['prob'].data[0].flatten().argsort()[-1:-6:-1]
    print(labels[top_k])
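
A side note on the slice `argsort()[-1:-6:-1]` used in the script, since it is compact but easy to misread. The sketch below reproduces the same selection in plain Python on made-up probabilities (the values and the helper name `top_k_indices` are assumptions for illustration, not part of Caffe): sort the indices by ascending score, then walk backwards from the last one and stop after k entries.

```python
def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    # ascending argsort: indices ordered from lowest to highest score
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # start at the last (highest) index and step backwards k times
    return order[-1:-(k + 1):-1]

# made-up probabilities for seven classes
probs = [0.05, 0.40, 0.10, 0.02, 0.25, 0.15, 0.03]
print(top_k_indices(probs))  # [1, 4, 5, 2, 0]
```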

Running the script on the three images gives the following results, quite impressive ;)

['n07892512 red wine' 'n04591713 wine bottle' 'n02823428 beer bottle'
 'n04579145 whiskey jug' 'n03690938 lotion']
['n02099601 golden retriever'
 'n02102318 cocker spaniel, English cocker spaniel, cocker'
 'n02108551 Tibetan mastiff' 'n02102480 Sussex spaniel'
 'n02100877 Irish setter, red setter']
['n04069434 reflex camera'
 'n03976467 Polaroid camera, Polaroid Land camera' 'n04009552 projector'
 'n03843555 oil filter' 'n03666591 lighter, light, igniter, ignitor']
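
The script above pushes one image through the network per forward pass. For the larger image sets I mentioned, one possible approach is to collect the files in a folder and feed them in groups that match the batch size of the input blob. The following is only a rough sketch under that assumption: `list_images` and `batches` are hypothetical helpers, and the Caffe part is left as comments that reuse the `net` and `transformer` objects from the script.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def list_images(folder):
    """Return the sorted paths of the image files directly inside folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Possible use with the objects from the script (not run here):
#
# for batch in batches(list_images('./images'), 50):
#     for i, path in enumerate(batch):
#         net.blobs['data'].data[i, ...] = transformer.preprocess(
#             'data', caffe.io.load_image(path))
#     net.forward()
#     for i, path in enumerate(batch):
#         top_k = net.blobs['prob'].data[i].flatten().argsort()[-1:-6:-1]
#         print(path, labels[top_k])
```

The last batch may be shorter than 50, which is why the results are read back only for the first `len(batch)` slots; the remaining slots still hold data from the previous batch and should be ignored.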

Nicola Pezzotti

Scientist, Engineer and Leader in
Artificial Intelligence, Visual Analytics and Computer Science
