Deep Learning with Caffe
I wanted to play around with some deep learning algorithms. After some reading, I decided to run a few experiments with Caffe, a deep learning framework made with expression, speed, and modularity in mind.
It is fairly easy to start using deep neural nets with Caffe. There are plenty of pre-trained models that you can use right away for experiments. In this post I want to share my experience with Caffe under Ubuntu 14.04 LTS. During the installation I mainly followed this guide.
Installation
First, I installed some dependencies.
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev
sudo apt-get install libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev gfortran
sudo apt-get install cuda python-dev libatlas-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo pip install numpy
Then I cloned Caffe from the official repository and compiled it.
git clone https://github.com/BVLC/caffe.git
cd caffe
mkdir build
cd build
cmake ..
make
Finally, I had to install some packages required by the Python module…
sudo pip install --upgrade pip
sudo pip install scipy
sudo pip install scikit-image
sudo pip install protobuf
sudo pip install pyyaml
… and I added Caffe's python folder to the PYTHONPATH environment variable.
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
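A quick way to confirm that the PYTHONPATH change took effect is to try the import from Python. This is only a minimal sketch: `module_available` is a hypothetical helper name, not part of Caffe, and it only checks that the import succeeds, not that the GPU build works.

```python
def module_available(name):
    """Return True if the module called `name` can be imported."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False

# Should report True once PYTHONPATH points at Caffe's python folder
print("caffe importable: {}".format(module_available("caffe")))
```

If this reports False, the usual suspects are a typo in the exported path or a shell session that was opened before the variable was set.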
Now we are ready to go… let's run some tests!
Test
I started with this tutorial. The goal is to build a feature extractor on top of the pre-trained model from Krizhevsky et al., which was trained on the ImageNet dataset. We need to download the model and the labels first.
cd /path/to/caffe
#download the model
./scripts/download_model_binary.py ./models/bvlc_reference_caffenet
#download the labels
./data/ilsvrc12/get_ilsvrc_aux.sh
cd /path/to/sandbox
mkdir images
I downloaded a few pictures and added them to the images folder.
With the following script I ran the feature extraction/classification. The script is really simple; however, I'm planning to write a script that can deal with tons of images as soon as possible.
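Before running anything, it can help to check that the downloads above actually landed where the classification script expects them. The sketch below is an assumption-laden convenience, not part of Caffe: `missing_files` and `REQUIRED_FILES` are hypothetical names, and the paths are simply the ones the script in this post reads.

```python
import os

# Paths the classification script reads, relative to the Caffe checkout
REQUIRED_FILES = [
    'models/bvlc_reference_caffenet/deploy.prototxt',
    'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
    'python/caffe/imagenet/ilsvrc_2012_mean.npy',
    'data/ilsvrc12/synset_words.txt',
]

def missing_files(caffe_root, relative_paths=REQUIRED_FILES):
    """Return the entries of `relative_paths` that do not exist under `caffe_root`."""
    return [p for p in relative_paths
            if not os.path.isfile(os.path.join(caffe_root, p))]

# An empty list means everything is in place
print(missing_files('/path/to/caffe/'))
```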
import numpy as np
import caffe
caffe_root = '/path/to/caffe/'
caffe.set_mode_gpu()
#caffe.set_mode_cpu() if you don't have an NVIDIA card
net = caffe.Net(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',
                caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)
# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
transformer.set_raw_scale('data', 255) # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0)) # the reference model has channels in BGR order instead of RGB
imagenet_labels_filename = caffe_root + 'data/ilsvrc12/synset_words.txt'
labels = np.loadtxt(imagenet_labels_filename, str, delimiter='\t')
# we classify one image at a time, so a batch size of 1 is enough
net.blobs['data'].reshape(1,3,227,227)
for image_path in ['./images/wine.jpg', './images/dog.jpg', './images/old_camera.jpg']:
    net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(image_path))
    out = net.forward()
    # indices of the five most probable classes, best first
    top_k = net.blobs['prob'].data[0].flatten().argsort()[-1:-6:-1]
    print(labels[top_k])
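The `argsort()[-1:-6:-1]` expression in the script is what picks the five most probable classes. Here is a small NumPy-only illustration of that slice; the probabilities are made up for a toy 8-class problem and are not real network output.

```python
import numpy as np

# Made-up probabilities for a toy 8-class problem (not real network output)
probs = np.array([0.02, 0.40, 0.05, 0.25, 0.01, 0.15, 0.09, 0.03])

# argsort() orders the indices from lowest to highest probability;
# the slice [-1:-6:-1] walks backwards from the last index and takes five
top_k = probs.argsort()[-1:-6:-1]

print(top_k)         # [1 3 5 6 2]
print(probs[top_k])  # the matching probabilities, highest first
```

In the script the same slice is applied to the probability vector in the 'prob' blob, and the resulting indices are used to look up the label strings.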
And the results are the following, quite impressive ;)
['n07892512 red wine' 'n04591713 wine bottle' 'n02823428 beer bottle'
'n04579145 whiskey jug' 'n03690938 lotion']
['n02099601 golden retriever'
'n02102318 cocker spaniel, English cocker spaniel, cocker'
'n02108551 Tibetan mastiff' 'n02102480 Sussex spaniel'
'n02100877 Irish setter, red setter']
['n04069434 reflex camera'
'n03976467 Polaroid camera, Polaroid Land camera' 'n04009552 projector'
'n03843555 oil filter' 'n03666591 lighter, light, igniter, ignitor']
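Looking back at the script, the four Transformer settings boil down to a handful of array operations. The sketch below reproduces them in plain NumPy, assuming they are applied in the order transpose, channel swap, raw scale, mean subtraction, and assuming the image is already 227x227 (the resizing step is left out). `preprocess_sketch` is a hypothetical name, and the mean pixel values are illustrative stand-ins rather than the contents of ilsvrc_2012_mean.npy.

```python
import numpy as np

def preprocess_sketch(image, mean_pixel):
    """Mimic the script's Transformer settings on an H x W x 3 RGB image in [0, 1]."""
    x = image.astype(np.float32)
    x = x.transpose((2, 0, 1))           # set_transpose: H x W x C -> C x H x W
    x = x[[2, 1, 0], :, :]               # set_channel_swap: RGB -> BGR
    x = x * 255.0                        # set_raw_scale: [0, 1] -> [0, 255]
    x = x - mean_pixel.reshape(3, 1, 1)  # set_mean: subtract the per-channel mean
    return x

# Stand-in data: a random image and an illustrative BGR mean pixel
image = np.random.rand(227, 227, 3)
mean_pixel = np.array([104.0, 117.0, 123.0], dtype=np.float32)
blob = preprocess_sketch(image, mean_pixel)
print(blob.shape)  # (3, 227, 227)
```

The result has the channels-first layout that the 'data' blob expects, which is why the script can assign it directly to `net.blobs['data'].data[...]`.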