Christian’s Blog

Data analysis for drug/ alcohol incidents

2019-01-12T00:00:00+01:00

As part of the Coursera course Communicating Data Science Results I want to present my assignment in this blog post. The aim of the assignment is to analyze and visualize crime incident data for the cities Seattle and San Francisco.

Note: Instead of the provided small subset (summer 2014) of crime data I used a full year worth of data from the original data sources (year: 2017):

The actual jupyter notebooks are located at my github

Report

The following analysis reports on alcohol and narcotic related offenses as reported by the San Francisco and Seattle police departments for the year 2017. The most common offenses reported for San Francisco are “Larceny/Theft”, “Other offences”, and “Assault”, whereas the top three for Seattle are “Burglary”, “Car Prowl”, and “Other Property”. However, it has to be noted that the two data schemas are not fully compatible and thus a class mismatch is likely (also, the summary classes are not disentangled). For the preprocessing carried out prior to this analysis and the individual code that produced these plots please see the jupyter notebook here. The alcohol and narcotic related offenses rank on position 10, 20, and 23 (San Francisco) and 11, 22, and 32 (Seattle), respectively (see Fig. 1).

Fig.1: Total number of offenses for San Francisco and Seattle (year: 2017, classes partly matched)

Seasonality of crimes/ offenses

Next, the temporal occurrence of these offenses is investigated. As shown in Fig. 2 the number of recorded incidents varies over the weekdays (note that the number of narcotics incidents is scaled by a factor of 10 for visual reasons). Alcohol related offenses are clearly most common on weekends in both cities. In contrast, narcotics reports are lower on weekends in both cities.

Fig.2: Drug/ alcohol related offenses per weekday.

This effect can also be clearly seen in a work-day/ non-work-day plot (Fig. 3). Here, the offenses were grouped into work-days and non-work-days (the non-work weekend ranges from Friday after 8pm till Monday 4am; US national holidays are also considered non-work-days). The number of incidents in non-work-day episodes is clearly higher for alcohol related offenses but about 20-25% lower for narcotic related offenses.
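The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from the notebook: the function name `is_non_workday` and the `holidays` argument (a set of `datetime.date` objects) are assumptions made for this example.

```python
from datetime import date, datetime


def is_non_workday(ts, holidays=frozenset()):
    """Return True if the timestamp falls into a non-work-day episode.

    Weekend rule from the text: Friday after 8pm until Monday 4am.
    `holidays` is a hypothetical set of datetime.date objects.
    """
    if ts.date() in holidays:
        return True
    weekday, hour = ts.weekday(), ts.hour  # Monday == 0 ... Sunday == 6
    if weekday == 4:        # Friday: the weekend starts at 8pm
        return hour >= 20
    if weekday in (5, 6):   # Saturday and Sunday
        return True
    if weekday == 0:        # Monday: the weekend ends at 4am
        return hour < 4
    return False


# Friday 2017-01-06, 9pm, already counts as weekend
print(is_non_workday(datetime(2017, 1, 6, 21, 0)))
```

Applying such a function to every incident timestamp gives the two groups shown in the plot.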

Fig.3: Offenses split into work- and non-work-days

In addition to the weekly cycle, a diurnal variability of reports can be detected. In Fig. 4 the normalized number of offenses is plotted for 24-hour bins. It can be observed that “Driving under the Influence” is substantially higher from 8pm to 5am in both cities. “Liquor violation” also increased with day time (San Francisco), but the small number of incidents and the inconclusive trend for Seattle point at a schema mismatch for this class in this city. “Narcotics” incidents are dominant at daytime in both cities, with a small secondary peak from 8pm to 11pm in Seattle.
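As a rough sketch of the hourly binning: the snippet below counts incidents per hour of day and turns the counts into shares of the total. Dividing by the total is only one possible reading of "normalized" (the notebook may normalize differently), and the function name `hourly_shares` is made up for this example.

```python
from collections import Counter


def hourly_shares(hours):
    """Share of incidents per hour of day (0-23); the shares sum to 1."""
    counts = Counter(hours)
    total = sum(counts.values())
    # hours without incidents get a share of 0.0
    return [counts[h] / total if total else 0.0 for h in range(24)]


# four made-up incident hours: two at 10pm, one at 11pm, one at 3am
shares = hourly_shares([22, 22, 23, 3])
print(shares[22])
```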

Fig.4: Offenses split into hourly intervals

A classification in day/ night statistics (night: 9pm - 6am) further illustrates the strong difference in occurrence frequency for day and night hours (Fig. 5).
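The day/ night split boils down to a simple hour check. A minimal sketch, assuming that 9pm is the first night hour and 6am the first day hour (the exact boundary handling in the notebook is not stated); `is_night` and `split_day_night` are illustrative names.

```python
def is_night(hour, night_start=21, night_end=6):
    """True for hours from night_start (9pm) up to, but excluding, night_end (6am)."""
    return hour >= night_start or hour < night_end


def split_day_night(hours):
    """Count incidents in day and night hours."""
    night = sum(1 for h in hours if is_night(h))
    return {"day": len(hours) - night, "night": night}


print(split_day_night([2, 10, 14, 22, 23]))
```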

Fig.5: Offenses split into day and night hours

Location of crimes/ offenses

In addition to the temporal variability, the offenses also vary by their location. To illustrate this, incidents were mapped by their reported geographic coordinates.

In Fig. 6 & 7 day and night incidents for the three offenses are mapped (light colors: day, darker colors: night). It is clear that “Narcotics” dominate in the CBD area of San Francisco, whereas the other offenses are more widely scattered.

Fig.6: Location of offenses in San Francisco

In Seattle, “Driving under the Influence” and “Narcotics” dominate in the CBD and the few reported incidents of “Liquor Violation” also occur in this area, although the small sample size makes a good interpretation difficult.

Fig.7: Location of offenses in Seattle

Summary

It was shown that the time and location of drug and alcohol related incidents vary strongly for both San Francisco and Seattle. However, they often follow similar patterns. A more in-depth analysis is hampered by the different data structures, and a more thorough feature mapping is required for more advanced analytics.

Nick that formula like a Pro

2019-01-09T00:00:00+01:00

Ok, this is going to be a shorty. I recently needed some nice formulas from a scientific paper for a presentation and usually I try to just copy them from the source PDF since Keynote.app is pretty good at incorporating high-quality PDF snippets into a slide.

However, once you want to edit the formula (to match symbols or variable names) you have to tediously compose it by hand. For Keynote people like myself there has also been the option since mid 2018 to just enter LaTeX or MathML code and have it render beautifully on the slides. For Word or Powerpoint folks adding formulas usually means using the clunky Microsoft Equation Editor. However, it seems Office 365 now supports entering formulas using LaTeX syntax, too, but I’m still on Office 2016 here.

Anyways, depending on your LaTeX and math skills bigger or more fancy equations still require you to put quite some effort into composing the actual code that gives you the nice formula.

But fear not - with MathPix this workflow is amazingly simple (not endorsed by the developers, I just discovered their tool recently and love it). Their page probably says it best: “Take a screenshot of math and paste the LaTex into your editor, all with a single keyboard shortcut.” Nice! Oh, and it’s free!

The app uses OCR to trace the equation and convert it into LaTeX formula syntax which you can paste into any editor or app that can deal with it. And that’s it really.

MathPix: Take a screenshot of math and paste the LaTeX into your editor, all with a single keyboard shortcut

Steps

Grab the app from here: Mac, Win, Ubuntu
Open your source PDF
Hit Ctrl + ⌘ + M (Mac) or Ctrl + Alt + M (Win/ Linux)
Select the formula you want to extract by drawing a bounding box over it
Pick one of the suggested formats
Paste it wherever you need it
Done

Some examples

Here are some tests with various equations. The numbers indicate the source (1), the MathPix detection (2) and the final rendered equation in Keynote (3). The red marker indicates the equation used in the example.

Simple stuff: a sigmoid function

No problem at all (not shown).

More advanced: regularized loss function of a NN

An example from a scientific paper: an L1 loss function regularization (source: He & Yokoya (2018), ISPRS Int. J. Geo-Inf. 7, 389; doi:10.3390/ijgi7100389)

Advanced but low quality

A quick test how a low res equation scan works out (go to source for actual resolution).

Some random equation from the web from a low quality scan (source)

Matrix stuff: detection ok, but Keynote bails

MathPix manages to detect this formula but Keynote gives an error when rendering it.

A rotation matrix from wikipedia (source)

Too hard: big Matrix

Ok, this definitely seems to be too hard for the little app - MathPix bails.

Description of neural net weight (source)

Summary

A really neat tool that stays in my menubar for sure!

Deep Learning for Remote Sensing Applications

2019-01-04T00:00:00+01:00

Happy New Year everybody.

I started this year off with a paper presentation for the TWiML&AI EMEA Meetup titled “Deep Learning for Remote Sensing Applications”. It gives a brief intro to optical and radar-based remote sensing and presents a deep learning application for generating optical images based on radar and previous time-step images by He and Yokoya (2018). It’s a nice application illustrating the use of CNN and cGAN and data fusion based on auxiliary radar images to generate optical images where the originals might be compromised by clouds or missing data.

Deep Learning for Remote Sensing Applications

For your convenience the open-access paper discussed can be found here, the slides can be downloaded here, and the recording of the meetup is provided here (talk starts at about 16:12min in).

Deployment for cheapskates

2018-11-08T00:00:00+01:00

Ok, so Heroku is a nice but kind of expensive PaaS solution for hobby projects ($7/month per dyno). They have a free tier but apps there are deployed on-demand and thus have a nasty start-up delay. In addition, they will autosleep after 30min. As an alternative, let’s use Dokku, the “Docker powered mini-Heroku”. They also have another nice slogan: “Own Your PaaS. Infrastructure at a fraction of the cost”. Sounds good to me. In essence, Dokku gives you your own Heroku. After installation you can push Heroku-compatible applications via git. They autobuild using Heroku buildpacks and then run in isolated containers.

Meet dokku: The smallest PaaS implementation you’ve ever seen

To host Dokku, there are multiple options and people seem to like Digital Ocean who provide a Dokku droplet so you can start deploying pretty much right away. But we want to go even cheaper (cheapskate, you know)! As an alternative, I will present a solution that’ll cost you about 3 bucks a month and you can potentially host multiple apps with this (depending on your RAM and disk requirements).

Start a Virtual Private Cloud instance

We need a virtual server/ Virtual Private Cloud (VPC) instance to install Dokku. I chose a German hosting company called Hetzner. I use the base-level CX11 vCPU instance that features 1 vCPU, 2GB of RAM and 20GB NVMe SSD and 20TB of traffic. Then I choose the linux system for the instance (I opted for Ubuntu 18.04 LTS). Next I create a passwordless ssh key and add the public key to the instance.

Install Dokku

Now we ssh into the instance and first set the hostname (replace the IP and hostname with your info).

echo "100.100.100.100 dokku.mydomain.net dokku" >> /etc/hosts

We also add the dokku repository to the system package control (incl. GPG keys).

wget -nv -O - https://packagecloud.io/gpg.key | apt-key add -
echo "deb https://packagecloud.io/dokku/dokku/ubuntu/ bionic main" \
    > /etc/apt/sources.list.d/dokku.list

Then we proceed to update the packages and install docker dependencies.

sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates \
	curl software-properties-common

Now we add another repository (this time for docker):

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable"

Finally we install docker-ce and dokku to the system.

# When asked, select YES to enable web setup
sudo apt-get update
sudo apt-get upgrade

sudo apt-get install docker-ce
sudo apt-get install dokku

You should now also have a new user and group dokku in the system:

id dokku
# uid=1000(dokku) gid=1000(dokku) groups=1000(dokku),4(adm),999(docker)

If all went well the service should be running by default after the installation (service name: dokku-installer.service). To check type this into your terminal:

systemctl status dokku-installer.service
systemctl is-enabled dokku-installer.service

Finish the installation by adding core dependencies:

dokku plugin:install-dependencies --core

On the machine use the dokku command to start and stop apps, see logs and configure things. See the help pages here for details.

Deploy an app

Dokku relies on git for deployment. First, make sure you have a local repository setup on your local machine (i.e. git init, …). Then you need to add a deployment remote where the app will be pushed to (this is your dokku server you just set up). In the command below, my-app will also be used by dokku to create your app subdomain when deployed. Thus, the command will host your app at my-app.mydomain.net . With git push you simply deploy and trigger a rebuild if the app already exists. Done.

git remote add dokku dokku@mydomain.net:my-app
git push dokku master

Note: For deploying multiple apps you also want a domain hoster that allows you to set wildcard DNS records. I started using Porkbun for hosting my apps at http://cwerner.ai (note that this address redirects, to see a live app go here). They are cheap, offer good features (wildcard DNS entries included), and people seem to like them.

A wildcard DNS record looks like this (replace the ip-address with your IP from your VPC instance):

A   *.mydomain.net  100.100.100.100

Pitfalls

There are some things that you should watch out for.

If you are ever stuck with this error message

 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'dokku@mydomain.net:my-app'

There are a couple of things you should check:

Insufficient memory

Check that your server’s disk and memory are sufficient. I discovered that the Ubuntu 18.04 image has only a small swapfile allocated so I manually increased that to 4GB just to be sure.

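# increase swapfile to 4G
sudo swapoff -a
# 4096 blocks of 1 MiB each = 4 GiB
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
# restrict access to root before enabling the swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile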

Insufficient disk space

Docker can really fill up your hard drive. So if you experiment a lot it might also be a good idea to clean up old stuff like so:

Delete volumes:

# https://github.com/chadoe/docker-cleanup-volumes
docker volume rm $(docker volume ls -qf dangling=true)
docker volume ls -qf dangling=true | xargs -r docker volume rm

Delete images:

# http://stackoverflow.com/questions/32723111/how-to-remove-old-and-unused-docker-images
docker images
docker rmi $(docker images --filter "dangling=true" -q --no-trunc)

docker images | grep "none"
docker rmi $(docker images | grep "none" | awk '/ / { print $3 }')

Delete containers:

# http://stackoverflow.com/questions/32723111/how-to-remove-old-and-unused-docker-images
docker ps
docker ps -a
docker rm $(docker ps -qa --no-trunc --filter "status=exited")

If this stuff does not help you might need to upgrade your machine or attach a volume to your instance to offload stuff.

Conclusion

I think this is a neat way to bring apps to life for cheap. Plus, you also learn a bit about PaaS and devops along the way. I will illustrate how to actually build a python based webapp in another post soon.

Guitar classification revisited

2018-10-29T00:00:00+01:00

In a previous post I created a guitar model classifier that was capable of discriminating between two iconic guitars (the Gibson Les Paul and the Fender Stratocaster). Using a ResNet-34 architecture and the fastai v0.7 deep learning python package I created a model that could predict the right class with 95.4% accuracy.
In a next step I wanted to expand this to a multi-class classification. Since then, a new version of fastai launched that builds on the pre-release of pytorch 1.0 and features a pretty different architecture but many substantial improvements. The following results are based on version 1.0.26 (but please check for updates, the library is currently evolving basically on a daily basis).

Fast.ai v1 quick intro

One of the greatest improvements in version 1.0 is a proper documentation page. Due to the rapid development it, too, changes basically daily, but most foundational APIs seem to be in place now. One noteworthy thing is that the documentation is built from Jupyter notebooks and thus all code can be run from the doc_src directory at their GitHub repository. fastai v1 consists of four applications: vision (image classification, segmentation, etc.), text (natural language processing), tabular (structured data), and collab (collaborative filtering) [see Figure 1]. Apart from collab, all applications are structured into transformations (data pre-processing, data augmentation, tokenization, etc.), data (dataset classes for the specific use case), and models (actual model architectures). Data and models are combined into a learner.

General structure of fastai v1 (source: docs.fast.ai).

One big change occurred at version 1.0.22 (I think, things move fast) when the data block API was introduced. I will use this API in this post instead of the older dedicated vision tools as it’s a more generic option and translates nicely also into the other fastai components tabular, text, and collab. In general, you specify the following components that together are used to create a DataBunch - the container that holds training, validation and test data as well as information about data augmentation.

Building a dataset with fastclass

Before we start we need a dataset of images. You can use one of the provided datasets, get your data from kaggle, google, or build your own. I recently wrote the small toolkit fastclass to make the process of downloading and cleaning a custom dataset easier (see blog post and GitHub repo).

Note: I provide the full jupyter notebook here. The dataset can be downloaded from within the notebook off a dropbox link. The guitar dataset consists of approx. 8500 images from 11 different guitar classes (five Fender models and six Gibsons).

Starting: ResNet-34

Using the new data block API we build a DataBunch from the images in our dataset. Since we reuse this bit of code later I divided it into two parts (training and validation data split; image transformation):

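# fastai v1 convention: star-import the vision application
from fastai.vision import *
import pathlib

# Since src will be reused later and we need to have the same
# images in train and validation sets to avoid data leakage
# (classes is assumed to be a list of the class folder names defined beforehand)
src = (ImageItemList.from_folder(pathlib.Path.cwd()/'data/guitars')
           .random_split_by_pct()
           .label_from_folder(classes=classes))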

# Return a DataBunch with specified image and batch
# size
def get_data(src, sz=224, bs=64):
    """get new databunch with requested resolution"""
    return (src.transform(get_transforms(do_flip=False), size=sz)
               .databunch(bs=bs)).normalize(imagenet_stats)

# example: get a databunch with images of size 299x299 and
#          a batch size of 32
data = get_data(src, sz=299, bs=32)

To have a quick look we can display a sample of the data with:

data.show_batch(rows=3, figsize=(8,6))

A batch of images from the dataset.

To get started we do not train a model from scratch as a model pretrained on a large image dataset is always preferable to learning from random weight initializations. Using transfer learning we can leverage the substantial compute efforts that went into an existing model (ResNet-34 in this case, trained on > 1 million ImageNet images). We will reuse this knowledge and replace the head of the model with a new set of fully-connected layers dedicated to our classification task.
We start with the ResNet-34 base model and a DataBunch containing images of size 224x224 (bs=64). To track our progress we specify the error_rate as a metric. First, we run the learning rate finder to determine the optimal learning rate to improve our model quickly.

learn = create_cnn(data, models.resnet34, path='.', metrics=error_rate)
learn.lr_find(); 
learn.recorder.plot()

Using the learning rate finder to determine the optimum learning rate.

As is clear from the plot, we want to find a learning rate that gives us the smallest loss while making the biggest steps in the feature space. As a rule of thumb we thus find the lowest point on the curve before the loss shoots up again and go one order of magnitude to the left (0.01 in this case). We then train the model for five cycles using the fit_one_cycle() method. The one cycle policy is a great technique for setting the hyper parameters (learning rate, momentum and weight decay) in a way that trains complex models fast and efficiently (it’s the standard approach in fastai). First, we want the biggest possible learning rate (determined by lr_find()) to explore the feature space efficiently. Second, the learning rate changes in a cycle from a low value (10 times lower than the lr_find() result) up to the maximum and then back down again. It was observed that the high learning rates in the middle of a cycle also act as a regularization method that prevents overfitting. In addition, the momentum of the stochastic gradient descent (SGD) is altered in an anti-cyclical pattern.
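To make the shape of such a cycle more tangible, here is a simplified, library-free sketch. It uses a symmetric linear ramp; fastai’s own scheduler differs in its details, so treat this only as an illustration of the idea. The function name `one_cycle` and the momentum bounds (0.95 and 0.85) are assumptions made for this example.

```python
def one_cycle(step, total_steps, lr_max, div=10.0, mom_max=0.95, mom_min=0.85):
    """Return (learning_rate, momentum) for a step in [0, total_steps - 1].

    Simplified triangular schedule: the learning rate climbs linearly from
    lr_max / div to lr_max at the midpoint and falls back again, while the
    momentum moves in the opposite direction.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    half = (total_steps - 1) / 2
    # position within the cycle: 0 at both ends, 1 at the midpoint
    pos = 1 - abs(step - half) / half
    lr_min = lr_max / div
    lr = lr_min + pos * (lr_max - lr_min)
    mom = mom_max - pos * (mom_max - mom_min)
    return lr, mom


# an 11-step cycle around the learning rate picked from the plot
schedule = [one_cycle(s, 11, lr_max=0.01) for s in range(11)]
for lr, mom in schedule:
    print(f"lr={lr:.4f}  momentum={mom:.3f}")
```

The learning rate peaks in the middle of the cycle exactly where the momentum is lowest, which is the anti-cyclical pattern mentioned above.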

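lr = 0.01
learn.fit_one_cycle(5, slice(lr))
# save the trained head so it can be reloaded before fine-tuning
learn.save('guitars-v1-11cl-res34-224px-01')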

After only 4:34 min on a K80 GPU we already have a model capable of predicting the right guitar model from a set of eleven classes with 95.1% accuracy!

Total time: 04:34
epoch  train_loss  valid_loss  error_rate
1      0.928203    0.450700    0.162353    (00:57)
2      0.536406    0.311858    0.098824    (00:54)
3      0.347128    0.200679    0.065882    (00:54)
4      0.225095    0.175412    0.053529    (00:54)
5      0.162299    0.157303    0.049412    (00:54)

We save the model and proceed to improve it by fine-tuning also the lower layers in the architecture (up till now we only trained the new head of the model). First, we unfreeze the model (now all weights will be trained) and run the learning rate finder again to determine the optimal learning rate.

learn.load('guitars-v1-11cl-res34-224px-01')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

We set the learning rate to 1e-05 for the lower layers of the model and 0.005 for the head (this is called a discriminative learning rate) as we do not want to destroy the learned features in the lowest layers. Those detect simple features (edges, gradients, simple patterns) that should be pretty universal for all kinds of images. We got them from the pretrained model for free and they are based on the model learning from millions of images. After five more cycles (another 6:20min of training) we end up with a model that can predict with 97.3% accuracy.
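To illustrate what a discriminative learning rate means in numbers, the following sketch spreads the rates between the two values over the layer groups of a model. As far as I understand, this is roughly what happens when a slice(low, high) is passed to fit_one_cycle(), but the geometric spacing, the three layer groups and the helper name `layer_group_lrs` are assumptions made for this example.

```python
def layer_group_lrs(lr_low, lr_high, n_groups):
    """Spread learning rates geometrically from lr_low (earliest layers) to lr_high (head)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [lr_high]
    # constant factor between the learning rates of neighbouring groups
    ratio = (lr_high / lr_low) ** (1 / (n_groups - 1))
    return [lr_low * ratio ** i for i in range(n_groups)]


# values from the text: 1e-05 for the lowest layers, 0.005 for the head
lrs = layer_group_lrs(1e-5, 0.005, 3)
for i, lr in enumerate(lrs):
    print(f"layer group {i}: lr={lr:.2e}")
```

The earliest layers barely move, the middle group gets an intermediate rate, and only the head is trained with the full 0.005.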

When we inspect the confusion matrix of the model, we can see where the model gets it wrong.

# plot a confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(8,6))

# show largest classification errors
display(interp.most_confused(min_val=2))

Confusion matrix for our customized ResNet-34 model.

It seems the model has a hard time differentiating between a Fender Jaguar and Jazzmaster (who wouldn’t - they are super similar). Ditto for the Gibson ES and Les Paul (here, special models exist that borrow features from the other guitar ranges, e.g. f-holes, pickup configurations, …).
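The idea behind listing the most confused class pairs can be sketched without any deep learning library: count every (actual, predicted) pair that differs and sort by frequency. This is a plain-Python illustration, not fastai’s implementation, and the labels below are made up.

```python
from collections import Counter


def most_confused(y_true, y_pred, min_val=1):
    """Return (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return [(t, p, n) for (t, p), n in pairs.most_common() if n >= min_val]


# toy labels: the jaguar is mistaken for a jazzmaster twice
y_true = ["jaguar", "jaguar", "jazzmaster", "lespaul", "es", "jaguar"]
y_pred = ["jazzmaster", "jazzmaster", "jaguar", "lespaul", "lespaul", "jaguar"]
print(most_confused(y_true, y_pred, min_val=2))
```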

Level up: ResNet-50

While this result is already quite impressive, we so far only used a relatively simple model architecture. We now progress to ResNet-50, which features substantially more layers and thus weights that can potentially learn more features of our data. To not exceed our GPU memory we have to reduce the batch size now from 64 to 32.

First, we build a new DataBunch with the same train/ validation split but the smaller bs=32. We then create a new model based on the ResNet-50 architecture and run our learning rate finder again (the optimum learning rate seems to be 0.01). We immediately train the model for five cycles.

data = get_data(src, sz=224, bs=32)

learn = create_cnn(data, models.resnet50, path='.', metrics=error_rate)
learn.freeze()
learn.lr_find(); 
learn.recorder.plot()

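lr = 0.01
learn.fit_one_cycle(5, slice(lr))
# save the trained head so it can be reloaded before fine-tuning
learn.save('guitars-v1-11cl-res50-224px-01')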

Then, we again train the entire model architecture with discriminative learning rates:

learn.load('guitars-v1-11cl-res50-224px-01')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(5, slice(1e-6, lr/5))
learn.save('guitars-v1-11cl-res50-224px-02')

After these 2x5 cycles we now have an accuracy of 98%.

Total time: 10:28
epoch  train_loss  valid_loss  error_rate
1      0.591651    0.319690    0.105294    (02:11)
2      0.461109    0.398586    0.115294    (02:04)
3      0.292784    0.192599    0.067647    (02:04)
4      0.178708    0.128503    0.041176    (02:04)
5      0.098652    0.102441    0.033529    (02:03)

Total time: 13:47
epoch  train_loss  valid_loss  error_rate
1      0.122527    0.123221    0.032353    (02:46)
2      0.131352    0.129197    0.040588    (02:45)
3      0.084470    0.085018    0.028235    (02:45)
4      0.055003    0.071305    0.022353    (02:45)
5      0.035648    0.065091    0.020000    (02:45)

Progressive resizing

In order to improve the model even more, we now use a technique called progressive resizing. We feed the model larger versions of our images (448x448px instead of the previous 224x224) and again reduce our batch size (bs=16).

# load the previous model version from storage
learn.load('guitars-v1-11cl-res50-224px-02')

# feed the new data (448x448px)
learn.data = get_data(src, sz=448, bs=16)
learn.freeze()

The learning rate finder tells us to use a maximum learning rate of 0.001 and thus we train the head of the model for five cycles.

lr = 0.001
learn.fit_one_cycle(5, slice(lr))
learn.save('guitars-v1-11cl-res50-448px-01')

With the bigger architecture and substantially larger images we now have to wait for 38 minutes.

Total time: 37:46
epoch  train_loss  valid_loss  error_rate
1      0.183134    0.086426    0.028235    (07:42)
2      0.099537    0.067973    0.020588    (07:30)
3      0.091131    0.060259    0.015294    (07:31)
4      0.062417    0.050117    0.013529    (07:30)
5      0.049533    0.048065    0.013529    (07:31)

However, as you can see the accuracy of the model improved drastically! Compared to the previous model, we now have an accuracy of 98.6% (a relative error rate improvement of about 30%!). Again, we also train the full model.
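For reference, the arithmetic behind these two numbers, using the final error rates from the training logs (0.020000 for the fine-tuned 224px model, 0.013529 after training the head on 448px images). The helper name is made up for this example.

```python
def relative_error_reduction(err_before, err_after):
    """Fraction by which the error rate dropped relative to the earlier model."""
    return (err_before - err_after) / err_before


err_224 = 0.020000  # final error_rate of the fine-tuned 224px model
err_448 = 0.013529  # final error_rate after training the head on 448px images

accuracy = 1 - err_448
reduction = relative_error_reduction(err_224, err_448)
print(f"accuracy: {accuracy:.1%}, relative error reduction: {reduction:.0%}")
```

The exact value works out to roughly 32%, which is in line with the rounded figure quoted above.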

learn.load('guitars-v1-11cl-res50-448px-01')
learn.unfreeze()
learn.fit_one_cycle(5, slice(1e-06, lr/5))

This takes even longer (49:52min):

Total time: 49:52
epoch  train_loss  valid_loss  error_rate
1      0.067349    0.044458    0.015294    (10:02)
2      0.055378    0.056939    0.015294    (09:57)
3      0.050544    0.045030    0.011765    (09:57)
4      0.034476    0.040948    0.012353    (09:57)
5      0.032105    0.041326    0.011765    (09:57)

We improve the accuracy again: the final model now has an accuracy of 98.8%. If we check the confusion matrix we see that almost all validation files are predicted correctly.

Confusion matrix of the final model.

Conclusion

As shown, it takes relatively little effort to build a custom image classifier capable of some extremely high accuracy. Using a deep learning library like fastai, a pre-trained model architecture, a reasonably sized dataset and some tricks can get you a long way!

What’s next

In the next blog posts I will look at Class Activation Maps to see which regions of an image actually ‘trigger’ the classification. Furthermore, I want to write a small post about how to deploy the model with a flask web app. So stay tuned.

The notebook can be found here.

Build your image dataset faster

2018-10-25T00:00:00+02:00

If there is one thing cumbersome in doing deep learning - apart from fiddling around with hyper parameters - it is to actually get the data to train on in the first place. You can download some excellent training datasets from Kaggle, but if you want to solve your own tasks you’ll have to build your very own image dataset.

xkcd: Is It Worth the Time?

Luckily transfer learning drastically reduces the required number of images for most classification problems, but you still have to come up with 100s to 1000s of images and (depending on the accuracy you’re after and the number of classes you require) this can be challenging.

Recently I struggled with this problem myself and after consulting the xkcd time vs. effort chart I created the python package fastclass to make the process less painful.

FastClass

You can get the script by simply installing from my GitHub like so:

pip install git+https://github.com/cwerner/fastclass.git#egg=fastclass

This will install two scripts in your $PATH: fastclass download (fcd) to pull images from various sites on the web, and fastclass clean (fcc), which is used to visually inspect the often messy results from such internet crawling.

Step 1: FastClass download

To download image categories from the net you first need to create a query csv file. The package comes with an example that should be located in the install location (your site-packages/fastclass folder).

head -n 3 example/guitars.csv
searchterm,exclude
guitar gibson les paul,guitar
guitar gibson SG,guitar

In the example, 25 different search terms are listed (column searchterm). In addition you specify exclusion terms. These are keywords you need for a successful search but don’t want to use as class labels (search and exclusion terms are separated with whitespace).
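How a search term and its exclusion terms could turn into a class label can be sketched as follows. This is not the code from fastclass, just an illustration of the idea; the function names and the lowercasing and underscore joining are assumptions made for this example.

```python
import csv
import io


def class_label(searchterm, exclude):
    """Drop the exclusion terms from a search term and join the rest with underscores."""
    excluded = set(exclude.lower().split())
    kept = [word for word in searchterm.lower().split() if word not in excluded]
    return "_".join(kept)


def read_queries(csv_text):
    """Return (searchterm, class label) pairs for every row of a query csv."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["searchterm"], class_label(row["searchterm"], row["exclude"]))
            for row in reader]


example = """searchterm,exclude
guitar gibson les paul,guitar
guitar gibson SG,guitar
"""
for term, label in read_queries(example):
    print(f"{term!r} -> {label!r}")
```

The full term "guitar gibson les paul" is used for the search, while the label only keeps "gibson_les_paul".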

You start the download from the command line:

> fcd -c ALL -k -o guitars example/guitars.csv 

FastClass download

This will use all three search crawlers (Google, Bing, and Baidu), resize any image it downloads to the default size (299x299px) but also keep the originals, and store the files in the folder ‘guitars’. For details just use the help flag (‘-h’).

When the script is finished you will find subfolders for each row of your query csv file in the specified dataset folder. Furthermore, a log file containing the source URL for each image is reported. The source URL is also embedded as an EXIF tag:UserComment in the resized images. Duplicated images are detected and removed automatically.
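One simple way to detect duplicates is to compare content hashes of the downloaded files. The sketch below only finds byte-identical files and is not necessarily how fastclass does it; the function name and the in-memory dictionary (file name to file content) are assumptions to keep the example self-contained.

```python
import hashlib


def find_duplicates(files):
    """Return the names of files whose content matches an earlier file.

    `files` maps a file name to its raw bytes.
    """
    seen = {}
    duplicates = []
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append(name)  # same bytes as seen[digest]
        else:
            seen[digest] = name
    return duplicates


# made-up byte strings standing in for image files
files = {"a.jpg": b"image-1", "b.jpg": b"image-2", "c.jpg": b"image-1"}
print(find_duplicates(files))
```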

Step 2: FastClass clean

Once the images are located on your drive you can inspect them quickly with the tool fcc. Call it by pointing to the category subfolder you want to inspect:

> fcc guitars/gibson_les_paul

FastClass clean

This will quickly launch a GUI with the first image. Use the arrow keys to navigate. Rate the file or choose a class by pressing the keys [1] to [9]. With [d] you can mark it for deletion and with [x] you terminate the script. Afterwards you will find a copy of the files that were not marked for exclusion and a report file with your ratings.

In future updates I want to improve the interface and possibly store the image information in a database to reduce clutter. I hope it is useful to you and in case of any issues please create an issue at https://github.com/cwerner/fastclass/issues or send me a pull request.

AxeNet - Guitar Classifier App

2018-10-01T00:00:00+02:00

A web application that allows you to classify a guitar image. Using transfer learning, fast.ai and the image downloader fastclass.

AxeNet Guitar Classifier App

Try it out at http://guitars.cwerner.ai

Is this a Les Paul or is this a Strat?

2018-06-16T12:20:00+02:00

I recently stumbled upon the most excellent podcast “This week in Machine Learning and Artificial Intelligence” (TWiML&AI). Sam Charrington is doing a wonderful job in presenting people and trending topics of all things AI and ML. Go check it out. It’s quality stuff!

Intro

Getting to know fast.ai

Anyways. A recent guest on his show was Rachel Thomas (Episode #138) - a university professor at the University of San Francisco and co-founder of fast.ai. To cite their company’s mission statement: “Fast.ai is dedicated to making the power of deep learning accessible to all. Deep learning is dramatically improving medicine, education, agriculture, transport and many other fields, with the greatest potential impact in the developing world. For its full potential to be met, the technology needs to be much easier to use, more reliable, and more intuitive than it is today.” (see also a blog post of theirs explaining why they do what they do).

So, in essence they teach state-of-the-art deep learning (DL) for the common (wo)man by providing a free MOOC on their site. What’s quite unique about it is that they decided to use a top-down approach. They basically provide almost no introduction to the basics of the field but have the students train their first deep convolutional neural network with just three lines of code and go from there… Later, they peel back layer after layer and expose more and more details about the underlying fundamentals that make the machinery work. The idea is that this supposedly keeps students engaged and helps to accommodate different learning paces and styles. To make all this happen they designed a high-level wrapper that sits on top of the deep learning framework PyTorch - apparently in a similar way as Keras provides a more gentle interface to TensorFlow. As far as I understood, it was originally designed as a help for their courses but matured into a rather stable general-purpose DL library that might also be used for production…

Given that I currently teach a university course (Remote Sensing of Global Ecology (using R)) that is structured around the conventional bottom-up approach, I was intrigued by this style of teaching.

Joining the group

Sam initiated a study group shortly after the fast.ai interview. I thought I’d check this out, too, and so I joined to keep myself motivated. And here we are.

Now, in the first lesson students build the (unavoidable?) cat classifier (here it’s a cat vs. dog classifier). The whole model requires only three lines of Python code! Obviously this only works since a lot of hyper-parameters are hidden by the basic interface, a lot of choices are made by the fast.ai package by default, and the data is provided. Furthermore, the example uses transfer learning and thus really piggybacks on a large existing model that was trained on the massive ImageNet dataset. Nevertheless it really is quite amazing that you can get things off the ground with this little code.

The project

As an exercise students are asked to come up with their own (binary) classification problems and so I thought I’d build a guitar classifier net. To start things off I decided to go for arguably two of the most iconic electric guitars: the Gibson (R.I.P) Les Paul and the Fender Stratocaster.

Now, a couple of things up front. While the two instruments feature very characteristic body shapes, headstocks and geometries, guitars tend to come in all kinds of designs and configurations. So I’d imagine that this task is at least as challenging as differentiating between a fluffy cat and a (less so?) dog - if not much more.

Well, obviously!

Connecting to notebook server

Since I work on Macs (and none features a decent Nvidia GPU) I ssh into a GPU-equipped server that runs the jupyter notebook with GPU acceleration (it also features an anaconda installation and has the fast.ai and other required python libraries installed, I’m not discussing the setup here).

# activate the anaconda environment
source activate fastai

# start a jupyter instance 
jupyter lab --port=9000 --no-browser &

On my machine I open an SSH tunnel and bind the local port 8888 to port 9000 of the remote machine:

# connect ports
ssh -N -f -L 8888:localhost:9000 cwerner@MY_GPU_SERVER

Now I simply access the notebook via my browser at https://localhost:8888 (I also set it to be password protected).

Data setup

“First, there was data…” Well, there needs to be anyways. So one convenient way of getting hold of image data is to use Google Image Search. There is a neat Chrome extension for image harvesting - but I find that Chrome still plain sucks on a Mac so I went for another python library called google_images_download that does the same job.

# install google image downloader and pull images
pip install google_images_download

# get two batches of 1000 images (Gibson Les Pauls, Fender Stratocasters)
# I had to specify the location of chromedriver, too
googleimagesdownload -k "gibson les paul" -pr "gibson_lp" -th -o gibson_lp -l 1000 --chromedriver /usr/local/bin/chromedriver
googleimagesdownload -k "fender stratocaster" -pr "fender_strat" -th -o fender_strat -l 1000 --chromedriver /usr/local/bin/chromedriver

For some reason the script only managed to pull ~500 images per search (which should still be enough for the exercise). To get a cleaner dataset I had to manually weed through the files and delete those with missing suffixes, wrong classification, or only showing guitar parts. A neat way to do this on the Mac is simply to use the quick view in Finder, scroll through the directory and delete as necessary. I also only used the thumbnail images as the script currently only uses 224x224px images anyways.
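Part of this weeding could probably be automated. The following is only a minimal sketch, not what I actually ran: the helper name find_suspect_files and the set of accepted suffixes are assumptions made for illustration. It just lists files whose suffix is missing or not an image suffix, so they can be inspected before deleting anything.

```python
from pathlib import Path

# Assumption: only these suffixes count as usable image files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_suspect_files(folder):
    """Return files in `folder` whose suffix is missing or not an image suffix."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() not in IMAGE_SUFFIXES
    )
```

Wrongly classified images or shots that only show guitar parts still need a human eye.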

Finally, I wrote some quick lines of code that created a file structure suitable for the fast.ai ImageClassifierData object.

import glob
import math
import os
import random
import shutil

# path structure: 
# guitars_small/gibson_lp/gibson_lp.1.imagedescription.jpg
gibson_files = glob.glob('guitars_small/gibson*')
fender_files = glob.glob('guitars_small/fender*')

for class_dir in gibson_files + fender_files:
    # create fast.ai data folder structure
    dname = os.path.basename(class_dir)
    npath1 = os.path.join('guitars_small', 'train', dname)
    npath2 = os.path.join('guitars_small', 'valid', dname)
    os.makedirs(npath1, exist_ok=True)
    os.makedirs(npath2, exist_ok=True)

    # split files into train and validation sets (80/20)
    all_files = glob.glob(os.path.join(class_dir, '*.jpg'))

    idx = list(range(len(all_files)))
    random.shuffle(idx)
    cut = math.ceil(0.8 * len(idx))

    train_files = [all_files[x] for x in idx[:cut]]
    valid_files = [all_files[x] for x in idx[cut:]]

    # copy files into appropriate folders
    for src in train_files:
        shutil.copy(src, npath1)
    for src in valid_files:
        shutil.copy(src, npath2)
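One thing to keep in mind is that the split above changes on every run because the shuffle is not seeded. A minimal sketch of a reproducible variant follows; the function name split_train_valid and the seed value are assumptions for illustration and not part of the original script.

```python
import math
import random


def split_train_valid(files, train_frac=0.8, seed=42):
    """Shuffle `files` reproducibly and split them into (train, valid) lists."""
    shuffled = sorted(files)                # fixed starting order
    random.Random(seed).shuffle(shuffled)   # private RNG, global state untouched
    cut = math.ceil(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# hypothetical file names, just to show the resulting split sizes
train, valid = split_train_valid([f"img_{i}.jpg" for i in range(10)])
print(len(train), len(valid))
```

With a fixed seed the same files end up in the validation set each time, which makes accuracy numbers comparable between runs.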

The model

Once the files are copied to the remote server we can create the model. First let’s import libraries and define some defaults.

import torch
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

# some constants
PATH = "data/guitars_small" # the data path
sz = 224                    # the image size 
bs = 16                     # the batch size

Now we load a pre-trained model (ResNet34) and fit it to our dataset. The line containing learn.fit() executes the model training (using a learning rate of 0.01 for two epochs).

arch = resnet34
data = ImageClassifierData.from_paths(PATH, bs=bs, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)

The output of the first model training using our Les Paul vs. Strat dataset

This already gives us an accuracy of 92.9%. Pretty remarkable. Now, let’s see what the model identifies correctly and where it fails (I’m using some helper functions from the course for this which I do not show in this post to save space, see the GitHub repository for the full code listing; 0 = Fender Strat, 1 = Gibson Les Paul).

Some examples of correctly and incorrectly classified images

Improving the model

In essence the lesson suggests adding these improvements to get even better results:

data augmentation (to vary the training data by scaling, flipping and tilting images; essentially adding more labeled data)
fine-tuning the model layers (unfreeze early layers)
learning rate annealing
data augmentation at inference time

# define data augmentation (we use transforms_top_down since guitars
# could be depicted from all kinds of angles; the other choice would be
# transforms_side_on)

tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_top_down, max_zoom=1.1)

# new data object with transforms
data = ImageClassifierData.from_paths(PATH, tfms=tfms)

# start the training
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 1)

# since the learner was created with precompute=True, data augmentation has no effect.
# To enable it we need to switch precompute to False
learn.precompute=False

# we now add some more epochs of training (using stochastic gradient descent with restart)
learn.fit(1e-2, 3, cycle_len=1)

We now have a model where the last layer was trained while all previous layers are still frozen at the original ImageNet weights. To give the model some wiggle room to fine-tune the network to our classification domain we can unfreeze the earlier layers, too, and provide separate learning rates for the early, central and late layers of the model (the idea is to have small learning rates for the early layers as they should be rather generic and larger learning rates for the late layers as they characterise more specific concepts).

learn.unfreeze()

lr = np.array([1e-4, 1e-3, 1e-2])

learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
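The call above combines cycle_len=1 with cycle_mult=2, so each restart cycle is twice as long as the previous one. A minimal sketch of the resulting schedule, as I understand it from the lesson: the helper names sgdr_epochs and cosine_lr are made up for illustration and are not fast.ai functions, and the cosine formula is the generic annealing shape rather than the library's exact implementation.

```python
import math


def sgdr_epochs(n_cycles, cycle_len=1, cycle_mult=1):
    """Length in epochs of each cycle; each cycle is `cycle_mult` times longer."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]


def cosine_lr(lr_max, progress):
    """Cosine-annealed learning rate; `progress` runs from 0 to 1 within a cycle."""
    return 0.5 * lr_max * (1 + math.cos(math.pi * progress))


lengths = sgdr_epochs(3, cycle_len=1, cycle_mult=2)
print(lengths, sum(lengths))  # cycle lengths and total number of epochs

# learning rate at the start, middle and end of one cycle for the last layer group
print([cosine_lr(1e-2, p) for p in (0.0, 0.5, 1.0)])
```

So three cycles with these settings train for 1 + 2 + 4 = 7 epochs in total, and within each cycle the learning rate decays from its maximum towards zero before being reset.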

Test Time Augmentation (TTA) was something I had never heard about, but apparently it can further improve results quite a bit. For each image, TTA creates four augmented versions and averages the predictions over all five images (the original plus the four augmented ones), which tends to make the predictions more robust.

log_preds,y = learn.TTA()
probs = np.mean(np.exp(log_preds),0)
accuracy_np(probs, y)

In my setup this final model now achieves an accuracy of 95.4%. Given the diverse input data and relatively small sample set I find that quite amazing.
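To make the averaging step more tangible, here is a small NumPy-only sketch. The probabilities are made up for illustration and the array layout (versions, images, classes) is an assumption that mirrors the axis used in np.mean(..., 0) above; the helper names tta_average and accuracy are hypothetical.

```python
import numpy as np

# Made-up probabilities of class 1 for 2 images, each seen in 5 versions
# (original + 4 augmented). Rows: versions, columns: images.
p_class1 = np.array([[0.9, 0.2],
                     [0.8, 0.6],
                     [0.4, 0.3],
                     [0.7, 0.1],
                     [0.6, 0.4]])

# Build log-probabilities with shape (n_versions, n_images, n_classes).
log_preds = np.log(np.stack([1 - p_class1, p_class1], axis=2))


def tta_average(log_preds):
    """Average the class probabilities over all versions (axis 0)."""
    return np.mean(np.exp(log_preds), axis=0)


def accuracy(probs, y):
    """Fraction of images whose most probable class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == y))


probs = tta_average(log_preds)
y = np.array([1, 0])  # made-up true labels
print(probs)
print(accuracy(probs, y))
```

Note that single versions can disagree with the final call (the third version of the first image favours class 0), but the average over all five versions decides.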

Some evaluation

First, let’s look at the confusion matrix. This illustrates the accuracy of the model for the individual classes (the diagonal is the correct prediction for all classes). In total there were 102 Strat and 94 Les Paul images in the validation dataset (the split was 80/20 of the total images).

Confusion matrix of model predictions

As we can see, the model incorrectly predicted three Les Pauls to be Strats and six Strats to be Les Pauls. Now let’s look again at some images (top: most confident Fenders, middle: most confident Gibsons, bottom: most uncertain images).

Best classification results and results where the model is most uncertain about the class.
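As a quick sanity check, the counts from the confusion matrix reported above can be turned back into the overall accuracy. This is just a sketch: the matrix orientation (rows = true class, columns = predicted class) is an assumption, and the counts are the ones stated in the text.

```python
import numpy as np

# Rows: true class, columns: predicted class (0 = Fender Strat, 1 = Gibson Les Paul).
# 102 Strats (6 predicted as Les Paul), 94 Les Pauls (3 predicted as Strat).
cm = np.array([[96, 6],
               [3, 91]])

accuracy = np.trace(cm) / cm.sum()               # correct / all predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # correct / true images per class

print(f"accuracy: {accuracy:.3f}")                         # accuracy: 0.954
print("recall per class:", per_class_recall.round(3))
```

187 of 196 validation images are classified correctly, which matches the 95.4% reported for the final model.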

Next steps

Now, while the results are not as great as those of the dog vs. cat classifier in the fast.ai lesson, which was trained on a much larger dataset, I still believe the results are quite neat. I am currently thinking about the following steps for further experiments:

Get more images: the number of images is still small (roughly 400 training and 100 validation images per class)
Try the same exercise with the full resolution images (I have the feeling that the thumbnails are too small for the network and sz setting)
Further weeding out of images that a) are too small, b) show multiple guitars, c) show too little of the guitar (some only had the fingerboard or the headstock), or d) show the guitar from the back.
I also want to extend this to multi-class classification: Gibson Les Paul, SG, Firebird, Explorer and Fender Stratocaster, Jaguar, Mustang and Telecaster. This should be interesting!

Getting things started for 2018…

2018-01-02T00:00:00+01:00

New Year’s resolutions, everyone.

After a couple of trials and failed attempts this year will be different! A new blog (Jekyll), a new style (Minimal Mistakes) and new content. This should be good. Expect loads of modifications in the coming weeks…