First steps with AI & image recognition (using TensorFlow)


After reading the excellent O’Reilly book/essay collection What is Artificial Intelligence? by Mike Loukides and Ben Lorica, I got curious (and, finally, emboldened enough) to get my hands dirty with some n00b-level AI and machine learning.

Pete Warden’s TensorFlow for Poets, part of Google Codelabs, seemed like a logical starting point for me: My coding skills are very basic (and fairly dismal, tbh), and this is technically way beyond my skill level and comfort zone. But I feel confident that with a bit of tutorial-based hand-holding I can work my way through the necessary command-line action. Then, later, I can take it from there.

For this first time I would stick to the exact instructions, line by line, mostly by copy & paste. It’s not the deepest way to learn, but it helps me to walk through the process once before changing things up.

So, basic setup. I won’t include links here as they’re updated and maintained over on TensorFlow for Poets.

Get Docker up and running

Docker creates a Linux virtual machine that runs on my MacBook Pro. This created a first small bump, which after some reading up on Docker configuration turned out to have the oldest fix in all of tech: Relaunch the Docker app. Boom, works.

Get TensorFlow installed & download images for training set

As I continued the setup and installed TensorFlow as per instructions, there was some downtime while the system was downloading and installing.

The tutorial suggests experimenting with the TensorFlow Playground. Which is great, but I’d done that before. Instead, I decided to prepare my own set of images to train the Inception model on later. (After first following the tutorial exactly, including using their flower-based training image set.)

The training set consists of photos of five different types of flowers, with a few hundred photos each. This might take a while.

First round of (re)training: Inception

The Inception network (v3, in this case) is a pre-trained TensorFlow network which we can re-train on our own images. It’s a tad over-powered for what we need here according to our tutorial: “Inception is a huge image classification model with millions of parameters that can differentiate a large number of kinds of images. We’re only training the final layer of that network, so training will end in a reasonable amount of time.”
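To make "only training the final layer" a bit more concrete, here is a minimal sketch in plain Python. It is not the tutorial's code and it does not use TensorFlow: `frozen_features` is a hypothetical stand-in for the pre-trained network, and the toy points, learning rate, and step count are made up for illustration. The one thing that changes during training is the weight table of the final softmax layer.

```python
import math
import random


def frozen_features(x):
    """Hypothetical stand-in for the pre-trained network: fixed, never updated."""
    return [x[0], x[1], x[0] * x[1], 1.0]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Run the frozen features through the trainable final layer."""
    feats = frozen_features(x)
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in weights])


def train_final_layer(samples, labels, num_classes, steps, lr=0.5, seed=0):
    """Stochastic gradient descent on the final layer's weights only."""
    rng = random.Random(seed)
    dim = len(frozen_features(samples[0]))
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(steps):
        i = rng.randrange(len(samples))
        feats = frozen_features(samples[i])
        probs = predict(weights, samples[i])
        for c in range(num_classes):
            # Gradient of the cross-entropy loss with respect to the logit of class c.
            grad = probs[c] - (1.0 if c == labels[i] else 0.0)
            for j in range(dim):
                weights[c][j] -= lr * grad * feats[j]
    return weights


# Toy data (made up): two points per class.
samples = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8)]
labels = [0, 0, 1, 1]

weights = train_final_layer(samples, labels, num_classes=2, steps=500)
print(predict(weights, (1.0, 1.0)))
```

Because the feature function never changes, each training step only touches a small table of numbers, which is the intuition behind the tutorial's "reasonable amount of time".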

Inception downloads and goes to work. This is my cue: I go have lunch. It might take up to 30 minutes.

Half an hour later I’m back. I’ve had lunch, the Roomba has cleaned the kitchen. The training is done.

Final test accuracy = 91.4% (N=162)
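As a quick sanity check on that line: assuming N is the number of held-out test images and accuracy is simply the fraction classified correctly (the output itself spells out neither), 91.4% of 162 works out to 148 correct images.

```python
def accuracy(correct, total):
    """Fraction of test images classified correctly."""
    if total == 0:
        raise ValueError("need at least one test image")
    return correct / total


n = 162           # N from the training output
reported = 0.914  # reported final test accuracy

# Nearest whole number of correct images that matches the reported figure.
implied_correct = round(reported * n)

print(f"{implied_correct} of {n} correct = {accuracy(implied_correct, n):.1%}")
# prints: 148 of 162 correct = 91.4%
```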

Train your own

Now it was time for me to take it to the next level: Put TensorFlow to work on my own image training set. I decided to go with a few members of the ThingsCon family. Iskander, Marcel, Max, Monique, Simon, and myself: 6 people total, with around 10-20 photos of each.

Now, these photos are mostly from conferences and other ThingsCon-related activities: During our summer camp and our Shenzhen trip. I added some personal ones, too.

A bunch are really horrible photos that I included to really test the results: On top of the training sample being tiny, some images are hard to discern even for human eyes. (There’s one that contains only a small part of Max’s face, for example: his gorgeous giant blond beard, but nothing else.) Lots are group pics. Many contain not just one but two or more of the people in this sample. These are hard images to train on.

Let’s see how it goes. I swap out the folders and files and run Inception again.

ZeroDivisionError

I had been warned about this. If a sample is too tiny, the network sometimes can’t handle it. We need more pics! I pull a few more from personal files, a few off of the web. Now it’s just over 20 images per “category”, aka person. Let’s try this again.

ZeroDivisionError

Still no luck. My working theory is that it’s too many photos with several of the yet-to-learn people in them, so the results are ambiguous. I add more pics I find online for every person.
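For what it's worth, here is one way a tiny sample can produce exactly this error. This is a sketch under assumptions, not the tutorial's actual code: it assumes each image is assigned to a training, testing, or validation split by hashing its filename, with roughly 10% going to testing and 10% to validation, and that images are later picked from a split with a modulo on the split's size. The function names and percentages here are hypothetical.

```python
import hashlib


def assign_split(filename, testing_pct=10, validation_pct=10):
    """Assign a file to a split based on a stable hash of its name."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + testing_pct:
        return "testing"
    return "training"


def split_category(filenames):
    """Group one category's files into the three splits."""
    splits = {"training": [], "testing": [], "validation": []}
    for name in filenames:
        splits[assign_split(name)].append(name)
    return splits


def pick_image(images, index):
    """Pick an image by index, wrapping around the list length."""
    # With an empty list this is "index % 0", which raises ZeroDivisionError.
    return images[index % len(images)]


# Hypothetical filenames for one small category.
filenames = [f"peter_{i:02d}.jpg" for i in range(12)]
splits = split_category(filenames)
for split_name, images in splits.items():
    print(split_name, len(images))

try:
    pick_image([], 3)
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

If a split really is filled at about a 10% rate, a category with only 10 images has roughly a one-in-three chance (0.9 to the power of 10, about 0.35) of ending up with nothing in that split, so more images per person lowers the odds of hitting the error.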

I don’t want to make it too easy though, so I keep adding lots of pics in super low resolution. Think thumbnails. Am I helping? Probably not. But hey, onwards in the name of science!

Going back through the training set I realize just how many of these pics contain several of the yet-to-learn categories. Garbage in, garbage out. No wonder this isn’t working!

Even something as simple as this drives home the big point of machine learning: It’s all about your data set!

I do some manual cropping so that Inception has something to work with. A clean data set with unambiguous categories. And voilà, it runs.
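A check along these lines would have flagged the thin categories before training. It is a sketch that assumes one folder per person under a common root, which is how I read the tutorial's layout; the helper names, the throwaway demo folders, and the minimum of 20 are illustrative. It only counts files, so it cannot tell whether a photo shows more than one person.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_category(root):
    """Count image files in each sub-folder (one sub-folder per category)."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if os.path.isdir(folder):
            counts[entry] = sum(
                1 for name in os.listdir(folder)
                if name.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts


def categories_below(counts, minimum):
    """Return the categories that have fewer than `minimum` images."""
    return [label for label, n in counts.items() if n < minimum]


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for label, n in (("peter", 25), ("max", 8)):
        os.makedirs(os.path.join(root, label))
        for i in range(n):
            open(os.path.join(root, label, f"{label}_{i}.jpg"), "w").close()
    counts = count_images_per_category(root)

print(counts)
print(categories_below(counts, minimum=20))
```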

Now, after these few tests, I snap two selfies, one with glasses and one without.

The output without glasses:

peter (score = 0.66335)
max (score = 0.14525)
monique (score = 0.07219)
simon (score = 0.05728)
marcel (score = 0.04428)
iskander (score = 0.01765)

The output with glasses:

peter (score = 0.75252)
max (score = 0.12352)
simon (score = 0.05971)
monique (score = 0.04001)
marcel (score = 0.01397)
iskander (score = 0.01027)

Interestingly, with glasses the algorithm recognizes me better even though I don’t wear any in the other images. Mysterious, but two out of two. I’ll take it!
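A small check on those two outputs: the six scores in each run add up to 1 (to within rounding), which is what you would expect if they are softmax probabilities spread over the six people. The sketch below only parses the scores printed above; the regular expression and function names are illustrative and not part of the tutorial.

```python
import re

SCORE_PATTERN = re.compile(r"(\w+) \(score = ([0-9.]+)\)")


def parse_scores(output):
    """Extract (label, score) pairs from the classifier's text output."""
    return [(label, float(score)) for label, score in SCORE_PATTERN.findall(output)]


def top_margin(scores):
    """Return the best label and its lead over the runner-up."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked[0][1] - ranked[1][1]


without_glasses = (
    "peter (score = 0.66335) max (score = 0.14525) monique (score = 0.07219) "
    "simon (score = 0.05728) marcel (score = 0.04428) iskander (score = 0.01765)"
)
with_glasses = (
    "peter (score = 0.75252) max (score = 0.12352) simon (score = 0.05971) "
    "monique (score = 0.04001) marcel (score = 0.01397) iskander (score = 0.01027)"
)

for name, output in (("without glasses", without_glasses), ("with glasses", with_glasses)):
    scores = parse_scores(output)
    label, margin = top_margin(scores)
    total = sum(score for _, score in scores)
    print(f"{name}: top = {label}, margin = {margin:.5f}, sum = {total:.5f}")
```

The margin over the runner-up is a slightly more telling number than the top score alone: it goes from about 0.52 without glasses to about 0.63 with them.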

How about accuracy?

The tests above are the equivalent of a “hello world” for machine learning: The most basic, simple program you can try. They use the Inception network that’s been built and trained for weeks by Google, and just add one final layer on top, to great effect.

That said, it’s still interesting to look at the outcomes, and which factors influence the results. So let’s run the same analysis for 500 iterations compared to, say, 4,000!

The test image I use is a tricky one: It’s of Michelle, with a hand in front of her face.

500 iterations on a set of photos (this time, of family members):

michelle (score = 0.53117)

This isn’t the result of a confident algorithm!

So for comparison, let’s see the results for 4,000 iterations on the same training set:

michelle (score = 0.75689)

Now we’re talking!

At this point I’m quite happy with the outcome. For a first test, this delivers impressive results and, maybe even more importantly, is an incredible demonstration of the massive progress we’ve seen in the tooling for machine learning over the last few years.