Traffic Sign Recognition [Self Driving]

Programming Language:

  • Python

The goals of this project are the following:

  • Load the data set (see below for links to the project data set)
  • Explore, summarize and visualize the data set
  • Design, train and test a model architecture
  • Use the model to make predictions on new images
  • Analyze the softmax probabilities of the new images
  • Summarize the results with a written report

Step1: Data Set Summary & Exploration

1. A basic summary of the data set.

I load the data set with Python's pickle module and then analyze it with basic Python functions. In total, we have:

  • Training samples: 34,799
  • Validation samples: 4,410
  • Testing samples: 12,630
  • Each sample (image) has (32, 32, 3) shape.
  • We have 43 unique classes/labels:
ClassId SignName ClassId SignName ClassId SignName
0 Speed limit (20km/h) 1 Speed limit (30km/h) 2 Speed limit (50km/h)
3 Speed limit (60km/h) 4 Speed limit (70km/h) 5 Speed limit (80km/h)
6 End of speed limit (80km/h) 7 Speed limit (100km/h) 8 Speed limit (120km/h)
9 No passing 10 No passing for vehicles over 3.5 metric tons 11 Right-of-way at the next intersection
12 Priority road 13 Yield 14 Stop
15 No vehicles 16 Vehicles over 3.5 metric tons prohibited 17 No entry
18 General caution 19 Dangerous curve to the left 20 Dangerous curve to the right
21 Double curve 22 Bumpy road 23 Slippery road
24 Road narrows on the right 25 Road work 26 Traffic signals
27 Pedestrians 28 Children crossing 29 Bicycles crossing
30 Beware of ice/snow 31 Wild animals crossing 32 End of all speed and passing limits
33 Turn right ahead 34 Turn left ahead 35 Ahead only
36 Go straight or right 37 Go straight or left 38 Keep right
39 Keep left 40 Roundabout mandatory 41 End of no passing
42 End of no passing by vehicles over 3.5 metric tons
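
The summary above can be reproduced with a few lines of NumPy. This is a minimal sketch, not the notebook's actual code: the helper names `load_split` and `summarize` are hypothetical, and the `features`/`labels` keys are an assumption about how the pickle files are laid out.

```python
import pickle
from collections import Counter

import numpy as np


def load_split(path):
    """Load one pickled split. The 'features'/'labels' keys are assumed."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["features"], data["labels"]


def summarize(features, labels):
    """Return sample count, per-image shape, and class statistics."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    return {
        "n_samples": int(features.shape[0]),
        "image_shape": tuple(features.shape[1:]),
        "n_classes": int(np.unique(labels).size),
        "per_class": Counter(labels.tolist()),
    }


# Tiny synthetic stand-in for the real data, just to show the call.
demo_x = np.zeros((6, 32, 32, 3), dtype=np.uint8)
demo_y = np.array([0, 0, 1, 2, 2, 2])
demo_summary = summarize(demo_x, demo_y)
print(demo_summary["n_samples"], demo_summary["image_shape"], demo_summary["n_classes"])
```

The `per_class` counter is also a quick way to check how evenly the samples are spread across the 43 classes.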


2. Here is an exploratory visualization of the data set.


Step2: Design and Test a Model Architecture

1. Image data preprocessing

I use two methods to do data augmentation (preprocessing).

  1. Random rotation: each training image has a 50% probability of being rotated by an angle within (-30, 30) degrees. Rotation is needed because, in daily use, the camera on a self-driving car may capture a traffic sign at an arbitrary angle. Training on rotated images helps the model handle this scenario.
  2. Random crop: each training image has a 50% probability of being cropped to between 0.8 and 1 of its width and height. After cropping, the image is resized back to 32x32. Cropping is necessary because captured traffic signs vary in size, so it adds more variety to the training data. In addition, random cropping produces some samples with horizontal and vertical shifts.
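
The two augmentations can be sketched with NumPy alone. This is an illustration under assumptions, not the notebook's implementation: the function names are hypothetical, nearest-neighbor interpolation with zero fill is chosen for brevity, and the two steps are assumed to be applied independently, with width and height scales drawn separately.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate an HxWxC image about its center (nearest neighbor, zero fill)."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def crop_and_resize(img, scale_h, scale_w, rng):
    """Crop a random window of the given fraction, then resize back (nearest)."""
    h, w = img.shape[:2]
    ch = max(1, int(round(h * scale_h)))
    cw = max(1, int(round(w * scale_w)))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    # Nearest-neighbor index maps from the output grid back into the crop.
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]


def augment(img, rng, p=0.5, max_angle=30.0, min_scale=0.8):
    """Apply each augmentation independently with probability p."""
    if rng.random() < p:
        img = rotate_nearest(img, rng.uniform(-max_angle, max_angle))
    if rng.random() < p:
        img = crop_and_resize(img, rng.uniform(min_scale, 1.0),
                              rng.uniform(min_scale, 1.0), rng)
    return img


rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = augment(sample, rng)
print(augmented.shape, augmented.dtype)
```

Whatever combination is applied, the output keeps the 32x32x3 shape, so augmented images can be fed to the model unchanged.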

Here are some processed images:


2. Model architecture

My model consisted of the following layers:

Layer             Input_Size  Output_Size  Description
Convolution(5x5)  32x32x3     28x28x64     (1x1) stride, VALID padding, ReLU activation
Convolution(3x3)  28x28x64    28x28x64     (1x1) stride, SAME padding, ReLU activation
Pooling           28x28x64    14x14x64
Convolution(5x5)  14x14x64    10x10x128    (1x1) stride, VALID padding, ReLU activation
Convolution(3x3)  10x10x128   10x10x128    (1x1) stride, SAME padding, ReLU activation
Pooling           10x10x128   5x5x128
Convolution(3x3)  5x5x128     3x3x256      (1x1) stride, VALID padding, ReLU activation
Pooling           3x3x256     1x1x256
Flatten           1x1x256     256
Fully Connected   256         128          ReLU activation
Dropout layer
Fully Connected   128         43           Softmax activation
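
The spatial sizes of the convolution and pooling layers follow from each layer's kernel size, stride, and padding, and can be checked with a short calculation. The sketch below is only a sanity check, not the model code; the 2x2 pooling window with stride 2 is an assumption, since the table does not state the pooling parameters.

```python
def conv_out(size, kernel, stride=1, padding="VALID"):
    """Spatial output size of a convolution or pooling window."""
    if padding == "SAME":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


# (kind, kernel, padding, output channels); None keeps the channel count.
# The 2x2 pooling window with stride 2 is an assumption.
LAYERS = [
    ("conv", 5, "VALID", 64),
    ("conv", 3, "SAME", 64),
    ("pool", 2, "VALID", None),
    ("conv", 5, "VALID", 128),
    ("conv", 3, "SAME", 128),
    ("pool", 2, "VALID", None),
    ("conv", 3, "VALID", 256),
    ("pool", 2, "VALID", None),
]


def trace_shapes(size=32, channels=3):
    """Return the (H, W, C) shape after each layer in LAYERS."""
    shapes = []
    for kind, kernel, padding, out_channels in LAYERS:
        stride = 2 if kind == "pool" else 1
        size = conv_out(size, kernel, stride, padding)
        channels = out_channels or channels
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
for (kind, kernel, _, _), shape in zip(LAYERS, shapes):
    print(f"{kind}({kernel}x{kernel}) -> {shape}")
```

Under these assumptions the final pooling layer yields a 1x1x256 feature map, which flattens to the 256-element vector fed to the first fully connected layer.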

3. Training parameters

I use the following experimental settings to train my model:

  • Optimizer: Adam
  • Initial Learning Rate: 0.001
  • Batch Size: 128
  • Training Epochs: 100
  • Loss function: cross entropy.
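
To make the loss and batch settings concrete, here is a small NumPy sketch of softmax cross-entropy and the resulting number of optimizer steps per epoch. It is illustrative only: the function name is hypothetical, and keeping the final partial batch is an assumption about the training loop.

```python
import math

import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


NUM_CLASSES = 43
BATCH_SIZE = 128
TRAIN_SAMPLES = 34799

# A model that outputs equal logits for every class is maximally uncertain,
# so its loss equals log(43).
uniform_loss = softmax_cross_entropy(np.zeros((BATCH_SIZE, NUM_CLASSES)),
                                     np.zeros(BATCH_SIZE, dtype=int))
# Assumes the last, smaller batch is kept rather than dropped.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
print(round(uniform_loss, 4), steps_per_epoch)
```

A starting loss near log(43) is a useful reference point: values well above it early in training usually point to a problem with the labels or the initialization.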

4. Accuracy

During training, I tracked the validation accuracy and saved the weights only when they reached a new highest validation accuracy. Within 100 training epochs, my architecture achieved its highest val_acc at epoch 84. (All details about the training process can be found in Cell [10].)

  • The accuracy on training set: 99.9%
  • The accuracy on validation set: 96.2%
  • The accuracy on testing set: 93.4%
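
The "save only the best weights" rule can be sketched as a small tracker. The class name and the accuracy values below are hypothetical and exist purely to show the logic; the notebook may well use a framework callback instead.

```python
class BestCheckpoint:
    """Remember the epoch and weights with the highest validation accuracy."""

    def __init__(self):
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.best_weights = None

    def update(self, epoch, val_acc, weights):
        """Store the weights only if val_acc strictly improves on the best so far."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.best_weights = weights
            return True
        return False


# Made-up validation accuracies, purely to exercise the tracker.
tracker = BestCheckpoint()
for epoch, acc in enumerate([0.80, 0.91, 0.89, 0.95, 0.95], start=1):
    tracker.update(epoch, acc, weights={"epoch": epoch})
print(tracker.best_epoch, tracker.best_acc)
```

Because the comparison is strict, a later epoch that only ties the best accuracy does not overwrite the saved weights.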

The following are the reasons why I chose this model for the task:

  1. The first convolutional layers detect fundamental features, e.g. lines, shapes, or textures. Thus, in each of the first two blocks (4 convolutional layers in total), I used two consecutive convolutional layers to make sure that the model generates more useful basic features.
  2. Following that, a block with one convolutional layer is used to generate semantic information, e.g. arrows, circles, etc. Compared with the basic features, there are more semantic features, so more filters are used in this layer.
  3. A fully connected layer is used to convert the feature map into a vector for classification.
  4. One Dropout layer is used to reduce overfitting.

Test a Model on New Images

1. Data unseen during training

Here are five German traffic signs that I found on the web:


2. Prediction results on these new traffic signs.

Here are the results of the prediction:

Ground Truth          Prediction            Predicted Correctly
Go straight or right  Go straight or right  True
Speed limit (50km/h)  Speed limit (50km/h)  True
Stop                  Yield                 False
Road work             Road work             True
Turn right ahead      Turn right ahead      True

The model correctly classified 4 of the 5 traffic signs, which gives an accuracy of 80%. This is much lower than the accuracy on our testing data set. There are three likely reasons:

  1. The training data set is not balanced. For example, class 2 (Speed limit (50km/h)) has 2010 training samples, while class 14 (Stop) has only 690 training samples.
  2. The downloaded images are much clearer than the images in our data set, so they look different from the training data; downscaling them to 32x32 may also introduce noise.
  3. Five images are too few to evaluate the performance of a trained model.
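
The 80% figure, and how coarse it is as an estimate, can be checked directly from the five results. This is a small worked example; the `accuracy` helper is hypothetical.

```python
# (ground truth, prediction) pairs copied from the results table.
results = [
    ("Go straight or right", "Go straight or right"),
    ("Speed limit (50km/h)", "Speed limit (50km/h)"),
    ("Stop", "Yield"),
    ("Road work", "Road work"),
    ("Turn right ahead", "Turn right ahead"),
]


def accuracy(pairs):
    """Fraction of pairs whose prediction matches the ground truth."""
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


new_image_acc = accuracy(results)
# With n images, accuracy can only move in steps of 1/n.
step = 1 / len(results)
print(f"accuracy = {new_image_acc:.0%}, one image changes it by {step:.0%}")
```

With only five images, a single misclassification moves the accuracy by 20 percentage points, so the gap to the testing accuracy should be read with caution.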

3. Prediction details

For the first image, the model is 100% sure that it is a “Go straight or right” sign, and the prediction is correct.

Probability Prediction
1.0000000 36 (Go straight or right)
2.8858561e-16 3 (Speed limit (60km/h))
2.1692284e-31 0 (Speed limit (20km/h))
7.6920753e-32 20 (Dangerous curve to the right)
2.2140123e-32 28 (Children crossing)

For the second image, the model is almost 100% sure that it is a “Speed limit (50km/h)” sign, and the prediction is correct.

Probability Prediction
9.9999976e-01 2 (Speed limit (50km/h))
2.8747601e-07 5 (Speed limit (80km/h))
2.2901984e-10 1 (Speed limit (30km/h))
2.2646303e-18 3 (Speed limit (60km/h))
4.5397839e-35 6 (End of speed limit (80km/h))

For the third image, the model is 100% sure that it is a “Yield” sign, while the ground truth is “Stop”. There are only 690 “Stop” training samples in our data set, while “Yield” has 1290 training images, so this error is likely caused by our unbalanced training data.

Probability Prediction
1.0000000e+00 13 (Yield)
4.3748559e-12 28 (Children crossing)
1.5492816e-12 1 (Speed limit (30km/h))
1.0894177e-12 38 (Keep right)
6.2159755e-13 2 (Speed limit (50km/h))

For the fourth image, the model is 100% sure that it is a “Road work” sign, and the prediction is correct.

Probability Prediction
1.0000000e+00 25 (Road work)
0.0000000e+00 0 (Speed limit (20km/h))
0.0000000e+00 1 (Speed limit (30km/h))
0.0000000e+00 2 (Speed limit (50km/h))
0.0000000e+00 3 (Speed limit (60km/h))

For the fifth image, the model is about 99% sure that it is a “Turn right ahead” sign, and the prediction is correct.

Probability Prediction
9.9085110e-01 33 (Turn right ahead)
8.0864038e-03 14 (Stop)
1.0624588e-03 4 (Speed limit (70km/h))
2.1159234e-11 13 (Yield)
7.8768702e-12 2 (Speed limit (50km/h))
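
Top-5 tables like the ones above can be produced from the model's raw scores with a softmax followed by a sort. The sketch below uses made-up logits rather than real model output, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(probs, k=5):
    """Return (probabilities, class ids) of the k most likely classes, best first."""
    idx = np.argsort(probs)[::-1][:k]
    return probs[idx], idx


# Made-up logits for a 43-class model; class 36 is given the largest score.
logits = np.zeros(43)
logits[36] = 12.0
logits[3] = 4.0
top_probs, top_ids = top_k(softmax(logits))
for p, c in zip(top_probs, top_ids):
    print(f"{p:.7e}  {c}")
```

Because softmax exponentiates the scores, a moderate gap between the largest logit and the rest is enough to push the top probability very close to 1, which is why the model can look fully confident even when it is wrong.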

Visualizing the model

I visualized the output of the first block. The circle and arrows are very clear.


