PhD: "Learning to Understand and Predict Heterogeneous Trajectory Data"
1. Motivation and Thesis Overview
The thesis can be found here. Robots and intelligent systems navigating dynamic environments must predict the intentions of surrounding agents to ensure safe and efficient operation. Trajectory prediction, which captures these intentions through motion patterns, is particularly challenging due to the highly diverse context of motion, expressed in agent-specific and environmental cues. Despite this, many current prediction methods focus on homogeneous datasets, limiting their applicability in heterogeneous real-world scenarios. This thesis addresses motion heterogeneity by introducing the concept of trajectory classes, which group data samples based on shared characteristics, either observable (e.g., agent type, activity) or learned from the data itself. These classes act as priors for effectively modeling diverse behaviors. However, the existing literature lacks both suitable datasets and frameworks to exploit trajectory classes.

To bridge this gap, the thesis introduces THÖR-MAGNI, a large-scale dataset capturing diverse human activities in industrial robotics environments. Using this dataset and a real-world autonomous driving dataset, the thesis proposes deep learning-based prediction models conditioned on observable classes, highlighting trade-offs between scalability, data efficiency, and performance under class imbalance. While observable classes enhance prediction accuracy, their static nature can limit their representational power, given the dynamic behavior of humans. To overcome the limitations of static class labels, THÖR-MAGNI Act introduces fine-grained, frame-level action annotations. Actions augmenting the state representation of trajectories have been shown to enhance predictions when integrated via direct conditioning or multi-task learning.

Alternatively, data-driven classes group trajectories based on motion patterns learned from the data. We propose a novel Self-Conditioned GAN to learn trajectory clusters aligned with generative modeling objectives. Our method enhances prediction accuracy for underrepresented behaviors and is integrated into a multi-stage prediction framework that explicitly conditions predictions on trajectory clusters, yielding probabilistically informed forecasts. In summary, this thesis contributes datasets, predictive models, and generative frameworks for understanding and predicting heterogeneous trajectory data.
2. Notation and Terminology
- Vectors are denoted by bold lowercase letters, e.g., \(\mathbf{v}\).
- Matrices are denoted by bold uppercase letters, e.g., \(\mathbf{M}\).
- We refer to the Euclidean (Frobenius) norm of a vector \(\mathbf{v}\) as \(\|\mathbf{v}\|_F\).
- We denote the prior probability of an event by \(P(\cdot)\) and the probability of an event conditioned on a different event by \(P(\cdot \mid \cdot)\).
- Subscripts are frequently used to indicate relationships between different mathematical objects; for instance, \(\mathbf{Y}_\mathbf{S}\) denotes a matrix \(\mathbf{Y}\) that is associated with the matrix \(\mathbf{S}\) in some way.
- We use the subscript \(t\) to indicate time steps, e.g., \(\mathbf{s}_t\) and \(\mathbf{s}_{t+1}\).
The term training or validation trajectory dataset refers to a collection of trajectories represented as a 3D tensor (i.e., all trajectories are of equal length), where the first axis corresponds to the number of trajectories, the second to the time steps, and the third to the state representation of a dynamic agent. A trajectory represents a dynamic agent's position profile, typically in a two-dimensional plane, over a given period. An agent refers to any observable dynamic object whose position is tracked and whose states can be computed, such as humans, mobile robots, human-driven vehicles, autonomous vehicles, or cyclists. An agent's states are derived from its tracked trajectory cues, such as position and head orientation, and may include additional time-varying attributes like velocity, acceleration, or actions. Each agent may belong to an observable class, which characterizes the agent type (e.g., pedestrian or car in a road scenario) or the agent's ongoing activities (e.g., transporting an object in an industrial setting). In this case, all trajectories of an agent belong to the same class. Moreover, a trajectory may belong to a learnable class, denoted a data-driven class, which relies on unsupervised processing of trajectory cues. An agent can also perform fine-grained actions, which may or may not be unique to its class and which form part of its state as they vary over time.
Trajectory prediction or forecasting involves estimating future states, potentially with a different configuration from the observed states, based on past state observations and relevant contextual information, such as the locations of other agents or obstacle maps. The prediction spans a predefined prediction horizon, which is the period from the last observed time step to the final point in time for which predictions are made. We use the terms observed trajectory and tracklet interchangeably to refer to the sequence of observed states. A tracklet spans a predefined observation horizon, which covers the period from the first time step to the last observed time step. Joint trajectory and action prediction involves predicting both future trajectories and the sequence of future actions simultaneously. Class-conditioned trajectory prediction or forecasting involves trajectory prediction conditioned on the corresponding class. Action-conditioned trajectory prediction or forecasting refers to trajectory prediction where the observed states include the observed sequence of fine-grained actions. Multi-task trajectory and action prediction or forecasting refers to the joint prediction of future trajectories and action sequences, where the observed inputs may optionally include the sequence of observed fine-grained actions.
Dynamic agents generate trajectory data, each denoted as \(A_i\) and associated with an observable class \(c_{A_i}\). Agent trajectories are converted into tracklets of fixed length \(\mathbf{S} = (\mathbf{s}_t)_{t=1}^{O}\). The states \(\mathbf{s}_t\), depending on the dataset, trajectory modeling task, and predictive model, may take various configurations: only 2D velocities, \(\mathbf{s}_t = (\dot{x}_t, \dot{y}_t)\); 2D positions and velocities, i.e., \(\mathbf{s}_t = (x_t, y_t, \dot{x}_t, \dot{y}_t)\); or additionally including the action \(a_t\), i.e., \(\mathbf{s}_t = (x_t, y_t, \dot{x}_t, \dot{y}_t, a_t)\). Action labels represent an agent's fine-grained actions at each time step, drawn from a predefined set of actions \(\mathcal{A}\). In contrast to observable classes, which remain constant for all trajectories of an agent, action labels can vary at each time step, influencing the human trajectory and capturing its heterogeneity. The \emph{future} of an observed tracklet consists of 2D velocities, \(\mathbf{Y}_\mathbf{S} = ((\dot{x}_t, \dot{y}_t))_{t=O+1}^{T_P}\), of length \(L = T_P - O\), which are subsequently converted into future positions \(\mathbf{P}_\mathbf{S}\). The future sequence of actions, temporally aligned with \(\mathbf{Y}_\mathbf{S}\), is denoted by \(\mathbf{a}_\mathbf{S} = (a_t)_{t=O+1}^{T_P}\), \(a_t \in \mathcal{A}\).
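To make this tensor layout concrete, here is a minimal sketch (in NumPy) of how tracklets and their futures can be assembled from one agent's raw trajectory. The 8/12-step horizons, the 0.4 s sampling interval, and the finite-difference velocity estimate are illustrative assumptions, not values fixed by the thesis.

```python
import numpy as np

def make_tracklets(positions, obs_len=8, pred_len=12, dt=0.4):
    """Slice one agent's 2D positions (T, 2) into (tracklet, future) pairs.

    Each tracklet state is (x, y, vx, vy); the future is the velocity
    sequence, matching the S / Y_S notation above. obs_len, pred_len, and dt
    are illustrative values, not thesis constants.
    """
    velocities = np.gradient(positions, dt, axis=0)            # finite-difference velocities
    states = np.concatenate([positions, velocities], axis=1)   # (T, 4) state representation
    window = obs_len + pred_len
    tracklets, futures = [], []
    for start in range(len(states) - window + 1):
        chunk = states[start:start + window]
        tracklets.append(chunk[:obs_len])                      # S: observed states (O, 4)
        futures.append(chunk[obs_len:, 2:4])                   # Y_S: future velocities (L, 2)
    # Stacking yields the 3D tensors described above:
    # (num_tracklets, time_steps, state_dim) and (num_tracklets, pred_len, 2).
    return np.stack(tracklets), np.stack(futures)
```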
3. Contributions and Insights
The main intuition of this thesis is that robots share space with dynamic agents in anthropocentric environments. The behaviors of dynamic agents are shaped by a complex interplay between external and internal factors. External factors include the physical environment, social norms, and interactions with other agents. Internal factors encompass individual goals, intentions, and psychological states.
This thesis addresses the challenges of discovering and modeling trajectory heterogeneity, a phenomenon arising from these factors, as trajectory classes, which group trajectories based on perceived appearance or trajectory cues. These classes can originate from two primary sources: observable classes, defined by human semantics and accessible via perception systems, and data-driven classes, which are automatically learned from the structure and dynamics of the trajectory data. We find these classes in human motion trajectory datasets, where observable classes can be detected through robot perception and data-driven classes are discovered directly from the data. External and internal factors affect measurable trajectory cues, such as velocity, acceleration, heading, and ongoing action, which can be used to detect or infer the trajectory classes. Finally, we incorporate trajectory classes, both observable and data-driven, into trajectory prediction methods to enhance their predictions. The figure below depicts the main concepts of this thesis.
The main contributions of this thesis are:
- THÖR-MAGNI, a large-scale trajectory dataset capturing diverse human activities in industrial robotics environments, including fine-grained action annotations, THÖR-MAGNI Act.
- Deep learning-based trajectory prediction models conditioned on observable classes, highlighting trade-offs between data efficiency and performance under class imbalance. The respective paper can be found here.
- A novel Self-Conditioned GAN to learn data-driven trajectory clusters aligned with generative modeling objectives, enhancing prediction accuracy for underrepresented behaviors.
- A multi-stage prediction framework that explicitly conditions predictions on trajectory clusters, yielding probabilistically informed forecasts.
The data collection involves people performing different roles (i.e., activities tailored to industrial tasks). In addition, we include a robot in the scene as a static or dynamic obstacle affecting the semantic layout of the environment. Finally, some scenarios include conditions designed to study specific aspects or factors affecting human motion. The table below summarizes the data collection.
Learning Outcomes:
The observable classes in THÖR-MAGNI, which correspond to the human roles, exhibit distinct motion patterns that can be important for trajectory prediction in robotics environments.
In a nutshell, the conducted experiments show that specific methods are more suitable for particular data settings. The figure below illustrates a model selection decision tree to guide the choice of the most appropriate model based on the data characteristics.
Although trajectory predictors benefit from observable classes globally, such classes may induce ambiguity in the representation of motion patterns because they are not solely tied to trajectory cues. This is especially evident when different classes share similar motion patterns, resulting in inaccurate representations and reduced prediction accuracy.
Learning Outcomes: Maps of Dynamics approaches have an edge over deep generative methods in imbalanced data scenarios and over single-output deep learning methods in low-data regimes.
Observable classes may be ambiguous representations of underlying motion patterns, as they are not solely tied to trajectory cues.
In the end, the dataset contains 14 unique actions:
$$
\begin{aligned}
\mathcal{A} = \{ & \text{Walk}, \text{DrawCard}, \text{ObserveCardDraw}, \text{WalkLO}, \\
& \text{PickBucket}, \text{WalkBucket}, \text{DeliverBucket}, \text{PickBox}, \\
& \text{WalkBox}, \text{DeliverBox}, \text{PickStorageBin}, \\
& \text{WalkStorageBin}, \text{DeliverStorageBin}, \text{HRI} \}
\end{aligned}
$$
The intuition is that such fine-grained actions can enrich the sequence of input states and reduce ambiguity, as they decompose the observable class into a time-varying sequence of actions. To that end, these actions must also be diverse and heterogeneous to capture the complexity of human motion. The figure below illustrates the statistics of motion cues, such as acceleration, velocity, and navigation distance, across the actions in THÖR-MAGNI Act.
To show that actions can be powerful cues for trajectory prediction, we propose two predictive systems based on the single-output methods presented earlier. First, we augment the input state representation with an additional dimension: the action. The new state representation is processed by the encoder, and the resulting vector either goes directly to the decoder or is concatenated with the observable class. The trajectory decoder then processes this vector and outputs the sequence of future velocities. We use the mean squared error loss function to train this model.
We also studied a multi-task learning model (green) in which future sequences of actions are predicted as well. To that end, we add a decoder for the future action sequence, trained with a binary cross-entropy loss on the multi-class action labels.
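As a sketch of how the two objectives could be combined, the snippet below sums a trajectory loss and an action-classification loss over the shared encoder's outputs. The weighting factor and the use of a standard multi-class cross-entropy (in place of the binary cross-entropy mentioned above) are illustrative assumptions.

```python
import torch.nn.functional as F

def multitask_loss(pred_vel, true_vel, action_logits, true_actions, alpha=1.0):
    """Joint loss for trajectory + action prediction (sketch).

    pred_vel / true_vel: (B, L, 2) future velocities.
    action_logits: (B, L, num_actions); true_actions: (B, L) integer labels.
    alpha is an illustrative weighting factor between the two tasks.
    """
    traj_loss = F.mse_loss(pred_vel, true_vel)
    act_loss = F.cross_entropy(action_logits.flatten(0, 1), true_actions.flatten())
    return traj_loss + alpha * act_loss
```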
From the experiments, we have found that both action-conditioned and multi-task models outperform the observable class-conditioned model and the baseline model that does not use any class information.
Learning Outcomes: Actions can enhance trajectory prediction by mitigating some of the ambiguity present in observable classes.
Next, we propose a novel Self-Conditioned GAN (SC-GAN) to learn trajectory clusters aligned with generative modeling objectives.
This framework based on Generative Adversarial Networks (GANs) assumes that the representations learned in the discriminator's
feature space are more suitable for clustering than the raw trajectory data.
The figure below illustrates the SC-GAN architecture. It can be configured to cluster either entire trajectories or only their future segments.
Another advantage of the Self-Conditioned GAN is that it can generate diverse trajectory samples by conditioning the generator on different clusters.
Traditional GANs generate diverse samples by varying the input noise vector, relying on the input data distribution. However, when that distribution is skewed or imbalanced, GANs may suffer from mode collapse, generating samples that are too similar to each other.
In SC-GAN, on the other hand, the generator is conditioned on cluster classes, which represent distinct motion patterns.
This conditioning allows the generator to produce samples that are not only diverse but also aligned with specific trajectory patterns represented by the clusters.
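A rough sketch of the self-conditioning idea described above: cluster the trajectories in the discriminator's feature space, then feed the resulting cluster index back to the generator as a condition. The `features` attribute, the k-means step, and the number of clusters are illustrative assumptions, not the exact SC-GAN implementation.

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def assign_clusters(discriminator, trajectories, k=8):
    """Cluster trajectories in the discriminator's feature space (sketch).

    `discriminator.features` is assumed to expose the penultimate-layer
    embedding; k is an illustrative number of clusters.
    """
    feats = discriminator.features(trajectories)            # (N, feat_dim) embeddings
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats.cpu().numpy())
    return torch.as_tensor(labels, device=trajectories.device)

# Training-loop idea: periodically re-assign clusters, then condition the
# generator on the cluster label of each real sample, e.g.
#   cluster_ids = assign_clusters(D, real_tracklets)
#   fake = G(noise, cluster_ids)      # generator conditioned on the cluster index
```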
The figure below illustrates a traditional GAN result (left) versus diverse trajectory samples generated by SC-GAN (right) when conditioned on different clusters.
Learning Outcomes: Full- and future-driven clusters are the most suitable for trajectory prediction.
The Self-Conditioned GAN is a powerful clustering method that connects the clustering and trajectory prediction objectives.
For approach (1), we use the clustering space learned by the SC-GAN, together with its clustering results, to force another GAN-based forecaster to learn specific motion patterns. To that end, we propose a new loss function that forces the generator to best predict the trajectories coming from the clusters where the SC-GAN failed the most. This is done by computing the mean squared error (MSE) between the predicted and ground-truth trajectories for each cluster and penalizing it more heavily based on the average displacement error and the final displacement error. The former measures the average L2 distance between the predicted and ground-truth trajectories over all time steps; the latter measures the L2 distance between the final predicted position and the corresponding ground truth. These two metrics are widely used in trajectory prediction to evaluate the accuracy of predicted trajectories.
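To make the penalization idea concrete, here is a minimal sketch of a cluster-weighted loss, together with the two metrics just described. The exact weighting scheme (weights precomputed from the SC-GAN's per-cluster ADE/FDE) is an illustrative assumption rather than the thesis's exact formulation.

```python
import torch

def cluster_weighted_loss(pred_pos, true_pos, cluster_ids, cluster_weights):
    """MSE weighted per cluster (sketch).

    cluster_weights: (K,) precomputed from the SC-GAN's per-cluster ADE/FDE,
    so clusters where it failed the most get a larger penalty.
    pred_pos / true_pos: (B, L, 2); cluster_ids: (B,) integer cluster labels.
    """
    per_sample = ((pred_pos - true_pos) ** 2).mean(dim=(1, 2))      # per-sample MSE
    return (cluster_weights[cluster_ids] * per_sample).mean()

def ade_fde(pred_pos, true_pos):
    """Average and final displacement errors, the two metrics described above."""
    dists = torch.linalg.norm(pred_pos - true_pos, dim=-1)          # (B, L) L2 errors per step
    return dists.mean().item(), dists[:, -1].mean().item()
```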
The figure below illustrates predictions from penalized prediction systems versus the baseline model (without penalization).
For approach (2), we propose a multi-stage prediction framework that explicitly conditions trajectory predictions on data-driven clusters.
The first step is to transform the data into displacements, which makes the approach more generalizable to other datasets.
Then, we cluster these displacements using the full-cluster SC-GAN, k-means, or time-series k-means.
Next, we train a deep generative model conditioned on the correct cluster classes to produce the future sequence of states. During inference, we sample one trajectory per cluster.
With one prediction per cluster class, we rank the future trajectories using distance metrics and transform them into a probabilistic space.
How do we do that? We apply a soft-argmax function based on the distance either between the produced sample and the corresponding centroid, or between the produced sample and its \(N_{neig}\) neighbors from the corresponding cluster.
This approach follows the intuitive idea that a sample produced from the correct cluster will always be closer to that cluster than samples produced from other clusters and the samples within those clusters, as can be seen in the figure below.
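A minimal sketch of the ranking step: distances from each cluster-conditioned sample to its cluster are turned into a probability distribution with a softmax over negative distances. The centroid-based variant is shown; the temperature parameter is an illustrative assumption.

```python
import torch

def rank_cluster_predictions(samples, centroids, temperature=1.0):
    """Turn per-cluster predictions into a probability over clusters (sketch).

    samples: (K, L, 2), one predicted trajectory per cluster class.
    centroids: (K, L, 2), the corresponding cluster centroids; the second
    variant would instead average distances to the N_neig nearest cluster members.
    """
    dists = torch.linalg.norm((samples - centroids).flatten(1), dim=1)   # (K,) distances
    return torch.softmax(-dists / temperature, dim=0)                     # closer => more probable
```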
Our prediction framework proved to be more effective than unconditional baselines. Moreover, our prediction-ranking methods are both efficient and effective compared to neural network-based alternatives.
Learning Outcomes: SC-GAN can help mitigate the mode collapse problem in GAN-based trajectory forecasters.
The multi-stage prediction framework effectively leverages data-driven clusters to enhance trajectory prediction accuracy, and our prediction-ranking methods are both efficient and effective.
Reinforcement Learning
1. Context
Similarly to what I have done with the studies for Deep Learning, here I present my Reinforcement Learning studies. They comprise some playground projects where I use reinforcement learning to train agents to accomplish some behavior required by a given task within a given environment. The whole idea is that we make an agent interact with an environment and learn from it. That environment can be a game, a robot, or even a real-world problem. The environment returns a reward to the agent, which is the feedback the agent receives for its actions. The agent's goal is to maximize the reward it receives from the environment. The agent learns by trial and error, exploring the environment and trying different actions to see what works best. To that end, the agent aims to learn a policy that maps states to actions, so it can take the best action in each state. To make the problem tractable, neural networks can be used to approximate the policy; in that case, Deep Reinforcement Learning emerges as the research topic to study.
2. Outline
Everything started with the study of Sutton and Barto's book, which one can find here. Then, I attended the free recorded lectures of Hado van Hasselt, who follows the same book but provides deeper and more insightful explanations of some of its examples. I defend the principle of learning by doing. In this spirit, I watched the full Reinforcement Learning course from deeplizard. Finally, I followed the Spinning Up documentation, which I strongly suggest for those who aim to learn Deep Reinforcement Learning. In doing so, I have been implementing a few of the most well-known Deep Reinforcement Learning algorithms and doing some fun experiments.
3. Practice
This Deep RL library of algorithms is an ongoing project. At this time, the CartPole environment is solved by the following PyTorch implementations (a minimal REINFORCE sketch follows the list):
- Vanilla Policy Gradient or REINFORCE
- DQN
- A3C
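As a flavour of the simplest of these, here is a minimal REINFORCE sketch for CartPole using the gymnasium API. The network size, learning rate, discount factor, and return normalization are illustrative choices, not a copy of my repository's code.

```python
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))              # keep log-prob for the gradient
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards over the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    loss = -(torch.stack(log_probs) * returns).sum()          # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```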
During my experiments, I implemented both default state observations and image observations. I could not do the same for A3C due to the limitations of the renderer provided by OpenAI. In the meantime, for those interested, I suggest checking the official repo for more information and (personal) explanations. You can check the repository at tmralmeida. I also had the opportunity to use one of these RL frameworks in one of the courses I took during my Ph.D.: I was asked to develop a vacuum-cleaner AI agent whose goal is to clean a grid-world environment in which it may encounter obstacles and dirty cells. I trained REINFORCE to solve this problem, and the results can be found in this repo.
Smart Shower App
1. Description
This is a homemade project that emulates a smart hands-free shower/tap. It is a POC based on a low-cost prototype composed of a Raspberry Pi 3B+, a picamera, an ultrasonic sensor, 3 LEDs, and a servo motor. At the time of publication of this idea, the world is going through difficult times due to the Coronavirus, a virus that is hard to control because it spreads so easily. The idea behind this smart system is therefore to reduce the spread of diseases such as Coronavirus in public bathing facilities through totally hands-free, intelligent showers and taps.
2. Idea
There are a lot of hands-free taps and showers, but I have never seen one that could control the flow and temperature of the water smartly and intuitively. Hence, the objective here is to control both flow and temperature through the location of the hands in relation to the tap. It is as if we placed an XY-plane coordinate system in front of the tap sensor, with the x-axis (horizontal) controlling the temperature and the y-axis (vertical) controlling the flow. The farther to the right the hand is relative to the tap, the hotter the water; similarly, the higher the hand is above the tap, the higher the flow.
3. Practice
According to the image above, which represents the entire workflow, the Raspberry Pi and the Arduino are always in communication. The Arduino continuously sends the measured distance to the Raspberry Pi until it drops below 15. At this point, a hand is in front of the sensor, which triggers the camera. The Raspberry Pi then computes the code that corresponds to the hand location. This code is sent to the Arduino, which drives the respective outputs on each electronic device (LEDs and motor). Please note that:
- Motor 80 corresponds to a closed tap;
- Motor 10 corresponds to a fully open tap;
- Motor 45 corresponds to a partly open tap;
- Yellow LED corresponds to mild water;
- Blue LED corresponds to cold water;
- Red LED corresponds to hot water.
Thus, the final system would be composed of a 3D ultrasonic sensor, which in this prototype is emulated by a simple ultrasonic sensor plus a camera.
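As a rough illustration of the Raspberry Pi side of this workflow, here is a sketch of the main loop. The pyserial link, the 3×2 zone grid, and the injected sensor/camera callables are illustrative assumptions; the real logic lives in the repository.

```python
import serial  # pyserial is assumed as the Raspberry Pi <-> Arduino link

TRIGGER_DISTANCE = 15  # reading below which a hand is assumed to be in front of the sensor

def hand_zone_code(x_norm, y_norm):
    """Map a normalized hand position (from the camera) to a command code.

    x (left -> right) selects temperature, y (bottom -> top) selects flow,
    mirroring the XY-plane idea above; the 3x2 grid is an illustrative choice.
    """
    temperature = 0 if x_norm < 0.33 else (1 if x_norm < 0.66 else 2)  # cold / mild / hot
    flow = 0 if y_norm < 0.5 else 1                                    # partly / fully open
    return temperature * 2 + flow

def run(link, read_distance, locate_hand):
    """Main loop: wait for a hand, compute its zone code, send it to the Arduino.

    `read_distance` and `locate_hand` are injected callables (ultrasonic polling
    and picamera-based hand localisation) so this sketch stays self-contained.
    """
    while True:
        if read_distance() < TRIGGER_DISTANCE:
            code = hand_zone_code(*locate_hand())
            link.write(bytes([code]))   # the Arduino decodes this into LED + servo commands

# Usage sketch: run(serial.Serial("/dev/ttyACM0", 9600), read_distance, locate_hand)
```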
You can check the repository at tmralmeida. Below there is a representative video.
Deep Learning
Mar - July 2020
This is a guide for Deep Learning practitioners. It covers TensorFlow and PyTorch techniques to train the best-known models for Image Classification and Object Detection. At the beginning of my journey of learning this topic in practice, the most difficult thing for me was filtering out all the information: every practitioner has a repository, and they seem to have results, but their code is too complex for a beginner. Therefore, I started with a TensorFlow Specialization and, as I was learning, I built my own test cases. In my opinion, the easiest way to start is with Image Classification, because it does not require as much effort as the other fields. Effort matters here because it relates to the complexity of building the model in practice, so less effort means more understandable and easier code. Thus, I downloaded one dataset (CINIC10) and then tried to replicate the training of the models I was studying through the respective papers (from AlexNet to MobileNet). The code is not the most efficient, but it was written by a beginner, so I hope it is clear enough.
After Image Classification, I wanted to study Object Detection, which seemed a trendy Computer Vision task, but it was difficult to assimilate all the little tricks behind each choice made by the authors of the most well-known architectures. At the same time, the opportunity to also work on Object Detection arose at my job, so it was a win-win situation. First, I attended the deeplizard course about PyTorch because I wanted to know all the decent possibilities I had in terms of Deep Learning frameworks. Hence, PyTorch was used to study those architectures (from Faster R-CNN up to YOLOv4).
Now, you can decide which of the branches of this project you want to check:
Image Classification
First of all, for data loading I used the tf.data module in every model. It allows building a full pipeline that aggregates loading from disk, data augmentation, and batch formation. I did not go too deep into augmentation because the objective at this point was to practice model creation and understand the various ways of doing it with TensorFlow 2.0. In my opinion, there are three global ways to deploy a TensorFlow model, whose usage depends on the architecture's layout. If the model is straightforward (the easiest ones), we can use the Sequential API; on the other hand, if the model relies on layer concatenation and "parallel operations" (more complex models), we should use the Functional API; finally, if we want fully customizable forward propagation, we can use Model subclassing.
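To illustrate the first two options (subclassing is omitted for brevity), here is a minimal sketch of a tiny classifier written with both the Sequential and the Functional APIs. The layer choices are placeholders, not the architectures from the notebooks.

```python
import tensorflow as tf

# Sequential API: a plain stack of layers, enough for AlexNet/ZFNet/VGG-style models
sequential_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Functional API: needed once layers branch or are concatenated (ResNet/GoogLeNet-style)
inputs = tf.keras.Input(shape=(32, 32, 3))
a = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
b = tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu")(inputs)
merged = tf.keras.layers.Concatenate()([a, b])            # a "parallel operations" example
pooled = tf.keras.layers.GlobalAveragePooling2D()(merged)
outputs = tf.keras.layers.Dense(10, activation="softmax")(pooled)
functional_model = tf.keras.Model(inputs, outputs)
```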
During this study, I just used the Sequential API for the easiest models and the Functional API for the more complex ones. Therefore, the first three models - AlexNet, ZFNet and VGG16 - were created under the Sequential API due to their simple design. The remaining models - ResNet18, GoogLeNet, Xception and MobileNet - were designed through the Functional API.
Theoretically, it is important to highlight some key points in the history of Convolutional Neural Networks for Image Classification, which are now used or have an influence on the most modern architectures:
- AlexNet was the first Convolutional Neural Network to obtain a breakthrough result in the ImageNet challenge;
- ZFNet showed how it would be possible to improve the network's layout by visualizing what is going on inside of it;
- VGG16 showed that deeper convolutional neural networks can be more accurate than shallower networks;
- The more layers a neural network has, the harder it is to train. Thus, ResNet showed how it is possible to train deep neural networks in a simpler fashion by applying residual blocks with skip connections; a minimal sketch of such a block follows this list. The image below demonstrates this design choice (from Dive Into Deep Learning): the left image represents the original (plain) block, and the right image illustrates the residual block with the respective skip connection, the trick that makes network training easier.
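To make the residual idea concrete, here is a minimal identity residual block written with the Functional API; the filter count and kernel size are placeholders.

```python
import tensorflow as tf

def residual_block(x, filters=64):
    """Identity residual block: two convolutions plus a skip connection.

    Assumes `x` already has `filters` channels so the addition is shape-compatible.
    """
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.Add()([y, shortcut])      # the skip connection shown in the figure
    return tf.keras.layers.Activation("relu")(y)
```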
You can check all notebooks at tmralmeida.
Object Detection
As mentioned before, the study of object detection in this project came at the same time as a task that I had to complete professionally. At that time, I was studying object detection architectures in different road environments for autonomous driving. Therefore, after studying several available datasets, the one that proved to be most representative in terms of the quality/diversity of information ratio was the BDD100K. This dataset is composed of several different types of annotations, and the one we worked on (road object bounding boxes) is divided into 10 different classes: bus, light, sign, person, bike, truck, motor, car, train, and rider.
After choosing the dataset, I also studied the state of the art in Object Detection architectures. From this, two major families of architectures for the Object Detection task emerged: region-proposal networks and single-shot methods. The former is represented by Faster R-CNN, which I had already used in another project. The latter comprises SSD (Single Shot MultiBox Detector) and all versions of YOLO. Hence, these are the architectures I used to perform this study. Since I had deadlines to present results at work, the models I used here are not originally made by me but are based on the works of other authors, as I will reference later.
1. Faster R-CNN
Faster R-CNN is one of the most widely used deep learning models for object detection. Despite its higher latency compared to single-shot methods, Faster R-CNN is performant at detecting both small and large objects. The authors of this DL architecture divide the overall architecture into 2 modules; however, it is fairer to divide it into 3 modules: the feature-map extractor, the RPN (Region Proposal Network), and the Fast R-CNN detector. The first is composed of a traditional classification architecture, which is responsible for producing feature maps; in our approach we chose a MobileNetV2 to perform this task due to its low latency. After that, a small network slides over the feature maps, predicting multiple possible proposals for each of its cells. This small network returns a lower-dimensional feature, which is then fed to two 1×1 convolutional layers. These layers yield, respectively, the probability that a proposal bounds a target and the encoded coordinates of each proposal. Finally, the features that correspond to objects pass through an ROI pooling layer that crops and rescales each feature. During inference, non-maximum suppression (NMS) is applied to keep only the best-located bounding boxes.
The work that we developed here in terms of training and model creation was based on the torchvision module of the PyTorch framework.
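As an illustration of how such a model can be assembled with torchvision (following its object-detection tutorial pattern), here is a minimal sketch; the anchor sizes and the exact class count are placeholders rather than the precise configuration used in this work.

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# MobileNetV2 feature extractor as the backbone (chosen above for its low latency)
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280  # FasterRCNN needs the number of backbone output channels

# Anchors and RoI pooling for the single feature map this backbone returns
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=11,  # 10 BDD100K road-object classes + background
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
```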
The numeric results for the validation set, based on COCO metrics are represented in the table below.
| Metric | IoU Thresholds | Scales | maxDets | Value |
|---|---|---|---|---|
| AP | [0.50 : 0.05 : 0.95] | all | 100 | 0.202 |
| AP | 0.50 | all | 100 | 0.409 |
| AP | 0.75 | all | 100 | 0.175 |
| AP | [0.50 : 0.05 : 0.95] | small | 100 | 0.050 |
| AP | [0.50 : 0.05 : 0.95] | medium | 100 | 0.243 |
| AP | [0.50 : 0.05 : 0.95] | large | 100 | 0.432 |
| AR | [0.50 : 0.05 : 0.95] | all | 1 | 0.158 |
| AR | [0.50 : 0.05 : 0.95] | all | 10 | 0.277 |
| AR | [0.50 : 0.05 : 0.95] | all | 100 | 0.290 |
| AR | [0.50 : 0.05 : 0.95] | small | 100 | 0.116 |
| AR | [0.50 : 0.05 : 0.95] | medium | 100 | 0.355 |
| AR | [0.50 : 0.05 : 0.95] | large | 100 | 0.519 |
Finally, I release videos that demonstrate part of the qualitative results of the trained model on frames acquired on Aveiro roads. One example of those videos is shown below.
2. SSD512
Single-shot models can process the input faster because the two tasks - localization and classification - are done in a single forward pass. Here, SSD is presented, along with its results on the validation set of the dataset used in this work. This architecture is characterized by its base network (or backbone), the usage of multi-scale feature maps for the detection task, and the respective convolutional predictors. MobileNetV2 was used to extract the image features and was truncated before the classification layers. Hence, some of the final layers of MobileNet plus additional feature layers allow predictions at multiple scales. Each of these extra layers can produce a fixed set of detection predictions using a set of convolutional filters. Finally, the output of the model is the score for a category and the location of the box that bounds the target object.
This work, in terms of code, is based on the one of qfgaohao. However, I made some adaptations here to increase the performance of the model; one of them is the 512×512 input size.
Finally, the numeric results for the BDD100K validation set are represented in the table below.
| Metric | IoU Thresholds | Scales | maxDets | Value |
|---|---|---|---|---|
| AP | [0.50 : 0.05 : 0.95] | all | 100 | 0.083 |
| AP | 0.50 | all | 100 | 0.131 |
| AP | 0.75 | all | 100 | 0.085 |
| AP | [0.50 : 0.05 : 0.95] | small | 100 | 0.002 |
| AP | [0.50 : 0.05 : 0.95] | medium | 100 | 0.044 |
| AP | [0.50 : 0.05 : 0.95] | large | 100 | 0.293 |
| AR | [0.50 : 0.05 : 0.95] | all | 1 | 0.068 |
| AR | [0.50 : 0.05 : 0.95] | all | 10 | 0.093 |
| AR | [0.50 : 0.05 : 0.95] | all | 100 | 0.093 |
| AR | [0.50 : 0.05 : 0.95] | small | 100 | 0.005 |
| AR | [0.50 : 0.05 : 0.95] | medium | 100 | 0.052 |
| AR | [0.50 : 0.05 : 0.95] | large | 100 | 0.334 |
Despite the huge difference between the numerical validation results of the two architectures presented so far, this model also performs well on Aveiro roads. Please check the video below.
3. YOLOV4
All YOLO architectures are also single-shot methods, which is why they achieve high-speed predictions. The authors have been presenting several evolutions, as proved by the number of YOLO versions that exist - 4 as of the writing date of this post (YOLO, YOLOv2, YOLOv3, and YOLOv4). This architecture has always shown low latency, and therefore the focus across the various versions has been localization performance. YOLOv4 is composed of a Cross Stage Partial (CSP) Darknet53 with an SPP module, a path-aggregation network (PANet), and a YOLOv3 head. CSP networks have a similar basis and purpose to a DenseNet; this type of architecture enhances feature reuse while reducing the amount of repeated gradient information observed in a DenseNet. To do so, it splits the base feature map: part of the channels passes through a partial dense block, while the other part goes directly to the final partial transition layer. After the feature maps are produced, the only difference between YOLOv3 and YOLOv4 in terms of architectural layout is the global feature concatenation: instead of the FPN technique, a custom PANet approach is used. PANet is simply an enhanced version of FPN; after the FPN block, composed of a top-down pathway with lateral connections, PANet also propagates low-level features through a bottom-up path augmentation block. This block allows the addition (concatenation in YOLOv4) of the resulting FPN features with the output of those feature maps after 3×3 convolutions, which yields an even better understanding of the low-level features.
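As a rough sketch of the channel-splitting idea behind CSP blocks (a simplified toy version, not the actual CSPDarknet53 layer configuration):

```python
import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    """Toy CSP-style block: split channels, process one part, merge at the end."""

    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.partial = nn.Sequential(                       # the "partial dense block" path
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
        )
        self.transition = nn.Conv2d(channels, channels, 1)  # final partial transition layer

    def forward(self, x):
        a, b = x.chunk(2, dim=1)        # split the base feature map along channels
        return self.transition(torch.cat([self.partial(a), b], dim=1))
```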
This work, in terms of code, is based on the one of Ultralytics with some changes to allow the usage of the Ignite framework.
Finally, the numeric results for the BDD100K validation set are represented in the table below.
| Metric | IoU Thresholds | Scales | maxDets | Value |
|---|---|---|---|---|
| AP | [0.50 : 0.05 : 0.95] | all | 100 | 0.105 |
| AP | 0.50 | all | 100 | 0.209 |
| AP | 0.75 | all | 100 | 0.092 |
| AP | [0.50 : 0.05 : 0.95] | small | 100 | 0.053 |
| AP | [0.50 : 0.05 : 0.95] | medium | 100 | 0.223 |
| AP | [0.50 : 0.05 : 0.95] | large | 100 | 0.326 |
| AR | [0.50 : 0.05 : 0.95] | all | 1 | 0.107 |
| AR | [0.50 : 0.05 : 0.95] | all | 10 | 0.220 |
| AR | [0.50 : 0.05 : 0.95] | all | 100 | 0.257 |
| AR | [0.50 : 0.05 : 0.95] | small | 100 | 0.187 |
| AR | [0.50 : 0.05 : 0.95] | medium | 100 | 0.467 |
| AR | [0.50 : 0.05 : 0.95] | large | 100 | 0.511 |
I also deployed this model on an Nvidia Jetson AGX Xavier device; you can check the result in the video below, and the demo code is available at tmralmeida.
You can check the repository at tmralmeida.
Data Matrix Detection
Mar 2020
This work presents an implementation of a Faster R-CNN model to detect Data Matrix landmarks. This architecture demonstrated quite accurate and consistent results, detecting almost all landmarks throughout the test set.
It arose during my research work at the University of Aveiro, Portugal. In this project, I went through every step of training a deep neural network: data collection (images of this type of landmark in different environments); data labeling through the Labelbox app; and, finally, training and evaluating the Faster R-CNN model through Detectron2, a research platform that contains several state-of-the-art models such as Faster R-CNN, Mask R-CNN, RetinaNet, and DensePose, ready to use.
Advice: for those who don't have much time to design the architecture, this kind of platform is totally worth it.
1. Dataset creation
The dataset is one of the most important pieces of the overall Machine Learning solution, since each decision of the model is based on previous training, which is performed on that data. Therefore, if the training procedure is compromised, the inference quality of the model will suffer. In this stage of the work, we correctly labeled 156 training frames and 224 test images. This training/test split is neither the most common nor the most conventional one. However, there is only one object class to detect and, although it occupies a small patch of the image, it is quite distinguishable from the rest of the image. The training set is equally composed of two different environments: a common laboratory room with several objects spread around and a workshop with machinery. These choices allow us to obtain a more representative dataset. The test set is likewise equally distributed over two different environments: a hallway and a different part of the workshop used in the training set.
The two images below are samples from the training set. The left image represents an environment associated with a manufacturing facility, and the right image represents a visually cluttered environment (many different objects) in a laboratory room.
Regarding the test set, two images are shown as examples: a visually neater environment (left image) and a more filled and cluttered one (right image).
2. Faster R-CNN Training
We decided to use this architecture because this type of deep neural network is very performant (compared to other object detection architectures) when the objective is to detect small patches of the image. Moreover, the system where this neural network would be used (an Automated Guided Vehicle) does not move at high speed, so the high-latency disadvantage of a proposal-based network is not a problem in this application. It is also worth mentioning that, when choosing which Machine Learning approach to use, you have to take into account the practical application you are working on (Deep Learning is sometimes overkill for some applications; Machine Learning is much more than just Deep Learning).
The training procedure of a deep neural network can be divided into 3 main steps: data loading, forward propagation, and back-propagation. The first step in this work involved registering our dataset in the Detectron2 dataset catalog, which is no more than a function that translates our dataset into dictionaries with certain fields. You can check all these steps in my notebook. For the second and third steps, Detectron2 does everything for us; we just need to know how to use its API and choose some hyperparameters, such as the batch size, learning rate, and number of iterations.
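As an illustration, here is a minimal Detectron2 setup along these lines. The dataset-loading function, config file choice, and hyperparameter values are placeholders, not the exact settings used in this project.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog
from detectron2.engine import DefaultTrainer

# Step 1: register the dataset; `load_datamatrix_dicts` is a placeholder function
# returning the per-image dictionaries Detectron2 expects.
DatasetCatalog.register("datamatrix_train", lambda: load_datamatrix_dicts("train"))

# Steps 2-3: Detectron2 handles forward/backward passes; we only pick hyperparameters.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("datamatrix_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1      # a single class: the Data Matrix landmark
cfg.SOLVER.IMS_PER_BATCH = 2             # batch size (placeholder)
cfg.SOLVER.BASE_LR = 0.00025             # learning rate (placeholder)
cfg.SOLVER.MAX_ITER = 1000               # number of iterations (placeholder)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```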
3. Faster R-CNN Evaluation
The evaluation of the model was also performed through the Detectron2 API. To do so, we evaluated our model through the COCO metrics (the figure below shows our results).
The most important overall result is 0.876 for AP@0.5. Why? Because 0.5 is a fair value for the IoU threshold, the scales are all, and the maximum number of detections is 100 (a suitable value to match reality). Moreover, the recall is higher than the precision, implying that the number of false positives is higher than the number of false negatives. This means that the model detects almost all the Data Matrix landmarks, but it also detects some other objects that are not. In our system this is preferable, since we use a Data Matrix decoder in a further step, so if the detected object is not a Data Matrix, the decoder simply returns nothing. In comparison, only 45% of the test set frames were accurately processed by the classical Data Matrix detection provided by the libdmtx Python library, which was also 40 times slower than the model we trained in this project.
Finally, we show a video that demonstrates part of the qualitative results on the test set. The results shown here are not at normal speed due to the video size (the video is at 1 fps, while our model can achieve 7.4 fps).
You can check the repository at tmralmeida.