Task-1 : Localization Problem

Problem Statement

The task is to localize and classify the object in each given image. In object localization, the network identifies where the object is by placing a bounding box around it. The network also assigns a class label to the detected object. In other words, the neural network outputs four numbers for the bounding box, plus a probability for each class label.

Pre-processing

Resized all images to 224x224 pixels, i.e. shape (224, 224, 3), using linear interpolation. This made it possible to handle the varying image dimensions found across the different classes. The corresponding bounding-box coordinates were rescaled accordingly.

Rescaled the bounding-box coordinates to match the new image size:

   ratio = NEW_IMAGE_SIZE / OLD_IMAGE_SIZE
   (new_x1, new_y1) = (ratio * x1, ratio * y1)
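The rescaling above can be sketched as a small helper. This is only an illustration: the function name `resize_box`, the `(x1, y1, x2, y2)` corner format, and the `(width, height)` size tuples are assumptions, not taken from the project code. Because the original images are not necessarily square, the sketch computes one ratio per axis.

```python
def resize_box(box, old_size, new_size=(224, 224)):
    """Scale a box (x1, y1, x2, y2) from an image of old_size to new_size.

    Sizes are (width, height) tuples. The names and the corner format are
    hypothetical; adapt them to the annotation format actually used.
    """
    x1, y1, x2, y2 = box
    ratio_x = new_size[0] / old_size[0]  # horizontal scale factor
    ratio_y = new_size[1] / old_size[1]  # vertical scale factor
    return (x1 * ratio_x, y1 * ratio_y, x2 * ratio_x, y2 * ratio_y)


# A 448x896 image shrinks by 0.5 horizontally and 0.25 vertically.
print(resize_box((100, 200, 300, 400), old_size=(448, 896)))
# (50.0, 50.0, 150.0, 100.0)
```

For a square source image the two ratios coincide, which reduces to the single `ratio` shown above.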

Normalized the images by dividing each pixel value by 255.
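As a minimal sketch of this step, assuming the images are held as 8-bit NumPy arrays (the helper name `normalize_image` is hypothetical):

```python
import numpy as np


def normalize_image(image):
    """Map uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    # Cast first so the division happens in floating point.
    return image.astype(np.float32) / 255.0


# Dummy all-white image with the shape used in this write-up.
dummy = np.full((224, 224, 3), 255, dtype=np.uint8)
scaled = normalize_image(dummy)
print(scaled.dtype, scaled.min(), scaled.max())
```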

Model

I used VGG16 as the base model.

   model = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))

After taking the convolutional layers from VGG16, I created two separate networks on top of them: one for classification and the other for bounding-box regression.

Optimizer = Adam
Loss = {'classification': 'categorical_crossentropy', 'regression': 'mean_squared_error'}
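A two-headed model of this kind could be wired up roughly as follows. This is a sketch under stated assumptions, not the exact architecture used here: the class count `NUM_CLASSES`, the global-average-pooling layer, and the 256-unit hidden layers are all hypothetical placeholders. Only the VGG16 call, the Adam optimizer, and the loss dictionary come from the description above. The output layers are named `classification` and `regression` so that the loss dictionary keys match them.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 3  # hypothetical; the write-up does not state the class count


def build_localizer(num_classes=NUM_CLASSES):
    # Convolutional base, randomly initialised (weights=None) as in the text.
    base = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))

    # Assumed pooling step to turn the feature map into a vector.
    features = layers.GlobalAveragePooling2D()(base.output)

    # Classification head: one probability per class.
    cls = layers.Dense(256, activation="relu")(features)
    cls = layers.Dense(num_classes, activation="softmax",
                       name="classification")(cls)

    # Regression head: four numbers for the bounding box.
    reg = layers.Dense(256, activation="relu")(features)
    reg = layers.Dense(4, name="regression")(reg)

    model = Model(inputs=base.input, outputs=[cls, reg])
    model.compile(
        optimizer="adam",
        loss={
            "classification": "categorical_crossentropy",
            "regression": "mean_squared_error",
        },
    )
    return model


model = build_localizer()
print(model.output_names)
```

With this naming, training targets can be passed as a dictionary with the same keys, for example `model.fit(images, {"classification": labels, "regression": boxes})`, where `images`, `labels`, and `boxes` stand in for the prepared arrays.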

Model Architecture : Part 1 (single object) Model

Model Architecture : Part 2 (multiple object) Model