CascadeMask R-CNN
CascadeMask R-CNN extends Cascade R-CNN by adding an extra mask head to the cascade.
Here we can see that the segmentation branch "S" is placed parallel to the detection branch.
Cascade Mask R-CNN in Model Playground
Hyperparameters
Typically, the following hyperparameters are tweaked when using Cascade Mask R-CNN:
Backbone network
Specifying the architecture for the network on which Cascade Mask R-CNN is built.
IoU thresholds
These thresholds are used to decide if an anchor box generated contains an object or is part of the background.
Everything that is above the upper IoU threshold of the proposed anchor box and ground truth label will be classified as an object and forwarded. Everything below the lower threshold will be classified as background and the network will be penalized. For all the anchor boxes with an IoU between the thresholds, we're not sure if it's for- or background and we'll just ignore them.
Number of convolution filters in the ROI box head
This is identifying how many convolution filters the final layer to make the classification contains. To a certain degree, increasing the number of filters will enable the network to learn more complex features, but the effect vanishes if you add too many filters and the network will perform worse (see the original ResNet paper to understand why you cannot endlessly chain convolution filters).
Number of fully connected layers in the ROI box head
This is identifying how many fully connected layers (FC) the last part of the network contains. Increasing the number of FCs can increase performance for a computational cost, but you might overfit the sub-network if you add too many.
Weights
It's the weights to use for model initialization, and in Model Playground R50-Cascade-COCO weights are used.
NMS number of proposals
Pre NMS
The maximum of proposals that are taken into consideration by NMS. The proposals are sorted descending after confidence and only the ones with the highest confidence are chosen.
Post NMS
The maximum of proposals that will be forwarded to the ROI box head. Again, the proposals are sorted descending after confidence and only the ones with the highest confidence are chosen.
Config for training
Low numbers of NMS proposals in training will result in a lower recall, but higher precision. Vice versa.
Config for testing
Here, the number of NMS proposals for the simple forward pass, e.g., inference, is defined. Less NMS proposals will increase inference speed, but higher numbers yield greater performance.
Advanced Options
Stages to Freeze
Freezing the stages in a Neural Network is a technique that was introduced to reduce the computation. After the stages have been frozen, the architecture doesn't have to backpropagate to it. Freezing many stages will help the computation be faster but aggressive freezing can degrade the model output and can result in sub-optimal predictions.
Pooler resolution (Box Head)
It is the spatial size to pool proposals before feeding them to the box predictor, in the model playground default value is set as 7.
Pooler resolution (Mask Head)
It is the spatial size to pool proposals before feeding them to the mask predictor, in the model playground default value is set as 14.
Pooler Sampling Ratio
After extracting the Region of Interests from the feature map, they should be adjusted to a certain dimension before feeding them to the fully connected layer that will later do the actual object detection. For this, ROI Align is used which makes use of points that would be sampled from a defined grid, to resize the ROIs. The number of points that we use is defined by Pooler Sampling Ratio.