Most segmentation architectures, such as Pyramid Architecture Network, PSPNet, and U-Net, use an encoder-decoder strategy. The encoder maps the input image into a feature space, and the decoder maps these features back to per-pixel categories to perform segmentation. Because the encoder compresses the image into feature space, the image is heavily downsampled along the way. A major difficulty in semantic segmentation is therefore upsampling this feature map back to the original resolution while preserving the correct category for each pixel.
In LinkNet, the input of each encoder layer is also passed to the output of its corresponding decoder. This recovers spatial information that was lost during downsampling, so the decoder and its upsampling operations can make use of it.
We can see the passing of the encoded information to the corresponding decoder block.
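The effect of these encoder-to-decoder links can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions, not the LinkNet implementation: pair averaging stands in for a strided encoder block, value repetition stands in for decoder upsampling, and the link is an element-wise addition. All function names here (`downsample`, `upsample`, `toy_plain`, `toy_linked`) are hypothetical.

```python
def downsample(x):
    """Halve the resolution by averaging neighbouring pairs (stand-in for an encoder block)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the resolution by repeating each value (stand-in for decoder upsampling)."""
    return [v for v in x for _ in range(2)]


def add(a, b):
    """Element-wise sum of two equally long lists."""
    return [p + q for p, q in zip(a, b)]


def toy_plain(signal, depth=2):
    """Encoder-decoder without links: downsample `depth` times, then upsample back."""
    x = signal
    for _ in range(depth):
        x = downsample(x)
    for _ in range(depth):
        x = upsample(x)
    return x


def toy_linked(signal, depth=2):
    """Encoder-decoder with links: each encoder stage's input is added
    to the output of the matching decoder stage."""
    skips = []
    x = signal
    for _ in range(depth):
        skips.append(x)          # remember the input of this encoder stage
        x = downsample(x)
    for skip in reversed(skips):
        x = add(upsample(x), skip)  # decoder output plus linked encoder input
    return x


signal = [0, 0, 8, 0, 0, 0, 0, 0]   # a single sharp "edge" at index 2
plain = toy_plain(signal)
linked = toy_linked(signal)

print(plain)   # the peak is smeared over four positions
print(linked)  # the largest value sits at index 2 again
```

Without the links, the location of the peak cannot be recovered after two rounds of downsampling. With the links, the largest value in the output lines up with the original peak, which is the kind of spatial detail the decoder needs for per-pixel labels.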
LinkNet reported state-of-the-art results on the CamVid dataset at the time of its publication.
The encoder is the network that extracts the feature map. Here, the encoder network can be either ResNet or EfficientNet.
This is the initialization of the LinkNet weights. Here, the weights are nominally initialized randomly.
In practice, the initialization is not completely random, because the encoder block is initialized with ResNet18-ImageNet or EfficientNet-B0-ImageNet weights, depending on which encoder network is selected. Only the remaining layers start from random values.
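The split between pretrained and random initialization can be sketched as follows. This is a minimal illustration using plain Python lists as stand-in weights. The function `init_weights`, the layer names, and the Gaussian with standard deviation 0.01 are all assumptions made for the example, not part of any real library or of the tool described here.

```python
import random


def init_weights(pretrained_encoder, decoder_sizes, seed=0):
    """Build a weight dictionary for a hypothetical encoder-decoder model.

    pretrained_encoder: dict mapping encoder layer names to pretrained values
                        (for example, weights learned on ImageNet).
    decoder_sizes:      dict mapping decoder layer names to the number of
                        weights each layer needs.
    """
    rng = random.Random(seed)
    weights = {}
    # Encoder layers: copy the pretrained values unchanged.
    for name, values in pretrained_encoder.items():
        weights[name] = list(values)
    # Decoder layers: draw small random values (assumed Gaussian, std 0.01).
    for name, size in decoder_sizes.items():
        weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
    return weights


# Hypothetical stand-in for weights loaded from a ResNet18-ImageNet checkpoint.
pretrained = {"encoder.block1": [0.5, -0.2, 0.1], "encoder.block2": [0.3, 0.7]}
decoder = {"decoder.block2": 2, "decoder.block1": 3}

weights = init_weights(pretrained, decoder)
print(sorted(weights))
```

Only the decoder entries depend on the random seed. The encoder entries are identical to the pretrained values regardless of the seed.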