Know About EfficientNet & Implementation from Scratch Using PyTorch

Sahil -
4 min read · Apr 16, 2024


Hi guys! In this blog, I will share what I learned after reading the EfficientNet research paper and what it is all about!

Abstract

  • Study — model scaling is studied, and the paper identifies that carefully balancing network depth, width, and resolution can lead to better performance.
  • Propose — a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective ‘compound coefficient’.
  • Use — the method is used to design a new baseline network and scale it up, obtaining a family of models called ‘EfficientNets’, which achieve better accuracy and efficiency.
  • Result — EfficientNet-B7 achieves state-of-the-art top-1 accuracy on ImageNet while being 8.4x smaller and 6.1x faster on inference than earlier ConvNet models.

Introduction

  • The process of scaling up ConvNets has never been well understood, and there are many ways to do it.
  • The most common way is to increase the depth or the width.
  • Another, less common, way is to scale up models by image resolution.
  • Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.
  • A question arises:

Is there a principled method to scale up ConvNets that can achieve better accuracy and efficiency?

An empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and that this balance can be achieved by simply scaling each of them with a constant ratio.

(Figure: model scaling with a fixed compound coefficient.) Intuitively, if the input image is bigger, the network needs more layers to increase the receptive field and more channels to capture the finer-grained patterns in the bigger image.

So, this paper is the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution.

Compound Model Scaling

Problem Formulation

In this section, the paper formalizes the mathematics of how a ConvNet is computed.

The output of each ConvNet layer i is Yᵢ = Fᵢ(Xᵢ), where Fᵢ is the layer's operator, Xᵢ its input tensor, and Yᵢ its output tensor.

A ConvNet N is then the composition of its layers: N = Fₖ ⊙ Fₖ₋₁ ⊙ … ⊙ F₁ (X₁).

Since ConvNet layers are in practice grouped into stages where each stage repeats the same layer, the network can be defined as N = ⊙_{i=1…s} Fᵢ^{Lᵢ} (X_{⟨Hᵢ, Wᵢ, Cᵢ⟩}), where Fᵢ^{Lᵢ} denotes layer Fᵢ repeated Lᵢ times in stage i, and ⟨Hᵢ, Wᵢ, Cᵢ⟩ is the shape of stage i's input tensor.
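This composition maps naturally onto PyTorch. Here is a small sketch of my own (not the paper's code) where each stage repeats its layer Lᵢ times and the network is the outer composition:

```python
import torch
import torch.nn as nn

# Sketch: a stage F_i^{L_i} repeats the same layer type L_i times.
# Layer choices here (plain Conv2d + ReLU) are illustrative only.
def make_stage(in_ch: int, out_ch: int, num_layers: int) -> nn.Sequential:
    layers = []
    for i in range(num_layers):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# The network N is the composition (the paper's ⊙) of its stages.
net = nn.Sequential(
    make_stage(3, 16, num_layers=1),
    make_stage(16, 32, num_layers=2),
)

x = torch.randn(1, 3, 32, 32)
y = net(x)  # shape: (1, 32, 32, 32)
```

`nn.Sequential` plays the role of the composition operator: each stage's output becomes the next stage's input.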

Now, the paper proposes that the network should be scaled based on width (w), depth (d), and resolution (r).

Objective: maximize accuracy under resource constraints:

max_{d,w,r} Accuracy(N(d, w, r))
s.t. N(d, w, r) = ⊙_{i=1…s} F̂ᵢ^{d·L̂ᵢ} (X_{⟨r·Ĥᵢ, r·Ŵᵢ, w·Ĉᵢ⟩})
Memory(N) ≤ target_memory
FLOPS(N) ≤ target_flops

Scaling Dimensions

The idea is to scale one multiplier of the baseline model at a time while keeping the others constant. For example, increase the width multiplier and keep the other two fixed at 1.

Based on these experiments, the graphs show that accuracy quickly saturates after a certain value of the multiplier (width, depth, or resolution).
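As a concrete sketch of scaling a single dimension, here is an illustrative width-multiplier helper. Rounding channel counts to a multiple of 8 is a common trick in reference implementations of this model family; the exact rounding rule below is my assumption, not taken from the paper:

```python
def round_filters(filters: int, width_mult: float, divisor: int = 8) -> int:
    """Scale a channel count by the width multiplier, rounding to a multiple
    of `divisor` so the result stays hardware-friendly."""
    filters *= width_mult
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # Don't let rounding shrink the channel count by more than 10%.
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)

# Width-only scaling: depth and resolution stay at their baseline values.
print(round_filters(32, 1.0))  # 32
print(round_filters(32, 1.2))  # 40
```

Analogous helpers would round the layer count for depth scaling and the image size for resolution scaling.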

Compound Scaling

Constraints on the multipliers:

depth: d = α^φ, width: w = β^φ, resolution: r = γ^φ
s.t. α · β² · γ² ≈ 2, with α ≥ 1, β ≥ 1, γ ≥ 1

Here φ is the compound coefficient controlling how many more resources are available, while α, β, γ (found by a small grid search) decide how to assign those resources to depth, width, and resolution. Under this constraint, total FLOPS grow roughly by 2^φ.

EfficientNet Architecture

EfficientNet-B0 baseline architecture:

| Stage | Operator | Resolution | #Channels | #Layers |
|---|---|---|---|---|
| 1 | Conv3x3 | 224×224 | 32 | 1 |
| 2 | MBConv1, k3x3 | 112×112 | 16 | 1 |
| 3 | MBConv6, k3x3 | 112×112 | 24 | 2 |
| 4 | MBConv6, k5x5 | 56×56 | 40 | 2 |
| 5 | MBConv6, k3x3 | 28×28 | 80 | 3 |
| 6 | MBConv6, k5x5 | 14×14 | 112 | 3 |
| 7 | MBConv6, k5x5 | 14×14 | 192 | 4 |
| 8 | MBConv6, k3x3 | 7×7 | 320 | 1 |
| 9 | Conv1x1 & Pooling & FC | 7×7 | 1280 | 1 |
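The baseline table is convenient to keep as a Python structure when building the model. The tuple layout and the per-stage strides below are my own convention, with strides inferred from where the resolution halves between stages:

```python
# EfficientNet-B0 MBConv stages:
# (operator, kernel, stride, expand_ratio, out_channels, num_layers)
# Strides are my assumption, inferred from the resolution column of the table.
EFFICIENTNET_B0 = [
    ("MBConv1", 3, 1, 1,  16, 1),
    ("MBConv6", 3, 2, 6,  24, 2),
    ("MBConv6", 5, 2, 6,  40, 2),
    ("MBConv6", 3, 2, 6,  80, 3),
    ("MBConv6", 5, 1, 6, 112, 3),
    ("MBConv6", 5, 2, 6, 192, 4),
    ("MBConv6", 3, 1, 6, 320, 1),
]

total_blocks = sum(cfg[-1] for cfg in EFFICIENTNET_B0)  # 16 MBConv blocks in B0
```

A model builder can then loop over this list, repeating each block `num_layers` times (only the first repeat uses the stride).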
MBConv Block (Info from Google)

In the paper, the MBConv block is described only briefly, so I explored Google and ChatGPT for details. This is where the idea comes from.
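Based on what I gathered, a minimal MBConv sketch in PyTorch looks like this: a 1x1 expansion, a depthwise convolution, squeeze-and-excitation, a linear 1x1 projection, and a residual connection when shapes allow. The SiLU (Swish) activation and the SE reduction ratio are the commonly used choices; treat the details as assumptions rather than the paper's exact code:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: channel-wise attention on pooled features."""
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, reduced, 1), nn.SiLU(),
            nn.Conv2d(reduced, channels, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.fc(self.pool(x))

class MBConv(nn.Module):
    """Mobile inverted bottleneck: expand -> depthwise -> SE -> project."""
    def __init__(self, in_ch, out_ch, kernel, stride, expand_ratio):
        super().__init__()
        mid = in_ch * expand_ratio
        self.use_residual = (stride == 1 and in_ch == out_ch)
        layers = []
        if expand_ratio != 1:  # 1x1 expansion (skipped in MBConv1)
            layers += [nn.Conv2d(in_ch, mid, 1, bias=False),
                       nn.BatchNorm2d(mid), nn.SiLU()]
        layers += [
            # Depthwise conv: groups == channels, so each channel is filtered alone.
            nn.Conv2d(mid, mid, kernel, stride, kernel // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU(),
            SqueezeExcite(mid, max(1, in_ch // 4)),  # SE ratio: my assumption
            nn.Conv2d(mid, out_ch, 1, bias=False),   # linear projection, no activation
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 16, 56, 56)
block = MBConv(16, 16, kernel=3, stride=1, expand_ratio=6)
y = block(x)  # same shape as x, so the residual connection applies
```

The "inverted" part is that the block is wide in the middle (expanded channels) and narrow at both ends, the opposite of a classic ResNet bottleneck.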

Code

In the above code, you may have noticed that gamma is not used as a resolution multiplier. During training, we provide the resolution multiplier (r) ourselves and, according to r, adjust the width multiplier (w) and depth multiplier (d), making sure they satisfy the constraint on the multipliers.
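The adjustment described above can be sketched as follows: given a chosen resolution multiplier r = γ^φ, recover φ, then set d and w accordingly. The α, β, γ values are the paper's reported search results; the helper itself is my own illustration:

```python
import math

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # reported in the paper for the B0 search

def multipliers_from_resolution(r: float):
    """Given resolution multiplier r = gamma**phi, derive phi and the
    matching depth (alpha**phi) and width (beta**phi) multipliers."""
    phi = math.log(r) / math.log(GAMMA)
    d, w = ALPHA ** phi, BETA ** phi
    # Sanity check: the compound constraint must still hold for these bases.
    assert abs(ALPHA * BETA**2 * GAMMA**2 - 2) < 0.1
    return d, w, phi

d, w, phi = multipliers_from_resolution(1.15)  # phi == 1, i.e. B1's multipliers
```

This keeps d, w, and r tied together through the single coefficient φ instead of tuning each one by hand.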

Results

Comparison of EfficientNet with earlier ConvNet models
Inference latency
Comparison on different datasets of pre-trained EfficientNet models against other ConvNet models

I tried my best to build the model from scratch so that the parameter counts come close to the numbers mentioned in the research paper. They do not match exactly, and the paper does not specifically mention where the authors tweaked parameters to reach those numbers.

Fixed α, β, γ and increasing φ (the compound coefficient)

Please do let me know if I missed anything, or share your comments so we can learn from each other.

That’s all folks!

Thanks for reading it. Happy Learning! :D

Here is my LinkedIn profile.
