E.g., for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the COCO dataset.
The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set.
Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches, and 1000-fold fewer than "standard" NAS.
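As a toy illustration of such a policy-gradient controller, the sketch below runs a REINFORCE update over categorical operation choices. Everything here (the number of positions and candidate ops, and the reward stand-in for validation accuracy) is hypothetical, not the ENAS implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy controller: independent categorical logits over 3 candidate ops
# for each of 4 positions in the child subgraph (hypothetical sizes).
logits = np.zeros((4, 3))
lr = 0.5

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward(choices):
    # Stand-in for the validation accuracy of the sampled subgraph:
    # here, op 2 is "best" at every position.
    return np.mean(choices == 2)

baseline = 0.0
for step in range(500):
    probs = softmax(logits)
    # Sample one op per position from the controller's distribution.
    choices = np.array([rng.choice(3, p=p) for p in probs])
    r = reward(choices)
    # REINFORCE: increase log-probability of the sampled ops, scaled
    # by the advantage (reward minus a moving baseline).
    grad = -probs
    grad[np.arange(4), choices] += 1.0
    logits += lr * (r - baseline) * grad
    baseline = 0.9 * baseline + 0.1 * r

print(softmax(logits).argmax(axis=1))  # most mass should end up on op 2
```

A real controller is an RNN that emits choices sequentially and conditions each choice on the previous ones; the independent logits above are the simplest version that still shows the update rule.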
At each iteration, BO uses a surrogate to model this objective function based on previously obtained architectures and their validation errors.
One then chooses the next architecture to evaluate by maximizing an acquisition function, such as expected improvement, which provides a balance between exploration and exploitation.
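A minimal sketch of this loop, assuming a toy one-dimensional "architecture encoding", a stand-in validation-error function, and a basic Gaussian-process surrogate with expected improvement (none of which come from the cited systems):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for "validation error as a function of the architecture",
# encoded here as a single scalar purely for illustration.
def val_error(x):
    return np.sin(3 * x) + 0.5 * x**2

def rbf(a, b, ls=0.5):  # squared-exponential kernel for the GP surrogate
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

X = list(rng.uniform(-2, 2, 3))    # initially evaluated "architectures"
y = [float(val_error(x)) for x in X]
cand = np.linspace(-2, 2, 200)     # discrete candidate pool

for _ in range(10):
    Xa, ya = np.array(X), np.array(y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    Ks = rbf(cand, Xa)
    mu = Ks @ np.linalg.solve(K, ya)                       # surrogate mean
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    sd = np.sqrt(np.clip(var, 1e-12, None))
    # Expected improvement below the best error found so far (minimization).
    best = ya.min()
    z = (best - mu) / sd
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    ei = (best - mu) * Phi + sd * phi
    x_next = cand[ei.argmax()]     # balances exploration and exploitation
    X.append(float(x_next))
    y.append(float(val_error(x_next)))

print(f"best error after BO: {min(y):.3f}")
```

Candidates far from evaluated points get high predictive variance (exploration), while candidates near low observed errors get low predicted means (exploitation); expected improvement trades the two off in one closed-form score.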
Recently, BANANAS[18] has achieved promising results in this direction by introducing a high-performing instantiation of BO coupled to a neural predictor.
Another group used a hill climbing procedure that applies network morphisms, followed by short cosine-annealing optimization runs.
The approach yielded competitive results, requiring resources on the same order of magnitude as training a single network.
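The procedure can be sketched roughly as follows; the architecture encoding, score function, and morphism set below are toy stand-ins. A real implementation applies function-preserving network morphisms to the trained weights and then briefly trains each child with cosine annealing, rather than scoring a list of widths:

```python
# Toy architecture: a list of layer widths. The two mutations mimic the
# flavor of network morphisms (widen a layer, insert an identity layer)
# without preserving any actual network function.
def score(arch):
    # Hypothetical proxy for validation accuracy after a short training
    # run: reward depth near 6 layers and widths near 64.
    return -abs(len(arch) - 6) - sum(abs(w - 64) for w in arch) / 100

def morphisms(arch):
    children = []
    for i in range(len(arch)):
        wider = list(arch)
        wider[i] *= 2                    # widen layer i
        children.append(wider)
        deeper = list(arch)
        deeper.insert(i, arch[i])        # insert an identity-like layer
        children.append(deeper)
    return children

current = [16, 16]
for _ in range(8):
    # Hill climbing: move to the best-scoring child, stop if none improves.
    best = max(morphisms(current), key=score)
    if score(best) <= score(current):
        break
    current = best

print(current, round(score(current), 3))
```

Because morphisms start each child from the parent's function, the short fine-tuning runs suffice to rank children, which is what keeps the total cost near that of training a single network.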
RL- and evolution-based NAS require thousands of GPU-days of searching/training to achieve state-of-the-art computer-vision results, as described in the NASNet, MnasNet and MobileNetV3 papers.
More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,[25][26][27][28] which enables the use of gradient-based optimization methods.
These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures.[27] However, DARTS faces problems such as performance collapse, due to an inevitable aggregation of skip connections, and poor generalization, which were tackled by many later algorithms.[29][30][31][32] Methods like [30] and [31] aim at robustifying DARTS and making the validation-accuracy landscape smoother by introducing a Hessian-norm-based regularisation and random smoothing/adversarial attack, respectively.[33] Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods.
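The continuous relaxation at the heart of differentiable NAS can be sketched on a single edge: each candidate operation is weighted by a softmax over learnable architecture parameters, so the operation choice itself becomes differentiable. The operations and objective below are toy stand-ins, not the DARTS search space or its bilevel optimization:

```python
import numpy as np

# Candidate operations on one edge (toy 1-D versions of conv/skip/zero).
ops = [
    lambda x: 0.5 * x,           # stand-in for a parametric op
    lambda x: x,                 # skip connection
    lambda x: np.zeros_like(x),  # zero op
]

alpha = np.zeros(3)  # architecture parameters for this edge

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

x = np.ones(4)
target = x.copy()  # toy objective: the edge should act like the identity

lr = 1.0
for _ in range(100):
    w = softmax(alpha)
    outs = np.stack([op(x) for op in ops])   # (3, 4)
    # Continuous relaxation: softmax-weighted sum over candidate ops.
    y = (w[:, None] * outs).sum(axis=0)
    resid = y - target                        # dL/dy for L = 0.5*||y - t||^2
    g_w = outs @ resid                        # gradient w.r.t. the mixture weights
    # Backprop through the softmax: Jacobian is diag(w) - w w^T.
    g_alpha = w * (g_w - w @ g_w)
    alpha -= lr * g_alpha

print(softmax(alpha))
```

After search, the relaxed mixture is discretized by keeping the operation with the largest weight on each edge; here the skip connection wins, since the objective is the identity.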
[37][38] Neural architecture search often requires large computational resources, due to its expensive training and evaluation phases.
To overcome this limitation, NAS benchmarks[39][40][41][42] have been introduced, from which one can either query or predict the final performance of neural architectures in seconds.
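The tabular-benchmark idea can be sketched as a lookup table: final metrics for every architecture in a small search space are precomputed once, so "evaluating" an architecture during search becomes a query instead of a training run. The architectures and numbers below are hypothetical; real benchmarks such as NAS-Bench-101/201 expose richer query APIs over tables of this kind:

```python
# Hypothetical precomputed table: architecture encoding -> final metrics.
benchmark = {
    ("conv3x3", "conv3x3", "skip"): {"val_acc": 91.2, "train_secs": 840.0},
    ("conv3x3", "conv1x1", "skip"): {"val_acc": 89.7, "train_secs": 610.0},
    ("conv1x1", "skip", "skip"):    {"val_acc": 85.1, "train_secs": 300.0},
}

def query(arch):
    # Constant-time "evaluation": no GPU, no training.
    return benchmark[arch]["val_acc"]

# Any search strategy can now be compared cheaply and reproducibly.
best = max(benchmark, key=query)
print(best, query(best))
```

Because every method is evaluated against the same frozen table, benchmarks also make NAS results reproducible and directly comparable across papers.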