Which activation function should you use for the hidden layers of your deep neural networks?
Although your mileage will vary, in general SELU > ELU > leaky ReLU (and its variants) > ReLU > tanh > logistic.
If the network’s architecture prevents it from self-normalizing, then ELU may perform better than SELU (since SELU is not smooth at z = 0).
If you care a lot about runtime latency, then you may prefer leaky ReLU.
If you don’t want to tweak yet another hyperparameter, you may use the default α values used by Keras (e.g., 0.3 for leaky ReLU).
If you have spare time and computing power, you can use cross-validation to evaluate other activation functions, such as RReLU if your network is overfitting or PReLU if you have a huge training set.
That said, because ReLU is the most used activation function (by far), many libraries and hardware accelerators provide ReLU-specific optimizations; therefore, if speed is your priority, ReLU might still be the best choice.
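The activation functions ranked above are easy to compare side by side. Below is a minimal NumPy sketch (not the Keras implementations, though Keras uses the same formulas): it shows the 0.3 default α for leaky ReLU mentioned earlier, and why ELU (with α = 1) is smooth at z = 0 while SELU is not — SELU's fixed scale and α constants make its left and right slopes differ at 0.

```python
import numpy as np

# Standard self-normalizing constants (Klambauer et al., 2017); Keras uses the same.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(z):
    """max(0, z): fast and widely optimized, but gradient is zero for z < 0."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.3):
    """Small slope alpha for z < 0; 0.3 is Keras's default for LeakyReLU."""
    return np.where(z >= 0, z, alpha * z)

def elu(z, alpha=1.0):
    """With alpha = 1, both the value and the slope match at z = 0 (smooth)."""
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

def selu(z):
    """Scaled ELU: self-normalizing, but the slope jumps at z = 0."""
    return SELU_SCALE * np.where(z >= 0, z, SELU_ALPHA * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (relu, leaky_relu, elu, selu):
    print(f.__name__, f(z))
```

Note that SELU's slope just left of 0 is scale × α ≈ 1.758, while just right of 0 it is scale ≈ 1.051 — that kink is the non-smoothness referred to above.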
Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by Aurélien Géron, Sep 2019