Which activation function should you use for the hidden layers of your deep neural networks?
Although your mileage will vary, in general SELU > ELU > leaky ReLU (and its variants) > ReLU > tanh > logistic.
If the network’s architecture prevents it from self-normalizing, then ELU may perform better than SELU (since SELU is not smooth at z = 0).
If you care a lot about runtime latency, then you may prefer leaky ReLU.
If you don’t want to tweak yet another hyperparameter, you may use the default α values used by Keras (e.g., 0.3 for leaky ReLU).
If you have spare time and computing power, you can use cross-validation to evaluate other activation functions, such as RReLU if your network is overfitting or PReLU if you have a huge training set.
That said, because ReLU is the most used activation function (by far), many libraries and hardware accelerators provide ReLU-specific optimizations; therefore, if speed is your priority, ReLU might still be the best choice.
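The activation functions ranked above are easy to compare side by side. Below is a minimal NumPy sketch (not the Keras implementations, though Keras uses the same formulas): it shows the 0.3 default α for leaky ReLU mentioned earlier, and why ELU (with α = 1) is smooth at z = 0 while SELU is not — SELU's fixed scale and α constants make its left and right slopes differ at 0.

```python
import numpy as np

# Standard self-normalizing constants (Klambauer et al., 2017); Keras uses the same.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(z):
    """max(0, z): fast and widely optimized, but gradient is zero for z < 0."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.3):
    """Small slope alpha for z < 0; 0.3 is Keras's default for LeakyReLU."""
    return np.where(z >= 0, z, alpha * z)

def elu(z, alpha=1.0):
    """With alpha = 1, both the value and the slope match at z = 0 (smooth)."""
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

def selu(z):
    """Scaled ELU: self-normalizing, but the slope jumps at z = 0."""
    return SELU_SCALE * np.where(z >= 0, z, SELU_ALPHA * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (relu, leaky_relu, elu, selu):
    print(f.__name__, f(z))
```

Note that SELU's slope just left of 0 is scale × α ≈ 1.758, while just right of 0 it is scale ≈ 1.051 — that kink is the non-smoothness referred to above.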
Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by Aurélien Géron, Sep 2019