Activation Functions: Dive into an optimal activation function
Abstract
Activation functions have come up as one of the essential components of neural networks. The choice of adequate activation function can impact the accuracy of these methods. In this study, we experiment for finding an optimal activation function by defining it as a weighted sum of existing activation functions and then further optimizing these weights while training the network. The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets, MNIST, FashionMNIST, and KMNIST. We observe that the ReLU activation function can easily overlook other activation functions. Also, we see that initial layers prefer to have ReLU or LeakyReLU type of activation functions, but deeper layers tend to prefer more convergent activation functions.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.