Activation functions introduce nonlinearity into a model. You can think of nonlinearity as the ability to learn curved relationships; without it, a model can only fit straight lines to the data, no matter how many layers it has.
Relu
The ReLU activation function can be used anywhere in the model; it is used after the final layer only if your output should be a non-negative number. ReLU returns zero if the input is negative, and if the input is positive it passes through unchanged.
The main disadvantage of ReLU is that its output is zero for every negative input, so the gradient for those inputs is also zero and the affected neurons can stop learning.
Relu Range: 0 to inf
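The rule above (zero for negatives, pass-through for positives) can be sketched in a few lines of NumPy:

```python
import numpy as np

def relu(x):
    # ReLU: negative inputs become 0, positive inputs pass through unchanged.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negatives are clipped to 0, positives are kept
```

Note that zero is a single cutoff point: everything at or below it maps to 0, which is why the range starts at 0.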
LeakyReLU
The Leaky ReLU activation function is used in the intermediate layers of deep learning models. It behaves like ReLU for positive inputs, but instead of zeroing out negative inputs it multiplies them by a small slope (commonly 0.01), so the gradient never becomes exactly zero.
Leaky Relu Range: -inf to +inf
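A minimal sketch of Leaky ReLU, assuming the common default slope of 0.01 for negative inputs:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Positive inputs pass through unchanged; negative inputs are scaled
    # by a small slope (alpha) instead of being zeroed out, so their
    # gradient is small but never exactly zero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-100.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))  # [-1.   -0.01  0.    2.  ]
```

Because the negative side is only scaled, not clipped, the output can be any real number, which is why the range is -inf to +inf.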
Some other commonly used activation functions contribute to the vanishing gradient problem, because they squash large positive numbers to values close to 1 and large negative numbers to values close to 0 (or -1). If such an activation function is applied many times in a deep model, the activations keep getting compressed into a narrow range, and the gradients flowing back through them keep shrinking as well, so the early layers learn very slowly.
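A small illustration of this shrinking effect, using sigmoid as the squashing function: the derivative of sigmoid is s * (1 - s), which is at most 0.25, so the chain rule multiplies many small factors together across layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Chain-rule factor contributed by one sigmoid layer at input 2.0.
# Multiplying 10 such factors together simulates backpropagating
# through 10 stacked sigmoid layers.
grad = 1.0
for layer in range(10):
    s = sigmoid(2.0)           # an intermediate activation (assumed value)
    grad *= s * (1.0 - s)      # each factor is at most 0.25
print(grad)  # a tiny number: the gradient has effectively vanished
```

The input value 2.0 here is just an illustrative assumption; any input far from zero makes each factor even smaller, and the product collapses toward zero after a handful of layers.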
Sigmoid
Sigmoid is commonly used after the last layer in deep learning, for example to turn a raw score into a probability between 0 and 1. It is rarely used in intermediate layers, because it is one of the activation functions that contribute to the vanishing gradient.
Sigmoid Range: 0 to 1
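A minimal sigmoid implementation showing how any real input is squashed into the (0, 1) range:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5 exactly
print(sigmoid(10.0))   # very close to 1
print(sigmoid(-10.0))  # very close to 0
```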
Tanh
The tanh function is mostly used in the intermediate layers of a model. It is similar to sigmoid but zero-centered, and like sigmoid it saturates for large inputs.
Tanh Range: -1 to +1
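NumPy ships tanh directly, so a quick check of its squashing behavior looks like this:

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
# Large negative inputs saturate near -1, zero maps to 0,
# large positive inputs saturate near +1.
print(np.tanh(x))
```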
Softmax
The softmax function is used when the model must choose among multiple mutually exclusive classes, for example deciding whether a picture contains a human, a dog, or a car. In this case we have three different classes, but a single image is assumed to contain only one of them.
Softmax Range: 0 to 1
In the above example the model outputs 3 numbers, e.g. [0.2, 0.3, 0.5], and the sum of these numbers is equal to 1. You can interpret this as a 20% chance it is a human, a 30% chance it is a dog, and a 50% chance it is a car.
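A sketch of softmax for the three-class example above; the raw scores here are made-up values chosen only to illustrate the normalization:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability (does not change
    # the result), then exponentiate and normalize so the outputs are
    # positive and sum to exactly 1.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([1.0, 1.4, 1.9])  # hypothetical raw scores: human, dog, car
probs = softmax(scores)
print(probs)        # three probabilities that sum to 1
print(probs.sum())  # 1.0
```

The class with the largest raw score always receives the largest probability, so picking the class is just an argmax over the softmax output.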