Tuesday, June 30, 2026

Multi-Task Learning in Deep Learning | Optimization and its Objective Function

Multi-Task Learning: Multi-task learning (MTL) is a subfield of machine learning that aims to solve several different tasks at the same time by taking advantage of the similarities between them.

Multi-task learning is a way to improve generalization by pooling the examples arising from several tasks. The different supervised tasks share the same input and some intermediate-level representation.

We can view multi-task learning as a form of inductive transfer. Inductive transfer can help improve a model by introducing an inductive bias, which causes the model to prefer some hypotheses over others. For instance, a common form of inductive bias is L1 regularization, which leads to a preference for sparse solutions.


Types of Multi-Task Learning

Multi-task learning comes in two types: hard or soft parameter sharing of hidden layers.

Hard parameter sharing: It is usually applied by sharing the hidden layers between all tasks, while keeping several task-specific output layers. Hard parameter sharing greatly reduces the risk of overfitting.

Soft parameter sharing adds a constraint that encourages related parameters to be similar instead of forcing them to share the same value. In other words, we penalize the difference in parameters across the models that we train for each task. By loosely coupling the shared representations, this approach, in contrast to hard sharing, gives the tasks more flexibility.

In soft parameter sharing, each task has its own model with its own parameters. The distance between the parameters of the models is then regularized in order to encourage the parameters to be similar.
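A minimal sketch of hard parameter sharing, using only the Python standard library, may make the layout easier to picture. The layer sizes, the two example tasks, and the helper names (dense, init_layer, forward) are assumptions made for illustration, not part of any particular framework. Only the forward pass is shown.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# Hypothetical sizes: 4 input features, 8 shared hidden units.
shared_w, shared_b = init_layer(8, 4, rng)   # hidden layer used by every task
head_a_w, head_a_b = init_layer(1, 8, rng)   # task A: one regression output
head_b_w, head_b_b = init_layer(3, 8, rng)   # task B: three class scores

def forward(x):
    # The shared hidden representation is computed once per input.
    hidden = [math.tanh(h) for h in dense(x, shared_w, shared_b)]
    # Each task-specific output layer reads the same hidden vector.
    return dense(hidden, head_a_w, head_a_b), dense(hidden, head_b_w, head_b_b)

out_a, out_b = forward([0.1, -0.2, 0.3, 0.4])
```

During training, the losses of both tasks would send gradients into shared_w and shared_b, which is where the regularizing effect comes from.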
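The distance penalty used in soft parameter sharing can be sketched as follows. This assumes each task's parameters are flattened into a plain list of floats and uses a squared L2 distance. The function names and the strength value are illustrative choices, and other distance measures are possible.

```python
def soft_sharing_penalty(params_a, params_b, strength):
    """Squared L2 distance between two tasks' parameters, scaled by `strength`."""
    return strength * sum((a - b) ** 2 for a, b in zip(params_a, params_b))

def total_loss(loss_a, loss_b, params_a, params_b, strength=0.1):
    # Each task keeps its own loss; the penalty pulls the parameters together.
    return loss_a + loss_b + soft_sharing_penalty(params_a, params_b, strength)

theta_a = [1.0, 2.0, 3.0]   # parameters of the task A model (made-up values)
theta_b = [1.0, 2.5, 2.0]   # parameters of the task B model (made-up values)
penalty = soft_sharing_penalty(theta_a, theta_b, strength=0.1)
combined = total_loss(0.5, 0.25, theta_a, theta_b)
```

A larger strength pushes the two models toward identical parameters, while a strength of zero trains them independently.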

Advantages of Multi-Task Learning

  • Reduce overfitting
  • Implicit data augmentation
  • Regularization: MTL acts as a regularizer by introducing an inductive bias
  • Focus on relevant features: If a task is very noisy, or data is limited and high-dimensional, it can be hard for a model to distinguish between relevant and irrelevant features. MTL helps the model focus its attention on the features that actually matter, as the other tasks provide additional evidence for the relevance or irrelevance of those features.

Optimization

An optimization problem in machine learning is about finding the set of inputs to a function that gives the best (minimum or maximum) function evaluation.

Whether it is fitting a logistic regression model or training a neural network on various datasets, the problem of optimization arises at every level of a machine learning model.

With hundreds of optimization algorithms available, it is difficult to pick the single algorithm that will give the best performance when applied to our machine learning model. One suggested approach to making the most of these algorithms is to group them and then choose from the group that suits the problem.

In most machine learning algorithms, the problem that arises is "continuous function optimization", in which the inputs are real numbers, and so are the outputs. By contrast, problems that take only discrete values as input are usually called "Combinatorial Optimization Problems".

As a general rule, the more information is available about the objective function, the easier it becomes to optimize, provided that the information can be used effectively in the search.


The key question that arises when running an optimization algorithm is whether or not the objective function can be differentiated at a given point.

That is, can we calculate the first derivative of the function for a given solution or not? Based on this point, optimization algorithms are further classified into two classes: those that use the derivative of the function and those that do not. Hence, in this section, we will discuss the "differentiable" and "non-differentiable" objective functions that can be used to group multiple optimization algorithms.

Differentiable Objective Function

If we can calculate the derivative of a function at any given point in the input space, the function is called a "Differentiable Function".

The derivative of a function can be defined as the rate at which the function changes its value at a given point. This is often called the slope. Optimization techniques can be applied to these differentiable functions using simple calculus.

Optimization becomes easier if the derivatives of the "continuous functions" mentioned above can be calculated. Some of the algorithm families used for such functions, most of which rely on gradient values, are as follows:

1. Bracketing algorithms

This technique is intended for problems that have only one input variable and whose optimum is known to exist within a predefined range.

These algorithms can efficiently navigate the known range and locate the optimum. The only drawback is that they assume there is only one optimum within that range.

The advantage of these algorithms is that they can be applied even when no derivative information is available. Examples of bracketing algorithms include Fibonacci search, golden section search, the bisection method, etc.
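As an illustration of a bracketing algorithm, here is a minimal sketch of golden section search. It assumes the function has a single minimum inside the given range. The function name, the tolerance, and the example objective are illustrative choices.

```python
import math

def golden_section_search(f, low, high, tol=1e-8):
    """Minimize a unimodal function f on [low, high] without derivatives."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = high - inv_phi * (high - low)
    d = low + inv_phi * (high - low)
    fc, fd = f(c), f(d)
    while high - low > tol:
        if fc < fd:
            # The minimum lies in [low, d]; the old c becomes the new d.
            high, d, fd = d, c, fc
            c = high - inv_phi * (high - low)
            fc = f(c)
        else:
            # The minimum lies in [c, high]; the old d becomes the new c.
            low, c, fc = c, d, fd
            d = low + inv_phi * (high - low)
            fd = f(d)
    return (low + high) / 2

# Example: f(x) = (x - 2)^2 has its only minimum at x = 2.
x_min = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration shrinks the bracket by a constant factor and needs only one new function evaluation, with no derivative involved.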

2. Local descent algorithms

These algorithms work for models that have multiple input variables and a single global optimum. The most common example is the line search. It involves choosing a direction to move in the search space and then performing a bracketing-type search along a line in the chosen direction. The algorithm repeats until no further improving direction can be found.

These repeated line searches make the algorithm computationally expensive, since every directional move has to be optimized before the next direction is chosen.
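One simple way to picture this is a search that cycles through the coordinate axes as directions and runs a bracketing search along each one. The sketch below is only one possible instance of a local descent method. The axis-aligned directions, the fixed step bracket of [-10, 10], and all function names are assumptions made for illustration.

```python
import math

def line_minimize(g, low=-10.0, high=10.0, tol=1e-8):
    """Golden section search for the step t in [low, high] that minimizes g(t)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = high - inv_phi * (high - low)
    d = low + inv_phi * (high - low)
    gc, gd = g(c), g(d)
    while high - low > tol:
        if gc < gd:
            high, d, gd = d, c, gc
            c = high - inv_phi * (high - low)
            gc = g(c)
        else:
            low, c, gc = c, d, gd
            d = low + inv_phi * (high - low)
            gd = g(d)
    return (low + high) / 2

def coordinate_line_search(f, x0, max_sweeps=100, tol=1e-10):
    """Cycle through axis directions, doing a full line search along each."""
    x = list(x0)
    best = f(x)
    for _ in range(max_sweeps):
        previous = best
        for i in range(len(x)):
            def along_axis(t, i=i):
                trial = x[:]          # copy, then move along axis i only
                trial[i] += t
                return f(trial)
            x[i] += line_minimize(along_axis)
            best = f(x)
        if previous - best < tol:     # no direction improved the objective
            break
    return x

# Example objective with one global minimum at (1.6, -2.4).
def objective(p):
    return (p[0] - 1) ** 2 + (p[1] + 2) ** 2 + 0.5 * p[0] * p[1]

solution = coordinate_line_search(objective, [0.0, 0.0])
```

The cost noted above is visible here: every move along a direction triggers a complete one-dimensional search with many function evaluations.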

3. First order algorithms

These algorithms use the first-order derivative (gradient) to determine the direction to move in the search space. They work by first calculating the gradient of the function and then following it in the opposite direction, for example going downhill toward the minimum in minimization problems, with the size of each move set by a step size, also called the "learning rate".

This step size, or learning rate, is a hyperparameter of the algorithm that determines how far to move in the search space at each step. This is in contrast to the local descent algorithms, which do not have this hyperparameter and perform a full line search for each directional move.

These algorithms are also called "Gradient Descent" algorithms, and a number of minor extensions of the basic procedure are known by names such as Momentum, Adagrad, RMSProp, Adam, and others.

Gradient descent algorithms are also used for training artificial neural networks and deep learning models, since they provide the template for Stochastic Gradient Descent.

Here, the gradient is estimated rather than calculated directly, using the prediction error on the training data.
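The basic gradient descent update can be sketched in a few lines. The example objective, its hand-written gradient, and the default learning rate and step count are assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=200):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x

# Example objective f(x, y) = (x - 3)^2 + (y + 1)^2 with its analytic gradient.
def grad_f(p):
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]

x_min = gradient_descent(grad_f, [0.0, 0.0])
```

With a learning rate that is too large the iterates can overshoot and diverge, and with one that is too small they converge slowly, which is why this hyperparameter usually needs tuning.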
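To show what estimating the gradient from training data can look like, here is a minimal sketch of minibatch stochastic gradient descent that fits a line y = w * x + b under a mean squared error loss. The dataset, batch size, learning rate, and function name are all made up for illustration.

```python
import random

def sgd_linear_fit(xs, ys, learning_rate=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # The gradient is estimated from the minibatch only,
            # using the prediction error on those examples.
            errors = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [i / 10 for i in range(20)]      # inputs 0.0 to 1.9
ys = [2.0 * x + 1.0 for x in xs]      # noiseless targets generated with w=2, b=1
w, b = sgd_linear_fit(xs, ys)
```

Using one example per batch gives the purely stochastic variant, and using the whole dataset per batch recovers ordinary batch gradient descent.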

4. Second order algorithms

These algorithms use the second-order derivative of the objective function with respect to the input variables to choose the direction of movement in the search space. They are only suitable for objective functions for which the Hessian matrix can be calculated or approximated.

Some examples of second-order algorithms are:

  • Newton's method
  • Secant method

Second-order methods that approximate the Hessian instead of computing it exactly are known as Quasi-Newton methods.
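For a function of one variable, Newton's method for minimization can be sketched as below. The function name, the stopping rule, and the example objective f(x) = exp(x) - 2x are assumptions for illustration. The sketch also assumes the second derivative is positive near the solution.

```python
import math

def newton_minimize(first_deriv, second_deriv, x0, tol=1e-10, max_iter=50):
    """1-D Newton's method: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = first_deriv(x) / second_deriv(x)
        x -= step
        if abs(step) < tol:   # the update no longer changes x meaningfully
            break
    return x

# f(x) = exp(x) - 2x has f'(x) = exp(x) - 2 and f''(x) = exp(x);
# its minimum is where f'(x) = 0, that is, x = ln(2).
x_min = newton_minimize(lambda x: math.exp(x) - 2, lambda x: math.exp(x), x0=0.0)
```

In several dimensions the division by the second derivative becomes a linear solve with the Hessian matrix, which is the expensive part that Quasi-Newton methods try to avoid.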
