
Deep learning weight adjustment clarity sought


Hello,

 

Would someone be able to help explain the meaning of a term in a formula? The one in question is the deep learning weight adjustment formula from the Wikipedia material on deep neural networks.

 

And here it is.

 

[Attached image: the weight update formula, Δw_ij(t+1) = Δw_ij(t) + η ∂C/∂w_ij]
So this shows the iterative adjustment for the weights. Or is it the adjustment to the change applied to the weights (as indicated by the delta)?
But the main part I am unclear about is this.
[Attached image: the term in question, ∂C/∂w_ij]

 

C is the cost function, but what does this term mean?

 

[The two images from the opening post, quoted]

I know nothing about the underlying subject matter of neural networks, deep or otherwise.

 

But the equation you refer to looks like a standard finite element iteration/discretisation from t to (t+1), where delta is the shift function and t is the iteration counter in some numerical approximation of the underlying mathematical equation.

 

Since steepest descent methods are discussed, I would hazard a guess that this is due to linearization of an underlying nonlinear controlling mathematical equation. This is a common numerical approach in such cases.

Edited by studiot

  • Author

 

But the equation you refer to looks like a standard finite element iteration/discretisation from t to (t+1), where delta is the shift function and t is the iteration counter in some numerical approximation of the underlying mathematical equation.

 

Since steepest descent methods are discussed, I would hazard a guess that this is due to linearization of an underlying nonlinear controlling mathematical equation. This is a common numerical approach in such cases.

 

I'm afraid I understood very little of what you said, and am not sure if you were addressing my question. Are you able to explain what the term I referred to is in a simple way?

If you're referring to the notation itself, it denotes a partial derivative, in this case the partial derivative of the cost function with respect to the variable w_ij.

 

Roughly, conceptually, you can think of this as referring to the rate at which the value of the function changes with respect to the variable. That is to say (since I'm not sure what mathematical training you've had), holding any other variables constant, we change w_ij and see how the value of the cost function varies in response.
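If a small numerical example helps, here is a rough Python sketch (the quadratic cost, the second weight and the starting value of w_ij are all made up purely for illustration, nothing to do with the actual network): nudge w_ij a little while holding the other weight fixed and see how C responds.

def C(w_ij, other_w=0.5):
    # A made-up cost that depends on w_ij and one other weight held fixed.
    return (w_ij - 2.0) ** 2 + other_w ** 2

w_ij = 1.0   # arbitrary current value of the weight
h = 1e-6     # a small nudge

# Finite-difference estimate of the partial derivative dC/dw_ij:
# change w_ij slightly, keep everything else the same, compare the costs.
estimate = (C(w_ij + h) - C(w_ij)) / h
print(estimate)   # roughly 2 * (1.0 - 2.0) = -2.0

The printed value is (approximately) the rate described above: how fast C changes per unit change in w_ij, with everything else held constant.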

Edited by John

  • Author

If you're referring to the notation itself, it denotes a partial derivative, in this case the partial derivative of the cost function with respect to the variable w_ij.

 

Roughly, conceptually, you can think of this as referring to the rate at which the value of the function changes with respect to the variable. That is to say (since I'm not sure what mathematical training you've had), holding any other variables constant, we change w_ij and see how the value of the cost function varies in response.

 

Ok, thanks. I now recall doing partial derivatives at school, but that was over thirty years ago, so I am trying to remember what they mean. So in this context it is the rate of change of the cost function with respect to the weight.

 

In practical terms is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?

 

With regard to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change signs to minimize rather than maximize the function.)

 

Does that sound about right, or are there other things that should be taken into account?

In practical terms is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?

For the first question, if I'm understanding you correctly, then yes. For the second, I don't know.

 

With regard to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change signs to minimize rather than maximize the function.)

 

Does that sound about right, or are there other things that should be taken into account?

Well, with the caveat that I know very little about machine learning, I believe the idea is to iteratively minimize the cost function. I don't know why the equation on Wikipedia involves addition rather than subtraction. It may be a typo, or it may be that I'm misunderstanding how gradient descent is applied to training deep neural networks.
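To make the sign question concrete, here is a tiny gradient descent sketch in Python (the cost function, the starting weight and the learning rate are all invented for illustration and are not from the article): subtracting the derivative steps the weight downhill so the cost shrinks, whereas adding it would push the weight uphill.

def C(w):
    return (w - 2.0) ** 2      # made-up cost, minimized at w = 2

def dC_dw(w):
    return 2.0 * (w - 2.0)     # its derivative with respect to w

w = 0.0      # arbitrary starting weight
eta = 0.1    # learning rate (step size)

for _ in range(50):
    w = w - eta * dC_dw(w)     # subtract: move against the slope to minimize C

print(w)   # ends up close to 2.0, the minimizer of the cost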
