Posts posted by kaan97

  1. I have received the following problem:

    Consider the following simple model of a neuron:

    z = wx + b (logits),

    ŷ = g(z) (activation),

    L2(w, b) = (1/2)(y − ŷ)^2, quadratic loss (Mean Squared Error (MSE), L2 loss, l2-norm),

    L1(w, b) = |y − ŷ|, absolute value loss (Mean Absolute Error (MAE), L1 loss, l1-norm),

    with x, w, b ∈ R. Calculate the derivatives ∂L/∂w and ∂L/∂b for updating the weight w and bias b. Determine the results for both loss functions (L1, L2) and assume a sigmoid and a tanh activation function g(z). Write down all steps of your derivation. Hint: You have to use the chain rule.

    I have considered the following approach, for example for L2 differentiated with respect to w:

     

    \begin{equation}
    L_2 = \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} \left( y - g(z) \right)^2 = \frac{1}{2} \left( y - g(wx + b) \right)^2 = \frac{1}{2} \left( y - \frac{1}{1 + e^{-(wx + b)}} \right)^2
    \end{equation}

    This expression can then be differentiated directly using the chain rule.
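
    Written out for the sigmoid g(z) = 1/(1 + e^{-z}) (my own attempt, using the known derivative g'(z) = g(z)(1 - g(z))), I would expect something like:

    \begin{equation}
    \frac{\partial L_2}{\partial w} = \left( y - g(wx + b) \right) \cdot \left( -g'(wx + b) \right) \cdot x = -\left( y - g(wx + b) \right) \, g(wx + b) \left( 1 - g(wx + b) \right) x
    \end{equation}
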
    However, in the literature I find the following approach:

     

    \begin{equation}
    \frac{\partial L_2}{\partial w} = \frac{\partial L_2}{\partial \hat{y}} \times \frac{\partial \hat{y}}{\partial z} \times \frac{\partial z}{\partial w} = -(y - \hat{y}) \times g'(z) \times x
    \end{equation}
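
    By the same pattern, with ∂z/∂b = 1 and d|u|/du = sign(u) for u ≠ 0, I would expect the remaining derivatives to be (using g'(z) = g(z)(1 − g(z)) for the sigmoid and g'(z) = 1 − tanh^2(z) for tanh):

    \begin{equation}
    \frac{\partial L_2}{\partial b} = -(y - \hat{y}) \, g'(z), \qquad
    \frac{\partial L_1}{\partial w} = -\operatorname{sign}(y - \hat{y}) \, g'(z) \, x, \qquad
    \frac{\partial L_1}{\partial b} = -\operatorname{sign}(y - \hat{y}) \, g'(z)
    \end{equation}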


    So which of the two approaches is the right one in this context? Or is there actually a connection between the two?
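
    As a quick sanity check, one could compare the chain-rule formula from the literature against a finite-difference gradient of the fully substituted expression. This is only a minimal sketch assuming the sigmoid activation; the input values are arbitrary and just for illustration:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # arbitrary illustrative values (not from the problem statement)
    x, w, b, y = 0.7, 0.3, -0.1, 1.0

    z = w * x + b
    y_hat = sigmoid(z)

    # chain-rule form: dL2/dw = -(y - y_hat) * g'(z) * x, with g'(z) = g(z) * (1 - g(z))
    grad_chain_rule = -(y - y_hat) * y_hat * (1.0 - y_hat) * x

    # finite-difference gradient of L2(w) = 0.5 * (y - sigmoid(w*x + b))**2
    def L2(w_):
        return 0.5 * (y - sigmoid(w_ * x + b)) ** 2

    eps = 1e-6
    grad_numeric = (L2(w + eps) - L2(w - eps)) / (2.0 * eps)

    print(grad_chain_rule, grad_numeric)  # the two values should agree closely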

     
