kaan97

I have received the following problem:

Consider the following simple model of a neuron:

\begin{equation} z = wx + b \quad \text{(logits)}, \qquad \hat{y} = g(z) \quad \text{(activation)} \end{equation}

\begin{equation} L_2(w, b) = \frac{1}{2} (y - \hat{y})^2 \quad \text{quadratic loss (Mean Squared Error (MSE), L2 loss, l2-norm)} \end{equation}

\begin{equation} L_1(w, b) = |y - \hat{y}| \quad \text{absolute value loss (Mean Absolute Error (MAE), L1 loss, l1-norm)} \end{equation}

with x, w, b ∈ R. Calculate the derivatives ∂L/∂w and ∂L/∂b for updating the weight w and the bias b. Determine the results for both loss functions (L1, L2), and assume a sigmoid and a tanh activation function g(z). Write down all steps of your derivation. Hint: You have to use the chain rule.

I have considered the following approach, shown here for L2 differentiated with respect to w:

\begin{equation} L_2 = \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} \left( y - g(z) \right)^2 = \frac{1}{2} \left( y - g(wx + b) \right)^2 = \frac{1}{2} \left( y - \frac{1}{1 + e^{-(wx + b)}} \right)^2 \end{equation}

This expression could then easily be differentiated using the chain rule. However, in the literature I find the following approach:

\begin{equation} \frac{\partial L_2}{\partial w} = \frac{\partial L_2}{\partial \hat{y}} \times \frac{\partial \hat{y}}{\partial z} \times \frac{\partial z}{\partial w} = -(y - \hat{y}) \times g'(z) \times x \end{equation}

So which of the two approaches is the right one in this context? Or is there even a connection between the two?
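To see how the two expressions relate numerically, here is a minimal Python sketch (the values of x, w, b, y below are arbitrary placeholders, and a sigmoid activation is assumed) that evaluates the factored gradient -(y - ŷ) · g'(z) · x and compares it against a central finite-difference derivative of the fully substituted loss:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example values, only for the numerical check
x, w, b, y = 0.7, 1.3, -0.2, 1.0

# Chain-rule factorization: dL2/dw = dL2/dyhat * dyhat/dz * dz/dw
z = w * x + b
y_hat = sigmoid(z)
dL2_dyhat = -(y - y_hat)                   # derivative of 1/2 (y - yhat)^2 w.r.t. yhat
dyhat_dz = sigmoid(z) * (1 - sigmoid(z))   # g'(z) for the sigmoid
dz_dw = x
grad_chain = dL2_dyhat * dyhat_dz * dz_dw

# Fully substituted expression: L2(w) = 1/2 (y - g(wx + b))^2,
# differentiated numerically with a central difference
def L2(w_):
    return 0.5 * (y - sigmoid(w_ * x + b)) ** 2

eps = 1e-6
grad_direct = (L2(w + eps) - L2(w - eps)) / (2 * eps)

print(grad_chain, grad_direct)

If both printed numbers agree up to the finite-difference error, that is consistent with the two write-ups describing the same derivative, just expressed with and without the intermediate variables ŷ and z.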