Gradient-Based Supervised Learning Machine (lecture note)

Neural nets, and many other models, fit this general framework.

Decision Rule

y = F(W, X)

where F is some function, and W some parameter vector
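For concreteness, here is a minimal sketch in Python/NumPy, assuming F is a plain linear model; the note leaves F unspecified, so this particular choice is only an illustration.

    import numpy as np

    # Hypothetical choice of architecture: a linear decision rule F(W, X) = W . X.
    # Any differentiable function of W and X would work just as well here.
    def F(W, X):
        return np.dot(W, X)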

Loss function

L(W, y^i, X^i) = D(y^i, F(W, X^i))

where D(y, f) measures the “discrepancy” between the target y and the model output f.
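One possible choice of discrepancy (again an assumption, not something the note prescribes) is the squared error:

    # Hypothetical discrepancy: squared error between target y and prediction f.
    def D(y, f):
        return 0.5 * (y - f) ** 2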

Gradient of loss

\frac{\partial L(W, y^i, X^i)}{\partial W} = \frac{\partial D(y^i, f)}{\partial f} \frac{\partial F(W, X^i)}{\partial W}, \quad f = F(W, X^i)
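For the linear F and squared-error D sketched above, the chain rule gives dD/df = -(y - f) and dF/dW = X, so the per-sample gradient can be written as:

    # Gradient of the per-sample loss under the assumed linear F and squared-error D.
    def grad_L(W, X, y):
        f = F(W, X)           # forward pass: f = F(W, X^i)
        dD_df = -(y - f)      # dD/df for D(y, f) = 0.5 * (y - f)^2
        dF_dW = X             # dF/dW for F(W, X) = W . X
        return dD_df * dF_dW  # chain rule: dL/dW = dD/df * dF/dW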

Update rule

W(t+1) = W(t) - \eta(t) \frac{\partial D(y^i, f)}{\partial f} \frac{\partial F(W, X^i)}{\partial W}
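A single stochastic gradient step on one sample (X^i, y^i), directly transcribing the update rule (using grad_L from the sketch above, with the learning rate \eta(t) passed in as a plain number):

    # W(t+1) = W(t) - eta(t) * dL/dW, applied to one sample.
    def sgd_step(W, X_i, y_i, eta):
        return W - eta * grad_L(W, X_i, y_i)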

Three questions:

  • What architecture F(W, X)
  • What loss function L(W, y^i, X^i)
  • What optimization method
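Putting one answer to each of the three questions together (linear architecture, squared-error loss, plain SGD; all illustrative assumptions, reusing the sketches above), a toy end-to-end run might look like:

    # Toy data: noiseless linear targets, so SGD should recover true_W.
    rng = np.random.default_rng(0)
    X_data = rng.normal(size=(100, 3))
    true_W = np.array([1.0, -2.0, 0.5])
    y_data = X_data @ true_W

    W = np.zeros(3)
    eta = 0.1
    for t in range(200):
        i = rng.integers(len(X_data))              # pick sample i at step t
        W = sgd_step(W, X_data[i], y_data[i], eta)

    print(W)  # should be close to true_W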