Bias
- Symptom
  - $J_{cv}(\theta)$ and $J_{train}(\theta)$ are both high.
- Prescription
  - Getting additional features
  - Adding polynomial features ($x_1^2, x_2^2, x_1 x_2$, etc.)
  - Decreasing $\lambda$
Variance
- Symptom
  - $J_{cv}(\theta) \gg J_{train}(\theta)$ and $J_{train}(\theta)$ is low.
- Prescription
  - Getting more training samples
  - Getting rid of some features
  - Increasing $\lambda$
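Both symptoms above come down to comparing the two errors. A minimal sketch of that check, assuming a toy scikit-learn setup (the dataset, the `Ridge` model, and the split sizes are illustrative assumptions, not from these notes):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy 1-D regression problem (purely illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(200)

X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.4, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
J_train = mean_squared_error(y_train, model.predict(X_train))
J_cv = mean_squared_error(y_cv, model.predict(X_cv))

print(f"J_train = {J_train:.3f}, J_cv = {J_cv:.3f}")
# Both high                     -> high bias (underfitting)
# J_cv >> J_train, J_train low  -> high variance (overfitting)
```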
Regularization
Very big $\lambda \rightarrow$ Bias (underfitting)
Very small $\lambda \rightarrow$ Variance (overfitting)
$\lambda$ selection: train on the same training set with each candidate $\lambda$, select the $\lambda$ that leads to the smallest CV error, then check the test error.
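A hedged sketch of this selection loop (the candidate $\lambda$ values, the toy data, and the use of scikit-learn's `Ridge`, whose `alpha` parameter plays the role of $\lambda$, are all illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(300)

# 60/20/20 split: train / CV / test (illustrative sizes)
X_train, y_train = X[:180], y[:180]
X_cv, y_cv = X[180:240], y[180:240]
X_test, y_test = X[240:], y[240:]

lambdas = [0.01, 0.03, 0.1, 0.3, 1, 3, 10]
# Train on the same training set for every candidate lambda,
# and measure each resulting model on the CV set.
cv_err = [mean_squared_error(y_cv, Ridge(alpha=lam).fit(X_train, y_train).predict(X_cv))
          for lam in lambdas]

best = lambdas[int(np.argmin(cv_err))]  # smallest CV error wins
final = Ridge(alpha=best).fit(X_train, y_train)
print("best lambda:", best)
print("test error:", mean_squared_error(y_test, final.predict(X_test)))
```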
Learning Curve
- High Bias
  - $J_{train}(\theta)$ is close to $J_{cv}(\theta)$, and both are high.
  - Getting more data is useless!
- High Variance
  - There is a gap between $J_{train}(\theta)$ and $J_{cv}(\theta)$.
  - Getting more data may give a better result.
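A minimal sketch of computing these curves with scikit-learn's `learning_curve` (the model and toy data are the same illustrative assumptions as above; it prints the values rather than plotting them):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(300)

# Train/CV error for increasing training-set sizes
sizes, train_scores, cv_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

J_train = -train_scores.mean(axis=1)
J_cv = -cv_scores.mean(axis=1)
for m, jt, jc in zip(sizes, J_train, J_cv):
    print(f"m={m:3d}  J_train={jt:.3f}  J_cv={jc:.3f}")
# Curves converging to a high plateau  -> high bias
# A persistent gap between the curves  -> high variance
```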
Neural networks and overfitting
- Using a “large” neural network with good regularization to address overfitting usually works better than using a “small” neural network, but it is computationally more expensive.
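A hedged sketch of that trade-off using scikit-learn's `MLPClassifier` (the network size, `alpha` value, and dataset are illustrative assumptions): a comparatively large network whose overfitting is kept in check by the L2 penalty `alpha`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Large" hidden layers, regularized by a non-trivial L2 penalty (alpha)
clf = MLPClassifier(hidden_layer_sizes=(128, 128), alpha=1e-2,
                    max_iter=1000, random_state=0).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))
```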