Bias vs. Variance


  • Symptom (High Bias)
    $ J_{cv}(\theta) $ and $ J_{train}(\theta) $ are both high.
  • Prescription
  1. Getting additional features
  2. Adding polynomial features ($x_{1}^{2}, x_{2}^{2}, x_{1}x_{2}$, etc.)
  3. Decreasing $\lambda $


  • Symptom (High Variance)
    $ J_{cv}(\theta) \gg J_{train}(\theta) $ and $ J_{train}(\theta) $ is low.

  • Prescription

    1. Getting more training samples
    2. Getting rid of some features
    3. Increasing $\lambda$
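The two symptom/prescription pairs above can be summarized in a small diagnostic helper. This is an illustrative sketch, not part of the original notes; the `high` and `gap` thresholds are arbitrary assumptions chosen for the example.

```python
def diagnose(j_train, j_cv, high=1.0, gap=0.5):
    """Classify a model's problem from its training and CV errors.

    `high` and `gap` are illustrative thresholds, not part of the notes.
    """
    if j_train > high and abs(j_cv - j_train) < gap:
        return "high bias"      # both errors high and close together
    if j_train <= high and j_cv - j_train >= gap:
        return "high variance"  # low training error, much higher CV error
    return "ok"

print(diagnose(2.0, 2.2))  # high bias: both errors high
print(diagnose(0.1, 1.5))  # high variance: J_cv >> J_train
print(diagnose(0.1, 0.2))  # ok
```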


Very large $\lambda \rightarrow$ high bias (underfitting)
Very small $\lambda \rightarrow$ high variance (overfitting)

$\lambda$ selection: using the same training set, train a model for each candidate $\lambda$, select the $\lambda$ that gives the smallest cross-validation error, and then check the test error.
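A minimal NumPy sketch of this procedure, using closed-form ridge regression on synthetic data. The data generator, the degree-7 polynomial features, and the $\lambda$ grid are all assumptions made for illustration (note the regularized fit below also penalizes the intercept term, which the course formulation excludes).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical 1-D regression problem: cubic signal plus noise
    x = rng.uniform(-1, 1, n)
    y = x ** 3 + 0.1 * rng.normal(size=n)
    X = np.vander(x, 8)  # degree-7 polynomial features (prone to overfitting)
    return X, y

X_train, y_train = make_data(60)
X_cv, y_cv = make_data(60)
X_test, y_test = make_data(60)

def ridge_fit(X, y, lam):
    # Regularized normal equation: theta = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cost(X, y, theta):
    # Unregularized squared-error cost J(theta) = (1/2m) * ||X theta - y||^2
    m = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * m)

# Train with each candidate lambda on the SAME training set,
# pick the lambda with the smallest CV error, then check test error.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = [cost(X_cv, y_cv, ridge_fit(X_train, y_train, lam))
             for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_errors))]
theta = ridge_fit(X_train, y_train, best_lam)
print("best lambda:", best_lam)
print("test error:", cost(X_test, y_test, theta))
```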

Learning Curve

  • High Bias
    $J_{train}(\theta) $ is high and close to $J_{cv}(\theta) $.

Getting more data is useless!

  • High Variance
    There is a large gap between $J_{train}(\theta) $ and $J_{cv}(\theta) $: the training error is low while the cross-validation error is high.

Getting more data may give a better result.
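The learning curves above can be reproduced with a short sketch: fit on the first $m$ training examples, then compare the error on those $m$ examples with the error on a held-out set. The linear data and model here are illustrative assumptions; since the model matches the data, the gap shrinks as $m$ grows (the high-variance picture would show a persistent gap).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear data with noise; first column is the intercept term
X_all = np.column_stack([np.ones(200), rng.uniform(-1, 1, 200)])
y_all = 2 * X_all[:, 1] + 0.2 * rng.normal(size=200)
X_train, y_train = X_all[:100], y_all[:100]
X_cv, y_cv = X_all[100:], y_all[100:]

def fit(X, y):
    # Ordinary least-squares fit
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def cost(X, y, theta):
    m = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * m)

# Learning curve: train on the first m examples, evaluate J_train on
# those m examples and J_cv on the full cross-validation set.
gaps = {}
for m in (5, 20, 50, 100):
    theta = fit(X_train[:m], y_train[:m])
    j_train = cost(X_train[:m], y_train[:m], theta)
    j_cv = cost(X_cv, y_cv, theta)
    gaps[m] = j_cv - j_train
    print(f"m={m:4d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```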

Neural networks and overfitting

  • Using a “large” neural network with good regularization to address overfitting is usually better than using a “small” neural network, but the computational cost is higher.

Reprint policy

《Bias vs. Variance》 by 卢宁 is licensed under a Creative Commons Attribution 4.0 International License