Bias vs. Variance


  • Symptom (high bias / underfitting)
    $ J_{cv}(\theta) $ and $ J_{train}(\theta) $ are both high.
  • Prescription
    1. Getting additional features
    2. Adding polynomial features ($x_{1}^{2}, x_{2}^{2}, x_1x_2$, etc.)
    3. Decreasing $\lambda$
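Prescription 2 can be done by hand. A minimal sketch (the helper name `add_polynomial_features` and the degree-2 term set are my own choices, not from the course):

```python
import numpy as np

def add_polynomial_features(X):
    """Expand [x1, x2] into [x1, x2, x1^2, x2^2, x1*x2] (degree-2 terms only)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(add_polynomial_features(X))
# rows: [1, 2, 1, 4, 2] and [3, 4, 9, 16, 12]
```

A richer hypothesis like this lowers $J_{train}(\theta)$ at the cost of possibly increasing variance.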


  • Symptom (high variance / overfitting)
    $ J_{cv}(\theta) \gg J_{train}(\theta) $ and $ J_{train}(\theta) $ is low.

  • Prescription

    1. Getting more training samples
    2. Getting rid of some features
    3. Increasing $\lambda$


Very large $\lambda \rightarrow$ high bias (underfitting)
Very small $\lambda \rightarrow$ high variance (overfitting)

$\lambda$ selection: train a model for each candidate $\lambda$ on the same training set, pick the $\lambda$ that gives the smallest CV error, then report that model's test error.
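The selection procedure can be sketched with closed-form regularized linear regression on synthetic data (the data, splits, and candidate $\lambda$ grid below are illustrative assumptions; note the CV/test error is computed without the $\lambda$ term):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a real dataset: y = 2x + 1 + noise.
X = rng.uniform(-1, 1, size=(90, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=90)

# Split into training / cross-validation / test sets.
X_tr, y_tr = X[:60], y[:60]
X_cv, y_cv = X[60:75], y[60:75]
X_te, y_te = X[75:], y[75:]

def fit_ridge(X, y, lam):
    """Closed-form regularized linear regression; the intercept is not penalized."""
    A = np.column_stack([np.ones(len(X)), X])   # add intercept column
    reg = lam * np.eye(A.shape[1])
    reg[0, 0] = 0.0                             # do not regularize the intercept
    return np.linalg.solve(A.T @ A + reg, A.T @ y)

def cost(theta, X, y):
    """Unregularized squared error -- CV and test error omit the lambda term."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.mean((A @ theta - y) ** 2) / 2

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cost(fit_ridge(X_tr, y_tr, lam), X_cv, y_cv))
theta = fit_ridge(X_tr, y_tr, best_lam)
print("best lambda:", best_lam, "test error:", cost(theta, X_te, y_te))
```

The test error is checked only once, after $\lambda$ has been chosen, so it stays an unbiased estimate of generalization.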

Learning Curve

  • High Bias
    $J_{train}(\theta)$ is close to $J_{cv}(\theta)$, and both are high.

Getting more data is useless!

  • High Variance
    There is a gap between $J_{train}(\theta) $ and $J_{cv}(\theta) $.

Getting more data may give a better result.
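The learning-curve diagnostic above can be sketched as: train on the first $m$ examples for increasing $m$ and compare the (unregularized) training and CV errors. The synthetic data and plain least-squares hypothesis here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linreg(X, y):
    """Ordinary least squares with an intercept column (stand-in hypothesis)."""
    A = np.column_stack([np.ones(len(X)), X])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def cost(theta, X, y):
    """Unregularized squared error J(theta)."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.mean((A @ theta - y) ** 2) / 2

# Synthetic noisy line, split into training and CV sets.
X = rng.uniform(-1, 1, size=(80, 1))
y = 3 * X[:, 0] + rng.normal(0, 0.2, size=80)
X_tr, y_tr, X_cv, y_cv = X[:60], y[:60], X[60:], y[60:]

# Train on the first m examples for growing m and record both errors.
for m in [5, 10, 20, 40, 60]:
    theta = fit_linreg(X_tr[:m], y_tr[:m])
    print(f"m={m:2d}  J_train={cost(theta, X_tr[:m], y_tr[:m]):.4f}  "
          f"J_cv={cost(theta, X_cv, y_cv):.4f}")
```

A persistent gap between the two printed curves as $m$ grows suggests high variance; two curves that converge to a high plateau suggest high bias.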

Neural networks and overfitting

  • Using a “large” neural network with good regularization to address overfitting usually works better than using a “small” neural network, but the computational cost is higher.


“Bias vs. Variance” by 卢宁 is licensed under a Creative Commons Attribution 4.0 International License.