Bias Vs. Variance

Jul 22, 2017


  • Symptom (High Bias) Both $ J_{cv}(\theta) $ and $ J_{train}(\theta) $ are high.
  • Prescription
    1. Getting additional features
    2. Adding polynomial features ($x_{1}^{2}, x_{2}^{2}, x_1x_2$, etc.)
    3. Decreasing $\lambda $


  • Symptom (High Variance) $ J_{cv}(\theta) \gg J_{train}(\theta) $ and $ J_{train}(\theta) $ is low.
  • Prescription
    1. Getting more training samples
    2. Getting rid of some features
    3. Increasing $\lambda$
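The two symptoms above can be turned into a rough rule of thumb. A minimal sketch, where the `baseline` threshold for "low" error is an assumption (in practice, compare against human-level or desired error):

```python
def diagnose(j_train, j_cv, baseline=0.1):
    """Rough bias/variance diagnosis from training and CV error.

    j_train, j_cv: errors on the training and cross-validation sets.
    baseline: what counts as "low" error here (assumed value).
    """
    if j_train > baseline and j_cv - j_train < baseline:
        return "high bias"      # both errors high and close together
    if j_train <= baseline and j_cv - j_train >= baseline:
        return "high variance"  # low training error, large gap to CV
    return "acceptable"

print(diagnose(j_train=0.45, j_cv=0.50))  # high bias
print(diagnose(j_train=0.02, j_cv=0.40))  # high variance
```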


Very large $\lambda \rightarrow$ high bias (underfitting); very small $\lambda \rightarrow$ high variance (overfitting).

$\lambda$ selection: train a model on the training set for each candidate $\lambda$, pick the $\lambda$ that gives the smallest CV error, and then report that model's test error.
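The selection procedure can be sketched with regularized linear regression in closed form. The toy data, the candidate $\lambda$ grid, and the split sizes below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: cubic polynomial features on a noisy quadratic target.
def make_set(n):
    x = rng.uniform(-2, 2, n)
    X = np.column_stack([x, x**2, x**3])
    y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.3, n)
    return X, y

X_tr, y_tr = make_set(60)
X_cv, y_cv = make_set(30)
X_te, y_te = make_set(30)

def ridge_fit(X, y, lam):
    # Regularized normal equation; the bias term is not penalized,
    # matching the usual convention.
    Xb = np.column_stack([np.ones(len(X)), X])
    reg = lam * np.eye(Xb.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def mse(theta, X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.mean((Xb @ theta - y) ** 2) / 2

# Fit once per lambda, pick the one with the smallest CV error...
lambdas = [0, 0.01, 0.1, 1, 10, 100]
thetas = [ridge_fit(X_tr, y_tr, lam) for lam in lambdas]
cv_errors = [mse(t, X_cv, y_cv) for t in thetas]
best = int(np.argmin(cv_errors))

# ...then report the test error of that single chosen model.
print("best lambda:", lambdas[best])
print("test error:", mse(thetas[best], X_te, y_te))
```

Note that the test set is touched only once, after $\lambda$ has been fixed, so the reported test error stays an unbiased estimate.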

Learning Curve

  • High Bias $J_{train}(\theta) $ is high and close to $J_{cv}(\theta) $. Getting more data is useless!
  • High Variance There is a gap between $J_{train}(\theta) $ and $J_{cv}(\theta) $. Getting more data may give a better result.

Neural networks and overfitting

  • A “large” neural network with good regularization usually addresses overfitting better than a “small” neural network, but it is computationally more expensive.