Curse of Dimensionality
- The inductive bias of kNN is the assumption that the
classification of an instance will be most similar to that of
other instances that are nearby in Euclidean distance.
- This means that all attributes contribute equally to the distance.
- A problem: with, say, 20 attributes of which only 2 are relevant to
the target function, the many irrelevant attributes dominate the
distance and can mislead the classifier.
- This is the curse of dimensionality.
- One approach is to weight each attribute differently.
- Stretch $j$th axis by weight $z_j$, where $z_1,
\ldots, z_n$ chosen to minimize prediction error.
- Use cross-validation to automatically choose weights
$z_1, \ldots, z_n$.
- Note that setting $z_j$ to zero eliminates the $j$th dimension altogether (another approach: feature selection).
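The weighting idea above can be sketched in code: stretch each axis by $z_j$ inside the distance, and score a candidate weight vector with leave-one-out cross-validation. This is a minimal illustration, not the slides' implementation; the function names (`knn_predict`, `loo_error`) and the synthetic 20-attribute data set are assumptions for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, z, k=3):
    """Classify x by majority vote of the k nearest training points,
    with the jth axis stretched by weight z[j]."""
    d = np.sqrt((((X_train - x) * z) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return np.bincount(y_train[nearest]).argmax()

def loo_error(X, y, z, k=3):
    """Leave-one-out cross-validation error rate for weights z."""
    errors = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i  # hold out instance i
        errors += knn_predict(X[mask], y[mask], X[i], z, k) != y[i]
    return errors / len(X)

# Synthetic data: 20 attributes, but only the first 2 determine the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

uniform = np.ones(20)       # all attributes weighted equally
selected = np.zeros(20)     # z_j = 0 eliminates the irrelevant axes
selected[:2] = 1

print(loo_error(X, y, uniform), loo_error(X, y, selected))
```

On data like this, zeroing out the 18 irrelevant axes should give a lower cross-validation error than uniform weights; in practice one would search over $z_1, \ldots, z_n$ (e.g. by gradient descent or grid search) using the cross-validation error as the objective.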
José M. Vidal