In the statistical literature, inputs are often called predictors or independent variables, and outputs are called responses or dependent variables.
The prediction task is called regression when we predict quantitative outputs, and classification when we predict qualitative outputs.
We will typically denote an input variable by the symbol $X$. Quantitative outputs will be denoted by $Y$, and qualitative outputs by $G$ (for group).
Observed values are written in lowercase; hence the $i$-th observed value of $X$ is written as $x_i$.
Matrices are represented by bold uppercase letters; for example, a set of $N$ input $p$-vectors $x_i$, $i = 1, \dots, N$, would be represented by the $N \times p$ matrix $\mathbf{X}$.
In general, vectors will not be bold, except when they have N components.
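As a small illustration of this layout, the sketch below stores a made-up set of inputs as a list of rows. It assumes that rows index observations and columns index input variables, which is the usual reading of an N×p matrix. The data and the variable names are hypothetical.

```python
# Hypothetical data: N = 3 observations of p = 2 inputs.
# Assumption: each row is one observed input vector, each column one variable.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

N = len(X)      # number of observations (rows)
p = len(X[0])   # number of input variables (columns)

# The 2nd observed input vector (1-based in the text, 0-based in Python).
x_2 = X[1]

# All N observed values of the first input variable: a vector with N components.
column_1 = [row[0] for row in X]
```

Reading along a row gives one observation with p components, while reading down a column gives a vector with N components, which is the kind the bold convention above singles out.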
Nearest-neighbor methods use those observations in the training set closest in input space to $x$ to form $\hat{Y}$. Specifically, the $k$-nearest neighbor fit for $\hat{Y}$ is defined as follows:
$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i,$$
where $N_k(x)$ is the neighborhood of $x$ defined by the $k$ closest points $x_i$ in the training sample.
The $k$-nearest-neighbor fit has high variance and low bias, particularly when $k$ is small.
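The k-nearest-neighbor fit described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes that closeness is measured by Euclidean distance, which the notes do not specify, and that ties are broken by training-set order. The function name `knn_fit` and the toy data are hypothetical.

```python
import math


def knn_fit(x, train_x, train_y, k):
    """Average the responses of the k training points closest to x.

    Assumption: "closest" means smallest Euclidean distance.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by distance from the query point x.
    # sorted() is stable, so equally distant points keep their original order.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    neighborhood = order[:k]  # indices of the k closest training points
    return sum(train_y[i] for i in neighborhood) / k


# Hypothetical toy data: four training inputs in two dimensions.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]

# The three closest points to (0.1, 0.1) are the first three training inputs.
y_hat = knn_fit((0.1, 0.1), train_x, train_y, k=3)
```

With k = 1 the fit simply copies the response of the single closest point, so it follows the training data very closely. Increasing k averages over more neighbors, which smooths the fit.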