# Chapter 2. Overview of Supervised Learning

In statistics, the inputs are usually called predictors or independent variables, and the outputs are called responses or dependent variables.

## Difference between regression and classification

The task is called regression when we predict quantitative outputs, and classification when we predict qualitative outputs.

## Notation for the dataset

We will typically denote an input variable by the symbol $X$. Quantitative outputs will be denoted by $Y$, and qualitative outputs by $G$ (for group).

Observed values are written in lowercase; hence the $i$-th observed value of $X$ is written as $x_i$.

Matrices are represented by bold uppercase letters; for example, a set of $N$ input $p$-vectors $x_i$, $i = 1, \ldots, N$, would be represented by the $N \times p$ matrix $\mathbf{X}$.

In general, vectors will not be bold, except when they have N components.
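As a rough illustration of this notation, the sketch below stores a small dataset in NumPy. The sizes, the values, and the variable names (`X`, `x_i`, `x_col`) are made up for the example and are not from the chapter.

```python
import numpy as np

# Hypothetical sizes: N observations, each a p-vector of inputs.
N, p = 5, 3

# The N x p input matrix (bold X in the text); row i holds the i-th observation.
X = np.arange(N * p, dtype=float).reshape(N, p)

# One observed input x_i: a p-vector (not bold in the text).
x_i = X[1]

# All N observations of a single input variable: an N-vector (bold in the text).
x_col = X[:, 0]

print(X.shape, x_i.shape, x_col.shape)
```

Rows of the array correspond to observations and columns to input variables, which is why a single row has `p` components while a single column has `N`.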

## What is the Nearest-Neighbor Method?

Nearest-neighbor methods use those observations in the training set closest in input space to $x$ to form $\hat{Y}$. Specifically, the $k$-nearest neighbor fit for $\hat{Y}$ is defined as follows:

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where $N_k(x)$ is the neighborhood of $x$ defined by the $k$ closest points $x_i$ in the training sample.
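The fit amounts to averaging the outputs of the k training points closest to the query point. Below is a minimal sketch of that idea in NumPy. It assumes Euclidean distance as the closeness measure, and the function name `knn_fit` and the toy data are invented for the example.

```python
import numpy as np

def knn_fit(X_train, y_train, x, k):
    """Average the outputs of the k training points closest to x."""
    # Assumption: closeness is measured by Euclidean distance.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points: the neighborhood of x.
    neighborhood = np.argsort(dists)[:k]
    return y_train[neighborhood].mean()

# Toy training sample (made up): four 1-dimensional inputs and their outputs.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])

# The two closest inputs to 1.2 are 1.0 and 2.0, so the fit averages 1.0 and 4.0.
print(knn_fit(X_train, y_train, np.array([1.2]), k=2))  # 2.5
```

Sorting all distances is the simplest choice for a small sample; a larger training set would probably call for a partial sort or a spatial index instead.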

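With small $k$, the $k$-nearest-neighbor fit has low bias and high variance: it makes few assumptions about the underlying relationship, but each prediction depends on only a handful of training points and so is sensitive to their particular positions and noise. Increasing $k$ reduces the variance at the cost of more bias.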
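One way to see the variance side of this claim is a small simulation. The sketch below redraws the training set many times and measures how much the prediction at one fixed point changes. The whole setup is an assumption made for the example: the sine-shaped target, the noise level, the sample sizes, the query point, and the helper names `knn_predict` and `prediction_variance`.

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Average the outputs of the k training points closest to x0 (Euclidean)."""
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

def prediction_variance(k, n_repeats=200, n_train=50, noise_sd=1.0, seed=0):
    """Variance of the k-NN prediction at a fixed point over fresh training sets."""
    rng = np.random.default_rng(seed)
    x0 = np.array([0.5])  # fixed query point (arbitrary choice)
    preds = []
    for _ in range(n_repeats):
        # Hypothetical data-generating process: noisy sine curve on [0, 1].
        X = rng.uniform(0.0, 1.0, size=(n_train, 1))
        y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, noise_sd, size=n_train)
        preds.append(knn_predict(X, y, x0, k))
    return float(np.var(preds))

print("k = 1 :", prediction_variance(1))
print("k = 15:", prediction_variance(15))
```

In this setup the prediction with `k=1` should fluctuate much more from one training set to the next than the one with `k=15`, since averaging over more neighbors smooths out the noise. The simulation only illustrates the variance; it does not measure bias.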