# Chapter 2. Overview of Supervised Learning

In statistics, the **inputs** are also called **predictors** or **independent variables**, and the **outputs** are called **responses** or **dependent variables**.

## Difference between **regression** and **classification**

The task is called **regression** when we predict quantitative outputs, and **classification** when we predict qualitative outputs.

## Some **notation** for the dataset

We will typically denote an **input** variable by the symbol $X$. **Quantitative outputs** will be denoted by $Y$, and **qualitative outputs** by $G$ (for group).

**Observed values** are written in lowercase; hence the $i$-th observed value of $X$ is written as $x_i$ (where $x_i$ is a scalar or a vector).

**Matrices** are represented by bold uppercase letters; for example, a set of $N$ input $p$-vectors $x_i$, $i = 1, \ldots, N$, would be represented by the $N \times p$ matrix $\mathbf{X}$.

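In general, vectors will not be bold, except when they have $N$ components. This convention distinguishes the $p$-vector of inputs $x_i$ for the $i$-th observation from the $N$-vector $\mathbf{x}_j$ consisting of all the observations on variable $X_j$.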
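The notation above maps onto array shapes as in the following minimal NumPy sketch. The sizes, values, and variable names are made up for illustration, and NumPy indexes from 0 whereas the text counts observations from 1.

```python
import numpy as np

N, p = 5, 3  # hypothetical sizes: 5 observations, 3 input variables

# Bold X in the text: the N x p input matrix, one row per observation.
X = np.arange(N * p, dtype=float).reshape(N, p)

# x_i (not bold): the p-vector of inputs for one observation, i.e. a row of X.
i = 2
x_i = X[i]

# Bold x_j: the N-vector of all observations on variable X_j, i.e. a column of X.
j = 1
x_j = X[:, j]

# y: N quantitative outputs; g: N qualitative outputs (group labels).
y = np.array([0.5, 1.2, 0.9, 2.1, 1.7])
g = np.array(["blue", "orange", "blue", "orange", "orange"])

print(X.shape, x_i.shape, x_j.shape)
```

Rows of the matrix are the non-bold $p$-vectors, and columns are the $N$-vectors that the notes write in bold.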

## What is **Nearest Neighbors**?

Nearest-neighbor methods use those observations in the training set closest in input space to $x$ to form $\hat{Y}$. Specifically, the $k$-nearest-neighbor fit for $\hat{Y}$ is defined as follows:

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where $N_k(x)$ is the neighborhood of $x$ defined by the $k$ closest points $x_i$ in the training sample.
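A minimal sketch of the $k$-nearest-neighbor fit: find the $k$ training points closest to a query point and average their outputs. It assumes Euclidean distance as the measure of closeness, which these notes do not specify. The function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Average the outputs of the k training points closest to x0.

    Assumes Euclidean distance; ties are broken by training order.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x0 = np.asarray(x0, dtype=float)

    # Distance from x0 to every training input (one value per row).
    dists = np.linalg.norm(X_train - x0, axis=1)

    # Indices of the k nearest points: the neighborhood N_k(x0).
    nearest = np.argsort(dists, kind="stable")[:k]

    # The fit is the mean of y_i over that neighborhood.
    return y_train[nearest].mean()

# Toy data (made up): five 1-dimensional inputs and their outputs.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

y_hat = knn_predict(X_train, y_train, x0=[1.2], k=3)
print(y_hat)
```

For the query point 1.2, the three closest inputs are 1, 2, and 0, so the fit is the average of their outputs.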

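Because the fit at any point depends on only a handful of nearby training points and makes few assumptions about the underlying data, it has **high variance and low bias**; the effect is strongest when $k$ is small.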
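The role of the neighborhood size can be seen at its two extremes. This is a rough sketch on made-up one-dimensional data, with a hypothetical helper `knn_average`. With one neighbor the fit reproduces every training output exactly, so it follows the noise. With all $N$ points as neighbors it collapses to the overall mean of the outputs, which is stable but ignores the input.

```python
import numpy as np

def knn_average(x_train, y_train, x0, k):
    """Average y over the k training points closest to x0 (1-D inputs)."""
    nearest = np.argsort(np.abs(x_train - x0), kind="stable")[:k]
    return y_train[nearest].mean()

# Made-up training data: distinct inputs with noisy-looking outputs.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.1, 1.9, 0.3, 2.2, 0.5])

# One neighbor: each training point is its own nearest neighbor,
# so the fit at the training inputs equals the training outputs.
fit_k1 = np.array([knn_average(x_train, y_train, x, k=1) for x in x_train])

# All N neighbors: the neighborhood is the whole sample,
# so the fit is the global mean of y at every query point.
fit_kN = np.array(
    [knn_average(x_train, y_train, x, k=len(x_train)) for x in x_train]
)

print(fit_k1)
print(fit_kN)
```

Values of $k$ between these extremes trade some of the flexibility of the one-neighbor fit for some of the stability of the global average.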