## Section: Scientific Foundations

### Nearest neighbor estimates

See 6.1 and 6.2.

In pattern recognition and statistical learning, nearest neighbor algorithms are among the simplest available. Nevertheless, they are also very powerful, and since the pioneering work of Fix and Hodges [41], [42] they have generated a large body of literature and developments. Basically, we are given a training set of data, i.e. an N-sample of i.i.d. object-feature pairs (X_i, Y_i) for i = 1, ..., N, with real-valued features, and we want to be able to generalize, that is, to guess the feature Y associated with any new object X having the same probability distribution as the X_i's. To achieve this, one chooses some integer k smaller than N and takes the mean value of the k features associated with the k objects that are nearest to the new object X, for some given metric.
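To make the averaging rule concrete, here is a minimal sketch of the k-nearest-neighbor estimate of Y at a new object x. It assumes objects are real vectors and that the "given metric" is the Euclidean distance; the function name `knn_estimate` is hypothetical and not taken from any particular library.

```python
import math
from typing import Sequence


def knn_estimate(
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    x: Sequence[float],
    k: int,
) -> float:
    """k-nearest-neighbor estimate of the feature Y at the new object x.

    xs, ys: training sample (X_1, Y_1), ..., (X_N, Y_N).
    x: new object; k: number of neighbors, 1 <= k <= N.
    Assumption for this sketch: the metric is Euclidean (math.dist).
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    # Distance from x to every training object X_i.
    dists = [math.dist(xi, x) for xi in xs]
    # Indices of the k nearest objects; the stable sort breaks ties by index.
    nearest = sorted(range(n), key=lambda i: dists[i])[:k]
    # Mean value of the k features attached to those objects.
    return sum(ys[i] for i in nearest) / k


# One-dimensional usage: objects 0, 1, 2, 3 with features 0, 10, 20, 30.
print(knn_estimate([[0.0], [1.0], [2.0], [3.0]], [0.0, 10.0, 20.0, 30.0], [1.2], 2))
```

For the new object 1.2 and k = 2, the two nearest objects are 1 and 2, so the estimate is (10 + 20) / 2 = 15.0. Any other metric could be substituted for `math.dist` without changing the structure of the rule.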

In general, there is no way to guess the value of Y exactly, and the smallest achievable error is that of the Bayes estimate. The Bayes estimate cannot be computed, since the distribution of the pair (X, Y) is unknown, but it provides the benchmark against which the strength of the method is measured. So the best we can hope for is that our estimate converges to the Bayes estimate as the sample size grows. This was proved in great generality by Stone [71] for mean square convergence, provided that X is a d-dimensional vector, Y is square-integrable, k goes to infinity, and the ratio k/N goes to 0. The nearest neighbor estimate is not the only local averaging estimate with this property, but it is arguably the simplest.
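The consistency statement can be illustrated numerically when the Bayes estimate is known, which is only possible with synthetic data. The sketch below assumes a hypothetical one-dimensional model Y = m(X) + noise with m(x) = sin(2πx), X uniform on [0, 1] and Gaussian noise, so that the Bayes estimate under squared loss is m(X) itself. It uses k = ⌈√N⌉, one arbitrary choice satisfying k → ∞ and k/N → 0, and estimates the mean squared distance between the k-NN estimate and m by Monte Carlo. The names `knn_1d`, `regression_function` and `excess_error` are illustrative only.

```python
import math
import random


def knn_1d(xs, ys, x, k):
    """Mean of the ys attached to the k points of xs closest to x (1-D case)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k


def regression_function(x):
    """Hypothetical m(x) = E[Y | X = x], chosen for this simulation."""
    return math.sin(2 * math.pi * x)


def excess_error(n, n_test=200, noise=0.1, seed=0):
    """Monte Carlo estimate of E[(m_N(X) - m(X))^2] with k = ceil(sqrt(N))."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [regression_function(x) + rng.gauss(0.0, noise) for x in xs]
    k = math.ceil(math.sqrt(n))  # k -> infinity while k/N -> 0
    test_points = [rng.random() for _ in range(n_test)]
    return sum(
        (knn_1d(xs, ys, x, k) - regression_function(x)) ** 2 for x in test_points
    ) / n_test


# The distance to the Bayes estimate should shrink as N grows.
for n in (100, 400, 2000):
    print(n, excess_error(n))
```

Keeping k fixed instead (for instance k = 1) would leave a non-vanishing variance term, which is why Stone's theorem asks for k to grow with N.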

The situation is radically different in general infinite-dimensional spaces. In this respect, Cérou and Guyader [3] present counterexamples indicating that the estimate is not consistent in general, and they argue that restrictions on the state space and on the distribution of (X, Y) cannot be dispensed with. First of all, the state space must be separable for the norm used to compute the neighbors, as already noticed by Cover and Hart [33]. But this is not enough: building on arguments from Preiss [67], Cérou and Guyader [3] exhibit a random variable X with a Gaussian distribution in a separable Hilbert space for which the estimate fails to be consistent. On the positive side, these authors provide a general condition, called the –continuity condition, which ensures the consistency of the estimate. Even with these recent results, the situation in infinite dimension is not completely clear, and it remains an interesting field of investigation.

In settings where the estimate converges, there remains the question of the rate of convergence, and of how to choose the parameter k in order to achieve the best rate. As noticed by Kulkarni and Posner [57], the rate of convergence of the nearest neighbor estimate is closely related to the notion of metric entropy introduced in the late fifties by Kolmogorov and Tikhomirov [56]. These tools can be used to study cases and refinements of the algorithm that have not yet been treated in the literature.
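The entropy-based analysis above is theoretical; in practice k is often tuned from the data. As an assumption not drawn from this text, the sketch below uses leave-one-out cross-validation, a standard data-driven heuristic, to pick k among a few candidates for one-dimensional data. It is not the entropy-based method of Kulkarni and Posner, and the names `loo_errors` and `best_k` as well as the synthetic data model are hypothetical.

```python
import math
import random


def loo_errors(xs, ys, candidates):
    """Leave-one-out squared error of the k-NN estimate for each candidate k.

    One-dimensional objects with the metric |x - x'|; each point is predicted
    from its k nearest neighbors among the other N - 1 points.
    """
    n = len(xs)
    errors = {k: 0.0 for k in candidates}
    for j in range(n):
        # Sort the other points once by distance to xs[j], reuse for every k.
        others = sorted(
            (i for i in range(n) if i != j), key=lambda i: abs(xs[i] - xs[j])
        )
        for k in candidates:
            pred = sum(ys[i] for i in others[:k]) / k
            errors[k] += (pred - ys[j]) ** 2
    return {k: e / n for k, e in errors.items()}


# Hypothetical synthetic sample: Y = sin(2*pi*X) + Gaussian noise.
rng = random.Random(1)
xs = [rng.random() for _ in range(300)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

errors = loo_errors(xs, ys, (1, 3, 10, 30, 100))
best_k = min(errors, key=errors.get)  # candidate with the smallest LOO error
print(best_k, errors[best_k])
```

Small k gives noisy predictions and large k over-smooths, so the selected value typically lands in between; sorting once per held-out point keeps the cost at N sorts regardless of the number of candidates.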
