## Section: Research Program

### Machine Learning & Structured Prediction

The foundation of statistical inference is to learn a function that minimizes the expected loss of a prediction with respect to some unknown distribution, $P$, over inputs and outputs:

$\mathcal{R}\left(f\right)=\int \ell (f,x,y)\,dP(x,y),$ | (2) |

where $\ell (f,x,y)$ is a problem-specific loss function that encodes a penalty for predicting $f\left(x\right)$ when the correct prediction is $y$. In our case, we consider $x$ to be a medical image, and $y$ to be some prediction, e.g. the segmentation of a tumor, or a kinematic model of the skeleton. The loss function, $\ell $, is informed by the costs associated with making a specific misprediction. As a concrete example, if the true spatial extent of a tumor is encoded in $y$, $f\left(x\right)$ may err by classifying healthy tissue as tumor, or by classifying diseased tissue as healthy. The loss function should encode the potential physiological damage resulting from erroneously targeting healthy tissue for irradiation, as well as the risk from missing a portion of the tumor.
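An asymmetric voxelwise loss of this kind can be sketched as follows; the function name and the specific penalty weights are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def asymmetric_segmentation_loss(pred, truth,
                                 w_false_positive=1.0,
                                 w_false_negative=5.0):
    """Voxelwise loss with asymmetric penalties (illustrative weights).

    pred, truth: binary arrays, 1 = tumor, 0 = healthy tissue.
    A false positive targets healthy tissue for irradiation;
    a false negative misses a portion of the tumor.
    """
    fp = np.logical_and(pred == 1, truth == 0).sum()  # healthy labeled as tumor
    fn = np.logical_and(pred == 0, truth == 1).sum()  # tumor labeled as healthy
    return w_false_positive * fp + w_false_negative * fn
```

Here the larger false-negative weight encodes the judgment that missing tumor tissue is costlier than irradiating a small margin of healthy tissue; in practice the weights would come from clinical considerations.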

A key problem is that the distribution $P$ is unknown, and any algorithm that is to estimate $f$ from labeled training examples must additionally make an implicit estimate of $P$. A central technology of empirical inference is to approximate $\mathcal{R}\left(f\right)$ with the empirical risk,

$\mathcal{R}\left(f\right)\approx \widehat{\mathcal{R}}\left(f\right)=\frac{1}{n}\sum _{i=1}^{n}\ell (f,{x}_{i},{y}_{i}),$ | (3) |

which makes the implicit assumption that the training samples $({x}_{i},{y}_{i})$ are drawn i.i.d. from $P$. Direct minimization of $\widehat{\mathcal{R}}\left(f\right)$ leads to overfitting when the function class $\mathcal{F}$, $f\in \mathcal{F}$, is too rich, and regularization is required:

$\underset{f\in \mathcal{F}}{min}\lambda \Omega (\parallel f\parallel )+\widehat{\mathcal{R}}\left(f\right),$ | (4) |

where $\Omega $ is a monotonically increasing function that penalizes complex functions.
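As a minimal concrete instance of Eq. 4, consider squared loss with a squared-norm penalty ($\Omega(t)=t^2$) over linear functions, i.e. ridge regression, where the regularized empirical risk has a closed-form minimizer. This is a sketch for illustration only; the function name is ours:

```python
import numpy as np

def regularized_erm_ridge(X, y, lam=0.1):
    """Minimize lam * ||w||^2 + (1/n) * sum_i (w @ x_i - y_i)^2.

    The simplest instance of regularized empirical risk minimization:
    lam trades the empirical risk term against function complexity.
    """
    n, d = X.shape
    # Setting the gradient to zero gives (lam * n * I + X^T X) w = X^T y.
    return np.linalg.solve(lam * n * np.eye(d) + X.T @ X, X.T @ y)
```

Taking $\lambda \to 0$ recovers direct minimization of the empirical risk, while larger $\lambda$ shrinks $\parallel w\parallel$ toward zero, illustrating how $\Omega$ penalizes complex functions.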

Eq. 4 is very well studied in classical statistics for the case that the output, $y\in \mathcal{Y}$, is a binary or scalar prediction, but this is not the case in most medical imaging prediction tasks of interest. Instead, complex interdependencies in the output space make it difficult to model inference as a binary prediction problem. One may attempt to model, e.g., tumor segmentation as a series of independent binary predictions at each voxel in a medical image, but this violates the i.i.d. sampling assumption implicit in Eq. 3. Furthermore, we typically gain performance by appropriately modeling the inter-relationships between voxel predictions, e.g. by incorporating pairwise and higher-order potentials that encode prior knowledge about the problem domain. It is in this context that we develop statistical methods appropriate to structured prediction in the medical imaging setting.
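The role of pairwise potentials can be made concrete with a small sketch: the energy of a binary labeling on a 2D voxel grid, combining unary (per-voxel) costs with a Potts-style smoothness term that charges neighboring voxels for disagreeing. The function name and the choice of a Potts pairwise term are our illustrative assumptions, not the specific model developed in this program:

```python
import numpy as np

def grid_energy(labels, unary, beta=1.0):
    """Energy of a binary labeling on a 2D voxel grid (MRF sketch).

    labels: integer array of shape (H, W) with entries in {0, 1}.
    unary[i, j, k]: cost of assigning label k to voxel (i, j).
    Pairwise Potts term: neighboring voxels pay beta when they disagree,
    encoding the prior that tumor regions are spatially coherent.
    """
    i, j = np.indices(labels.shape)
    e = unary[i, j, labels].sum()                        # unary potentials
    e += beta * (labels[:, 1:] != labels[:, :-1]).sum()  # horizontal neighbors
    e += beta * (labels[1:, :] != labels[:-1, :]).sum()  # vertical neighbors
    return e
```

Minimizing such an energy couples the per-voxel predictions, which is precisely why the output can no longer be treated as a collection of independent binary decisions.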