Section: Research Program
Inverse problems in Neuroimaging
Many problems in neuroimaging can be framed as forward and inverse problems. For instance, brain population imaging is concerned with the inverse problem that consists in predicting individual information (behavior, phenotype) from neuroimaging data, while the corresponding forward problem boils down to explaining neuroimaging data with the behavioral variables. Solving these problems entails the definition of two terms: a loss that quantifies the goodness of fit of the solution (does the model explain the data well enough?), and a regularization scheme that represents a prior on the expected solution of the problem. These priors can be used to enforce some properties on the solutions, such as sparsity, smoothness or being piecewise constant.
Let us detail the model used in a typical inverse problem. Let $\mathbf{X}$ be a neuroimaging dataset written as an $({n}_{subjects},{n}_{voxels})$ matrix, where ${n}_{subjects}$ and ${n}_{voxels}$ are the number of subjects under study and the image size respectively; $\mathbf{Y}$ a set of values that represent characteristics of interest in the observed population, written as an $({n}_{subjects},{n}_{features})$ matrix, where ${n}_{features}$ is the number of characteristics that are tested; and $\mathbf{w}$ an array of shape $({n}_{voxels},{n}_{features})$ that represents a set of pattern-specific maps. In the first place, we may consider the columns ${\mathbf{Y}}_{1},..,{\mathbf{Y}}_{{n}_{features}}$ of $\mathbf{Y}$ independently, yielding ${n}_{features}$ problems to be solved in parallel:
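${\mathbf{Y}}_{i}=\mathbf{X}{\mathbf{w}}_{i}+{\epsilon}_{i},\phantom{\rule{1em}{0ex}}\forall i\in \{1,..,{n}_{features}\},$
where the vector ${\mathbf{w}}_{i}$ is the ${i}^{th}$ column of $\mathbf{w}$ and ${\epsilon}_{i}$ is a noise term. As the problem is clearly ill-posed, it is naturally handled in a regularized regression framework: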
${\widehat{\mathbf{w}}}_{i}={\text{argmin}}_{{\mathbf{w}}_{i}}{\parallel {\mathbf{Y}}_{i}-\mathbf{X}{\mathbf{w}}_{i}\parallel}^{2}+\Psi \left({\mathbf{w}}_{i}\right),$  (1)
where $\Psi $ is an adequate penalization used to regularize the solution:
$\Psi (\mathbf{w};{\lambda}_{1},{\lambda}_{2},{\eta}_{1},{\eta}_{2})={\lambda}_{1}{\parallel \mathbf{w}\parallel}_{1}+{\lambda}_{2}{\parallel \mathbf{w}\parallel}_{2}+{\eta}_{1}{\parallel \nabla \mathbf{w}\parallel}_{2,1}+{\eta}_{2}{\parallel \nabla \mathbf{w}\parallel}_{2,2}$  (2) 
with ${\lambda}_{1},\,{\lambda}_{2},\,{\eta}_{1},\,{\eta}_{2}\ge 0$ (this formulation particularly highlights the fact that convex regularizers are norms or quasi-norms). In general, only one or two of these constraints are considered (i.e. enforced with a non-zero coefficient):

When ${\lambda}_{1}>0$ only (LASSO), and to some extent, when ${\lambda}_{1},{\lambda}_{2}>0$ only (elastic net), the optimal solution $\mathbf{w}$ is (possibly very) sparse, but may not exhibit a proper image structure; it does not fit well with the intuitive concept of a brain map.

Total Variation regularization (see Fig. 1) is obtained for ${\eta}_{1}>0$ only, and typically yields a piecewise constant solution. It can be combined with the Lasso to enforce both sparsity and sparse variations.

Smooth Lasso is obtained with ${\eta}_{2}>0$ and ${\lambda}_{1}>0$ only, and yields smooth, compactly supported spatial basis functions.
Note that, while the qualitative aspects of the solutions are very different, the predictive power of these models is often very close.
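To make the four terms of the penalty (2) concrete, the following sketch evaluates $\Psi$ on a weight map stored as a 3D array. It is only an illustration under stated assumptions: the function names `spatial_gradient` and `penalty` are hypothetical, the gradient $\nabla \mathbf{w}$ is approximated by forward finite differences with a zero difference at the far border, and the norms are taken exactly as written in (2), i.e. not squared.

```python
import numpy as np


def spatial_gradient(w):
    """Forward finite differences of w along each axis.

    Assumption: the difference is set to zero at the far border of each
    axis. Returns an array of shape (w.ndim,) + w.shape.
    """
    grads = []
    for axis in range(w.ndim):
        diff = np.diff(w, axis=axis)      # one element shorter along `axis`
        pad = [(0, 0)] * w.ndim
        pad[axis] = (0, 1)                # pad back to the original shape
        grads.append(np.pad(diff, pad, mode="constant"))
    return np.stack(grads)


def penalty(w, lam1=0.0, lam2=0.0, eta1=0.0, eta2=0.0):
    """Evaluate the penalty of equation (2) on a weight map w."""
    grad = spatial_gradient(w)
    l1 = np.abs(w).sum()                          # ||w||_1
    l2 = np.sqrt((w ** 2).sum())                  # ||w||_2
    tv = np.sqrt((grad ** 2).sum(axis=0)).sum()   # ||grad w||_{2,1}
    grad_l2 = np.sqrt((grad ** 2).sum())          # ||grad w||_{2,2}
    return lam1 * l1 + lam2 * l2 + eta1 * tv + eta2 * grad_l2
```

For a piecewise constant map, the $\ell_1$ term grows with the volume of the active region, whereas the Total Variation term only grows with the area of its boundary, which is why the latter favors block-like solutions.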

The performance of the predictive model can simply be evaluated as the amount of variance in ${\mathbf{Y}}_{i}$ fitted by the model, for each $i\in \{1,..,{n}_{features}\}$. This can be computed through cross-validation, by learning ${\widehat{\mathbf{w}}}_{i}$ on some part of the dataset, and then estimating $\parallel {\mathbf{Y}}_{i}-\mathbf{X}{\widehat{\mathbf{w}}}_{i}{\parallel}^{2}$ using the remainder of the dataset.
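The cross-validation procedure described above can be sketched as follows for the Lasso case (${\lambda}_{1}>0$ only). This is a minimal sketch, not a reference implementation: the solver is a plain proximal gradient (ISTA) loop chosen for brevity, the helper names (`lasso_ista`, `cross_val_explained_variance`, `make_toy_data`) are hypothetical, and the data are synthetic, generated only to exercise the code.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize ||y - X w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    # Step size 1/L, where L = 2 * sigma_max(X)^2 bounds the gradient's
    # Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, lam * step)
    return w


def cross_val_explained_variance(X, y, lam, n_folds=5, seed=0):
    """Held-out explained variance of the Lasso fit, one score per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(X.shape[0]), test_idx)
        w_hat = lasso_ista(X[train_idx], y[train_idx], lam)
        residual = np.sum((y[test_idx] - X[test_idx] @ w_hat) ** 2)
        total = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - residual / total)
    return np.array(scores)


def make_toy_data(n_subjects=80, n_voxels=200, seed=0):
    """Synthetic data with a sparse ground-truth map (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_subjects, n_voxels))
    w_true = np.zeros(n_voxels)
    w_true[:5] = 3.0
    y = X @ w_true + 0.1 * rng.standard_normal(n_subjects)
    return X, y, w_true
```

A call such as `cross_val_explained_variance(*make_toy_data()[:2], lam=10.0)` returns one score per fold. The map is always learned on the training subjects only, so the score measures generalization to unseen subjects rather than goodness of fit.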
This framework is easily extended by considering:

Grouped penalization, where the penalization explicitly includes a prior clustering of the features, i.e. voxel-related signals, into given groups. This amounts to enforcing structured priors on the solution.

Combined penalizations, i.e. a mixture of simple and group-wise penalizations, that allow some variability to fit the data in different populations of subjects, while keeping some common constraints.

Logistic and hinge regression, where a nonlinearity is applied to the linear model so that it yields a probability of classification in a binary classification problem.

Robustness to between-subject variability, so that the learned model does not overly reflect a few outlying observations of the training set. Note that noise and deviations from the model assumptions can be present in both $\mathbf{Y}$ and $\mathbf{X}$.

Multitask learning: if several target variables are thought to be related, it might be useful to constrain the estimated parameter vector $\mathbf{w}$ to have a shared support across all these variables.
For instance, when one of the variables ${\mathbf{Y}}_{i}$ is not well fitted by the model, the estimation of other variables ${\mathbf{Y}}_{j},j\ne i$ may provide constraints on the support of ${\mathbf{w}}_{i}$ and thus, improve the prediction of ${\mathbf{Y}}_{i}$.
$\widehat{\mathbf{w}}={\text{argmin}}_{\mathbf{w}=\left({\mathbf{w}}_{i}\right),\,i=1..{n}_{features}}\sum _{i=1}^{{n}_{features}}{\parallel {\mathbf{Y}}_{i}-\mathbf{X}{\mathbf{w}}_{i}\parallel}^{2}+\lambda \sum _{j=1}^{{n}_{voxels}}\sqrt{{\sum}_{i=1}^{{n}_{features}}{w}_{i,j}^{2}}$ (4)
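The multi-task problem (4) can be approached with the same proximal gradient scheme as the single-task case, the only change being that the soft-thresholding acts on whole rows of the weight matrix. The sketch below is one possible way to do this, not necessarily the solver used in practice; the function names `prox_l21` and `multitask_l21` are hypothetical, and the array `W` has shape $({n}_{voxels},{n}_{features})$, so that `W[j, i]` stores ${w}_{i,j}$.

```python
import numpy as np


def prox_l21(W, t):
    """Row-wise group soft-thresholding: prox of t * sum_j ||W[j, :]||_2."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows whose norm is below t are set to zero for all target variables
    # at once; the small constant only guards against division by zero.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale


def multitask_l21(X, Y, lam, n_iter=1000):
    """Minimize sum_i ||Y_i - X w_i||^2 + lam * sum_j ||W[j, :]||_2."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ W - Y)
        W = prox_l21(W - step * grad, lam * step)
    return W
```

Because a voxel is either discarded for every target variable or kept for all of them, the estimated maps share a common support, which is the constraint described above.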