PDF e-Pub

## Section: New Results

### A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series

Sleep stage classification constitutes an important preliminary exam in the diagnosis of sleep disorders. It is traditionally performed by a sleep expert who assigns to each 30 s of signal a sleep stage, based on the visual inspection of signals such as electroencephalograms (EEG), electrooculograms (EOG), electrocardiograms (ECG) and electromyograms (EMG). We introduce here the first deep learning approach for sleep stage classification that learns end-to-end without computing spectrograms or extracting hand-crafted features, that exploits all multivariate and multimodal Polysomnography (PSG) signals (EEG, EMG and EOG), and that can exploit the temporal context of each 30 s window of data. For each modality the first layer learns linear spatial filters that exploit the array of sensors to increase the signal-to-noise ratio, and the last layer feeds the learnt representation to a softmax classifier. Our model is compared to alternative automatic approaches based on convolutional networks or decisions trees. Results obtained on 61 publicly available PSG records with up to 20 EEG channels demonstrate that our network architecture yields state-of-the-art performance. Our study reveals a number of insights on the spatio-temporal distribution of the signal of interest: a good trade-off for optimal classification performance measured with balanced accuracy is to use 6 EEG with 2 EOG (left and right) and 3 EMG chin channels. Also exploiting one minute of data before and after each data segment offers the strongest improvement when a limited number of channels is available. As sleep experts, our system exploits the multivariate and multimodal nature of PSG signals in order to deliver state-of-the-art classification performance with a small computational cost.

Figure 5. Time distributed architecture to process a sequence of inputs ${S}_{t}^{k}=\left\{{X}_{t-k},\cdots ,{X}_{t},\cdots ,{X}_{t+k}\right\}$ with k = 1. ${X}_{k}$ stands for the multivariate input data over 30 s that is fed into the feature extractor $Z$. Features are extracted from consecutive 30 s samples: ${X}_{t-k}$, ..., ${X}_{t}$, ..., ${X}_{t+k}$. Then the obtained features are aggregated $\left[{z}_{t-k},\cdots ,{z}_{t},\cdots ,{z}_{t+k}\right]$. The resulting aggregation of features is finally fed into a classifier to predict the label ${y}_{t}$ associated with the sample ${X}_{t}$.