Sleep Quality Estimation Using Accelerometer Data from Thigh-Mounted Devices in Free-Living Conditions

Background

Sleep plays a vital role in health, so improving the assessment of sleep–wake outside the laboratory is critical
The gold standard, polysomnography (PSG), is costly and inconvenient
Methods for estimating sleep/wake from accelerometry exist, primarily for wrist-worn devices
The Cole-Kripke and Sadeh algorithms are commonly used
Determining in-bed time is difficult; it is usually set by a sleep log and/or human scorers
Detecting wakefulness is difficult, and performance is worse in populations with sleep disorders
Analysis typically has two levels: epoch-based and summarized across night(s)
Zmachine-derived sleep stats

Purpose…

But Esben, what about them sleep stages!?

I did free-living PSG recordings of sleep but…
- Super fragile -> shitty data
- Cumbersome and time-consuming
- Free-living when wired up like a robot?
- Would surface skin temperature + ACC be enough? Most likely needs heart rate (HR)

It was likely a dead end from the get-go :(

Methods

Data preparation: the big time-consumer is handling raw ACC data
Only thigh data used. HSBC and the other datasets contain only thigh data…
All Zmachine (ZM) recordings are considered in-bed (sensor problem?)
No sleep stages, only sleep/awake
Sensor problems during sleep: up to 20 consecutive epochs (200 sec) are treated as sleep
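The gap rule in the last point could be sketched roughly like this. It is a minimal illustration, not the actual pipeline: it assumes sensor problems show up as `None` in a per-epoch label list and that a gap only counts as "during sleep" when it has sleep on both sides. The function name `fill_short_gaps` and the label strings are made up for the example; only the 20-epoch limit comes from the slide.

```python
def fill_short_gaps(labels, max_gap=20):
    """Relabel short runs of missing epochs (None) as "sleep".

    Assumption: a run is only filled when the epochs directly before
    and after it are both "sleep" and the run is at most max_gap long.
    """
    filled = list(labels)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the run of missing epochs
        flanked_by_sleep = (
            start > 0 and end < n
            and filled[start - 1] == "sleep"
            and filled[end] == "sleep"
        )
        if flanked_by_sleep and (end - start) <= max_gap:
            for j in range(start, end):
                filled[j] = "sleep"
    return filled


example = ["sleep", None, None, "sleep", "wake", None, "sleep"]
cleaned = fill_short_gaps(example)
```

The gap after "wake" is left alone because it is not surrounded by sleep, and gaps longer than `max_gap` stay missing.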

Exclusion Criteria

Features

Basic Features

Weekday
Time of Day
Placement
Temperature

ACC derived features¹

Mean ACC X
Mean ACC Y
Mean ACC Z
Standard Deviation X
Standard Deviation Y
Standard Deviation Z
Max Standard Deviation
Inclination
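A rough sketch of how these per-epoch features could be computed from raw samples. Several things here are assumptions for illustration: samples are `(x, y, z)` tuples in units of g, the standard deviation is the population SD, and inclination is taken as the angle between the device x-axis and the mean acceleration vector. The function name `epoch_features` is hypothetical.

```python
import math
import statistics


def epoch_features(samples):
    """Summarise one epoch of raw (x, y, z) acceleration samples."""
    xs, ys, zs = zip(*samples)
    mean_x, mean_y, mean_z = (statistics.fmean(a) for a in (xs, ys, zs))
    sd_x, sd_y, sd_z = (statistics.pstdev(a) for a in (xs, ys, zs))

    # Assumed definition: angle (degrees) between the x-axis and the
    # mean acceleration vector of the epoch.
    norm = math.sqrt(mean_x ** 2 + mean_y ** 2 + mean_z ** 2)
    if norm > 0:
        cos_angle = max(-1.0, min(1.0, mean_x / norm))
        inclination = math.degrees(math.acos(cos_angle))
    else:
        inclination = float("nan")

    return {
        "mean_x": mean_x, "mean_y": mean_y, "mean_z": mean_z,
        "sd_x": sd_x, "sd_y": sd_y, "sd_z": sd_z,
        "sd_max": max(sd_x, sd_y, sd_z),
        "inclination": inclination,
    }


features = epoch_features([(0.0, 0.0, 1.0), (0.0, 0.2, 1.0)])
```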

Sensor-Independent Features²

Clock Proxy Linear
Clock Proxy Cosine

Human Circadian Clock

Forger, Jewett, and Kronauer (1999): a so-called cubic van der Pol equation

\[\frac{dx_c}{dt}=\frac{\pi}{12}\left\{\mu\left(x_c-\frac{4x_c^3}{3}\right)-x\left[\left(\frac{24}{0.99669\,\tau_x}\right)^2+kB\right]\right\}\]

This thing is dependent on ambient light and body temperature!
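To get a feel for the oscillator, here is a small numerical sketch. It pairs the equation above with the companion equation dx/dt = (π/12)(x_c + B) from the same model and uses the cubic term in x_c, as in Forger et al.'s formulation as I understand it. The parameter values (μ = 0.23, k = 0.55, τ_x = 24.2 h) are assumptions based on commonly quoted values, and the light drive B is simply set to 0 (constant darkness), which skips the whole light-processing part of the model.

```python
import math

MU = 0.23     # oscillator stiffness (assumed value)
K = 0.55      # light sensitivity of the period term (assumed value)
TAU_X = 24.2  # intrinsic period in hours (assumed value)


def derivatives(x, xc, b=0.0):
    """Right-hand side of the two coupled equations; b is the light drive B."""
    dx = (math.pi / 12.0) * (xc + b)
    dxc = (math.pi / 12.0) * (
        MU * (xc - 4.0 * xc ** 3 / 3.0)
        - x * ((24.0 / (0.99669 * TAU_X)) ** 2 + K * b)
    )
    return dx, dxc


def rk4_step(x, xc, dt):
    """One classical Runge-Kutta step with B = 0."""
    k1 = derivatives(x, xc)
    k2 = derivatives(x + 0.5 * dt * k1[0], xc + 0.5 * dt * k1[1])
    k3 = derivatives(x + 0.5 * dt * k2[0], xc + 0.5 * dt * k2[1])
    k4 = derivatives(x + dt * k3[0], xc + dt * k3[1])
    x_new = x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    xc_new = xc + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x_new, xc_new


def simulate(hours, dt=0.05, x0=1.0, xc0=0.0):
    """Integrate the oscillator and return time points and x values."""
    steps = int(round(hours / dt))
    ts, xs = [0.0], [x0]
    x, xc = x0, xc0
    for i in range(1, steps + 1):
        x, xc = rk4_step(x, xc, dt)
        ts.append(i * dt)
        xs.append(x)
    return ts, xs


def mean_period(ts, xs):
    """Average time between upward zero crossings, ignoring the first half."""
    crossings = []
    for i in range(1, len(xs)):
        if xs[i - 1] < 0.0 <= xs[i]:
            frac = -xs[i - 1] / (xs[i] - xs[i - 1])
            crossings.append(ts[i - 1] + frac * (ts[i] - ts[i - 1]))
    late = crossings[len(crossings) // 2:]
    return (late[-1] - late[0]) / (len(late) - 1)


ts, xs = simulate(hours=30 * 24)
period = mean_period(ts, xs)
```

With no light input the simulated rhythm should free-run with a period close to `TAU_X`, which is the point of the 0.99669 correction factor.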

Walch et al. (2019) incorporated this feature using step counts from the Apple Watch

But as demonstrated by Walch et al. (2019), a simple cosine function does the trick just as well :)
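The two clock proxies could look something like this. This is a sketch under assumptions, not necessarily how the features were built here: time is given as seconds since midnight, the linear proxy is the fraction of the day elapsed, and the cosine proxy has a 24 h period with a hypothetical `peak_hour` parameter for where it tops out.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def clock_proxy_linear(seconds_since_midnight):
    """Fraction of the 24 h day elapsed, in [0, 1)."""
    return (seconds_since_midnight % SECONDS_PER_DAY) / SECONDS_PER_DAY


def clock_proxy_cosine(seconds_since_midnight, peak_hour=0.0):
    """Cosine with a 24 h period that equals 1 at peak_hour (assumed parameter)."""
    hours = (seconds_since_midnight % SECONDS_PER_DAY) / 3600.0
    return math.cos(2.0 * math.pi * (hours - peak_hour) / 24.0)


six_am = 6 * 3600
linear_value = clock_proxy_linear(six_am)
cosine_value = clock_proxy_cosine(six_am)
```

One practical difference: the linear proxy jumps from almost 1 back to 0 at midnight, while the cosine is continuous across the day boundary.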

Circadian Proxy Features

Building Models

Estimate Sleep Quality Metrics

Results

Epoch-Based

Performance Metrics
- F1 Score
- Accuracy
- Sensitivity
- Specificity
- ROC curves
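For reference, the threshold-based metrics in this list all fall out of the confusion matrix. A minimal sketch, assuming binary per-epoch labels with 1 as the positive class; the function names are made up for the example, and precision is included because it is needed for F1.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute epoch-level metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),   # recall for the positive class
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def f1_from_rates(precision, sensitivity):
    """F1 as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


m = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 0, 0, 0, 0],
)
```

As a sanity check, feeding the logistic regression in-bed precision (97.07%) and sensitivity (85.43%) into `f1_from_rates` gives roughly 90.88%, which agrees with the F1 reported for that model.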

Summarized across nights

Agreement With Zmachine Sleep Stats
- Sleep Period Time
- Total Sleep Time
- Sleep Efficiency
- Latency Until Persistent Sleep
- Wake After Sleep Onset
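These night-level stats can be derived from the epoch-level sleep/wake sequence. The sketch below uses working definitions that are assumptions, not necessarily the Zmachine's: the input covers the in-bed period only (1 = sleep, 0 = wake), sleep period time runs from the first to the last sleep epoch, persistent sleep means 10 minutes of unbroken sleep, and WASO is all wake inside the sleep period. The function name and the `persistent_min` parameter are hypothetical.

```python
import math


def sleep_stats(epochs, epoch_sec, persistent_min=10):
    """Summarise one night of in-bed epochs (1 = sleep, 0 = wake)."""
    sleep_idx = [i for i, e in enumerate(epochs) if e == 1]
    if not sleep_idx:
        return None  # no sleep detected, stats are undefined

    first, last = sleep_idx[0], sleep_idx[-1]
    spt_epochs = last - first + 1          # first to last sleep epoch
    tst_epochs = len(sleep_idx)            # all epochs scored as sleep
    waso_epochs = spt_epochs - tst_epochs  # wake inside the sleep period

    # Latency until the first run of unbroken sleep of the required length.
    need = math.ceil(persistent_min * 60 / epoch_sec)
    lps_min = None
    run = 0
    for i, e in enumerate(epochs):
        run = run + 1 if e == 1 else 0
        if run == need:
            lps_min = (i - need + 1) * epoch_sec / 60
            break

    return {
        "spt_hours": spt_epochs * epoch_sec / 3600,
        "tst_hours": tst_epochs * epoch_sec / 3600,
        "se_percent": 100 * tst_epochs / len(epochs),
        "lps_min": lps_min,
        "waso_min": waso_epochs * epoch_sec / 60,
    }


# Toy night: ten 1-minute epochs, persistent sleep = 3 minutes.
night = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]
stats = sleep_stats(night, epoch_sec=60, persistent_min=3)
```

Running the same function on the model predictions and on the Zmachine labels gives the paired values that go into the agreement analysis.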

ROC Curves

Lots of Metrics

	Logistic Regression	Neural Network	Decision Tree	XGBoost

Performance of the models to predict each class separately
In-bed Prediction
F1 Score	90.88%	93.69%	93.37%	93.77%
Accuracy	92.87%	94.81%	94.46%	94.85%
Sensitivity	85.43%	92.64%	93.83%	93.16%
Precision	97.07%	94.75%	92.92%	94.39%
Specificity	98.17%	96.35%	94.91%	96.06%
Sleep Prediction
F1 Score	86.57%	89.59%	89.34%	89.62%
Accuracy	90.77%	92.41%	92.10%	92.39%
Sensitivity	84.65%	92.95%	94.20%	93.49%
Precision	88.59%	86.47%	84.96%	86.06%
Specificity	94.09%	92.12%	90.96%	91.79%

	Logistic Regression	Neural Network	Decision Tree	XGBoost

Performance of the models to predict each combined class
In-Bed Awake Prediction
F1 Score	15.88%	25.45%	26.41%	27.54%
Accuracy	92.05%	92.95%	93.04%	93.26%
Sensitivity	11.67%	18.73%	19.44%	19.93%
Precision	24.83%	39.69%	41.18%	44.58%
Specificity	97.57%	98.05%	98.09%	98.30%
In-Bed Sleep Prediction
F1 Score	86.56%	89.54%	89.35%	89.61%
Accuracy	90.76%	92.39%	92.11%	92.38%
Sensitivity	84.61%	92.69%	94.18%	93.45%
Precision	88.60%	86.60%	84.99%	86.07%
Specificity	94.10%	92.23%	90.98%	91.80%

Bland-Altman Analysis

	Bias (95% CI)	Lower LOA (95% CI)	Upper LOA (95% CI)
Sleep Period Time (hrs)
Logistic Regression	-1.28 (-1.41; -1.15)	-4.08 (-4.48; -3.78)	1.53 (1.22; 1.93)
Neural Net	-0.39 (-0.51; -0.27)	-3.09 (-3.49; -2.76)	2.31 (1.95; 2.75)
Decision Tree	-0.19 (-0.34; -0.08)	-2.96 (-3.37; -2.63)	2.59 (2.15; 3.03)
XGBoost	-0.37 (-0.49; -0.25)	-3 (-3.46; -2.69)	2.27 (1.92; 2.73)
Total Sleep Time (hrs)
Logistic Regression	-0.59 (-0.7; -0.48)	-3.04 (-3.31; -2.83)	1.87 (1.66; 2.12)
Neural Net	-0.04 (-0.14; 0.07)	-2.36 (-2.63; -2.15)	2.29 (2.04; 2.55)
Decision Tree	0.05 (-0.06; 0.15)	-2.21 (-2.42; -2)	2.3 (2.08; 2.55)
XGBoost	-0.02 (-0.11; 0.09)	-2.31 (-2.59; -2.07)	2.27 (2.02; 2.54)
Sleep Efficiency (%)
Logistic Regression	5.76 (5.17; 6.36)	-8.48 (-10.99; -6.81)	20.01 (18.42; 22.28)
Neural Net	3.34 (2.49; 4.17)	-14.88 (-17.32; -12.46)	21.57 (20; 23.35)
Decision Tree	2.42 (1.55; 3.32)	-17.02 (-19.82; -14.47)	21.86 (20.24; 23.84)
XGBoost	3.2 (2.45; 4.05)	-14.77 (-17.39; -12.52)	21.17 (19.6; 23.27)
Latency Until Persistent Sleep (min)
Logistic Regression	4.29 (0.58; 8.46)	-83.37 (-117.62; -62.4)	91.95 (68.85; 127.44)
Neural Net	-2.63 (-6.9; 1.79)	-91.37 (-120.36; -71.83)	86.12 (64.1; 114.28)
Decision Tree	0.21 (-4.83; 5.5)	-88.71 (-117.7; -65.86)	89.13 (62.66; 122.78)
XGBoost	-2.91 (-6.33; -0.1)	-68.68 (-101.27; -53.33)	62.86 (48.66; 87.14)
Wake After Sleep Onset (min)
Logistic Regression	-13.61 (-16.43; -11.16)	-78.31 (-92.92; -69.58)	51.08 (44.03; 60.52)
Neural Net	-10.39 (-13.51; -7.51)	-79.51 (-92.72; -71.68)	58.72 (51.88; 69.27)
Decision Tree	-6.66 (-9.98; -3.68)	-81.36 (-94.24; -71.45)	68.04 (59.35; 78.93)
XGBoost	-8.57 (-11.71; -6.06)	-76.69 (-88.54; -68.07)	59.55 (52.69; 69.57)
Bootstrapped mixed effects limits of agreement with multiple observations per subject (Parker et al. 2016)
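For orientation, this is what bias and limits of agreement (LOA) mean in the simplest case. The sketch is the classic Bland-Altman calculation (bias ± 1.96 SD of the differences), which treats every night as independent. It is a deliberate simplification and not the bootstrapped mixed-effects method used for the table. The sign convention (model minus Zmachine) and the toy numbers are assumptions.

```python
import statistics


def bland_altman(model_values, reference_values):
    """Classic bias and 95% limits of agreement for paired measurements."""
    # Assumed sign convention: model minus reference.
    diffs = [m - r for m, r in zip(model_values, reference_values)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }


# Toy total sleep time values in hours, not real data.
model_tst = [7.0, 6.5, 8.0, 7.5]
zmachine_tst = [7.5, 6.0, 8.5, 7.0]
agreement = bland_altman(model_tst, zmachine_tst)
```

With several nights per subject the differences are correlated, which is why the mixed-effects approach is the better fit for this data.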

In-bed classification flow

Sleep classification flow

Discussion

heteroscedasticity
Cheung 2018, Table 4: actigraphy provides a sufficiently narrow range of possible mean differences (95% CI) relative to clinically significant thresholds
could be interesting to build models on thigh and hip combined
multiclass vs multilabel classification
in-bed awake/sleep is highly imbalanced -> maybe train a new classifier accounting for imbalanced data (SMOTE)
model combined predictions instead?
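On the SMOTE idea: the core trick is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python. It is a simplified illustration, not the implementation from any particular library, and the function name, the `k` and `seed` parameters, and the toy feature vectors are all made up.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority feature tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (Euclidean distance).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        # Random point on the line segment between base and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy "in-bed awake" feature vectors, not real data.
awake_epochs = [(0.10, 0.30), (0.20, 0.35), (0.15, 0.50), (0.40, 0.45)]
new_points = smote_like(awake_epochs, n_new=6)
```

Oversampling like this should only be applied to the training split, otherwise synthetic copies of test nights leak into training.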

References

Forger, D. B., M. E. Jewett, and R. E. Kronauer. 1999. “A Simpler Model of the Human Circadian Pacemaker.” Journal of Biological Rhythms 14 (6): 532–37. https://doi.org/10.1177/074873099129000867.

Hirshkowitz, Max, Kaitlyn Whiton, Steven M Albert, Cathy Alessi, Oliviero Bruni, Lydia DonCarlos, Nancy Hazen, et al. 2015. “National Sleep Foundation’s Sleep Time Duration Recommendations: Methodology and Results Summary.” Sleep Health, 4.

Skotte, Jørgen, Mette Korshøj, Jesper Kristiansen, Christiana Hanisch, and Andreas Holtermann. 2014. “Detection of Physical Activity Types Using Triaxial Accelerometers.” Journal of Physical Activity and Health 11 (1): 76–84. https://doi.org/10.1123/jpah.2011-0347.

Walch, Olivia, Yitong Huang, Daniel Forger, and Cathy Goldstein. 2019. “Sleep Stage Prediction with Raw Acceleration and Photoplethysmography Heart Rate Data Derived from a Consumer Wearable Device.” Sleep 42 (12): zsz180. https://doi.org/10.1093/sleep/zsz180.