Analysis of wearable device data using functional data models

Analysis of “big N” wearable device data using functional data models
Julia Wrobel, PhD
Department of Biostatistics and Bioinformatics

BIOSTATISTICS, EPIDEMIOLOGY, & RESEARCH DESIGN FORUM
Advances and Challenges in Wearables Research
Julia Wrobel, PhD, Keynote Speaker
Friday, November 3, 10:00 AM to 3:00 PM
REGISTER: bit.ly/BERD2023
In-Person: Morehouse School of Medicine, Building A, 4th Floor
Virtual: Zoom

Wearable devices

Accelerometers
• Physical activity is key to many health-related questions
• Active individuals tend to live longer and healthier lives
• Traditionally, physical activity has been measured using retrospective questionnaires
• Accelerometers have become hugely popular
• Objective
• Collection “in the wild”
• High resolution

Accelerometer data processing pipeline

• PA measures: Total steps / counts, MVPA minutes
• Sedentary measures: Sedentary time, number of sedentary bouts
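
As a minimal sketch of how the summary measures above might be computed from minute-level activity counts: the cut points `SEDENTARY_MAX` and `MVPA_MIN` and the function name `summarize_day` are illustrative assumptions, not values from the talk. Real cut points depend on device, placement, and population.

```python
from itertools import groupby

# Hypothetical cut points (counts per minute); real values are device-specific.
SEDENTARY_MAX = 100   # below this, a minute is treated as sedentary
MVPA_MIN = 2000       # at or above this, a minute is treated as MVPA


def summarize_day(counts):
    """Summarize one day of minute-level activity counts."""
    sedentary = [c < SEDENTARY_MAX for c in counts]
    # A sedentary bout is a maximal run of consecutive sedentary minutes.
    bouts = sum(1 for is_sed, _ in groupby(sedentary) if is_sed)
    return {
        "total_counts": sum(counts),
        "mvpa_minutes": sum(1 for c in counts if c >= MVPA_MIN),
        "sedentary_minutes": sum(sedentary),
        "sedentary_bouts": bouts,
    }


# Toy example: eight minutes of made-up counts.
example = [0, 50, 2500, 3000, 10, 20, 500, 0]
summary = summarize_day(example)
print(summary)
```

Each of these numbers collapses the full day into a single scalar, which is the information loss that the functional data approach later in the deck tries to avoid.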

Reproducibility and rigor
• Much of the processing pipeline is still up for debate
• Consider moderate-to-vigorous physical activity (MVPA)
• How are “activity counts” generated?
• How are cut points formed (no PA / light PA / MVPA)?
• Are these consistent across devices? Age groups? Placements?
• Some general recommendations
• Keep data in rawest form possible
• Process using non-proprietary software

Functional data analysis (FDA)
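• Wearable devices record a signal over 24-hour periods, the exact focus of FDA!
• In FDA, the outcome is a curve or function $Y_i(t)$
• For accelerometer data, $Y_i(t)$ is a 24-hour activity profile
[Figure: example activity profile, with $t$ (hour) on the horizontal axis and $Y_i(t)$ on the vertical axis]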

Uses for FDA in wearables
• Less pre-processing of the raw data
• Less information is discarded
• Better ways of imputing data
• Missing data is a big problem in wearables
• Time-dependent interpretations
• Timing and consistency
• Does it matter when and how regularly someone moves?

FDA tools for massive accelerometer studies
• Function-on-scalar regression (FoSR)
• Functional outcome, scalar predictors (e.g. age)
• UK Biobank Accelerometry Study
• 80,000+ participants
• Generalized functional principal components analysis (gFPCA)
• National Health and Nutrition Examination Survey (NHANES)
• 4,000+ participants (2011-2014 wave)
• Registration
• How does timing of wake/sleep, PA differ across people?
• Baltimore Longitudinal Study of Aging (BLSA)
• 500+ participants

Function-on-scalar regression
Patterns in physical activity across ages in the UK Biobank study

Function-on-scalar regression
$$Y_i(t) = \beta_0(t) + \sum_{p=1}^{P} \beta_p(t) X_{ip} + b_i(t) + \epsilon_i(t)$$
• $Y_i(t)$: Magnitude of physical activity at time $t$
• $X_{ip}$: Scalar covariate (e.g. age) for subject $i$
• $\beta_p(t)$: Coefficient function for covariate $p$
• $b_i(t) \sim GP(0, \Sigma_b)$; $\epsilon_i(t) \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$
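
To make the role of the coefficient functions concrete, here is a deliberately naive sketch that fits the model separately at each time point by ordinary least squares with a single covariate. It ignores the subject-specific term $b_i(t)$ and any smoothing across $t$, so it is a simplification for illustration and not the estimation method used in the talk. The function name `pointwise_fosr` and the toy data are assumptions.

```python
def pointwise_fosr(Y, x):
    """Fit Y_i(t) = beta0(t) + beta1(t) * x_i by least squares at each t.

    Y: list of curves (one list per subject, all on the same time grid).
    x: list of scalar covariate values, one per subject.
    Returns (beta0, beta1), each a list over the time grid.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta0, beta1 = [], []
    for t in range(len(Y[0])):
        y_t = [curve[t] for curve in Y]
        y_bar = sum(y_t) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y_t))
        slope = sxy / sxx
        beta1.append(slope)
        beta0.append(y_bar - slope * x_bar)
    return beta0, beta1


# Toy data: 3 subjects, 4 time points, generated without noise from
# known (made-up) coefficient functions so the fit can be checked.
ages = [40.0, 50.0, 60.0]
true_b0 = [1.0, 2.0, 3.0, 4.0]
true_b1 = [0.5, 0.0, -0.5, 1.0]
Y = [[b0 + b1 * a for b0, b1 in zip(true_b0, true_b1)] for a in ages]

beta0_hat, beta1_hat = pointwise_fosr(Y, ages)
```

In this toy setup `beta1_hat` traces how the effect of age changes over the grid, which is the interpretation of $\beta_p(t)$ in the model above.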

FDA of 88,693 subjects from UK Biobank study
• Average daily activity patterns across ages from functional regression
• Left panel shows males, right panel shows females
J. Wrobel, J. Muschelli, and A. Leroux (2021). Sensors.

Fast generalized functional principal components analysis
for ultra-high dimensional non-Gaussian wearable device data

Exponential family functional data
• Functional data methods typically assume $Y_i(t)$ is Gaussian
• Wearable device data is often non-Gaussian
• Poisson: $Y_i(t) \in \{0, 1, 2, \ldots\}$ (activity counts)
• Binary: $Y_i(t) \in \{0, 1\}$ (sedentary/active minutes)
• Instead assume $Y_i(t)$ follows an exponential family distribution
• Assumes smooth latent subject-specific mean $\mu_i(t) = E[Y_i(t)]$
• Leads to GLM-like framework $g(E[Y_i(t)]) = \eta_i(t)$
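
The link function $g$ is what connects the unconstrained latent curve $\eta_i(t)$ to a valid mean for each data type. The sketch below uses the logit link for binary data and the log link for Poisson data, which are the canonical choices. The talk does not state which links were used, so treat that choice, the function names, and the toy values as assumptions.

```python
import math


def logit(mu):
    """Logit link for binary data: maps a probability in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_link(mu):
    """Log link for Poisson data: maps a positive mean to the real line."""
    return math.log(mu)


def inv_log_link(eta):
    """Inverse log link: maps a linear predictor back to a positive mean."""
    return math.exp(eta)


# A hypothetical latent curve eta_i(t) on a coarse grid of four time points.
eta = [-2.0, 0.0, 1.0, 3.0]
prob_active = [inv_logit(e) for e in eta]      # binary mean, E[Y_i(t)] in (0, 1)
mean_counts = [inv_log_link(e) for e in eta]   # Poisson mean, E[Y_i(t)] > 0
```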

Example binary “curve” or “binary activity profile”
• Subject shown below is from BLSA data
• Active $Y_i(t) = 1$ vs. inactive $Y_i(t) = 0$

Binary activity profiles for studying sedentary behavior
• Raw counts at each minute are dichotomized at a low threshold to distinguish activity from inactivity
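
A minimal sketch of the dichotomization step follows. The threshold of 10 counts per minute is a placeholder chosen for illustration, not the value used in the talk, and the helper names are hypothetical.

```python
def binary_profile(counts, threshold=10):
    """Dichotomize minute-level counts: 1 = active, 0 = inactive."""
    return [1 if c >= threshold else 0 for c in counts]


def pointwise_proportion(profiles):
    """Proportion of profiles that are active at each time point."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


# Toy example: two subjects, six minutes each (made-up counts).
raw = [
    [0, 3, 12, 250, 9, 0],
    [15, 0, 40, 8, 0, 0],
]
profiles = [binary_profile(day) for day in raw]
p_active = pointwise_proportion(profiles)
```

Averaging binary profiles pointwise gives a crude estimate of the probability of activity at each minute, which is the quantity the smoothed models in later slides target.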

Generalized functional principal components analysis
• Generalized FPCA and generalized regression model exponential family functional data using a GLM-like framework
$$g(E[Y_i(s)]) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \phi_k(s)$$
• $Y_i(s) \sim$ exponential family; $g(\cdot)$ is a link function
• $\beta_0(s)$ is a population mean function
• $\phi_k(s)$ are population-level eigenfunctions
• $\xi_{ik}$ are subject-specific scores
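
One way to read the model above is as a recipe for generating data: pick scores, build the latent curve, apply the inverse link, and draw observations. The sketch below does this for binary data with $K = 2$. The mean function, the sine/cosine eigenfunctions, the hourly grid, and the score values are all made-up assumptions for illustration.

```python
import math
import random

# Hypothetical grid on [0, 1): one point per hour of the day.
grid = [h / 24 for h in range(24)]


def beta0(s):
    """Assumed population mean on the logit scale (low at night, high midday)."""
    return -1.0 + 2.0 * math.sin(math.pi * s)


def phi(k, s):
    """Assumed eigenfunctions: a sine/cosine pair, orthonormal on [0, 1]."""
    if k == 1:
        return math.sqrt(2) * math.sin(2 * math.pi * s)
    return math.sqrt(2) * math.cos(2 * math.pi * s)


def latent_curve(scores):
    """eta_i(s) = beta0(s) + sum_k xi_ik * phi_k(s), evaluated on the grid."""
    return [
        beta0(s) + sum(xi * phi(k, s) for k, xi in enumerate(scores, start=1))
        for s in grid
    ]


def simulate_binary(scores, rng):
    """Draw Y_i(s) ~ Bernoulli(inverse-logit(eta_i(s))) at each grid point."""
    eta = latent_curve(scores)
    probs = [1.0 / (1.0 + math.exp(-e)) for e in eta]
    return [1 if rng.random() < p else 0 for p in probs]


rng = random.Random(2023)
eta_mean_subject = latent_curve([0.0, 0.0])   # scores of zero give the mean
y_subject = simulate_binary([0.8, -0.3], rng)
```

A subject whose scores are all zero follows the population mean exactly. Nonzero scores shift that subject's curve along the eigenfunctions.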

The NHANES 2011-2014 accelerometer study
• National Health and Nutrition Examination Survey
• Accelerometer data from 2011-2014 wave released in 2021
• Accelerometer data over multiple days from > 4000 subjects
• 1440 minutes per day of PA measurement
• Goal is to understand population patterns in sedentary behavior
• Existing FDA methods cannot handle data of this size
• We proposed a fast, general-purpose algorithm for generalized FPCA

Four-step fast GFPCA algorithm
$$g(E[Y_i(s)]) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \phi_k(s)$$
1. Bin the data along the functional domain $s$ into $L$ bins
2. Estimate separate local GLMMs in each bin to obtain $\eta_i(s_{m_l})$ at each bin midpoint
3. Estimate FPCA on local latent estimates $\eta_i(s_{m_l})$ to obtain eigenfunctions $\boldsymbol{\phi}(s)$
4. Estimate global model conditioning on eigenfunctions $\boldsymbol{\phi}(s)$ by re-estimating subject-specific scores $\xi_{ik}$
A. Leroux, C. Crainiceanu, and J. Wrobel (2023+). Fast generalized functional principal components analysis. Under review.

fastGFPCA simulation results
• Compared with two existing methods
• Variational Bayes binary FPCA (Wrobel, 2019), bfpca
• Can’t estimate Poisson or other distributions
• Two-step conditional model (Gertheiss, 2017), tsGFPCA
• Breaks for N > 100
• fastGFPCA is
• More accurate than tsGFPCA for binary and Poisson data
• An order of magnitude faster than tsGFPCA
• At least as accurate as bfpca for binary data
• Comparable to bfpca in computation time

GFPCA results for NHANES data
• 4286 participants with 1440 observations each
• 3-4 hours of computation time (step 4 is the slow step)
• Subsampled version of step 4 led to ~22 minutes of computation time

Curve registration
for exponential family functional data

Misalignment in accelerometer data
• Time variation: subjects start and end the day at different times
• Activity level variation: people have higher or lower levels of activity

• Same subjects, but probabilities of activity are shown below

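Registration methods align functional data by warping the domain
• Most methods are computationally inefficient and handle only continuous data
[Figure panels: observed curves $\mu_i(t_i^*)$; inverse warping functions $h_i^{-1}(t_i^*) = t$; aligned curves $\mu_i(h_i^{-1}(t_i^*)) = \mu_i(t)$]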
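
The warping idea can be sketched with a toy example. Here the inverse warping function is a simple power function, `h_inv(t_star) = t_star ** a`, which is monotone and fixes the endpoints 0 and 1. That functional form, the template shape, and all names are assumptions made for illustration and are not the warping family used in the talk or in the registr package.

```python
import math


def template(t):
    """Hypothetical template mean on internal time t in [0, 1], peaking at 0.5."""
    return math.sin(math.pi * t) ** 2


def h_inv(t_star, a):
    """Assumed inverse warping: observed time t* -> internal time t."""
    return t_star ** a


def h(t, a):
    """Warping function: internal time t -> observed time t*."""
    return t ** (1.0 / a)


def observed_curve(t_star, a):
    """What is seen on the clock: the template evaluated at warped time."""
    return template(h_inv(t_star, a))


a = 2.0  # a > 1 shifts the peak later in observed time
grid = [j / 100 for j in range(101)]

observed = [observed_curve(ts, a) for ts in grid]
peak_observed = grid[observed.index(max(observed))]

# Registration: evaluate the observed curve at t* = h(t) to undo the warp.
registered = [observed_curve(h(t, a), a) for t in grid]
```

In this toy case the observed curve peaks later than the template, and composing with the warping function moves it back onto the template. In practice the warping function is unknown and must be estimated for each subject.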

Two-step exponential family registration algorithm
• Computationally efficient and geared towards binary data
Step 1: estimate template
Step 2: estimate warping
[Figure: unregistered data $Y_i(t_i^*)$ and registered data $Y_i(t)$]

Algorithm and software optimized for computational efficiency
• Step 1: Estimates template to which curves are registered
• Uses fast, novel variational EM algorithm for binary functional data
• Step 2: Estimates warping function for each subject
• Uses constrained maximum likelihood estimation
• Implemented in R package registr
• Implemented in C++
• Wrobel, Goldsmith (2019). Registration for exponential family functional data. Biometrics.
• Wrobel (2018). registr: Registration for exponential family functional data. Journal of Open Source Software. 3.

Activity profiles pre-registration

Activity profiles post-registration

Future methods work in these areas
• Fast GFPCA
• Multilevel data (Monday-Sunday)
• Xinkai Zhou
• Sparse and irregular data
• Fast Generalized function-on-scalar regression
• Dustin Rogers
• Registration
• Multilevel registration

Acknowledgements
Colorado SPH Biostatistics
• Andrew Leroux
• Dustin Rogers
Columbia Biostatistics, Functional Data Analysis Working Group
• Jeff Goldsmith
Johns Hopkins School of Public Health, WIT: Wearable and Implantable Technology
• Vadim Zipunnikov
• Jennifer Schrack
• John Muschelli
• Ciprian Crainiceanu
• Xinkai Zhou

Thanks!
Contact Info
[email protected]
juliawrobel.com
github.com/julia-wrobel

Step 1: bin the data
Choose $L$ bins, where $m_l$ is the midpoint of bin $l \in \{1, \ldots, L\}$
Considerations
• Bin width: for simplicity, bins are equidistant and non-overlapping
• Number of bins
• Too many bins: bin width is too small, identifiability issues
• Too few bins: bin width is too big, doesn't capture shape of underlying function
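
A minimal sketch of equal-width, non-overlapping binning for a 1440-minute day is given below. The choice of $L = 24$ hourly bins, the helper names, and the toy profile are assumptions. The per-bin proportion is only a descriptive summary to show what data falls in each bin, not the local GLMM fit of Step 2.

```python
def make_bins(n_points, n_bins):
    """Split indices 0..n_points-1 into equal-width, non-overlapping bins.

    Returns a list of (start, stop, midpoint) tuples with stop exclusive.
    Assumes n_points is divisible by n_bins to keep the sketch simple.
    """
    if n_points % n_bins != 0:
        raise ValueError("n_points must be divisible by n_bins in this sketch")
    width = n_points // n_bins
    bins = []
    for l in range(n_bins):
        start = l * width
        stop = start + width
        midpoint = (start + stop - 1) / 2  # midpoint index m_l of bin l
        bins.append((start, stop, midpoint))
    return bins


def bin_proportions(profile, bins):
    """Proportion of active minutes within each bin for one binary profile."""
    return [sum(profile[a:b]) / (b - a) for a, b, _ in bins]


# 1440 minutes per day split into L = 24 hourly bins.
bins = make_bins(1440, 24)

# Made-up binary profile: active from minute 480 (8 AM) up to minute 1320 (10 PM).
profile = [1 if 480 <= m < 1320 else 0 for m in range(1440)]
props = bin_proportions(profile, bins)
```

With very narrow bins, many bins for a given subject contain only zeros or only ones, which is one way the identifiability issue noted above shows up.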

Step 2: fit Generalized Linear Mixed Model in each bin
Fit separate GLMM in each bin to get latent estimates
• $g(E[Y_i(s_{m_l})]) = \beta_0(s_{m_l}) + b_i(s_{m_l}) = \eta_i(s_{m_l})$
• $s_{m_l}$: time $s$ at the midpoint of bin $l$
• $\beta_0(s_{m_l})$: fixed effect mean
• $b_i(s_{m_l})$: subject-specific random effect
• $\eta_i(s_{m_l})$: linear predictor, local latent estimates
• Estimates are not on the original domain
• On domain defined by bin midpoints
• Model assumes constant effect for $\beta_0$, $b_i$ across each bin
• Used for estimating covariance matrix and eigenfunctions

Step 3: estimate eigenfunctions using FPCA
Estimate FPCA using linear predictor from Step 2
• $\hat{\eta}_i(s_{m_l}) = \hat{\beta}_0(s_{m_l}) + \sum_{k=1}^{K} \hat{\xi}_{ik} \hat{\phi}_k(s_{m_l})$
• Estimated using refund::fpca.face()
• Eigenfunctions $\hat{\boldsymbol{\phi}}$ characterize covariance
• $K$: chosen by percent variance explained
• Evaluated at bin midpoints rather than original domain
• Project eigenfunctions onto original domain
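
The slide uses `refund::fpca.face()` in R. As a rough, unsmoothed stand-in for the core idea, the sketch below forms the sample covariance of latent curves on the bin midpoints and extracts the leading eigenvector by power iteration. It is not the FACE algorithm, it recovers only the first component, and the toy curves, scores, and function names are assumptions.

```python
import math


def sample_covariance(curves):
    """Mean vector and sample covariance of curves on a common grid."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    cov = [
        [sum(row[j] * row[k] for row in centered) / (n - 1) for k in range(m)]
        for j in range(m)
    ]
    return mean, cov


def leading_eigenpair(cov, n_iter=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    Starts from a constant vector, so it assumes the leading eigenvector
    is not orthogonal to a constant.
    """
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
    eigenvalue = sum(vj * cj for vj, cj in zip(v, cv))
    return eigenvalue, v


# Toy latent curves on 5 bin midpoints: eta_i = xi_i * phi (made-up values).
scale = math.sqrt(19.0)
phi_true = [x / scale for x in [1.0, 2.0, 3.0, 2.0, 1.0]]
scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = [[xi * p for p in phi_true] for xi in scores]

mean_hat, cov_hat = sample_covariance(curves)
lambda1, phi1_hat = leading_eigenpair(cov_hat)
```

In this toy example the curves vary along a single direction, so the leading eigenvector recovers `phi_true` and the eigenvalue equals the sample variance of the scores.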

Step 4: estimate GFPCA
Estimate GFPCA conditional on eigenfunctions from Step 3
• $g(E[Y_i(s) \mid \hat{\boldsymbol{\phi}}]) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \hat{\phi}_k(s)$
• Eigenfunctions are orthogonal basis functions
• Reduces number of covariance parameters that need to be estimated for random effects
• Simple implementation
• mgcv::bam()
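
The slide fits this step with `mgcv::bam()` in R, which estimates the mean and all subjects' scores jointly as a mixed model. As a much simpler illustration of conditioning on a fixed eigenfunction, the sketch below estimates a single score for one subject with binary data by Newton-Raphson, holding $\beta_0(s)$ and one eigenfunction fixed. There is no random-effect shrinkage here, and the inputs and function name are hypothetical.

```python
import math


def estimate_score(y, beta0, phi, n_iter=25):
    """Newton-Raphson estimate of one subject's score xi for binary data.

    Model: logit(P(y_j = 1)) = beta0_j + xi * phi_j, with beta0 and phi fixed.
    """
    xi = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b + xi * f))) for b, f in zip(beta0, phi)]
        gradient = sum((yj - pj) * f for yj, pj, f in zip(y, p, phi))
        information = sum(pj * (1.0 - pj) * f * f for pj, f in zip(p, phi))
        xi += gradient / information
    return xi


# Toy inputs (all hypothetical): a flat mean and a +/-1 "eigenfunction".
beta0 = [0.0] * 6
phi = [-1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
y = [0, 1, 1, 1, 0, 0]

xi_hat = estimate_score(y, beta0, phi)
```

Because the eigenfunctions are held fixed, only the low-dimensional scores need to be estimated for each subject, which is what makes this step simple to implement.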
