Analysis of wearable device data using functional data models

Talk for Georgia Statistics Day 2023

Julia Wrobel

October 08, 2023

Transcript

  1. Analysis of “big N” wearable device data using functional data models
    Julia Wrobel, PhD
    Department of Biostatistics and Bioinformatics

  2. Biostatistics, Epidemiology, & Research Design Forum
    Advances and Challenges in Wearables Research
    Julia Wrobel, PhD, Keynote Speaker
    Friday, November 3, 10:00 AM to 3:00 PM
    REGISTER: bit.ly/BERD2023
    In-Person: Morehouse School of Medicine, Building A, 4th Floor
    Virtual: Zoom

  3. Wearable devices

  4. Wearable devices

  5. Wearable devices

  6. Wearable devices

  7. Accelerometers
    • Physical activity is key to many health-related questions
    • Active individuals tend to live longer and healthier lives
    • Traditionally, physical activity has been measured using retrospective questionnaires
    • Accelerometers have become hugely popular
      • Objective
      • Collection “in the wild”
      • High resolution

  8. Accelerometer data processing pipeline

  9. Accelerometer data processing pipeline

  10. Accelerometer data processing pipeline
    • PA measures: Total steps / counts, MVPA minutes
    • Sedentary measures: Sedentary time, number of sedentary bouts

  11. Reproducibility and rigor
    • Much of the processing pipeline is still up for debate
    • Consider moderate-to-vigorous physical activity (MVPA)
      • How are “activity counts” generated?
      • How are cut points formed (no PA / light PA / MVPA)?
      • Are these consistent across devices? Age groups? Placements?
    • Some general recommendations
      • Keep data in rawest form possible
      • Process using non-proprietary software

  12. Functional data analysis (FDA)
    • Wearable devices record signal over 24-hour periods, the exact focus of FDA!
    • In FDA, the outcome is a curve or function $Y_i(t)$
    • For accelerometer data, $Y_i(t)$ is a 24-hour activity profile
    [Figure: activity profile $Y_i(t)$ plotted against $t$ (hour)]

  13. Uses for FDA in wearables
    • Less pre-processing of the raw data
      • Less information is discarded
    • Better ways of imputing data
      • Missing data is a big problem in wearables
    • Time-dependent interpretations
      • Timing and consistency
      • Does it matter when and how regularly someone moves?

  14. FDA tools for massive accelerometer studies
    • Function-on-scalar regression (FoSR)
      • Functional outcome, scalar predictors (e.g. age)
      • UK Biobank Accelerometry Study
        • 80,000+ participants
    • Generalized functional principal components analysis (gFPCA)
      • National Health and Nutrition Examination Survey (NHANES)
        • 4,000+ participants (2011-2014 wave)
    • Registration
      • How does timing of wake/sleep, PA differ across people?
      • Baltimore Longitudinal Study of Aging (BLSA)
        • 500+ participants

  15. Function-on-scalar regression
    Patterns in physical activity across ages in the UK Biobank study

  16. Function-on-scalar regression
    $$Y_i(t) = \beta_0(t) + \sum_{p=1}^{P} \beta_p(t) X_{ip} + b_i(t) + \epsilon_i(t)$$
    • $Y_i(t)$: Magnitude of physical activity at time $t$
    • $X_{ip}$: Scalar covariate (e.g. age) for subject $i$
    • $\beta_p(t)$: Coefficient function for covariate $p$
    • $b_i(t) \sim GP(0, \Sigma_b)$; $\epsilon_i(t) \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$
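
To make the role of the coefficient functions concrete, here is a minimal sketch that fits the model separately at each time point by ordinary least squares, with a single scalar covariate. It is a naive pointwise estimator, not the method used in the talk: it ignores the subject-level term $b_i(t)$ and does no smoothing over $t$. The function name `pointwise_fosr` and all simulated values are hypothetical.

```python
import math

def pointwise_fosr(Y, x):
    """Fit Y_i(t) = beta0(t) + beta1(t) * x_i separately at each time t by OLS.

    Y: list of N curves, each a list of T values; x: list of N scalars.
    Returns (beta0, beta1), each a list of length T.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta0, beta1 = [], []
    for t in range(len(Y[0])):
        y_t = [curve[t] for curve in Y]
        y_bar = sum(y_t) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y_t))
        slope = sxy / sxx
        beta1.append(slope)
        beta0.append(y_bar - slope * x_bar)
    return beta0, beta1

# Simulated, noise-free curves on an hourly grid (illustrative values only)
hours = list(range(24))
true_beta0 = [10 + 5 * math.sin(math.pi * h / 24) for h in hours]
true_beta1 = [-0.1 * math.sin(math.pi * h / 24) for h in hours]
ages = [45, 55, 65, 75]
Y = [[b0 + b1 * a for b0, b1 in zip(true_beta0, true_beta1)] for a in ages]

beta0_hat, beta1_hat = pointwise_fosr(Y, ages)
```

Because the simulated curves carry no noise, the fitted `beta0_hat` and `beta1_hat` reproduce the generating functions up to floating-point error. With real data the pointwise estimates would be noisy, which is one reason to smooth across time.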

  17. FDA of 88,693 subjects from UK Biobank study
    • Average daily activity patterns across ages from functional regression
    • Left panel shows males, right panel shows females
    J. Wrobel, J. Muschelli, and A. Leroux (2021). Sensors.

  18. Fast generalized functional principal components analysis
    for ultra-high dimensional non-Gaussian wearable device data

  19. Exponential family functional data
    • Functional data methods assume $Y_i(t)$ is Gaussian
    • Wearable device data is often non-Gaussian
      • Poisson: $Y_i(t) \in \{0, 1, 2, \ldots\}$ (activity counts)
      • Binary: $Y_i(t) \in \{0, 1\}$ (sedentary/active minutes)
    • Instead assume $Y_i(t)$ follows an exponential family distribution
      • Assumes smooth latent subject-specific mean $\mu_i(t) = E[Y_i(t)]$
      • Leads to GLM-like framework $g(E[Y_i(t)]) = \eta_i(t)$
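
As a worked illustration of the link function $g$, the two data types listed above are commonly paired with their canonical links. The slides do not state which links were used, so the choices below are an assumption for illustration.

```latex
\begin{aligned}
\text{Binary:}\quad  & g(\mu_i(t)) = \log\frac{\mu_i(t)}{1 - \mu_i(t)} = \eta_i(t),
  & \mu_i(t) &= \frac{1}{1 + e^{-\eta_i(t)}} \\
\text{Poisson:}\quad & g(\mu_i(t)) = \log \mu_i(t) = \eta_i(t),
  & \mu_i(t) &= e^{\eta_i(t)}
\end{aligned}
```

In both cases the latent curve $\eta_i(t)$ is unconstrained, while $\mu_i(t)$ stays in the valid range for the outcome: $(0, 1)$ for binary data and $(0, \infty)$ for counts.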

  20. Example binary “curve” or “binary activity profile”
    • Subject shown below is from BLSA data
    • Active ($Y_i(t) = 1$) vs. inactive ($Y_i(t) = 0$)

  21. Example binary “curve” or “binary activity profile”
    • Subject shown below is from BLSA data
    • Active ($Y_i(t) = 1$) vs. inactive ($Y_i(t) = 0$)

  22. Binary activity profiles for studying sedentary behavior
    • Raw counts at each minute are dichotomized at a low threshold to distinguish activity from inactivity
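
A minimal sketch of the dichotomization step, assuming minute-level counts stored in a plain list. The threshold value and the name `binarize_counts` are illustrative assumptions; the talk does not state the cut point that was used.

```python
def binarize_counts(counts, threshold):
    """Return 1 for minutes with counts at or above threshold, else 0."""
    return [1 if c >= threshold else 0 for c in counts]

# Hypothetical minute-level activity counts for eight consecutive minutes
minute_counts = [0, 3, 12, 250, 8, 0, 41, 9]
ACTIVE_THRESHOLD = 10  # assumption for illustration, not a value from the talk

profile = binarize_counts(minute_counts, ACTIVE_THRESHOLD)
active_minutes = sum(profile)
```

Applied to a full day, the same function turns 1440 counts into a binary activity profile of length 1440.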

  23. Generalized functional principal components analysis
    • Generalized FPCA and generalized regression both model exponential family functional data using a generalized linear model (GLM)-like framework
    $$g(E[Y_i(s)]) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \phi_k(s)$$
    • $Y_i(s) \sim$ exponential family; $g(\cdot)$ is a link function
    • $\beta_0(s)$ is a population mean function
    • $\phi_k(s)$ are population-level eigenfunctions
    • $\xi_{ik}$ are subject-specific scores
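
To show how the pieces of the model fit together, here is a small sketch that generates one binary curve from the GFPCA model with a logit link. The mean function, the two eigenfunctions, the scores, and the grid size are all made-up values chosen for illustration, and the logit link is an assumption.

```python
import math
import random

def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_binary_curve(beta0, phis, scores, rng):
    """Draw one curve with logit P(Y_i(s) = 1) = beta0(s) + sum_k xi_ik * phi_k(s)."""
    eta = [b + sum(xi * phi[j] for xi, phi in zip(scores, phis))
           for j, b in enumerate(beta0)]
    probs = [expit(e) for e in eta]
    y = [1 if rng.random() < p else 0 for p in probs]
    return probs, y

# Hypothetical grid, mean function, and eigenfunctions (K = 2)
S = 48
grid = [j / (S - 1) for j in range(S)]
beta0 = [-1.0 + 2.0 * math.sin(math.pi * s) for s in grid]
phis = [[math.sqrt(2) * math.sin(2 * math.pi * s) for s in grid],
        [math.sqrt(2) * math.cos(2 * math.pi * s) for s in grid]]

rng = random.Random(2023)  # fixed seed so the draw is reproducible
probs, y = simulate_binary_curve(beta0, phis, scores=[1.5, -0.5], rng=rng)
```

Changing the scores changes the shape of the subject's probability curve `probs` while the mean function and eigenfunctions stay shared across subjects, which is the sense in which the scores are subject-specific.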

  24. The NHANES 2011-2014 accelerometer study
    • National Health and Nutrition Examination Survey
    • Accelerometer data from 2011-2014 wave released in 2021
    • Accelerometer data over multiple days from > 4000 subjects
    • 1440 minutes per day of PA measurement
    • Goal is to understand population patterns in sedentary behavior
    • Existing FDA methods cannot handle data of this size
    • We proposed a fast, general-purpose algorithm for generalized FPCA

  25. Four-step fast GFPCA algorithm
    $$g(E[Y_i(s)]) = \eta_i(s) = \beta_0(s) + b_i(s) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \phi_k(s)$$
    1. Bin the data along the functional domain $s$ into $L$ bins
    2. Estimate separate local GLMMs in each bin to obtain $\eta_i(s_{m_l})$ at each bin midpoint
    3. Estimate FPCA on local latent estimates $\eta_i(s_{m_l})$ to obtain eigenfunctions $\boldsymbol{\phi}(s)$
    4. Estimate global model conditioning on eigenfunctions $\boldsymbol{\phi}(s)$ by re-estimating subject-specific scores $\xi_{ik}$
    A. Leroux, C. Crainiceanu, and J. Wrobel (2023+). Fast generalized functional principal components analysis. Under review.

  26. fastGFPCA simulation results
    • Compared with two existing methods
      • Variational Bayes binary FPCA (Wrobel, 2019), bfpca
        • Can’t estimate Poisson or other distributions
      • Two-step conditional model (Gertheiss, 2017), tsGFPCA
        • Breaks for N > 100
    • fastGFPCA is
      • More accurate than tsGFPCA for binary and Poisson data
        • Order of magnitude faster
      • As or more accurate than bfpca for binary data
        • Comparable computation time

  27. GFPCA results for NHANES data
    • 4286 participants with 1440 observations each
    • 3-4 hours of computation time (step 4 is the slow step)
    • Subsampled version of step 4 led to ~22 minutes of computation time


  28. Curve registration
    for exponential family functional data

  29. Misalignment in accelerometer data
    • Time variation: subjects start and end the day at different times
    • Activity level variation: people have higher or lower levels of activity

  30. Misalignment in accelerometer data
    • Same subjects, but probabilities of activity are shown below

  31. Misalignment in accelerometer data
    • Same subjects, but probabilities of activity are shown below

  32. Registration methods align functional data by warping the domain
    • Most methods are computationally inefficient and handle only continuous data
    [Figure labels: $\mu_i(t_i^*)$; $h_i^{-1}(t_i^*) = t$; $\mu_i(h_i^{-1}(t_i^*)) = \mu_i(t)$]
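
A toy sketch of what warping the domain means, assuming time is rescaled to $[0, 1]$. The subject's mean at observed time $t^*$ equals a template evaluated at internal time $t = h^{-1}(t^*)$. The template shape and the power-function warp $h^{-1}(t^*) = (t^*)^a$ are assumptions chosen because the warp is monotone and has a closed-form inverse; they are not the warping family used in the talk.

```python
import math

def template(t):
    """Hypothetical template mean on internal time t in [0, 1], peaking at t = 0.5."""
    return math.exp(-((t - 0.5) ** 2) / 0.02)

def h_inv(t_star, a):
    """Assumed warping function: maps observed time t* to internal time t."""
    return t_star ** a

def h(t, a):
    """Inverse of h_inv: maps internal time t back to observed time t*."""
    return t ** (1.0 / a)

def observed_curve(t_star, a):
    """Unregistered curve: the template evaluated at the warped time."""
    return template(h_inv(t_star, a))

a = 1.5  # illustrative warping parameter for one subject
grid = [k / 10 for k in range(11)]

# Registering the curve: read the observed curve at t* = h(t) for each internal time t
registered = [observed_curve(h(t, a), a) for t in grid]

# Observed clock time at which this subject reaches the template's peak
peak_observed = h(0.5, a)
```

With `a = 1.5` the subject's peak occurs later in observed time than the template's peak at 0.5, and the registered values match the template on the internal grid. In practice the warping function is unknown and has to be estimated.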

  33. Two-step exponential family registration algorithm
    • Computationally efficient and geared towards binary data
    Step 1: estimate template
    Step 2: estimate warping
    [Figure labels: $Y_i(t_i^*)$ and $Y_i(t)$]

  34. Algorithm and software optimized for computational efficiency
    • Step 1: Estimates template to which curves are registered
      • Uses fast, novel variational EM algorithm for binary functional data
    • Step 2: Estimates warping function for each subject
      • Uses constrained maximum likelihood estimation
    • Implemented in R package registr
      • Implemented in C++
    • Wrobel, Goldsmith (2019). Registration for exponential family functional data. Biometrics.
    • Wrobel (2018). registr: Registration for exponential family functional data. Journal of Open Source Software. 3.

  35. Activity profiles pre-registration

  36. Activity profiles post-registration

  37. Future methods work in these areas
    • Fast GFPCA
    • Multilevel data (Monday-Sunday)
    • Xinkai Zhou
    • Sparse and irregular data
    • Fast Generalized function-on-scalar regression
    • Dustin Rogers
    • Registration
    • Multilevel registration


  38. Acknowledgements
    Colorado SPH Biostatistics
    • Andrew Leroux
    • Dustin Rogers
    Columbia Biostatistics
    Functional Data Analysis Working Group
    • Jeff Goldsmith
    Johns Hopkins School of Public Health
    WIT: Wearable and Implantable Technology
    • Vadim Zipunnikov
    • Jennifer Schrack
    • John Muschelli
    • Ciprian Crainiceanu
    • Xinkai Zhou

  39. Thanks!
    Contact Info
    [email protected]
    juliawrobel.com
    github.com/julia-wrobel

  40. Step 1: bin the data
    Choose $L$ bins, where $m_l$ is the midpoint of bin $l \in \{1, \ldots, L\}$
    Considerations
    • Bin width: for simplicity, equal-width and non-overlapping
    • Number of bins

  41. Step 1: bin the data
    Choose $L$ bins, where $m_l$ is the midpoint of bin $l \in \{1, \ldots, L\}$
    Considerations
    • Bin width: for simplicity, equal-width and non-overlapping
    • Number of bins
      • Too many bins: bin width is too small, identifiability issues

  42. Step 1: bin the data
    Choose $L$ bins, where $m_l$ is the midpoint of bin $l \in \{1, \ldots, L\}$
    Considerations
    • Bin width: for simplicity, equal-width and non-overlapping
    • Number of bins
      • Too many bins: bin width is too small, identifiability issues
      • Too few bins: bin width is too large, fails to capture shape of underlying function

  43. Step 1: bin the data
    Choose $L$ bins, where $m_l$ is the midpoint of bin $l \in \{1, \ldots, L\}$
    Considerations
    • Bin width: for simplicity, equal-width and non-overlapping
    • Number of bins
      • Too many bins: bin width is too small, identifiability issues
      • Too few bins: bin width is too large, fails to capture shape of underlying function
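
The binning step can be sketched as follows for a day of 1440 minute-level observations, assuming equal-width, non-overlapping bins whose count divides the number of observations evenly. The helper name `make_bins` and the choice of 48 bins are illustrative assumptions, not values from the talk.

```python
def make_bins(n_points, n_bins):
    """Split indices 0..n_points-1 into n_bins equal-width, non-overlapping bins.

    Returns a list of (indices, midpoint) pairs, one per bin.
    """
    if n_points % n_bins != 0:
        raise ValueError("this sketch assumes n_bins divides n_points")
    width = n_points // n_bins
    bins = []
    for l in range(n_bins):
        idx = list(range(l * width, (l + 1) * width))
        midpoint = (idx[0] + idx[-1]) / 2  # m_l on the index scale
        bins.append((idx, midpoint))
    return bins

# 48 bins of 30 minutes each (an illustrative choice of L)
bins = make_bins(n_points=1440, n_bins=48)
midpoints = [m for _, m in bins]
```

Raising `n_bins` shrinks each bin, leaving fewer observations per local model. Lowering it widens the bins, so fewer midpoints are available to trace the shape of the underlying function.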

  44. Step 2: fit Generalized Linear Mixed Model in each bin
    Fit separate GLMM in each bin to get latent estimates
    • $g(E[Y_i(s_{m_l})]) = \beta_0(s_{m_l}) + b_i(s_{m_l}) = \eta_i(s_{m_l})$
      • $s_{m_l}$: time $s$ at the midpoint of bin $l$
      • $\beta_0(s_{m_l})$: fixed effect mean
      • $b_i(s_{m_l})$: subject-specific random effect
      • $\eta_i(s_{m_l})$: linear predictor, local latent estimates
    • Estimates are not on the original domain
      • On domain defined by bin midpoints
    • Model assumes constant effect for $\beta_0$, $b_i$ across each bin
    • Used for estimating covariance matrix and eigenfunctions
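
To illustrate what a local latent estimate on the bin-midpoint domain looks like, the sketch below swaps the per-bin GLMM for a much cruder stand-in: a smoothed empirical logit computed separately for one subject in each bin. This is not a mixed model, since it neither pools information across subjects nor shrinks the subject effects; it only shows the shape of the output, one log-odds value per bin. The function names and data are hypothetical.

```python
import math

def empirical_logit(y_bin):
    """Smoothed log-odds of activity from the 0/1 values in one bin.

    Adding 0.5 to both counts keeps the estimate finite when a bin is all 0 or all 1.
    """
    n = len(y_bin)
    successes = sum(y_bin)
    return math.log((successes + 0.5) / (n - successes + 0.5))

def local_latent_estimates(y, bin_width):
    """One estimate per bin for a single subject's binary curve y."""
    return [empirical_logit(y[start:start + bin_width])
            for start in range(0, len(y), bin_width)]

# Hypothetical subject: inactive for 6 minutes, then active for 6 minutes
y_subject = [0] * 6 + [1] * 6
eta_local = local_latent_estimates(y_subject, bin_width=4)
```

The three values in `eta_local` are negative, zero, and positive, tracking the move from inactivity to activity. A GLMM fit to all subjects in a bin would additionally separate the shared mean from each subject's deviation.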

  45. Step 3: estimate eigenfunctions using FPCA
    Estimate FPCA using linear predictor from Step 2
    • $\hat{\eta}_i(s_{m_l}) = \hat{\beta}_0(s_{m_l}) + \sum_{k=1}^{K} \hat{\xi}_{ik} \hat{\phi}_k(s_{m_l})$
      • Estimated using refund::fpca.face()
    • Eigenfunctions $\hat{\boldsymbol{\phi}}$ characterize covariance
      • $K$: chosen by percent variance explained
      • Evaluated at bin midpoints rather than original domain
    • Project eigenfunctions onto original domain
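
As a rough picture of what FPCA on the local estimates computes, the sketch below extracts only the leading eigenvector of the sample covariance by power iteration and then projects each centered curve onto it to get scores. It is plain, unsmoothed PCA for a single component, not a reimplementation of refund::fpca.face(), which smooths the covariance and returns several components. The function name and data are hypothetical.

```python
import math

def leading_eigenfunction(curves, n_iter=200):
    """Leading eigenvector of the sample covariance of `curves` via power iteration.

    curves: list of N lists, each of length L (values at bin midpoints).
    Returns (mean, phi), where phi has unit Euclidean norm.
    """
    n, L = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(L)]
    centered = [[c[j] - mean[j] for j in range(L)] for c in curves]
    cov = [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(L)]
           for j in range(L)]
    phi = [1.0 / math.sqrt(L)] * L  # starting vector; must not be orthogonal to the answer
    for _ in range(n_iter):
        v = [sum(cov[j][k] * phi[k] for k in range(L)) for j in range(L)]
        norm = math.sqrt(sum(x * x for x in v))
        phi = [x / norm for x in v]
    return mean, phi

# Hypothetical rank-one data: a base curve plus score * shape at 5 bin midpoints
base = [0.5, 1.0, 1.5, 1.0, 0.5]
shape = [1.0, 2.0, 3.0, 2.0, 1.0]
shape_norm = math.sqrt(sum(x * x for x in shape))
shape_unit = [x / shape_norm for x in shape]
true_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = [[b + s * u for b, u in zip(base, shape_unit)] for s in true_scores]

mean, phi = leading_eigenfunction(curves)
scores_hat = [sum((c[j] - mean[j]) * phi[j] for j in range(len(phi))) for c in curves]
```

Because the simulated curves vary along a single direction, `phi` recovers `shape_unit` and `scores_hat` recovers the generating scores. The eigenvector lives on the bin midpoints, so it would still need to be interpolated onto the original grid, which this sketch does not do.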

  46. Step 4: estimate GFPCA
    Estimate GFPCA conditional on eigenfunctions from Step 3
    • $g(E[Y_i(s) \mid \hat{\boldsymbol{\phi}}(s)]) = \beta_0(s) + \sum_{k=1}^{K} \xi_{ik} \hat{\phi}_k(s)$
      • Eigenfunctions are orthogonal basis functions
      • Reduces number of covariance parameters that need to be estimated for random effects
    • Simple implementation
      • mgcv::bam()