Getting Started#

BHF-SmartHealth#

Description#

Smart-Health will collect smartphone and wearable data from up to 10,000 consenting participants (sponsored by the University of Sheffield) over 5 years. This data will be linked to NHS records via the local Subnational Data Environment for Yorkshire and Humber. The smartphone and wearable data and metadata from different devices and manufacturers need to be stored in an environment that supports processing, linkage and access for research. This repository provides a pipeline that processes the data received from Daiser and builds a database for future Smart Health projects.

Development#

Pre-requisites#

These instructions assume that git and uv are installed on your machine, and that you have already cloned the repository.

Installation#

  1. Move into the repo: cd BHF-SmartHealth

  2. Use uv sync to install the project code and dependencies

  3. Add a local git config filter to strip notebook outputs in git commits. This reduces merge conflicts associated with Notebook metadata. Do this by running the following:

    git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
    

    n.b.: If this step fails, there is a backstop CI workflow that will strip notebook outputs.
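To confirm the filter was registered, you can read the value back (this simply prints the command you configured above):

git config --get filter.strip-notebook-output.clean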

Rendering documentation#

The project documentation is built using Sphinx. A CI/CD workflow builds the documentation on main and deploys it to GitHub Pages. API reference documentation is built automatically from the scripts in src/bhf_smarthealth using sphinx-autoapi (see the sphinx-autoapi documentation for more information).

To render the Sphinx documentation pages locally:

  1. Check that make is installed on your machine using make --version. To install it (on Debian/Ubuntu), from a terminal run:

sudo apt-get install build-essential

  2. From the root of BHF-SmartHealth, run:

cd docs
uv run make html

The above steps:

  1. Move into the docs directory

  2. make html builds the static web pages from the Sphinx sources in docs/source using sphinx-autoapi. The built files are output to docs/build/html.

The resulting web page can be viewed by either:

  1. Opening the docs/build/html/index.html file in your browser, or

  2. Using Python's built-in http.server module. The following command serves the site at localhost:8000:

# Assumes you are in the docs dir
uv run python -m http.server -d build/html
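If you are at the repository root instead, the same module can serve the built docs by pointing it at the build directory:

# Assumes you are in the repo root
uv run python -m http.server -d docs/build/html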

Running Pipelines#

Pre-requisites#

Ensure that you have run uv sync so that all the required dependencies are installed.

Ensure that the URL and keys for the S3 buckets holding the data are present in a secrets.yaml file in the workflow/config/ directory. The example_secrets.yaml file gives a template for the structure of this file.
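One way to create this file, assuming the template sits alongside it in workflow/config/, is to copy example_secrets.yaml and then fill in the real URL and keys:

# Assumes the template lives in workflow/config/
cp workflow/config/example_secrets.yaml workflow/config/secrets.yaml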

Pipelines#

There are currently two pipelines:

  1. Snakefile_Device.smk: Converts the raw Fitbit device data (daily summary and intraday) to summarised binary files.

  2. Snakefile_GPS.smk: Converts the raw GPS data to summarised LSOA and Mobility binary files.

Dry-runs#

You can test a pipeline by performing a 'dry-run':

uv run snakemake -s workflow/<PipelineFile> --dryrun

This will display what would be done, without executing the pipeline. It will also throw errors if there are problems with the pipeline.
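For example, to dry-run the Fitbit device pipeline:

uv run snakemake -s workflow/Snakefile_Device.smk --dryrun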

Running#

To execute a pipeline you should run:

uv run snakemake -s workflow/<PipelineFile> --cores

You can specify a value for cores, but if you don't, Snakemake will use all available CPU cores.
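For example, to run the GPS pipeline on four cores:

uv run snakemake -s workflow/Snakefile_GPS.smk --cores 4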