# Getting Started
## BHF-SmartHealth
### Description
Smart-Health will collect smartphone and wearable data from up to 10,000 consenting participants (sponsored by the University of Sheffield) over 5 years. This data will be linked to NHS records via the local Subnational Data Environment for Yorkshire and Humber. The smartphone and wearable data and metadata, which come from different devices and manufacturers, need to be stored in an environment that supports processing, linkage, and access for research. This repository provides pipelines that process the data received from Daiser and build a database for future Smart-Health projects.
## Development
### Pre-requisites
- uv (see here)
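If you don't already have uv, one common route is its standalone installer; this is a sketch based on uv's own documentation, so check the link above for the current recommended command:

```sh
# Installs uv via Astral's standalone installer
# (see uv's docs for alternatives such as pipx or Homebrew)
curl -LsSf https://astral.sh/uv/install.sh | sh
```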
### Installation
1. Move into the repo:

   ```sh
   cd BHF-SmartHealth
   ```

2. Use `uv sync` to install the project code and dependencies.

3. Add a local git config filter to strip notebook outputs in git commits. This reduces merge conflicts associated with notebook metadata. Do this by running the following:

   ```sh
   git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
   ```

   n.b.: If this step fails, there is a backstop CI workflow that will strip notebook outputs.
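To confirm the filter was registered, you can read it back from your local git config:

```sh
# Should print the jupyter nbconvert command configured above
git config --get filter.strip-notebook-output.clean
```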
## Rendering documentation
The project documentation is built using Sphinx. A CI/CD workflow builds and deploys the project documentation on `main` to GitHub Pages. API reference documentation is automatically built from the scripts in `src/bhf_smarthealth` using `sphinx-autoapi` (see here for more information).
To render the Sphinx documentation pages locally:
1. Check that `make` is installed on your machine using `make --version`. To install it, run the following from a terminal:

   ```sh
   sudo apt-get install build-essential
   ```

2. Move into `BHF-SmartHealth/docs`:

   ```sh
   cd docs
   ```

3. Build the documentation:

   ```sh
   uv run make html
   ```
The above steps:

- Move into the `docs` directory.
- `make html` builds the static web pages from the Sphinx sources in `docs/source` using `sphinx-autoapi`. The built files are output to `docs/build/html`.
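Alternatively, if you prefer to stay at the repo root, make's standard `-C` flag changes into a directory before building; this is a convenience equivalent, not a separate documented workflow:

```sh
# Equivalent to: cd docs && uv run make html
uv run make -C docs html
```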
The resulting web pages can be viewed by either:

- Opening the `docs/build/html/index.html` file in your browser, or
- Using Python's built-in server module. The following command serves the site at `localhost:8000`:

```sh
# Assumes you are in the docs dir
uv run python -m http.server -d build/html
```
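If port 8000 is already in use, `http.server` accepts an alternative port as a positional argument:

```sh
# Serve the built docs on port 8080 instead
uv run python -m http.server 8080 -d build/html
```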
## Running Pipelines
### Pre-requisites
- Ensure that you have run `uv sync` so that all the required dependencies are installed.
- Ensure that the URL and keys for the S3 buckets holding the data are present in a `secrets.yaml` file in the `workflow/config/` directory. The `example_secrets.yaml` file gives a template for the structure of this file.
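For example, one way to create the file is to copy the template and then edit in the real values; the template's location here is an assumption based on the directory described above:

```sh
# Copy the template, then edit secrets.yaml with the real S3 URL and keys
cp workflow/config/example_secrets.yaml workflow/config/secrets.yaml
```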
### Pipelines
There are currently two pipelines:
- `Snakefile_Device.smk`: Converts the raw Fitbit device data (daily summary and intraday) to summarised binary files.
- `Snakefile_GPS.smk`: Converts the raw GPS data to summarised LSOA and Mobility binary files.
### Dry-runs
You can test a pipeline by performing a 'dry-run':
```sh
uv run snakemake -s workflow/<PipelineFile> --dryrun
```
This displays what would be done without executing the pipeline, and reports errors if there are problems with the pipeline.
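For example, to dry-run the device pipeline listed above:

```sh
# Show the planned jobs for the Fitbit device pipeline without running them
uv run snakemake -s workflow/Snakefile_Device.smk --dryrun
```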
### Running
To execute a pipeline, run:
```sh
uv run snakemake -s workflow/<PipelineFile> --cores
```
You can specify a value for `--cores`, but if you don't, Snakemake will use the number of available CPU cores.
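For example, to run the GPS pipeline on four cores (adjust the core count to suit your machine):

```sh
# Execute the GPS pipeline, limiting Snakemake to 4 CPU cores
uv run snakemake -s workflow/Snakefile_GPS.smk --cores 4
```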