Glossary of Terms

Common Abbreviations

Abbreviation

Full Term

Definition/Information

SQL

Structured Query Language

A domain-specific language used for managing data held in a Relational Database Management System (RDBMS).

LSOA

Lower Super Output Area

A small geographic unit in England and Wales containing around 1,000-3,000 people, used primarily for statistical reporting.

AHAH

Access to Healthy Assets and Hazards

A small-area index used in public health and geography research to measure how healthy a neighbourhood's environment is. It combines access to retail outlets and health services with indicators of green and blue space and air quality.

JSON

JavaScript Object Notation

A lightweight, human-readable text format commonly used for data interchange (for example, sending data between a server and a web application).

ERD

Entity-Relationship Diagram

A structural diagram used to design or model a database. It shows how different entities (data objects) relate to one another.

MVP

Minimum Viable Product

An early version of a product with just enough features to be usable, released to collect feedback that guides further development.

API

Application Programming Interface

A set of rules and protocols that allows different software applications to communicate with each other.

NHS

National Health Service

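The publicly funded healthcare systems of the United Kingdom: the NHS in England, NHS Scotland, and NHS Wales. The Northern Ireland equivalent is Health and Social Care (HSC).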

ICB

Integrated Care Board

A statutory NHS organisation in England responsible for planning health services for its local population and allocating the NHS budget in its area.

ETL

Extract, Transform, Load

A process in data warehousing that involves pulling data from sources, cleaning/modifying it, and inserting it into a destination.

SFTP

SSH File Transfer Protocol (or Secure File Transfer Protocol)

A network protocol that provides secure file transfer over a Secure Shell (SSH) connection.

EC2

Elastic Compute Cloud

A service from Amazon Web Services (AWS) that provides resizable computing capacity in the cloud.

VM

Virtual Machine

An emulation of a computer system (‘virtual computer’) that can run an operating system and applications independently of the host machine.

DAG

Directed Acyclic Graph

A graph in which every edge has a direction and no path leads back to the node it started from (there are no cycles). Used in computer science for modelling dependencies and workflows.
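
As an illustration of the DAG entry, the sketch below orders a set of workflow steps so that each step runs only after the steps it depends on. It uses `graphlib` from the Python standard library (Python 3.9 or later). The step names and the helpers `run_order` and `has_cycle` are hypothetical and chosen only for this example; this is not a description of how any particular workflow tool is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of
# steps that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the graph loops back on itself (so it is not a DAG)."""
    try:
        run_order(deps)
    except CycleError:
        return True
    return False


order = run_order(dependencies)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Because a DAG never loops, a valid run order always exists; a graph with a cycle has none, which is why `has_cycle` can detect it by attempting the sort.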

Quick Reference Guide

Terms

Term

Definition/Function

Database

An organised collection of structured information, or data, typically stored electronically in a computer system. It allows for efficient storage, retrieval, modification, and management of data.

Pipeline

An automated system that takes raw data, transforms it in some way, then moves it to a destination storage area.

Batch processing

An execution method where data is collected over time and processed in large groups or “batches.” It is efficient for non-urgent, high-volume data operations.

Daiser

A UK-based company that builds digital health applications.

Bucket (Object storage)

The fundamental container used in object storage systems (like AWS S3) to store structured, semi-structured, or unstructured data. Buckets are used to organise, control access to, and manage data at a high level.
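
The pipeline and batch processing entries above can be sketched together in a few lines of Python. This is a minimal example under stated assumptions: the input records are made-up values, the field names `area_code` and `score` are hypothetical, and an in-memory SQLite database stands in for the destination storage area. A real pipeline would read from and write to external systems.

```python
import json
import sqlite3
from itertools import islice

# Made-up raw input for illustration: one JSON record per line.
RAW_LINES = [
    '{"area_code": "A001", "score": "12.5"}',
    '{"area_code": "A002", "score": "30.1"}',
    '{"area_code": "A003", "score": "7.9"}',
]


def extract(lines):
    """Extract: parse each JSON line into a dictionary."""
    for line in lines:
        yield json.loads(line)


def transform(record):
    """Transform: convert the score from text to a number."""
    return record["area_code"], float(record["score"])


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def run_pipeline(lines, batch_size=2):
    """Load: insert the transformed rows one batch at a time."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scores (area_code TEXT, score REAL)")
    for batch in batches(map(transform, extract(lines)), batch_size):
        con.executemany("INSERT INTO scores VALUES (?, ?)", batch)
        con.commit()
    return con


con = run_pipeline(RAW_LINES)
row_count = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(row_count)
```

Writing rows in groups rather than one at a time is the essence of batch processing: each commit covers a whole batch, which reduces overhead when the data volume is high and the results are not needed immediately.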

Tools

Tool

Definition/Function

Python

A high-level, general-purpose programming language widely used in data science, machine learning, web development, and automation due to its clear syntax and large ecosystem of libraries.

Jupyter Notebook

An open-source web application that allows you to create and share documents containing live code, equations, visualisations, and narrative text.

Snakemake

A workflow management system that helps create portable, reproducible, and scalable data analysis pipelines. It uses a Python-based syntax to define rules and dependencies.

uv package manager

An open-source, extremely fast Python package and project manager, written in Rust.

GitHub

A web-based platform for version control and collaboration, built around the Git system. It is primarily used for hosting source code, managing projects, and facilitating team collaboration.

S3

Amazon Simple Storage Service. A scalable AWS object storage service commonly used for data backup, archival, and big data analytics. Data is stored as objects inside containers called ‘buckets’.

RONIN

A cloud platform that simplifies the deployment and management of AWS computing infrastructure (like virtual machines and storage) for research and data-intensive projects.

DuckDB

An open-source, in-process SQL OLAP (Online Analytical Processing) database management system. It is designed to run complex analytical queries directly on data files (such as CSV and Parquet) without first loading them into a traditional database server.