Glossary of Terms
Common Abbreviations
| Abbreviation | Full Term | Definition/Information |
|---|---|---|
| SQL | Structured Query Language | A domain-specific language used for managing and querying data held in a Relational Database Management System (RDBMS). |
| LSOA | Lower Layer Super Output Area | A small geographic unit in England and Wales, defined by the Office for National Statistics (ONS), containing between about 1,000 and 3,000 residents. Used primarily for statistical reporting. |
| AHAH | Access to Healthy Assets and Hazards | An index of neighbourhood access to health-related features (for example, health services, fast-food outlets, green space, and air pollution), reported at LSOA level. Used in public health and geography research to assess how the local environment relates to health. |
| JSON | JavaScript Object Notation | A lightweight, human-readable text format commonly used for data interchange (for example, sending data between a server and a web application). |
| ERD | Entity-Relationship Diagram | A structural diagram used to design or model a database. It shows how different entities (data objects) relate to one another. |
| MVP | Minimum Viable Product | An early version of a product with just enough features to be usable, released to collect feedback that guides further development. |
| API | Application Programming Interface | A set of rules and protocols that allows different software applications to communicate with each other. |
| NHS | National Health Service | The publicly funded healthcare systems of England, Scotland, and Wales (each nation runs its own NHS). |
| ICB | Integrated Care Board | A statutory NHS organisation in England responsible for planning health services and allocating NHS resources for its local population. |
| ETL | Extract, Transform, Load | A process, common in data warehousing, that extracts data from sources, transforms (cleans or reshapes) it, and loads it into a destination. |
| SFTP | SSH File Transfer Protocol (or Secure File Transfer Protocol) | A network protocol that provides secure file transfer over a Secure Shell (SSH) connection. |
| EC2 | Elastic Compute Cloud | A service from Amazon Web Services (AWS) that provides resizable computing capacity (virtual machines) in the cloud. |
| VM | Virtual Machine | An emulation of a computer system (a ‘virtual computer’) that runs its own operating system and applications, isolated from the host machine. |
| DAG | Directed Acyclic Graph | A graph in which every edge has a direction and no path of edges loops back to its starting node. In computing, DAGs model dependencies and workflows: for example, Snakemake arranges the jobs of a pipeline as a DAG so that each step runs only after the steps it depends on. |
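To make the ETL, JSON, and SQL entries above concrete, here is a minimal sketch of an ETL step in Python. It assumes the raw data arrives as JSON (the inline string and its `lsoa`/`score` fields are made-up placeholders, not real data) and uses the standard-library `sqlite3` module with an in-memory database as a stand-in destination; a real project might load into a different database or a storage bucket instead.

```python
import json
import sqlite3

# Extract source: in practice this might be a file fetched via SFTP or read from a bucket.
# Here it is an inline JSON string with made-up placeholder values.
RAW_JSON = """
[
  {"lsoa": "LSOA_A", "score": "12.5"},
  {"lsoa": "LSOA_B", "score": ""},
  {"lsoa": "LSOA_C", "score": "7.25"}
]
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse the raw JSON text into a list of Python dicts."""
    return json.loads(raw)


def transform(records: list[dict]) -> list[tuple[str, float]]:
    """Transform: drop records with a missing score and convert scores from text to numbers."""
    cleaned = []
    for rec in records:
        score = (rec.get("score") or "").strip()
        if score:
            cleaned.append((rec["lsoa"], float(score)))
    return cleaned


def load(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: insert the cleaned rows into a table in the destination database."""
    conn.execute("CREATE TABLE IF NOT EXISTS scores (lsoa TEXT PRIMARY KEY, score REAL)")
    conn.executemany("INSERT INTO scores (lsoa, score) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_JSON)), conn)

# Once loaded, the data can be queried with SQL.
rows = conn.execute("SELECT lsoa, score FROM scores ORDER BY score DESC").fetchall()
print(rows)  # [('LSOA_A', 12.5), ('LSOA_C', 7.25)]
```

Keeping extract, transform, and load as separate functions means each stage can be tested on its own or swapped out (for example, a different destination) without touching the others.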
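The DAG entry above can be illustrated with Python's standard-library `graphlib` module (Python 3.9+). The sketch below models a hypothetical pipeline as a mapping from each step to the steps it depends on, derives one valid run order, and shows that a graph containing a loop is rejected. The step names are illustrative assumptions, and this is not how Snakemake itself is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
workflow = {
    "download_data": set(),
    "download_boundaries": set(),
    "clean_data": {"download_data"},
    "join_to_lsoa": {"clean_data", "download_boundaries"},
    "summarise": {"join_to_lsoa"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order in which every step follows all of its dependencies.

    Raises graphlib.CycleError if the graph contains a loop (i.e. it is not a DAG).
    """
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph: dict[str, set[str]]) -> bool:
    """True if the dependency graph has no cycles."""
    try:
        run_order(graph)
        return True
    except CycleError:
        return False


order = run_order(workflow)
print(order)
print(is_dag({"a": {"b"}, "b": {"a"}}))  # False: a and b depend on each other
```

Because `download_data` and `download_boundaries` do not depend on each other, more than one run order is valid; this independence is what allows workflow tools to run such steps in parallel.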
Quick Reference Guide
Terms
| Term | Definition/Function |
|---|---|
| Database | An organised collection of structured information, or data, typically stored electronically in a computer system. It allows for efficient storage, retrieval, modification, and management of data. |
| Pipeline | An automated sequence of steps that takes raw data, transforms it, and moves it to a destination storage area. |
| Batch processing | An execution method where data is collected over time and processed in large groups, or “batches”. It suits non-urgent, high-volume data operations. |
| Daiser | Daiser are a UK-based company that build digital health applications. |
| Bucket (object storage) | The fundamental container used in object storage systems (such as AWS S3) to store structured, semi-structured, or unstructured data. Buckets are used to organise, control access to, and manage data at a high level. |
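Batch processing, as defined above, can be sketched in a few lines of Python. The `batched` helper and `process_batch` step below are hypothetical illustrations rather than project code, and the ten records are made up. On Python 3.12+ the standard library's `itertools.batched` offers similar behaviour (yielding tuples rather than lists).

```python
from collections.abc import Iterable, Iterator


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield successive lists of at most batch_size items from items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final batch may be smaller than batch_size


def process_batch(batch: list[int]) -> int:
    """Hypothetical processing step: here it simply sums the values in one batch."""
    return sum(batch)


records = range(1, 11)  # ten made-up records: 1..10
results = [process_batch(b) for b in batched(records, 4)]
print(results)  # [10, 26, 19]
```

Each batch is handled as one unit of work, which amortises per-run overhead (opening connections, starting jobs) across many records at the cost of results arriving only when a whole batch finishes.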
Tools
| Tool | Definition/Function |
|---|---|
| Python | A high-level, general-purpose programming language widely used in data science, machine learning, web development, and automation due to its clear syntax and large ecosystem of libraries. |
| Jupyter Notebook | An open-source web application that allows you to create and share documents containing live code, equations, visualisations, and narrative text. |
| Snakemake | A workflow management system for creating portable, reproducible, and scalable data analysis pipelines. Rules and their dependencies are defined in a Python-based syntax. |
| uv (package manager) | An open-source Python package and project manager, written in Rust and designed to be much faster than tools such as pip. |
| GitHub | A web-based platform for version control and collaboration, built around Git. It is primarily used for hosting source code, managing projects, and facilitating team collaboration. |
| S3 | Amazon Simple Storage Service: a highly scalable, low-cost AWS object storage service, commonly used for data backup, archival, and big data analytics. Data is stored as objects inside ‘buckets’. |
| RONIN | A cloud platform that simplifies the deployment and management of AWS computing infrastructure (such as virtual machines and storage) for research and data-intensive projects. |
| DuckDB | An open-source, in-process SQL OLAP (Online Analytical Processing) database management system. It can run complex analytical queries directly on data files (such as CSV and Parquet) without a separate database server. |