flowchart TD
subgraph ukbaid ["ukbAid"]
A["Request RAP access"]
B["Project setup & GitHub"]
C["Extract UKB variables"]
A --> B --> C
end
subgraph site ["This site"]
D["Build code lists"]
E["Extract GP + HES diagnoses<br/>extract_diagnoses.R"]
F["Extract medication prescriptions<br/>extract_medications.R"]
G["Merge & manage datasets<br/>manage_dataset.R"]
D --> E
E --> G
F --> G
end
C --> E
C --> F
UK Biobank Help
A companion resource to ukbAid for working with clinical records on RAP
About this site
This guide covers the parts of UK Biobank analysis that begin where ukbAid ends: how to extract, clean, and work with clinical records — GP diagnoses, hospital episodes, and medication prescriptions — once your project is set up and running on the Research Analysis Platform (RAP).
It is written for collaborators and students at Steno Diabetes Center Aarhus who are new to the UK Biobank platform or new to working with linked clinical data.
Before you start
Before using any script on this site, complete the ukbAid initial setup. This covers requesting RAP access, creating your GitHub personal access token, and cloning your project repository. None of the scripts here will work without that foundation.
Then go to Start Here for a step-by-step walkthrough of your first session.
Packages do not persist when a session ends. Run scripts/setup.R steps 1–3 at the start of every RAP session to reinstall and reload everything.
Pages on this site
Start Here
A step-by-step walkthrough for new users: first-session setup, how to run your first extraction, and where to go next.
Extract Data
How clinical data is stored on RAP, the Arrow query pattern for large files, UK Biobank Field IDs, primary care data availability, and a quick-reference table for common questions.
Code Lists
How to build your own diagnostic code list CSV, how to load validated code lists from Prigge et al. directly from GitHub, and where to find lists for over 200 conditions.
Functions
Every function used in the extraction scripts, explained for first-time readers (including the pipe %>%), with analogies and notes on non-obvious behaviour.
Data Management
How to merge diagnosis, prescription, and demographic datasets, filter to participants with GP linkage, derive first diagnosis dates, and rename or recode variables.
Dataset Reference
Confirmed column names for GP clinical records, HES diagnoses, GP prescriptions, and UKB demographic fields, with descriptions and known quirks.
Common Mistakes
Thirteen UK Biobank pitfalls that produce silent errors or session crashes, with wrong-vs-correct code examples for each.
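What this site adds to ukbAid
ukbAid takes you from requesting RAP access, through project setup and GitHub, to extracting UKB variables. This site picks up from there: build code lists, use them to extract GP and HES diagnoses (extract_diagnoses.R), extract medication prescriptions (extract_medications.R), and then merge and manage the resulting datasets (manage_dataset.R).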
Folder structure
When this project is set up as a standalone repository, the layout is:
ukb-clinical-guide/            <- project root (future GitHub repository)
├── scripts/
│   ├── setup.R                <- session startup: packages, demographics, save pattern
│   ├── extract_diagnoses.R    <- GP + HES extraction using a code list CSV
│   ├── extract_medications.R  <- prescription extraction: BNF and regex approaches
│   └── manage_dataset.R       <- merge datasets, derive first diagnosis dates
├── docs/
│   ├── _quarto.yml            <- site configuration (theme, navbar, footer)
│   ├── index.qmd              <- this page
│   ├── start-here.qmd         <- step-by-step path for new users
│   ├── getting-started-ukb.qmd
│   ├── code-lists-guide.qmd
│   ├── guide-to-functions.qmd
│   ├── data-management.qmd
│   ├── dataset-reference.qmd
│   └── common-mistakes.qmd
└── .github/
    └── workflows/
        └── publish.yml        <- GitHub Actions: render site and deploy to GitHub Pages