UK Biobank Help

A companion resource to ukbAid for working with clinical records on RAP

Published

June 2, 2026

About this site

This guide covers the parts of UK Biobank analysis that begin where ukbAid ends: how to extract, clean, and work with clinical records — GP diagnoses, hospital episodes, and medication prescriptions — once your project is set up and running on the Research Analysis Platform (RAP).

It is written for collaborators and students at Steno Diabetes Center Aarhus who are new to the UK Biobank platform or new to working with linked clinical data.


Before you start

Important: New to this site? Start with ukbAid first

Before using any script on this site, complete the ukbAid initial setup. This covers requesting RAP access, creating your GitHub personal access token, and cloning your project repository. None of the scripts here will work without that foundation.

Then go to Start Here for a step-by-step walkthrough of your first session.

Note: The RAP environment resets between sessions

Packages do not persist when a session ends. Run scripts/setup.R steps 1–3 at the start of every RAP session to reinstall and reload everything.
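
As a rough illustration of what a session-start step can look like, here is a minimal sketch of an "install if missing, then load" helper. It is not the contents of scripts/setup.R: the function names `missing_packages` and `ensure_packages` are hypothetical, and the package list (`arrow`, `dplyr`) is an assumption based on the Arrow query pattern and the pipe mentioned elsewhere on this site. It also assumes a package repository is already configured for the session.

```r
# Hypothetical sketch, not taken from scripts/setup.R.
# The package names are illustrative assumptions.
required_packages <- c("arrow", "dplyr")

# Return the subset of `pkgs` that is not installed in this session.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing, then attach every package in `pkgs`.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# At the start of a session you would call:
# ensure_packages(required_packages)
```

Checking what is missing before installing means a second run in the same session only reloads the packages, while a fresh session reinstalls them.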


Pages on this site

Start Here

A step-by-step walkthrough for new users: first-session setup, how to run your first extraction, and where to go next.

Extract Data

How clinical data is stored on RAP, the Arrow query pattern for large files, UK Biobank Field IDs, primary care data availability, and a quick-reference table for common questions.

Code Lists

How to build your own diagnostic code list CSV, how to load validated code lists from Prigge et al. directly from GitHub, and where to find lists for over 200 conditions.

Functions

Every function used in the extraction scripts, including the pipe %>%, explained for a first-time reader with analogies and notes on non-obvious behaviours.

Data Management

How to merge diagnosis, prescription, and demographic datasets; GP linkage filtering; deriving first diagnosis dates; and renaming/recoding variables.

Dataset Reference

Confirmed column names for GP clinical records, HES diagnoses, GP prescriptions, and UKB demographics fields — with descriptions and quirks.

Common Mistakes

Thirteen UK Biobank pitfalls that produce silent errors or session crashes, with wrong-vs-correct code examples for each.


What this site adds to ukbAid

flowchart TD
    subgraph ukbaid ["ukbAid"]
        A["Request RAP access"]
        B["Project setup & GitHub"]
        C["Extract UKB variables"]
        A --> B --> C
    end

    subgraph site ["This site"]
        D["Build code lists"]
        E["Extract GP + HES diagnoses<br/>extract_diagnoses.R"]
        F["Extract medication prescriptions<br/>extract_medications.R"]
        G["Merge & manage datasets<br/>manage_dataset.R"]
        D --> E
        E --> G
        F --> G
    end

    C --> E
    C --> F


Folder structure

When this project is set up as a standalone repository, the layout is:

ukb-clinical-guide/        <- project root (future GitHub repository)
├── scripts/
│   ├── setup.R               <- session startup: packages, demographics, save pattern
│   ├── extract_diagnoses.R   <- GP + HES extraction using a code list CSV
│   ├── extract_medications.R <- prescription extraction: BNF and regex approaches
│   └── manage_dataset.R      <- merge datasets, derive first diagnosis dates
├── docs/
│   ├── _quarto.yml           <- site configuration (theme, navbar, footer)
│   ├── index.qmd             <- this page
│   ├── start-here.qmd        <- step-by-step path for new users
│   ├── getting-started-ukb.qmd
│   ├── code-lists-guide.qmd
│   ├── guide-to-functions.qmd
│   ├── data-management.qmd
│   ├── dataset-reference.qmd
│   └── common-mistakes.qmd
└── .github/
    └── workflows/
        └── publish.yml       <- GitHub Actions: render site and deploy to GitHub Pages