DARTER: Project 708421
Project-specific guide to the BS & Dementia cohort study
DARTER: Diabetes And inteRgenerational Transmission of hEalth determinants over the life couRse (project 708421).
This section is only for those working on the DARTER project. The content here builds on the general DST guide and adds the project-specific material.
Searchable variable and register overview for DARTER
All variables and registers applied for in the project are collected in a searchable table: steno-aarhus.github.io/darter-project
New to the project? Start with the general guide, then return here:
- Phase 1 - Plan your study
- Phase 2 - R: the bare essentials
- Phase 3 - Log in to DST
Practical introduction from the project itself
In Luke's folder on the server you will find a thorough guide to working with data on DST, written specifically for this project:
E:/workdata/708421/workspaces/luke/dstDataPrep/
Find the .qmd file in this folder. It walks through dstDataPrep, load_database() and the practical DARTER workflow step by step, with real examples from the project. It is good supplementary reading to this guide.
In this section
| Page | Contents |
|---|---|
| This page | Setup (dstDataPrep + duckplyr) and a reusable LPR extraction function |
| Register paths and datastores | Confirmed paths and access methods for all registers on 708421 |
| DARTER-specific pitfalls | Quirks specific to this project |
Initial setup steps for DARTER
Two steps must be completed before you write code on DARTER:
Step 1: Build dstDataPrep
dstDataPrep is the package that provides access to load_database() and all register data. It must be built manually, as the DST server resets installed packages.
- File → Open Project in New Session
- Navigate to E:/workdata/708421/workspaces/luke/dstDataPrep/dstDataPrep.Rproj
- Press Ctrl+Shift+B (Build), or on Mac use the menu: Build → Install Package
- Wait for "Done" and close the session
Repeat these steps if library(dstDataPrep) reports Error: there is no package called 'dstDataPrep'.
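A small guard at the top of a script can make the rebuild requirement explicit instead of failing later at library(). The sketch below uses only base R; the helper name check_pkg_available() is hypothetical and is not part of dstDataPrep.

```r
# Hypothetical helper (not part of dstDataPrep): report whether a package
# can be loaded from the current R library, without attaching it.
check_pkg_available <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Print a reminder instead of stopping, so the rest of the script can decide.
if (!check_pkg_available("dstDataPrep")) {
  message("dstDataPrep is not installed in this session: repeat Step 1 (Build).")
}
```

The same check could be applied to duckplyr before calling library(duckplyr).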
Step 2: Reinstall duckplyr at the start of each session
install.packages("duckplyr") # run before library(); installed packages are reset at logout
Alternative: use compute() for a DuckDB connection
load_database() returns an Arrow connection that only supports a subset of dplyr functions. If you need DuckDB-specific functionality or experience slow performance, you can pipe to compute():
lmdb <- load_database("lmdb") %>%
  filter(pnr %in% !!kohort$pnr) %>% # filter in Arrow BEFORE compute() to reduce data
  compute() # convert to DuckDB connection
Always reduce data with filter()/select() before compute(). See the osdc documentation for DuckDB configuration and memory limits.
Before running a script: verify that path_output at the top of each script points to your workspace folder.
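One way to make this check automatic is to fail early when the folder is missing. This is a minimal sketch in base R: check_output_path() is a hypothetical helper, and the tempdir() value is only a placeholder so the example runs anywhere; on the server, path_output would point to your own workspace folder.

```r
# Assumption: each script defines path_output near the top.
# tempdir() is a placeholder; replace it with your workspace folder.
path_output <- tempdir()

# Hypothetical helper: stop with a clear message if the folder does not exist,
# otherwise return the normalised path invisibly.
check_output_path <- function(path) {
  if (!dir.exists(path)) {
    stop("path_output does not exist: ", path, call. = FALSE)
  }
  invisible(normalizePath(path))
}

check_output_path(path_output)
```

This only confirms that the folder exists; whether it is the right workspace still has to be checked by eye.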
Recommendation: create a helper function for LPR extractions
LPR extractions require combining LPR2 somatic, LPR2 psychiatric and LPR3, and doing the same for each new outcome in the project. It pays off to encapsulate this in one reusable function rather than copying the code repeatedly.
Advantages:
- One place to fix if something changes (e.g. a new register or a new column)
- The code block for each outcome is reduced from ~40 lines to one function call
- Errors can only be introduced in one place instead of in each copy
How to create the function: define it at the top of your script or in a separate functions.R file:
See the full get_lpr_diagnoses() function
library(dstDataPrep)
library(dplyr)
get_lpr_diagnoses <- function(pnr_vector, diagtypes = c("A", "B"), inpatient_only = FALSE) {
  # Open registers (column names are lowercased for consistency)
  lpr_adm <- load_database("lpr_adm") %>% rename_with(tolower) # LPR2 somatic contacts
  lpr_diag <- load_database("lpr_diag") %>% rename_with(tolower) # LPR2 somatic diagnoses
  psyk_adm <- load_database("t_psyk_adm") %>% rename_with(tolower) %>%
    rename(pnr = v_cpr, recnum = k_recnum) # LPR2 psychiatric contacts
  psyk_diag <- load_database("t_psyk_diag") %>% rename_with(tolower) %>%
    rename(recnum = v_recnum) # LPR2 psychiatric diagnoses
  lpr3_k <- load_database("lpr_a_kontakt") %>% rename_with(tolower) %>%
    filter(lprindberetningssystem == "LPR3") # CRITICAL: avoid duplicated rows from LPR_F format
  lpr3_d <- load_database("lpr_a_diagnose") %>% rename_with(tolower) # LPR3 diagnoses
  # Filter on admission type if desired
  # NOTE: psyk_adm is not filtered here, so LPR2 psychiatric contacts are
  # returned regardless of inpatient_only
  if (inpatient_only) {
    lpr_adm <- lpr_adm %>% filter(c_pattype == "0") # "0" = inpatient in LPR2
    # "ALCA00" = physical attendance; LPR3 has no patient-type field,
    # so this only approximates an inpatient restriction
    lpr3_k <- lpr3_k %>% filter(kont_type == "ALCA00")
  }
  # LPR2 somatic
  lpr2_dx <- lpr_adm %>%
    filter(pnr %in% !!pnr_vector) %>%
    select(pnr, recnum, date_contact = d_inddto) %>%
    inner_join(
      lpr_diag %>% filter(c_diagtype %in% !!diagtypes) %>% select(recnum, c_diag),
      by = "recnum"
    ) %>%
    select(pnr, date_contact, c_diag) %>% # drop the join key before binding
    collect() %>%
    mutate(icd3 = substr(c_diag, 2, 4)) # strip D-prefix
  # LPR2 psychiatric
  lpr2_psyk_dx <- psyk_adm %>%
    filter(pnr %in% !!pnr_vector) %>%
    select(pnr, recnum, date_contact = d_inddto) %>%
    inner_join(
      psyk_diag %>% filter(c_diagtype %in% !!diagtypes) %>% select(recnum, c_diag),
      by = "recnum"
    ) %>%
    select(pnr, date_contact, c_diag) %>%
    collect() %>%
    mutate(icd3 = substr(c_diag, 2, 4))
  # LPR3
  lpr3_dx <- lpr3_k %>%
    filter(pnr %in% !!pnr_vector) %>%
    select(pnr, dw_ek_kontakt, date_contact = kont_starttidspunkt) %>%
    inner_join(
      lpr3_d %>%
        filter(diag_kode_type %in% !!diagtypes,
               is.na(senere_afkraeftet) | senere_afkraeftet != "Ja") %>%
        select(dw_ek_kontakt, c_diag = diag_kode),
      by = "dw_ek_kontakt"
    ) %>%
    select(pnr, date_contact, c_diag) %>%
    collect() %>%
    mutate(date_contact = as.Date(date_contact), # datetime -> date
           icd3 = substr(c_diag, 2, 4))
  # Return the combined table: pnr | date_contact | c_diag | icd3
  bind_rows(lpr2_dx, lpr2_psyk_dx, lpr3_dx)
}
Use the function: one call per extraction, only change CODES
kohort <- readRDS("datasets/full_cohort.rds")
pnr_list <- unique(kohort$pnr)
# Fetch all diagnoses for the cohort (Phase 1; see the hospital contacts page)
alle_dx <- get_lpr_diagnoses(
  pnr_vector = pnr_list,
  diagtypes = c("A", "B"),
  inpatient_only = FALSE
)
# Returns: pnr | date_contact | c_diag | icd3
# Extract one outcome; only change CODES (Phase 2)
CODES <- c("F00", "F01", "F02", "F03", "G30", "G31") # dementia
dementia <- alle_dx %>%
  filter(icd3 %in% CODES) %>%
  inner_join(kohort %>% select(pnr, index_date), by = "pnr") %>%
  filter(date_contact > index_date) %>% # only diagnoses after the index date
  group_by(pnr) %>% arrange(date_contact) %>% slice(1) %>% ungroup() %>% # first event per person
  select(pnr, dementia_date = date_contact)
# Left join so cohort members without an event keep a row with NA
result <- kohort %>% select(pnr) %>% left_join(dementia, by = "pnr")
saveRDS(result, "datasets/extract_dementia.rds")
This is the DARTER variant (using load_database() and the confirmed register names for 708421, as of June 2026). The general open_dataset() version and the explanation behind the pattern are in Phase 9b - LPR extraction.
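The second phase (strip the D-prefix, match CODES, keep the first event after the index date) can be tried on toy data without server access. The sketch below is written in base R rather than dplyr so that it runs without any packages; the data frames toy_dx and toy_cohort and all their values are invented for illustration and are not register data.

```r
# Toy diagnoses (invented values), shaped like the output of get_lpr_diagnoses()
toy_dx <- data.frame(
  pnr          = c("A", "A", "A", "B", "C"),
  date_contact = as.Date(c("2010-05-01", "2012-03-10", "2015-07-20",
                           "2011-01-15", "2013-09-09")),
  c_diag       = c("DF001", "DG309", "DF03", "DE119", "DG301"),
  stringsAsFactors = FALSE
)
# Toy cohort (invented values) with one index date per person
toy_cohort <- data.frame(
  pnr        = c("A", "B", "C"),
  index_date = as.Date(c("2011-01-01", "2010-01-01", "2014-01-01")),
  stringsAsFactors = FALSE
)

# Characters 2-4 drop the leading "D" and keep the three-character ICD-10 code
toy_dx$icd3 <- substr(toy_dx$c_diag, 2, 4)

CODES <- c("F00", "F01", "F02", "F03", "G30", "G31")

# Keep matching diagnoses recorded strictly after each person's index date
hits <- merge(toy_dx[toy_dx$icd3 %in% CODES, ], toy_cohort, by = "pnr")
hits <- hits[hits$date_contact > hits$index_date, ]

# Earliest qualifying contact per person
hits <- hits[order(hits$pnr, hits$date_contact), ]
first_hit <- hits[!duplicated(hits$pnr), c("pnr", "date_contact")]
names(first_hit)[2] <- "dementia_date"

# Left join back so people without an event keep a row with NA
toy_result <- merge(toy_cohort["pnr"], first_hit, by = "pnr", all.x = TRUE)
print(toy_result)
```

Here person A has one matching diagnosis before the index date, which is discarded, and two after it, so the earliest of those (2012-03-10) is kept. Person B has no matching code and person C's only match predates the index date, so both get NA.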
See also
get_lpr_diagnoses() above wraps the pattern from the general guide:
- Phase 9 - Hospital contacts (LPR): the explanation behind the two-phase strategy, LPR2/LPR3 and the D-prefix
- Phase 8 - Know your registers: which register contains what
- Register paths and datastores: confirmed paths on 708421