DARTER pitfalls

Quirks and known issues specific to project 708421

Published

June 6, 2026

This page supplements the general DST pitfalls with issues specific to the DARTER project.


1. Check that parquet files are up to date

As of 2026, most registers are updated through the end of 2024 (confirmed by Anders Aasted Isaksen/Marie Kempf Frydendahl, DARTER team). However, dodsaars covers only ~1970–2001, so deaths after 2001 are not captured by the current code.

# Check when the parquet folder was last updated:
file.info("E:/workdata/708421/cleaned-data/parquet-registers/dodsaars/")$mtime

The same may apply to other registers. Always confirm that coverage matches your study period before running the pipeline.
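The folder timestamp only shows when the files were written, not which dates they contain. A minimal sketch of a check on the date column itself follows, run here on a toy data frame. The helper `check_coverage()` and the study-period dates are hypothetical and not part of the DARTER code base, and the sketch assumes the same dplyr verbs also work on the lazy table returned by load_database().

```r
library(dplyr)

# Hypothetical helper: summarise the date range of a register and compare it
# with the study period.
check_coverage <- function(tbl, date_col, study_start, study_end) {
  tbl %>%
    summarise(
      first_date = min({{ date_col }}, na.rm = TRUE),
      last_date  = max({{ date_col }}, na.rm = TRUE)
    ) %>%
    collect() %>%                       # no-op for in-memory data frames
    mutate(
      covers_start = first_date <= study_start,
      covers_end   = last_date >= study_end
    )
}

# Toy stand-in for dodsaars (invented values)
toy_deaths <- tibble(
  pnr = c("a", "b", "c"),
  d_dodsdto = as.Date(c("1975-03-01", "1999-11-20", "2001-12-31"))
)

coverage <- check_coverage(
  toy_deaths, d_dodsdto,
  study_start = as.Date("1996-01-01"),   # assumed study period
  study_end   = as.Date("2024-12-31")
)
coverage
```

If covers_end is FALSE, the register stops before the end of the study period and events after last_date are missing.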

Warning

If the parquet file does not cover your study period, you need to extract the data from the raw SAS file on DST. Contact your data manager, who can help with raw data access and conversion.

# Current code: only catches deaths up to the end of the parquet file's coverage
death <- load_database("dodsaars") %>% rename_with(tolower)   # lazy connection
deaths <- death %>%
  filter(pnr %in% !!pnr_list) %>%   # only the cohort's pnr's
  select(pnr, d_dodsdto) %>%         # only death date
  collect()                           # fetch into R

Consequence of missing coverage: Comparators and BS patients who die after the parquet file’s end date are treated as alive. This affects censoring and matching in 01_build_cohorts.R.


2. Surgery and procedures

Procedure codes are split across two registers by period:

  • lpr_sksopr (parquet-registers): procedures and surgery 1996–2018, joined to lpr_adm via recnum
  • procedurer_kirurgi (parquet-external): 2019 and onwards, joined to lpr_a_kontakt via dw_ek_forloeb

Warning

dw_ek_kontakt is NA for all rows in procedurer_kirurgi (confirmed 2026-06-02). Use dw_ek_forloeb, not dw_ek_kontakt, to fetch pnr from lpr_a_kontakt.

# WRONG: dw_ek_kontakt is NA:
proc %>% left_join(contacts, by = "dw_ek_kontakt")   # joins nothing

# CORRECT: use dw_ek_forloeb:
proc <- load_database("procedurer_kirurgi") %>%
  rename_with(tolower) %>%
  left_join(
    load_database("lpr_a_kontakt") %>%
      rename_with(tolower) %>%
      select(dw_ek_forloeb, pnr),
    by = "dw_ek_forloeb"
  )
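Before relying on a join key, it can help to check how many of its values are missing and whether the join changes the row count. The sketch below does this on toy data frames standing in for procedurer_kirurgi and lpr_a_kontakt. The column `proc_code` and all values are invented, and the sketch assumes that one dw_ek_forloeb can have several contacts, which is why the lookup is reduced to distinct rows first. The NA-share step uses mean(is.na()), which is written for in-memory data and may need adapting for a lazy table.

```r
library(dplyr)

# Toy stand-ins; the real tables come from load_database()
proc <- tibble(
  dw_ek_forloeb = c(10, 10, 20),
  dw_ek_kontakt = c(NA_real_, NA_real_, NA_real_),
  proc_code     = c("code_1", "code_2", "code_1")   # hypothetical column
)
contacts <- tibble(
  dw_ek_forloeb = c(10, 10, 20),   # forloeb 10 has two contacts (assumption)
  dw_ek_kontakt = c(1, 2, 3),
  pnr           = c("a", "a", "b")
)

# 1. Share of missing values in each candidate key
key_na <- proc %>%
  summarise(
    na_kontakt = mean(is.na(dw_ek_kontakt)),
    na_forloeb = mean(is.na(dw_ek_forloeb))
  )

# 2. One row per forloeb before joining, so procedures are not multiplied
forloeb_pnr <- contacts %>% distinct(dw_ek_forloeb, pnr)

joined <- proc %>% left_join(forloeb_pnr, by = "dw_ek_forloeb")

# 3. The row count should be unchanged and no pnr should be missing
rows_unchanged <- nrow(joined) == nrow(proc)
n_missing_pnr  <- sum(is.na(joined$pnr))
```

A key with an NA share of 1 cannot match anything, and a row count that grows after the join points to a one-to-many lookup.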

3. lpr_a_diagnose: “a” does not mean A-type diagnoses

The table is called lpr_a_diagnose, where “a” refers to the analysis model designation (LPR_A series). It is not a filter on A-type diagnoses: the table contains types A, B and G, so you still need to filter on diag_kode_type.
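As an illustration, the filter on diag_kode_type could look like the sketch below, shown on a toy data frame. The column `diag_kode` and all values are invented for the example; only diag_kode_type and the types A, B and G come from the text above.

```r
library(dplyr)

# Toy stand-in for lpr_a_diagnose (invented rows)
toy_diag <- tibble(
  pnr            = c("a", "a", "b", "b"),
  diag_kode      = c("code_1", "code_2", "code_3", "code_1"),  # hypothetical column
  diag_kode_type = c("A", "B", "A", "G")
)

# Distribution of diagnosis types before filtering
type_counts <- toy_diag %>% count(diag_kode_type)

# Keep only A-type diagnoses
a_diag <- toy_diag %>% filter(diag_kode_type == "A")
```

Counting the types first makes it visible that the table holds more than A-type rows.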


4. nmi_count ≠ nmi_score

Variable     What it is
nmi_score    Weighted score: Nordic Multimorbidity Index (50 predictors with individual weights)
nmi_count    Simple count of the number of chronic conditions (33 possible)

If you use nmi_count in your Cox model instead of nmi_score, you are adjusting for an unweighted count of conditions rather than the weighted index, which is not the adjustment you may think you are making.
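The difference between a count and a weighted score can be seen in a small sketch. The conditions and weights below are invented for illustration and are not the Nordic Multimorbidity Index weights; `count_like` and `score_like` are hypothetical names that only mimic the roles of nmi_count and nmi_score.

```r
library(dplyr)

# Invented weights, for illustration only
weights <- tibble(
  condition = c("cond_1", "cond_2", "cond_3"),
  weight    = c(1, 5, 10)
)

# Two persons with the same number of conditions
conditions <- tibble(
  pnr       = c("a", "a", "b", "b"),
  condition = c("cond_1", "cond_2", "cond_2", "cond_3")
)

per_person <- conditions %>%
  left_join(weights, by = "condition") %>%
  group_by(pnr) %>%
  summarise(
    count_like = n(),          # plays the role of nmi_count
    score_like = sum(weight)   # plays the role of nmi_score
  )
per_person
```

Both persons have two conditions, yet their weighted sums differ, so the two variables rank them differently.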


5. LPR3: filter on lprindberetningssystem == "LPR3"

The LPR_A registers (lpr_a_kontakt, lpr_a_diagnose) contain data from two formats: the old LPR_F and the new LPR_A. Both exist in the project and cover overlapping periods. Without the filter you get duplicated rows (confirmed by Anders Aasted Isaksen, DARTER team 2026).

# CORRECT: filter to LPR_A format only:
lpr3_k <- load_database("lpr_a_kontakt") %>%
  rename_with(tolower) %>%
  filter(lprindberetningssystem == "LPR3")   # removes LPR_F overlap and duplicates
Warning

get_lpr_diagnoses() in darter/00_index.qmd is updated with this filter. If you have copies of LPR3 code in your own scripts, you must add it manually.
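To see whether a script is affected, the number of repeated keys can be compared before and after the filter. The sketch below uses a toy data frame. It assumes that dw_ek_kontakt should identify one row in lpr_a_kontakt, the value "OTHER" is a placeholder for whatever the non-LPR3 rows actually contain, and `count_dupes()` is a hypothetical helper written for in-memory data.

```r
library(dplyr)

# Toy stand-in for lpr_a_kontakt (invented rows)
toy_kontakt <- tibble(
  dw_ek_kontakt          = c(1, 1, 2, 3),
  pnr                    = c("a", "a", "b", "c"),
  lprindberetningssystem = c("LPR3", "OTHER", "LPR3", "LPR3")  # "OTHER" is a placeholder
)

# Hypothetical helper: number of keys that occur on more than one row
count_dupes <- function(tbl) {
  tbl %>%
    count(dw_ek_kontakt, name = "n_rows") %>%
    filter(n_rows > 1) %>%
    nrow()
}

dupes_before <- count_dupes(toy_kontakt)
dupes_after  <- toy_kontakt %>%
  filter(lprindberetningssystem == "LPR3") %>%
  count_dupes()
```

A non-zero count before the filter and zero after it is the pattern expected if the duplicates come from the overlapping formats.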


6. Laboratory results: use laboratorieproevesvar_

The new laboratory data register is called laboratorieproevesvar_ and contains >2.2 billion rows. The old lab_forsker / lab_dm_forsker still exists but covers the same data, so use only one source to avoid duplicates.

lab <- load_database("laboratorieproevesvar_") %>%
  rename_with(tolower) %>%
  filter(pnr %in% !!kohort$pnr) %>%   # filter BEFORE collect: the register is very large
  select(pnr, npu, samplingdato, samplevalue) %>%
  collect()
# samplevalue is character and can contain "not detected", "negative" etc.
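Because samplevalue is character, a numeric conversion needs to keep track of the text results instead of silently turning them into NA. A minimal sketch on toy data follows. The values, including "<0.1", are invented, and `value_num` and `non_numeric` are hypothetical column names.

```r
library(dplyr)

# Toy stand-in for the collected lab data (invented values)
toy_lab <- tibble(
  pnr         = c("a", "b", "c", "d"),
  samplevalue = c("5.4", "not detected", "negative", "<0.1")
)

lab_parsed <- toy_lab %>%
  mutate(
    # as.numeric() returns NA (with a warning) for text it cannot parse
    value_num   = suppressWarnings(as.numeric(samplevalue)),
    # flag rows where a value was present but not numeric
    non_numeric = is.na(value_num) & !is.na(samplevalue)
  )

# Inspect the unconverted text values before deciding how to handle them
text_values <- lab_parsed %>%
  filter(non_numeric) %>%
  count(samplevalue)
```

How to treat results such as "negative" or values below a detection limit is an analysis decision that this sketch leaves open.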
