DST pitfalls

9 errors that cost time and produce uninformative error messages

Published

June 6, 2026

This page collects the errors that most frequently catch new users of DST registers. What they have in common: the error messages are either confusing, or there is no error message at all and the result is just silently wrong.


1. dodsaars vs dodsaasg: use the correct death register

There are two registers with similar names:

| Register | Contains | Used for |
| --- | --- | --- |
| dodsaars | Individual death registrations with precise date of death (d_dodsdto) | Censoring at death |
| dodsaasg | Cause-of-death classification | Only for analysis of cause of death |

dodsaasg does not have the date of death in the correct format and is not the authoritative source for individual death dates.

Warning

Check dodsaars coverage in your project guide. dodsaars does not necessarily cover your entire study period: in project 708421 it covers only ~1970–2001 (as of June 2026), and post-2001 deaths require a separate extraction. Other projects may have different coverage.

# CORRECT: replace "path/to/dodsaars/" with your project's parquet path
# DARTER: load_database("dodsaars") %>% rename_with(tolower)
library(arrow)   # open_dataset()
library(dplyr)   # %>%, filter(), select(), collect()

# cohort_pnrs: character vector with the pnr values of your cohort
death <- open_dataset("path/to/dodsaars/") %>%
  rename_with(tolower)                       # check coverage in your project guide
death_person <- death %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, death_date = d_dodsdto) %>%   # d_dodsdto is the confirmed column
  collect()

# WRONG: do not use dodsaasg for censoring dates
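
The coverage warning above can be checked directly on the collected extract. The following is a minimal sketch on made-up data; the toy death_person table and the study_end date are assumptions for illustration, not values from any project.

```r
# Toy data standing in for a collected dodsaars extract (values are made up)
death_person <- data.frame(
  pnr        = c("001", "002", "003"),
  death_date = as.Date(c("1985-03-12", "1999-11-30", "2001-06-01"))
)

# Hypothetical end of the study period
study_end <- as.Date("2020-12-31")

# Earliest and latest death date actually present in the extract
coverage <- range(death_person$death_date)

# TRUE if the last registered death lies before the end of the study,
# which suggests that later deaths are missing from this extract
coverage_gap <- coverage[2] < study_end
if (coverage_gap) {
  message("Latest death in extract: ", coverage[2],
          "; study ends ", study_end,
          ". Check whether a separate extraction is needed.")
}
```

A gap flagged this way is only a hint: a small cohort may simply have no late deaths, so the project guide remains the authoritative source for coverage.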

2. RAM is shared: clean up after large extractions

You are on a shared server with shared RAM. When the memory bar in RStudio turns red, everyone on the server experiences slowdowns.

# Filter early: never collect() first
# DARTER: load_database("lmdb") %>% rename_with(tolower)
lmdb <- open_dataset("path/to/lmdb/") %>%
  rename_with(tolower)                  # lazy connection, no RAM used yet

result <- lmdb %>%
  filter(pnr %in% !!cohort_pnrs, substr(atc, 1, 4) == "N06D") %>%   # filter before collect
  select(pnr, atc, eksd) %>%
  collect()                                                            # only now is data moved to R

# Free large objects when you are done with them
rm(lmdb)   # delete the lazy connection; it does not use much, but it is good practice
gc()       # run garbage collection so freed memory can go back to the operating system
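
To see which objects are worth removing, base R's object.size() reports how much memory a single object occupies. A minimal sketch on a made-up data frame (the object name result and its contents are placeholders):

```r
# Toy object standing in for a large collected extraction (made-up data)
result <- data.frame(
  pnr   = sprintf("%06d", 1:100000),
  value = runif(100000)
)

# How much memory does this one object occupy?
size_bytes <- as.numeric(object.size(result))
print(format(object.size(result), units = "MB"))

# Drop it once the derived results are saved, then trigger garbage collection
rm(result)
invisible(gc())

# Confirm that the object is gone from the current environment
still_there <- exists("result", inherits = FALSE)
```

Checking sizes before calling rm() helps you avoid deleting small helper objects while the one large data frame stays in memory.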

3. rename_with(tolower) must be called on each register

Raw column names vary by register and year: PNR, pnr, Pnr, V_CPR. If you forget the call, filter(pnr %in% ...) fails with "Column pnr not found" even though the column is there under a different capitalisation. Note that tolower only normalises case; a differently named column such as V_CPR still has to be renamed explicitly.

The rule: every open_dataset() or load_database() call is followed immediately by %>% rename_with(tolower), as the first step in your pipe. See Extracting data step by step for explanation and example.
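
A minimal sketch of the rule on two made-up in-memory tables (the table names and values are hypothetical; real registers are opened lazily as shown elsewhere on this page):

```r
library(dplyr)

# Toy extracts with inconsistent raw column names (made-up data)
reg_2019 <- tibble(PNR = c("001", "002"), ATC = c("N06DA02", "C10AA01"))
reg_2020 <- tibble(Pnr = c("003"), atc = c("N06DX01"))

# Normalise the names first; afterwards the same code works on both tables
clean_2019 <- reg_2019 %>% rename_with(tolower)
clean_2020 <- reg_2020 %>% rename_with(tolower)

combined <- bind_rows(clean_2019, clean_2020) %>%
  filter(substr(atc, 1, 4) == "N06D")

names(combined)
```

Without the two rename_with(tolower) calls, bind_rows() would create separate PNR, Pnr, ATC and atc columns filled with NA values instead of stacking the rows.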


4. Date columns are not always in Date format

DST registers store dates in multiple formats. They look the same but behave differently.

| Format | Example | What class() returns | What to do |
| --- | --- | --- | --- |
| Date | 2020-05-15 | "Date" | Nothing; can be used directly |
| Character | "2020-05-15" | "character" | as.Date(column) |
| Datetime | "2020-05-15 14:32:00" | "POSIXct" "POSIXt" | as.Date(column) to get only the date part |
| SAS integer | 21990 | "numeric" | as.Date(column, origin = "1960-01-01") |

The rule: always check class() on a date column before using it in calculations.

class(lpr_a_kontakt$kont_starttidspunkt)   # "POSIXct" "POSIXt": datetime, not Date
# Fix:
lpr_a_kontakt <- lpr_a_kontakt %>%
  mutate(date = as.Date(kont_starttidspunkt))

class(bef$foed_dag)   # "Date": can be used directly
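
The conversions in the table can be tried on single values. This is a minimal base R sketch; the values are toy examples, and the time zone is set to UTC explicitly as an assumption so that the date part cannot shift.

```r
# One toy value per storage format from the table
d_chr <- "2020-05-15"                                    # character
d_dtm <- as.POSIXct("2020-05-15 14:32:00", tz = "UTC")   # datetime
d_sas <- 21990                                           # SAS integer: days since 1960-01-01

date_from_chr <- as.Date(d_chr)
date_from_dtm <- as.Date(d_dtm, tz = "UTC")              # keep only the date part
date_from_sas <- as.Date(d_sas, origin = "1960-01-01")

# All three are now of class "Date" and can be compared or subtracted
class(date_from_chr)
class(date_from_dtm)
class(date_from_sas)
print(date_from_sas)   # 21990 days after 1960-01-01 is 2020-03-16
```

Forgetting the origin argument for a SAS integer is the dangerous case: depending on the R version, as.Date(21990) either fails or counts from 1970-01-01 and silently returns a date ten years too late.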

5. BEF is a status snapshot, not a live register

BEF is a status register: it records the composition of the population at a given reference time, not continuously. DST's reference time is ultimo, the end of the period (typically 31 December for an annual snapshot). Since 2008, BEF has also been delivered quarterly (March, June, September, December).

Note

"aar == 2020 = 1 January 2020" is a project convention. In many projects BEF snapshots are renamed so that aar == 2020 refers to the population composition as of 1 January 2020, but this does not follow from DST's delivery naming. Confirm the convention in your project guide.

See DST's official BEF documentation: statistikdokumentation/befolkningen

This means that a person who dies in June 2020 still appears in the 2020 BEF snapshot.

# ERROR: do not use BEF to check "alive on a specific date"
bef_2020 <- bef %>%
  filter(aar == 2020)   # includes everyone in the 2020 snapshot,
                        # including those who die during 2020

# CORRECT: combine with dodsaars to exclude deaths
deaths <- open_dataset("path/to/dodsaars/") %>%   # DARTER: load_database("dodsaars")
  rename_with(tolower) %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, d_dodsdto) %>%
  collect()

# bef_data: your collected BEF extract; index_date: a Date, e.g. as.Date("2020-09-01")
bef_alive <- bef_data %>%
  left_join(deaths, by = "pnr") %>%
  filter(is.na(d_dodsdto) | d_dodsdto > index_date)   # alive at index date
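
The join logic can be verified on a toy example before running it on real data. In this sketch all three tables and the index date are made up for illustration:

```r
library(dplyr)

# Toy BEF snapshot with three people (made-up data)
bef_data <- tibble(pnr = c("001", "002", "003"))

# Toy death records: 002 dies before the index date, 003 after it
deaths <- tibble(
  pnr       = c("002", "003"),
  d_dodsdto = as.Date(c("2020-06-10", "2021-02-01"))
)

# Hypothetical index date
index_date <- as.Date("2020-09-01")

bef_alive <- bef_data %>%
  left_join(deaths, by = "pnr") %>%                   # no death record gives NA
  filter(is.na(d_dodsdto) | d_dodsdto > index_date)   # keep those alive at index date

bef_alive$pnr
```

Person 002 is in the snapshot but is dropped by the filter, which is exactly the case the snapshot alone cannot detect.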

6. The "a" in lpr_a_diagnose does not mean A-type diagnoses

The table is called lpr_a_diagnose. The "a" refers to "analysis model" (the LPR_A series introduced in 2025). It does not mean the table only contains A-type (action) diagnoses.

The table contains all diagnosis types: A (action), B (secondary diagnosis) and G (underlying condition). You still need to filter on diag_kode_type:

lpr_a_diagnose %>%
  filter(diag_kode_type %in% c("A", "B")) %>%   # still necessary
  ...
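
A quick way to convince yourself is to tabulate diag_kode_type before filtering. This is a minimal sketch on made-up rows; the column diag_kode and all values are assumptions for illustration.

```r
# Toy diagnosis table (made-up rows; diag_kode is a hypothetical column name)
lpr_a_diagnose <- data.frame(
  pnr            = c("001", "001", "002", "003"),
  diag_kode      = c("DI21", "DE11", "DJ44", "DC34"),
  diag_kode_type = c("A", "B", "G", "A")
)

# Which diagnosis types are actually present?
type_counts <- table(lpr_a_diagnose$diag_kode_type)
print(type_counts)

# Keep only action and secondary diagnoses
ab_only <- lpr_a_diagnose[lpr_a_diagnose$diag_kode_type %in% c("A", "B"), ]
```

If the table shows only "A" in your extract, the filter is harmless; if it shows other types, the filter decides which diagnoses enter your analysis.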

7. Categorical codes are not consistent across registers

The same variable can have different coding in different registers β€” different type (numeric vs. character), different values, or both.

In practice you extract demographic variables (sex, age) from BEF and rarely need to compare the same variable in another register. But if you do, always check with table() and class() before using the variable:

table(register_a$koen)   # what are the actual values and types?
class(register_a$koen)
table(register_b$koen)
class(register_b$koen)
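
If the check reveals a type mismatch, harmonise the variable before comparing or joining. The sketch below uses two made-up tables; the specific codings (numeric 1/2 versus character "1"/"2") are assumptions chosen for illustration, not documented DST codings.

```r
# Two toy registers that store the same variable with different types
register_a <- data.frame(pnr = c("001", "002"), koen = c(1, 2))       # numeric
register_b <- data.frame(pnr = c("001", "002"), koen = c("1", "2"))   # character

class(register_a$koen)
class(register_b$koen)

# The values look alike when printed, but they are not identical
same_before <- identical(register_a$koen, register_b$koen)

# Convert to a common type before comparing
register_a$koen <- as.character(register_a$koen)
same_after <- identical(register_a$koen, register_b$koen)
```

When the values themselves differ between registers, a type conversion is not enough and you need an explicit recoding based on each register's documentation.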

8. !! (bang-bang) forgotten in lazy evaluation

When filtering with a local R vector inside a DuckDB query, you must use !!. Without it, DuckDB looks for a column with that name and either fails silently or fails with a confusing message.

my_pnr_list <- c("001", "002", "003")   # local R vector

# WRONG: DuckDB looks for a column called "my_pnr_list"
bef %>% filter(pnr %in% my_pnr_list)   # error or wrong result

# CORRECT: !! tells DuckDB: "use the local R vector"
bef %>% filter(pnr %in% !!my_pnr_list)

Note

!! is necessary for all local R objects used inside filter(), mutate() etc. on lazy DuckDB connections. See Functions guide for full explanation.
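
The silent variant of this error is easiest to see when a local vector shares its name with a column. The sketch below runs on an in-memory tibble rather than a lazy DuckDB connection, which is an assumption made to keep it self-contained; the data are made up.

```r
library(dplyr)

# Toy BEF table (made-up data)
bef <- tibble(pnr = c("001", "002", "003", "004"))

# Local vector that happens to share its name with the column
pnr <- c("001", "003")

# Without !!, both sides refer to the column, so every row matches
masked <- bef %>% filter(pnr %in% pnr)

# With !!, the local vector is injected into the expression
injected <- bef %>% filter(pnr %in% !!pnr)

nrow(masked)
nrow(injected)
```

No error is raised in the first call; the result is simply the whole table, which is why a row count after every filter is a useful habit.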


9. nmi_count ≠ nmi_score

These two variables are not the same and are not interchangeable:

| Variable | What it is | Source |
| --- | --- | --- |
| nmi_score | Weighted comorbidity score: Nordic Multimorbidity Index (Kristensen et al., Clin Epidemiol 2022). 50 predictors with individual weights; for example, lung cancer counts 19 points and type 2 diabetes counts 2. | See NMI page |
| nmi_count | Simple count of the number of chronic conditions (out of 33 possible) a person has been diagnosed with | Calculated separately |

If you use nmi_count in your regression model instead of nmi_score, you are adjusting for something different than you think, and you get no error message.
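
The difference shows up even with two people. This toy sketch uses only the two weights quoted in the table (lung cancer 19, type 2 diabetes 2); it does not reproduce the full index, and treating both conditions as part of the count is an assumption for illustration.

```r
# The two weights quoted above; the real index has many more predictors
weights <- c(lung_cancer = 19, type2_diabetes = 2)

# Two hypothetical people, each with exactly one condition (1 = has it)
conditions <- data.frame(
  pnr            = c("001", "002"),
  lung_cancer    = c(1, 0),
  type2_diabetes = c(0, 1)
)

cond_matrix <- as.matrix(conditions[, names(weights)])

# Unweighted count of conditions versus weighted sum
conditions$nmi_count <- rowSums(cond_matrix)
conditions$nmi_score <- as.vector(cond_matrix %*% weights)

print(conditions[, c("pnr", "nmi_count", "nmi_score")])
```

Both people have a count of 1, but their weighted scores are 19 and 2, so a model adjusted for the count treats them as equally burdened.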


10. The most common error messages and what they mean

R's error messages are short and technical. Here are the ones you most often encounter in a DST workflow, translated into what they actually mean:

| Error message | Typical cause | Solution |
| --- | --- | --- |
| Error: Column 'pnr' not found | rename_with(tolower) is missing | Add %>% rename_with(tolower) immediately after load_database(); see pitfall 3 |
| Error: object 'my_list' not found | !! missing in filter() on a lazy connection | Write filter(pnr %in% !!my_list); see pitfall 8 |
| Error: could not find function "load_database" | library(dstDataPrep) missing (DARTER only) | Add library(dstDataPrep) at the top of the script; only relevant on DARTER/project 708421 |
| non-numeric argument to binary operator | Date column is character, not Date | mutate(date = as.Date(date)); see pitfall 4 |
| Error in filter.default(...) | Filtering on a lazy object without %>% | Switch to %>%; see the pipe |
| Error: Can't convert ... to ... | Join on columns of different type (e.g. numeric vs. character) | Use mutate(pnr = as.character(pnr)) to match types |
| object of type 'closure' is not subsettable | You are subsetting a name that refers to a function, typically because your own object with that name was never created (e.g. data$pnr when data is still R's built-in data() function) | Use a unique variable name; avoid data, df, c as object names |

Tip

The fastest debugging flow (what to do step by step when you see a red error message) is described in "Phase 7: Seeing a red error message?".
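
It can help to reproduce a message deliberately on toy values, so that you recognise it later. The sketch below triggers "non-numeric argument to binary operator" with two character dates and then applies the fix from pitfall 4; the dates are made up.

```r
# Dates stored as character, as they may arrive from a register
start_chr <- "2020-01-01"
end_chr   <- "2020-05-15"

# Subtracting character values raises the error; tryCatch() captures its text
msg <- tryCatch(
  end_chr - start_chr,
  error = function(e) conditionMessage(e)
)
print(msg)

# After conversion to Date the subtraction works and gives a number of days
followup_days <- as.numeric(as.Date(end_chr) - as.Date(start_chr))
print(followup_days)
```

Wrapping a suspicious line in tryCatch() like this is also a quick way to read an error message in isolation from the rest of a long pipeline.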

