DST pitfalls

9 errors that cost time and produce uninformative error messages

Published

June 6, 2026

This page collects the errors that most frequently catch new users of DST registers. What they have in common: the error messages are either confusing, or there is no error message at all — the result is just silently wrong.

1. `dodsaars` vs `dodsaasg` — use the correct death register

There are two registers with similar names:

Register	Contains	Used for
`dodsaars`	Individual death registrations with precise date of death (`d_dodsdto`)	Censoring at death
`dodsaasg`	Cause-of-death classification	Only for analysis of cause of death

dodsaasg does not have the date of death in the correct format and is not the authoritative source for individual death dates.

Warning

Check dodsaars coverage in your project guide. dodsaars does not necessarily cover your entire study period — in project 708421 it covers only ~1970–2001 (as of June 2026), and post-2001 deaths require a separate extraction. Other projects may have different coverage.

# CORRECT — replace "path/to/dodsaars/" with your project's parquet path
# DARTER: load_database("dodsaars") %>% rename_with(tolower)
death <- open_dataset("path/to/dodsaars/") %>%
  rename_with(tolower)                       # check coverage in your project guide
death_person <- death %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, death_date = d_dodsdto) %>%   # d_dodsdto is the confirmed column
  collect()

# WRONG — do not use dodsaasg for censoring dates
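Once the death dates are collected, censoring at death usually means taking the earlier of the death date and the end of follow-up. The following is a minimal sketch on toy data; the `pnr` values and the `study_end` date are made up for illustration and are not taken from any project guide.

```r
# Toy cohort: death dates as they might look after collecting from dodsaars
cohort <- data.frame(
  pnr        = c("001", "002", "003"),
  death_date = as.Date(c("1995-03-10", NA, "2003-07-01"))
)
study_end <- as.Date("2001-12-31")   # ASSUMPTION: hypothetical end of follow-up

# Follow-up ends at death or at study end, whichever comes first;
# na.rm = TRUE makes people without a death date run to study_end
cohort$end_date <- pmin(cohort$death_date, study_end, na.rm = TRUE)

# Event indicator: TRUE only if the death falls inside the follow-up window
cohort$died <- !is.na(cohort$death_date) & cohort$death_date <= study_end

cohort
```

Person 003 dies after `study_end`, so the sketch treats that person as censored at `study_end` rather than as an event.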

2. RAM is shared — clean up after large extractions

You are on a shared server with shared RAM. When the memory bar in RStudio turns red, everyone on the server experiences slowdowns.

# Filter early — never collect() first
# DARTER: load_database("lmdb") %>% rename_with(tolower)
lmdb <- open_dataset("path/to/lmdb/") %>%
  rename_with(tolower)                  # lazy connection — no RAM used yet

result <- lmdb %>%
  filter(pnr %in% !!cohort_pnrs, substr(atc, 1, 4) == "N06D") %>%   # filter before collect
  select(pnr, atc, eksd) %>%
  collect()                                                            # only now is data moved to R

# Free large objects when you are done with them
rm(lmdb)   # delete the lazy connection; it uses little memory, but it is good practice
gc()       # run garbage collection so freed memory can be returned to the operating system
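To decide what to remove, it can help to see which objects actually occupy memory. The helper below is a hypothetical sketch (the name `largest_objects` is not part of any package) built on base R's `object.size()`. It is only informative for collected data; a lazy connection reports a small size regardless of how large the underlying register is.

```r
# List the objects in an environment, largest first (sizes in MB)
largest_objects <- function(env = globalenv(), n = 5) {
  object_names <- ls(envir = env)
  if (length(object_names) == 0) return(numeric(0))
  sizes <- vapply(
    object_names,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  head(sort(sizes, decreasing = TRUE), n) / 1024^2
}

# Demo in a throwaway environment so the global workspace is untouched
demo_env <- new.env()
assign("big", numeric(1e6), envir = demo_env)   # one million doubles
assign("small", 1:10, envir = demo_env)
largest_objects(demo_env)
```

In a real session, calling `largest_objects()` with no arguments inspects the global environment.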

3. `rename_with(tolower)` must be called on each register

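Raw column names vary by register and year: PNR, pnr, Pnr, V_CPR. If you forget rename_with(tolower), filter(pnr %in% ...) fails with “Column pnr not found” even though the column is there under a different capitalization. Note that tolower only fixes case: a differently named column such as V_CPR still has to be renamed explicitly.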

The rule: every open_dataset() or load_database() call is followed immediately by %>% rename_with(tolower), as the first step in your pipe. See Extracting data step by step for explanation and example.
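A small guard can turn the confusing "column not found" message into an explicit one. The sketch below uses base R on an in-memory data frame; `require_columns` is a hypothetical helper name, and the toy columns are made up. On a lazy connection you would still use `rename_with(tolower)` as described above.

```r
# Hypothetical helper: lower-case all names, then stop early if a column is missing
require_columns <- function(df, required) {
  names(df) <- tolower(names(df))
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns after tolower(): ", paste(missing_cols, collapse = ", "))
  }
  df
}

raw   <- data.frame(PNR = c("001", "002"), Koen = c(1, 2))   # toy data with raw-style names
clean <- require_columns(raw, c("pnr", "koen"))
names(clean)
```

Asking for a column that does not exist, for example `require_columns(raw, "atc")`, stops with a message that names the missing column.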

4. Date columns are not always in Date format

DST registers store dates in multiple formats — they look the same but behave differently.

Format	Example	What `class()` returns	What to do
Date	`2020-05-15`	`"Date"`	Nothing — can be used directly
Character	`"2020-05-15"`	`"character"`	`as.Date(column)`
Datetime	`"2020-05-15 14:32:00"`	`"POSIXct" "POSIXt"`	`as.Date(column)` to get only the date part
SAS integer	`21990`	`"numeric"`	`as.Date(column, origin = "1960-01-01")`

The rule: always check class() on a date column before using it in calculations.

class(lpr_a_kontakt$kont_starttidspunkt)   # "POSIXct" "POSIXt": datetime, not Date
# Fix:
lpr_a_kontakt <- lpr_a_kontakt %>%
  mutate(date = as.Date(kont_starttidspunkt))

class(bef$foed_dag)   # "Date" — can be used directly
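The four cases in the table can be wrapped in one conversion function. This is a sketch under stated assumptions: `to_date` is a hypothetical helper name, and it assumes that any numeric input is a SAS day count with origin 1960-01-01, which will be wrong for numbers that mean something else (for example a year).

```r
# Hypothetical helper: convert the date representations from the table to Date
to_date <- function(x) {
  if (inherits(x, "Date"))    return(x)             # already a Date
  if (inherits(x, "POSIXct")) return(as.Date(x))    # drops the time part; result can depend on time zone
  if (is.character(x))        return(as.Date(x))    # expects "YYYY-MM-DD"
  if (is.numeric(x))          return(as.Date(x, origin = "1960-01-01"))  # ASSUMPTION: SAS day count
  stop("Unsupported class: ", paste(class(x), collapse = "/"))
}

to_date("2020-05-15")
to_date(as.POSIXct("2020-05-15 14:32:00", tz = "UTC"))
to_date(21990)   # SAS integer from the table: 2020-03-16
```

Checking `class()` first, as the rule above says, is still worthwhile: the helper only hides the differences, it does not tell you which format a register actually delivered.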

5. BEF is a status snapshot — not a live register

BEF is a status register: it records the composition of the population at a given reference time — not continuously. DST’s reference time is ultimo (typically 31 December for an annual snapshot). Since 2008, BEF is also delivered quarterly (March, June, September, December).

Note

**“`aar == 2020` = 1 January 2020” is a project convention.** In many projects BEF snapshots are renamed so that `aar == 2020` conventionally refers to the population composition as of 1 January 2020, but this does not follow from DST’s delivery naming. Confirm the convention in your project guide.

See DST’s official BEF documentation: statistikdokumentation/befolkningen

This means that a person who dies in June 2020 still appears in the 2020 BEF snapshot.

# WRONG: do not use BEF alone to check "alive on a specific date"
bef_2020 <- bef %>%
  filter(aar == 2020)   # includes everyone in the 2020 snapshot
                        # — including those who die during 2020

# CORRECT: combine with dodsaars to exclude deaths
deaths <- open_dataset("path/to/dodsaars/") %>%   # DARTER: load_database("dodsaars")
  rename_with(tolower) %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, d_dodsdto) %>%
  collect()

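# bef_data: your collected BEF extract; index_date: a Date (a single value or a column in bef_data)
bef_alive <- bef_data %>%
  left_join(deaths, by = "pnr") %>%
  filter(is.na(d_dodsdto) | d_dodsdto > index_date)   # alive at index date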

6. The “a” in `lpr_a_diagnose` does not mean A-type diagnoses

The table is called lpr_a_diagnose — the “a” refers to “analysis model” (the LPR_A series introduced in 2025). It does not mean the table only contains A-type (action) diagnoses.

The table contains all diagnosis types: A (action), B (secondary diagnosis) and G (underlying condition). You still need to filter on diag_kode_type:

lpr_a_diagnose %>%
  filter(diag_kode_type %in% c("A", "B")) %>%   # still necessary
  ...

7. Categorical codes are not consistent across registers

The same variable can have different coding in different registers — different type (numeric vs. character), different values, or both.

In practice you extract demographic variables (sex, age) from BEF and rarely need to compare the same variable in another register. But if you do, always check with table() and class() before using the variable:

table(register_a$koen)   # what are the actual values and types?
class(register_a$koen)
table(register_b$koen)
class(register_b$koen)
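If the check shows that two registers code the same variable differently, one option is to map one coding onto the other through an explicit lookup before comparing or joining. The sketch below uses toy data; the codings (1/2 versus "M"/"K") and the lookup are hypothetical and must be replaced by what `table()` and `class()` show in your own registers.

```r
# Toy registers with HYPOTHETICAL codings of the same variable
register_a <- data.frame(pnr = c("001", "002"), koen = c(1, 2))       # numeric
register_b <- data.frame(pnr = c("001", "002"), koen = c("M", "K"))   # character

# Hypothetical lookup from register_b's letters to register_a's numbers
koen_lookup <- c(M = 1, K = 2)
register_b$koen_num <- unname(koen_lookup[register_b$koen])

# Any value missing from the lookup becomes NA, so fail loudly instead of silently
stopifnot(!anyNA(register_b$koen_num))

merged <- merge(register_a, register_b, by = "pnr", suffixes = c("_a", "_b"))
merged$agree <- merged$koen_a == merged$koen_num
merged
```

Keeping the lookup as a named vector makes the assumed mapping visible in the script, which is easier to review than a chain of recodes.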

8. `!!` (bang-bang) forgotten in lazy evaluation

When filtering with a local R vector inside a DuckDB query, you must use !!. Without it, DuckDB looks for a column with that name — and fails silently or with a confusing message.

my_pnr_list <- c("001", "002", "003")   # local R vector

# WRONG — DuckDB looks for a column called "my_pnr_list"
bef %>% filter(pnr %in% my_pnr_list)   # error or wrong result

# CORRECT — !! tells DuckDB: "use the local R vector"
bef %>% filter(pnr %in% !!my_pnr_list)

Note

!! is necessary for all local R objects used inside filter(), mutate() etc. on lazy DuckDB connections. See Functions guide for full explanation.
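The effect of `!!` is easiest to see when a local vector happens to share its name with a column. The sketch below runs on a small in-memory tibble, not on a lazy connection, so it only illustrates the name-resolution rule in dplyr itself; `bef_toy` and its values are made up, and a lazy backend may report the problem differently.

```r
library(dplyr)

bef_toy <- tibble(pnr = c("001", "002", "003"), koen = c(1, 2, 1))   # toy stand-in for BEF
pnr <- c("001")   # local vector that shares its name with a column

# Without !!, both sides refer to the column, so every row matches itself
n_without <- nrow(filter(bef_toy, pnr %in% pnr))

# With !!, the local vector is injected before the filter is evaluated
n_with <- nrow(filter(bef_toy, pnr %in% !!pnr))

c(without = n_without, with = n_with)   # 3 rows versus 1 row
```

The first call returns the whole table without any warning, which is the kind of silently wrong result this page is about.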

9. `nmi_count` ≠ `nmi_score`

These two variables are not the same and are not interchangeable:

Variable	What it is	Source
`nmi_score`	Weighted comorbidity score: the Nordic Multimorbidity Index (Kristensen et al., Clin Epidemiol 2022). It uses 50 predictors with individual weights; for example, lung cancer counts 19 points and type 2 diabetes counts 2.	See NMI page
`nmi_count`	Simple count of the number of chronic conditions (out of 33 possible) a person has been diagnosed with	Calculated separately

If you use nmi_count in your regression model instead of nmi_score, you are adjusting for something different than you think — and you get no error message.
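A toy calculation shows why the two cannot be swapped. The sketch uses only the two weights quoted in the table (lung cancer 19, type 2 diabetes 2) and two invented persons; it is not the real NMI, which uses 50 predictors, and it ignores that the count is based on a different list of conditions.

```r
# The two example weights from the table; all other predictors are omitted
weights <- c(lung_cancer = 19, type2_diabetes = 2)

# Toy indicator matrix: one row per person, 1 = has the condition
conditions <- rbind(
  person_a = c(lung_cancer = 1, type2_diabetes = 0),
  person_b = c(lung_cancer = 0, type2_diabetes = 1)
)

toy_count <- rowSums(conditions)              # number of conditions per person
toy_score <- drop(conditions %*% weights)     # weighted sum per person

data.frame(toy_count, toy_score)
```

Both persons have a count of 1, but their weighted scores are 19 and 2, so a model adjusted for the count treats them as identical while a model adjusted for the score does not.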

10. The most common error messages and what they mean

R’s error messages are short and technical — here are the ones you most often encounter in a DST workflow, translated into what they actually mean:

Error message	Typical cause	Solution
`Error: Column 'pnr' not found`	`rename_with(tolower)` is missing	Add `%>% rename_with(tolower)` immediately after `load_database()` — see pitfall 3
`Error: object 'my_list' not found`	`!!` missing in `filter()` on a lazy connection	Write `filter(pnr %in% !!my_list)` — see pitfall 8
`Error: could not find function "load_database"`	`library(dstDataPrep)` missing (DARTER only)	Add `library(dstDataPrep)` at the top of the script — only relevant on DARTER/project 708421
`non-numeric argument to binary operator`	Date column is `character`, not `Date`	`mutate(date = as.Date(date))` — see pitfall 4
`Error in filter.default(...)`	Filtering on a lazy object without `%>%`	Switch to `%>%` — see the pipe
`Error: Can't convert ... to ...`	Join on columns of different type (e.g. numeric vs. character)	Use `mutate(pnr = as.character(pnr))` to match types
`object of type 'closure' is not subsettable`	A variable name overwrites a function (e.g. `data <- ...`)	Use a unique variable name — avoid `data`, `df`, `c` as object names
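The last row is the hardest to recognize, so here is a small reproduction. It is a sketch that assumes a fresh session where no object called `data` has been created: R then finds the built-in function `data()` instead, and subsetting a function is what triggers the message. The name `cohort_data` is made up.

```r
# With no object named `data` in the workspace, `data` is the function utils::data,
# and `$` cannot be applied to a function
result <- tryCatch(data$pnr, error = function(e) conditionMessage(e))
result

# A distinctive object name cannot be confused with a function
cohort_data <- data.frame(pnr = c("001", "002"))
cohort_data$pnr
```

The same thing typically happens when the line that was supposed to create the object failed or was never run, so the name still points at the function.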

Tip

The fastest debugging flow — what to do step by step when you see a red error message — is described in Phase 7 — Seeing a red error message?.
