DST pitfalls
9 errors that cost time and produce uninformative error messages
This page collects the errors that most frequently catch new users of DST registers. What they have in common: the error messages are either confusing, or there is no error message at all and the result is just silently wrong.
1. dodsaars vs dodsaasg: use the correct death register
There are two registers with similar names:
| Register | Contains | Used for |
|---|---|---|
| dodsaars | Individual death registrations with precise date of death (d_dodsdto) | Censoring at death |
| dodsaasg | Cause-of-death classification | Only for analysis of cause of death |
dodsaasg does not have the date of death in the correct format and is not the authoritative source for individual death dates.
Check dodsaars coverage in your project guide. dodsaars does not necessarily cover your entire study period: in project 708421 it covers only ~1970-2001 (as of June 2026), and post-2001 deaths require a separate extraction. Other projects may have different coverage.
library(arrow) # provides open_dataset()
library(dplyr)
# CORRECT: replace "path/to/dodsaars/" with your project's parquet path
# DARTER: load_database("dodsaars") %>% rename_with(tolower)
death <- open_dataset("path/to/dodsaars/") %>%
  rename_with(tolower) # check coverage in your project guide
death_person <- death %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, death_date = d_dodsdto) %>% # d_dodsdto is the confirmed column
  collect()
# WRONG: do not use dodsaasg for censoring dates
3. rename_with(tolower) must be called on each register
Raw column names vary by register and year: PNR, pnr, Pnr, V_CPR. If you forget rename_with(tolower), filter(pnr %in% ...) fails with "Column pnr not found", even though the column is there under a different capitalisation.
The rule: every open_dataset() or load_database() call is followed immediately by %>% rename_with(tolower), as the first step in your pipe. See Extracting data step by step for explanation and example.
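The effect of the rule can be tried on in-memory data. The sketch below uses two made-up tibbles (reg_a and reg_b are hypothetical stand-ins, not real register extracts) whose ID column is capitalised differently. Note that tolower only normalises case: a column with a different name altogether, such as V_CPR, would still need an explicit rename().

```r
library(dplyr)

# Toy stand-ins for two registers with differently capitalised columns (made-up data)
reg_a <- tibble(PNR = c("001", "002"), KOEN = c(1, 2))
reg_b <- tibble(Pnr = c("002", "003"), Aar = c(2020, 2020))

# Lower-case all column names as the first step after loading
reg_a_clean <- reg_a %>% rename_with(tolower)
reg_b_clean <- reg_b %>% rename_with(tolower)

names(reg_a_clean)
names(reg_b_clean)

# Both tables now expose "pnr", so filters and joins can rely on one name
matched <- inner_join(reg_a_clean, reg_b_clean, by = "pnr")
matched
```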
4. Date columns are not always in Date format
DST registers store dates in multiple formats. They look the same but behave differently.
| Format | Example | What class() returns | What to do |
|---|---|---|---|
| Date | 2020-05-15 | "Date" | Nothing; can be used directly |
| Character | "2020-05-15" | "character" | as.Date(column) |
| Datetime | "2020-05-15 14:32:00" | "POSIXct" | as.Date(column) to get only the date part |
| SAS integer | 21990 | "numeric" | as.Date(column, origin = "1960-01-01") |
The rule: always check class() on a date column before using it in calculations.
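The four cases in the table can be folded into one small helper. This is a minimal sketch, not part of any package mentioned on this page: to_date() is a hypothetical name, numeric input is assumed to be a SAS date (days since 1960-01-01), and datetimes are assumed to be stored in UTC.

```r
# Hypothetical helper: normalise a date-like vector to class "Date"
to_date <- function(x) {
  if (inherits(x, "Date")) {
    x                                  # already a Date, nothing to do
  } else if (inherits(x, "POSIXct")) {
    as.Date(x, tz = "UTC")             # drop the time part; assumes UTC timestamps
  } else if (is.character(x)) {
    as.Date(x)                         # expects ISO format "YYYY-MM-DD"
  } else if (is.numeric(x)) {
    as.Date(x, origin = "1960-01-01")  # assumes a SAS date (days since 1960-01-01)
  } else {
    stop("Unsupported date class: ", class(x)[1])
  }
}

# Each call returns an object of class "Date"
to_date("2020-05-15")
to_date(as.POSIXct("2020-05-15 14:32:00", tz = "UTC"))
to_date(21990)
```

If your datetimes are stored in local time, pass the matching time zone instead of "UTC"; otherwise timestamps near midnight can land on the wrong date.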
class(lpr_a_kontakt$kont_starttidspunkt) # "POSIXct": datetime, not Date
# Fix:
lpr_a_kontakt <- lpr_a_kontakt %>%
  mutate(date = as.Date(kont_starttidspunkt))
class(bef$foed_dag) # "Date": can be used directly
5. BEF is a status snapshot, not a live register
BEF is a status register: it records the composition of the population at a given reference time, not continuously. DST's reference time is ultimo (typically 31 December for an annual snapshot). Since 2008, BEF is also delivered quarterly (March, June, September, December).
**"aar == 2020 = 1 January 2020" is a project convention.** In many projects BEF snapshots are renamed so that `aar == 2020` conventionally refers to the population composition as of 1 January 2020, but this does not follow from DST's delivery naming. Confirm the convention in your project guide.
See DST's official BEF documentation: statistikdokumentation/befolkningen
Under the 1 January convention, a person who dies in June 2020 still appears in the 2020 BEF snapshot.
# ERROR: do not use BEF to check "alive on a specific date"
bef_2020 <- bef %>%
  filter(aar == 2020) # includes everyone in the 2020 snapshot,
                      # including those who die during 2020
# CORRECT: combine with dodsaars to exclude deaths
deaths <- open_dataset("path/to/dodsaars/") %>% # DARTER: load_database("dodsaars")
  rename_with(tolower) %>%
  filter(pnr %in% !!cohort_pnrs) %>%
  select(pnr, d_dodsdto) %>%
  collect()
# bef_data: your collected BEF extract; index_date: a Date you define yourself
bef_alive <- bef_data %>%
  left_join(deaths, by = "pnr") %>%
  filter(is.na(d_dodsdto) | d_dodsdto > index_date) # alive at index date
6. The "a" in lpr_a_diagnose does not mean A-type diagnoses
The table is called lpr_a_diagnose. The "a" refers to "analysis model" (the LPR_A series introduced in 2025). It does not mean the table only contains A-type (action) diagnoses.
The table contains all diagnosis types: A (action), B (secondary diagnosis) and G (underlying condition). You still need to filter on diag_kode_type:
lpr_a_diagnose %>%
  filter(diag_kode_type %in% c("A", "B")) %>% # still necessary
  ...
7. Categorical codes are not consistent across registers
The same variable can have different coding in different registers: different type (numeric vs. character), different values, or both.
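To make the problem concrete, here is a sketch with invented data. The two tibbles and the mapping 1 = "M", 2 = "K" are assumptions for illustration only and do not describe the coding in any actual DST register; look up the real code lists before recoding.

```r
library(dplyr)

# Made-up extracts: the same variable stored with different type and values
register_a <- tibble(pnr = c("001", "002", "003"), koen = c(1, 2, 1))
register_b <- tibble(pnr = c("001", "002", "003"), koen = c("M", "K", "M"))

class(register_a$koen)  # numeric in this toy example
class(register_b$koen)  # character in this toy example

# Harmonise register_a to register_b's coding (assumed mapping, toy example only)
register_a_harmonised <- register_a %>%
  mutate(koen = case_when(
    koen == 1 ~ "M",
    koen == 2 ~ "K"
  ))

# After recoding, the two columns can be compared directly
identical(register_a_harmonised$koen, register_b$koen)
```

Any value not covered by the mapping becomes NA in case_when(), which makes unexpected codes visible in a follow-up table() call instead of passing through unnoticed.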
In practice you extract demographic variables (sex, age) from BEF and rarely need to compare the same variable in another register. But if you do, always check with table() and class() before using the variable:
table(register_a$koen) # what are the actual values and types?
class(register_a$koen)
table(register_b$koen)
class(register_b$koen)
8. !! (bang-bang) forgotten in lazy evaluation
When filtering with a local R vector inside a DuckDB query, you must use !!. Without it, DuckDB looks for a column with that name and either fails silently or fails with a confusing message.
my_pnr_list <- c("001", "002", "003") # local R vector
# WRONG: DuckDB looks for a column called "my_pnr_list"
bef %>% filter(pnr %in% my_pnr_list) # error or wrong result
# CORRECT: !! means "use the local R vector"
bef %>% filter(pnr %in% !!my_pnr_list)
!! is necessary for all local R objects used inside filter(), mutate() etc. on lazy DuckDB connections. See Functions guide for full explanation.
9. nmi_count ≠ nmi_score
These two variables are not the same and are not interchangeable:
| Variable | What it is | Source |
|---|---|---|
| nmi_score | Weighted comorbidity score: the Nordic Multimorbidity Index (Kristensen et al., Clin Epidemiol 2022). 50 predictors with individual weights; for example, lung cancer counts 19 points and type 2 diabetes counts 2. | See NMI page |
| nmi_count | Simple count of the number of chronic conditions (out of 33 possible) a person has been diagnosed with | Calculated separately |
If you use nmi_count in your regression model instead of nmi_score, you are adjusting for something different than you think, and you get no error message.
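The difference is easy to see on a toy example. The sketch below is not an implementation of the Nordic Multimorbidity Index: it uses only the two weights quoted in the table above, two invented persons, and hypothetical column names (toy_count, toy_score) to show how a plain count and a weighted sum diverge.

```r
library(dplyr)

# Invented data: one condition per person
conditions <- tibble(
  pnr       = c("001", "002"),
  condition = c("lung_cancer", "type2_diabetes")
)

# Only the two weights mentioned on this page; the real index has 50 predictors
weights <- tibble(
  condition = c("lung_cancer", "type2_diabetes"),
  weight    = c(19, 2)
)

toy <- conditions %>%
  left_join(weights, by = "condition") %>%
  group_by(pnr) %>%
  summarise(
    toy_count = n(),          # number of conditions
    toy_score = sum(weight)   # weighted sum of conditions
  )

toy
```

Both persons have the same count (one condition each), but their weighted scores differ by a factor of almost ten, which is why the two variables cannot stand in for each other in a model.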
10. The most common error messages and what they mean
R's error messages are short and technical. Here are the ones you most often encounter in a DST workflow, translated into what they actually mean:
| Error message | Typical cause | Solution |
|---|---|---|
| Error: Column 'pnr' not found | rename_with(tolower) is missing | Add %>% rename_with(tolower) immediately after load_database(); see pitfall 3 |
| Error: object 'my_list' not found | !! missing in filter() on a lazy connection | Write filter(pnr %in% !!my_list); see pitfall 8 |
| Error: could not find function "load_database" | library(dstDataPrep) missing (DARTER only) | Add library(dstDataPrep) at the top of the script; only relevant on DARTER/project 708421 |
| non-numeric argument to binary operator | Date column is character, not Date | mutate(date = as.Date(date)); see pitfall 4 |
| Error in filter.default(...) | Filtering on a lazy object without %>% | Switch to %>%; see the pipe |
| Error: Can't convert ... to ... | Join on columns of different type (e.g. numeric vs. character) | Use mutate(pnr = as.character(pnr)) to match types |
| object of type 'closure' is not subsettable | An object name that is also a function (e.g. data, df) was used before it was assigned, so R found the function instead | Use a unique variable name; avoid data, df, c as object names |
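Two of the rows above can be reproduced safely on toy data, which helps in recognising them later. The sketch below uses invented values and hypothetical object names (cohort, events); tryCatch() captures each error message instead of stopping the script. The exact wording of the join error depends on your dplyr version.

```r
library(dplyr)

# 1) Character "dates" cannot be subtracted
date_msg <- tryCatch(
  "2020-05-15" - "2020-01-01",
  error = function(e) conditionMessage(e)
)
date_msg

# Fix: convert to Date first, then subtract
days_between <- as.numeric(as.Date("2020-05-15") - as.Date("2020-01-01"))

# 2) Joining on columns of different type (character vs. numeric)
cohort <- tibble(pnr = c("1", "2"))
events <- tibble(pnr = c(1, 2), n_contacts = c(5L, 7L))

join_msg <- tryCatch(
  left_join(cohort, events, by = "pnr"),
  error = function(e) conditionMessage(e)
)
join_msg

# Fix: align the types before joining
joined <- cohort %>%
  left_join(events %>% mutate(pnr = as.character(pnr)), by = "pnr")
```

One caveat on the second fix: if an ID was read as numeric, leading zeros are already lost, and as.character() will not restore them. In that case, fix the type where the data is read in.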
The fastest debugging flow (what to do step by step when you see a red error message) is described in Phase 7: Seeing a red error message?.