Format tables

Translate codes to text with DST’s SAS format files

Published

June 6, 2026

Note

Prerequisite: This page uses left_join() to attach format tables to your data. If you are not familiar with joins yet, read Phase 11 — Joins and pivots first.

Many variables in DST registers are stored as codes — municipality numbers, education codes, employment codes. To give them meaningful labels you use DST’s format tables: SAS files on the server that translate codes to text.

A format table always has at least two columns:

Column Contents Example
START The raw code from your register 101 (municipality number)
Label column The corresponding text label "København"

You join the format table onto your dataset with START as the key — exactly like a dictionary lookup.

Note

Rows in your dataset without a match in the format table get NA in the label column — no error message. Always check with sum(is.na(data$label_column)) after the join.


Where are the format tables?

E:/Formater/SAS formater i Danmarks Statistik/SAS_datasaet/

The Star (*) on the Windows desktop inside the DST server opens an HTML guide for finding the right table. PDF guides to SAS formats are also here:

E:/Formater/SAS formater i Danmarks Statistik/Vejledning mv/

Subfolders are organised by topic — including:

Geokoder/                ← municipalities, regions
Times_personstatistik/   ← employment (socio13)
Disced/                  ← education (hfaudd)
Brancher/                ← industries
Sundhed/                 ← health
Uddannelser/             ← education
... and more
Warning

Find the specific filenames in the folder yourself. The base path and subfolder structure above are confirmed on DST. But the specific filenames and column names further down the page (e.g. c_kom_v4_t.sas7bdat, KOM_V4_T) are illustrative examples — naming varies, and you must find the right file in the relevant subfolder yourself. Use the Star guide / PDF guide to find the file, and names()/head() to see the actual column names.


Load a format table

Format tables are SAS files and are loaded with haven::read_sas():

library(haven)   # read_sas — loading SAS files

mun_table <- read_sas(   # fetch format table as data frame
  "E:/Formater/SAS formater i Danmarks Statistik/SAS_datasaet/Geokoder/c_kom_v4_t.sas7bdat"
)
Note

Format tables are loaded directly into R’s memory (not lazy evaluation). You must collect() DuckDB data before joining with them.


The three tables you will use most often

Municipality names

mun_table <- read_sas(
  "E:/Formater/SAS formater i Danmarks Statistik/SAS_datasaet/Geokoder/c_kom_v4_t.sas7bdat"
)
# Columns: START (municipality number), KOM_V4_T (municipality name)
# Use for years after the 2007 municipal reform (V4)

bef_with_mun <- bef_data %>%
  collect() %>%                                             # collect BEFORE joining with SAS file
  left_join(mun_table, by = c("bopkom" = "START"))          # attach municipality name via bopkom code

Employment status (socio13)

To get the raw code labels from DST’s format table:

socio13_table <- read_sas(   # fetch label table from format folder
  "E:/Formater/SAS formater i Danmarks Statistik/SAS_datasaet/Times_personstatistik/n_socio13_kt.sas7bdat"
)
# Columns: START (socio13 code), N_SOCIO13_KT (label)

akm_with_labels <- akm_data %>%
  left_join(socio13_table, by = c("socio13" = "START"))   # attach label per code

To use the SEPLINE categorisation directly instead (recommended for analysis):

library(dplyr)   # mutate, case_when

akm_categorised <- akm_data %>%
  mutate(occupation_cat = case_when(             # SEPLINE categories
    socio13 %in% c(110:114, 120, 131:135, 139) ~ "Employed",
    socio13 == 310                              ~ "Student",
    socio13 %in% c(210, 410)                   ~ "Unemployed",
    socio13 %in% c(220, 321, 330)              ~ "Outside labour market",
    socio13 %in% c(322, 323)                   ~ "Retired",
    TRUE                                        ~ "Unknown"   # unknown code or NA
  ))

Education level (hfaudd)

# Duration of education in months (standard duration + entry requirements):
udda_table <- read_sas(
  "E:/Formater/SAS formater i Danmarks Statistik/SAS_datasaet/Disced/n_audd_pria_l1lx_k.sas7bdat"
)
# Columns: START (hfaudd code), AUDD_PRIA_L1LX_K (duration in months)

# Categorisation per SEPLINE guidelines:
udda_categorised <- udda_data %>%
  left_join(udda_table, by = c("hfaudd" = "START")) %>%
  mutate(education_cat = case_when(
    substr(as.character(hfaudd), 1, 2) %in% c("10", "15") ~ "Short",
    substr(as.character(hfaudd), 1, 2) %in% c("20", "30", "35") ~ "Medium",
    as.numeric(substr(as.character(hfaudd), 1, 2)) >= 40  ~ "Long",
    TRUE ~ "Unknown"
  ))

Tips for finding the right table

  1. Open the Star on the desktop or navigate to E:/Formater/... in File Explorer
  2. Look at the suffix:
    • _T = text label only
    • _K = code only
    • _KT = both code and text
  3. Load the table and look at the columns with names() and head()
names(mun_table)   # see what the columns are called — find START and the label column
head(mun_table)    # see the first rows and understand the structure

See also

Back to top