Introduction

This article describes how to create derived datasets from the raw data (eCRFs) of the ADNI study. These standardized datasets serve as input to the pharmaverse workflow, which requires standardized data in order to produce analysis-ready datasets. In the ADNIMERGE2 R data package, the following derived datasets are created for illustration purposes; each dataset is described in more detail in its corresponding subsection.

  • Demographic (DM)
  • Subject Characteristics (SC)
  • Adverse Events (AE)
  • Questionnaires (QS)
  • Clinical Classification (RS)
  • Nervous System Finding (NV)
  • Laboratory Test Results (LB)
  • Genomics Findings (GF)
  • Vital Sign (VS)

NOTE:

  • The ORIGPROT variable is included in the DM dataset to identify the study protocol/phase in which a subject first enrolled in the ADNI study.

  • The COLPROT or --GRPID variables are also included in some of the derived datasets to identify the study protocol/phase in which the data were collected.

  • The VISITNUM variable is included only to define the study epoch across the study phases and is mainly used for merging and sorting data. Using VISITNUM in analyses of these derived datasets is not recommended. The Data Preparation section below explains how VISITNUM is created.

  • These derived datasets may not fully comply with the CDISC SDTM standard.

Load Required R Packages

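# ADNI study R data package
library(ADNIMERGE2)
# Packages whose functions are called in the code below
library(tidyverse); library(assertr); library(sdtm.oak)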

Data Preparation

In the study, subjects are categorized into two groups:

  • New Subject: Subjects that did not participate/enroll in any of the previous ADNI study phases prior to a given study phase.

  • Rollover Subject: Subjects that participated/enrolled at least in one of the previous study phases prior to a given study phase.
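
The two groups can be illustrated with a minimal base-R sketch. It assumes that a subject counts as new in the phase where the data-collection protocol (COLPROT) equals the first-enrollment protocol (ORIGPROT) and as a rollover otherwise. The helper name `study_track_sketch` and the toy values are hypothetical; the package's own `adni_study_track` function may be implemented differently.

```r
# Hypothetical helper: label each record as "New" or "Rollover".
# Assumption: COLPROT == ORIGPROT marks a subject's first study phase.
study_track_sketch <- function(colprot, origprot) {
  ifelse(
    is.na(colprot) | is.na(origprot),
    NA_character_,
    ifelse(colprot == origprot, "New", "Rollover")
  )
}

# Toy records (not ADNI data): subject 1 enrolled in ADNI1 and rolled over
# to ADNI2; subject 2 enrolled in ADNI3.
toy <- data.frame(
  RID = c(1, 1, 2),
  ORIGPROT = c("ADNI1", "ADNI1", "ADNI3"),
  COLPROT = c("ADNI1", "ADNI2", "ADNI3"),
  stringsAsFactors = FALSE
)
toy$PTTYPE <- study_track_sketch(toy$COLPROT, toy$ORIGPROT)
print(toy)
```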

Study Visit and Visit Number

Some data wrangling was performed before creating the derived datasets. First, the study epoch (EPOCH) with its corresponding study visit number (VISITNUM) was created and stored as EPOCH_LIST_LONG. Next, subject-specific study visits with their corresponding study epochs were created from the REGISTRY and ROSTER eCRFs together with the EPOCH_LIST_LONG data, and the result was stored as ADNI_VISIT_RECORD. Rollover subjects may not have sequential VISITNUM values because the study design allows them to enroll in the next study phase before completing the current one. VISITNUM should therefore be used only for merging and sorting data, together with the corresponding study visit date or form completion date.
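
As a small illustration of why VISITNUM alone is not a reliable time axis, the sketch below orders toy visit records by visit date and uses VISITNUM only to break ties. All values are made up for the example and do not reflect the actual VISITNUM coding in EPOCH_LIST_LONG.

```r
# Toy visit records for one rollover subject (illustrative values only).
visits <- data.frame(
  RID = c(7, 7, 7),
  COLPROT = c("ADNI3", "ADNI2", "ADNI2"),
  VISITNUM = c(101, 60, 72),
  VISDATE = as.Date(c("2017-02-01", "2016-02-10", "2017-02-20")),
  stringsAsFactors = FALSE
)

# Order within subject by visit date; VISITNUM only breaks ties.
visits_sorted <- visits[order(visits$RID, visits$VISDATE, visits$VISITNUM), ]
rownames(visits_sorted) <- NULL
print(visits_sorted)
```

In this toy case the subject has an ADNI3 visit dated before the last ADNI2 visit, so the chronological order of VISITNUM is not increasing.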

For more information, please refer to the source script of the package vignette: vignette(topic = 'ADNIMERGE2-Derived-Data', package = 'ADNIMERGE2').

Assessment Completion Date and Status

The study visit date from the REGISTRY table (eCRF) is used when an assessment has a known result but its assessment-specific completion date is missing. Any assessment with a missing or unknown result is considered not completed/done.
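
The rule can be sketched in base R as follows. The column names RESULT, ASSESS_DATE, DTC and STAT are hypothetical stand-ins chosen for this example, and the values are made up; the package utilities may apply the rule differently.

```r
# Toy assessment records; VISDATE stands in for the REGISTRY visit date.
assess <- data.frame(
  RID = c(1, 2, 3),
  RESULT = c(12, 30, NA),
  ASSESS_DATE = as.Date(c("2020-01-15", NA, NA)),
  VISDATE = as.Date(c("2020-01-14", "2020-02-03", "2020-03-09"))
)

# Fall back to the visit date only when a result exists
# but the assessment-specific date is missing.
use_visit <- is.na(assess$ASSESS_DATE) & !is.na(assess$RESULT)
assess$DTC <- assess$ASSESS_DATE
assess$DTC[use_visit] <- assess$VISDATE[use_visit]

# Missing or unknown results are treated as not done.
assess$STAT <- ifelse(is.na(assess$RESULT), "NOT DONE", NA_character_)
print(assess)
```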

Baseline Flag

A baseline flag is included in some of the derived datasets to identify records that are closest to the subject's enrollment date or that were collected at the baseline visit. A record is flagged as baseline if one of the following criteria is met:

  • The record was collected at a baseline visit within 30 days of the enrollment date.

  • The record was collected prior to the baseline visit and is closest to the enrollment date, if the baseline record value is missing or was not collected.

The assessment baseline flag is created using the derive_blfl_adni function, which is based on the sdtm.oak::derive_blfl function with minor study-specific modifications.

NOTE: A given derived dataset may contain more than one baseline flag per assessment per subject. The derive_blfl_adni function will be updated in future work to account for such cases.
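
The two criteria can be sketched for a single subject and a single assessment as shown below. This is a simplified illustration, not the logic of derive_blfl_adni: the helper name `flag_baseline_sketch`, the visit code "bl" for the baseline visit, the flag value "Y", and the reading of "prior to the baseline visit" as "dated on or before enrollment" are all assumptions made for the example.

```r
# Hypothetical helper, not the package's derive_blfl_adni().
# Assumptions: visit code "bl" marks the baseline visit, `enroll` is the
# subject's enrollment date, and a flagged record gets the value "Y".
flag_baseline_sketch <- function(visit, date, value, enroll, window = 30) {
  flag <- rep(NA_character_, length(visit))
  gap <- abs(as.numeric(date - enroll)) # days from enrollment

  # Rule 1: non-missing baseline-visit record within the window.
  rule1 <- visit %in% "bl" & !is.na(value) & !is.na(gap) & gap <= window
  if (any(rule1)) {
    flag[rule1] <- "Y"
    return(flag)
  }

  # Rule 2: otherwise, the non-missing earlier record closest to enrollment.
  prior <- !is.na(value) & !is.na(gap) & date <= enroll & !(visit %in% "bl")
  if (any(prior)) {
    idx <- which(prior)
    flag[idx[which.min(gap[idx])]] <- "Y"
  }
  flag
}

enroll <- as.Date("2021-05-01")
visit <- c("sc", "bl", "m06")
date <- as.Date(c("2021-04-10", "2021-05-03", "2021-11-02"))

# Baseline value present: the baseline record is flagged.
flag_a <- flag_baseline_sketch(visit, date, value = c(28, 29, 27), enroll)
# Baseline value missing: the screening record is flagged instead.
flag_b <- flag_baseline_sketch(visit, date, value = c(28, NA, 27), enroll)
print(flag_a)
print(flag_b)
```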

# Load utility functions from the package system files
utils_file_list <- c(
  "derived-dataset-sdtmoak-utils.R",
  "derived-dataset-utils.R",
  "derived-labdata-utils.R"
)
utils_file_path <- system.file(
  utils_file_list,
  package = "ADNIMERGE2",
  mustWork = TRUE
)
load_utils_funs <- lapply(utils_file_path, source)
# Add study track (i.e., New or Rollover)
REGISTRY <- REGISTRY %>%
  mutate(PTTYPE = adni_study_track(COLPROT, ORIGPROT))

ROSTER <- ROSTER %>%
  mutate(PTTYPE = adni_study_track(COLPROT, ORIGPROT))

Building Derived Dataset

Demographic (DM)

The DM dataset contains one record per subject, taken from when the subject was first enrolled or screened in the ADNI study (i.e., as a new enrollee). The code below derives the DM dataset:

# Join columns
dm_join_var <- c("RID", "ORIGPROT")

# Demographic data columns
dm_cols <- c("PTGENDER", "PTRACCAT", "PTETHCAT", "PTDOB")
names(dm_cols) <- c("SEX", "RACE", "ETHNIC", "BRTHDTC")

DM <- tibble(RID = unique_rid) %>%
  mutate(ORIGPROT = original_study_protocol(RID = RID)) %>%
  # Add PTDEMOG record
  left_join(
    PTDEMOG %>%
      assert_non_missing(ORIGPROT, COLPROT) %>%
      filter(ORIGPROT == COLPROT) %>%
      group_by(RID) %>%
      filter(
        (any(!is.na(VISDATE)) & VISDATE == min(VISDATE, na.rm = TRUE)) |
          (all(is.na(VISDATE)) & row_number() == 1)
      ) %>%
      ungroup() %>%
      assert_uniq(RID) %>%
      select(RID, ORIGPROT, VISDATE, all_of(as.character(dm_cols))),
    by = dm_join_var
  ) %>%
  rename(all_of(dm_cols)) %>%
  generate_oak_id_vars_adni(raw_src = "PTDEMOG")
DM <- DM %>%
  # Add screening visit date
  left_join(
    get_adni_screen_date(
      .registry = REGISTRY,
      phase = "Overall",
      both = FALSE,
      multiple_screen_visit = FALSE
    ) %>%
      select(RID, ORIGPROT, SESTDTC = SCREENDATE) %>%
      assert(is_uniq, RID),
    by = dm_join_var
  ) %>%
  # Add enrollment/RFSTDTC date
  create_rfstdtc(.registry = REGISTRY) %>%
  # Compute age based on first screening date
  mutate(
    AGE = round(as.numeric(SESTDTC - my(BRTHDTC)) / 365.25, 1),
    AGEU = "Years",
    SUBJID = as.character(RID),
    ARMCD = NA_character_,
    ARM = NA_character_,
    ACTARMCD = NA_character_,
    ACTARM = NA_character_,
    ARMNRS = "Non-Interventional Study",
    COUNTRY = NA_character_,
    DMDTC = create_iso8601(as.character(VISDATE), .format = "y-m-d")
  )
# Add death flag
DM <- DM %>%
  left_join(
    get_death_flag(
      .studysum = STUDYSUM,
      .adverse = ADVERSE,
      .recadv = RECADV
    ) %>%
      verify(all(DTHFL == "Yes")) %>%
      select(RID, ORIGPROT, DTHFL, DTHDTC),
    by = dm_join_var
  )
# Add last known disposition date
DM <- DM %>%
  left_join(
    get_disposition_flag(
      .registry = REGISTRY,
      .studysum = STUDYSUM
    ) %>%
      mutate(COLPROT = factor(COLPROT, levels = adni_phase())) %>%
      group_by(RID) %>%
      arrange(COLPROT) %>%
      # Last known discontinuation/disposition date
      filter(row_number() == n()) %>%
      ungroup() %>%
      select(RID, ORIGPROT, SDSTATUS, RFPENDTC = SDDATE),
    by = dm_join_var
  ) %>%
  mutate(RFPENDTC = case_when(
    !is.na(DTHDTC) & is.na(RFPENDTC) ~ as.character(DTHDTC),
    TRUE ~ as.character(RFPENDTC)
  ))
DM <- DM %>%
  # Derive USUBJID and SITEID
  derive_usubjid(
    .data = .,
    .registry = REGISTRY,
    .roster = ROSTER,
    .ptdemog = PTDEMOG,
    varList = c("USUBJID", "SITEID")
  ) %>%
  derive_study_day_adni(
    sdtm_in = .,
    domain = "DM",
    dm_domain = .,
    refdt = "RFSTDTC"
  ) %>%
  assign_studyid_domain(
    studyid = "ADNI",
    domain = "DM"
  ) %>%
  assign_vars_label(
    .data = .,
    data_dict = dm_data_dic,
    .strict = TRUE
  ) %>%
  assert_non_missing(SITEID, USUBJID, SUBJID)
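
The AGE derivation in the code above divides the number of days between the first screening date and the birth date by 365.25. The base-R sketch below reproduces that arithmetic on made-up values. It assumes the birth date is recorded as a month/year string in the form "MM/YYYY" and sets the day to the 1st, which mirrors how `lubridate::my()` resolves a month-year string; the actual format of PTDOB in the package data may differ.

```r
# Assumption: birth date recorded as "MM/YYYY"; the day is set to the 1st.
birth_chr <- c("06/1950", "11/1942")
screen_date <- as.Date(c("2022-06-01", "2017-05-15"))

birth_date <- as.Date(paste0("01/", birth_chr), format = "%d/%m/%Y")

# Age in years at screening, rounded to one decimal place.
age_years <- round(as.numeric(screen_date - birth_date) / 365.25, 1)
print(age_years)
```

Because only the month and year of birth are available, the computed age can be off by up to about one month.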

Subject Characteristics (SC)

The SC dataset contains subject-related data that are not collected in the DM domain and/or characteristics that are collected over time (i.e., across the ADNI phases: ADNI1, ADNIGO, ADNI2, ADNI3, and ADNI4). The characteristics come from three source datasets: PTDEMOG, ADI and RURALITY.

# Full demographic records ----
sc_common_cols <- c("ORIGPROT", "COLPROT", "RID", "VISCODE", "VISDATE")

SC_PTDEMOG <- PTDEMOG %>%
  select(-all_of(
    c(
      "PTID", "VISCODE2", "ID", "SITEID", "USERDATE", "USERDATE2",
      "DD_CRF_VERSION_LABEL", "LANGUAGE_CODE", "HAS_QC_ERROR", "update_stamp"
    )
  )) %>%
  mutate(COLPROT = factor(COLPROT, levels = adni_phase())) %>%
  assert_non_missing(COLPROT) %>%
  group_by(RID, ORIGPROT, COLPROT) %>%
  # Observational study: fill missing values within each phase (down, then up)
  arrange(COLPROT, VISDATE) %>%
  fill(-all_of(sc_common_cols), .direction = "down") %>%
  fill(-all_of(sc_common_cols), .direction = "up") %>%
  ungroup() %>%
  mutate(across(-all_of(sc_common_cols), as.character)) %>%
  mutate(VISCODE = factor(VISCODE, levels = c(unique(EPOCH_LIST$VISCODE)))) %>%
  assert_non_missing(VISCODE)

SC_DEMOG <- tibble(RID = unique_rid) %>%
  mutate(ORIGPROT = original_study_protocol(RID = RID)) %>%
  left_join(
    SC_PTDEMOG,
    by = dm_join_var
  ) %>%
  pivot_longer(
    cols = !all_of(sc_common_cols),
    names_to = "SCTESTCD",
    values_to = "SCORRES"
  ) %>%
  # Remove missing values
  drop_na(SCORRES) %>%
  mutate(
    SCCAT = "Demographic Records",
    SCDTC = as.character(VISDATE)
  ) %>%
  # Requires a long-format dataset
  generate_oak_id_vars_adni(raw_src = "PTDEMOG")

# Area Deprivation Index (ADI) for ADNI4 phase only ----
SC_ADI <- ADI %>%
  verify(all(COLPROT == adni_phase()[5])) %>%
  select(
    RID, ORIGPROT, COLPROT, VISCODE, ADISTATE, ADINATIONAL, ADIREV,
    SCDTC = ADIDATE
  ) %>%
  generate_oak_id_vars_adni(raw_src = "ADI") %>%
  mutate(across(c(ADISTATE, ADINATIONAL, ADIREV, SCDTC), as.character)) %>%
  pivot_longer(
    cols = c(ADISTATE, ADINATIONAL, ADIREV),
    names_to = "SCTESTCD",
    values_to = "SCORRES"
  ) %>%
  mutate(SCCAT = "Area Deprivation Index")

# RUCA and RUCC from ADNI4 phase only ----
rurality_cols <- c("RUCA", "RUCC", "RUCA_2010", "RUCC_2023")
SC_RURALITY <- RURALITY %>%
  verify(all(COLPROT == adni_phase()[5])) %>%
  select(
    RID, ORIGPROT, COLPROT, VISCODE, all_of(rurality_cols),
    SCDTC = RURDATE
  ) %>%
  generate_oak_id_vars_adni(raw_src = "RURALITY") %>%
  mutate(across(all_of(c(rurality_cols, "SCDTC")), as.character)) %>%
  pivot_longer(
    cols = all_of(rurality_cols),
    names_to = "SCTESTCD",
    values_to = "SCORRES"
  ) %>%
  mutate(SCCAT = "Rurality")
SC <- bind_rows(SC_DEMOG, SC_ADI, SC_RURALITY) %>%
  assert_uniq(RID, ORIGPROT, COLPROT, VISCODE, SCTESTCD) %>%
  mutate(
    COLPROT = factor(COLPROT, levels = adni_phase()),
    SCGRPID = COLPROT,
    SCSTAT = case_when(is.na(SCDTC) ~ "NOT DONE"),
    SCSTRESN = as.numeric(SCORRES),
    SCDTC = create_iso8601(SCDTC, .format = "y-m-d")
  ) %>%
  assert_uniq(RID, ORIGPROT, COLPROT, VISCODE, SCTESTCD) %>%
  derive_usubjid(varList = "USUBJID") %>%
  assign_studyid_domain(
    .data = .,
    studyid = "ADNI",
    domain = "SC"
  ) %>%
  assign_visit_attr(
    .data = .,
    visit_record_data = ADNI_VISIT_RECORD,
    domain = "SC",
    check_missing = TRUE
  ) %>%
  assign_epoch(
    .data = .,
    .epoch = EPOCH_LIST_LONG
  ) %>%
  derive_blfl_adni(
    sdtm_in = .,
    dm_domain = DM,
    tgt_var = "SCBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "SC",
    refdt = "RFSTDTC"
  ) %>%
  derive_seq(
    tgt_var = "SCSEQ",
    rec_vars = c("USUBJID", "SCTESTCD", "SCGRPID")
  )
SC <- SC %>%
  # Add characteristics test name and check coded value
  set_dom_test(
    .data = .,
    .data_list = SC_TESTCD_LIST %>%
      select(SCTESTCD, SCTEST),
    merge_by = "SCTESTCD"
  ) %>%
  assert_non_missing(SCTESTCD, SCTEST) %>%
  assign_vars_label(data_dict = sc_data_dic)
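
The SC code above reshapes each source table from one column per characteristic into SCTESTCD/SCORRES pairs and drops characteristics with no value. A base-R stand-in for that reshaping step is sketched below; the toy column PTEDUCAT and all values are illustrative only, and the sketch omits the identifiers and visit attributes added by the package helpers.

```r
# Toy wide table: one column per characteristic (illustrative values).
wide <- data.frame(
  RID = c(1, 2),
  PTGENDER = c("Female", "Male"),
  PTEDUCAT = c("16", NA),
  stringsAsFactors = FALSE
)
test_cols <- c("PTGENDER", "PTEDUCAT")

# Stack each characteristic column into SCTESTCD / SCORRES pairs.
long <- do.call(rbind, lapply(test_cols, function(col) {
  data.frame(
    RID = wide$RID,
    SCTESTCD = col,
    SCORRES = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Drop characteristics that were not collected.
long <- long[!is.na(long$SCORRES), ]
rownames(long) <- NULL

# Numeric copy of the result where it parses as a number, NA otherwise.
long$SCSTRESN <- suppressWarnings(as.numeric(long$SCORRES))
print(long)
```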

Adverse Events (AE)

The AE dataset contains one record per adverse event per subject. It includes subjects who experienced at least one adverse event during the study.

NOTE: Currently, the AE dataset includes only records collected in the ADNI3 and ADNI4 study phases.

# Adverse Event for ADNI3 and ADNI4
AE_ADNI34 <- ADVERSE %>%
  select(
    ID, RID, ORIGPROT, COLPROT, VISCODE, SITEID, AENUMBER, AEOUTCOME,
    AEHONSDT, AEHCSDT, AEHDTHDT, AERELAD, AERELCM, AERELFLRBTBN, AERELFLRBPR,
    AEHIMG, AERELTAU, AERELNAV, AERELMK, AERELPI, AEHLUMB, AERELCOVID, AERELPAN,
    AERELATESP, AESERIOUS, AESERDATE, SAELIFE, SAEHOSPIT, SAEPROLONG, SAEDEATH,
    SAECONGEN, SAEDISAB, SAEOTHER, AEHCMEDS,
    AESEV0 = AEHSEVR, all_of(paste0("AESEV", 1:10))
  ) %>%
  generate_oak_id_vars_adni(raw_src = "ADVERSE") %>%
  assert_non_missing(AENUMBER) %>%
  # Filter the worst severity level per RID, AENUMBER, COLPROT
  mutate(across(all_of(paste0("AESEV", 0:10)), as.character)) %>%
  pivot_longer(
    cols = all_of(paste0("AESEV", 0:10)),
    names_to = "SEVERITY_COL",
    values_to = "AESEV"
  ) %>%
  mutate(AESEV_NUM = case_when(
    AESEV == "Mild" ~ 1,
    AESEV == "Moderate" ~ 2,
    AESEV == "Severe" ~ 3
  )) %>%
  group_by(RID, ORIGPROT, COLPROT, SITEID, AENUMBER, VISCODE) %>%
  filter(
    (all(is.na(AESEV)) & row_number() == 1) |
      (any(!is.na(AESEV)) & AESEV_NUM == max(AESEV_NUM, na.rm = TRUE))
  ) %>%
  filter(
    (n() > 1 & row_number() == n()) |
      (n() == 1 & row_number() == 1)
  ) %>%
  ungroup() %>%
  verify(nrow(.) == nrow(ADVERSE)) %>%
  assert_uniq(RID, ORIGPROT, COLPROT, AENUMBER, VISCODE) %>%
  rename(
    "AESER" = AESERIOUS, "AEOUT" = AEOUTCOME,
    "AESCONG" = SAECONGEN, "AESDISAB" = SAEDISAB, "AESDTH" = SAEDEATH,
    "AESLIFE" = SAELIFE, "AESMIE" = SAEOTHER, "AECONTRT" = AEHCMEDS,
    "AESTDTC" = AEHONSDT, "AEENDTC" = AEHCSDT
  ) %>%
  # Adverse events for `required or prolongs hospitalization`
  mutate(
    AESHOSP = case_when(
      SAEHOSPIT == "Yes" | SAEPROLONG == "Yes" ~ "Yes",
      SAEHOSPIT == "No" & SAEPROLONG == "No" ~ "No",
      SAEHOSPIT == "No" & is.na(SAEPROLONG) ~ "No",
      is.na(SAEHOSPIT) & SAEPROLONG == "No" ~ "No"
    )
  ) %>%
  select(-c(SAEHOSPIT, SAEPROLONG, AESEV_NUM, SEVERITY_COL))
# Empty placeholder for ADNI1, ADNIGO and ADNI2 adverse events;
# requires checking for missing AENUMBER in RECADV
AE_ADNI12GO <- tibble(PHASE = NA_character_) %>%
  na.omit()
AE <- AE_ADNI34 %>%
  bind_rows(AE_ADNI12GO) %>%
  assert_non_missing(RID) %>%
  mutate(
    AESTDTC = as.character(AESTDTC),
    AEENDTC = as.character(AEENDTC)
  ) %>%
  group_by(RID) %>%
  arrange(AESTDTC) %>%
  mutate(AESEQ = row_number()) %>%
  ungroup()

AE <- AE %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "AE") %>%
  assign_visit_attr() %>%
  assign_epoch() %>%
  mutate(
    AEGRPID = COLPROT,
    AETERM = NA_character_,
    AELLT = NA_character_,
    AELLTCD = NA_character_,
    AEDECOD = NA_character_,
    AEPTCD = NA_character_,
    AEHLT = NA_character_,
    AEHLTCD = NA_character_,
    AEHLGT = NA_character_,
    AEHLGTCD = NA_character_,
    AESOC = NA_character_,
    AESOCCD = NA_character_
  ) %>%
  assign_vars_label(data_dict = ae_data_dic)
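
The AE code above keeps the worst severity level reported for each adverse event by ranking "Mild" < "Moderate" < "Severe". The same idea for a single event can be sketched in base R as follows; the helper name `worst_severity` is hypothetical and the inputs are made up.

```r
# Hypothetical helper: worst (highest-ranked) severity among several reports.
worst_severity <- function(sev) {
  lvls <- c("Mild", "Moderate", "Severe")
  rank <- match(sev, lvls) # NA for missing or unrecognised levels
  if (all(is.na(rank))) {
    return(NA_character_)
  }
  lvls[max(rank, na.rm = TRUE)]
}

sev_mixed <- worst_severity(c("Mild", NA, "Severe", "Moderate"))
sev_none <- worst_severity(c(NA, NA))
print(sev_mixed)
print(sev_none)
```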

Questionnaires (QS)

The QS dataset contains one record per parameter finding (i.e., total score) per visit per subject. The following cognitive/functional assessment scores are included in the QS dataset:

qs_com_cols <- c(
  "RID", "ORIGPROT", "COLPROT", "VISCODE",
  "VISCODE2", "VISDATE", "SITEID"
)

# ADAS Cognitive Behavior Total Score ----
## Completed variable ??
ADAS_SCORE_DATA <- ADAS %>%
  select(all_of(qs_com_cols), ADASTT11 = TOTSCORE, ADASTT13 = TOTAL13) %>%
  generate_oak_id_vars_adni(raw_src = "ADAS") %>%
  pivot_longer(
    cols = c(ADASTT11, ADASTT13),
    names_to = "QSTESTCD",
    values_to = "QSSTRESC"
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSDRVFL = "Yes",
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  )

# Clinical Dementia Rating Score ----
## Completion columns??
CDR_SCORE_DATA <- CDR %>%
  select(all_of(qs_com_cols), CDGLOBAL, CDRSB) %>%
  generate_oak_id_vars_adni(raw_src = "CDR") %>%
  pivot_longer(
    cols = c(CDGLOBAL, CDRSB),
    names_to = "QSTESTCD",
    values_to = "QSSTRESC"
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  )

# Everyday Cognition Total Score ----
## Completion variable
ECOG_SCORE_DATA <- ECOGPT %>%
  mutate(QSTESTCD = "ECOGPTTT") %>%
  select(all_of(qs_com_cols), QSTESTCD, QSSTRESC = EcogPtTotal) %>%
  generate_oak_id_vars_adni(raw_src = "ECOGPT") %>%
  bind_rows(
    ECOGSP %>%
      mutate(QSTESTCD = "ECOGSPTT") %>%
      select(all_of(qs_com_cols), QSTESTCD, QSSTRESC = EcogSPTotal) %>%
      generate_oak_id_vars_adni(raw_src = "ECOGSP")
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSDRVFL = "Yes",
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  )

# Financial Capacity Instrument Short Form - Score ----
FCI_SCORE_DATA <- FCI %>%
  select(all_of(qs_com_cols),
    QSSTRESC = FCISCORE,
    QSSTAT = DONE, QSREASND = NDREASON
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "FCISCORE"
  ) %>%
  generate_oak_id_vars_adni(raw_src = "FCI")

# Functional Assessments Questionnaires - Score ----
# Completion status ??
FAQ_SCORE_DATA <- FAQ %>%
  select(all_of(qs_com_cols), QSSTRESC = FAQTOTAL) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "FAQTOTAL",
  ) %>%
  generate_oak_id_vars_adni(raw_src = "FAQ")

# Geriatric Depression Scale ----
GDS_SCORE_DATA <- GDSCALE %>%
  select(all_of(qs_com_cols), QSSTRESC = GDTOTAL, QSREASND = GDUNABL) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "GDTOTAL",
    QSSTAT = ifelse(!is.na(QSREASND), "NOT DONE", NA_character_)
  ) %>%
  generate_oak_id_vars_adni(raw_src = "GDSCALE")

# Mini Mental State Exam Score ----
MMSE_SCORE_DATA <- MMSE %>%
  select(all_of(qs_com_cols),
    QSSTRESC = MMSCORE,
    QSSTAT = DONE, QSREASND = NDREASON
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "MMSCORE",
    QSDRVFL = "Yes"
  ) %>%
  generate_oak_id_vars_adni(raw_src = "MMSE")

# Montreal Cognitive Assessments ----
# Completion status ??
MOCA_SCORE_DATA <- MOCA %>%
  select(all_of(qs_com_cols), QSSTRESC = MOCA) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "MOCA",
    QSDRVFL = "Yes"
  ) %>%
  generate_oak_id_vars_adni(raw_src = "MOCA")

# Neuropsychiatric Inventory ----
# Completion status??
NPI_SCORE_DATA <- NPI %>%
  rename("VISDATE" = EXAMDATE) %>%
  select(all_of(qs_com_cols), QSSTRESC = NPITOTAL) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "NPITOTAL",
    QSDRVFL = "Yes",
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  ) %>%
  generate_oak_id_vars_adni(raw_src = "NPI")

# Neuropsychiatric Inventory Q ----
# Completion status??
NPIQ_SCORE_DATA <- NPIQ %>%
  select(all_of(qs_com_cols), QSSTRESC = NPISCORE) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSTESTCD = "NPIQTOTL",
    QSDRVFL = "Yes",
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  ) %>%
  generate_oak_id_vars_adni(raw_src = "NPIQ")

# Logical Memory - Immediate/Delayed Recall ----
neurobat_cols <- c(
  "LIMMTOTL", "LDELTOTL", "DIGITSCR", "TRABSCOR",
  "RAVLTIMM", "RAVLTLRN", "RAVLTFG", "RAVLTFGP"
)
NEUROBAT_SCORE_DATA <- NEUROBAT %>%
  compute_neurobat_subscore(.neurobat = .) %>%
  select(all_of(c(qs_com_cols, neurobat_cols))) %>%
  generate_oak_id_vars_adni(raw_src = "NEUROBAT") %>%
  pivot_longer(
    cols = all_of(neurobat_cols),
    names_to = "QSTESTCD",
    values_to = "QSSTRESC"
  ) %>%
  mutate(
    QSSTRESC = as.character(QSSTRESC),
    QSDRVFL = case_when(
      QSTESTCD %in% c(
        "RAVLTIMM", "RAVLTLRN", "RAVLTFG", "RAVLTFGP"
      ) ~ "Yes"
    ),
    QSSTAT = ifelse(is.na(QSSTRESC), "NOT DONE", NA_character_)
  )
# Score data names
score_data_names <- ls()[str_detect(ls(), "SCORE\\_DATA")]
QS <- mget(score_data_names) %>%
  bind_rows() %>%
  assert_non_missing(RID, COLPROT, VISCODE, QSTESTCD) %>%
  mutate(
    QSGRPID = COLPROT,
    QSDTC = create_iso8601(as.character(VISDATE), .format = "y-m-d"),
    QSSTRESN = as.numeric(QSSTRESC),
    QSORRES = as.character(QSSTRESN),
    QSSTAT = case_when(
      QSSTAT %in% c("No", "NOT DONE") ~ "NOT DONE",
      TRUE ~ NA_character_
    )
  ) %>%
  set_dom_test(
    .data_list = QS_TESTCD_LIST %>%
      select(QSTESTCD, QSTEST, QSCAT),
    merge_by = "QSTESTCD"
  ) %>%
  assert_uniq(RID, COLPROT, VISCODE, QSTESTCD) %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "QS") %>%
  assign_visit_attr() %>%
  assign_epoch() %>%
  derive_blfl_adni(
    dm_domain = DM,
    tgt_var = "QSBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "QS"
  ) %>%
  derive_seq(
    tgt_var = "QSSEQ",
    rec_vars = c("USUBJID", "QSTESTCD", "COLPROT")
  ) %>%
  # # Update QSORRES for derived scores
  # mutate(QSORRES = case_when(is.na(QSDRVFL) ~ QSSTRESC)) %>%
  assign_vars_label(data_dict = qs_data_dic)
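
The QS code above collects every object whose name contains SCORE_DATA, stacks them, and harmonises the completion status. A base-R sketch of that pattern is shown below. It uses a separate environment so that it does not depend on the contents of the workspace, and the two toy data frames are made up for the example.

```r
# Toy score tables stored in their own environment (illustrative values).
env <- new.env()
env$MMSE_SCORE_DATA <- data.frame(
  RID = 1, QSTESTCD = "MMSCORE", QSSTRESC = "29",
  QSSTAT = NA_character_, stringsAsFactors = FALSE
)
env$FAQ_SCORE_DATA <- data.frame(
  RID = 1, QSTESTCD = "FAQTOTAL", QSSTRESC = NA_character_,
  QSSTAT = "No", stringsAsFactors = FALSE
)
env$OTHER_OBJECT <- 1 # ignored: name does not match the pattern

# Collect the score tables by name and stack them.
score_names <- ls(env, pattern = "SCORE_DATA$")
qs_toy <- do.call(rbind, mget(score_names, envir = env))
rownames(qs_toy) <- NULL

# "No" (from a DONE column) and "NOT DONE" both map to "NOT DONE".
qs_toy$QSSTAT <- ifelse(
  qs_toy$QSSTAT %in% c("No", "NOT DONE"), "NOT DONE", NA_character_
)
qs_toy$QSSTRESN <- as.numeric(qs_toy$QSSTRESC)
print(qs_toy)
```

Selecting objects by name pattern is convenient but picks up any object in scope that matches, so an explicit list of data frames is a safer alternative when the workspace is not controlled.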

Clinical Classification (RS)

The RS dataset contains one record per instrument status per visit per subject. It holds a summary of each subject's clinical diagnosis per study visit. The clinical diagnostic status (i.e., Cognitively Normal (CN), Mild Cognitive Impairment (MCI), or Dementia/Alzheimer's (DEM/AD)) was determined by clinicians' judgment.

# Clinical diagnostics status
RS <- DXSUM %>%
  generate_oak_id_vars_adni(raw_src = "DXSUM") %>%
  mutate(
    DIAGNOSIS = case_when(
      DIAGNOSIS %in% "Dementia" ~ "DEM",
      TRUE ~ as.character(DIAGNOSIS)
    ),
    RSTESTCD = "DX",
    RSORRES = as.character(DIAGNOSIS),
    RSSTRESC = as.character(DIAGNOSIS),
    RSEVAL = SITEID,
    RSDTC = create_iso8601(as.character(EXAMDATE), .format = "y-m-d"),
    RSGRPID = COLPROT,
    RSSTAT = NA_character_
  ) %>%
  assert_non_missing(COLPROT)
RS <- RS %>%
  set_dom_test(
    .data_list = RS_TESTCD_LIST %>%
      select(RSCAT, RSTEST, RSTESTCD),
    merge_by = "RSTESTCD"
  ) %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "RS") %>%
  assign_visit_attr() %>%
  assign_epoch() %>%
  derive_blfl_adni(
    dm_domain = DM,
    tgt_var = "RSBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "RS"
  ) %>%
  derive_seq(
    tgt_var = "RSSEQ",
    rec_vars = c("USUBJID", "RSTESTCD", "COLPROT")
  ) %>%
  assign_vars_label(data_dict = rs_data_dic)

Nervous System Finding (NV)

The NV dataset contains one record per physiological or morphological finding related to the nervous system (including the brain) per visit per subject. The code below derives the NV dataset:

# FDG-PET Analysis Results Data ----
FDG_PET_DATA <- UCBERKELEYFDG_8mm %>%
  # Based on method manual document (on the data-shared platform)
  select(ORIGPROT, RID, VISCODE, VISCODE2, EXAMDATE, ROINAME, MEAN) %>%
  # Remove missing visit code
  filter(!is.na(VISCODE)) %>%
  assert_uniq(RID, VISCODE, VISCODE2, ROINAME) %>%
  verify(all(ORIGPROT %in% adni_phase()[-5])) %>%
  pivot_wider(
    id_cols = everything(),
    names_from = "ROINAME",
    values_from = "MEAN"
  ) %>%
  mutate(FDGMROI = MetaROI / Top50PonsVermis) %>%
  select(-MetaROI, -Top50PonsVermis) %>%
  check_duplicate_records(col_names = c("RID", "EXAMDATE", "VISCODE")) %>%
  mutate(
    NVMETHOD = "FDG PET",
    NVTESTCD = "FDGMROI",
    NVSTRESC = as.character(FDGMROI),
    NVSTRESN = FDGMROI,
    NVDRVFL = "Yes",
    NVDTC = as.character(EXAMDATE)
  ) %>%
  generate_oak_id_vars_adni(raw_src = "UCBERKELEYFDG_8mm")

# Map the phase-specific visit code based on the registry
FDG_PET_DATA <- FDG_PET_DATA %>%
  # Trying to map visit code from registry
  use_dtplyr() %>%
  left_join(
    REGISTRY %>%
      mutate(EXAMDATE = as.character(EXAMDATE)) %>%
      select(RID, ORIGPROT, COLPROT, VISCODE, VISCODE2,
        REGISTRY.EXAMDATE = EXAMDATE
      ) %>%
      filter(!is.na(REGISTRY.EXAMDATE)) %>%
      filter(RID %in% unique(FDG_PET_DATA$RID)) %>%
      distinct() %>%
      check_duplicate_records(
        col_names = c("RID", "ORIGPROT", "VISCODE", "REGISTRY.EXAMDATE")
      ),
    by = c("RID", "ORIGPROT", "VISCODE", "VISCODE2")
  ) %>%
  as_tibble() %>%
  mutate(COLPROT = case_when(
    is.na(COLPROT) & VISCODE == VISCODE2 ~ ORIGPROT,
    is.na(COLPROT) & VISCODE != VISCODE2 & VISCODE2 %in% "bl" ~ ORIGPROT,
    TRUE ~ COLPROT
  )) %>%
  verify(nrow(.) == nrow(FDG_PET_DATA)) %>%
  assert_non_missing(COLPROT) %>%
  assert_uniq(RID, COLPROT, VISCODE, NVTESTCD)
# PIB PET Analysis Results Data -----
# Only for ADNI1 phase
PIB_PET_DATA <- PIBPETSUVR %>%
  verify(all(ORIGPROT == adni_phase()[1])) %>%
  # Based on previously generated dataset
  mutate(PIB = rowMeans(across(c("ACG", "FRC", "PAR", "PRC")), na.rm = FALSE)) %>%
  select(RID, ORIGPROT, VISCODE, EXAMDATE, NVSTRESN = PIB, LONIUID) %>%
  check_duplicate_records(col_names = c("RID", "EXAMDATE")) %>%
  mutate(
    NVMETHOD = "PET SCAN",
    NVTESTCD = "PIB",
    NVSTRESU = NA_character_,
    NVSTRESC = as.character(NVSTRESN),
    NVDRVFL = "Yes",
    NVDTC = as.character(EXAMDATE),
    NVLNKID = as.character(LONIUID),
    COLPROT = adni_phase()[1]
  ) %>%
  assert_non_missing(VISCODE) %>%
  assert_uniq(RID, VISCODE, NVTESTCD) %>%
  generate_oak_id_vars_adni(raw_src = "PIBPETSUVR")
# Amyloid status ----
amystatus_lvls <- c("Non Elevated", "Elevated")

AMYREAD_DATA <- AMYREAD %>%
  verify(all(COLPROT == adni_phase()[5])) %>%
  # Based on a clinician decision
  mutate(AMYSTAT = case_when(
    str_detect(CONSENS, "No, visual read and quantification") ~ as.character(CONGRU),
    str_detect(CONSENS, "Yes, this scan should be reviewed") ~ as.character(CONSENSRES)
  )) %>%
  mutate(
    AMYSTAT = str_remove(AMYSTAT, " scan"),
    TRACERTYPE = as.character(TRACERTYPE)
  ) %>%
  verify(all(AMYSTAT %in% amystatus_lvls)) %>%
  generate_oak_id_vars_adni(raw_src = "AMYREAD") %>%
  pivot_longer(
    cols = AMYSTAT,
    names_to = "NVTESTCD",
    values_to = "NVORRES"
  ) %>%
  mutate(
    NVDTC = as.character(SCANDATE),
    NVSTRESC = as.character(NVORRES),
    NVMETHOD = "PET",
    NVDRVFL = "Yes"
  )
# Common cols in PET data
pet_data_common_cols <- c(
  "ORIGPROT", "LONIUID", "RID", "VISCODE", "SCANDATE", "PROCESSDATE",
  "IMAGE_RESOLUTION", "TRACER", "qc_flag"
)

# Amyloid PET Data ----
amypet_cols <- list(
  AMYSTAT = "AMYLOID_STATUS",
  AMYSTATC = "AMYLOID_STATUS_COMPOSITE_REF",
  SUVRSM = "SUMMARY_SUVR",
  SUVRCR = "COMPOSITE_REF_SUVR",
  CENTILOIDS = "CENTILOIDS"
)

AMYPET_DATA <- UCBERKELEY_AMY_6MM %>%
  mutate(across(
    all_of(as.character(amypet_cols[1:2])),
    ~ case_when(
      .x == 1 ~ amystatus_lvls[2],
      .x == 0 ~ amystatus_lvls[1]
    )
  )) %>%
  rename_with_list(., name_char = amypet_cols, by_name = TRUE) %>%
  select(all_of(c(pet_data_common_cols, names(amypet_cols)))) %>%
  mutate(across(all_of(names(amypet_cols)), as.character)) %>%
  generate_oak_id_vars_adni(raw_src = "UCBERKELEY_AMY_6MM") %>%
  pivot_longer(
    cols = all_of(names(amypet_cols)),
    names_to = "NVTESTCD",
    values_to = "NVSTRESC"
  )
# Tau PET Data -----
taupet_cols <- list(
  SUVRINFE = "INFERIORCEREBELLUM_SUVR",
  SUVRSC = "ERODED_SUBCORTICALWM_SUVR",
  SUVRMETA = "META_TEMPORAL_SUVR"
)
TAUPET_DATA <- UCBERKELEY_TAU_6MM %>%
  # filter(!is.na(VISCODE)) %>%
  rename_with_list(., name_char = taupet_cols, by_name = TRUE) %>%
  select(all_of(c(pet_data_common_cols, names(taupet_cols)))) %>%
  mutate(across(all_of(names(taupet_cols)), as.character)) %>%
  generate_oak_id_vars_adni(raw_src = "UCBERKELEY_TAU_6MM") %>%
  pivot_longer(
    cols = all_of(names(taupet_cols)),
    names_to = "NVTESTCD",
    values_to = "NVSTRESC"
  )

# Tau PET - PVC Data ----
taupet_pvc_cols <- list(
  SUVRINFE = "INFERIORCEREBELLUM_SUVR",
  SUVRCWM = "CEREBRAL_WHITE_MATTER_SUVR",
  SUVRMETA = "META_TEMPORAL_SUVR"
)
TAUPET_PVC_DATA <- UCBERKELEY_TAUPVC_6MM %>%
  mutate(
    IMAGE_RESOLUTION = "None",
    qc_flag = NA_real_
  ) %>%
  rename_with_list(., name_char = taupet_pvc_cols, by_name = TRUE) %>%
  select(all_of(c(pet_data_common_cols, names(taupet_pvc_cols)))) %>%
  mutate(across(all_of(names(taupet_pvc_cols)), as.character)) %>%
  generate_oak_id_vars_adni(raw_src = "UCBERKELEY_TAUPVC_6MM") %>%
  pivot_longer(
    cols = all_of(names(taupet_pvc_cols)),
    names_to = "NVTESTCD",
    values_to = "NVSTRESC"
  )
# PET dataset
PET_join_var <- paste0(c("ORIGPROT", "RID", "SCANDATE"), "_MPL")
names(PET_join_var) <- str_remove_all(PET_join_var, "\\_MPL")

PET_DATA <- bind_rows(AMYPET_DATA, TAUPET_DATA, TAUPET_PVC_DATA) %>%
  select(-VISCODE) %>%
  # Fuzzy join for actual study phase and visits
  left_fuzzy_join(
    data1 = .,
    data2 = IMAGING_MAPPING_LIST %>%
      select(ORIGPROT, COLPROT, RID, VISCODE, SCANDATE, SOURCE) %>%
      rename_with_list(., name_char = PET_join_var, by_name = FALSE),
    join_by = PET_join_var,
    check_cols = "COLPROT",
    main_cols = "SCANDATE",
    date_col = "SCANDATE"
  ) %>%
  mutate(
    LONIUID = as.character(LONIUID),
    NVMETHOD = "PET",
    NVDTC = as.character(SCANDATE),
    NVLNKID = as.character(LONIUID),
    NVSTRESN = as.numeric(NVSTRESC),
    NVSTAT = case_when(qc_flag %in% -2:0 ~ "NOT DONE"),
    NVREASND = case_when(
      qc_flag == -2 ~ "CANNOT BE PROCESSED",
      qc_flag == -1 ~ "NOT ASSESSED",
      qc_flag == 0 ~ "FAIL"
    )
  ) %>%
  assert_non_missing(COLPROT)
NV <- bind_rows(FDG_PET_DATA, PIB_PET_DATA, AMYREAD_DATA, PET_DATA) %>%
  mutate(
    NVGRPID = COLPROT,
    NVDTC = create_iso8601(NVDTC, .format = "y-m-d")
  ) %>%
  left_join(
    NV_TESTCD_LIST %>%
      select(NVCAT, NVSCAT, NVTESTCD, NVTEST, SOURCE),
    by = c("NVTESTCD" = "NVTESTCD", "raw_source" = "SOURCE"),
    relationship = "many-to-many"
  ) %>%
  assert_non_missing(NVTEST, NVSCAT, NVCAT) %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "NV") %>%
  assign_visit_attr(check_missing = TRUE) %>%
  assign_epoch() %>%
  derive_blfl_adni(
    dm_domain = DM,
    tgt_var = "NVBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "NV"
  ) %>%
  derive_seq(
    tgt_var = "NVSEQ",
    rec_vars = c("USUBJID", "NVCAT", "NVSCAT", "NVTESTCD", "NVGRPID")
  ) %>%
  assign_vars_label(data_dict = nv_data_dic)
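
In the FDG-PET step above, the long table of region means is spread to one column per region and the derived result FDGMROI is the MetaROI mean divided by the Top50PonsVermis reference mean. The base-R sketch below reproduces that ratio on made-up numbers, using `merge()` in place of `pivot_wider()`; the suffixes `_META` and `_REF` are names chosen for this example.

```r
# Toy long-format region means (values are made up).
roi_long <- data.frame(
  RID = c(1, 1, 2, 2),
  VISCODE = c("bl", "bl", "bl", "bl"),
  ROINAME = c("MetaROI", "Top50PonsVermis", "MetaROI", "Top50PonsVermis"),
  MEAN = c(1.2, 1.0, 1.1, 0.88),
  stringsAsFactors = FALSE
)
keys <- c("RID", "VISCODE")

# Put the target and reference regions side by side, one row per scan.
meta <- roi_long[roi_long$ROINAME == "MetaROI", c(keys, "MEAN")]
ref <- roi_long[roi_long$ROINAME == "Top50PonsVermis", c(keys, "MEAN")]
fdg <- merge(meta, ref, by = keys, suffixes = c("_META", "_REF"))

# Derived result: target mean normalised by the reference-region mean.
fdg$FDGMROI <- fdg$MEAN_META / fdg$MEAN_REF
print(fdg)
```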

Laboratory Test Results (LB)

The LB dataset contains laboratory test data such as hematology, clinical chemistry and urinalysis per visit per subject. In addition to these safety lab data, the LB dataset contains blood plasma biomarker results.

common_cols <- c(
  "ORIGPROT", "COLPROT", "ID", "PTID", "RID", "SITEID", "VISCODE", "VISCODE2",
  "USERDATE", "USERDATE2", "RECNO", "ACCNO", "COVVIS", "EXAMDATE", "update_stamp"
)
lab_cols <- c(
  "ORIGPROT", "COLPROT", "RID", "SITEID",
  "VISCODE", "VISCODE2", "EXAMDATE"
)

# Lab data for ADNI1-GO-2 phases
ADNI1GO2_CLINICAL_LABDATA <- LABDATA %>%
  generate_oak_id_vars_adni(raw_src = "LABDATA") %>%
  mutate(across(-all_of(c(common_cols, oak_id_vars())), as.character)) %>%
  pivot_longer(
    cols = everything() & -all_of(c(common_cols, oak_id_vars())),
    names_to = "LBTESTCD",
    values_to = "LBORRES"
  ) %>%
  select(-all_of(common_cols[!common_cols %in% lab_cols])) %>%
  filter(LBORRES != -1) %>%
  mutate(LBDTC = create_iso8601(as.character(EXAMDATE), .format = "y-m-d")) %>%
  adjust_lab_visitcode()
# Lab data for ADNI3-4 phases
ADNI34_CLINICAL_LABDATA <- URMC_LABDATA %>%
  generate_oak_id_vars_adni(raw_src = "URMC_LABDATA") %>%
  mutate(
    LBNAM = "URMC",
    TestID = ifelse(TestName %in% "Sodium", "NA", TestID)
  ) %>%
  mutate(across(ends_with(c("Date", "Time")), as.character))

# Create LBDTC using `sdtm.oak::assign_datetime`
ADNI34_CLINICAL_LABDATA <- assign_datetime(
  tgt_dat = ADNI34_CLINICAL_LABDATA %>%
    select(-SampleDate, -SampleTime),
  raw_dat = ADNI34_CLINICAL_LABDATA,
  tgt_var = "LBDTC",
  raw_var = c("SampleDate", "SampleTime"),
  raw_fmt = c("y-m-d", "H:M:S")
)
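
The step above builds LBDTC from separate SampleDate and SampleTime columns. As a rough base-R illustration of the target format only, an ISO 8601 date-time can be assembled by joining the date and time with "T" and keeping the date alone when the time is missing. This is not how `sdtm.oak::assign_datetime` is implemented, and the toy values are made up.

```r
# Toy sample dates and times, already in "y-m-d" and "H:M:S" form.
sample_date <- c("2023-04-18", "2023-05-02", NA)
sample_time <- c("09:15:00", NA, NA)

# Join date and time with "T"; fall back to the date when time is missing.
iso_dtc <- ifelse(
  is.na(sample_date),
  NA_character_,
  ifelse(
    is.na(sample_time),
    sample_date,
    paste0(sample_date, "T", sample_time)
  )
)
print(iso_dtc)
```
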
ADNI34_CLINICAL_LABDATA <- ADNI34_CLINICAL_LABDATA %>%
  # Adjust for lab test that were considered as 'not completed/done'
  adjust_lab_status() %>%
  adjust_lab_visitcode() %>%
  mutate(
    LBGRPID = COLPROT,
    LBTESTCD = TestID,
    LBTEST = TestName,
    LBORRES = ResultValueConv_translated,
    LBORRESU = UnitsConv,
    LBORNRLO = LowerRangeConv,
    LBORNRHI = UpperRangeConv,
    LBSTRESC = ResultValueSI_translated,
    LBSTRESN = as.numeric(ResultValueSI_translated),
    LBSTRESU = UnitsSI,
    LBSTNRLO = LowerRangeSI,
    LBSTNRHI = UpperRangeSI,
    LBFAST = Fasting,
    LBSPCCND = Comments
  )
# C2N Blood Plasma Result ----
c2n_cols <- c(
  "pT217_C2N", "npT217_C2N", "AB42_C2N", "AB40_C2N", "AB42_AB40_C2N",
  "pT217_npT217_C2N", "APS2_C2N"
)
names(c2n_cols) <- c(
  "PT217", "NPT217", "AB42", "AB40",
  "AB42AB40", "PTNPT217", "APS2"
)

C2N_PLASMA_DATA <- C2N_PRECIVITYAD2_PLASMA %>%
  generate_oak_id_vars_adni(raw_src = "C2N_PRECIVITYAD2_PLASMA") %>%
  # Named vector: names are the new column names, values are the raw column names
  rename(all_of(c2n_cols)) %>%
  select(
    all_of(c(oak_id_vars(), names(c2n_cols))),
    ORIGPROT, COLPROT, RID, VISCODE,
    EXAMDATE,
    LBANTREG = Primary, LBSPCCND = Comments
  ) %>%
  mutate(across(all_of(names(c2n_cols)), as.character)) %>%
  pivot_longer(
    cols = all_of(names(c2n_cols)),
    names_to = "LBTESTCD",
    values_to = "LBORRES"
  ) %>%
  mutate(
    LBSPEC = "PLASMA",
    LBDTC = create_iso8601(as.character(EXAMDATE), .format = "y-m-d")
  ) %>%
  left_join(
    get_biomarker_details(assay = "C2N"),
    by = "LBTESTCD"
  ) %>%
  assert_non_missing(LBTEST)
LB <- bind_rows(
  C2N_PLASMA_DATA, ADNI1GO2_CLINICAL_LABDATA,
  UPENNBIOMK_ROCHE_DATA, UPENNBIOMK_ALZBIO3_DATA,
  UPENN_PLASMA_FQ_DATA
) %>%
  mutate(
    LBGRPID = COLPROT,
    LBSTRESC = as.character(LBORRES),
    LBSTRESN = as.numeric(LBORRES)
  ) %>%
  bind_rows(ADNI34_CLINICAL_LABDATA) %>%
  assert_non_missing(LBGRPID) %>%
  # TODO: confirm whether this adjustment is required
  # set_dom_test(
  #    .data_list = LB_TESTCD_LIST %>%
  #     select(LBTESTCD, LBTEST, LBORRESU, LBSTRESU),
  #   merge_by = "LBTESTCD"
  # ) %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "LB") %>%
  assign_visit_attr(check_missing = FALSE) %>%
  assign_epoch() %>%
  derive_blfl_adni(
    dm_domain = DM,
    tgt_var = "LBBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "LB"
  ) %>%
  derive_seq(
    tgt_var = "LBSEQ",
    rec_vars = c("USUBJID", "LBTESTCD", "LBGRPID")
  ) %>%
  assign_vars_label(data_dict = lb_data_dic, .strict = FALSE)
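
The wide-to-long step used for the ADNI1-GO-2 lab data can be illustrated on a small example. The following is a minimal sketch, assuming a hypothetical wide extract (`lab_wide`) with made-up test codes (`TEST_A`, `TEST_B`) and `-1` as the missing-value code. It is not part of the ADNIMERGE2 package and only shows the reshaping idea.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide lab extract: one column per test, -1 marks a missing result
lab_wide <- tibble(
  RID = c(1, 2),
  VISCODE = c("sc", "sc"),
  EXAMDATE = as.Date(c("2006-01-10", "2006-02-14")),
  TEST_A = c(4.5, -1),    # illustrative test code, numeric in the raw data
  TEST_B = c("12", "15")  # illustrative test code, character in the raw data
)
id_cols <- c("RID", "VISCODE", "EXAMDATE")

lab_long <- lab_wide %>%
  # Cast result columns to character so they can share a single LBORRES column
  mutate(across(-all_of(id_cols), as.character)) %>%
  pivot_longer(
    cols = -all_of(id_cols),
    names_to = "LBTESTCD",
    values_to = "LBORRES"
  ) %>%
  # Drop missing results, including the -1 missing-value code
  filter(!is.na(LBORRES), LBORRES != "-1") %>%
  mutate(
    LBSTRESC = LBORRES,
    LBSTRESN = suppressWarnings(as.numeric(LBORRES))
  )

lab_long
```

Casting to character before pivoting avoids type conflicts when numeric and text results end up in the same column. The numeric value is then recovered separately in `LBSTRESN`.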

Genomics Findings (GF)

The GF dataset contains data related to genomic material of interest. It contains each subject's APOE genotype, which is collected once during the study period.

NOTE: Some subjects might have duplicated APOE genotype records.

GF <- APOERES %>%
  generate_oak_id_vars_adni(raw_src = "APOERES") %>%
  mutate(
    GFTESTCD = "APOE",
    GFTEST = "Apolipoprotein E",
    GFGRPID = COLPROT,
    GFSTDTL = "GENOTYPE",
    GFORRES = GENOTYPE,
    GFDTC = create_iso8601(as.character(APTESTDT), .format = "y-m-d"),
  ) %>%
  separate(GENOTYPE, into = c("ALLEL1", "ALLEL2"), sep = "/") %>%
  mutate(across(contains("ALLEL"), ~ paste0("ε", .x))) %>%
  unite("GFSTRESC", ALLEL1, ALLEL2, sep = "/") %>%
  # TODO: confirm whether the collection date needs to be adjusted
  derive_usubjid() %>%
  assign_studyid_domain(domain = "GF") %>%
  assign_visit_attr(check_missing = FALSE) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "GF"
  ) %>%
  derive_seq(
    tgt_var = "GFSEQ",
    rec_vars = c("USUBJID", "GFTESTCD", "GFGRPID")
  ) %>%
  assign_vars_label(data_dict = gf_data_dic)
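
Because a subject may have more than one APOE record, it can help to list such subjects before analysis. The following is a minimal sketch on hypothetical toy data (`apoe_toy`). The column names `RID` and `GENOTYPE` follow the code above, while the values and the summary column names are made up for illustration.

```r
library(dplyr)

# Hypothetical APOE records; subject 2 appears twice
apoe_toy <- tibble(
  RID = c(1, 2, 2, 3),
  GENOTYPE = c("3/4", "3/3", "3/3", "2/3")
)

apoe_dups <- apoe_toy %>%
  group_by(RID) %>%
  summarise(
    n_records = n(),                     # number of records per subject
    n_genotypes = n_distinct(GENOTYPE),  # number of distinct genotypes per subject
    .groups = "drop"
  ) %>%
  # Keep only subjects with more than one record
  filter(n_records > 1)

apoe_dups
```

A subject with `n_records > 1` but `n_genotypes == 1` has repeated but consistent records, whereas `n_genotypes > 1` would point to conflicting genotypes that need review.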

Vital Sign (VS)

The VS dataset contains measurements of blood pressure, pulse rate, respiratory rate, temperature, weight, and height per visit per subject. It is derived from the VITALS data.

# Wide format
VS <- VITALS %>%
  mutate(VSDTC = create_iso8601(as.character(VISDATE), .format = "y-m-d")) %>%
  select(
    ORIGPROT, COLPROT, RID, VISCODE, VSDTC,
    VSWEIGHT, VSWTUNIT, VSHEIGHT, VSHTUNIT, VSBPSYS, VSBPDIA, VSPULSE,
    VSRESP, VSTEMP, VSTMPSRC, VSTMPUNT, VSHGTSC
  ) %>%
  generate_oak_id_vars_adni(raw_src = "VITALS") %>%
  mutate(across(c(VSWEIGHT, VSHEIGHT, VSTEMP), ~ ifelse(.x == -1, NA, .x))) %>%
  verify(all(VSWTUNIT %in% c("kilograms", "pounds") | is.na(VSWTUNIT))) %>%
  verify(all(VSHTUNIT %in% c("centimeters", "inches") | is.na(VSHTUNIT))) %>%
  verify(all(VSTMPUNT %in% c("Fahrenheit", "Celsius") | is.na(VSTMPUNT))) %>%
  mutate(
    VSWTUNIT_TRANSLATED = case_when(
      VSWTUNIT %in% "kilograms" ~ "kg",
      VSWTUNIT %in% "pounds" ~ "LB"
    ),
    VSHTUNIT_TRANSLATED = case_when(
      VSHTUNIT %in% "centimeters" ~ "cm",
      VSHTUNIT %in% "inches" ~ "inch"
    ),
    VSTMPUNT_TRANSLATED = case_when(
      VSTMPUNT %in% "Fahrenheit" ~ "F",
      VSTMPUNT %in% "Celsius" ~ "C"
    ),
  ) %>%
  unite("WEIGHT", VSWEIGHT, VSWTUNIT_TRANSLATED, sep = "-") %>%
  unite("HEIGHT", VSHEIGHT, VSHTUNIT_TRANSLATED, sep = "-") %>%
  unite("TEMP", VSTEMP, VSTMPUNT_TRANSLATED, VSTMPSRC, sep = "-") %>%
  mutate(
    DIABP = ifelse(!is.na(VSBPDIA), paste0(VSBPDIA, "-", "mmHg"), NA),
    SYSBP = ifelse(!is.na(VSBPSYS), paste0(VSBPSYS, "-", "mmHg"), NA),
    PLUSE = ifelse(!is.na(VSPULSE), paste0(VSPULSE, "-", "beats/min"), NA),
    RESP = ifelse(!is.na(VSRESP), paste0(VSRESP, "-", "breaths/min"), NA)
  )
# Long format
VS <- VS %>%
  pivot_longer(
    cols = c(WEIGHT, HEIGHT, TEMP, DIABP, SYSBP, PLUSE, RESP),
    names_to = "VSTESTCD",
    values_to = "VALUE"
  ) %>%
  set_dom_test(
    .data_list = VS_TESTCD_LIST %>%
      select(VSTESTCD, VSTEST),
    merge_by = "VSTESTCD"
  ) %>%
  separate(VALUE, into = c("VSORRES", "VSORRESU", "VSLOC"), sep = "-") %>%
  mutate(VSLOC = str_to_upper(VSLOC)) %>%
  mutate(VSSTRESU = case_when(
    VSORRESU %in% c("F", "C") ~ "C",
    VSORRESU %in% c("cm", "inch") ~ "cm",
    VSORRESU %in% c("kg", "LB") ~ "kg",
    VSORRESU %in% c("mmHg", "beats/min", "breaths/min") ~ VSORRESU
  )) %>%
  # Unit conversion: conv_unit
  mutate(
    FROM_UNIT = case_when(
      VSORRESU %in% "LB" ~ "lbs",
      VSORRESU %in% c("C", "F", "cm", "inch", "kg") ~ VSORRESU,
      TRUE ~ NA_character_
    ),
    TO_UNIT = case_when(
      VSSTRESU %in% c("C", "kg", "cm") ~ VSSTRESU,
      TRUE ~ NA_character_
    )
  ) %>%
  rowwise() %>%
  mutate(
    VSSTRESN = ifelse(
      VSTESTCD %in% c("WEIGHT", "HEIGHT", "TEMP") & !is.na(FROM_UNIT),
      conv_unit(as.numeric(VSORRES), from = FROM_UNIT, to = TO_UNIT),
      ifelse(VSTESTCD %in% c("DIABP", "SYSBP", "PLUSE", "RESP"),
        as.numeric(VSORRES), NA_real_
      )
    )
  ) %>%
  mutate(
    VSGRPID = COLPROT,
    VSSTRESN = round(VSSTRESN, digits = 1),
    VSSTRESC = as.character(VSSTRESN),
    VSDRVFL = case_when(!is.na(FROM_UNIT) & FROM_UNIT != TO_UNIT ~ "Yes")
  ) %>%
  as_tibble() %>%
  assert_non_missing(COLPROT, VISCODE, VSTEST)
VS <- VS %>%
  derive_usubjid() %>%
  assign_studyid_domain(domain = "VS") %>%
  assign_visit_attr() %>%
  assign_epoch() %>%
  derive_blfl_adni(
    dm_domain = DM,
    tgt_var = "VSBLFL"
  ) %>%
  derive_study_day_adni(
    dm_domain = DM,
    domain = "VS"
  ) %>%
  derive_seq(
    tgt_var = "VSSEQ",
    rec_vars = c("USUBJID", "VSTESTCD", "VSGRPID")
  ) %>%
  assign_vars_label(data_dict = vs_data_dic)
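
The pipeline above converts weight, height, and temperature to standard units (kg, cm, C) row by row with `conv_unit()`. As an illustration of the arithmetic involved, the following is a minimal vectorized sketch in base R. The helper `convert_vs_unit()` is hypothetical and not part of ADNIMERGE2; it swaps in plain conversion factors rather than reproducing the code above.

```r
# Hypothetical helper: convert vital-sign values to kg, cm, or degrees Celsius.
# `unit` uses the translated unit labels from the code above.
# Unknown or missing units yield NA.
convert_vs_unit <- function(value, unit) {
  value <- as.numeric(value)
  out <- rep(NA_real_, length(value))

  # Values already in the standard unit are kept as-is
  is_std <- unit %in% c("kg", "cm", "C")
  out[is_std] <- value[is_std]

  # Pounds to kilograms
  is_lb <- unit %in% "LB"
  out[is_lb] <- value[is_lb] * 0.45359237

  # Inches to centimeters
  is_inch <- unit %in% "inch"
  out[is_inch] <- value[is_inch] * 2.54

  # Fahrenheit to Celsius
  is_f <- unit %in% "F"
  out[is_f] <- (value[is_f] - 32) * 5 / 9

  round(out, digits = 1)
}

converted <- convert_vs_unit(
  value = c(150, 70, 98.6, 37),
  unit = c("LB", "inch", "F", "C")
)
converted
```

Because the helper works on whole vectors, it could be used inside `mutate()` without `rowwise()`, which may be faster on large datasets.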