
Overview

The main goal of ADNIMERGE2, the R data package for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, is to provide easy access to the study data, which have been collected over two decades, for any researcher or data analyst who wants to learn more about the study and analyze its data in R.

Package Source

The ADNIMERGE2 R package will be available on the LONI data-sharing platform. To access the package, you must submit an online application via the LONI website and accept the ADNI Data Use Agreement. Please visit https://adni.loni.usc.edu/data-samples/adni-data/#AccessData to learn more about the application process and the ADNI Data Use Agreement.

Installation

To install the package locally from the downloaded source tarball, run:

install.packages("path/to/ADNIMERGE2_0.1.1.tar.gz", repos = NULL, type = "source")

Package Usage

Package Source Data Date

The ADNIMERGE2 package includes a date stamp recording when the raw data were downloaded from the data-sharing platform. For instance, the current package contains data downloaded from the data-sharing platform on 2025-07-10. To get the date stamp:

# Data source downloaded date
ADNIMERGE2::DATA_DOWNLOADED_DATE
#> [1] "2025-07-10"
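Because the date stamp marks when the raw data were pulled, one possible use is a sanity check that no record is dated after it. The sketch below is a hypothetical base-R helper, not part of ADNIMERGE2; it hard-codes the 2025-07-10 stamp shown above and uses made-up visit dates.

```r
# Date stamp as printed by ADNIMERGE2::DATA_DOWNLOADED_DATE above
downloaded <- as.Date("2025-07-10")

# Hypothetical helper: TRUE for records dated after the download date,
# which would point to a data-entry or date-parsing problem.
# Missing dates are never flagged.
flag_after_download <- function(dates, download_date) {
  !is.na(dates) & dates > as.Date(download_date)
}

# Hypothetical visit dates; the third one is deliberately after the stamp
visit_dates <- as.Date(c("2005-09-29", "2024-12-01", "2026-01-15", NA))
flagged <- flag_after_download(visit_dates, downloaded)
print(flagged)
```

With the package installed, the same helper could be called as `flag_after_download(ADNIMERGE2::DXSUM$EXAMDATE, ADNIMERGE2::DATA_DOWNLOADED_DATE)`, assuming `EXAMDATE` is a `Date` column as the `DXSUM` printout later in this article suggests. Wrapping the stamp in `as.Date()` lets the helper accept it whether it is stored as a character string or a `Date`.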

Data Dictionary

The package includes separate data dictionaries for the raw and derived datasets.

  • Raw Dataset: All study data that are available on the data-sharing platform.

  • Derived Dataset: Data generated during the package build. Please refer to the package vignettes to learn more about how these data are generated.

# Data dictionary for raw data
head(ADNIMERGE2::DATADIC, 6)
#> # A tibble: 6 × 13
#>   PHASE CRFNAME    TBLNAME FLDNAME TEXT  TYPE  LENGTH DD_CRF_VERSION CODE  UNITS
#>   <chr> <chr>      <chr>   <chr>   <chr> <chr> <chr>  <chr>          <chr> <chr>
#> 1 ADNI1 ADAS-Cogn… ADAS    PTID    Part… NA    NA     NA             NA    NA   
#> 2 ADNI1 ADAS-Cogn… ADAS    RID     Part… N     38 di… NA             NA    NA   
#> 3 ADNI1 ADAS-Cogn… ADAS    VISCODE Visi… T     20 ch… NA             NA    NA   
#> 4 ADNI1 ADAS-Cogn… ADAS    EXAMDA… Exam… D     10     NA             NA    NA   
#> 5 ADNI1 ADAS-Cogn… ADAS    VISDATE Asse… D     NA     NA             NA    NA   
#> 6 ADNI1 ADAS-Cogn… ADAS    COT1LI… Tria… T     20     NA             1=BU… NA   
#> # ℹ 3 more variables: STATUS <chr>, CODE_CHANGES <chr>, MAPPING_NOTES <chr>
# Data dictionary for derived data
head(ADNIMERGE2::DERIVED_DATADIC, 6)
#> # A tibble: 6 × 5
#>   TBLNAME CRFNAME                            FLDNAME LABEL                 TEXT 
#>   <chr>   <chr>                              <chr>   <chr>                 <chr>
#> 1 ADAE    Analysis Dataset of Adverse Events STUDYID Study Identifier      " "  
#> 2 ADAE    Analysis Dataset of Adverse Events USUBJID Unique Subject Ident… " "  
#> 3 ADAE    Analysis Dataset of Adverse Events SUBJID  Subject Identifier f… " "  
#> 4 ADAE    Analysis Dataset of Adverse Events SITEID  Study Site Identifier " "  
#> 5 ADAE    Analysis Dataset of Adverse Events TRTA    Actual Arm            " "  
#> 6 ADAE    Analysis Dataset of Adverse Events TRTP    Planned Arm           " "
# Data dictionary for derived data based on R6-class object
ADNIMERGE2::METACORES
#> Metacore object contains metadata for 4 datasets
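Looking up the dictionary entries for a specific table and field is a common first step before interpreting a variable. The following base-R sketch uses a hypothetical four-row stand-in for `ADNIMERGE2::DATADIC`, built from the ADAS rows printed above; `lookup_fields()` is an illustrative helper, not a package function.

```r
# Hypothetical miniature dictionary mirroring a few DATADIC columns
# (PHASE, TBLNAME, FLDNAME, TYPE) and the ADAS rows printed above
datadic <- data.frame(
  PHASE   = "ADNI1",
  TBLNAME = "ADAS",
  FLDNAME = c("PTID", "RID", "VISCODE", "VISDATE"),
  TYPE    = c(NA, "N", "T", "D"),
  stringsAsFactors = FALSE
)

# Return the dictionary rows for one table and a set of field names.
# %in% never yields NA, so rows with a missing TBLNAME/FLDNAME are dropped.
lookup_fields <- function(dd, tbl, fields) {
  dd[dd$TBLNAME %in% tbl & dd$FLDNAME %in% fields, , drop = FALSE]
}

adas_ids <- lookup_fields(datadic, tbl = "ADAS", fields = c("RID", "VISCODE"))
print(adas_ids)
```

Applied to the real `DATADIC`, the result may contain one row per study phase for the same field, so an additional `PHASE` filter could be needed.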

Coded Values

Most of the raw study data contain variables with numerically coded values. These values are mapped (decoded) using the data dictionary and the corresponding study phase-specific mapping values. Variables whose values have been mapped carry a Decoded Value:Yes tag in the data documentation. For instance, the CDSOURCE variable in ADNIMERGE2::CDR has such a tag, as shown in the Figure below within the red box.

We recommend verifying the values of any numerically coded variables that have not been mapped/decoded, using either the data dictionary in this package (ADNIMERGE2::DATADIC) or the one on the data-sharing platform.
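Decoding relies on the `CODE` column of the data dictionary, such as the ADAS word-list entry beginning `1=BU…` shown earlier. The sketch below parses such a string into `prefix`/`suffix` pairs in base R. It assumes pairs are separated by `;` and written as `<code>=<label>`; that format, the helper name `parse_code_string()`, and the example string are assumptions for illustration, not the package's implementation.

```r
# Parse a data-dictionary CODE string into a prefix/suffix lookup table.
# Assumption: pairs are separated by ";" and each pair is "<code>=<label>".
parse_code_string <- function(code) {
  pairs <- trimws(strsplit(code, ";", fixed = TRUE)[[1]])
  # Keep only well-formed pairs that contain an "="
  pairs <- pairs[grepl("=", pairs, fixed = TRUE)]
  # Split at the first "=" only, so labels may themselves contain "="
  eq <- regexpr("=", pairs, fixed = TRUE)
  data.frame(
    prefix = substr(pairs, 1, eq - 1),
    suffix = substr(pairs, eq + 1, nchar(pairs)),
    stringsAsFactors = FALSE
  )
}

codes <- parse_code_string("1=BUTTER; 2=ARM; 3=SHORE")
print(codes)
```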

To get the code values for all coded variables:

library(ADNIMERGE2)
library(dplyr)  # for %>%, relocate() and select()

# Get variable code values for all available data based on the DATADIC
data_dict_codes <- get_factor_levels_datadict(
  .datadic = ADNIMERGE2::DATADIC,
  tbl_name = NULL,
  nested_value = FALSE
)

class(data_dict_codes)
#> [1] "tbl_df"       "tbl"          "data.frame"   "datadict_tbl"

data_dict_codes %>%
  datadict_as_tibble() %>%
  relocate(prefix, suffix) %>%
  head()
#> # A tibble: 6 × 15
#>   prefix suffix PHASE CRFNAME  TBLNAME FLDNAME TEXT  TYPE  LENGTH DD_CRF_VERSION
#>   <chr>  <chr>  <chr> <chr>    <chr>   <chr>   <chr> <chr> <chr>  <chr>         
#> 1 1      BUTTER ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> 2 2      ARM    ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> 3 3      SHORE  ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> 4 4      LETTER ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> 5 5      QUEEN  ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> 6 6      CABIN  ADNI1 ADAS-Co… ADAS    COT1LI… Tria… t     20     NA            
#> # ℹ 5 more variables: UNITS <chr>, STATUS <chr>, CODE_CHANGES <chr>,
#> #   MAPPING_NOTES <chr>, class_type <chr>
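Once the `prefix`/`suffix` pairs are available, decoding a raw column amounts to matching each raw value against `prefix` and returning the corresponding `suffix`. The base-R sketch below hard-codes the six pairs printed above and uses hypothetical raw values; `decode_values()` is an illustrative helper, and the package's own decoding may handle phase-specific codes differently.

```r
# Prefix/suffix pairs copied from the six rows printed above
lookup <- data.frame(
  prefix = c("1", "2", "3", "4", "5", "6"),
  suffix = c("BUTTER", "ARM", "SHORE", "LETTER", "QUEEN", "CABIN"),
  stringsAsFactors = FALSE
)

# Decode raw codes into a factor with levels in dictionary order;
# codes absent from the lookup become NA
decode_values <- function(x, lookup) {
  idx <- match(as.character(x), lookup$prefix)
  factor(lookup$suffix[idx], levels = lookup$suffix)
}

raw_codes <- c(1, 3, 3, 6, 9)  # hypothetical raw values; 9 is not a defined code
decoded <- decode_values(raw_codes, lookup)
print(decoded)
```

Values with no matching code (here `9`) become `NA`, which makes undocumented codes easy to spot during the verification step recommended above.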

Missing Values

Furthermore, the value -4 in most ADNI study phases and the value -1 in the ADNI1 phase are treated as missing values. These values are therefore converted to missing values (NA) in the package.

NOTE: The message ℹ No variable that contains this value "-4" in the following two R chunks indicates that no variables in the ADNIMERGE2::DXSUM dataset contain the specified values, because these values were already converted to missing values during the package build.

# Convert "-4" into missing value
convert_to_missing_value(
  .data = ADNIMERGE2::DXSUM,
  col_name = colnames(ADNIMERGE2::DXSUM),
  value = "-4",
  missing_char = NA,
  phase = adni_phase()
) %>%
  select(-PTID) %>%
  head()
#> ℹ No variable that contains this value "-4".
#> # A tibble: 6 × 41
#>   ORIGPROT COLPROT   RID VISCODE VISCODE2 EXAMDATE   DIAGNOSIS DXNORM DXNODEP
#>   <chr>    <fct>   <dbl> <chr>   <chr>    <date>     <chr>     <chr>  <chr>  
#> 1 ADNI1    ADNI1       2 bl      bl       2005-09-29 CN        Yes    NA     
#> 2 ADNI1    ADNI1       3 bl      bl       2005-09-30 Dementia  NA     NA     
#> 3 ADNI1    ADNI1       5 bl      bl       2005-09-30 CN        Yes    NA     
#> 4 ADNI1    ADNI1       8 bl      bl       2005-09-30 CN        Yes    NA     
#> 5 ADNI1    ADNI1       7 bl      bl       2005-10-06 Dementia  NA     NA     
#> 6 ADNI1    ADNI1      15 bl      bl       2005-10-18 CN        Yes    NA     
#> # ℹ 32 more variables: DXMCI <chr>, DXMDES <chr>, DXMPTR1 <chr>, DXMPTR2 <chr>,
#> #   DXMPTR3 <chr>, DXMPTR4 <chr>, DXMPTR5 <chr>, DXMPTR6 <chr>, DXMDUE <chr>,
#> #   DXMOTHET <chr>, DXDSEV <chr>, DXDDUE <chr>, DXAD <chr>, DXAPP <chr>,
#> #   DXAPROB <chr>, DXAPOSS <chr>, DXPARK <chr>, DXPDES <chr>, DXPCOG <chr>,
#> #   DXPATYP <chr>, DXDEP <chr>, DXOTHDEM <chr>, DXODES <chr>, DXCONFID <chr>,
#> #   ID <dbl>, SITEID <dbl>, USERDATE <date>, USERDATE2 <date>,
#> #   DD_CRF_VERSION_LABEL <chr>, LANGUAGE_CODE <chr>, HAS_QC_ERROR <chr>, …
# Convert "-1" into missing value in ADNI1 phase only
convert_to_missing_value(
  .data = ADNIMERGE2::DXSUM,
  col_name = NULL,
  value = "-1",
  missing_char = NA,
  phase = "ADNI1"
) %>%
  select(-PTID) %>%
  head()
#> ℹ No variable that contains this value "-1".
#> # A tibble: 6 × 41
#>   ORIGPROT COLPROT   RID VISCODE VISCODE2 EXAMDATE   DIAGNOSIS DXNORM DXNODEP
#>   <chr>    <fct>   <dbl> <chr>   <chr>    <date>     <chr>     <chr>  <chr>  
#> 1 ADNI1    ADNI1       2 bl      bl       2005-09-29 CN        Yes    NA     
#> 2 ADNI1    ADNI1       3 bl      bl       2005-09-30 Dementia  NA     NA     
#> 3 ADNI1    ADNI1       5 bl      bl       2005-09-30 CN        Yes    NA     
#> 4 ADNI1    ADNI1       8 bl      bl       2005-09-30 CN        Yes    NA     
#> 5 ADNI1    ADNI1       7 bl      bl       2005-10-06 Dementia  NA     NA     
#> 6 ADNI1    ADNI1      15 bl      bl       2005-10-18 CN        Yes    NA     
#> # ℹ 32 more variables: DXMCI <chr>, DXMDES <chr>, DXMPTR1 <chr>, DXMPTR2 <chr>,
#> #   DXMPTR3 <chr>, DXMPTR4 <chr>, DXMPTR5 <chr>, DXMPTR6 <chr>, DXMDUE <chr>,
#> #   DXMOTHET <chr>, DXDSEV <chr>, DXDDUE <chr>, DXAD <chr>, DXAPP <chr>,
#> #   DXAPROB <chr>, DXAPOSS <chr>, DXPARK <chr>, DXPDES <chr>, DXPCOG <chr>,
#> #   DXPATYP <chr>, DXDEP <chr>, DXOTHDEM <chr>, DXODES <chr>, DXCONFID <chr>,
#> #   ID <dbl>, SITEID <dbl>, USERDATE <date>, USERDATE2 <date>,
#> #   DD_CRF_VERSION_LABEL <chr>, LANGUAGE_CODE <chr>, HAS_QC_ERROR <chr>, …
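The conversion rule described above (`-4` as missing in all phases, `-1` as missing in ADNI1 only) can be sketched in base R as follows. This is not the implementation of `convert_to_missing_value()`: the toy data are hypothetical, and using `COLPROT` to identify the study phase is an assumption.

```r
# Hypothetical toy data with a study-phase column and two coded variables
dx <- data.frame(
  COLPROT = c("ADNI1", "ADNI1", "ADNI2", "ADNI2"),
  DXNORM  = c("1", "-1", "-1", "-4"),
  DXMCI   = c("-4", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Replace `value` with NA in `cols`, optionally only in rows whose
# phase column is in `phases` (NULL means all rows)
set_missing <- function(data, cols, value, phases = NULL, phase_col = "COLPROT") {
  rows <- if (is.null(phases)) rep(TRUE, nrow(data)) else data[[phase_col]] %in% phases
  for (col in cols) {
    hit <- rows & !is.na(data[[col]]) & data[[col]] == value
    data[[col]][hit] <- NA
  }
  data
}

cols <- c("DXNORM", "DXMCI")
dx <- set_missing(dx, cols, value = "-4")                    # all phases
dx <- set_missing(dx, cols, value = "-1", phases = "ADNI1")  # ADNI1 only
print(dx)
```

The ADNI2 `-1` in `DXNORM` is left unchanged, because the `-1` rule is restricted to ADNI1 rows.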

Derived Datasets

As part of developing the study data R package, some derived datasets were created using the pharmaverse workflow for illustration purposes. Detailed procedures for how these datasets are generated can be found in the package vignettes.

NOTE:

  • We recommend learning how these data are generated, and which raw data sources they come from, before using the derived data in any analysis.

  • Some of the derived datasets in the package may not fully comply with CDISC standards; those data are generated for illustration purposes only.

Articles

A few articles are included in the package vignettes to demonstrate how to use the study data R package to create simple summaries and analysis results.

  • Enrollment Summaries: Includes an enrollment summary by calendar month, and summaries of enrolled subjects' demographic and baseline characteristics by study phase and baseline diagnostic status (Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and Dementia (DEM)).

  • Longitudinal Clinical Cognitive Outcome Summaries: Currently includes a summary of the ADAS-Cognitive Behavior 13-item total score (ADAS-Cog) over time by baseline diagnostic status.

NOTE: The analysis results presented in the package vignettes were not pre-planned and are included for illustration purposes only.
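As a minimal illustration of an enrollment-by-calendar-month count, the sketch below tabulates the six baseline `EXAMDATE` values printed in the `DXSUM` output earlier. Treating the baseline exam date as the enrollment date is an assumption here; the Enrollment Summaries vignette may define enrollment differently.

```r
# Baseline exam dates taken from the first six DXSUM rows printed earlier
exam_dates <- as.Date(c("2005-09-29", "2005-09-30", "2005-09-30",
                        "2005-09-30", "2005-10-06", "2005-10-18"))

# Count enrollments per calendar month (YYYY-MM)
enroll_month <- format(exam_dates, "%Y-%m")
monthly_counts <- table(enroll_month)
print(monthly_counts)
```

Formatting dates as `YYYY-MM` strings keeps the months in chronological order when `table()` sorts them alphabetically.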