Index Tables Reference¶
This page provides a comprehensive reference for all index tables available in
idc-index. The documentation is automatically generated from the schemas
provided by idc-index-data (version 24.2.0).
Note: Column descriptions are sourced directly from the
idc-index-datapackage schemas. If you notice any missing or incorrect descriptions, please report them in the idc-index-data repository.
Overview¶
idc-index provides access to multiple index tables containing metadata about
the files available in IDC. Each table serves a specific purpose and contains
different metadata attributes.
Table Relationships¶
The following diagram shows how the different index tables relate to each other based on shared key columns. Only key columns used for joins are shown in the diagram; see the individual table sections below for complete column lists.
erDiagram
analysis_results_index {
STRING analysis_result_id
STRING source_DOI
}
ann_group_index {
STRING SeriesInstanceUID
}
ann_index {
STRING SeriesInstanceUID
}
clinical_index {
STRING collection_id
}
collections_index {
STRING collection_id
}
contrast_index {
STRING SeriesInstanceUID
}
ct_index {
STRING SeriesInstanceUID
}
index {
STRING PatientID
STRING SeriesInstanceUID
STRING StudyInstanceUID
STRING analysis_result_id
STRING collection_id
STRING crdc_series_uuid
STRING source_DOI
}
mr_index {
STRING SeriesInstanceUID
}
prior_versions_index {
STRING PatientID
STRING SeriesInstanceUID
STRING StudyInstanceUID
STRING collection_id
STRING crdc_series_uuid
}
pt_index {
STRING SeriesInstanceUID
}
rtstruct_index {
STRING SeriesInstanceUID
}
seg_index {
STRING SeriesInstanceUID
}
sm_index {
STRING SeriesInstanceUID
}
sm_instance_index {
STRING SOPInstanceUID
STRING SeriesInstanceUID
}
volume_geometry_index {
STRING SeriesInstanceUID
}
index ||--o{ prior_versions_index : PatientID
index ||--o{ prior_versions_index : SeriesInstanceUID
index ||--o{ prior_versions_index : StudyInstanceUID
index ||--o{ prior_versions_index : collection_id
index ||--o{ prior_versions_index : crdc_series_uuid
index ||--o{ collections_index : collection_id
index ||--o{ analysis_results_index : analysis_result_id
index ||--o{ analysis_results_index : source_DOI
index ||--o{ clinical_index : collection_id
index ||--o{ sm_index : SeriesInstanceUID
index ||--o{ sm_instance_index : SeriesInstanceUID
index ||--o{ seg_index : SeriesInstanceUID
index ||--o{ ann_index : SeriesInstanceUID
index ||--o{ ann_group_index : SeriesInstanceUID
index ||--o{ contrast_index : SeriesInstanceUID
index ||--o{ volume_geometry_index : SeriesInstanceUID
index ||--o{ rtstruct_index : SeriesInstanceUID
index ||--o{ ct_index : SeriesInstanceUID
index ||--o{ mr_index : SeriesInstanceUID
index ||--o{ pt_index : SeriesInstanceUID
prior_versions_index ||--o{ collections_index : collection_id
prior_versions_index ||--o{ clinical_index : collection_id
prior_versions_index ||--o{ sm_index : SeriesInstanceUID
prior_versions_index ||--o{ sm_instance_index : SeriesInstanceUID
prior_versions_index ||--o{ seg_index : SeriesInstanceUID
prior_versions_index ||--o{ ann_index : SeriesInstanceUID
prior_versions_index ||--o{ ann_group_index : SeriesInstanceUID
prior_versions_index ||--o{ contrast_index : SeriesInstanceUID
prior_versions_index ||--o{ volume_geometry_index : SeriesInstanceUID
prior_versions_index ||--o{ rtstruct_index : SeriesInstanceUID
prior_versions_index ||--o{ ct_index : SeriesInstanceUID
prior_versions_index ||--o{ mr_index : SeriesInstanceUID
prior_versions_index ||--o{ pt_index : SeriesInstanceUID
collections_index ||--o{ clinical_index : collection_id
sm_index ||--o{ sm_instance_index : SeriesInstanceUID
sm_index ||--o{ seg_index : SeriesInstanceUID
sm_index ||--o{ ann_index : SeriesInstanceUID
sm_index ||--o{ ann_group_index : SeriesInstanceUID
sm_index ||--o{ contrast_index : SeriesInstanceUID
sm_index ||--o{ volume_geometry_index : SeriesInstanceUID
sm_index ||--o{ rtstruct_index : SeriesInstanceUID
sm_index ||--o{ ct_index : SeriesInstanceUID
sm_index ||--o{ mr_index : SeriesInstanceUID
sm_index ||--o{ pt_index : SeriesInstanceUID
sm_instance_index ||--o{ seg_index : SeriesInstanceUID
sm_instance_index ||--o{ ann_index : SeriesInstanceUID
sm_instance_index ||--o{ ann_group_index : SeriesInstanceUID
sm_instance_index ||--o{ contrast_index : SeriesInstanceUID
sm_instance_index ||--o{ volume_geometry_index : SeriesInstanceUID
sm_instance_index ||--o{ rtstruct_index : SeriesInstanceUID
sm_instance_index ||--o{ ct_index : SeriesInstanceUID
sm_instance_index ||--o{ mr_index : SeriesInstanceUID
sm_instance_index ||--o{ pt_index : SeriesInstanceUID
seg_index ||--o{ ann_index : SeriesInstanceUID
seg_index ||--o{ ann_group_index : SeriesInstanceUID
seg_index ||--o{ contrast_index : SeriesInstanceUID
seg_index ||--o{ volume_geometry_index : SeriesInstanceUID
seg_index ||--o{ rtstruct_index : SeriesInstanceUID
seg_index ||--o{ ct_index : SeriesInstanceUID
seg_index ||--o{ mr_index : SeriesInstanceUID
seg_index ||--o{ pt_index : SeriesInstanceUID
ann_index ||--o{ ann_group_index : SeriesInstanceUID
ann_index ||--o{ contrast_index : SeriesInstanceUID
ann_index ||--o{ volume_geometry_index : SeriesInstanceUID
ann_index ||--o{ rtstruct_index : SeriesInstanceUID
ann_index ||--o{ ct_index : SeriesInstanceUID
ann_index ||--o{ mr_index : SeriesInstanceUID
ann_index ||--o{ pt_index : SeriesInstanceUID
ann_group_index ||--o{ contrast_index : SeriesInstanceUID
ann_group_index ||--o{ volume_geometry_index : SeriesInstanceUID
ann_group_index ||--o{ rtstruct_index : SeriesInstanceUID
ann_group_index ||--o{ ct_index : SeriesInstanceUID
ann_group_index ||--o{ mr_index : SeriesInstanceUID
ann_group_index ||--o{ pt_index : SeriesInstanceUID
contrast_index ||--o{ volume_geometry_index : SeriesInstanceUID
contrast_index ||--o{ rtstruct_index : SeriesInstanceUID
contrast_index ||--o{ ct_index : SeriesInstanceUID
contrast_index ||--o{ mr_index : SeriesInstanceUID
contrast_index ||--o{ pt_index : SeriesInstanceUID
volume_geometry_index ||--o{ rtstruct_index : SeriesInstanceUID
volume_geometry_index ||--o{ ct_index : SeriesInstanceUID
volume_geometry_index ||--o{ mr_index : SeriesInstanceUID
volume_geometry_index ||--o{ pt_index : SeriesInstanceUID
rtstruct_index ||--o{ ct_index : SeriesInstanceUID
rtstruct_index ||--o{ mr_index : SeriesInstanceUID
rtstruct_index ||--o{ pt_index : SeriesInstanceUID
ct_index ||--o{ mr_index : SeriesInstanceUID
ct_index ||--o{ pt_index : SeriesInstanceUID
mr_index ||--o{ pt_index : SeriesInstanceUID
Available Index Tables¶
index¶
This is the main metadata table provided by idc-index. Each row corresponds to a DICOM series, and contains attributes at the collection, patient, study, and series levels. The table also contains download-related attributes, such as the AWS S3 bucket and URL to download the series.
Columns¶
collection_id(STRING, NULLABLE): short string with the identifier of the collection the series belongs toanalysis_result_id(STRING, NULLABLE): this string is not empty if the specific series is part of an analysis results collection; analysis results can be added to a given collection over timePatientID(STRING, NULLABLE): identifier of the patient within the collection (DICOM attribute)SeriesInstanceUID(STRING, NULLABLE):StudyInstanceUID(STRING, NULLABLE): unique identifier of the DICOM study (DICOM attribute)source_DOI(STRING, NULLABLE): Digital Object Identifier of the dataset that contains the given series; follow this DOI to learn more about the activity that produced this seriesPatientAge(STRING, NULLABLE): age of the subject at the time of imaging (DICOM attribute)PatientSex(STRING, NULLABLE): subject sex (DICOM attribute)StudyDate(STRING, NULLABLE): date of the study (de-identified) (DICOM attribute)StudyDescription(STRING, NULLABLE): textual description of the study content (DICOM attribute)BodyPartExamined(STRING, NULLABLE): body part imaged (not applicable for SM series) (DICOM attribute)Modality(STRING, NULLABLE): acquisition modality (DICOM attribute)SOPClassUID(STRING, NULLABLE): SOP Class UID identifying the type of DICOM object (e.g., CT Image Storage, Segmentation Storage); more specific than Modality for distinguishing object types (DICOM attribute)sop_class_name(STRING, NULLABLE): human-readable name of the SOP Class (e.g., “CT Image Storage”, “Segmentation Storage”); derived from SOPClassUIDTransferSyntaxUID(STRING, NULLABLE): Transfer Syntax UID identifying the encoding of the stored instances (e.g., Explicit VR Little Endian, JPEG 2000, HTJ2K); comma-separated when a series contains instances with different encodings, which is common for SM (DICOM attribute)transfer_syntax_name(STRING, NULLABLE): human-readable name of the Transfer Syntax (e.g., “JPEG 2000”, “Explicit VR Little Endian”); comma-separated when a series contains instances with different encodings; derived from TransferSyntaxUIDManufacturer(STRING, NULLABLE): manufacturer of the equipment that produced the series (DICOM attribute)ManufacturerModelName(STRING, NULLABLE): model name of the equipment that produced the series (DICOM attribute)SeriesDate(STRING, NULLABLE): date of the series (de-identified) (DICOM attribute)SeriesDescription(STRING, NULLABLE): textual description of the series content (DICOM attribute)SeriesNumber(STRING, NULLABLE): series number (DICOM attribute)instanceCount(INTEGER, NULLABLE): number of instances in the serieslicense_short_name(STRING, NULLABLE): short name of the license that applies to this seriesseries_init_idc_version(INTEGER, NULLABLE): IDC data release version number when this series first appeared in IDC (integer, e.g., 1 for v1)series_revised_idc_version(INTEGER, NULLABLE): IDC data release version number when this series was most recently revised in IDC (integer, e.g., 24 for v24)aws_bucket(STRING, NULLABLE): name of the AWS S3 bucket that contains the seriescrdc_series_uuid(STRING, NULLABLE): unique identifier of the series within the IDCseries_aws_url(STRING, NULLABLE): public AWS S3 URL to download the series in bulk (each instance is a separate file)series_size_MB(FLOAT, NULLABLE): total size of the series in megabytes
prior_versions_index¶
Columns¶
collection_id(STRING, NULLABLE):PatientID(STRING, NULLABLE):SeriesInstanceUID(STRING, NULLABLE):StudyInstanceUID(STRING, NULLABLE):Modality(STRING, NULLABLE):gcs_bucket(STRING, NULLABLE):crdc_series_uuid(STRING, NULLABLE):series_size_MB(FLOAT, NULLABLE):series_aws_url(STRING, NULLABLE):gcs_bucket_1(STRING, NULLABLE):aws_bucket(STRING, NULLABLE):min_idc_version(INTEGER, NULLABLE):max_idc_version(INTEGER, NULLABLE):
analysis_results_index¶
This table contains metadata about the analysis results collections available in IDC. Each row corresponds to an analysis results collection, and contains attributes such as the collection name, types of cancer represented, number of subjects, and pointers to the resources to learn more about the content of the collection
Columns¶
analysis_result_id(STRING, NULLABLE): unique identifier of the analysis results collectionanalysis_result_title(STRING, NULLABLE): name of the analysis results collectionsource_DOI(STRING, NULLABLE): Digital Object Identifier (DOI) of the analysis results collectionsource_url(STRING, NULLABLE): URL for the location of additional information about the analysis results collectionsubjects(INTEGER, NULLABLE): number of subjects analyzed in the analysis results collectioncollections(STRING, NULLABLE): collections analyzed in the analysis results collectionmodalities(STRING, NULLABLE): modalities corresponding to the analysis artifacts included in the analysis results collectionupdated(STRING, NULLABLE): timestamp of the last update to the analysis results collectionlicense_url(STRING, NULLABLE): license URL for the analysis results collectionlicense_long_name(STRING, NULLABLE): license name for the analysis results collectionlicense_short_name(STRING, NULLABLE): short name for the license of the analysis results collectiondescription(STRING, NULLABLE): detailed description of the analysis results collectioncitation(STRING, NULLABLE): citation for the analysis results collection that should be used for acknowledgment
ann_group_index¶
This table contains detailed metadata about individual annotation groups within Microscopy Bulk Simple Annotations (ANN) series in IDC. Each row corresponds to a single annotation group, providing granular information about the graphic type, number of annotations, property codes, and algorithm details. This table can be joined with ann_index using SeriesInstanceUID for series-level context. Note: ANN series are assumed to contain a single instance.
Columns¶
SeriesInstanceUID(STRING, NULLABLE):AnnotationGroupNumber(INTEGER, NULLABLE):AnnotationGroupUID(STRING, NULLABLE):AnnotationGroupLabel(STRING, NULLABLE):AnnotationGroupGenerationType(STRING, NULLABLE):NumberOfAnnotations(INTEGER, NULLABLE):GraphicType(STRING, NULLABLE):AnnotationPropertyCategory_code(STRING, NULLABLE): annotation property category code tuple (CodingSchemeDesignator:CodeValue) from DICOM AnnotationPropertyCategoryCodeSequenceAnnotationPropertyCategory_CodeMeaning(STRING, NULLABLE): human-readable meaning of the annotation property category from DICOM AnnotationPropertyCategoryCodeSequenceAnnotationPropertyType_code(STRING, NULLABLE): annotation property type code tuple (CodingSchemeDesignator:CodeValue) from DICOM AnnotationPropertyTypeCodeSequenceAnnotationPropertyType_CodeMeaning(STRING, NULLABLE): human-readable meaning of the annotation property type from DICOM AnnotationPropertyTypeCodeSequenceAlgorithmName(STRING, NULLABLE): name of the algorithm from DICOM AlgorithmName attribute in AnnotationGroupAlgorithmIdentificationSequence (when AnnotationGroupGenerationType is AUTOMATIC)
ann_index¶
This table contains metadata about the Microscopy Bulk Simple Annotations (ANN) series available in IDC. Each row corresponds to a DICOM series containing annotations, and includes attributes such as the annotation coordinate type and references to the annotated image series. For detailed group-level information (counts, graphic types, property codes), join with ann_group_index using SeriesInstanceUID. This table can be joined with the main idc_index table using the SeriesInstanceUID column. Note: ANN series are assumed to contain a single instance.
Columns¶
SeriesInstanceUID(STRING, NULLABLE):AnnotationCoordinateType(STRING, NULLABLE):referenced_SeriesInstanceUID(STRING, NULLABLE): SeriesInstanceUID of the referenced image series that the annotations apply to
clinical_index¶
This table contains metadata about the tabular data, including clinical data,
accompanying images that is available in IDC. Think about this table as a
dictionary containing information about the columns for all of the tabular data
accompanying individual collections in IDC. Each row corresponds to a unique
combination of collection, clinical data table that is available for that
collection, and a column from that table. Individual tables referenced from this
table can be retrieved using idc-index get_clinical_table() function.
Columns¶
collection_id(STRING, NULLABLE): unique identifier of the collectiontable_name(STRING, NULLABLE): full name of the table in which the column is storedshort_table_name(STRING, NULLABLE): short name of the table in which the column is storedcolumn(STRING, NULLABLE):column_label(STRING, NULLABLE): human readable name of the columnvalues(RECORD, REPEATED):
collections_index¶
This table contains metadata about the collections available in IDC. Each row corresponds to a collection, and contains attributes such as the collection name, types of cancer represented, number of subjects, and pointers to the resources to learn more about the content of the collection.
Columns¶
collection_name(STRING, NULLABLE): name of the collectioncollection_id(STRING, NULLABLE): unique identifier of the collectioncancer_types(STRING, NULLABLE): types of cancer represented in the collectiontumor_locations(STRING, NULLABLE): locations of tumors represented in the collectionsubjects(INTEGER, NULLABLE): number of subjects in the collectionspecies(STRING, NULLABLE): species represented in the collectionsources(RECORD, REPEATED): sources of data for the collectionsupporting_data(STRING, NULLABLE): additional data supporting the collection available in IDCprogram_id(STRING, NULLABLE): broader initiative/category under which this collection is being sharedstatus(STRING, NULLABLE): status of the collection (Completed or Ongoing)updated(STRING, NULLABLE): date of the last update to the collectiondescription(STRING, NULLABLE): detailed information about the collection
contrast_index¶
This table contains one row per DICOM series that has contrast agent information. It captures contrast bolus metadata from CT, MR, PT, XA, and RF imaging modalities, including the agent name, ingredient, and administration route. Only series with at least one non-null contrast attribute are included. This table can be joined with the main idc_index table using the SeriesInstanceUID column.
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID identifier of the imaging seriesContrastBolusAgent(STRING, REPEATED): distinct contrast agent names used in the series as defined in DICOM ContrastBolusAgent attributeContrastBolusIngredient(STRING, REPEATED): distinct contrast agent ingredients used in the series as defined in DICOM ContrastBolusIngredient attributeContrastBolusRoute(STRING, REPEATED): distinct contrast administration routes used in the series as defined in DICOM ContrastBolusRoute attribute
ct_index¶
This table contains one row per CT Image Storage (SOPClassUID 1.2.840.10008.5.1.4.1.1.2) DICOM series in IDC, capturing acquisition and reconstruction parameters that are not included in the main idc_index table. The index can be joined to idc_index on SeriesInstanceUID to combine universal series metadata with CT-specific acquisition parameters. For XRayTubeCurrent, Exposure, and ExposureTime — which vary across instances within a series due to dose modulation — both min and max values are reported. All other attributes are aggregated with ANY_VALUE (one representative instance).
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID — unique identifier of the CT series; use to join with idc_indexImageType(STRING, REPEATED): image type values as defined in DICOM ImageType attribute (e.g., ORIGINAL/DERIVED, PRIMARY/SECONDARY, AXIAL/LOCALIZER); aggregated with ANY_VALUE — constant across instances within a seriesPixelSpacing_row_mm(FLOAT, NULLABLE): in-plane pixel spacing along the row direction in mm, derived from DICOM PixelSpacing[0]; a subset of CT series have anisotropic spacing where this differs from PixelSpacing_col_mm; aggregated with ANY_VALUE — constant across instances within a seriesPixelSpacing_col_mm(FLOAT, NULLABLE): in-plane pixel spacing along the column direction in mm, derived from DICOM PixelSpacing[1]; a subset of CT series have anisotropic spacing where this differs from PixelSpacing_row_mm; aggregated with ANY_VALUE — constant across instances within a seriesRows(INTEGER, NULLABLE):Columns(INTEGER, NULLABLE):SliceThickness(FLOAT, NULLABLE): nominal reconstructed slice thickness in mm as defined in DICOM SliceThickness attribute; aggregated with ANY_VALUE — constant across instances within a seriesKVP(FLOAT, NULLABLE): peak kilovoltage of the X-ray tube in kV as defined in DICOM KVP attribute; constant across instances — aggregated with ANY_VALUEScanOptions(STRING, REPEATED): acquisition scan options as defined in DICOM ScanOptions attribute (e.g., HELICAL MODE, AXIAL MODE, SCOUT MODE); may contain multiple values; aggregated with ANY_VALUE — constant across instances within a seriesConvolutionKernel(STRING, REPEATED): reconstruction convolution kernel as defined in DICOM ConvolutionKernel attribute; vendor-specific string (e.g., B30f, STANDARD, LUNG); may contain multiple values; aggregated with ANY_VALUE — constant across instances within a seriesGantryDetectorTilt(FLOAT, NULLABLE): nominal angle of the scanning gantry in degrees as defined in DICOM GantryDetectorTilt attribute; non-zero for gantry-tilted acquisitions; constant across instances — aggregated with ANY_VALUEXRayTubeCurrent_min(FLOAT, NULLABLE): minimum X-ray tube current in mA across all instances in the series, derived from DICOM XRayTubeCurrent attribute; equals XRayTubeCurrent_max for fixed-current acquisitions; lower than XRayTubeCurrent_max for dose-modulated acquisitions (MIN across all instances)XRayTubeCurrent_max(FLOAT, NULLABLE): maximum X-ray tube current in mA across all instances in the series, derived from DICOM XRayTubeCurrent attribute; equals XRayTubeCurrent_min for fixed-current acquisitions; higher than XRayTubeCurrent_min for dose-modulated acquisitions (MAX across all instances)FilterType(STRING, NULLABLE): type of filter used in the acquisition as defined in DICOM FilterType attribute (e.g., WEDGE, BUTTERFLY, FLAT); constant across instances — aggregated with ANY_VALUEExposure_min(FLOAT, NULLABLE): minimum exposure in mAs across all instances in the series, derived from DICOM Exposure attribute; equals Exposure_max for fixed-exposure acquisitions; lower than Exposure_max for dose-modulated acquisitions (MIN across all instances)Exposure_max(FLOAT, NULLABLE): maximum exposure in mAs across all instances in the series, derived from DICOM Exposure attribute; equals Exposure_min for fixed-exposure acquisitions; higher than Exposure_min for dose-modulated acquisitions (MAX across all instances)ExposureTime_min(FLOAT, NULLABLE): minimum duration of the X-ray exposure in ms across all instances in the series, derived from DICOM ExposureTime attribute (MIN across all instances)ExposureTime_max(FLOAT, NULLABLE): maximum duration of the X-ray exposure in ms across all instances in the series, derived from DICOM ExposureTime attribute (MAX across all instances)DataCollectionDiameter(FLOAT, NULLABLE): diameter of the region over which data were collected in mm as defined in DICOM DataCollectionDiameter attribute; constant across instances — aggregated with ANY_VALUEReconstructionDiameter(FLOAT, NULLABLE): diameter of the reconstruction field of view in mm as defined in DICOM ReconstructionDiameter attribute; aggregated with ANY_VALUE — constant across instances within a seriesSpiralPitchFactor(FLOAT, NULLABLE): ratio of the beam pitch for helical CT as defined in DICOM SpiralPitchFactor attribute; NULL for non-helical (sequential/axial) acquisitions; constant across instances — aggregated with ANY_VALUE
mr_index¶
This table contains one row per MR Image Storage (SOPClassUID 1.2.840.10008.5.1.4.1.1.4) DICOM series in IDC, capturing MR acquisition and sequence parameters that are not included in the main idc_index table. The index can be joined to idc_index on SeriesInstanceUID to combine universal series metadata with MR-specific acquisition parameters. EchoTime and DiffusionBValue are reported as arrays of all distinct per-instance values because they legitimately differ across instances within multi-echo and diffusion-weighted series respectively. All other attributes are aggregated with ANY_VALUE (one representative instance).
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID — unique identifier of the MR series; use to join with idc_indexMagneticFieldStrength(FLOAT, NULLABLE): static magnetic field strength in Tesla as defined in DICOM MagneticFieldStrength attribute; constant across instances within a series — aggregated with ANY_VALUEScanningSequence(STRING, REPEATED): pulse sequence type as defined in DICOM ScanningSequence attribute (SE = Spin Echo, GR = Gradient Recalled, IR = Inversion Recovery, EP = Echo Planar); may contain multiple values; constant across instances — aggregated with ANY_VALUESequenceVariant(STRING, REPEATED): variant of the scanning sequence as defined in DICOM SequenceVariant attribute (SK = Segmented k-Space, MTC = Magnetization Transfer Contrast, SS = Steady State, TRSS = Time Reversed Steady State, SP = Spoiled, MP = MAG Prepared, OSP = Oversampling Phase, NONE = No sequence variant); may contain multiple values; constant across instances — aggregated with ANY_VALUEMRAcquisitionType(STRING, NULLABLE): whether the acquisition is 2D or 3D as defined in DICOM MRAcquisitionType attribute; constant across instances — aggregated with ANY_VALUEEchoTime(FLOAT, REPEATED): distinct echo times in ms present in the series, derived from DICOM EchoTime attribute; aggregated as ARRAY_AGG(DISTINCT) across all instances because EchoTime legitimately varies in multi-echo sequences; single-element array for single-echo series, multi-element array for multi-echo series (e.g., [2.46, 4.92, 7.38])RepetitionTime(FLOAT, NULLABLE): repetition time in ms as defined in DICOM RepetitionTime attribute; aggregated with ANY_VALUE — constant across instances within a seriesEchoTrainLength(INTEGER, NULLABLE): number of echoes in the echo train as defined in DICOM EchoTrainLength attribute; constant across instances — aggregated with ANY_VALUEFlipAngle(FLOAT, NULLABLE): flip angle in degrees as defined in DICOM FlipAngle attribute; aggregated with ANY_VALUE — constant across instances within a seriesPixelBandwidth(FLOAT, NULLABLE): receiver bandwidth per pixel in Hz as defined in DICOM PixelBandwidth attribute; aggregated with ANY_VALUE — constant across instances within a seriesImagingFrequency(FLOAT, NULLABLE): Larmor resonance frequency in MHz as defined in DICOM ImagingFrequency attribute; proportional to MagneticFieldStrength (42.577 MHz/T for proton); constant across instances — aggregated with ANY_VALUEImagedNucleus(STRING, NULLABLE): nucleus used for imaging as defined in DICOM ImagedNucleus attribute (e.g., 1H for proton, 31P, 23Na); constant across instances — aggregated with ANY_VALUEPixelSpacing_row_mm(FLOAT, NULLABLE): in-plane pixel spacing along the row direction in mm, derived from DICOM PixelSpacing[0]; MR pixel spacing is isotropic in almost all series in IDC; aggregated with ANY_VALUE — constant across instances within a seriesPixelSpacing_col_mm(FLOAT, NULLABLE): in-plane pixel spacing along the column direction in mm, derived from DICOM PixelSpacing[1]; MR pixel spacing is isotropic in almost all series in IDC; aggregated with ANY_VALUE — constant across instances within a seriesRows(INTEGER, NULLABLE):Columns(INTEGER, NULLABLE):SliceThickness(FLOAT, NULLABLE): nominal slice thickness in mm as defined in DICOM SliceThickness attribute; aggregated with ANY_VALUE — constant across instances within a seriesInversionTime(FLOAT, NULLABLE): inversion time in ms as defined in DICOM InversionTime attribute; populated only for inversion recovery sequences, NULL otherwise; constant across instances — aggregated with ANY_VALUEReceiveCoilName(STRING, NULLABLE): name of the receiver coil used as defined in DICOM ReceiveCoilName attribute; aggregated with ANY_VALUE — constant across instances within a seriesSequenceName(STRING, NULLABLE): manufacturer-specific pulse sequence name as defined in DICOM SequenceName attribute (e.g., tfl for Siemens FLASH, ep_b1000 for Siemens EPI diffusion); aggregated with ANY_VALUE — constant across instances within a seriesDiffusionBValue(FLOAT, REPEATED): distinct diffusion b-values in s/mm² present in the series, derived from DICOM DiffusionBValue attribute; aggregated as ARRAY_AGG(DISTINCT) across all instances because DiffusionBValue legitimately varies across instances in diffusion-weighted series; empty array for non-DWI series; multi-element for DWI (e.g., [0.0, 1000.0])NumberOfTemporalPositions(INTEGER, NULLABLE): number of temporal positions (time frames) in the series as defined in DICOM NumberOfTemporalPositions attribute; populated for dynamic (DCE-MRI) series; NULL otherwise; constant across instances — aggregated with ANY_VALUE
pt_index¶
This table contains one row per Positron Emission Tomography Image Storage (SOPClassUID 1.2.840.10008.5.1.4.1.1.128) DICOM series in IDC, capturing PET acquisition, reconstruction and radiopharmaceutical parameters that are not included in the main idc_index table. The index can be joined to idc_index on SeriesInstanceUID to combine universal series metadata with PET-specific acquisition parameters. ActualFrameDuration is reported as an array of all distinct per-instance values because it legitimately varies across frames in dynamic (multi-frame) PET acquisitions. All other attributes are constant within a series and are aggregated with ANY_VALUE.
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID — unique identifier of the PET series; use to join with idc_indexSeriesType(STRING, NULLABLE): acquisition type of the series as defined in DICOM SeriesType attribute, encoded as a slash-separated string of the two type values (e.g., STATIC/IMAGE, DYNAMIC/IMAGE, GATED/IMAGE, WHOLE BODY/IMAGE); constant across instances — aggregated with ANY_VALUE(ARRAY_TO_STRING(SeriesType, ‘/’))Units(STRING, NULLABLE): pixel value units as defined in DICOM Units attribute (e.g., BQML = Bq/mL, CNTS = counts, CPS = counts/s, GML = g/mL); constant across instances — aggregated with ANY_VALUEDecayCorrection(STRING, NULLABLE): type of decay correction applied as defined in DICOM DecayCorrection attribute (START = corrected to scan start time, ADMIN = corrected to radiopharmaceutical administration time, NONE = no correction); constant across instances — aggregated with ANY_VALUECorrectedImage(STRING, REPEATED): list of corrections applied to the image as defined in DICOM CorrectedImage attribute (e.g., ATTN = attenuation, SCAT = scatter, DECY = decay, RAN = randoms); may contain multiple values; constant across instances — aggregated with ANY_VALUERandomsCorrectionMethod(STRING, NULLABLE): method used for randoms correction as defined in DICOM RandomsCorrectionMethod attribute; constant across instances — aggregated with ANY_VALUEReconstructionMethod(STRING, NULLABLE): reconstruction algorithm as defined in DICOM ReconstructionMethod attribute (e.g., OSEM, FBP); constant across instances — aggregated with ANY_VALUEActualFrameDuration(FLOAT, REPEATED): distinct actual frame durations in ms present in the series, derived from DICOM ActualFrameDuration attribute; aggregated as ARRAY_AGG(DISTINCT) across all instances because ActualFrameDuration legitimately varies across frames in dynamic PET acquisitions; single-element array for static PET, multi-element for dynamic PET with variable frame durationsScatterCorrectionMethod(STRING, NULLABLE): scatter correction method as defined in DICOM ScatterCorrectionMethod attribute; constant across instances — aggregated with ANY_VALUEAttenuationCorrectionMethod(STRING, NULLABLE): attenuation correction method as defined in DICOM AttenuationCorrectionMethod attribute; constant across instances — aggregated with ANY_VALUERadionuclideCodeMeaning(STRING, NULLABLE): code meaning of the radionuclide used, from RadiopharmaceuticalInformationSequence[0].RadionuclideCodeSequence[0].CodeMeaning; (e.g., Fluorine F18, Gallium Ga-68); constant across instances — aggregated with ANY_VALUERadionuclideTotalDose(FLOAT, NULLABLE): total administered dose of the radionuclide in Bq, from RadiopharmaceuticalInformationSequence[0].RadionuclideTotalDose; constant across instances — aggregated with ANY_VALUERadiopharmaceuticalStartTime(STRING, NULLABLE): time of radiopharmaceutical administration (injection time), from RadiopharmaceuticalInformationSequence[0].RadiopharmaceuticalStartTime; stored as STRING (HH:MM:SS.FFFFFF) because DICOM TIME type is not supported in parquet output; constant across instances — aggregated with ANY_VALUERadiopharmaceutical(STRING, NULLABLE): free-text name of the radiopharmaceutical as defined in RadiopharmaceuticalInformationSequence[0].Radiopharmaceutical (e.g., Fluorodeoxyglucose F^18^); values are not standardized across sites; see RadionuclideCodeMeaning for a more consistent alternative; constant across instances — aggregated with ANY_VALUEPixelSpacing_row_mm(FLOAT, NULLABLE): in-plane pixel spacing along the row direction in mm, derived from DICOM PixelSpacing[0]; PET pixel spacing is isotropic in almost all series in IDC; aggregated with ANY_VALUE — constant across instances within a seriesPixelSpacing_col_mm(FLOAT, NULLABLE): in-plane pixel spacing along the column direction in mm, derived from DICOM PixelSpacing[1]; PET pixel spacing is isotropic in almost all series in IDC; aggregated with ANY_VALUE — constant across instances within a seriesRows(INTEGER, NULLABLE):Columns(INTEGER, NULLABLE):SliceThickness(FLOAT, NULLABLE): nominal slice thickness in mm as defined in DICOM SliceThickness attribute; constant across instances — aggregated with ANY_VALUENumberOfSlices(INTEGER, NULLABLE): total number of slices in the series as defined in DICOM NumberOfSlices attribute; constant across instances — aggregated with ANY_VALUENumberOfTimeSlices(INTEGER, NULLABLE): number of time frames in the series as defined in DICOM NumberOfTimeSlices attribute; populated only for dynamic (multi-frame) PET series, NULL for static PET; constant across instances — aggregated with ANY_VALUE
rtstruct_index¶
This table contains one row per DICOM RT Structure Set (RTSTRUCT) SeriesInstanceUID available from IDC, and captures key metadata about the structure set including the number of ROIs, ROI names, generation algorithms, interpreted types, and the referenced image series. Note: multi-valued columns (ROINames, ROIGenerationAlgorithms, RTROIInterpretedTypes) are aggregated with DISTINCT independently, so positional correspondence between columns is not preserved.
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID identifier of the RT Structure Set seriestotal_rois(INTEGER, NULLABLE): Number of ROIs in the structure set obtained by counting distinct DICOM ROINumber values in the StructureSetROISequenceROINames(STRING, REPEATED): Array of distinct ROI names from DICOM ROIName attribute in StructureSetROISequence, e.g., [“GTV”, “Heart”, “Liver”, “PTV”]ROIGenerationAlgorithms(STRING, REPEATED): Array of distinct ROI generation algorithms from DICOM ROIGenerationAlgorithm attribute in StructureSetROISequence, e.g., [“AUTOMATIC”, “MANUAL”, “SEMIAUTOMATIC”]RTROIInterpretedTypes(STRING, REPEATED): Array of distinct ROI interpreted types from DICOM RTROIInterpretedType attribute in RTROIObservationsSequence, e.g., [“GTV”, “ORGAN”, “PTV”]referenced_SeriesInstanceUID(STRING, NULLABLE): SeriesInstanceUID of the referenced image series that the structure set applies to, extracted from DICOM ReferencedFrameOfReferenceSequence > RTReferencedStudySequence > RTReferencedSeriesSequence
seg_index¶
This table contains one row per DICOM Segmentation SeriesInstanceUID available from IDC, and captures key metadata about the segmentation series including the number of segments, segmentation type, algorithm type and name, and the segmented image series. Note: multi-valued columns (AlgorithmType, AlgorithmName, SegmentedPropertyCategory_CodeMeanings, etc.) are aggregated with DISTINCT independently, so positional correspondence between columns is not preserved. For example, the first value in AlgorithmType does not necessarily pair with the first value in AlgorithmName.
Columns¶
SeriesInstanceUID(STRING, NULLABLE): DICOM SeriesInstanceUID identifier of the segmentation seriesSegmentationType(STRING, NULLABLE): Type of segmentation as defined in DICOM SegmentationType attributetotal_segments(INTEGER, NULLABLE): Number of segments in the segmentation series obtained by counting distinct DICOM SegmentNumber values in the DICOM SegmentatationSequenceAlgorithmType(STRING, NULLABLE): Segmentation algorithm type as available in DICOM SegmentAlgorithmTypeAlgorithmName(STRING, NULLABLE): Segmentation algorithm name as available in DICOM SegmentAlgorithmNamesegmented_SeriesInstanceUID(STRING, NULLABLE): SeriesInstanceUID of the referenced image series that the segmentation applies toSegmentedPropertyCategory_CodeMeanings(STRING, REPEATED): Array of distinct CodeMeaning values from SegmentedPropertyCategoryCodeSequence across all segments in the series, representing the broad category of the segmented property as defined in DICOM PS3.3 C.8.20.2, e.g., [“Anatomical Structure”], [“Morphologically Altered Structure”]SegmentedPropertyType_CodeMeanings(STRING, REPEATED): Array of distinct CodeMeaning values from SegmentedPropertyTypeCodeSequence across all segments in the series, representing the specific type of the segmented property as defined in DICOM PS3.3 C.8.20.2, e.g., [“Liver”, “Kidney”, “Spleen”]AnatomicRegion_CodeMeanings(STRING, REPEATED): Array of distinct CodeMeaning values from AnatomicRegionSequence across all segments in the series, representing the anatomic location of the segmented structure as defined in DICOM PS3.3 C.8.20.2, e.g., [“Abdomen”], [“Thorax”, “Head”]
sm_index¶
This table contains metadata about the slide microscopy (SM) series available in
IDC. Each row corresponds to a DICOM series, and contains attributes specific to
SM series, such as the pixel spacing at the maximum resolution layer, the power
of the objective lens used to digitize the slide, and the anatomic location from
where the imaged specimen was collected. This table can be joined with the main
index table using the SeriesInstanceUID column.
Columns¶
SeriesInstanceUID(STRING, NULLABLE):ContainerIdentifier(STRING, NULLABLE): identifier of the container (e.g., glass slide) holding the specimen from DICOM ContainerIdentifier attributeembeddingMedium_CodeMeaning(STRING, REPEATED): embedding medium CodeMeaning from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequenceembeddingMedium_code_designator_value_str(STRING, REPEATED): embedding medium code tuple from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequencetissueFixative_CodeMeaning(STRING, REPEATED): tissue fixative CodeMeaning from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequencetissueFixative_code_designator_value_str(STRING, REPEATED): tissue fixative code tuple from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequencestaining_usingSubstance_CodeMeaning(STRING, REPEATED): staining substances CodeMeaning from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequencestaining_usingSubstance_code_designator_value_str(STRING, REPEATED): staining substances code tuple from DICOM SpecimenPreparationSequence in SpecimenDescriptionSequencemin_PixelSpacing_2sf(FLOAT, NULLABLE): pixel spacing in mm at the maximum resolution layer, rounded to 2 significant figures, derived from DICOM PixelSpacing attributemax_TotalPixelMatrixColumns(INTEGER, NULLABLE): width of the image at the maximum resolution from DICOM TotalPixelMatrixColumns attributemax_TotalPixelMatrixRows(INTEGER, NULLABLE): height of the image at the maximum resolution from DICOM TotalPixelMatrixRows attributeObjectiveLensPower(INTEGER, NULLABLE): power of the objective lens from DICOM ObjectiveLensPower attribute in OpticalPathSequenceprimaryAnatomicStructure_code_designator_value_str(STRING, NULLABLE): anatomic location code tuple from DICOM PrimaryAnatomicStructureSequence in SpecimenDescriptionSequenceprimaryAnatomicStructure_CodeMeaning(STRING, NULLABLE): anatomic location CodeMeaning from DICOM PrimaryAnatomicStructureSequence in SpecimenDescriptionSequenceprimaryAnatomicStructureModifier_code_designator_value_str(STRING, NULLABLE): specimen modifier code tuple from DICOM PrimaryAnatomicStructureModifierSequence (when available)primaryAnatomicStructureModifier_CodeMeaning(STRING, NULLABLE): specimen modifier CodeMeaning from DICOM PrimaryAnatomicStructureModifierSequence (when available)illuminationType_code_designator_value_str(STRING, NULLABLE): illumination type code tuple from DICOM IlluminationTypeCodeSequence in OpticalPathSequenceilluminationType_CodeMeaning(STRING, NULLABLE): illumination type CodeMeaning from DICOM IlluminationTypeCodeSequence in OpticalPathSequenceadmittingDiagnosis_code_designator_value_str(STRING, NULLABLE): admitting diagnosis code tuple from DICOM AdmittingDiagnosesCodeSequence (when available)admittingDiagnosis_CodeMeaning(STRING, NULLABLE): admitting diagnosis CodeMeaning from DICOM AdmittingDiagnosesCodeSequence (when available)
sm_instance_index¶
This table contains metadata about the slide microscopy (SM) series available in
IDC. Each row corresponds to an instance from a DICOM Slide Microscopy series
available from IDC, identified by SOPInstanceUID, and contains attributes
specific to SM series, such as the pixel spacing at the maximum resolution
layer, the power of the objective lens used to digitize the slide, and the
anatomic location from where the imaged specimen was collected. This table can
be joined with the main index table and/or with sm_index using the
SeriesInstanceUID column.
Columns¶
SOPInstanceUID(STRING, NULLABLE): unique identifier of the instanceSeriesInstanceUID(STRING, NULLABLE): unique identifier of the seriesembeddingMedium_CodeMeaning(STRING, REPEATED): embedding medium used for the slide preparationembeddingMedium_code_designator_value_str(STRING, REPEATED): embedding medium code tupletissueFixative_CodeMeaning(STRING, REPEATED): tissue fixative used for the slide preparationtissueFixative_code_designator_value_str(STRING, REPEATED): tissue fixative code tuplestaining_usingSubstance_CodeMeaning(STRING, REPEATED): staining substances used for the slide preparationstaining_usingSubstance_code_designator_value_str(STRING, REPEATED): staining using substance code tuplePixelSpacing_0(FLOAT, NULLABLE): pixel spacing in mm, rounded to 2 significant figuresImageType(STRING, REPEATED): DICOM ImageType attributeTransferSyntaxUID(STRING, NULLABLE): DICOM TransferSyntaxUID attributeinstance_size(INTEGER, NULLABLE): size of the instance file in bytesTotalPixelMatrixColumns(INTEGER, NULLABLE): number of columns in the imageTotalPixelMatrixRows(INTEGER, NULLABLE): number of rows in the imagecrdc_instance_uuid(STRING, NULLABLE): unique identifier of the instance within the IDC
version_metadata_index¶
This table contains metadata about each IDC data release version. Each row corresponds to one IDC version and captures when that version was created. This index can be used to correlate data in other indexes (which include idc_version columns) with the corresponding release timestamps.
Columns¶
idc_version(INTEGER, NULLABLE): IDC version number identifying the data releaseversion_timestamp(STRING, NULLABLE): timestamp when this IDC version was created
volume_geometry_index¶
This table contains one row per DICOM series from IDC for single-frame CT, MR, and PT SOP classes, with boolean columns characterizing the geometric properties of each series. The checks determine whether the series forms a regularly-spaced rectilinear 3D volume (consistent orientation, spacing, dimensions, and slice positions). Series that do not pass all checks may still be usable with additional processing such as resampling or acquisition geometry correction (e.g., for variable-spacing or gantry-tilted acquisitions). Oblique-aware: uses projection-based slice position computation, which handles gantry-tilted CT, oblique MR, and axial PET uniformly.
Columns¶
SeriesInstanceUID(STRING, NULLABLE): unique identifier of the DICOM seriessingle_orientation(BOOLEAN, NULLABLE): TRUE if all instances share the same ImageOrientationPatient (DICOM attribute)orthogonal_orientation(BOOLEAN, NULLABLE): TRUE if the cross product of row and column orientation vectors has unit magnitude (within orientationTolerance), confirming orthogonal direction cosinesunique_slice_positions(BOOLEAN, NULLABLE): TRUE if every instance has a distinct slice position along the volume normal, i.e. no duplicate or overlapping slicesconsistent_in_plane_row(BOOLEAN, NULLABLE): TRUE if the projection of ImagePositionPatient onto the row direction is constant across all instances (within inPlaneTolerance)consistent_in_plane_col(BOOLEAN, NULLABLE): TRUE if the projection of ImagePositionPatient onto the column direction is constant across all instances (within inPlaneTolerance)consistent_pixel_spacing(BOOLEAN, NULLABLE): TRUE if all instances share the same PixelSpacing (DICOM attribute)consistent_image_dimensions(BOOLEAN, NULLABLE): TRUE if all instances share the same Rows and Columns values (DICOM attributes)uniform_slice_spacing(BOOLEAN, NULLABLE): TRUE if the spacing between consecutive slices is constant (within relativeSliceTolerance, a relative fraction of expected spacing)obliquity_degrees(FLOAT, NULLABLE): angle in degrees between the slice normal and the nearest cardinal axis (X, Y, or Z in patient coordinates); 0 means pure axial, sagittal, or coronal; values above 0 indicate oblique acquisition or gantry tiltregularly_spaced_3d_volume(BOOLEAN, NULLABLE): TRUE if all individual checks pass, indicating the series forms a regularly-spaced rectilinear 3D volume that can be loaded directly into a 3D array without resampling