Dataset

The datasets provided for the challenge will be divided into:

  • A training set: A first group of datasets (about 25% of the available data), to be used for training, will be provided on August 26, 2011, along with manual annotations.
  • A test set: A second group of datasets will be provided on August 26, 2011. However, the annotations will ONLY be released on the day of the workshop.
  • A validation set: A small subset of frames (around 10) belonging to the test set will be randomly chosen. This data set will be provided on the day of the challenge.

Participants can use the training set provided by the organizers to tune their algorithm(s) on IVUS formats unfamiliar to them. However, if the datasets provided by the organizers do not contain sufficient frames for training a classifier, the participants are allowed to use their own training sets.

The data will have the following attributes:

  Attribute                                            Options
  IVUS System                                          BSX/VOLC/Other
  Frequency                                            20MHz/40MHz/
  Format                                               RF/DICOM/Both
  Number of frames in this sequence                    1-5 (2D) / 20-50 (3D)
  Plaque                                               No/Yes
  Bifurcation                                          No/Yes
  Stent                                                No/Yes
  Side Vessels                                         No/Yes
  Shadow Artifacts                                     No/Yes
  Guidewire Artifacts (not applicable for Volcano)     No/Yes
  Catheter near vessel wall                            No/Yes

A MATLAB script (the evaluation script) will be provided with the training set so that results are evaluated in a unified way. A strict data format will be defined in order to run automatic error computations. Participants will run their segmentation algorithm before the workshop and provide the contours obtained on the test set to the organizers before September 9, 2011 (HARD DEADLINE, no extension possible). The segmentation results will be evaluated using the evaluation script. For methods that require initialization before segmentation, such interaction will be allowed as long as a detailed description of the initialization process is submitted with the algorithm description. No manual edits to the results of the algorithm are allowed.

To assess the execution time of the methods, the participants are required to run their algorithm on the validation set on the day of the challenge and provide the resulting contours to the organizers. The validation set will be distributed at the beginning of the workshop, and the participants will have the full day to provide the results.
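
The text does not say how the organizers will measure execution time, so participants may want to record it themselves. The following Python sketch shows one way to time a segmentation routine frame by frame; `segment` is a hypothetical placeholder for the participant's own algorithm, and nothing here is part of the official evaluation script.

```python
import time


def time_segmentation(segment, frames):
    """Run `segment` on each frame and record the wall-clock time per frame.

    `segment` is a hypothetical callable (frame -> contours) standing in for
    the participant's own method; it is not defined by the challenge.
    """
    results, timings = [], []
    for frame in frames:
        start = time.perf_counter()
        results.append(segment(frame))
        timings.append(time.perf_counter() - start)  # seconds for this frame
    return results, timings
```

Reporting the per-frame timings alongside the contours keeps slow outlier frames visible, which a single total would hide.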


Dataset description

Four heterogeneous datasets have been prepared for the challenge. Each dataset consists of groups of five contiguous frames chosen at specific vessel locations, so that both temporal and spatial information can be exploited. Each frame is labeled as follows: frame_XX_YYYY_ZZZ, where XX is the patient number, YYYY is the frame number in the pullback, and ZZZ is the position of the frame within its group of five (001 to 005, with 003 being the middle frame).
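
As an illustration of the naming scheme, the following Python sketch splits a frame name into its three numeric fields. The digit widths (2, 4 and 3) are assumed from the XX / YYYY / ZZZ placeholders, and the names `FrameName` and `parse_frame_name` are hypothetical helpers, not part of the distributed material.

```python
import os
import re
from typing import NamedTuple

# ASSUMPTION: field widths follow the placeholders XX, YYYY and ZZZ exactly.
FRAME_RE = re.compile(r"^frame_(\d{2})_(\d{4})_(\d{3})$")


class FrameName(NamedTuple):
    xx: int    # XX field (patient number)
    yyyy: int  # YYYY field
    zzz: int   # ZZZ field

    @property
    def is_middle(self) -> bool:
        # The middle frame of each group of five carries the suffix _003.
        return self.zzz == 3


def parse_frame_name(name: str) -> FrameName:
    """Parse 'frame_XX_YYYY_ZZZ', with or without a directory or extension."""
    stem = os.path.splitext(os.path.basename(name))[0]
    match = FRAME_RE.match(stem)
    if match is None:
        raise ValueError(f"not a frame name: {name!r}")
    return FrameName(*(int(group) for group in match.groups()))
```

For example, `parse_frame_name("frame_01_0027_003.png")` yields the fields 1, 27 and 3, and its `is_middle` property is true.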

A description of the four datasets is provided below.
  1. Dataset_A is composed of 77 groups of five consecutive frames, obtained from a digital 40 MHz IVUS scanner, acquired from different patients. The middle frame (frame_XX_YYYY_003) is provided in both the DICOM and RF formats, while the other four frames (...._001, ...._002, ...._004 and ...._005) are provided only in the DICOM format (.png images). Each RF frame is provided as a .dat file built as a concatenation of 256 lines of 1024 samples each. Each line is obtained by sampling the RF signal with 12 bits at a sample rate of 200 Msamples/sec.
  2. Dataset_B is composed of 435 groups of five consecutive frames, obtained from a 20 MHz IVUS scanner, acquired from 10 patients. All the frames are provided in DICOM format (.png images). All the frames having the same XX number belong to the same pullback and the central frame "_003" has been chosen at subsequent gating positions.
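
The RF layout described for Dataset_A (256 lines of 1024 samples, sampled at 200 Msamples/sec) implies 262144 samples per frame and a line duration of 1024 / 200e6 s = 5.12 µs. The Python sketch below reads such a .dat file under two assumptions that the description does not confirm: each 12-bit sample is stored in a signed 16-bit little-endian word, and the lines are stored one after another. `read_rf_frame` is a hypothetical helper name.

```python
import array
import sys

SAMPLES_PER_LINE = 1024   # from the dataset description
LINES_PER_FRAME = 256     # from the dataset description
SAMPLE_RATE_HZ = 200e6    # 200 Msamples/sec

# Time spanned by one RF line: 1024 / 200e6 s = 5.12 microseconds.
LINE_DURATION_S = SAMPLES_PER_LINE / SAMPLE_RATE_HZ


def read_rf_frame(path):
    """Return the frame as a list of 256 lines, each a list of 1024 integers."""
    samples = array.array("h")  # ASSUMPTION: signed 16-bit storage words
    with open(path, "rb") as handle:
        samples.frombytes(handle.read())
    if sys.byteorder != "little":
        samples.byteswap()      # ASSUMPTION: file is little-endian
    expected = SAMPLES_PER_LINE * LINES_PER_FRAME
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, found {len(samples)}")
    # ASSUMPTION: line-major order (all samples of line 0, then line 1, ...).
    return [
        samples[i * SAMPLES_PER_LINE:(i + 1) * SAMPLES_PER_LINE].tolist()
        for i in range(LINES_PER_FRAME)
    ]
```

The size check is a cheap way to detect a wrong storage assumption: a file whose length is not 256 × 1024 × 2 bytes was not written in the assumed layout.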

For all four datasets, only the middle frame (…_003) is manually labeled, and only its contours should be reported as the result of the segmentation algorithm.

Two separate annotation files (one for lumen, one for media) are available in the directory "LABELS", each as a sequence of Cartesian coordinates. The corresponding files are named "lum_frame_XX_YYYY_003.txt" and "med_frame_XX_YYYY_003.txt", for lumen and media, respectively.
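
The following Python sketch builds the two annotation file names for a given frame and parses a contour into (x, y) pairs. The exact text layout of the annotation files is not specified here, so the parser assumes one point per line with the two coordinates separated by whitespace or a comma; `label_paths` and `parse_contour` are hypothetical helper names.

```python
import re


def label_paths(frame_stem, labels_dir="LABELS"):
    """Return the (lumen, media) annotation paths for e.g. 'frame_01_0027_003'."""
    return (
        f"{labels_dir}/lum_{frame_stem}.txt",
        f"{labels_dir}/med_{frame_stem}.txt",
    )


def parse_contour(text):
    """Parse contour text into a list of (x, y) float tuples.

    ASSUMPTION: one point per non-empty line, coordinates separated by
    whitespace or a comma. Adjust if the distributed files differ.
    """
    points = []
    for line in text.splitlines():
        fields = [field for field in re.split(r"[,\s]+", line.strip()) if field]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected two coordinates, got: {line!r}")
        points.append((float(fields[0]), float(fields[1])))
    return points
```

A contour file would then be loaded with `parse_contour(open(path).read())`, using one of the paths returned by `label_paths`.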

The data folder distributed to the participants contains the training set (about 25% of the whole data set), while the DCM/RF folders contain both the training set and the test set (the remaining 75% of the data set, not annotated, which should be segmented by each participant).