compressai.datasets#

Image/video datasets#

ImageFolder#

class compressai.datasets.ImageFolder(root, transform=None, split='train')[source]#

Load an image folder database. Training and testing image samples are respectively stored in separate directories:

- rootdir/
    - train/
        - img000.png
        - img001.png
    - test/
        - img000.png
        - img001.png
Parameters:
  • root (string) – root directory of the dataset

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘val’)
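
A minimal usage sketch; the root path, crop size, and batch size below are illustrative:

    from torch.utils.data import DataLoader
    from torchvision import transforms

    from compressai.datasets import ImageFolder

    # Random crops plus tensor conversion is a typical training transform.
    train_transforms = transforms.Compose(
        [transforms.RandomCrop(256), transforms.ToTensor()]
    )

    train_dataset = ImageFolder(
        "/path/to/rootdir", transform=train_transforms, split="train"
    )
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)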

PreGeneratedMemmapDataset#

class compressai.datasets.PreGeneratedMemmapDataset(root: str, transform=None, split: str = 'train', image_size: int | Tuple[int, int] = (256, 256))[source]#

A data loader for memory-mapped numpy arrays.

This allows for fast training where the image patches have already been extracted and shuffled. The numpy array is expected to have the following shape: NxHxWx3, with N the number of samples, and H and W the image dimensions.

Parameters:
  • root (string) – root directory where the numpy arrays are located.

  • transform (callable, optional) – a function or transform applied to each patch.

  • split (string) – split mode (‘train’ or ‘val’).

  • image_size (int or (int, int)) – size of the images in the array.
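
A sketch of pre-generating a compatible memory-mapped array with numpy; the file name and location are assumptions, so check the loader in your CompressAI version for the exact naming it expects:

    import numpy as np

    n, h, w = 100000, 256, 256  # number of patches and patch dimensions

    # Create an NxHxWx3 uint8 memory-mapped array on disk
    # ("train.npy" is a hypothetical file name).
    arr = np.lib.format.open_memmap(
        "/path/to/rootdir/train.npy",
        mode="w+",
        dtype=np.uint8,
        shape=(n, h, w, 3),
    )
    # ... fill arr[i] with pre-extracted, shuffled image patches ...
    arr.flush()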

VideoFolder#

class compressai.datasets.VideoFolder(root, rnd_interval=False, rnd_temp_order=False, transform=None, split='train')[source]#

Load a video folder database. Training and testing video clips are stored in a directory containing many sub-directories, following the Vimeo90K dataset layout:

- rootdir/
    - train.list
    - test.list
    - sequences/
        - 00010/
            ...
            - 0932/
            - 0933/
            ...
        - 00011/
            ...
        - 00012/
            ...

Training and testing (validation) clips are drawn from the sub-directories listed in the corresponding split files (train.list and test.list).

This class returns a tuple of three video frames. A random frame interval can be applied if a sub-directory contains more than 6 frames.

Parameters:
  • root (string) – root directory of the dataset

  • rnd_interval (bool) – enable random interval [1,2,3] when drawing sample frames

  • rnd_temp_order (bool) – randomize the temporal order of the sampled frames

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘test’)
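
A minimal usage sketch; the root path and crop size are illustrative, and the transform is assumed to be applied to the sampled frames:

    from torchvision import transforms

    from compressai.datasets import VideoFolder

    train_transforms = transforms.Compose(
        [transforms.ToTensor(), transforms.RandomCrop(256)]
    )

    train_dataset = VideoFolder(
        "/path/to/rootdir",
        rnd_interval=True,
        rnd_temp_order=True,
        transform=train_transforms,
        split="train",
    )
    frames = train_dataset[0]  # three video frames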

Vimeo90kDataset#

class compressai.datasets.Vimeo90kDataset(root, transform=None, split='train', tuplet=3)[source]#

Load a Vimeo-90K structured dataset.

Vimeo-90K dataset from Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman: “Video Enhancement with Task-Oriented Flow”, International Journal of Computer Vision (IJCV), 2019.

Training and testing image samples are respectively stored in separate directories:

- rootdir/
    - sequence/
        - 00001/001/im1.png
        - 00001/001/im2.png
        - 00001/001/im3.png
Parameters:
  • root (string) – root directory of the dataset

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘valid’)

  • tuplet (int) – order of dataset tuplet (e.g. 3 for “triplet” dataset)
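
A minimal usage sketch, assuming the Vimeo-90K triplet release has been extracted to the illustrative path below:

    from torchvision import transforms

    from compressai.datasets import Vimeo90kDataset

    dataset = Vimeo90kDataset(
        "/path/to/vimeo_triplet",
        transform=transforms.ToTensor(),
        split="train",
        tuplet=3,  # the "triplet" flavour of the dataset
    )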

Point cloud datasets#

ModelNetDataset#

class compressai.datasets.ModelNetDataset(root=None, cache_root=None, split='train', split_name=None, name='40', pre_transform=None, transform=None, download=True)[source]#

ModelNet dataset.

This dataset of 3D CAD models of objects was introduced by [Wu2015], consisting of 10 or 40 classes, with 4899 and 12311 aligned items, respectively. Each 3D model is represented in the OFF file format by a triangle mesh (i.e. faces) and has a single label (e.g. airplane). To convert the triangle meshes to point clouds, one may use a mesh sampling method (e.g. SamplePoints).

See also: [PapersWithCode_ModelNet].
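
A usage sketch for loading ModelNet40 as point clouds. SamplePoints here is the torch_geometric mesh-sampling transform suggested above; the root path, point count, and the assumption that it can be passed as pre_transform are illustrative:

    from torch_geometric.transforms import SamplePoints

    from compressai.datasets import ModelNetDataset

    dataset = ModelNetDataset(
        root="/data/modelnet40",
        name="40",  # "10" or "40" classes
        split="train",
        pre_transform=SamplePoints(num=1024),  # triangle mesh -> 1024 points
        download=True,
    )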

References

[Wu2015]

“3D ShapeNets: A deep representation for volumetric shapes,” by Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao, CVPR 2015.

S3disDataset#

class compressai.datasets.S3disDataset(root=None, cache_root=None, split='train', split_name=None, areas=(1, 2, 3, 4, 6), pre_transform=None, transform=None, download=True)[source]#

S3DIS dataset.

The Stanford 3D Indoor Scene Dataset (S3DIS), introduced by [Armeni2016], contains 3D point clouds of 6 large-scale indoor areas. There are multiple rooms (e.g. office, lounge, hallway, etc.) per area. See [ProjectPage_S3DIS] for a visualization.

The semantic_index is a number between 0 and 12 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_S3DIS].
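
A usage sketch following the common evaluation protocol of holding out area 5; the root path and split names are illustrative:

    from compressai.datasets import S3disDataset

    # Train on areas 1-4 and 6; hold out area 5 for evaluation.
    train_set = S3disDataset(root="/data/s3dis", split="train", areas=(1, 2, 3, 4, 6))
    test_set = S3disDataset(root="/data/s3dis", split="test", areas=(5,))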

References

[Armeni2016]

“3D Semantic Parsing of Large-Scale Indoor Spaces,” by Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese, CVPR 2016.

SemanticKittiDataset#

class compressai.datasets.SemanticKittiDataset(root=None, cache_root=None, split='train', split_name=None, sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10), pre_transform=None, transform=None, download=True)[source]#

SemanticKITTI dataset.

The KITTI dataset, introduced by [Geiger2012], contains 3D point cloud sequences (i.e. video) of LiDAR sensor data from the perspective of a driving vehicle. The SemanticKITTI dataset, introduced by [Behley2019] and [Behley2021], provides semantic annotations for all 22 sequences of the KITTI odometry task [Odometry_KITTI]. See [ProjectPage_SemanticKITTI] for a visualization. Note that the test set is unlabelled and must be evaluated on the server, as mentioned at [ProjectPageTasks_SemanticKITTI].

The semantic_index is a number between 0 and 33 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_SemanticKITTI].
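
A usage sketch; the default training sequences omit sequence 8, which is conventionally held out for validation (the root path is illustrative):

    from compressai.datasets import SemanticKittiDataset

    train_set = SemanticKittiDataset(
        root="/data/semantic_kitti",
        split="train",
        sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10),  # the defaults shown above
    )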

References

[Geiger2012]

“Are we ready for autonomous driving? The KITTI vision benchmark suite,” by Andreas Geiger, Philipp Lenz, and Raquel Urtasun, CVPR 2012.

[Behley2019]

“SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jürgen Gall, ICCV 2019.

[Behley2021]

“Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Jürgen Gall, and Cyrill Stachniss, IJRR 2021.

ShapeNetCorePartDataset#

class compressai.datasets.ShapeNetCorePartDataset(root=None, cache_root=None, split='train', split_name=None, pre_transform=None, transform=None, name='shapenetcore_partanno_segmentation_benchmark_v0_normal', download=True)[source]#

ShapeNet-Part dataset.

The ShapeNet dataset of 3D CAD models of objects was introduced by [Yi2016], consisting of over 3000000 models. The ShapeNetCore (v2) dataset is a “clean” subset of ShapeNet, consisting of 51127 aligned items from 55 object categories. The ShapeNet-Part dataset is a further subset of this dataset, consisting of 16881 items from 16 object categories. See page 2 of [Yi2017] for additional description.

Object categories are labeled with two to six segmentation parts each, as shown in the image below. (Purple represents a “miscellaneous” part.)

(Image: segmentation part counts per object category; see https://cs.stanford.edu/~ericyi/project_page/part_annotation/figures/categoriesNumbers.png)

[ProjectPage_ShapeNetPart] also releases a processed version of ShapeNet-Part containing point cloud and normals with expert-verified segmentations, which we use here.

The semantic_index is a number between 0 and 49 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_ShapeNetPart] (benchmarks).
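
A minimal usage sketch; the root path is illustrative, and name is left at the default processed release described above:

    from compressai.datasets import ShapeNetCorePartDataset

    dataset = ShapeNetCorePartDataset(
        root="/data/shapenet_part",
        split="train",
        name="shapenetcore_partanno_segmentation_benchmark_v0_normal",
        download=True,
    )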

References

[Yi2016]

“A scalable active framework for region annotation in 3D shape collections,” by Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas, ACM Transactions on Graphics, 2016.

[Yi2017]

“Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55,” by Li Yi et al., arXiv preprint, 2017.