compressai.datasets#

Image/video datasets#

ImageFolder#

class compressai.datasets.ImageFolder(root, transform=None, split='train')[source]#

Load an image folder database. Training and testing image samples are respectively stored in separate directories:

- rootdir/
    - train/
        - img000.png
        - img001.png
    - test/
        - img000.png
        - img001.png
Parameters:
  • root (string) – root directory of the dataset

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘val’)
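
A minimal usage sketch; the root path, crop size, and batch size below are illustrative:

    from torch.utils.data import DataLoader
    from torchvision import transforms

    from compressai.datasets import ImageFolder

    # Random crops plus tensor conversion is a typical training transform.
    train_transforms = transforms.Compose(
        [transforms.RandomCrop(256), transforms.ToTensor()]
    )

    train_dataset = ImageFolder(
        "/path/to/rootdir", transform=train_transforms, split="train"
    )
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)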

PreGeneratedMemmapDataset#

class compressai.datasets.PreGeneratedMemmapDataset(root: str, transform=None, split: str = 'train', image_size: int | Tuple[int, int] = (256, 256))[source]#

A data loader for memory-mapped numpy arrays.

This allows for fast training where the image patches have already been extracted and shuffled. The numpy array is expected to have the following shape: NxHxWx3, with N the number of samples, and H and W the image dimensions.

Parameters:
  • root (string) – root directory where the numpy arrays are located.

  • transform (callable, optional) – a function or transform applied to each patch.

  • split (string) – split mode (‘train’ or ‘val’).

  • image_size (int or (int, int)) – size of the images in the array.
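
A sketch of pre-generating a compatible memory-mapped array with numpy; the file name and location are assumptions, so check the loader in your CompressAI version for the exact naming it expects:

    import numpy as np

    n, h, w = 100000, 256, 256  # number of patches and patch dimensions

    # Create an NxHxWx3 uint8 memory-mapped array on disk
    # ("train.npy" is a hypothetical file name).
    arr = np.lib.format.open_memmap(
        "/path/to/rootdir/train.npy",
        mode="w+",
        dtype=np.uint8,
        shape=(n, h, w, 3),
    )
    # ... fill arr[i] with pre-extracted, shuffled image patches ...
    arr.flush()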

VideoFolder#

class compressai.datasets.VideoFolder(root, rnd_interval=False, rnd_temp_order=False, transform=None, split='train')[source]#

Load a video folder database. Training and testing video clips are stored in a directory containing many sub-directories, following the Vimeo90K dataset layout:

- rootdir/
    - train.list
    - test.list
    - sequences/
        - 00010/
            ...
            - 0932/
            - 0933/
            ...
        - 00011/
            ...
        - 00012/
            ...

Training and testing (validation) clips are drawn from the sub-directories listed in the corresponding split files (train.list and test.list).

This class returns a tuple of three video frames. A random frame interval can be applied if a sub-directory contains more than 6 frames.

Parameters:
  • root (string) – root directory of the dataset

  • rnd_interval (bool) – enable random interval [1,2,3] when drawing sample frames

  • rnd_temp_order (bool) – randomize the temporal order of the sampled frames

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘test’)
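
A minimal usage sketch; the root path and crop size are illustrative, and the transform is assumed to be applied to the sampled frames:

    from torchvision import transforms

    from compressai.datasets import VideoFolder

    train_transforms = transforms.Compose(
        [transforms.ToTensor(), transforms.RandomCrop(256)]
    )

    train_dataset = VideoFolder(
        "/path/to/rootdir",
        rnd_interval=True,
        rnd_temp_order=True,
        transform=train_transforms,
        split="train",
    )
    frames = train_dataset[0]  # three video frames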

Vimeo90kDataset#

class compressai.datasets.Vimeo90kDataset(root, transform=None, split='train', tuplet=3)[source]#

Load a Vimeo-90K structured dataset.

Vimeo-90K dataset from Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman: “Video Enhancement with Task-Oriented Flow”, International Journal of Computer Vision (IJCV), 2019.

Training and testing image samples are respectively stored in separate directories:

- rootdir/
    - sequence/
        - 00001/001/im1.png
        - 00001/001/im2.png
        - 00001/001/im3.png
Parameters:
  • root (string) – root directory of the dataset

  • transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version

  • split (string) – split mode (‘train’ or ‘valid’)

  • tuplet (int) – order of dataset tuplet (e.g. 3 for “triplet” dataset)
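
A minimal usage sketch, assuming the Vimeo-90K triplet release has been extracted to the illustrative path below:

    from torchvision import transforms

    from compressai.datasets import Vimeo90kDataset

    dataset = Vimeo90kDataset(
        "/path/to/vimeo_triplet",
        transform=transforms.ToTensor(),
        split="train",
        tuplet=3,  # the "triplet" flavour of the dataset
    )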

Point cloud datasets#

ModelNetDataset#

class compressai.datasets.ModelNetDataset(root=None, cache_root=None, split='train', split_name=None, name='40', pre_transform=None, transform=None, download=True)[source]#

ModelNet dataset.

This dataset of 3D CAD models of objects was introduced by [Wu2015], consisting of 10 or 40 classes, with 4899 and 12311 aligned items, respectively. Each 3D model is represented in the OFF file format by a triangle mesh (i.e. faces) and has a single label (e.g. airplane). To convert the triangle meshes to point clouds, one may use a mesh sampling method (e.g. SamplePoints).

See also: [PapersWithCode_ModelNet].
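
A usage sketch for loading ModelNet40 as point clouds. SamplePoints here is the torch_geometric mesh-sampling transform suggested above; the root path, point count, and the assumption that it can be passed as pre_transform are illustrative:

    from torch_geometric.transforms import SamplePoints

    from compressai.datasets import ModelNetDataset

    dataset = ModelNetDataset(
        root="/data/modelnet40",
        name="40",  # "10" or "40" classes
        split="train",
        pre_transform=SamplePoints(num=1024),  # triangle mesh -> 1024 points
        download=True,
    )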

References

[Wu2015]

“3D ShapeNets: A deep representation for volumetric shapes,” by Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao, CVPR 2015.

S3disDataset#

class compressai.datasets.S3disDataset(root=None, cache_root=None, split='train', split_name=None, areas=(1, 2, 3, 4, 6), pre_transform=None, transform=None, download=True)[source]#

S3DIS dataset.

The Stanford 3D Indoor Scene Dataset (S3DIS), introduced by [Armeni2016], contains 3D point clouds of 6 large-scale indoor areas. There are multiple rooms (e.g. office, lounge, hallway, etc.) per area. See [ProjectPage_S3DIS] for a visualization.

The semantic_index is a number between 0 and 12 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_S3DIS].
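
A usage sketch following the common evaluation protocol of holding out area 5; the root path and split names are illustrative:

    from compressai.datasets import S3disDataset

    # Train on areas 1-4 and 6; hold out area 5 for evaluation.
    train_set = S3disDataset(root="/data/s3dis", split="train", areas=(1, 2, 3, 4, 6))
    test_set = S3disDataset(root="/data/s3dis", split="test", areas=(5,))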

References

[Armeni2016]

“3D Semantic Parsing of Large-Scale Indoor Spaces,” by Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese, CVPR 2016.

SemanticKittiDataset#

class compressai.datasets.SemanticKittiDataset(root=None, cache_root=None, split='train', split_name=None, sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10), pre_transform=None, transform=None, download=True)[source]#

SemanticKITTI dataset.

The KITTI dataset, introduced by [Geiger2012], contains 3D point cloud sequences (i.e. video) of LiDAR sensor data from the perspective of a driving vehicle. The SemanticKITTI dataset, introduced by [Behley2019] and [Behley2021], provides semantic annotations for all 22 sequences of the KITTI odometry task [Odometry_KITTI]. See [ProjectPage_SemanticKITTI] for a visualization. Note that the test set is unlabelled and must be evaluated on the server, as mentioned at [ProjectPageTasks_SemanticKITTI].

The semantic_index is a number between 0 and 33 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_SemanticKITTI].
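
A usage sketch; the default training sequences omit sequence 8, which is conventionally held out for validation (the root path is illustrative):

    from compressai.datasets import SemanticKittiDataset

    train_set = SemanticKittiDataset(
        root="/data/semantic_kitti",
        split="train",
        sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10),  # the defaults shown above
    )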

References

[Geiger2012]

“Are we ready for autonomous driving? The KITTI vision benchmark suite,” by Andreas Geiger, Philipp Lenz, and Raquel Urtasun, CVPR 2012.

[Behley2019]

“SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jürgen Gall, ICCV 2019.

[Behley2021]

“Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Jürgen Gall, and Cyrill Stachniss, IJRR 2021.

ShapeNetCorePartDataset#

class compressai.datasets.ShapeNetCorePartDataset(root=None, cache_root=None, split='train', split_name=None, pre_transform=None, transform=None, name='shapenetcore_partanno_segmentation_benchmark_v0_normal', download=True)[source]#

ShapeNet-Part dataset.

The ShapeNet dataset of 3D CAD models of objects was introduced by [Yi2016], consisting of over 3000000 models. The ShapeNetCore (v2) dataset is a “clean” subset of ShapeNet, consisting of 51127 aligned items from 55 object categories. The ShapeNet-Part dataset is a further subset of this dataset, consisting of 16881 items from 16 object categories. See page 2 of [Yi2017] for additional description.

Object categories are labeled with two to six segmentation parts each, as shown in the image below. (Purple represents a “miscellaneous” part.)

(Image: segmentation part counts per object category; see https://cs.stanford.edu/~ericyi/project_page/part_annotation/figures/categoriesNumbers.png)

[ProjectPage_ShapeNetPart] also releases a processed version of ShapeNet-Part containing point cloud and normals with expert-verified segmentations, which we use here.

The semantic_index is a number between 0 and 49 (inclusive), which can be used as the semantic label for each point.

See also: [PapersWithCode_ShapeNetPart] (benchmarks).
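
A minimal usage sketch; the root path is illustrative, and name is left at the default processed release described above:

    from compressai.datasets import ShapeNetCorePartDataset

    dataset = ShapeNetCorePartDataset(
        root="/data/shapenet_part",
        split="train",
        name="shapenetcore_partanno_segmentation_benchmark_v0_normal",
        download=True,
    )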

References

[Yi2016]

“A scalable active framework for region annotation in 3D shape collections,” by Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas, ACM Transactions on Graphics, 2016.

[Yi2017]

“Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55,” by Li Yi et al., arXiv preprint, 2017.