compressai.datasets#
Image/video datasets#
ImageFolder#
- class compressai.datasets.ImageFolder(root, transform=None, split='train')[source]#
Load an image folder database. Training and testing image samples are respectively stored in separate directories:
- rootdir/
    - train/
        - img000.png
        - img001.png
    - test/
        - img000.png
        - img001.png
- Parameters:
root (string) – root directory of the dataset
transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version
split (string) – split mode (‘train’ or ‘val’)
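A minimal usage sketch for training, assuming a local rootdir laid out as above; the path, crop size, and loader settings are placeholder assumptions, not prescribed values:

```python
from torch.utils.data import DataLoader
from torchvision import transforms

from compressai.datasets import ImageFolder

# Placeholder path and patch size; adjust to your dataset and training setup.
train_transforms = transforms.Compose(
    [transforms.RandomCrop(256), transforms.ToTensor()]
)
train_dataset = ImageFolder(
    "/path/to/rootdir", transform=train_transforms, split="train"
)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=4)

for batch in train_loader:
    # batch: tensor of shape [batch_size, 3, 256, 256]
    break
```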
PreGeneratedMemmapDataset#
- class compressai.datasets.PreGeneratedMemmapDataset(root: str, transform=None, split: str = 'train', image_size: int | Tuple[int, int] = (256, 256))[source]#
A data loader for memory-mapped numpy arrays.
This allows for fast training where the image patches have already been extracted and shuffled. The numpy array is expected to have the following shape: NxHxWx3, with N the number of samples and H and W the image dimensions.
- Parameters:
root (string) – root directory where the numpy arrays are located.
image_size (int, int) – size of the images in the array.
patch_size (int) – size of the patches to be randomly cropped for training.
split (string) – split mode (‘train’ or ‘val’).
batch_size (int) – batch size.
num_workers (int) – number of CPU thread workers.
pin_memory (bool) – pin memory.
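A minimal construction sketch, assuming the root directory contains the pre-generated NxHxWx3 arrays described above; the path, the choice of transform, and the loader settings are placeholder assumptions:

```python
from torch.utils.data import DataLoader
from torchvision import transforms

from compressai.datasets import PreGeneratedMemmapDataset

# Placeholder path to a directory holding pre-extracted, shuffled patch arrays.
# The ToTensor transform is an assumed choice for converting HxWx3 patches.
dataset = PreGeneratedMemmapDataset(
    "/path/to/memmap_root",
    transform=transforms.ToTensor(),
    split="train",
    image_size=(256, 256),
)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
```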
VideoFolder#
- class compressai.datasets.VideoFolder(root, rnd_interval=False, rnd_temp_order=False, transform=None, split='train')[source]#
Load a video folder database. Training and testing video clips are stored in a directory containing many sub-directories, following the Vimeo-90K dataset layout:
- rootdir/
    - train.list
    - test.list
    - sequences/
        - 00010/
            ...
            - 0932/
            - 0933/
            ...
        - 00011/
            ...
        - 00012/
            ...
Training and testing (validation) clips are drawn from the sub-directories listed in the corresponding split files (train.list, test.list).
This class returns three video frames as a tuple. A random frame interval can be applied if a sub-directory contains more than 6 frames.
- Parameters:
root (string) – root directory of the dataset
rnd_interval (bool) – enable random interval [1,2,3] when drawing sample frames
transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version
split (string) – split mode (‘train’ or ‘test’)
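A minimal usage sketch, assuming a Vimeo-90K-style rootdir with train.list/test.list as above; the path, crop size, and the exact transform pipeline are placeholder assumptions:

```python
from torch.utils.data import DataLoader
from torchvision import transforms

from compressai.datasets import VideoFolder

# Placeholder path and transform; each sample yields three video frames.
train_dataset = VideoFolder(
    "/path/to/rootdir",
    rnd_interval=True,
    rnd_temp_order=True,
    transform=transforms.Compose([transforms.ToTensor(), transforms.RandomCrop(256)]),
    split="train",
)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
```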
Vimeo90kDataset#
- class compressai.datasets.Vimeo90kDataset(root, transform=None, split='train', tuplet=3)[source]#
Load a Vimeo-90K structured dataset.
Vimeo-90K dataset from Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman: “Video Enhancement with Task-Oriented Flow”, International Journal of Computer Vision (IJCV), 2019.
Training and testing image samples are respectively stored in separate directories:
- rootdir/
    - sequence/
        - 00001/001/im1.png
        - 00001/001/im2.png
        - 00001/001/im3.png
- Parameters:
root (string) – root directory of the dataset
transform (callable, optional) – a function or transform that takes in a PIL image and returns a transformed version
split (string) – split mode (‘train’ or ‘valid’)
tuplet (int) – order of dataset tuplet (e.g. 3 for “triplet” dataset)
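A minimal usage sketch for the triplet variant, assuming an extracted Vimeo-90K archive at a placeholder path; the crop size and loader settings are placeholders as well:

```python
from torch.utils.data import DataLoader
from torchvision import transforms

from compressai.datasets import Vimeo90kDataset

# Placeholder path; tuplet=3 selects the "triplet" layout described above.
dataset = Vimeo90kDataset(
    "/path/to/vimeo_triplet",
    transform=transforms.Compose([transforms.RandomCrop(256), transforms.ToTensor()]),
    split="train",
    tuplet=3,
)
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
```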
Point cloud datasets#
ModelNetDataset#
- class compressai.datasets.ModelNetDataset(root=None, cache_root=None, split='train', split_name=None, name='40', pre_transform=None, transform=None, download=True)[source]#
ModelNet dataset.
This dataset of 3D CAD models of objects was introduced by [Wu2015], consisting of 10 or 40 classes, with 4899 and 12311 aligned items, respectively. Each 3D model is represented in the OFF file format by a triangle mesh (i.e. faces) and has a single label (e.g. airplane). To convert the triangle meshes to point clouds, one may use a mesh sampling method (e.g. SamplePoints).
See also: [PapersWithCode_ModelNet].
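A minimal construction sketch using the documented defaults; the paths are placeholders, and the exact contents of a returned sample depend on the configured transforms:

```python
from compressai.datasets import ModelNetDataset

# Placeholder paths; download=True fetches and caches the dataset on first use.
dataset = ModelNetDataset(
    root="/path/to/modelnet",
    cache_root="/path/to/cache",
    split="train",
    name="40",  # "10" or "40" class variant
    download=True,
)
sample = dataset[0]
```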
References
[Wu2015] “3D ShapeNets: A deep representation for volumetric shapes,” by Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao, CVPR 2015.
S3disDataset#
- class compressai.datasets.S3disDataset(root=None, cache_root=None, split='train', split_name=None, areas=(1, 2, 3, 4, 6), pre_transform=None, transform=None, download=True)[source]#
S3DIS dataset.
The Stanford 3D Indoor Scene Dataset (S3DIS), introduced by [Armeni2016], contains 3D point clouds of 6 large-scale indoor areas. There are multiple rooms (e.g. office, lounge, hallway, etc.) per area. See the [ProjectPage_S3DIS] for a visualization.
The semantic_index is a number between 0 and 12 (inclusive), which can be used as the semantic label for each point.
See also: [PapersWithCode_S3DIS].
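A minimal construction sketch; the paths are placeholders, and the default areas=(1, 2, 3, 4, 6) leave area 5 held out, matching the common evaluation protocol:

```python
from compressai.datasets import S3disDataset

# Placeholder paths; download=True fetches and caches the dataset on first use.
train_dataset = S3disDataset(
    root="/path/to/s3dis",
    cache_root="/path/to/cache",
    split="train",
    areas=(1, 2, 3, 4, 6),
    download=True,
)
sample = train_dataset[0]
```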
References
[Armeni2016] “3D Semantic Parsing of Large-Scale Indoor Spaces,” by Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese, CVPR 2016.
SemanticKittiDataset#
- class compressai.datasets.SemanticKittiDataset(root=None, cache_root=None, split='train', split_name=None, sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10), pre_transform=None, transform=None, download=True)[source]#
SemanticKITTI dataset.
The KITTI dataset, introduced by [Geiger2012], contains sequences of 3D point clouds (i.e. video) of LiDAR sensor data from the perspective of a driving vehicle. The SemanticKITTI dataset, introduced by [Behley2019] and [Behley2021], provides semantic annotations for all 22 sequences from the KITTI odometry task [Odometry_KITTI]. See the [ProjectPage_SemanticKITTI] for a visualization. Note that the test set is unlabelled and must be evaluated on the official server, as mentioned at [ProjectPageTasks_SemanticKITTI].
The semantic_index is a number between 0 and 33 (inclusive), which can be used as the semantic label for each point.
See also: [PapersWithCode_SemanticKITTI].
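A minimal construction sketch; the paths are placeholders, and the default sequences (00–07, 09, 10) correspond to the labelled training split, with sequence 08 typically reserved for validation:

```python
from compressai.datasets import SemanticKittiDataset

# Placeholder paths; download=True fetches and caches the dataset on first use.
dataset = SemanticKittiDataset(
    root="/path/to/semantic_kitti",
    cache_root="/path/to/cache",
    split="train",
    sequences=(0, 1, 2, 3, 4, 5, 6, 7, 9, 10),
    download=True,
)
sample = dataset[0]
```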
References
[Geiger2012] “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” by Andreas Geiger, Philip Lenz, and Raquel Urtasun, CVPR 2012.
[Behley2019] “SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Juergen Gall, ICCV 2019.
[Behley2021] “Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset,” by Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Jürgen Gall, and Cyrill Stachniss, IJRR 2021.
ShapeNetCorePartDataset#
- class compressai.datasets.ShapeNetCorePartDataset(root=None, cache_root=None, split='train', split_name=None, pre_transform=None, transform=None, name='shapenetcore_partanno_segmentation_benchmark_v0_normal', download=True)[source]#
ShapeNet-Part dataset.
The ShapeNet dataset of 3D CAD models of objects was introduced by [Yi2016], consisting of over 3000000 models. The ShapeNetCore (v2) dataset is a “clean” subset of ShapeNet, consisting of 51127 aligned items from 55 object categories. The ShapeNet-Part dataset is a further subset of this dataset, consisting of 16881 items from 16 object categories. See page 2 of [Yi2017] for additional description.
Object categories are labeled with two to six segmentation parts each. (In the accompanying visualization, purple represents a “miscellaneous” part.)
[ProjectPage_ShapeNetPart] also releases a processed version of ShapeNet-Part containing point cloud and normals with expert-verified segmentations, which we use here.
The semantic_index is a number between 0 and 49 (inclusive), which can be used as the semantic label for each point.
See also: [PapersWithCode_ShapeNetPart] (benchmarks).
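A minimal construction sketch using the documented defaults; the paths are placeholders, and the exact contents of a returned sample depend on the configured transforms:

```python
from compressai.datasets import ShapeNetCorePartDataset

# Placeholder paths; download=True fetches the processed point-cloud release on first use.
dataset = ShapeNetCorePartDataset(
    root="/path/to/shapenet_part",
    cache_root="/path/to/cache",
    split="train",
    download=True,
)
sample = dataset[0]
```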
References
[Yi2016] “A scalable active framework for region annotation in 3D shape collections,” by Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas, ACM Transactions on Graphics, 2016.
[Yi2017] “Large-scale 3D shape reconstruction and segmentation from ShapeNet Core55,” by Li Yi et al. (50 authors in total), ICCV 2017.