compressai.models#
CompressionModel#
- class compressai.models.CompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#
Base class for constructing an auto-encoder with any number of EntropyBottleneck or GaussianConditional modules.
- aux_loss() → Tensor [source]#
Returns the total auxiliary loss over all EntropyBottlenecks.
In contrast to the primary “net” loss used by the “net” optimizer, the “aux” loss is used only by the “aux” optimizer, and it updates only the EntropyBottleneck.quantiles parameters. In fact, the “aux” loss does not depend on image data at all.
The purpose of the “aux” loss is to determine the range within which most of the mass of a given distribution is contained, as well as its median (i.e. the 50% probability point). That is, for a given distribution, the “aux” loss converges towards satisfying the following conditions for some chosen tail_mass probability:
cdf(quantiles[0]) = tail_mass / 2
cdf(quantiles[1]) = 0.5
cdf(quantiles[2]) = 1 - tail_mass / 2
This ensures that the concrete _quantized_cdfs operate primarily within a finitely supported region. Any symbols outside this range must be coded using some alternative method that does not involve the _quantized_cdfs. Luckily, one may choose a tail_mass probability that is small enough that this rarely occurs. It is important that the _quantized_cdfs have a small finite support; otherwise, entropy coding runtime performance would suffer. Thus, tail_mass should not be too small, either!
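The three quantile conditions above can be made concrete with a small sketch. The logistic distribution below is purely illustrative (it is not the learned distribution inside EntropyBottleneck), but it shows how the quantiles pin down the median and the finite support that the quantized CDF must cover:

```python
import math

# Illustrative only: a closed-form logistic CDF and its inverse stand in for
# the learned distribution inside EntropyBottleneck.

def logistic_cdf(x: float, scale: float = 1.0) -> float:
    return 1.0 / (1.0 + math.exp(-x / scale))

def logistic_quantile(p: float, scale: float = 1.0) -> float:
    # Inverse CDF of the logistic distribution.
    return scale * math.log(p / (1.0 - p))

tail_mass = 2 ** -8  # example value; an assumption, not the library default
quantiles = [
    logistic_quantile(tail_mass / 2),      # lower tail: cdf = tail_mass / 2
    logistic_quantile(0.5),                # median:     cdf = 0.5
    logistic_quantile(1 - tail_mass / 2),  # upper tail: cdf = 1 - tail_mass / 2
]

# The quantized CDF then only needs to cover [quantiles[0], quantiles[2]].
# A smaller tail_mass widens this support (slower entropy coding); a larger
# one makes out-of-support symbols more frequent.
```

This also makes the trade-off at the end of the paragraph visible: shrinking tail_mass pushes quantiles[0] and quantiles[2] further apart, enlarging the support the quantized CDF must represent.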
- load_state_dict(state_dict, strict=True)[source]#
Copies parameters and buffers from state_dict into this module and its descendants.
If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
Warning
If assign is True, the optimizer must be created after the call to load_state_dict.
- Parameters:
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
assign (bool, optional) – whether to assign items in the state dictionary to their corresponding keys in the module instead of copying them in-place into the module’s current parameters and buffers. When False, the properties of the tensors in the current module are preserved; when True, the properties of the tensors in the state dict are preserved. Default: False
- Returns:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type:
NamedTuple with missing_keys and unexpected_keys fields
Note
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
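The missing/unexpected-key semantics can be sketched without torch at all. IncompatibleKeys and check_keys below are illustrative stand-ins for what torch.nn.Module.load_state_dict does internally, not the actual implementation:

```python
from collections import namedtuple

# Illustrative stand-in for the NamedTuple returned by load_state_dict.
IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])

def check_keys(module_keys, state_dict_keys):
    """Hypothetical helper mirroring the key comparison load_state_dict performs."""
    missing = sorted(set(module_keys) - set(state_dict_keys))        # in module, not in dict
    unexpected = sorted(set(state_dict_keys) - set(module_keys))     # in dict, not in module
    return IncompatibleKeys(missing, unexpected)

result = check_keys(
    module_keys={"g_a.0.weight", "g_s.0.weight"},
    state_dict_keys={"g_a.0.weight", "entropy_bottleneck._offset"},
)
# result.missing_keys    -> ["g_s.0.weight"]
# result.unexpected_keys -> ["entropy_bottleneck._offset"]
```

With strict=True, any non-empty missing_keys or unexpected_keys raises an error; with strict=False, the mismatches are simply reported in the returned tuple.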
- update(scale_table=None, force=False, update_quantiles: bool = False)[source]#
Updates EntropyBottleneck and GaussianConditional CDFs.
Must be called once after training before evaluation can be performed with an actual entropy coder.
- Parameters:
scale_table (torch.Tensor) – table of scales (i.e. stdev) for initializing the Gaussian distributions (default: 64 logarithmically spaced scales from 0.11 to 256)
force (bool) – overwrite previous values (default: False)
update_quantiles (bool) – update the quantiles via a fast search rather than relying on aux-loss optimization (default: False)
- Returns:
True if at least one of the modules was updated.
- Return type:
updated (bool)
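A hedged sketch of the typical post-training sequence: call update() once so the entropy coders’ CDF tables exist, then compress with an actual entropy coder. The zoo model bmshj2018_factorized is used only as a convenient example, and the block is guarded so it degrades gracefully if compressai or torch is not installed:

```python
# Sketch: train -> update() -> compress(). Guarded import in case
# compressai/torch are unavailable in this environment.
try:
    import torch
    from compressai.zoo import bmshj2018_factorized

    net = bmshj2018_factorized(quality=1, pretrained=False).eval()
    updated = net.update(force=True)  # force=True overwrites any previous CDFs

    x = torch.rand(1, 3, 64, 64)
    with torch.no_grad():
        out = net.compress(x)  # uses the CDF tables built by update()
    num_strings = len(out["strings"])
except ImportError:
    updated, num_strings = None, None
```

Calling compress() without a prior update() would fail (or use stale tables), which is why the docstring insists on calling it once after training.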
SimpleVAECompressionModel#
FactorizedPrior#
- class compressai.models.FactorizedPrior(N, M, **kwargs)[source]#
Factorized Prior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int Conf. on Learning Representations (ICLR), 2018.
          ┌───┐    y
    x ──►─┤g_a├──►─┐
          └───┘    │
                   ▼
                 ┌─┴─┐
                 │ Q │
                 └─┬─┘
                   │
             y_hat ▼
                   │
                   ·
                EB :
                   ·
                   │
             y_hat ▼
                   │
          ┌───┐    │
 x_hat ──◄─┤g_s├───┘
          └───┘

 EB = Entropy bottleneck
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
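A hedged instantiation sketch: the forward pass returns the reconstruction x_hat and the likelihoods used for the rate term of the loss. N=128 and M=192 are arbitrary example values, and the block is guarded in case compressai or torch is unavailable:

```python
# Sketch: forward pass of FactorizedPrior. The analysis transform g_a
# downsamples by 2**4, so input spatial dims should be divisible by 16.
try:
    import torch
    from compressai.models import FactorizedPrior

    net = FactorizedPrior(N=128, M=192)  # example channel counts
    x = torch.rand(1, 3, 64, 64)
    out = net(x)

    x_hat_shape = tuple(out["x_hat"].shape)        # same shape as the input
    y_channels = out["likelihoods"]["y"].shape[1]  # M latent channels
except ImportError:
    x_hat_shape, y_channels = None, None
```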
ScaleHyperprior#
- class compressai.models.ScaleHyperprior(N, M, **kwargs)[source]#
Scale Hyperprior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior” Int. Conf. on Learning Representations (ICLR), 2018.
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │
             y_hat ▼
                   │
          ┌───┐    │
 x_hat ──◄─┤g_s├───┘
          └───┘

 EB = Entropy bottleneck
 GC = Gaussian conditional
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
MeanScaleHyperprior#
- class compressai.models.MeanScaleHyperprior(N, M, **kwargs)[source]#
Scale Hyperprior with non zero-mean Gaussian conditionals from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·     scales_hat
                   │     means_hat
             y_hat ▼
                   │
          ┌───┐    │
 x_hat ──◄─┤g_s├───┘
          └───┘

 EB = Entropy bottleneck
 GC = Gaussian conditional
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
JointAutoregressiveHierarchicalPriors#
- class compressai.models.JointAutoregressiveHierarchicalPriors(N=192, M=192, **kwargs)[source]#
Joint Autoregressive Hierarchical Priors model from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                   params ▼
                 └─┬─┘                                          │
             y_hat ▼                  ┌─────┐                   │
                   ├──────────►───────┤ CP ├─────────►──────────┤
                   │                  └─────┘                   │
                   ▼                                            ▼
                   │                                            │
                   ·                  ┌─────┐                   │
                GC : ◄────────◄───────┤ EP ├─────────◄──────────┘
                   ·    scales_hat    └─────┘
                   │    means_hat
             y_hat ▼
                   │
          ┌───┐    │
 x_hat ──◄─┤g_s├───┘
          └───┘

 EB = Entropy bottleneck
 GC = Gaussian conditional
 EP = Entropy parameters network
 CP = Context prediction (masked convolution)
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
Cheng2020Anchor#
- class compressai.models.Cheng2020Anchor(N=192, **kwargs)[source]#
Anchor model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.
Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.
- Parameters:
N (int) – Number of channels
Cheng2020Attention#
- class compressai.models.Cheng2020Attention(N=192, **kwargs)[source]#
Self-attention model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.
Uses self-attention, residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.
- Parameters:
N (int) – Number of channels
Cheng2020AnchorCheckerboard#
- class compressai.models.Cheng2020AnchorCheckerboard(N=192, **kwargs)[source]#
Cheng2020 anchor model with checkerboard context model.
Base transform model from [Cheng2020]. Context model from [He2021].
[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.
[He2021]: “Checkerboard Context Model for Efficient Learned Image Compression”, by Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin, CVPR 2021.
Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.
- Parameters:
N (int) – Number of channels
Elic2022Official#
- class compressai.models.Elic2022Official(N=192, M=320, groups=None, **kwargs)[source]#
ELIC 2022; uneven channel groups with checkerboard spatial context.
Context model from [He2022]. Based on modified attention model architecture from [Cheng2020].
[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.
[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.
- Parameters:
N (int) – Number of main network channels
M (int) – Number of latent space channels
groups (list[int]) – Number of channels in each channel group
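An illustrative note on the groups parameter: the group sizes must partition the M latent channels. The split below is the uneven grouping proposed in [He2022] and is shown as an assumption for illustration, not necessarily the library default:

```python
# Hypothetical example grouping (the uneven split from the ELIC paper):
# per-group channel counts partition the M latent channels.
M = 320
groups = [16, 16, 32, 64, 192]  # coarse-to-fine uneven groups; sums to M

# Cumulative sums give the channel boundaries of each group in the latent.
cumulative = [sum(groups[:i]) for i in range(1, len(groups) + 1)]
# cumulative -> [16, 32, 64, 128, 320]
```

Smaller early groups let the context model condition most channels on a cheap-to-decode prefix, which is the core idea of the unevenly grouped coding in [He2022].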
Elic2022Chandelier#
- class compressai.models.Elic2022Chandelier(N=192, M=320, groups=None, **kwargs)[source]#
ELIC 2022; simplified context model using only first and most recent groups.
Context model from [He2022], with simplifications and parameters from the [Chandelier2023] implementation. Based on modified attention model architecture from [Cheng2020].
Note
This implementation contains some differences compared to the original [He2022] paper. For instance, the implemented context model only uses the first and the most recently decoded channel groups to predict the current channel group. In contrast, the original paper uses all previously decoded channel groups. Also, the last layer of h_s is now a conv rather than a deconv.
[Chandelier2023]: “ELiC-ReImplemetation”, by Vincent Chandelier, 2023.
[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.
[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.
- Parameters:
N (int) – Number of main network channels
M (int) – Number of latent space channels
groups (list[int]) – Number of channels in each channel group
ScaleHyperpriorVbr#
- class compressai.models.ScaleHyperpriorVbr(N, M, vr_entbttlnck=False, **kwargs)[source]#
Variable bitrate (vbr) version of bmshj2018-hyperprior (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.
MeanScaleHyperpriorVbr#
- class compressai.models.MeanScaleHyperpriorVbr(N=192, M=320, vr_entbttlnck=False, **kwargs)[source]#
Variable bitrate (vbr) version of mbt2018-mean (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.
JointAutoregressiveHierarchicalPriorsVbr#
- class compressai.models.JointAutoregressiveHierarchicalPriorsVbr(N=192, M=320, **kwargs)[source]#
Variable bitrate (vbr) version of mbt2018 (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.
ScaleSpaceFlow#
- class compressai.models.video.ScaleSpaceFlow(num_levels: int = 5, sigma0: float = 1.5, scale_field_shift: float = 1.0)[source]#
Google’s first end-to-end optimized video compression from E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, G. Toderici: “Scale-space flow for end-to-end optimized video compression”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020).
- Parameters:
num_levels (int) – Number of scale-space levels
sigma0 (float) – standard deviation of the Gaussian kernel for the first scale-space level.
scale_field_shift (float)
DensityPreservingReconstructionPccModel#
- class compressai.models.pointcloud.DensityPreservingReconstructionPccModel(downsample_rate=(0.3333333333333333, 0.3333333333333333, 0.3333333333333333), candidate_upsample_rate=(8, 8, 8), in_dim=3, feat_dim=8, hidden_dim=64, k=16, ngroups=1, sub_point_conv_mode='mlp', compress_normal=False, latent_xyzs_codec=None, **kwargs)[source]#
Density-preserving deep point cloud compression.
Model introduced by [He2022pcc].
References
[He2022pcc] “Density-preserving Deep Point Cloud Compression”, by Yun He, Xinlin Ren, Danhang Tang, Yinda Zhang, Xiangyang Xue, and Yanwei Fu, CVPR 2022.
PointNetReconstructionPccModel#
- class compressai.models.pointcloud.PointNetReconstructionPccModel(num_points=1024, num_channels={'g_a': [3, 64, 64, 64, 128, 1024], 'g_s': [1024, 256, 512, 3072]}, groups={'g_a': [1, 1, 1, 1, 1]})[source]#
PointNet-based PCC reconstruction model.
Model based on PointNet [Qi2017PointNet], modified for compression by [Yan2019], with layer configurations and other modifications as used in [Ulhaq2023].
References
[Qi2017PointNet] “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”, by Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas, CVPR 2017.
[Yan2019] “Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds”, by Wei Yan, Yiting Shao, Shan Liu, Thomas H Li, Zhu Li, and Ge Li, 2019.
[Ulhaq2023] “Learned Point Cloud Compression for Classification”, by Mateen Ulhaq and Ivan V. Bajić, MMSP 2023.
PointNet2SsgReconstructionPccModel#
- class compressai.models.pointcloud.PointNet2SsgReconstructionPccModel(num_points=1024, num_classes=40, D=(0, 128, 192, 256), P=(1024, 256, 64, 1), S=(None, 4, 4, 64), R=(None, 0.2, 0.4, None), E=(3, 64, 32, 16, 0), M=(0, 0, 64, 64), normal_channel=False)[source]#
PointNet++-based PCC reconstruction model.
Model based on PointNet++ [Qi2017PointNetPlusPlus], and modified for compression by [Ulhaq2024]. Uses single-scale grouping (SSG) for point set abstraction.
References
[Qi2017PointNetPlusPlus] “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space”, by Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas, NIPS 2017.
[Ulhaq2024] “Scalable Human-Machine Point Cloud Compression”, by Mateen Ulhaq and Ivan V. Bajić, PCS 2024.