compressai.models#

CompressionModel#

class compressai.models.CompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#

Base class for constructing an auto-encoder with any number of EntropyBottleneck or GaussianConditional modules.

aux_loss() → Tensor[source]#

Returns the total auxiliary loss over all EntropyBottlenecks.

In contrast to the primary “net” loss used by the “net” optimizer, the “aux” loss is used only by the “aux” optimizer, and it updates only the EntropyBottleneck.quantiles parameters. In fact, the “aux” loss does not depend on the image data at all.

The purpose of the “aux” loss is to determine the range within which most of the mass of a given distribution is contained, as well as its median (i.e. 50% probability). That is, for a given distribution, the “aux” loss converges towards satisfying the following conditions for some chosen tail_mass probability:

  • cdf(quantiles[0]) = tail_mass / 2

  • cdf(quantiles[1]) = 0.5

  • cdf(quantiles[2]) = 1 - tail_mass / 2

This ensures that the concrete _quantized_cdfs operate primarily within a finitely supported region. Any symbols outside this range must be coded using some alternative method that does not involve the _quantized_cdfs. Luckily, one may choose a tail_mass probability that is sufficiently small so that this rarely occurs. It is important that we work with _quantized_cdfs that have a small finite support; otherwise, entropy coding runtime performance would suffer. Thus, tail_mass should not be too small, either!
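
A minimal training-step sketch of this two-optimizer setup (patterned after the library’s example trainer, though not the official code; model, x, and the rate-distortion criterion rd_loss are assumed to be defined elsewhere):

    import torch.optim as optim

    # The quantiles belong to the "aux" optimizer; everything else to "net".
    net_params = [p for n, p in model.named_parameters() if not n.endswith(".quantiles")]
    aux_params = [p for n, p in model.named_parameters() if n.endswith(".quantiles")]
    net_optimizer = optim.Adam(net_params, lr=1e-4)
    aux_optimizer = optim.Adam(aux_params, lr=1e-3)

    # "net" step: rate-distortion loss from a forward pass on an image batch.
    net_optimizer.zero_grad()
    out = model(x)
    rd_loss(out, x).backward()
    net_optimizer.step()

    # "aux" step: depends only on the learned quantiles, not on image data.
    aux_optimizer.zero_grad()
    model.aux_loss().backward()
    aux_optimizer.step()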

load_state_dict(state_dict, strict=True)[source]#

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True, the optimizer must be created after the call to load_state_dict.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

  • assign (bool, optional) – whether to assign items in the state dictionary to their corresponding keys in the module instead of copying them inplace into the module’s current parameters and buffers. When False, the properties of the tensors in the current module are preserved while when True, the properties of the Tensors in the state dict are preserved. Default: False

Returns:

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
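
A hedged example of restoring a trained model from a checkpoint (the file name and the "state_dict" key are assumptions about how the checkpoint was saved):

    import torch
    from compressai.models import FactorizedPrior

    net = FactorizedPrior(128, 192)
    checkpoint = torch.load("checkpoint.pth", map_location="cpu")
    net.load_state_dict(checkpoint["state_dict"])
    net.update(force=True)  # rebuild the CDF tables before entropy coding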

update(scale_table=None, force=False, update_quantiles: bool = False)[source]#

Updates EntropyBottleneck and GaussianConditional CDFs.

This needs to be called once after training so that evaluation can later be performed with an actual entropy coder.

Parameters:
  • scale_table (torch.Tensor) – table of scales (i.e. stdev) for initializing the Gaussian distributions (default: 64 logarithmically spaced scales from 0.11 to 256)

  • force (bool) – overwrite previous values (default: False)

  • update_quantiles (bool) – update the quantiles with a fast direct search rather than relying on the values learned through the auxiliary loss (default: False)

Returns:

True if at least one of the modules was updated.

Return type:

updated (bool)
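
A minimal sketch of the post-training workflow, assuming a trained model that implements compress() and decompress() (e.g. FactorizedPrior) and an image batch x:

    model.eval()
    model.update()  # build the quantized CDF tables once

    with torch.no_grad():
        enc = model.compress(x)  # entropy-coded byte strings
        dec = model.decompress(enc["strings"], enc["shape"])
        x_hat = dec["x_hat"]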

SimpleVAECompressionModel#

class compressai.models.SimpleVAECompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#

Simple VAE model with arbitrary latent codec.

       ┌───┐  y  ┌────┐ y_hat ┌───┐
x ──►──┤g_a├──►──┤ lc ├───►───┤g_s├──►── x_hat
       └───┘     └────┘       └───┘
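
Subclasses supply the analysis/synthesis transforms and the latent codec. A hedged sketch, where the attribute names follow the diagram above and the layer shapes are illustrative only:

    import torch.nn as nn

    from compressai.latent_codecs import EntropyBottleneckLatentCodec
    from compressai.models import SimpleVAECompressionModel

    class MyModel(SimpleVAECompressionModel):
        def __init__(self, N=128):
            super().__init__()
            # g_a: analysis transform (x -> y); g_s: synthesis (y_hat -> x_hat).
            self.g_a = nn.Sequential(
                nn.Conv2d(3, N, 5, stride=2, padding=2),
                nn.GELU(),
                nn.Conv2d(N, N, 5, stride=2, padding=2),
            )
            self.g_s = nn.Sequential(
                nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1),
                nn.GELU(),
                nn.ConvTranspose2d(N, 3, 5, stride=2, padding=2, output_padding=1),
            )
            # "lc" in the diagram: here a simple entropy-bottleneck codec.
            self.latent_codec = EntropyBottleneckLatentCodec(channels=N)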

FactorizedPrior#

class compressai.models.FactorizedPrior(N, M, **kwargs)[source]#

Factorized Prior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.

          ┌───┐    y
    x ──►─┤g_a├──►─┐
          └───┘    │
                   ▼
                 ┌─┴─┐
                 │ Q │
                 └─┬─┘
                   │
             y_hat ▼
                   │
                   ·
                EB :
                   ·
                   │
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
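
A hedged sketch of a forward pass and a bits-per-pixel estimate derived from the returned likelihoods (the N and M values are illustrative):

    import torch
    from compressai.models import FactorizedPrior

    net = FactorizedPrior(N=128, M=192)
    x = torch.rand(1, 3, 256, 256)
    out = net(x)  # {"x_hat": ..., "likelihoods": {"y": ...}}

    # Estimated rate: negative log2-likelihood of the latents per pixel.
    num_pixels = x.size(0) * x.size(2) * x.size(3)
    bpp = sum(l.log2().sum() for l in out["likelihoods"].values()) / -num_pixels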

ScaleHyperprior#

class compressai.models.ScaleHyperprior(N, M, **kwargs)[source]#

Scale Hyperprior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
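
The interface matches FactorizedPrior above, except that the likelihoods dict now carries both the latent y and the hyper-latent z. Pretrained weights are available through the model zoo; a hedged sketch (the quality index is illustrative):

    import torch
    from compressai.zoo import bmshj2018_hyperprior

    net = bmshj2018_hyperprior(quality=3, pretrained=True).eval()
    out = net(torch.rand(1, 3, 256, 256))
    sorted(out["likelihoods"])  # ['y', 'z']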

MeanScaleHyperprior#

class compressai.models.MeanScaleHyperprior(N, M, **kwargs)[source]#

Scale Hyperprior with non-zero-mean Gaussian conditionals from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │                 means_hat
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)

JointAutoregressiveHierarchicalPriors#

class compressai.models.JointAutoregressiveHierarchicalPriors(N=192, M=192, **kwargs)[source]#

Joint Autoregressive Hierarchical Priors model from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                   params ▼
                 └─┬─┘                                          │
             y_hat ▼                  ┌─────┐                   │
                   ├──────────►───────┤  CP ├────────►──────────┤
                   │                  └─────┘                   │
                   ▼                                            ▼
                   │                                            │
                   ·                  ┌─────┐                   │
                GC : ◄────────◄───────┤  EP ├────────◄──────────┘
                   ·     scales_hat   └─────┘
                   │      means_hat
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
EP = Entropy parameters network
CP = Context prediction (masked convolution)
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
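
Because the context prediction is autoregressive, compress() and decompress() traverse the latent sequentially, so actual entropy coding is considerably slower than for the purely hyperprior models. A hedged sketch using the model zoo entry point (the quality index is illustrative):

    import torch
    from compressai.zoo import mbt2018

    net = mbt2018(quality=4, pretrained=True).eval()
    x = torch.rand(1, 3, 256, 256)
    with torch.no_grad():
        enc = net.compress(x)  # sequential context model: runs on CPU
        dec = net.decompress(enc["strings"], enc["shape"])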

Cheng2020Anchor#

class compressai.models.Cheng2020Anchor(N=192, **kwargs)[source]#

Anchor model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.

Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.

Parameters:

N (int) – Number of channels

Cheng2020Attention#

class compressai.models.Cheng2020Attention(N=192, **kwargs)[source]#

Self-attention model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.

Uses self-attention, residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.

Parameters:

N (int) – Number of channels

Cheng2020AnchorCheckerboard#

class compressai.models.Cheng2020AnchorCheckerboard(N=192, **kwargs)[source]#

Cheng2020 anchor model with checkerboard context model.

Base transform model from [Cheng2020]. Context model from [He2021].

[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.

[He2021]: “Checkerboard Context Model for Efficient Learned Image Compression”, by Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin, CVPR 2021.

Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.

Parameters:

N (int) – Number of channels

Elic2022Official#

class compressai.models.Elic2022Official(N=192, M=320, groups=None, **kwargs)[source]#

ELIC 2022; uneven channel groups with checkerboard spatial context.

Context model from [He2022]. Based on modified attention model architecture from [Cheng2020].

[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.

[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.

Parameters:
  • N (int) – Number of main network channels

  • M (int) – Number of latent space channels

  • groups (list[int]) – Number of channels in each channel group

Elic2022Chandelier#

class compressai.models.Elic2022Chandelier(N=192, M=320, groups=None, **kwargs)[source]#

ELIC 2022; simplified context model using only first and most recent groups.

Context model from [He2022], with simplifications and parameters from the [Chandelier2023] implementation. Based on modified attention model architecture from [Cheng2020].

Note

This implementation contains some differences compared to the original [He2022] paper. For instance, the implemented context model only uses the first and the most recently decoded channel groups to predict the current channel group. In contrast, the original paper uses all previously decoded channel groups. Also, the last layer of h_s is now a conv rather than a deconv.

[Chandelier2023]: “ELiC-ReImplemetation”, by Vincent Chandelier, 2023.

[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.

[Cheng2020]: “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto, CVPR 2020.

Parameters:
  • N (int) – Number of main network channels

  • M (int) – Number of latent space channels

  • groups (list[int]) – Number of channels in each channel group

ScaleHyperpriorVbr#

class compressai.models.ScaleHyperpriorVbr(N, M, vr_entbttlnck=False, **kwargs)[source]#

Variable bitrate (vbr) version of bmshj2018-hyperprior (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape, and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.

MeanScaleHyperpriorVbr#

class compressai.models.MeanScaleHyperpriorVbr(N=192, M=320, vr_entbttlnck=False, **kwargs)[source]#

Variable bitrate (vbr) version of mbt2018-mean (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape, and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.

JointAutoregressiveHierarchicalPriorsVbr#

class compressai.models.JointAutoregressiveHierarchicalPriorsVbr(N=192, M=320, **kwargs)[source]#

Variable bitrate (vbr) version of mbt2018 (see compressai/models/google.py) with variable bitrate components detailed in: Fatih Kamisli, Fabien Racape, and Hyomin Choi: “Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets” (https://arxiv.org/abs/2402.18930), Data Compression Conference (DCC), 2024.

ScaleSpaceFlow#

class compressai.models.video.ScaleSpaceFlow(num_levels: int = 5, sigma0: float = 1.5, scale_field_shift: float = 1.0)[source]#

Google’s first end-to-end optimized video compression model, from E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, G. Toderici: “Scale-space flow for end-to-end optimized video compression”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020).

Parameters:
  • num_levels (int) – Number of levels in the scale-space volume

  • sigma0 (float) – standard deviation of the Gaussian kernel at the first scale-space level.

  • scale_field_shift (float) –
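
A hedged sketch of a forward pass over a short group of pictures (the structure of the returned dict is an assumption based on the image models’ interface):

    import torch
    from compressai.models.video import ScaleSpaceFlow

    net = ScaleSpaceFlow()
    frames = [torch.rand(1, 3, 256, 256) for _ in range(3)]  # I-frame + 2 P-frames
    out = net(frames)
    reconstructions = out["x_hat"]  # one reconstruction per input frame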

DensityPreservingReconstructionPccModel#

class compressai.models.pointcloud.DensityPreservingReconstructionPccModel(downsample_rate=(0.3333333333333333, 0.3333333333333333, 0.3333333333333333), candidate_upsample_rate=(8, 8, 8), in_dim=3, feat_dim=8, hidden_dim=64, k=16, ngroups=1, sub_point_conv_mode='mlp', compress_normal=False, latent_xyzs_codec=None, **kwargs)[source]#

Density-preserving deep point cloud compression.

Model introduced by [He2022pcc].

References

[He2022pcc]

“Density-preserving Deep Point Cloud Compression”, by Yun He, Xinlin Ren, Danhang Tang, Yinda Zhang, Xiangyang Xue, and Yanwei Fu, CVPR 2022.

PointNetReconstructionPccModel#

class compressai.models.pointcloud.PointNetReconstructionPccModel(num_points=1024, num_channels={'g_a': [3, 64, 64, 64, 128, 1024], 'g_s': [1024, 256, 512, 3072]}, groups={'g_a': [1, 1, 1, 1, 1]})[source]#

PointNet-based PCC reconstruction model.

Model based on PointNet [Qi2017PointNet], modified for compression by [Yan2019], with layer configurations and other modifications as used in [Ulhaq2023].

References

[Qi2017PointNet]

“PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”, by Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas, CVPR 2017.

[Yan2019]

“Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds”, by Wei Yan, Yiting Shao, Shan Liu, Thomas H Li, Zhu Li, and Ge Li, 2019.

[Ulhaq2023]

“Learned Point Cloud Compression for Classification”, by Mateen Ulhaq and Ivan V. Bajić, MMSP 2023.

PointNet2SsgReconstructionPccModel#

class compressai.models.pointcloud.PointNet2SsgReconstructionPccModel(num_points=1024, num_classes=40, D=(0, 128, 192, 256), P=(1024, 256, 64, 1), S=(None, 4, 4, 64), R=(None, 0.2, 0.4, None), E=(3, 64, 32, 16, 0), M=(0, 0, 64, 64), normal_channel=False)[source]#

PointNet++-based PCC reconstruction model.

Model based on PointNet++ [Qi2017PointNetPlusPlus], and modified for compression by [Ulhaq2024]. Uses single-scale grouping (SSG) for point set abstraction.

References

[Qi2017PointNetPlusPlus]

“PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space”, by Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas, NIPS 2017.

[Ulhaq2024]

“Scalable Human-Machine Point Cloud Compression”, by Mateen Ulhaq and Ivan V. Bajić, PCS 2024.