compressai.models#

CompressionModel#

class compressai.models.CompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#

Base class for constructing an auto-encoder with any number of EntropyBottleneck or GaussianConditional modules.

aux_loss() Tensor[source]#

Returns the total auxiliary loss over all EntropyBottlenecks.

In contrast to the primary “net” loss used by the “net” optimizer, the “aux” loss is only used by the “aux” optimizer to update only the EntropyBottleneck.quantiles parameters. In fact, the “aux” loss does not depend on image data at all.

The purpose of the “aux” loss is to determine the range within which most of the mass of a given distribution is contained, as well as its median (i.e. 50% probability). That is, for a given distribution, the “aux” loss converges towards satisfying the following conditions for some chosen tail_mass probability:

  • cdf(quantiles[0]) = tail_mass / 2

  • cdf(quantiles[1]) = 0.5

  • cdf(quantiles[2]) = 1 - tail_mass / 2

This ensures that the concrete _quantized_cdfs operate primarily within a finitely supported region. Any symbols outside this range must be coded using some alternative method that does not involve the _quantized_cdfs. Luckily, one may choose a tail_mass probability that is sufficiently small so that this rarely occurs. It is important that we work with _quantized_cdfs that have a small finite support; otherwise, entropy coding runtime performance would suffer. Thus, tail_mass should not be too small, either!
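As a concrete illustration of the three conditions, consider a standard normal distribution: for tail_mass = 2**-8 the converged quantiles would sit at roughly (-2.9, 0, +2.9). The stdlib-only sketch below solves the conditions by bisection; it is illustrative math, not compressai code — in the library the quantiles are learned parameters driven towards these targets by the "aux" loss.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_normal_cdf(p, lo=-20.0, hi=20.0, iters=200):
    """Invert normal_cdf by bisection (valid since the CDF is monotonic)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tail_mass = 2 ** -8
# The three conditions the "aux" loss converges towards:
quantiles = [
    inv_normal_cdf(tail_mass / 2),      # cdf(quantiles[0]) = tail_mass / 2
    inv_normal_cdf(0.5),                # cdf(quantiles[1]) = 0.5 (the median)
    inv_normal_cdf(1 - tail_mass / 2),  # cdf(quantiles[2]) = 1 - tail_mass / 2
]
```

The interval [quantiles[0], quantiles[2]] then carries all but tail_mass of the probability mass, which is exactly the finitely supported region the _quantized_cdfs operate on.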

load_state_dict(state_dict, strict=True)[source]#

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns:

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

update(scale_table=None, force=False)[source]#

Updates EntropyBottleneck and GaussianConditional CDFs.

Needs to be called once after training, so that evaluation with an actual entropy coder can later be performed.

Parameters:
  • scale_table (torch.Tensor) – table of scales (i.e. stdev) for initializing the Gaussian distributions (default: 64 logarithmically spaced scales from 0.11 to 256)

  • force (bool) – overwrite previous values (default: False)

Returns:

True if at least one of the modules was updated.

Return type:

updated (bool)
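The documented default scale_table is evenly spaced in the log domain. A stdlib sketch of how such a table can be generated (the function name here is illustrative; the library provides its own helper and builds the table as a tensor):

```python
import math

def log_spaced_scales(min_scale=0.11, max_scale=256.0, levels=64):
    """`levels` scales evenly spaced between min and max in the log domain."""
    lo, hi = math.log(min_scale), math.log(max_scale)
    step = (hi - lo) / (levels - 1)
    return [math.exp(lo + i * step) for i in range(levels)]

scale_table = log_spaced_scales()  # the documented default: 64 scales, 0.11..256
```

Log spacing gives fine resolution for the small scales (where most latents live) while still covering very wide distributions with a handful of entries.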

SimpleVAECompressionModel#

class compressai.models.SimpleVAECompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#

Simple VAE model with arbitrary latent codec.

       ┌───┐  y  ┌────┐ y_hat ┌───┐
x ──►──┤g_a├──►──┤ lc ├───►───┤g_s├──►── x_hat
       └───┘     └────┘       └───┘

FactorizedPrior#

class compressai.models.FactorizedPrior(N, M, **kwargs)[source]#

Factorized Prior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.

          ┌───┐    y
    x ──►─┤g_a├──►─┐
          └───┘    │
                   ▼
                 ┌─┴─┐
                 │ Q │
                 └─┬─┘
                   │
             y_hat ▼
                   │
                   ·
                EB :
                   ·
                   │
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
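For this model (and the ones below), the rate term of the rate-distortion loss is the total negative log2-likelihood of the latents under the entropy model, normalized by the number of image pixels. A stdlib sketch of that bits-per-pixel bookkeeping (the likelihood values here are hypothetical; in practice they come from the model's forward pass):

```python
import math

def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: total negative log2-likelihood divided by pixel count."""
    total_bits = sum(-math.log2(p) for p in likelihoods)
    return total_bits / num_pixels

# 8 symbols, each with probability 0.5, for a 2x2 image:
rate = bits_per_pixel([0.5] * 8, num_pixels=4)  # 8 bits / 4 pixels = 2.0 bpp
```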

ScaleHyperprior#

class compressai.models.ScaleHyperprior(N, M, **kwargs)[source]#

Scale Hyperprior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
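The Gaussian conditional (GC) models each quantized latent symbol with a Gaussian whose scale is predicted by the hyperprior; the likelihood of a symbol is the probability mass of its unit-width quantization bin. A stdlib sketch of that bin-mass computation (illustrative only — the library's implementation additionally lower-bounds scales and likelihoods for numerical stability, and the mean parameter is only used by the mean-scale variants below):

```python
import math

def normal_cdf(x, mean=0.0, scale=1.0):
    """CDF of a Gaussian with the given mean and scale (stdev)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def bin_likelihood(y_hat, scale, mean=0.0):
    """Probability mass of the unit-width quantization bin centred on y_hat."""
    return normal_cdf(y_hat + 0.5, mean, scale) - normal_cdf(y_hat - 0.5, mean, scale)

p_zero = bin_likelihood(0.0, scale=1.0)  # most likely symbol for a zero-mean latent
```

Small predicted scales concentrate mass on few symbols (cheap to code); large scales spread it out (expensive), which is exactly the side information the hyperprior transmits.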

MeanScaleHyperprior#

class compressai.models.MeanScaleHyperprior(N, M, **kwargs)[source]#

Scale Hyperprior with non-zero-mean Gaussian conditionals from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │                 means_hat
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)

JointAutoregressiveHierarchicalPriors#

class compressai.models.JointAutoregressiveHierarchicalPriors(N=192, M=192, **kwargs)[source]#

Joint Autoregressive Hierarchical Priors model from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).

          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                   params ▼
                 └─┬─┘                                          │
             y_hat ▼                  ┌─────┐                   │
                   ├──────────►───────┤  CP ├────────►──────────┤
                   │                  └─────┘                   │
                   ▼                                            ▼
                   │                                            │
                   ·                  ┌─────┐                   │
                GC : ◄────────◄───────┤  EP ├────────◄──────────┘
                   ·     scales_hat   └─────┘
                   │      means_hat
             y_hat ▼
                   │
          ┌───┐    │
x_hat ──◄─┤g_s├────┘
          └───┘

EB = Entropy bottleneck
GC = Gaussian conditional
EP = Entropy parameters network
CP = Context prediction (masked convolution)
Parameters:
  • N (int) – Number of channels

  • M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
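The context prediction module (CP) is a masked convolution: the kernel weights are zeroed so that the prediction at a spatial position depends only on positions already decoded in raster-scan order. A stdlib sketch of the causal mask pattern (type “A” also masks the centre position itself; this illustrates the masking, not the library's MaskedConv2d layer):

```python
def causal_mask(kernel_size, mask_type="A"):
    """kernel_size x kernel_size binary mask for a raster-scan causal convolution."""
    c = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for i in range(kernel_size):
        for j in range(kernel_size):
            # keep rows above the centre, plus the part of the centre row to the
            # left of the centre column (type "B" also keeps the centre itself)
            if i < c or (i == c and (j < c or (j == c and mask_type == "B"))):
                mask[i][j] = 1
    return mask
```

This causality is what makes decoding autoregressive: each ŷ position can be entropy-decoded using only symbols the decoder has already reconstructed.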

Cheng2020Anchor#

class compressai.models.Cheng2020Anchor(N=192, **kwargs)[source]#

Anchor model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.

Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.

Parameters:

N (int) – Number of channels
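Sub-pixel convolution up-samples by producing r² output channels with an ordinary convolution and then rearranging them into an r×-larger spatial grid (pixel shuffle). A stdlib sketch of the rearrangement step on nested lists (the real layers operate on tensors via torch.nn.PixelShuffle; this only shows the channel-to-space layout):

```python
def pixel_shuffle(x, r):
    """Rearrange [C*r*r][H][W] -> [C][H*r][W*r], matching PixelShuffle's layout."""
    h, w = len(x[0]), len(x[0][0])
    c = len(x) // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ci in range(c):
        for i in range(h * r):
            for j in range(w * r):
                # the sub-pixel offset (i % r, j % r) selects the source channel
                out[ci][i][j] = x[ci * r * r + (i % r) * r + (j % r)][i // r][j // r]
    return out

# Four 1x1 channels become one 2x2 plane:
up = pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], r=2)
```

Compared with transposed convolutions, this up-sampling avoids the characteristic checkerboard artifacts, which is why it is used in the decoder here.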

Cheng2020Attention#

class compressai.models.Cheng2020Attention(N=192, **kwargs)[source]#

Self-attention model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.

Uses self-attention, residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.

Parameters:

N (int) – Number of channels

ScaleSpaceFlow#

class compressai.models.video.ScaleSpaceFlow(num_levels: int = 5, sigma0: float = 1.5, scale_field_shift: float = 1.0)[source]#

Google’s first end-to-end optimized video compression model, from E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, G. Toderici: “Scale-space flow for end-to-end optimized video compression”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020).

Parameters:
  • num_levels (int) – number of scale-space levels

  • sigma0 (float) – standard deviation of the Gaussian kernel at the first scale-space level.

  • scale_field_shift (float) –
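Scale-space flow warps each frame against a “scale-space volume”: a stack of progressively blurred copies of the reference frame, so that where motion is uncertain the model can fall back to a blurrier reference instead of a sharp but wrong one. A 1-D stdlib sketch of building such a volume (the blur schedule used here — doubling the standard deviation per level starting from sigma0 — is an illustrative assumption, not necessarily the library's exact schedule):

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur1d(signal, sigma):
    """Gaussian blur with replicate padding at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [
        sum(k[r + d] * signal[min(max(i + d, 0), n - 1)] for d in range(-r, r + 1))
        for i in range(n)
    ]

def scale_space_volume(signal, num_levels=5, sigma0=1.5):
    """Level 0 is the input; each further level is a progressively blurrier copy."""
    levels = [list(signal)]
    for level in range(1, num_levels):
        levels.append(blur1d(signal, sigma0 * 2 ** (level - 1)))
    return levels

# An impulse spreads out as the level index (the "scale" coordinate) grows:
volume = scale_space_volume([0.0] * 4 + [1.0] + [0.0] * 4)
```

The warping operation then reads from this volume with a 3-D flow field (x, y, scale), where the scale coordinate selects how blurred a reference sample to fetch.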