compressai.latent_codecs#

A LatentCodec is an abstraction for compressing a latent space using some entropy modeling technique. A LatentCodec can be thought of as a miniature CompressionModel. In fact, it implements some of the same methods: forward, compress, and decompress, as described in Defining a custom latent codec. By composing latent codecs, we can easily create more complex entropy models.

CompressAI provides the following predefined LatentCodec subclasses:

Module name	Description
`EntropyBottleneckLatentCodec`	Uses an `EntropyBottleneck` to encode `y`.
`GaussianConditionalLatentCodec`	Uses a `GaussianConditional` to encode `y` using `(scale, mean)` parameters.
`HyperLatentCodec`	Uses an `EntropyBottleneck` to encode `z`, with surrounding `h_a` and `h_s` transforms.
`HyperpriorLatentCodec`	Uses a latent codec for `y` (e.g. `GaussianConditionalLatentCodec` or `RasterScanLatentCodec`) to encode `y`, using `(scale, mean)` parameters generated by a hyper branch (e.g. `HyperLatentCodec`).
`RasterScanLatentCodec`	Encodes `y` in raster-scan order using a PixelCNN-style autoregressive context model.
`GainHyperLatentCodec`	Like `HyperLatentCodec`, but with trainable gain vectors for `z`.
`GainHyperpriorLatentCodec`	Like `HyperpriorLatentCodec`, but with trainable gain vectors for `y`.
`ChannelGroupsLatentCodec`	Encodes `y` in multiple chunked groups, each group conditioned on previously encoded groups.
`CheckerboardLatentCodec`	Encodes `y` in two passes in checkerboard order.

Diagrams for some of the above predefined latent codecs:

HyperLatentCodec:

           ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    y ──►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►── params
           └───┘     └───┘        EB        └───┘

    Entropy bottleneck codec with surrounding `h_a` and `h_s` transforms.

GaussianConditionalLatentCodec:

                      ctx_params
                          │
                          ▼
                          │
                       ┌──┴──┐
                       │  EP │
                       └──┬──┘
                          │
           ┌───┐  y_hat   ▼
    y ──►──┤ Q ├────►────····──►── y_hat
           └───┘          GC

    Gaussian conditional for compressing latent `y` using `ctx_params`.

HyperpriorLatentCodec:

             ┌──────────┐
        ┌─►──┤ lc_hyper ├──►─┐
        │    └──────────┘    │
        │                    ▼ params
        │                    │
        │                 ┌──┴───┐
    y ──┴───────►─────────┤ lc_y ├───►── y_hat
                          └──────┘

    Composes a HyperLatentCodec and a "lc_y" latent codec such as
    GaussianConditionalLatentCodec or RasterScanLatentCodec.

RasterScanLatentCodec:

                     ctx_params
                         │
                         ▼
                         │ ┌───◄───┐
                       ┌─┴─┴─┐  ┌──┴──┐
                       │  EP │  │  CP │
                       └──┬──┘  └──┬──┘
                          │        │
                          │        ▲
           ┌───┐  y_hat   ▼        │
    y ──►──┤ Q ├────►────····───►──┴──►── y_hat
           └───┘          GC

Rationale#

This abstraction makes it easy to swap between different entropy models such as “factorized”, “hyperprior”, “raster scan autoregressive”, “checkerboard”, or “channel conditional groups”.

It also aids in composition: we may now easily take any complicated composition of the above LatentCodec subclasses. For example, we may create models containing multiple hyperprior branches (Hu et al., 2020), or a “channel conditional group” context model which encodes each group using “raster-scan” (Minnen et al., 2020) or “checkerboard” (He et al., 2022) autoregression, and so on.

Lastly, it reduces code duplication, and favors composition instead of inheritance.

Example models#

A simple VAE model with an arbitrary latent codec can be implemented as follows:

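import torch.nn as nn

from compressai.latent_codecs import LatentCodec
from compressai.models.base import CompressionModel


class SimpleVAECompressionModel(CompressionModel):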
    """Simple VAE model with arbitrary latent codec.

    .. code-block:: none

               ┌───┐  y  ┌────┐ y_hat ┌───┐
        x ──►──┤g_a├──►──┤ lc ├───►───┤g_s├──►── x_hat
               └───┘     └────┘       └───┘
    """

    g_a: nn.Module
    g_s: nn.Module
    latent_codec: LatentCodec

    def forward(self, x):
        y = self.g_a(x)
        y_out = self.latent_codec(y)
        y_hat = y_out["y_hat"]
        x_hat = self.g_s(y_hat)
        return {
            "x_hat": x_hat,
            "likelihoods": y_out["likelihoods"],
        }

    def compress(self, x):
        y = self.g_a(x)
        outputs = self.latent_codec.compress(y)
        return outputs

    def decompress(self, strings, shape):
        y_out = self.latent_codec.decompress(strings, shape)
        y_hat = y_out["y_hat"]
        x_hat = self.g_s(y_hat).clamp_(0, 1)
        return {
            "x_hat": x_hat,
        }

This pattern is so common that CompressAI provides it via the import:

from compressai.models.base import SimpleVAECompressionModel
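The `likelihoods` dict returned by `forward` is what a rate estimate is usually computed from: each likelihood contributes `-log2(p)` bits. The following is a minimal, dependency-free sketch of that arithmetic. The function name `estimated_bpp` is hypothetical, and flat Python lists stand in for the likelihood tensors a real model would return.

```python
import math


def estimated_bpp(likelihoods, num_pixels):
    """Sum -log2(p) over every symbol of every latent, then divide by the pixel count."""
    total_bits = 0.0
    for probs in likelihoods.values():
        total_bits += sum(-math.log2(p) for p in probs)
    return total_bits / num_pixels


# Toy values: "y" and "z" mimic the keys of a hyperprior-style likelihoods dict.
likelihoods = {"y": [0.5, 0.25, 0.25, 0.5], "z": [0.5, 0.5]}
bpp = estimated_bpp(likelihoods, num_pixels=4)
print(bpp)
```

Because every latent codec reports its likelihoods under its own key, the same summation works regardless of how many codecs are composed.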

Using SimpleVAECompressionModel, some Google-style VAE models may be implemented as follows:

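import torch.nn as nn

from compressai.entropy_models import EntropyBottleneck, GaussianConditional
from compressai.latent_codecs import (
    EntropyBottleneckLatentCodec,
    GaussianConditionalLatentCodec,
    HyperLatentCodec,
    HyperpriorLatentCodec,
    RasterScanLatentCodec,
)
from compressai.layers import MaskedConv2d
from compressai.models.base import SimpleVAECompressionModel
from compressai.registry import register_model


@register_model("bmshj2018-factorized")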
class FactorizedPrior(SimpleVAECompressionModel):
    def __init__(self, N, M, **kwargs):
        super().__init__(**kwargs)

        self.g_a = nn.Sequential(...)
        self.g_s = nn.Sequential(...)

        self.latent_codec = EntropyBottleneckLatentCodec(channels=M)

@register_model("mbt2018-mean")
class MeanScaleHyperprior(SimpleVAECompressionModel):
    def __init__(self, N, M, **kwargs):
        super().__init__(**kwargs)

        self.g_a = nn.Sequential(...)
        self.g_s = nn.Sequential(...)
        h_a = nn.Sequential(...)
        h_s = nn.Sequential(...)

        self.latent_codec = HyperpriorLatentCodec(
            # A HyperpriorLatentCodec is made of "hyper" and "y" latent codecs.
            latent_codec={
                # Side-information branch with entropy bottleneck for "z":
                "hyper": HyperLatentCodec(
                    h_a=h_a,
                    h_s=h_s,
                    entropy_bottleneck=EntropyBottleneck(N),
                ),
                # Encode y using GaussianConditional:
                "y": GaussianConditionalLatentCodec(),
            },
        )

@register_model("mbt2018")
class JointAutoregressiveHierarchicalPriors(SimpleVAECompressionModel):
    def __init__(self, N, M, **kwargs):
        super().__init__(**kwargs)

        self.g_a = nn.Sequential(...)
        self.g_s = nn.Sequential(...)
        h_a = nn.Sequential(...)
        h_s = nn.Sequential(...)

        self.latent_codec = HyperpriorLatentCodec(
            # A HyperpriorLatentCodec is made of "hyper" and "y" latent codecs.
            latent_codec={
                # Side-information branch with entropy bottleneck for "z":
                "hyper": HyperLatentCodec(
                    h_a=h_a,
                    h_s=h_s,
                    entropy_bottleneck=EntropyBottleneck(N),
                ),
                # Encode y using autoregression in raster-scan order:
                "y": RasterScanLatentCodec(
                    gaussian_conditional=GaussianConditional(None),
                    entropy_parameters=nn.Sequential(...),
                    context_prediction=MaskedConv2d(
                        M, M * 2, kernel_size=5, padding=2, stride=1
                    ),
                ),
            },
        )

Defining a custom latent codec#

Latent codecs should inherit from the abstract base class LatentCodec, which is defined as:

class LatentCodec(nn.Module):
    def forward(self, y: Tensor, *args, **kwargs) -> Dict[str, Any]:
        raise NotImplementedError

    def compress(self, y: Tensor, *args, **kwargs) -> Dict[str, Any]:
        raise NotImplementedError

    def decompress(
        self, strings: List[List[bytes]], shape: Any, *args, **kwargs
    ) -> Dict[str, Any]:
        raise NotImplementedError

Like CompressionModel, a subclass of LatentCodec should implement:

forward: differentiable function for training, returning a dict in the form of:

{
    "likelihoods": {
        "y": y_likelihoods,
        ...
    },
    "y_hat": y_hat,
}

compress: compressor to generate bitstreams from input tensor, returning a dict in the form of:
```
{
    "strings": [y_strings, z_strings],
    "shape": ...,
}
```
decompress: decompressor to reconstruct the input tensors using the bitstreams, returning a dict in the form of:
```
{
    "y_hat": y_hat,
}
```

Please refer to any of the predefined latent codecs for more concrete examples.
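To make the three-method contract concrete, here is a dependency-free toy that mirrors the dict shapes described above. It is only a sketch: `ToyRoundingCodec` is a hypothetical name, it does not inherit from `LatentCodec`, it works on Python lists instead of tensors, and it packs integers with `struct` rather than using an entropy coder. The likelihoods it returns are placeholders.

```python
import struct
from typing import Any, Dict, List


class ToyRoundingCodec:
    """Hypothetical stand-in that mirrors forward/compress/decompress."""

    def forward(self, y: List[float]) -> Dict[str, Any]:
        # Training path: a real codec returns differentiable likelihoods.
        # Here every symbol gets a fixed placeholder likelihood of 0.5.
        y_hat = [float(round(v)) for v in y]
        return {"likelihoods": {"y": [0.5] * len(y)}, "y_hat": y_hat}

    def compress(self, y: List[float]) -> Dict[str, Any]:
        # Pack each rounded value as a signed 16-bit integer (no entropy coding).
        symbols = [round(v) for v in y]
        y_strings = [struct.pack(f"<{len(symbols)}h", *symbols)]
        return {"strings": [y_strings], "shape": (len(symbols),)}

    def decompress(self, strings: List[List[bytes]], shape: Any) -> Dict[str, Any]:
        # Invert compress(): unpack the integers and return them as y_hat.
        (y_strings,) = strings
        symbols = struct.unpack(f"<{shape[0]}h", y_strings[0])
        return {"y_hat": [float(s) for s in symbols]}


codec = ToyRoundingCodec()
y = [0.2, -1.7, 3.4]
out = codec.compress(y)
rec = codec.decompress(out["strings"], out["shape"])
print(rec["y_hat"])
```

The property worth preserving in a real codec is the one this toy exhibits: `decompress(compress(y))` yields the same `y_hat` that the quantized training path produces.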

EntropyBottleneckLatentCodec#

class compressai.latent_codecs.EntropyBottleneckLatentCodec(entropy_bottleneck: Optional[compressai.entropy_models.entropy_models.EntropyBottleneck] = None, **kwargs)[source]#

Entropy bottleneck codec.

Factorized prior “entropy bottleneck” introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.

       ┌───┐ y_hat
y ──►──┤ Q ├───►───····───►─── y_hat
       └───┘        EB

GaussianConditionalLatentCodec#

class compressai.latent_codecs.GaussianConditionalLatentCodec(scale_table: Optional[Union[List, Tuple]] = None, gaussian_conditional: Optional[compressai.entropy_models.entropy_models.GaussianConditional] = None, entropy_parameters: Optional[torch.nn.modules.module.Module] = None, quantizer: str = 'noise', chunks: Tuple[str, ...] = ('scales', 'means'), **kwargs)[source]#

Gaussian conditional for compressing latent y using ctx_params.

Probability model for Gaussian of (scales, means).

Gaussian conditional entropy model introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.

Note

Unlike the original paper, which models only the scale (i.e. “width”) of the Gaussian, this implementation models both the scale and the mean (i.e. “center”) of the Gaussian.

                  ctx_params
                      │
                      ▼
                      │
                   ┌──┴──┐
                   │  EP │
                   └──┬──┘
                      │
       ┌───┐  y_hat   ▼
y ──►──┤ Q ├────►────····──►── y_hat
       └───┘          GC
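As a worked illustration of a `(scales, means)` Gaussian model, the likelihood of a quantized symbol can be taken as the probability mass of the unit-width bin centred on it. The sketch below uses only the standard library. The function names are hypothetical, and details such as bounds on the scale or on the likelihood are omitted.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_likelihood(y_hat, mean, scale):
    """Mass of the bin [y_hat - 0.5, y_hat + 0.5] under N(mean, scale^2)."""
    upper = std_normal_cdf((y_hat + 0.5 - mean) / scale)
    lower = std_normal_cdf((y_hat - 0.5 - mean) / scale)
    return upper - lower


# A well-predicted mean gives a high likelihood, hence few bits.
p_centered = gaussian_likelihood(y_hat=2.0, mean=2.0, scale=1.0)
# A poorly predicted mean gives a low likelihood, hence more bits.
p_offset = gaussian_likelihood(y_hat=2.0, mean=0.0, scale=1.0)
print(p_centered, p_offset)
```

This is why modeling the mean as well as the scale can help: a symbol far from zero is cheap to code when the predicted mean is close to it.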

HyperLatentCodec#

class compressai.latent_codecs.HyperLatentCodec(entropy_bottleneck: compressai.entropy_models.entropy_models.EntropyBottleneck, h_a: torch.nn.modules.module.Module, h_s: torch.nn.modules.module.Module, quantizer: str = 'noise', **kwargs)[source]#

Entropy bottleneck codec with surrounding h_a and h_s transforms.

“Hyper” side-information branch introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.

Note

HyperLatentCodec should be used inside HyperpriorLatentCodec to construct a full hyperprior.

       ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
y ──►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►── params
       └───┘     └───┘        EB        └───┘

HyperpriorLatentCodec#

class compressai.latent_codecs.HyperpriorLatentCodec(latent_codec: Mapping[str, compressai.latent_codecs.base.LatentCodec], **kwargs)[source]#

Hyperprior codec built from a latent codec for y, which compresses y using the params produced by the hyper branch.

Hyperprior entropy modeling introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.

         ┌──────────┐
    ┌─►──┤ lc_hyper ├──►─┐
    │    └──────────┘    │
    │                    ▼ params
    │                    │
    │                 ┌──┴───┐
y ──┴───────►─────────┤ lc_y ├───►── y_hat
                      └──────┘

By default, the following codec is constructed:

         ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    ┌─►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►─┐
    │    └───┘     └───┘        EB        └───┘    │
    │                                              │
    │                  ┌──────────────◄────────────┘
    │                  │            params
    │               ┌──┴──┐
    │               │  EP │
    │               └──┬──┘
    │                  │
    │   ┌───┐  y_hat   ▼
y ──┴─►─┤ Q ├────►────····────►── y_hat
        └───┘          GC

Common configurations of latent codecs include:

entropy bottleneck hyper (default) and gaussian conditional y (default)
entropy bottleneck hyper (default) and autoregressive y
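The data flow in the diagrams above can be sketched with plain functions: the hyper branch produces `params`, the `y` codec consumes them, and the likelihoods of both branches are merged. This is a toy under stated assumptions, not the library's implementation. All three function names are hypothetical, lists stand in for tensors, and the likelihoods are placeholders.

```python
def toy_hyper(y):
    # Stand-in for the "hyper" branch: derives params from y.
    mean = round(sum(y) / len(y))
    return {"likelihoods": {"z": [0.5]}, "params": {"scale": 1.0, "mean": mean}}


def toy_y_codec(y, params):
    # Stand-in for the "y" codec: quantizes y relative to the predicted mean.
    mean = params["mean"]
    y_hat = [round(v - mean) + mean for v in y]
    return {"likelihoods": {"y": [0.5] * len(y)}, "y_hat": y_hat}


def toy_hyperprior(latent_codec, y):
    # Run the hyper branch first, then feed its params to the y codec.
    hyper_out = latent_codec["hyper"](y)
    y_out = latent_codec["y"](y, hyper_out["params"])
    return {
        "likelihoods": {**y_out["likelihoods"], **hyper_out["likelihoods"]},
        "y_hat": y_out["y_hat"],
    }


out = toy_hyperprior({"hyper": toy_hyper, "y": toy_y_codec}, [1.2, 2.9, 4.1])
print(out["y_hat"], sorted(out["likelihoods"]))
```

Swapping the `"y"` entry for a different codec changes how `y` is coded without touching the hyper branch, which is the point of passing the codecs in as a mapping.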

RasterScanLatentCodec#

class compressai.latent_codecs.RasterScanLatentCodec(gaussian_conditional: compressai.entropy_models.entropy_models.GaussianConditional, entropy_parameters: torch.nn.modules.module.Module, context_prediction: compressai.layers.layers.MaskedConv2d, **kwargs)[source]#

Autoregression in raster-scan order with local decoded context.

PixelCNN context model introduced in “Pixel Recurrent Neural Networks”, by Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu, International Conference on Machine Learning (ICML), 2016.

First applied to learned image compression in “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, by D. Minnen, J. Balle, and G.D. Toderici, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).

                 ctx_params
                     │
                     ▼
                     │ ┌───◄───┐
                   ┌─┴─┴─┐  ┌──┴──┐
                   │  EP │  │  CP │
                   └──┬──┘  └──┬──┘
                      │        │
                      │        ▲
       ┌───┐  y_hat   ▼        │
y ──►──┤ Q ├────►────····───►──┴──►── y_hat
       └───┘          GC
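Raster-scan autoregression relies on the context model seeing only positions that have already been decoded. A minimal sketch of such a causal mask for a square kernel follows. It assumes the centre position itself is excluded (often called a type "A" mask); `causal_mask` is a hypothetical helper, not a library function.

```python
def causal_mask(kernel_size):
    """Return a 2D mask with 1 for positions before the centre in raster order."""
    centre = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            # Rows above the centre, or earlier columns in the centre row.
            if row < centre or (row == centre and col < centre):
                mask[row][col] = 1
    return mask


mask = causal_mask(5)
for row in mask:
    print(" ".join(str(v) for v in row))
```

For a 5x5 kernel, 12 of the 25 positions are visible. Since each position depends on the ones before it, decoding proceeds one position at a time.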

GainHyperLatentCodec#

class compressai.latent_codecs.GainHyperLatentCodec(entropy_bottleneck: compressai.entropy_models.entropy_models.EntropyBottleneck, h_a: torch.nn.modules.module.Module, h_s: torch.nn.modules.module.Module, **kwargs)[source]#

Entropy bottleneck codec with surrounding h_a and h_s transforms, plus trainable gain vectors applied to z.

Gain-controlled side branch for hyperprior introduced in “Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation”, by Ze Cui, Jing Wang, Shangyin Gao, Bo Bai, Tiansheng Guo, and Yihui Feng, CVPR, 2021.

Note

GainHyperLatentCodec should be used inside GainHyperpriorLatentCodec to construct a full hyperprior.

               gain                        gain_inv
                 │                             │
                 ▼                             ▼
       ┌───┐  z  │     ┌───┐ z_hat      z_hat  │       ┌───┐
y ──►──┤h_a├──►──×──►──┤ Q ├───►───····───►────×────►──┤h_s├──►── params
       └───┘           └───┘        EB                 └───┘
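The effect of the gain and inverse gain in the diagram can be illustrated with scalars: scaling before rounding changes the effective quantization step. In this sketch, `gained_quantize` is a hypothetical helper, the gains are plain floats instead of trainable per-channel vectors, and `gain_inv` is assumed to be exactly `1 / gain`.

```python
def gained_quantize(z, gain, gain_inv):
    """Scale by gain, round to the nearest integer, then undo the scaling."""
    z_g = z * gain
    z_hat_g = round(z_g)
    return z_hat_g * gain_inv


z = 0.3
# Unit gain: the quantization step is 1, so 0.3 collapses to 0.
coarse = gained_quantize(z, gain=1.0, gain_inv=1.0)
# Gain of 4: the effective step is 0.25, so 0.3 maps to 0.25.
fine = gained_quantize(z, gain=4.0, gain_inv=0.25)
print(coarse, fine)
```

A larger gain lowers the reconstruction error at the cost of more distinct symbols to code, which is how a single model can cover several rate points.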

GainHyperpriorLatentCodec#

class compressai.latent_codecs.GainHyperpriorLatentCodec(latent_codec: Mapping[str, compressai.latent_codecs.base.LatentCodec], **kwargs)[source]#

Hyperprior codec built from a latent codec for y, which compresses y using the params produced by the hyper branch, with trainable gain vectors applied to y.

Gain-controlled hyperprior introduced in “Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation”, by Ze Cui, Jing Wang, Shangyin Gao, Bo Bai, Tiansheng Guo, and Yihui Feng, CVPR, 2021.

        z_gain  z_gain_inv
           │        │
           ▼        ▼
          ┌┴────────┴┐
    ┌──►──┤ lc_hyper ├──►─┐
    │     └──────────┘    │
    │                     │
    │     y_gain          ▼ params   y_gain_inv
    │        │            │              │
    │        ▼            │              ▼
    │        │         ┌──┴───┐          │
y ──┴────►───×───►─────┤ lc_y ├────►─────×─────►── y_hat
                       └──────┘

By default, the following codec is constructed:

                z_gain                      z_gain_inv
                   │                             │
                   ▼                             ▼
         ┌───┐  z  │ z_g ┌───┐ z_hat      z_hat  │       ┌───┐
    ┌─►──┤h_a├──►──×──►──┤ Q ├───►───····───►────×────►──┤h_s├──┐
    │    └───┘           └───┘        EB                 └───┘  │
    │                                                           │
    │                              ┌──────────────◄─────────────┘
    │                              │            params
    │                           ┌──┴──┐
    │    y_gain                 │  EP │    y_gain_inv
    │       │                   └──┬──┘        │
    │       ▼                      │           ▼
    │       │       ┌───┐          ▼           │
y ──┴───►───×───►───┤ Q ├────►────····───►─────×─────►── y_hat
                    └───┘          GC

Common configurations of latent codecs include:

entropy bottleneck hyper (default) and gaussian conditional y (default)
entropy bottleneck hyper (default) and autoregressive y

ChannelGroupsLatentCodec#

class compressai.latent_codecs.ChannelGroupsLatentCodec(latent_codec: Mapping[str, compressai.latent_codecs.base.LatentCodec], channel_context: Mapping[str, torch.nn.modules.module.Module], *, groups: List[int], **kwargs)[source]#

Reconstructs groups of channels using previously decoded groups.

Context model from [Minnen2020] and [He2022]. Also known as a “channel-conditional” (CC) entropy model.

See Elic2022Official for example usage.

[Minnen2020]: “Channel-wise Autoregressive Entropy Models for Learned Image Compression”, by David Minnen, and Saurabh Singh, ICIP 2020.

[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.
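To illustrate how a `groups` list partitions the channels, the sketch below computes the channel range of each group together with the ranges it may be conditioned on. `split_channel_groups` is a hypothetical helper and the group sizes are made-up toy values.

```python
from itertools import accumulate


def split_channel_groups(num_channels, groups):
    """Return, per group, its (start, stop) range and the earlier ranges it can use."""
    if sum(groups) != num_channels:
        raise ValueError("group sizes must sum to the number of channels")
    stops = list(accumulate(groups))
    starts = [0] + stops[:-1]
    ranges = list(zip(starts, stops))
    # Group k is conditioned on all groups decoded before it.
    return [(ranges[k], ranges[:k]) for k in range(len(ranges))]


plan = split_channel_groups(8, groups=[2, 2, 4])
for current, context in plan:
    print(current, "conditioned on", context)
```

The first group has no channel context at all, so it has to be coded from whatever side information is available; every later group sees progressively more context.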

CheckerboardLatentCodec#

class compressai.latent_codecs.CheckerboardLatentCodec(latent_codec: Mapping[str, compressai.latent_codecs.base.LatentCodec], entropy_parameters: torch.nn.modules.module.Module, context_prediction: compressai.layers.layers.CheckerboardMaskedConv2d, anchor_parity='even', forward_method='twopass', **kwargs)[source]#

Reconstructs latent using 2-pass context model with checkerboard anchors.

Checkerboard context model introduced in [He2021].

See Cheng2020AnchorCheckerboard for example usage.

forward_method="onepass" is fastest, but does not use quantization based on the intermediate means. Uses noise to model quantization.
forward_method="twopass" is slightly slower, but accurately quantizes via STE based on the intermediate means. Uses the same operations as [Chandelier2023].
forward_method="twopass_faster" uses slightly fewer redundant operations.

[He2021]: “Checkerboard Context Model for Efficient Learned Image Compression”, by Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin, CVPR 2021.

[Chandelier2023]: “ELiC-ReImplemetation”, by Vincent Chandelier, 2023.

Warning

This implementation assumes that entropy_parameters is a pointwise function, e.g., a composition of 1x1 convs and pointwise nonlinearities.

0. Input:

□ □ □ □
□ □ □ □
□ □ □ □

1. Decode anchors:

◌ □ ◌ □
□ ◌ □ ◌
◌ □ ◌ □

2. Decode non-anchors:

■ ◌ ■ ◌
◌ ■ ◌ ■
■ ◌ ■ ◌

3. End result:

■ ■ ■ ■
■ ■ ■ ■
■ ■ ■ ■

LEGEND:
■   decoded
◌   currently decoding
□   empty
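The two passes in the diagram correspond to two complementary masks over the spatial grid. The sketch below builds them for a small grid and prints the anchor pass. It follows the diagram, where the top-left position is an anchor; how this maps onto the `anchor_parity` argument is not assumed here, and `checkerboard_masks` is a hypothetical helper.

```python
def checkerboard_masks(height, width):
    """Split an H x W grid into anchor and non-anchor positions."""
    anchors = [[(i + j) % 2 == 0 for j in range(width)] for i in range(height)]
    non_anchors = [[not a for a in row] for row in anchors]
    return anchors, non_anchors


anchors, non_anchors = checkerboard_masks(3, 4)
# Render pass 1 using the legend's symbols: anchors are currently decoding.
rows = [" ".join("◌" if a else "□" for a in row) for row in anchors]
for line in rows:
    print(line)
```

Every non-anchor position has only anchors as its four direct neighbours, so the second pass can decode all non-anchors in parallel using the already decoded anchors as context.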
