compressai.latent_codecs#
A `LatentCodec` is an abstraction for compressing a latent space using some entropy modeling technique. A `LatentCodec` can be thought of as a miniature `CompressionModel`. In fact, it implements some of the same methods: `forward`, `compress`, and `decompress`, as described in Defining a custom latent codec. By composing latent codecs, we can easily create more complex entropy models.
CompressAI provides the following predefined `LatentCodec` subclasses:
Module name | Description
---|---
`EntropyBottleneckLatentCodec` | Uses an `EntropyBottleneck` to encode `y`.
`GaussianConditionalLatentCodec` | Uses a `GaussianConditional` to encode `y` using `(scales, means)` parameters computed from `ctx_params`.
`HyperLatentCodec` | Uses an `EntropyBottleneck` to encode `z`, with surrounding `h_a` and `h_s` transforms.
`HyperpriorLatentCodec` | Uses an e.g. `GaussianConditionalLatentCodec` or `RasterScanLatentCodec` to encode `y`, with `params` coming from a `hyper` side branch.
`RasterScanLatentCodec` | Encodes `y` autoregressively in raster-scan order, using a PixelCNN-style context model.
`GainHyperLatentCodec` | Like `HyperLatentCodec`, but with gain vectors applied to `z`.
`GainHyperpriorLatentCodec` | Like `HyperpriorLatentCodec`, but with gain vectors applied to `y`.
`ChannelGroupsLatentCodec` | Encodes `y` in channel groups, each group conditioned on previously decoded groups.
`CheckerboardLatentCodec` | Encodes `y` in two passes using a checkerboard pattern of anchor and non-anchor positions.
Diagrams for some of the above predefined latent codecs:
HyperLatentCodec:
       ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
y ──►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►── params
       └───┘     └───┘        EB        └───┘
Entropy bottleneck codec with surrounding `h_a` and `h_s` transforms.
GaussianConditionalLatentCodec:
                 ctx_params
                     │
                     ▼
                     │
                  ┌──┴──┐
                  │  EP │
                  └──┬──┘
                     │
       ┌───┐ y_hat   ▼
y ──►──┤ Q ├────►────····──►── y_hat
       └───┘         GC
Gaussian conditional for compressing latent `y` using `ctx_params`.
HyperpriorLatentCodec:
         ┌──────────┐
    ┌─►──┤ lc_hyper ├──►─┐
    │    └──────────┘    │
    │                    ▼ params
    │                    │
    │                 ┌──┴───┐
y ──┴───────►─────────┤ lc_y ├───►── y_hat
                      └──────┘
Composes a HyperLatentCodec and a "lc_y" latent codec such as
GaussianConditionalLatentCodec or RasterScanLatentCodec.
RasterScanLatentCodec:
                 ctx_params
                     │
                     ▼
                     │ ┌───◄───┐
                   ┌─┴─┴─┐  ┌──┴──┐
                   │  EP │  │  CP │
                   └──┬──┘  └──┬──┘
                      │        │
                      │        ▲
       ┌───┐  y_hat   ▼        │
y ──►──┤ Q ├────►────····───►──┴──►── y_hat
       └───┘         GC
Rationale#
This abstraction makes it easy to swap between different entropy models such as “factorized”, “hyperprior”, “raster scan autoregressive”, “checkerboard”, or “channel conditional groups”.
It also aids composition: we can now easily build arbitrarily complex compositions of the above `LatentCodec` subclasses. For example, we may create models containing multiple hyperprior branches (Hu et al., 2020), or a "channel conditional group" context model which encodes each group using "raster-scan" (Minnen et al., 2020) or "checkerboard" (He et al., 2022) autoregression, and so on.
Lastly, it reduces code duplication and favors composition over inheritance.
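For instance, swapping the "y" branch of a hyperprior for a checkerboard context model is just a matter of composing different codecs. The following is a rough sketch, not a verified configuration: the `nn.Sequential(...)` transforms and the `context_prediction` module are placeholders, the inner `"y"` key is an assumption based on the `latent_codec` mapping argument of `CheckerboardLatentCodec`, and `forward_method="twopass"` refers to the option described under CheckerboardLatentCodec below.

# Hypothetical composition sketch, e.g. inside a SimpleVAECompressionModel.__init__:
self.latent_codec = HyperpriorLatentCodec(
    latent_codec={
        # Side-information branch, as in the hyperprior examples below:
        "hyper": HyperLatentCodec(
            h_a=nn.Sequential(...),
            h_s=nn.Sequential(...),
            entropy_bottleneck=EntropyBottleneck(N),
        ),
        # Swap raster-scan autoregression for a two-pass checkerboard model:
        "y": CheckerboardLatentCodec(
            latent_codec={"y": GaussianConditionalLatentCodec()},
            entropy_parameters=nn.Sequential(...),
            context_prediction=...,  # placeholder: a checkerboard-masked conv
            forward_method="twopass",
        ),
    },
)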
Example models#
A simple VAE model with an arbitrary latent codec can be implemented as follows:
class SimpleVAECompressionModel(CompressionModel):
    """Simple VAE model with arbitrary latent codec.

    .. code-block:: none

               ┌───┐  y  ┌────┐ y_hat ┌───┐
        x ──►──┤g_a├──►──┤ lc ├───►───┤g_s├──►── x_hat
               └───┘     └────┘       └───┘
    """

    g_a: nn.Module
    g_s: nn.Module
    latent_codec: LatentCodec

    def forward(self, x):
        y = self.g_a(x)
        y_out = self.latent_codec(y)
        y_hat = y_out["y_hat"]
        x_hat = self.g_s(y_hat)
        return {
            "x_hat": x_hat,
            "likelihoods": y_out["likelihoods"],
        }

    def compress(self, x):
        y = self.g_a(x)
        outputs = self.latent_codec.compress(y)
        return outputs

    def decompress(self, strings, shape):
        y_out = self.latent_codec.decompress(strings, shape)
        y_hat = y_out["y_hat"]
        x_hat = self.g_s(y_hat).clamp_(0, 1)
        return {
            "x_hat": x_hat,
        }
This pattern is so common that CompressAI already provides it via the following import:
from compressai.models.base import SimpleVAECompressionModel
Using `SimpleVAECompressionModel`, some Google-style VAE models may be implemented as follows:
@register_model("bmshj2018-factorized")
class FactorizedPrior(SimpleVAECompressionModel):
def __init__(self, N, M, **kwargs):
super().__init__(**kwargs)
self.g_a = nn.Sequential(...)
self.g_s = nn.Sequential(...)
self.latent_codec = EntropyBottleneckLatentCodec(channels=M)
@register_model("mbt2018-mean")
class MeanScaleHyperprior(SimpleVAECompressionModel):
def __init__(self, N, M, **kwargs):
super().__init__(**kwargs)
self.g_a = nn.Sequential(...)
self.g_s = nn.Sequential(...)
h_a = nn.Sequential(...)
h_s = nn.Sequential(...)
self.latent_codec = HyperpriorLatentCodec(
# A HyperpriorLatentCodec is made of "hyper" and "y" latent codecs.
latent_codec={
# Side-information branch with entropy bottleneck for "z":
"hyper": HyperLatentCodec(
h_a=h_a,
h_s=h_s,
entropy_bottleneck=EntropyBottleneck(N),
),
# Encode y using GaussianConditional:
"y": GaussianConditionalLatentCodec(),
},
)
@register_model("mbt2018")
class JointAutoregressiveHierarchicalPriors(SimpleVAECompressionModel):
def __init__(self, N, M, **kwargs):
super().__init__(**kwargs)
self.g_a = nn.Sequential(...)
self.g_s = nn.Sequential(...)
h_a = nn.Sequential(...)
h_s = nn.Sequential(...)
self.latent_codec = HyperpriorLatentCodec(
# A HyperpriorLatentCodec is made of "hyper" and "y" latent codecs.
latent_codec={
# Side-information branch with entropy bottleneck for "z":
"hyper": HyperLatentCodec(
h_a=h_a,
h_s=h_s,
entropy_bottleneck=EntropyBottleneck(N),
),
# Encode y using autoregression in raster-scan order:
"y": RasterScanLatentCodec(
entropy_parameters=nn.Sequential(...),
context_prediction=MaskedConv2d(
M, M * 2, kernel_size=5, padding=2, stride=1
),
),
},
)
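Once the elided `nn.Sequential(...)` transforms are filled in, such a model can be exercised end to end. A minimal usage sketch follows; the input size and the `N`/`M` channel counts are assumptions, and `update()` refreshes the entropy coders' CDF tables before actual compression, as with any `CompressionModel`:

import torch

# Hypothetical round trip; assumes the "..." transforms above have been defined.
net = FactorizedPrior(N=128, M=192).eval()
net.update()  # build entropy coder CDF tables before compress()/decompress()

x = torch.rand(1, 3, 256, 256)
out = net.compress(x)                               # {"strings": [...], "shape": ...}
rec = net.decompress(out["strings"], out["shape"])
x_hat = rec["x_hat"]                                # reconstruction clamped to [0, 1]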
Defining a custom latent codec#
Latent codecs should inherit from the abstract base class LatentCodec
, which is defined as:
class LatentCodec(nn.Module, _SetDefaultMixin):
    def forward(self, y: Tensor, *args, **kwargs) -> Dict[str, Any]:
        raise NotImplementedError

    def compress(self, y: Tensor, *args, **kwargs) -> Dict[str, Any]:
        raise NotImplementedError

    def decompress(
        self, strings: List[List[bytes]], shape: Any, *args, **kwargs
    ) -> Dict[str, Any]:
        raise NotImplementedError
Like `CompressionModel`, a subclass of `LatentCodec` should implement:
- `forward`: differentiable function for training, returning a `dict` in the form of: `{"likelihoods": {"y": y_likelihoods, ...}, "y_hat": y_hat}`
- `compress`: compressor to generate the bitstreams from an input tensor, returning a `dict` in the form of: `{"strings": [y_strings, z_strings], "shape": ...}`
- `decompress`: decompressor to reconstruct the input tensor using the bitstreams, returning a `dict` in the form of: `{"y_hat": y_hat}`
Please refer to any of the predefined latent codecs for more concrete examples.
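As an illustration, a minimal custom codec that wraps a single `EntropyBottleneck` (essentially a stripped-down version of `EntropyBottleneckLatentCodec` below) could look like the following sketch; the class name is hypothetical, and the `shape` convention (the spatial size of `y`) mirrors the predefined codecs:

class MyFactorizedLatentCodec(LatentCodec):
    """Toy example codec: encodes y with a single EntropyBottleneck."""

    def __init__(self, channels: int):
        super().__init__()
        self.entropy_bottleneck = EntropyBottleneck(channels)

    def forward(self, y: Tensor) -> Dict[str, Any]:
        # Differentiable path used during training.
        y_hat, y_likelihoods = self.entropy_bottleneck(y)
        return {"likelihoods": {"y": y_likelihoods}, "y_hat": y_hat}

    def compress(self, y: Tensor) -> Dict[str, Any]:
        # Encode y into bitstreams; "shape" records the spatial size of y.
        shape = y.size()[-2:]
        y_strings = self.entropy_bottleneck.compress(y)
        return {"strings": [y_strings], "shape": shape}

    def decompress(self, strings: List[List[bytes]], shape) -> Dict[str, Any]:
        # Reconstruct y_hat from the bitstreams.
        (y_strings,) = strings
        y_hat = self.entropy_bottleneck.decompress(y_strings, shape)
        return {"y_hat": y_hat}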
EntropyBottleneckLatentCodec#
- class compressai.latent_codecs.EntropyBottleneckLatentCodec(entropy_bottleneck: EntropyBottleneck | None = None, **kwargs)[source]#
Entropy bottleneck codec.
Factorized prior “entropy bottleneck” introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.
       ┌───┐ y_hat
y ──►──┤ Q ├───►───····───►─── y_hat
       └───┘        EB
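A minimal usage sketch (the channel count and latent shape below are arbitrary assumptions for illustration):

import torch
from compressai.entropy_models import EntropyBottleneck
from compressai.latent_codecs import EntropyBottleneckLatentCodec

M = 192                                   # assumed number of latent channels
lc = EntropyBottleneckLatentCodec(entropy_bottleneck=EntropyBottleneck(M))
y = torch.rand(1, M, 16, 16)              # dummy latent
y_out = lc(y)                             # {"likelihoods": {"y": ...}, "y_hat": ...}
y_hat, y_likelihoods = y_out["y_hat"], y_out["likelihoods"]["y"]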
GaussianConditionalLatentCodec#
- class compressai.latent_codecs.GaussianConditionalLatentCodec(scale_table: List | Tuple | None = None, gaussian_conditional: GaussianConditional | None = None, entropy_parameters: Module | None = None, quantizer: str = 'noise', chunks: Tuple[str] = ('scales', 'means'), **kwargs)[source]#
Gaussian conditional for compressing latent `y` using `ctx_params`.
Probability model for Gaussian of `(scales, means)`.
Gaussian conditional entropy model introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.
Note
Unlike the original paper, which models only the scale (i.e. “width”) of the Gaussian, this implementation models both the scale and the mean (i.e. “center”) of the Gaussian.
                 ctx_params
                     │
                     ▼
                     │
                  ┌──┴──┐
                  │  EP │
                  └──┬──┘
                     │
       ┌───┐ y_hat   ▼
y ──►──┤ Q ├────►────····──►── y_hat
       └───┘         GC
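A usage sketch, assuming the default `chunks=('scales', 'means')` so that `ctx_params` carries twice as many channels as `y`; all names and shapes below are illustrative assumptions:

import torch
from compressai.latent_codecs import GaussianConditionalLatentCodec

M = 192                                    # assumed number of latent channels
lc = GaussianConditionalLatentCodec()      # default gaussian_conditional / entropy_parameters
y = torch.rand(1, M, 16, 16)               # dummy latent
ctx_params = torch.rand(1, 2 * M, 16, 16)  # chunked into (scales, means)
y_out = lc(y, ctx_params)                  # {"likelihoods": {"y": ...}, "y_hat": ...}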
HyperLatentCodec#
- class compressai.latent_codecs.HyperLatentCodec(entropy_bottleneck: EntropyBottleneck | None = None, h_a: Module | None = None, h_s: Module | None = None, quantizer: str = 'noise', **kwargs)[source]#
Entropy bottleneck codec with surrounding h_a and h_s transforms.
“Hyper” side-information branch introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.
Note
`HyperLatentCodec` should be used inside `HyperpriorLatentCodec` to construct a full hyperprior.
       ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
y ──►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►── params
       └───┘     └───┘        EB        └───┘
HyperpriorLatentCodec#
- class compressai.latent_codecs.HyperpriorLatentCodec(latent_codec: Mapping[str, LatentCodec] | None = None, **kwargs)[source]#
Hyperprior codec constructed from a latent codec for `y` that compresses `y` using `params` from the `hyper` branch.
Hyperprior entropy modeling introduced in “Variational Image Compression with a Scale Hyperprior”, by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston, International Conference on Learning Representations (ICLR), 2018.
         ┌──────────┐
    ┌─►──┤ lc_hyper ├──►─┐
    │    └──────────┘    │
    │                    ▼ params
    │                    │
    │                 ┌──┴───┐
y ──┴───────►─────────┤ lc_y ├───►── y_hat
                      └──────┘
By default, the following codec is constructed:
         ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    ┌─►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►─┐
    │    └───┘     └───┘        EB        └───┘    │
    │                                              │
    │                  ┌──────────────◄────────────┘
    │                  │            params
    │               ┌──┴──┐
    │               │  EP │
    │               └──┬──┘
    │                  │
    │   ┌───┐  y_hat   ▼
y ──┴─►─┤ Q ├────►────····────►── y_hat
        └───┘          GC
Common configurations of latent codecs include:
- entropy bottleneck `hyper` (default) and gaussian conditional `y` (default)
- entropy bottleneck `hyper` (default) and autoregressive `y`
RasterScanLatentCodec#
- class compressai.latent_codecs.RasterScanLatentCodec(gaussian_conditional: GaussianConditional | None = None, entropy_parameters: Module | None = None, context_prediction: MaskedConv2d | None = None, **kwargs)[source]#
Autoregression in raster-scan order with local decoded context.
PixelCNN context model introduced in “Pixel Recurrent Neural Networks”, by Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu, International Conference on Machine Learning (ICML), 2016.
First applied to learned image compression in “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, by D. Minnen, J. Balle, and G.D. Toderici, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).
                 ctx_params
                     │
                     ▼
                     │ ┌───◄───┐
                   ┌─┴─┴─┐  ┌──┴──┐
                   │  EP │  │  CP │
                   └──┬──┘  └──┬──┘
                      │        │
                      │        ▲
       ┌───┐  y_hat   ▼        │
y ──►──┤ Q ├────►────····───►──┴──►── y_hat
       └───┘         GC
GainHyperLatentCodec#
- class compressai.latent_codecs.GainHyperLatentCodec(entropy_bottleneck: EntropyBottleneck | None = None, h_a: Module | None = None, h_s: Module | None = None, **kwargs)[source]#
Entropy bottleneck codec with surrounding h_a and h_s transforms.
Gain-controlled side branch for hyperprior introduced in “Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation”, by Ze Cui, Jing Wang, Shangyin Gao, Bo Bai, Tiansheng Guo, and Yihui Feng, CVPR, 2021.
Note
`GainHyperLatentCodec` should be used inside `GainHyperpriorLatentCodec` to construct a full hyperprior.
               gain                         gain_inv
                 │                             │
                 ▼                             ▼
       ┌───┐  z  │     ┌───┐ z_hat      z_hat  │       ┌───┐
y ──►──┤h_a├──►──×──►──┤ Q ├───►───····───►────×────►──┤h_s├──►── params
       └───┘           └───┘        EB                 └───┘
GainHyperpriorLatentCodec#
- class compressai.latent_codecs.GainHyperpriorLatentCodec(latent_codec: Mapping[str, LatentCodec] | None = None, **kwargs)[source]#
Hyperprior codec constructed from a latent codec for `y` that compresses `y` using `params` from the `hyper` branch.
Gain-controlled hyperprior introduced in “Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation”, by Ze Cui, Jing Wang, Shangyin Gao, Bo Bai, Tiansheng Guo, and Yihui Feng, CVPR, 2021.
        z_gain z_gain_inv
           │        │
           ▼        ▼
          ┌┴────────┴┐
    ┌──►──┤ lc_hyper ├──►─┐
    │     └──────────┘    │
    │                     │
    │    y_gain           ▼ params  y_gain_inv
    │      │              │             │
    │      ▼              │             ▼
    │      │           ┌──┴───┐         │
y ──┴────►───×───►─────┤ lc_y ├────►─────×─────►── y_hat
                       └──────┘
By default, the following codec is constructed:
                z_gain                        z_gain_inv
                  │                             │
                  ▼                             ▼
         ┌───┐  z │ z_g ┌───┐ z_hat      z_hat  │       ┌───┐
    ┌─►──┤h_a├──►──×──►──┤ Q ├───►───····───►────×────►──┤h_s├──┐
    │    └───┘           └───┘        EB                 └───┘  │
    │                                                           │
    │                              ┌──────────────◄─────────────┘
    │                              │            params
    │                           ┌──┴──┐
    │    y_gain                 │  EP │          y_gain_inv
    │      │                    └──┬──┘              │
    │      ▼                       │                 ▼
    │      │        ┌───┐          ▼                 │
y ──┴───►───×───►───┤ Q ├────►────····───►─────×─────►── y_hat
                    └───┘          GC
Common configurations of latent codecs include:
- entropy bottleneck `hyper` (default) and gaussian conditional `y` (default)
- entropy bottleneck `hyper` (default) and autoregressive `y`
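A construction sketch for the default configuration; the transforms and channel count are placeholders, and the `"hyper"`/`"y"` key names follow the same convention as `HyperpriorLatentCodec` above. The gain vectors themselves are not constructor arguments here; they are presumably supplied by the surrounding model at call time, corresponding to the `y_gain`/`z_gain` inputs in the diagrams.

# Hypothetical construction; h_a/h_s and the channel count N are placeholders.
self.latent_codec = GainHyperpriorLatentCodec(
    latent_codec={
        "hyper": GainHyperLatentCodec(
            h_a=nn.Sequential(...),
            h_s=nn.Sequential(...),
            entropy_bottleneck=EntropyBottleneck(N),
        ),
        "y": GaussianConditionalLatentCodec(),
    },
)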
ChannelGroupsLatentCodec#
- class compressai.latent_codecs.ChannelGroupsLatentCodec(latent_codec: Mapping[str, LatentCodec] | None = None, channel_context: Mapping[str, Module] | None = None, *, groups: List[int], **kwargs)[source]#
Reconstructs groups of channels using previously decoded groups.
Context model from [Minnen2020] and [He2022]. Also known as a “channel-conditional” (CC) entropy model.
See `Elic2022Official` for example usage.
[Minnen2020]: “Channel-wise Autoregressive Entropy Models for Learned Image Compression”, by David Minnen, and Saurabh Singh, ICIP 2020.
[He2022]: “ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding”, by Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang, CVPR 2022.
CheckerboardLatentCodec#
- class compressai.latent_codecs.CheckerboardLatentCodec(latent_codec: Mapping[str, LatentCodec] | None = None, entropy_parameters: Module | None = None, context_prediction: Module | None = None, anchor_parity='even', forward_method='twopass', **kwargs)[source]#
Reconstructs latent using 2-pass context model with checkerboard anchors.
Checkerboard context model introduced in [He2021].
See `Cheng2020AnchorCheckerboard` for example usage.
- forward_method="onepass" is fastest, but does not use quantization based on the intermediate means. Uses noise to model quantization.
- forward_method="twopass" is slightly slower, but accurately quantizes via STE based on the intermediate means. Uses the same operations as [Chandelier2023].
- forward_method="twopass_faster" uses slightly fewer redundant operations.
[He2021]: “Checkerboard Context Model for Efficient Learned Image Compression”, by Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin, CVPR 2021.
[Chandelier2023]: “ELiC-ReImplemetation”, by Vincent Chandelier, 2023.
Warning
This implementation assumes that `entropy_parameters` is a pointwise function, e.g., a composition of 1x1 convs and pointwise nonlinearities.
0. Input:

□ □ □ □
□ □ □ □
□ □ □ □

1. Decode anchors:

◌ □ ◌ □
□ ◌ □ ◌
◌ □ ◌ □

2. Decode non-anchors:

■ ◌ ■ ◌
◌ ■ ◌ ■
■ ◌ ■ ◌

3. End result:

■ ■ ■ ■
■ ■ ■ ■
■ ■ ■ ■

LEGEND:
■   decoded
◌   currently decoding
□   empty