compressai.models#
CompressionModel#
- class compressai.models.CompressionModel(entropy_bottleneck_channels=None, init_weights=None)[source]#
Base class for constructing an auto-encoder with any number of EntropyBottleneck or GaussianConditional modules.
- aux_loss() → Tensor [source]#
Returns the total auxiliary loss over all EntropyBottlenecks.
In contrast to the primary “net” loss used by the “net” optimizer, the “aux” loss is only used by the “aux” optimizer to update only the EntropyBottleneck.quantiles parameters. In fact, the “aux” loss does not depend on image data at all.
The purpose of the “aux” loss is to determine the range within which most of the mass of a given distribution is contained, as well as its median (i.e. 50% probability). That is, for a given distribution, the “aux” loss converges towards satisfying the following conditions for some chosen tail_mass probability:
cdf(quantiles[0]) = tail_mass / 2
cdf(quantiles[1]) = 0.5
cdf(quantiles[2]) = 1 - tail_mass / 2
This ensures that the concrete _quantized_cdfs operate primarily within a finitely supported region. Any symbols outside this range must be coded using some alternative method that does not involve the _quantized_cdfs. Luckily, one may choose a tail_mass probability that is sufficiently small so that this rarely occurs. It is important that we work with _quantized_cdfs that have a small finite support; otherwise, entropy coding runtime performance would suffer. Thus, tail_mass should not be too small, either!
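In practice this means a model is trained with two optimizers. Below is a minimal sketch of the two-optimizer setup, following the parameter-splitting pattern used in CompressAI's training example; the model configuration and learning rates are illustrative, not prescribed:

import torch
from compressai.models import FactorizedPrior

net = FactorizedPrior(N=128, M=192)

# The "net" optimizer trains everything except the EntropyBottleneck
# quantiles; the "aux" optimizer trains only the quantiles.
params = [p for n, p in net.named_parameters() if not n.endswith(".quantiles")]
aux_params = [p for n, p in net.named_parameters() if n.endswith(".quantiles")]

optimizer = torch.optim.Adam(params, lr=1e-4)
aux_optimizer = torch.optim.Adam(aux_params, lr=1e-3)

x = torch.rand(1, 3, 256, 256)  # dummy batch
out = net(x)
# ... compute the rate-distortion ("net") loss from out["x_hat"] and
# out["likelihoods"], backpropagate, and step `optimizer` ...

# The "aux" loss does not depend on x at all:
aux_loss = net.aux_loss()
aux_loss.backward()
aux_optimizer.step()
aux_optimizer.zero_grad()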
- load_state_dict(state_dict, strict=True)[source]#
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
function.- Parameters:
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type:
NamedTuple with missing_keys and unexpected_keys fields
Note
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
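For example, restoring trained weights before evaluation. The checkpoint path and the "state_dict" key are assumptions about how the checkpoint was saved:

import torch
from compressai.models import FactorizedPrior

# N and M must match the values used at training time.
net = FactorizedPrior(N=128, M=192)
checkpoint = torch.load("checkpoint.pth.tar", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"])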
- update(scale_table=None, force=False)[source]#
Updates EntropyBottleneck and GaussianConditional CDFs.
This needs to be called once after training so that evaluation can later be performed with an actual entropy coder.
- Parameters:
scale_table (torch.Tensor) – table of scales (i.e. standard deviations) for initializing the Gaussian distributions (default: 64 logarithmically spaced scales from 0.11 to 256)
force (bool) – overwrite previous values (default: False)
- Returns:
True if at least one of the modules was updated.
- Return type:
updated (bool)
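A sketch of the typical post-training sequence, where net is assumed to be a trained CompressionModel subclass:

net.eval()
updated = net.update(force=True)  # recompute the quantized CDF tables

x = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    enc = net.compress(x)  # actual entropy coding
    rec = net.decompress(enc["strings"], enc["shape"])
x_hat = rec["x_hat"]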
SimpleVAECompressionModel#
FactorizedPrior#
- class compressai.models.FactorizedPrior(N, M, **kwargs)[source]#
Factorized Prior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.
              ┌───┐    y
        x ──►─┤g_a├──►─┐
              └───┘    │
                       ▼
                     ┌─┴─┐
                     │ Q │
                     └─┬─┘
                       │
                 y_hat ▼
                       │
                       ·
                    EB :
                       ·
                       │
                 y_hat ▼
                       │
              ┌───┐    │
    x_hat ──◄─┤g_s├────┘
              └───┘

        EB = Entropy bottleneck
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
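A minimal forward-pass sketch; N=128, M=192 is assumed here only as an illustrative configuration, and the bits-per-pixel estimate is derived from the returned likelihoods:

import math
import torch
from compressai.models import FactorizedPrior

net = FactorizedPrior(N=128, M=192)
x = torch.rand(1, 3, 256, 256)

out = net(x)
x_hat = out["x_hat"]                     # reconstruction, same shape as x
y_likelihoods = out["likelihoods"]["y"]  # per-element likelihoods of y_hat

# Estimated rate of the latent in bits per pixel:
num_pixels = x.size(0) * x.size(2) * x.size(3)
bpp = torch.log(y_likelihoods).sum() / (-math.log(2) * num_pixels)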
ScaleHyperprior#
- class compressai.models.ScaleHyperprior(N, M, **kwargs)[source]#
Scale Hyperprior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang, N. Johnston: “Variational Image Compression with a Scale Hyperprior”, Int. Conf. on Learning Representations (ICLR), 2018.
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │
             y_hat ▼
                   │
          ┌───┐    │
    x_hat ──◄─┤g_s├────┘
          └───┘

    EB = Entropy bottleneck
    GC = Gaussian conditional
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
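As the diagram shows, the model emits two bitstreams: one for the latent y (coded by the Gaussian conditional) and one for the hyper-latent z (coded by the entropy bottleneck). A minimal round-trip sketch, with an illustrative N/M configuration:

import torch
from compressai.models import ScaleHyperprior

net = ScaleHyperprior(N=128, M=192).eval()
net.update()  # build the CDF tables before entropy coding

x = torch.rand(1, 3, 256, 256)  # spatial dims should be divisible by 64
with torch.no_grad():
    enc = net.compress(x)
    # enc["strings"] holds two lists of byte strings: one for y, one for z.
    rec = net.decompress(enc["strings"], enc["shape"])
x_hat = rec["x_hat"]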
MeanScaleHyperprior#
- class compressai.models.MeanScaleHyperprior(N, M, **kwargs)[source]#
Scale Hyperprior with non zero-mean Gaussian conditionals from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                          ▼
                 └─┬─┘                                          │
                   │                                            │
             y_hat ▼                                            │
                   │                                            │
                   ·                                            │
                GC : ◄─────────────────────◄────────────────────┘
                   ·                 scales_hat
                   │                 means_hat
             y_hat ▼
                   │
          ┌───┐    │
    x_hat ──◄─┤g_s├────┘
          └───┘

    EB = Entropy bottleneck
    GC = Gaussian conditional
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
JointAutoregressiveHierarchicalPriors#
- class compressai.models.JointAutoregressiveHierarchicalPriors(N=192, M=192, **kwargs)[source]#
Joint Autoregressive Hierarchical Priors model from D. Minnen, J. Balle, G.D. Toderici: “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018).
          ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
          └───┘    │     └───┘     └───┘        EB        └───┘ │
                   ▼                                            │
                 ┌─┴─┐                                          │
                 │ Q │                                   params ▼
                 └─┬─┘                                          │
             y_hat ▼                  ┌─────┐                   │
                   ├──────────►───────┤  CP ├────────►──────────┤
                   │                  └─────┘                   │
                   ▼                                            ▼
                   │                                            │
                   ·                  ┌─────┐                   │
                GC : ◄────────◄───────┤  EP ├────────◄──────────┘
                   ·     scales_hat   └─────┘
                   │      means_hat
             y_hat ▼
                   │
          ┌───┐    │
    x_hat ──◄─┤g_s├────┘
          └───┘

    EB = Entropy bottleneck
    GC = Gaussian conditional
    EP = Entropy parameters network
    CP = Context prediction (masked convolution)
- Parameters:
N (int) – Number of channels
M (int) – Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder)
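These model classes also provide a from_state_dict() classmethod that infers N and M from the weight shapes, so a trained model can be restored without knowing its construction arguments. A short sketch; the checkpoint filename is hypothetical and the file is assumed to contain the raw model state dict (not nested inside a larger checkpoint dictionary):

import torch
from compressai.models import JointAutoregressiveHierarchicalPriors

state_dict = torch.load("mbt2018_checkpoint.pth.tar", map_location="cpu")
net = JointAutoregressiveHierarchicalPriors.from_state_dict(state_dict)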
Cheng2020Anchor#
- class compressai.models.Cheng2020Anchor(N=192, **kwargs)[source]#
Anchor model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.
Uses residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.
- Parameters:
N (int) – Number of channels
Cheng2020Attention#
- class compressai.models.Cheng2020Attention(N=192, **kwargs)[source]#
Self-attention model variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.
Uses self-attention, residual blocks with small convolutions (3x3 and 1x1), and sub-pixel convolutions for up-sampling.
- Parameters:
N (int) – Number of channels
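Both Cheng2020 variants are also available pretrained through compressai.zoo, where the quality index (1 to 6 for these two models) selects a rate-distortion trade-off:

from compressai.zoo import cheng2020_anchor, cheng2020_attn

# Downloads the pretrained weights on first use.
anchor = cheng2020_anchor(quality=3, pretrained=True).eval()
attn = cheng2020_attn(quality=3, pretrained=True).eval()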
ScaleSpaceFlow#
- class compressai.models.video.ScaleSpaceFlow(num_levels: int = 5, sigma0: float = 1.5, scale_field_shift: float = 1.0)[source]#
Google’s first end-to-end optimized video compression model, from E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, G. Toderici: “Scale-space flow for end-to-end optimized video compression”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020).
- Parameters:
num_levels (int) – Number of scale-space levels
sigma0 (float) – standard deviation of the Gaussian kernel for the first scale-space level.
scale_field_shift (float) –
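A minimal forward-pass sketch, assuming frame sizes compatible with the model’s strides; the first frame of the group is intra-coded and subsequent frames are inter-coded against the previous reconstruction:

import torch
from compressai.models.video import ScaleSpaceFlow

net = ScaleSpaceFlow()

frames = [torch.rand(1, 3, 256, 256) for _ in range(3)]  # a short GOP
out = net(frames)
reconstructions = out["x_hat"]    # list of per-frame reconstructions
likelihoods = out["likelihoods"]  # list of per-frame likelihood dicts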