compressai.layers#

MaskedConv2d#

class compressai.layers.MaskedConv2d(*args: Any, mask_type: str = 'A', **kwargs: Any)[source]#

Masked 2D convolution implementation that masks future ("unseen") pixels. Useful for building auto-regressive network components.

Introduced in “Conditional Image Generation with PixelCNN Decoders”.

Inherits the same arguments as nn.Conv2d. Use mask_type='A' for the first layer (which also masks the "current" pixel), and mask_type='B' for the following layers.
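The masking can be sketched in plain Python. The helper below is hypothetical (the actual layer stores the mask as a buffer applied to an nn.Conv2d weight); it only illustrates how a type-'A' mask differs from a type-'B' mask for an odd-sized kernel:

```python
def causal_mask(kh, kw, mask_type):
    """Build a PixelCNN-style causal mask for a kh x kw kernel.

    Weights at positions marked 0 are zeroed out, so the convolution at a
    pixel never sees that pixel's "future" neighbours in raster-scan order.
    """
    assert mask_type in ("A", "B")
    mask = [[0] * kw for _ in range(kh)]
    for h in range(kh):
        for w in range(kw):
            # Rows above the centre are always visible; on the centre row,
            # type 'A' stops just before the centre pixel, type 'B' includes it.
            if h < kh // 2 or (h == kh // 2 and w < kw // 2 + (mask_type == "B")):
                mask[h][w] = 1
    return mask
```

For a 3x3 kernel, causal_mask(3, 3, "A") leaves only the top row and the left neighbour visible, while the 'B' variant additionally keeps the centre weight.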

GDN#

class compressai.layers.GDN(in_channels: int, inverse: bool = False, beta_min: float = 1e-06, gamma_init: float = 0.1)[source]#

Generalized Divisive Normalization layer.

Introduced in “Density Modeling of Images Using a Generalized Normalization Transformation”, by Johannes Ballé, Valero Laparra, and Eero P. Simoncelli (2016).

\[y[i] = \frac{x[i]}{\sqrt{\beta[i] + \sum_j(\gamma[j, i] * x[j]^2)}}\]
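The formula can be checked numerically with a small pure-Python sketch; beta and gamma are plain lists standing in for the layer's learned parameters (the real layer is an nn.Module operating on channel maps):

```python
import math

def gdn(x, beta, gamma):
    # y[i] = x[i] / sqrt(beta[i] + sum_j gamma[j][i] * x[j]**2)
    n = len(x)
    return [
        x[i] / math.sqrt(beta[i] + sum(gamma[j][i] * x[j] ** 2 for j in range(n)))
        for i in range(n)
    ]
```

With inverse=True the layer multiplies by the normalization term instead of dividing, giving the inverse (IGDN) transform used in synthesis/decoder networks.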

GDN1#

class compressai.layers.GDN1(in_channels: int, inverse: bool = False, beta_min: float = 1e-06, gamma_init: float = 0.1)[source]#

Simplified GDN layer.

Introduced in “Computationally Efficient Neural Image Compression”, by Nick Johnston, Elad Eban, Ariel Gordon, and Johannes Ballé (2019).

\[y[i] = \frac{x[i]}{\beta[i] + \sum_j(\gamma[j, i] * |x[j]|)}\]
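The simplification replaces the squares and square root of full GDN with absolute values, as a companion pure-Python sketch makes explicit (beta and gamma again stand in for the learned parameters):

```python
def gdn1(x, beta, gamma):
    # y[i] = x[i] / (beta[i] + sum_j gamma[j][i] * |x[j]|)
    # No square or square root: cheaper to compute than full GDN.
    n = len(x)
    return [
        x[i] / (beta[i] + sum(gamma[j][i] * abs(x[j]) for j in range(n)))
        for i in range(n)
    ]
```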

ResidualBlock#

class compressai.layers.ResidualBlock(in_ch: int, out_ch: int)[source]#

Simple residual block with two 3x3 convolutions.

Parameters:
  • in_ch (int) – number of input channels

  • out_ch (int) – number of output channels

ResidualBlockWithStride#

class compressai.layers.ResidualBlockWithStride(in_ch: int, out_ch: int, stride: int = 2)[source]#

Residual block with a stride on the first convolution.

Parameters:
  • in_ch (int) – number of input channels

  • out_ch (int) – number of output channels

  • stride (int) – stride value (default: 2)
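Assuming the 3x3 convolutions use padding 1 (the usual setup, and an assumption here), the stride value sets the downsampling factor, and the shortcut branch must downsample by the same factor so the residual addition lines up. The standard Conv2d output-size formula makes this concrete:

```python
def conv2d_out_size(size, kernel=3, stride=1, padding=1):
    # Output spatial size of a Conv2d along one dimension (dilation = 1):
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1
```

So a plain residual block (stride 1) preserves spatial size, while the default stride of 2 halves it.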

ResidualBlockUpsample#

class compressai.layers.ResidualBlockUpsample(in_ch: int, out_ch: int, upsample: int = 2)[source]#

Residual block with sub-pixel upsampling on the last convolution.

Parameters:
  • in_ch (int) – number of input channels

  • out_ch (int) – number of output channels

  • upsample (int) – upsampling factor (default: 2)
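Sub-pixel upsampling (PixelShuffle) rearranges C·r² channels into an r-times larger spatial grid instead of using a transposed convolution. A pure-Python sketch of the rearrangement on nested lists, assuming torch.nn.PixelShuffle's channel ordering:

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Output pixel (i, j) of channel c comes from input channel
    c*r*r + (i % r)*r + (j % r) at position (i // r, j // r).
    """
    crr, h, w = len(x), len(x[0]), len(x[0][0])
    c = crr // (r * r)
    return [
        [
            [x[ci * r * r + (i % r) * r + (j % r)][i // r][j // r]
             for j in range(w * r)]
            for i in range(h * r)
        ]
        for ci in range(c)
    ]
```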

AttentionBlock#

class compressai.layers.AttentionBlock(N: int)[source]#

Self attention block.

Simplified variant from “Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules”, by Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto.

Parameters:
  • N (int) – number of channels
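The core pattern of this block is a sigmoid-gated residual: a trunk branch is scaled elementwise by a sigmoid mask computed from the input, and the result is added back to the input. A scalar sketch of that pattern (f and g are illustrative stand-ins for the block's small convolutional branches):

```python
import math

def gated_residual(x, f, g):
    # out = x + f(x) * sigmoid(g(x)), applied elementwise.
    # f: trunk branch, g: mask branch (both convolutional in the real block).
    return [xi + f(xi) * (1.0 / (1.0 + math.exp(-g(xi)))) for xi in x]
```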

QReLU#

class compressai.layers.QReLU(*args, **kwargs)[source]#

Clamps the input to a given bit-depth range. The input is assumed to be integer-valued, as produced by an integer network; otherwise, inputs of any precision are simply clamped without a rounding operation.

A pre-computed scale based on the gamma function is used for the backward computation.

More details can be found in “Integer Networks for Data Compression with Latent-Variable Models”, by Johannes Ballé, Nick Johnston, and David Minnen, ICLR 2019.

Parameters:
  • input – the input tensor

  • bit_depth – source bit-depth (used for clamping)

  • beta – a parameter modeling the gradient during the backward computation
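The forward pass is a plain clamp; only the backward pass differs from a hard clamp, via the gamma-scaled gradient. A sketch of the forward behaviour, assuming the range is the unsigned integer range for the given bit depth:

```python
def qrelu_forward(x, bit_depth):
    # Forward pass only: clamp to [0, 2**bit_depth - 1] (assumed range).
    # The paper's contribution is the smooth beta/gamma-shaped gradient used
    # in the backward pass, which is not modelled here.
    max_val = 2 ** bit_depth - 1
    return min(max(x, 0), max_val)
```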