compressai_vision.pipelines.fo_vcm.pipeline#

base#

class compressai_vision.pipelines.fo_vcm.pipeline.base.EncoderDecoder[source]#

NOTE: abstract base class; you need to subclass it

An instance of this class encodes an image, calculates the number of bits, and decodes the encoded image, resulting in a “transformed” image.

The transformed image is similar to the original image, although the encoding + decoding process may have introduced some distortion.

The instance may have internal state (say, an H.266 video encoder + decoder) or may not (say, a JPEG encoder + decoder).

BGR(bgr_image, tag=None)[source]#
Parameters:
  • bgr_image – numpy BGR image (y,x,3)

  • tag – a string that can be used to identify & cache images (optional)

Takes a BGR image and pushes it through the encoder + decoder.

Returns a tuple of (nbits, transformed BGR image).

computeMetrics(state: bool)[source]#
compute_msssim(a, b)[source]#
compute_psnr(a, b)[source]#
getMetrics()[source]#

Returns a tuple (psnr, msssim) from the latest encode + decode calculation

reset()[source]#

Reset the internal state of the encoder & decoder, if there is any
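The interface described above can be made concrete with a minimal sketch. `PassthroughEncoderDecoder` is a hypothetical name, and the snippet assumes only the documented interface (`BGR` returning nbits and a transformed image, plus `reset`), not the actual base-class internals:

```python
import numpy as np

class PassthroughEncoderDecoder:
    """Hypothetical minimal EncoderDecoder-style subclass: identity codec."""

    def __init__(self):
        self.reset()

    def reset(self):
        # this trivial codec has no real internal state to clear
        self.last_nbits = 0

    def BGR(self, bgr_image, tag=None):
        # a real subclass would encode + decode here; we pretend the raw
        # image itself is the "bitstream": 8 bits per uint8 sample
        nbits = bgr_image.size * 8
        self.last_nbits = nbits
        return nbits, bgr_image.copy()

img = np.zeros((4, 6, 3), dtype=np.uint8)  # (y, x, 3) BGR image
nbits, img_hat = PassthroughEncoderDecoder().BGR(img)
```

A real subclass would replace the body of `BGR` with calls into an actual codec and return the bit count of the produced bitstream.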

class compressai_vision.pipelines.fo_vcm.pipeline.base.VoidEncoderDecoder[source]#

Does no encoding/decoding whatsoever. Use for debugging.

BGR(bgr_image, tag=None)[source]#
Parameters:
  • bgr_image – numpy BGR image (y,x,3)

  • tag – a string that can be used to identify & cache images (optional)

Returns a tuple of (nbits, BGR image). Since this class performs no encoding/decoding, the image passes through the (void) transformation unchanged.

reset()[source]#

Reset the internal state of the encoder & decoder, if any

compressai#

class compressai_vision.pipelines.fo_vcm.pipeline.compressai.CompressAIEncoderDecoder(net, device='cpu', dump=False, m: int = 64, ffmpeg='ffmpeg', scale: int | None = None, half=False)[source]#

EncoderDecoder class for CompressAI

Parameters:
  • net – compressai network, for example:

    net = bmshj2018_factorized(quality=2, pretrained=True).eval().to(device)

  • device – “cpu” or “cuda”

  • dump – debugging option: dump input, intermediate and output images to disk in the local directory. Default: False

  • m – images should be multiples of this number. If not, a padding is applied before passing to compressai. default = 64

  • ffmpeg – ffmpeg command used for padding/scaling (as defined by VCM working group). Default: “ffmpeg”.

  • scale – enable the VCM working group defined padding/scaling pre- & post-processing steps. Possible values: 100, 75, 50, 25; 100 amounts to a simple padding operation. Special value None (the default) = plain ffmpeg scaling


This class uses the CompressAI model API’s compress and decompress methods, so any model that implements them is compatible with this particular EncoderDecoder class. In detail:

# CompressAI model API:
# compression:
out_enc = self.net.compress(x)
bitstream = out_enc["strings"][0][0]  # compressed bitstream
# decompression:
out_dec = self.net.decompress(out_enc["strings"], out_enc["shape"])
x_hat = out_dec["x_hat"]  # reconstructed image
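As a rough sketch of how a bit count can be derived from such a bitstream, consider a stand-in bytes object (running an actual CompressAI model is out of scope here, and the bits-per-pixel convention shown is one common choice, not necessarily the one this class uses internally):

```python
# stand-in for out_enc["strings"][0][0], which is a bytes object
bitstream = b"\x00" * 1024

# one common convention: nbits = total bytes * 8
nbits = len(bitstream) * 8

# bits per pixel for a hypothetical 256x256 input
bpp = nbits / (256 * 256)
```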

BGR(bgr_image: array, tag=None) tuple[source]#

Return transformed image and nbits for a BGR image

Parameters:
  • bgr_image – numpy BGR image (y,x,3)

  • tag – a string that can be used to identify & cache images (optional)

Returns number of bits and transformed BGR image that has gone through compressai encoding+decoding.

  • Scales the image if scaling is requested (1) [with ffmpeg]

  • Pads the image for CompressAI (2) [with ffmpeg - feel free to switch to torch if you want]

  • Runs the image through CompressAI model

  • Removes padding (2) [with ffmpeg]

  • Backscales (1) [with ffmpeg]

Necessary padding for CompressAI is added and removed on the fly
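The “multiple of m” requirement behind this padding can be sketched with plain arithmetic; the function name below is illustrative (the class itself performs the padding internally, via ffmpeg or torch as noted above):

```python
import math

def padded_dims(height, width, m=64):
    """Smallest (h, w) >= (height, width) where both are multiples of m."""
    return math.ceil(height / m) * m, math.ceil(width / m) * m

# e.g. a 480x500 input with the default m=64 is padded up to 512x512
h, w = padded_dims(480, 500)
```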

computeMetrics(state: bool)[source]#
getMetrics()[source]#

Returns a tuple (psnr, msssim) from the latest encode + decode calculation

reset()[source]#

Reset internal image counter

toByte = ConvertImageDtype()#
toFloat = ConvertImageDtype()#

vtm#

class compressai_vision.pipelines.fo_vcm.pipeline.vtm.VTMEncoderDecoder(encoderApp=None, decoderApp=None, ffmpeg='ffmpeg', vtm_cfg=None, qp=47, scale=100, save=False, base_path='/dev/shm', cache=None, dump=False, skip=False, keep=False, warn=False)[source]#

EncoderDecoder class for VTM encoder

Parameters:
  • encoderApp – VTM encoder command

  • decoderApp – VTM decoder command

  • vtm_cfg – path of encoder cfg file

  • ffmpeg – ffmpeg command used for padding/scaling

  • qp – the default quantization parameter of the instance. Integer from 0 to 63. Default: 47

  • scale – enable the VCM working group defined padding/scaling pre- & post-processing steps. Possible values: 100 (default), 75, 50, 25. Special value: None = ffmpeg scaling. 100 amounts to a simple padding operation

  • save – save intermediate steps into member saved (for debugging). Default: False.

  • cache – (optional) define a directory where all encoded bitstreams are cached. NOTE: If scale is defined, “scale/qp/” is appended to the cache path. If no scale is defined, the appended path is “0/qp/”

  • dump – debugging option: dump input, intermediate and output images to disk in local directory

  • skip – if bitstream is found in cache, then do absolutely nothing. Good for restarting the bitstream generation. default: False. When enabled, method BGR returns (0, None). NOTE: do not use if you want to verify the bitstream files.

  • warn – always warn when a bitstream is generated. Default: False

This class always tries to use cached bitstreams if they are available (for this you need to define a cache directory, see above). If the bitstream is available in the cache, it is used and the encoding step is skipped; otherwise the encoder is run to produce the bitstream.
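Based on the cache description above (“scale/qp/” appended when scale is defined, “0/qp/” otherwise), the cache directory layout can be sketched as follows. The helper name and exact file layout are assumptions for illustration, not the library’s actual scheme:

```python
import os

def cache_dir_for(cache, qp, scale=None):
    # per the docs: "scale/qp/" if scale is defined, else "0/qp/"
    sub = str(scale) if scale is not None else "0"
    return os.path.join(cache, sub, str(qp))

d1 = cache_dir_for("/tmp/bitstreams", qp=47, scale=100)
d2 = cache_dir_for("/tmp/bitstreams", qp=47)
```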

Example:

import cv2, os, logging
from compressai_vision.pipelines.fo_vcm.pipeline.vtm import VTMEncoderDecoder
from compressai_vision.pipelines.fo_vcm.tools import getDataFile, quickLog

path="/path/to/VVCSoftware_VTM/bin"
encoderApp=os.path.join(path, "EncoderAppStatic")
decoderApp=os.path.join(path, "DecoderAppStatic")

# enable debugging log to see explicitly all the steps
loglev=logging.DEBUG
quickLog("VTMEncoderDecoder", loglev)

encdec=VTMEncoderDecoder(encoderApp=encoderApp, decoderApp=decoderApp, ffmpeg="ffmpeg", vtm_cfg=getDataFile("encoder_intra_vtm_1.cfg"), qp=47)
nbits, img_hat = encdec.BGR(cv2.imread("fname.png"))

You can enable caching and avoid re-encoding of images:

encdec=VTMEncoderDecoder(encoderApp=encoderApp, decoderApp=decoderApp, ffmpeg="ffmpeg", vtm_cfg=getDataFile("encoder_intra_vtm_1.cfg"), qp=47, cache="/tmp/kokkelis")
nbits, img_hat = encdec.BGR(cv2.imread("fname.png"), tag="a_unique_tag")

Cache can be inspected with:

encdec.dump()
BGR(bgr_image, tag=None) tuple[source]#
Parameters:
  • bgr_image – numpy BGR image (y,x,3)

  • tag – a string that can be used to identify & cache images (optional). Necessary if you’re using caching

Returns BGR image that has gone through VTM encoding and decoding process and all other operations as defined by MPEG/VCM.

Returns a tuple of (nbits, transformed_bgr_image)

This method is somewhat complex: in addition to performing the necessary image transformations, it handles caching of bitstreams, checks whether bitstreams already exist, etc. Error conditions from ffmpeg and/or from the VTM encoder/decoder must be handled correctly.

VCM working group ops:

padded_hgt = math.ceil(height/2)*2
padded_wdt = math.ceil(width/2)*2
1. ffmpeg -i {input_tmp_path} -vf {vf} {input_padded_tmp_path}

vf depends on the scale:

for 100%: -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2"           # NOTE: simply padding
for 75%:  -vf "scale=ceil(iw*3/8)*2:ceil(ih*3/8)*2"
for 50%:  -vf "scale=ceil(iw/4)*2:ceil(ih/4)*2"
for 25%:  -vf "scale=ceil(iw/8)*2:ceil(ih/8)*2"

2. ffmpeg -i {input_padded_tmp_path} -f rawvideo -pix_fmt yuv420p -dst_range 1 {yuv_image_path}
3. {VTM_encoder_path} -c {VTM_AI_cfg} -i {yuv_image_path} -b {bin_image_path} -o {temp_yuv_path} -fr 1 -f 1 -wdt {padded_wdt} -hgt {padded_hgt}
    -q {qp} --ConformanceWindowMode=1 --InternalBitDepth=10
4. {VTM_decoder_path} -b {bin_image_path} -o {rec_yuv_path}
5. ffmpeg -y -f rawvideo -pix_fmt yuv420p10le -s {padded_wdt}x{padded_hgt} -src_range 1 -i {rec_yuv_path} -frames 1 -pix_fmt rgb24 {rec_png_path}
6. ffmpeg -y -i {rec_png_path} -vf "crop={width}:{height}" {rec_image_path} # NOTE: This can be done only if scale=100%, i.e. to remove padding
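The -vf expressions in step 1 translate directly into output dimensions. The sketch below is a plain transcription of the formulas above (ffmpeg’s ceil behaves like math.ceil here); the function name is illustrative:

```python
import math

def vcm_scaled_dims(width, height, scale=100):
    """Output dimensions of the VCM pre-processing step for a given scale."""
    factors = {100: 1 / 2, 75: 3 / 8, 50: 1 / 4, 25: 1 / 8}
    f = factors[scale]
    # each -vf expression is ceil(dim * f) * 2, guaranteeing even dimensions
    return math.ceil(width * f) * 2, math.ceil(height * f) * 2

dims_100 = vcm_scaled_dims(1920, 1080, 100)  # padding only: unchanged
dims_75 = vcm_scaled_dims(1920, 1080, 75)    # 3/4 of each dimension
```

Note that for scale=100 the formula only rounds odd dimensions up to the next even number, which is exactly the “simple padding” case.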
dump()[source]#

Dumps files cached on disk by the VTMEncoderDecoder

getCacheDir()[source]#

Returns directory where temporary and cached files are saved

reset()[source]#

Reset encoder/decoder internal state. At the moment there is none.

compressai_vision.pipelines.fo_vcm.pipeline.vtm.removeFileIf(path) bool[source]#