compressai_vision.pipelines.fo_vcm.conversion#

class compressai_vision.pipelines.fo_vcm.conversion.FO2DetectronDataset(fo_dataset: Dataset | None = None, detection_field='detections', model_catids=[])[source]#

A class to construct a Detectron2 dataset from a FiftyOne dataset. Subclass of torch.utils.data.Dataset.

Parameters:
  • fo_dataset – fiftyone dataset

  • detection_field – name of the field in the FiftyOne Sample where the (ground-truth) detections are stored. Default: “detections”.

  • model_catids – a list of category labels, as provided by the Detectron2 model’s metadata. Used to transform a FiftyOne category label into the index number used by Detectron2.

NOTE: Usually we are more interested in going from Detectron2 results to the FiftyOne format, so you might not use this torch Dataset class that much.

WARNING: at the moment, only detection (not segmentation) is supported
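
A minimal usage sketch; the dataset name and category list below are placeholders, not values shipped with the library:

import fiftyone as fo

from compressai_vision.pipelines.fo_vcm.conversion import FO2DetectronDataset

# Load a previously registered FiftyOne dataset (the name is a placeholder)
fo_dataset = fo.load_dataset("my-detection-dataset")

# Category labels in the order used by the Detectron2 model (placeholder list);
# in practice, take these from the model's metadata (e.g. MetadataCatalog)
model_catids = ["airplane", "person", "car"]

d2_dataset = FO2DetectronDataset(
    fo_dataset=fo_dataset,
    detection_field="detections",
    model_catids=model_catids,
)
record = d2_dataset[0]  # a Detectron2-style sample record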

compressai_vision.pipelines.fo_vcm.conversion.MPEGVCMToOpenImageV6(validation_csv_file: str | None = None, list_file: str | None = None, bbox_csv_file: str | None = None, segmentation_csv_file: str | None = None, output_directory: str | None = None, data_dir: str | None = None, mask_dir: str | None = None, link=True, verbose=False, append_mask_dir=None)[source]#

From MPEG/VCM input file format to proper OpenImageV6 format

Parameters:
  • validation_csv_file – MPEG/VCM image-level labels (typically detection_validation_labels_5k.csv or segmentation_validation_labels_5k.csv)

  • list_file – MPEG/VCM image list (typically detection_validation_input_5k.lst or segmentation_validation_input_5k.lst)

  • bbox_csv_file – MPEG/VCM detection input file (typically detection_validation_5k_bbox.csv or segmentation_validation_bbox_5k.csv)

  • segmentation_csv_file – MPEG/VCM segmentation input file (typically segmentation_validation_masks_5k.csv)

  • output_directory – Path where the OpenImageV6 formatted files are dumped

  • data_dir – Source directory where the image jpg files are. Use the standard OpenImageV6 directory.

  • mask_dir – Source directory where the mask png files are. Use the standard OpenImageV6 directory.

  • link – True (default): create a softlink from source data_dir to target data_dir. False: copy all images to target.

More details on the conversion follow.

bbox_csv_file: A filename (detection_validation_5k_bbox.csv) with the MPEG/VCM format that looks like this:

ImageID,LabelName,XMin,XMax,YMin,YMax,IsGroupOf
bef50424c62d12c5,airplane,0.15641026,0.8282050999999999,0.16284987,0.82188296,0
c540d9c96b6a79a2,person,0.4421875,0.5796875,0.67083335,0.84791666,0
...

--> Converted to proper OpenImageV6 format:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
...

segmentation_csv_file: A filename (segmentation_validation_masks_5k.csv) with the MPEG/VCM format that looks like this:

ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,IsGroupOf,Mask,MaskPath
001464cfae2a30b8,sandwich,1024,683,0.261062,0.245575,0.681416,0.573009,0,eNqtlNlSwzAMR..GtiA5L,001464cfae2a30b8_m0cdn1_5fa59bf3.png
...

We use the mask bitmaps from the original OpenImageV6 image set, i.e. we omit the “Mask” column, which seems to be a byte blob encoded in some way.

--> Converted to proper OpenImageV6 format:

MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks
114d6b81e7b1fa08_m01bl7v_b62eb236.png,114d6b81e7b1fa08,/m/01bl7v,b62eb236,0.036101,0.332130,0.099278,0.888087,0.00000
...

validation_csv_file: A filename (detection_validation_labels_5k.csv) with the MPEG/VCM format that looks like this:

ImageID,LabelName,Confidence
0001eeaf4aed83f9,airplane,1
000a1249af2bc5f0,person,1
001083f05db4352b,car,1
00146ba1e50ed8d8,person,1
...

--> Converted to proper OpenImageV6 format (into classifications.csv):

ImageID,Source,LabelName,Confidence
0001eeaf4aed83f9,verification,/m/0cmf2,1
0004886b7d043cfd,verification,/m/01g317,0
0004886b7d043cfd,verification,/m/04hgtk,0
0004886b7d043cfd,verification,/m/09j2d,0
...

output_directory: Path to where the OpenImageV6 formatted files are dumped. Files under that path are:

.
├── data                        --> softlink to original images
├── labels
│   ├── detections.csv          (converted from 'detection_validation_5k_bbox.csv' / 'segmentation_validation_bbox_5k.csv')      # bbox_csv_file
│   ├── classifications.csv     (converted from 'detection_validation_labels_5k.csv' / 'segmentation_validation_labels_5k.csv')  # validation_csv_file (image-level labels)
│   ├── segmentations.csv       (converted from 'segmentation_validation_masks_5k.csv')                                          # segmentation_csv_file
│   └── masks/                  --> softlink to original mask png files
└── metadata
    └── classes.csv             (all possible classes, taken from classifications.csv)

In particular, detections.csv has this format:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
0001eeaf4aed83f9,source,tag,1,0.022673031,0.9642005,0.07103825,0.80054647,0,0,0,0,0
...
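
Putting it together, a hypothetical invocation for the detection variant (all paths are placeholders):

from compressai_vision.pipelines.fo_vcm.conversion import MPEGVCMToOpenImageV6

MPEGVCMToOpenImageV6(
    validation_csv_file="detection_validation_labels_5k.csv",
    list_file="detection_validation_input_5k.lst",
    bbox_csv_file="detection_validation_5k_bbox.csv",
    output_directory="/path/to/oiv6-mpeg-detection-v1",
    data_dir="/path/to/open-images-v6/validation/data",
    link=True,  # softlink the images instead of copying them
)
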
compressai_vision.pipelines.fo_vcm.conversion.detectron251(res, model_catids: list = [], allowed_labels: list | None = None, verbose=False) list[source]#

Converts Detectron2-formatted results, i.e. {'instances': Instances}, into FiftyOne-formatted results.

This works for detection and instance segmentation, where a segmentation is always accompanied by a bounding box.

Parameters:
  • res – Detectron2 predictor output (a dictionary {'instances': Instances})

  • model_catids – A category label list, as provided by Detectron2 model’s metadata

Returns a FiftyOne Detections instance that can be attached to a FiftyOne Sample instance.
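
A hypothetical end-to-end sketch; cfg (a prepared detectron2 config), the sample, and the field name "predictions" are assumptions:

import cv2
from detectron2.engine import DefaultPredictor

from compressai_vision.pipelines.fo_vcm.conversion import detectron251

predictor = DefaultPredictor(cfg)  # cfg: a detectron2 config prepared elsewhere
res = predictor(cv2.imread("image.jpg"))  # -> {'instances': Instances}

# model_catids: category labels in the order used by the model (placeholder list)
detections = detectron251(res, model_catids=["airplane", "person", "car"])

# Attach the result to a FiftyOne sample under an arbitrary field name
sample["predictions"] = detections
sample.save()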

compressai_vision.pipelines.fo_vcm.conversion.findLabels(dataset: Dataset, detection_field: str = 'detections') list[source]#
compressai_vision.pipelines.fo_vcm.conversion.imageIdFileList(*args)[source]#

Takes any number of .lst filenames as arguments and combines their contents into a single list of image IDs:

imageIdFileList("first.lst", "second.lst", ...)

The .lst file format is:

bef50424c62d12c5.jpg
c540d9c96b6a79a2.jpg
a1b20ed591193c06.jpg
945d6f685752e31b.jpg
d18700eda95548c8.jpg
...
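
For example, combining the two MPEG/VCM list files (a hypothetical call):

from compressai_vision.pipelines.fo_vcm.conversion import imageIdFileList

image_ids = imageIdFileList(
    "detection_validation_input_5k.lst",
    "segmentation_validation_input_5k.lst",
)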

detectron2#

From a FiftyOne dataset into a Detectron2-compatible dataset

class compressai_vision.pipelines.fo_vcm.conversion.detectron2.FO2DetectronDataset(fo_dataset: Dataset | None = None, detection_field='detections', model_catids=[])[source]#

A class to construct a Detectron2 dataset from a FiftyOne dataset. Subclass of torch.utils.data.Dataset.

Parameters:
  • fo_dataset – fiftyone dataset

  • detection_field – name of the field in the FiftyOne Sample where the (ground-truth) detections are stored. Default: “detections”.

  • model_catids – a list of category labels, as provided by the Detectron2 model’s metadata. Used to transform a FiftyOne category label into the index number used by Detectron2.

NOTE: Usually we are more interested in going from Detectron2 results to the FiftyOne format, so you might not use this torch Dataset class that much.

WARNING: at the moment, only detection (not segmentation) is supported

compressai_vision.pipelines.fo_vcm.conversion.detectron2.detectron251(res, model_catids: list = [], allowed_labels: list | None = None, verbose=False) list[source]#

Converts Detectron2-formatted results, i.e. {'instances': Instances}, into FiftyOne-formatted results.

This works for detection and instance segmentation, where a segmentation is always accompanied by a bounding box.

Parameters:
  • res – Detectron2 predictor output (a dictionary {'instances': Instances})

  • model_catids – A category label list, as provided by Detectron2 model’s metadata

Returns a FiftyOne Detections instance that can be attached to a FiftyOne Sample instance.

compressai_vision.pipelines.fo_vcm.conversion.detectron2.findLabels(dataset: Dataset, detection_field: str = 'detections') list[source]#
compressai_vision.pipelines.fo_vcm.conversion.detectron2.findVideoLabels(dataset: Dataset, detection_field: str = 'detections') list[source]#

Video datasets look like this:

Name:        sfu-hw-objects-v1
Media type:  video
Num samples: 1
Persistent:  True
Tags:        []
Sample fields:
    id:         fiftyone.core.fields.ObjectIdField
    filepath:   fiftyone.core.fields.StringField
    tags:       fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.VideoMetadata)
    media_type: fiftyone.core.fields.StringField
    class_tag:  fiftyone.core.fields.StringField
    name_tag:   fiftyone.core.fields.StringField
Frame fields:
    id:           fiftyone.core.fields.ObjectIdField
    frame_number: fiftyone.core.fields.FrameNumberField
    detections:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)

Frame labels can be accessed like this:

dataset.distinct("frames.%s.detections.label" % detection_field)

sfu_hw_objects_v1#

compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.read_detections(sample, lis)[source]#

Reads detections into a video sample.

Parameters:
  • sample – fiftyone.Sample

  • lis – a list of (frame_number, path) tuples

The file indicated by path has the following annotation format (all values in relative coordinates):

class_num x0 y0 w h

0 0.343100 0.912700 0.181200 0.167800
0 0.696700 0.166200 0.120700 0.314900
...
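
For illustration only, a minimal sketch of parsing one such annotation file (not the library's internal implementation):

def parse_annotation_file(path: str) -> list:
    """Parse one annotation file: class_num x0 y0 w h (relative coords)."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip empty or malformed lines
            class_num = int(parts[0])
            x0, y0, w, h = (float(v) for v in parts[1:])
            rows.append((class_num, [x0, y0, w, h]))
    return rows
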
compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.register(dirname, name='sfu-hw-objects-v1')[source]#

Register an SFU-HW-Objects-v1 video directory into fiftyone. The expected directory structure is:

├── ClassA
│   └── Annotations
│       ├── PeopleOnStreet   (.txt files, video.webm)
│       └── Traffic          (.txt files, video.webm)
├── ClassB
│   └── Annotations
│       ├── BasketballDrive
│       ├── BQTerrace
│       ├── Cactus
│       ├── Kimono
│       └── ParkScene
...
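
Hypothetical usage (the path is a placeholder):

from compressai_vision.pipelines.fo_vcm.conversion import sfu_hw_objects_v1

# Register the directory tree shown above as a fiftyone video dataset
sfu_hw_objects_v1.register("/path/to/SFU-HW-Objects-v1")
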
compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.sfu_txt_files_to_list(basedir)[source]#

Searches basedir for files named

something_NNN.txt

where NNN is an integer frame index.

The frame numbering starts from “000”.

Returns a sorted list of (index, filename) tuples, with the indices parsed from the filenames.
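
Hypothetical usage (the path is a placeholder):

from compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1 import (
    sfu_txt_files_to_list,
)

# e.g. [(0, 'Traffic_000.txt'), (1, 'Traffic_001.txt'), ...]
pairs = sfu_txt_files_to_list("/path/to/ClassA/Annotations/Traffic")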

compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.video_convert(basedir)[source]#

Converts videos from raw YUV into lossless VP9 (.webm).

Assumes this directory structure:

basedir/
├── ClassA
│   ├── Annotations
│   │   ├── PeopleOnStreet [151 entries exceeds filelimit, not opening dir]
│   │   └── Traffic [151 entries exceeds filelimit, not opening dir]
│   ├── PeopleOnStreet_2560x1600_30_crop.yuv
│   └── Traffic_2560x1600_30_crop.yuv
├── ClassB
│   ├── Annotations
│   │   ├── BasketballDrive [501 entries exceeds filelimit, not opening dir]
│   │   ├── BQTerrace [601 entries exceeds filelimit, not opening dir]
│   │   ├── Cactus [501 entries exceeds filelimit, not opening dir]
│   │   ├── Kimono [241 entries exceeds filelimit, not opening dir]
│   │   └── ParkScene [241 entries exceeds filelimit, not opening dir]
│   ├── BasketballDrive_1920x1080_50Hz_8bit_P420.yuv
│   └── ...
...

Takes, e.g., ClassA/PeopleOnStreet_2560x1600_30_crop.yuv and converts it into ClassA/Annotations/PeopleOnStreet/video.webm in lossless VP9 format.

The same is done for all .yuv files found in the directory tree.
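
Hypothetical usage (the path is a placeholder):

from compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1 import video_convert

# Converts every .yuv file found under basedir into a lossless video.webm
video_convert("/path/to/SFU-HW-Objects-v1")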

tvd_object_tracking_v1#

compressai_vision.pipelines.fo_vcm.conversion.tvd_object_tracking_v1.read_detections(sample, fname)[source]#
Parameters:
  • sample – fiftyone.Sample

  • fname – file with frame-by-frame annotations

TVD format (as far as we can tell):

[Frame_Index, Object_ID, Top_left_x, Top_left_y, Width, Height, Confidence, 3D_x, 3D_y]

It is unclear whether one of these fields is actually a class label (and, if so, from which label set). Note that the coordinates are absolute (pixel) values.

Example:

1,1,193,686,125,331,1,1,1
2,1,193,686,124,330,1,1,1
3,1,194,686,124,330,1,1,1
4,1,197,684,116,339,1,1,1
5,1,194,684,121,330,1,1,1
6,1,199,685,113,335,1,1,1
...
543,1,645,855,47,125,1,1,1
544,1,646,860,48,118,1,1,1
1,3,746,894,1098,106,0,9,1
2,3,746,894,1098,106,0,9,1
...

Note that the frame indices restart from 1 for each new object ID.
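
For illustration, a sketch that groups such rows per frame and normalizes the absolute boxes; it follows the column guess above and is not the library's internal code:

import csv

def read_gt_rows(fname: str, img_width: int, img_height: int) -> dict:
    """Group gt.txt rows by frame index; convert absolute boxes to relative."""
    per_frame = {}
    with open(fname) as f:
        for row in csv.reader(f):
            frame_index, object_id = int(row[0]), int(row[1])
            x, y, w, h = (float(v) for v in row[2:6])
            # relative [x, y, w, h]: top-left corner plus size, in [0, 1]
            box = [x / img_width, y / img_height, w / img_width, h / img_height]
            per_frame.setdefault(frame_index, []).append((object_id, box))
    return per_frame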

compressai_vision.pipelines.fo_vcm.conversion.tvd_object_tracking_v1.register(dirname, name='tvd-object-tracking-v1')[source]#

Register the Tencent Video Dataset (TVD), object tracking subset, into fiftyone.

The directory structure for this looks like:

dirname/
    |
    ├── TVD-01
    │   ├── gt
    │   │   └── gt.txt
    │   ├── img1
    │   └── seqinfo.ini
    ├── TVD-01.mp4
    ├── TVD-02
    │   ├── gt
    │   │   └── gt.txt
    │   ├── img1
    │   └── seqinfo.ini
    ├── TVD-02.mp4
    ├── TVD-03
    │   ├── gt
    │   │   └── gt.txt
    │   ├── img1
    │   └── seqinfo.ini
    └── TVD-03.mp4
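
Hypothetical usage (the path is a placeholder):

from compressai_vision.pipelines.fo_vcm.conversion import tvd_object_tracking_v1

# Register the directory tree shown above as a fiftyone dataset
tvd_object_tracking_v1.register("/path/to/TVD")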