compressai_vision.pipelines.fo_vcm.conversion#
- class compressai_vision.pipelines.fo_vcm.conversion.FO2DetectronDataset(fo_dataset: Dataset | None = None, detection_field='detections', model_catids=[])[source]#
A class to construct a Detectron2 dataset from a FiftyOne dataset. Subclass of torch.utils.data.Dataset.
- Parameters:
fo_dataset – a FiftyOne dataset
detection_field – name of the member in the FiftyOne Sample where the detections (ground truth) are stored. Default: “detections”.
model_catids – a list of category labels as provided by the Detectron2 model’s metadata. Used to transform a FiftyOne category label into the index number used by Detectron2.
NOTE: Usually we are more interested in going from Detectron2 results to the FiftyOne format, so you might not use this torch Dataset class that much
refs:
https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html
https://towardsdatascience.com/stop-wasting-time-with-pytorch-datasets-17cac2c22fa8
https://medium.com/voxel51/how-to-train-your-dragon-detector-a35ed4672ca7
WARNING: at the moment, only detection (not segmentation) is supported
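As a rough sketch of the per-sample conversion such a Dataset performs (field names and structures here are illustrative assumptions, not the library's actual code): FiftyOne stores relative [x, y, w, h] boxes and label strings, while Detectron2 expects absolute coordinates and integer category ids obtained via model_catids.

```python
# Illustrative sketch (NOT the library's code): convert one FiftyOne-style
# detection dict into a Detectron2-style record. The position of a label in
# model_catids becomes its Detectron2 category id.
model_catids = ["airplane", "person", "car"]  # e.g. from a Detectron2 model's metadata

def fo_detection_to_d2(det: dict, width: int, height: int) -> dict:
    x, y, w, h = det["bounding_box"]  # FiftyOne: relative [x, y, w, h]
    return {
        # Detectron2: absolute XYXY coordinates
        "bbox": [x * width, y * height, (x + w) * width, (y + h) * height],
        "category_id": model_catids.index(det["label"]),
    }

rec = fo_detection_to_d2(
    {"label": "person", "bounding_box": [0.25, 0.5, 0.1, 0.2]},
    width=1000, height=500,
)
```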
- compressai_vision.pipelines.fo_vcm.conversion.MPEGVCMToOpenImageV6(validation_csv_file: str | None = None, list_file: str | None = None, bbox_csv_file: str | None = None, segmentation_csv_file: str | None = None, output_directory: str | None = None, data_dir: str | None = None, mask_dir: str | None = None, link=True, verbose=False, append_mask_dir=None)[source]#
From MPEG/VCM input file format to proper OpenImageV6 format
- Parameters:
validation_csv_file – MPEG/VCM image-level labels (typically detection_validation_labels_5k.csv or segmentation_validation_labels_5k.csv)
list_file – MPEG/VCM image list (typically detection_validation_input_5k.lst or segmentation_validation_input_5k.lst)
bbox_csv_file – MPEG/VCM detection input file (typically detection_validation_5k_bbox.csv or segmentation_validation_bbox_5k.csv)
segmentation_csv_file – MPEG/VCM segmentation input file (typically segmentation_validation_masks_5k.csv)
output_directory – Path where the OpenImageV6 formatted files are dumped
data_dir – Source directory where the image jpg files are. Use the standard OpenImageV6 directory.
mask_dir – Source directory where the mask png files are. Use the standard OpenImageV6 directory.
link – True (default): create a softlink from the source data_dir to the target data_dir. False: copy all images to the target.
More details on the conversion follow.
bbox_csv_file: A file (typically detection_validation_5k_bbox.csv) in the MPEG/VCM format that looks like this:
ImageID,LabelName,XMin,XMax,YMin,YMax,IsGroupOf
bef50424c62d12c5,airplane,0.15641026,0.8282050999999999,0.16284987,0.82188296,0
c540d9c96b6a79a2,person,0.4421875,0.5796875,0.67083335,0.84791666,0
...
--> Converted to proper OpenImageV6 format:
ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
...
segmentation_csv_file: A file (typically segmentation_validation_masks_5k.csv) in the MPEG/VCM format that looks like this:
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,IsGroupOf,Mask,MaskPath
001464cfae2a30b8,sandwich,1024,683,0.261062,0.245575,0.681416,0.573009,0,eNqtlNlSwzAMR..GtiA5L,001464cfae2a30b8_m0cdn1_5fa59bf3.png
...
We use the mask bitmaps from the original OpenImageV6 image set, i.e. we omit the “Mask” column, which appears to be a byte blob in some encoding.
--> Converted to proper OpenImageV6 format:
MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks
114d6b81e7b1fa08_m01bl7v_b62eb236.png,114d6b81e7b1fa08,/m/01bl7v,b62eb236,0.036101,0.332130,0.099278,0.888087,0.00000
...
validation_csv_file = detection_validation_labels_5k.csv looks like this:
ImageID,LabelName,Confidence
0001eeaf4aed83f9,airplane,1
000a1249af2bc5f0,person,1
001083f05db4352b,car,1
00146ba1e50ed8d8,person,1
...
--> Converted to proper OpenImageV6 format (into classifications.csv):
ImageID,Source,LabelName,Confidence
0001eeaf4aed83f9,verification,/m/0cmf2,1
0004886b7d043cfd,verification,/m/01g317,0
0004886b7d043cfd,verification,/m/04hgtk,0
0004886b7d043cfd,verification,/m/09j2d,0
...
output_directory: Path where the OpenImageV6 formatted files are dumped. Files under that path are:
.
├── data                    --> softlink to original images
├── labels
│   ├── detections.csv      (converted from 'detection_validation_5k_bbox.csv' / 'segmentation_validation_bbox_5k.csv')  # bbox_csv_file
│   ├── classifications.csv (converted from 'detection_validation_labels_5k.csv' / 'segmentation_validation_labels_5k.csv')  # validation_csv_file, image-level labels
│   ├── segmentations.csv   (converted from 'segmentation_validation_masks_5k.csv')
│   └── masks/              --> softlink to original mask png files
└── metadata
    └── classes.csv         (all classes appearing in classifications.csv)
In particular, detections.csv has this format:
ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
0001eeaf4aed83f9,source,tag,1,0.022673031,0.9642005,0.07103825,0.80054647,0,0,0,0,0
...
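The bbox conversion above can be sketched in plain Python (a minimal illustration under assumed defaults, not the library's code; note the real converter also maps human-readable labels such as "airplane" to OpenImages MIDs such as /m/0cmf2, which is omitted here):

```python
import csv
import io

# Sketch of the bbox_csv_file conversion: an MPEG/VCM row carries only a
# subset of the OpenImageV6 detections.csv columns; the missing ones are
# filled with placeholder defaults (values chosen here are illustrative).
VCM_HEADER = "ImageID,LabelName,XMin,XMax,YMin,YMax,IsGroupOf"
OIV6_HEADER = ("ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,"
               "IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside")

def vcm_row_to_oiv6(row: dict) -> dict:
    return {
        "ImageID": row["ImageID"],
        "Source": "source",      # placeholder, as in the example output above
        "LabelName": row["LabelName"],
        "Confidence": "1",
        "XMin": row["XMin"], "XMax": row["XMax"],
        "YMin": row["YMin"], "YMax": row["YMax"],
        "IsOccluded": "0", "IsTruncated": "0",
        "IsGroupOf": row["IsGroupOf"],
        "IsDepiction": "0", "IsInside": "0",
    }

vcm_line = "bef50424c62d12c5,airplane,0.15641026,0.8282050999999999,0.16284987,0.82188296,0"
row = next(csv.DictReader(io.StringIO(vcm_line), fieldnames=VCM_HEADER.split(",")))
out = vcm_row_to_oiv6(row)
```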
- compressai_vision.pipelines.fo_vcm.conversion.detectron251(res, model_catids: list = [], allowed_labels: list | None = None, verbose=False) list [source]#
Converts Detectron2-formatted results, i.e. {'instances': Instances}, into FiftyOne-formatted results. This works for detectors and for instance segmentation, where a segmentation is always accompanied by a bounding box.
- Parameters:
res – Detectron2 predictor output (a dictionary {'instances': Instances})
model_catids – A category label list, as provided by the Detectron2 model’s metadata
Returns a FiftyOne Detections instance that can be attached to a FiftyOne Sample instance.
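The direction of this mapping can be sketched as follows (assumed structures, not the library's actual code): Detectron2 reports absolute XYXY boxes, scores, and integer class ids, while FiftyOne Detections use label strings and relative [x, y, w, h] boxes.

```python
# Illustrative sketch (NOT the library's code): convert one Detectron2-style
# prediction back into a FiftyOne-style detection dict.
model_catids = ["airplane", "person", "car"]  # from the Detectron2 model's metadata

def d2_to_fo(pred_box, pred_class: int, score: float,
             width: int, height: int) -> dict:
    x0, y0, x1, y1 = pred_box  # Detectron2: absolute XYXY
    return {
        "label": model_catids[pred_class],
        # FiftyOne: relative [x, y, w, h]
        "bounding_box": [x0 / width, y0 / height,
                         (x1 - x0) / width, (y1 - y0) / height],
        "confidence": score,
    }

det = d2_to_fo([250, 250, 350, 350], pred_class=2, score=0.9,
               width=1000, height=500)
```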
- compressai_vision.pipelines.fo_vcm.conversion.findLabels(dataset: Dataset, detection_field: str = 'detections') list [source]#
- compressai_vision.pipelines.fo_vcm.conversion.imageIdFileList(*args)[source]#
Takes any number of .lst file names as arguments; their contents are combined:
imageIdFileList(first.lst, second.lst, ...)
The .lst file format is one image filename per line:
bef50424c62d12c5.jpg
c540d9c96b6a79a2.jpg
a1b20ed591193c06.jpg
945d6f685752e31b.jpg
d18700eda95548c8.jpg
...
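Combining such .lst files can be sketched like this (an assumption based on the format above: one filename per line, with the .jpg suffix stripped to obtain the image id; this is not the library's actual implementation):

```python
import pathlib
import tempfile

# Sketch: read several .lst files and combine their image ids in order.
def image_id_file_list(*paths) -> list:
    ids = []
    for p in paths:
        for line in pathlib.Path(p).read_text().splitlines():
            line = line.strip()
            if line:
                ids.append(line.removesuffix(".jpg"))  # filename -> image id
    return ids

with tempfile.TemporaryDirectory() as d:
    first = pathlib.Path(d, "first.lst")
    first.write_text("bef50424c62d12c5.jpg\nc540d9c96b6a79a2.jpg\n")
    second = pathlib.Path(d, "second.lst")
    second.write_text("a1b20ed591193c06.jpg\n")
    ids = image_id_file_list(first, second)
```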
detectron2#
From a FiftyOne (“51”) dataset into a Detectron2-compatible dataset
- class compressai_vision.pipelines.fo_vcm.conversion.detectron2.FO2DetectronDataset(fo_dataset: Dataset | None = None, detection_field='detections', model_catids=[])[source]#
A class to construct a Detectron2 dataset from a FiftyOne dataset. Subclass of torch.utils.data.Dataset.
- Parameters:
fo_dataset – a FiftyOne dataset
detection_field – name of the member in the FiftyOne Sample where the detections (ground truth) are stored. Default: “detections”.
model_catids – a list of category labels as provided by the Detectron2 model’s metadata. Used to transform a FiftyOne category label into the index number used by Detectron2.
NOTE: Usually we are more interested in going from Detectron2 results to the FiftyOne format, so you might not use this torch Dataset class that much
refs:
https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html
https://towardsdatascience.com/stop-wasting-time-with-pytorch-datasets-17cac2c22fa8
https://medium.com/voxel51/how-to-train-your-dragon-detector-a35ed4672ca7
WARNING: at the moment, only detection (not segmentation) is supported
- compressai_vision.pipelines.fo_vcm.conversion.detectron2.detectron251(res, model_catids: list = [], allowed_labels: list | None = None, verbose=False) list [source]#
Converts Detectron2-formatted results, i.e. {'instances': Instances}, into FiftyOne-formatted results. This works for detectors and for instance segmentation, where a segmentation is always accompanied by a bounding box.
- Parameters:
res – Detectron2 predictor output (a dictionary {'instances': Instances})
model_catids – A category label list, as provided by the Detectron2 model’s metadata
Returns a FiftyOne Detections instance that can be attached to a FiftyOne Sample instance.
- compressai_vision.pipelines.fo_vcm.conversion.detectron2.findLabels(dataset: Dataset, detection_field: str = 'detections') list [source]#
- compressai_vision.pipelines.fo_vcm.conversion.detectron2.findVideoLabels(dataset: Dataset, detection_field: str = 'detections') list [source]#
Video datasets look like this:
Name:           sfu-hw-objects-v1
Media type:     video
Num samples:    1
Persistent:     True
Tags:           []
Sample fields:
    id:         fiftyone.core.fields.ObjectIdField
    filepath:   fiftyone.core.fields.StringField
    tags:       fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.VideoMetadata)
    media_type: fiftyone.core.fields.StringField
    class_tag:  fiftyone.core.fields.StringField
    name_tag:   fiftyone.core.fields.StringField
Frame fields:
    id:           fiftyone.core.fields.ObjectIdField
    frame_number: fiftyone.core.fields.FrameNumberField
    detections:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Frame labels can be accessed like this:
dataset.distinct("frames.%s.detections.label" % detection_field)
sfu_hw_objects_v1#
- compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.read_detections(sample, lis)[source]#
Reads detections into a video sample.
- Parameters:
sample – fiftyone.Sample
lis – a list of tuples with (frame_number, path)
The file indicated by path has the following annotation format (class_num, x0, y0, w, h, all in relative coordinates):
0 0.343100 0.912700 0.181200 0.167800
0 0.696700 0.166200 0.120700 0.314900
...
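Parsing one such annotation line can be sketched as follows (field order taken from the format above; the function name is illustrative, not from the library):

```python
# Sketch: parse one SFU-HW-Objects annotation line into a class number and a
# relative-coordinate box (x0, y0, w, h), per the format described above.
def parse_sfu_line(line: str):
    fields = line.split()
    class_num = int(fields[0])
    x0, y0, w, h = (float(v) for v in fields[1:5])
    return class_num, (x0, y0, w, h)

cls, box = parse_sfu_line("0 0.343100 0.912700 0.181200 0.167800")
```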
- compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.register(dirname, name='sfu-hw-objects-v1')[source]#
Register SFU-HW-Objects-v1 video directory into fiftyone
├── ClassA
│   └── Annotations
│       ├── PeopleOnStreet  (.txt files, video.webm)
│       └── Traffic         (.txt files, video.webm)
├── ClassB
│   └── Annotations
│       ├── BasketballDrive
│       ├── BQTerrace
│       ├── Cactus
│       ├── Kimono
│       └── ParkScene
...
- compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.sfu_txt_files_to_list(basedir)[source]#
Searches basedir for files named something_NNN.txt, where NNN is an integer index.
The frame numbering starts from “000”.
Returns a sorted list of tuples (index, filename), where the indexes are parsed (correctly) from the filenames.
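A minimal sketch of that lookup, assuming the naming scheme described above (the regex and function name are illustrative, not the library's code):

```python
import pathlib
import re
import tempfile

# Sketch: collect something_NNN.txt files under basedir and sort them by the
# integer index embedded in the filename (frame numbering starts at "000").
def txt_files_to_list(basedir):
    out = []
    for p in pathlib.Path(basedir).glob("*.txt"):
        m = re.search(r"_(\d+)\.txt$", p.name)
        if m:
            out.append((int(m.group(1)), str(p)))
    return sorted(out)

with tempfile.TemporaryDirectory() as d:
    for n in ("010", "000", "002"):
        pathlib.Path(d, f"frame_{n}.txt").write_text("")
    lis = txt_files_to_list(d)
    indices = [i for i, _ in lis]
```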
- compressai_vision.pipelines.fo_vcm.conversion.sfu_hw_objects_v1.video_convert(basedir)[source]#
Converts video from raw YUV into lossless VP9 (in a WebM container)
Assumes this directory structure:
basedir/
├── ClassA
│   ├── Annotations
│   │   ├── PeopleOnStreet    [151 entries]
│   │   └── Traffic           [151 entries]
│   ├── PeopleOnStreet_2560x1600_30_crop.yuv
│   └── Traffic_2560x1600_30_crop.yuv
├── ClassB
│   ├── Annotations
│   │   ├── BasketballDrive   [501 entries]
│   │   ├── BQTerrace         [601 entries]
│   │   ├── Cactus            [501 entries]
│   │   ├── Kimono            [241 entries]
│   │   └── ParkScene         [241 entries]
│   ├── BasketballDrive_1920x1080_50Hz_8bit_P420.yuv
│   etc.
...
Takes ClassA/PeopleOnStreet_2560x1600_30_crop.yuv and converts it into ClassA/Annotations/PeopleOnStreet/video.webm in the lossless VP9 format. The same is done for every .yuv file found in the directory tree.
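One way such a conversion could be invoked is via ffmpeg; the command below is an assumption based on the description above (the exact flags the library uses are not shown here). Raw YUV carries no header, so the geometry and frame rate must be supplied explicitly, and -lossless 1 selects lossless VP9 encoding.

```python
# Sketch: build (but do not run) an ffmpeg command line for a raw YUV ->
# lossless VP9/WebM conversion. Flags are an assumption, not the library's.
def yuv_to_webm_cmd(yuv_path: str, width: int, height: int,
                    fps: int, out_path: str) -> list:
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "yuv420p",  # raw input needs explicit format
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", yuv_path,
        "-c:v", "libvpx-vp9", "-lossless", "1",   # lossless VP9
        out_path,
    ]

cmd = yuv_to_webm_cmd("PeopleOnStreet_2560x1600_30_crop.yuv",
                      2560, 1600, 30, "video.webm")
```

The list form can be passed directly to subprocess.run without shell quoting concerns.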
tvd_object_tracking_v1#
- compressai_vision.pipelines.fo_vcm.conversion.tvd_object_tracking_v1.read_detections(sample, fname)[source]#
- Parameters:
sample – fiftyone.Sample
fname – frame-by-frame annotations
TVD format (per row):
[Frame_Index, Object_ID, Top_left_x, Top_left_y, Width, Height, Confidence, 3D_x, 3D_y]
The meaning of the last fields is uncertain: one of them may be a class label, and the label set is unknown. Note that these are absolute pixel coordinates.
Example:
1,1,193,686,125,331,1,1,1
2,1,193,686,124,330,1,1,1
3,1,194,686,124,330,1,1,1
4,1,197,684,116,339,1,1,1
5,1,194,684,121,330,1,1,1
6,1,199,685,113,335,1,1,1
...
543,1,645,855,47,125,1,1,1
544,1,646,860,48,118,1,1,1
1,3,746,894,1098,106,0,9,1
2,3,746,894,1098,106,0,9,1
...
Note that the frame indexes can start again from 1 (in the example above, when the object ID changes).
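Because the frame indexes restart per object, a reader of this file naturally groups rows by object ID. A minimal sketch (field meaning as listed above; this is not the library's code):

```python
from collections import defaultdict

# Sketch: group TVD ground-truth rows per object id, keeping
# (frame, absolute box) pairs. Only the first six fields are used here.
def group_by_object(lines):
    tracks = defaultdict(list)
    for line in lines:
        f = line.strip().split(",")
        frame, obj = int(f[0]), int(f[1])
        x, y, w, h = (int(v) for v in f[2:6])  # absolute pixel coordinates
        tracks[obj].append((frame, (x, y, w, h)))
    return dict(tracks)

tracks = group_by_object([
    "1,1,193,686,125,331,1,1,1",
    "2,1,193,686,124,330,1,1,1",
    "1,3,746,894,1098,106,0,9,1",
])
```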
- compressai_vision.pipelines.fo_vcm.conversion.tvd_object_tracking_v1.register(dirname, name='tvd-object-tracking-v1')[source]#
Register the Tencent video dataset (TVD), object tracking subset.
The directory structure looks like this:
dirname/
├── TVD-01
│   ├── gt
│   │   └── gt.txt
│   ├── img1
│   └── seqinfo.ini
├── TVD-01.mp4
├── TVD-02
│   ├── gt
│   │   └── gt.txt
│   ├── img1
│   └── seqinfo.ini
├── TVD-02.mp4
├── TVD-03
│   ├── gt
│   │   └── gt.txt
│   ├── img1
│   └── seqinfo.ini
└── TVD-03.mp4