videoflow.processors.vision package

The videoflow.processors.vision package contains processors used in the computer vision domain, such as detectors, classifiers, trackers and pose estimators.

Submodules

videoflow.processors.vision.annotators module

class videoflow.processors.vision.annotators.BoundingBoxAnnotator(class_labels_path=None, class_labels_dataset='coco', box_color=(255, 225, 0), box_thickness=2, text_color=(255, 255, 0), nb_tasks=1)

Bases: videoflow.processors.vision.annotators.ImageAnnotator

Draws bounding boxes on images.

  • Arguments:
    • class_labels_path: path to the pbtxt file that defines the label indexes

    • class_labels_dataset: If class_labels_path is None, then this attribute is used to download the file from the releases folder. Currently supported datasets are: coco, oidv4, pascal and kitti.

    • box_color: color to use to draw the boxes

    • box_thickness: thickness of boxes to draw

    • text_color: color of text to draw

supported_datasets = ['coco', 'oidv4', 'pascal', 'kitti']
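
A minimal standalone sketch of the annotator (the image and box values are illustrative; the open() call assumes the same node lifecycle documented for the Tensorflow detectors below):

    import numpy as np
    from videoflow.processors.vision.annotators import BoundingBoxAnnotator

    annotator = BoundingBoxAnnotator(class_labels_dataset='coco')
    annotator.open()   # loads the labels file; normally called by the flow framework
    im = np.zeros((480, 640, 3), dtype=np.uint8)
    # one box: [ymin, xmin, ymax, xmax, class_index, score], unnormalized
    dets = np.array([[50., 100., 200., 300., 1., 0.9]])
    annotated = annotator.process(im, dets)   # returns an annotated copy of im
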
class videoflow.processors.vision.annotators.ImageAnnotator(nb_tasks=1)

Bases: videoflow.core.node.ProcessorNode

Interface for all image annotators. All image annotators receive as input an image and annotation metadata, and return as output a copy of the image with the drawings representing the metadata.

process(im: numpy.ndarray, annotations: any) → numpy.ndarray

Returns a copy of im, visually annotated with the metadata given in annotations.
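
For illustration, a hypothetical annotator implementing this interface might look as follows (DotAnnotator and its (y, x)-point annotation format are invented for this sketch):

    import cv2
    import numpy as np
    from videoflow.processors.vision.annotators import ImageAnnotator

    class DotAnnotator(ImageAnnotator):
        '''Hypothetical annotator that draws a dot at each (y, x) point.'''
        def process(self, im: np.ndarray, annotations) -> np.ndarray:
            im = im.copy()   # annotate a copy, never the original image
            for y, x in annotations:
                cv2.circle(im, (int(x), int(y)), 3, (0, 255, 0), -1)
            return im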

class videoflow.processors.vision.annotators.SegmenterAnnotator(class_labels_path=None, class_labels_dataset='coco', transparency=0.5, nb_tasks=1)

Bases: videoflow.processors.vision.annotators.ImageAnnotator

Draws segmentation masks on images.

  • Arguments:
    • class_labels_path: path to the pbtxt file that defines the label indexes

    • class_labels_dataset: If class_labels_path is None, then this attribute is used to download the file from the releases folder. Currently supported datasets are: coco, oidv4, pascal and kitti.

    • transparency: a value between 0 and 1 for the transparency of the mask

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255), (0, 255, 255), (255, 127.5, 0), (255, 0, 127.5), (127.5, 255, 0), (0, 255, 127.5), (127.5, 0, 255), (0, 127.5, 255)]
supported_datasets = ['coco', 'oidv4', 'pascal', 'kitti']
class videoflow.processors.vision.annotators.TrackerAnnotator(box_color=(255, 225, 0), box_thickness=2, text_color=(255, 255, 255), nb_tasks=1)

Bases: videoflow.processors.vision.annotators.ImageAnnotator

Draws bounding boxes with track ids on images.

videoflow.processors.vision.counters module

videoflow.processors.vision.detectors module

Collection of object detection processors

class videoflow.processors.vision.detectors.ObjectDetector(nb_tasks: int = 1, device_type='cpu', **kwargs)

Bases: videoflow.core.node.ProcessorNode

Abstract class that defines the interface of object detectors

process(im: numpy.ndarray) → numpy.ndarray

  • Arguments:
    • im (np.array): (h, w, 3)

  • Returns:
    • dets: np.array of shape (nb_boxes, 6), specifically (nb_boxes, [ymin, xmin, ymax, xmax, class_index, score])

      The box coordinates are returned unnormalized (values NOT between 0 and 1, but in pixel coordinates of the original image).
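
Because the output format is fixed, downstream code can unpack the columns directly. A small sketch (filter_dets and its threshold are illustrative):

    import numpy as np

    def filter_dets(dets: np.ndarray, score_threshold: float = 0.5) -> np.ndarray:
        '''Keeps only the boxes whose score column is above the threshold.'''
        # columns: [ymin, xmin, ymax, xmax, class_index, score]
        return dets[dets[:, 5] >= score_threshold]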

class videoflow.processors.vision.detectors.TensorflowObjectDetector(num_classes=90, path_to_pb_file=None, architecture='ssd-resnet50-fpn', dataset='coco', min_score_threshold=0.5, nb_tasks=1, device_type='gpu')

Bases: videoflow.processors.vision.detectors.ObjectDetector

Finds object detections by running a Tensorflow model on an image.

Initializes the Tensorflow model. If path_to_pb_file is provided, a local model is used. If not, the architecture and dataset parameters are used to download a pretrained Tensorflow model.

Models supported, COCO dataset:

  Model                        Speed (ms)   COCO mAP
  ssd-mobilenetv2_coco         30           21
  ssd-resnet50-fpn_coco        76           35
  fasterrcnn-resnet101_coco    106          32

Models supported, Kitti dataset:

  Model                        Speed (ms)   Pascal mAP@0.5
  fasterrcnn-resnet101_kitti   79           87

Models supported, Open Images V4 dataset:

  Model                                        Speed (ms)   Open Images V4 mAP@0.5
  fasterrcnn-inception-resnetv2-atrous_oidv4   425          54
  ssd-mobilenetv2_oidv4                        89           36

  • Arguments:
    • num_classes (int): number of classes that the detector can recognize.

    • path_to_pb_file (str): Path to the model pb file. The model is expected to have the following input tensor: image_tensor:0, and the following output tensors: detection_boxes:0, detection_scores:0, detection_classes:0, and num_detections:0. If no path is provided, the model is downloaded from the internet using the values provided for architecture and dataset.

    • architecture (str): One of the architectures mentioned in the tables above.

    • dataset (str): coco, kitti and oidv4 are accepted.

    • min_score_threshold (float): detections with a score below this threshold are filtered out

close()

Closes the Tensorflow model session.

open()

Creates a session with the Tensorflow model.

supported_models = ['ssd-mobilenetv2_coco', 'ssd-resnet50-fpn_coco', 'fasterrcnn-resnet101_coco', 'fasterrcnn-resnet101_kitti', 'fasterrcnn-inception-resnetv2-atrous_oidv4', 'ssd-mobilenetv2_oidv4']
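
A minimal standalone sketch, calling the node's lifecycle methods directly instead of running it inside a flow (the architecture/dataset pair corresponds to the ssd-mobilenetv2_coco entry above; the input frame is illustrative, and downloading the model requires network access):

    import numpy as np
    from videoflow.processors.vision.detectors import TensorflowObjectDetector

    detector = TensorflowObjectDetector(architecture='ssd-mobilenetv2',
                                        dataset='coco',
                                        min_score_threshold=0.5,
                                        device_type='cpu')
    detector.open()                 # creates the Tensorflow session
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    dets = detector.process(frame)  # (nb_boxes, 6), unnormalized coordinates
    detector.close()                # closes the session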

videoflow.processors.vision.pose module

videoflow.processors.vision.segmentation module

class videoflow.processors.vision.segmentation.Segmenter(nb_tasks: int = 1, device_type='cpu', **kwargs)

Bases: videoflow.core.node.ProcessorNode

Abstract class that defines the interface for image segmentation.

process(im: numpy.ndarray) → numpy.ndarray

  • Arguments:
    • im (np.array): (h, w, 3)

  • Returns:
    • masks: np.array of shape (nb_masks, h, w)

    • classes: np.array of shape (nb_masks,)

    • scores: np.array of shape (nb_masks,)

class videoflow.processors.vision.segmentation.TensorflowSegmenter(num_classes=90, path_to_pb_file=None, architecture='maskrcnn-inceptionv2', dataset='coco', min_score_threshold=0.5, nb_tasks=1, device_type='gpu')

Bases: videoflow.processors.vision.segmentation.Segmenter

Finds masks by running a Tensorflow model on an image.

Initializes the Tensorflow model. If path_to_pb_file is provided, a local model is used. If not, the architecture and dataset parameters are used to download a pretrained Tensorflow model.

Models supported, COCO dataset:

  Model                       Speed (ms)   COCO mAP
  maskrcnn-resnet101_coco     470          33
  maskrcnn-inceptionv2_coco   79           25

  • Arguments:
    • num_classes (int): The number of classes that the segmenter can recognize.

    • path_to_pb_file (str): Path to the model pb file. The model is expected to have the following input tensor: image_tensor:0, and the following output tensors: detection_boxes:0, detection_scores:0, detection_classes:0, num_detections:0 and detection_masks:0. If no path is provided, the model is downloaded from the internet using the values provided for architecture and dataset.

    • architecture (str): One of the architectures mentioned in the table above.

    • dataset (str): For now, only coco is accepted.

    • min_score_threshold (float): detections with a score below this threshold are filtered out

close()

Closes the Tensorflow model session.

open()

Creates a session with the Tensorflow model.

supported_models = ['maskrcnn-resnet101_coco', 'maskrcnn-inceptionv2_coco']
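
As with the object detector, the segmenter can be used standalone by driving its lifecycle directly (a minimal sketch; the input frame is illustrative, and downloading the model requires network access):

    import numpy as np
    from videoflow.processors.vision.segmentation import TensorflowSegmenter

    segmenter = TensorflowSegmenter(architecture='maskrcnn-inceptionv2',
                                    dataset='coco', device_type='cpu')
    segmenter.open()                 # creates the Tensorflow session
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    masks, classes, scores = segmenter.process(frame)
    # masks: (nb_masks, h, w), classes: (nb_masks,), scores: (nb_masks,)
    segmenter.close()
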
videoflow.processors.vision.segmentation.reframe_box_masks_to_image_masks(box_masks, boxes, h, w)

Transforms the box masks back to full image masks

  • Arguments:
    • box_masks: tf.float32 tensor of size [nb_masks, height, width],

    • boxes: tf.float32 tensor of size [nb_masks, 4] containing box coordinates [ymin, xmin, ymax, xmax]. Note that the coordinates are normalized

    • h: Image height. The output masks will have height h.

    • w: Image width. The output masks will have width w.

  • Returns:
    • image_masks: tf.float32 tensor of size [nb_masks, h, w]
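
For example (a sketch; the shapes are illustrative, and the result is a graph tensor that must be evaluated inside a Tensorflow session):

    import tensorflow as tf
    from videoflow.processors.vision.segmentation import reframe_box_masks_to_image_masks

    box_masks = tf.zeros([2, 33, 33], dtype=tf.float32)      # per-box masks
    boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],
                         [0.2, 0.3, 0.9, 0.8]], tf.float32)  # normalized coordinates
    image_masks = reframe_box_masks_to_image_masks(box_masks, boxes, 480, 640)
    # image_masks: tf.float32 tensor of shape [2, 480, 640]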

videoflow.processors.vision.trackers module

class videoflow.processors.vision.trackers.BoundingBoxTracker(*args, **kwargs)

Bases: videoflow.core.node.OneTaskProcessorNode

Tracks bounding boxes from one frame to another. It keeps an internal state representation that allows it to track across frames.

process(dets: numpy.ndarray) → numpy.ndarray

  • Arguments:
    • dets: np.array of shape (nb_boxes, 6), specifically (nb_boxes, [ymin, xmin, ymax, xmax, class_index, score])

  • Returns:
    • tracks: np.array of shape (nb_boxes, 5), specifically (nb_boxes, [ymin, xmin, ymax, xmax, track_id])

class videoflow.processors.vision.trackers.KalmanBoxTracker(bbox)

Bases: object

This class represents the internal state of an individual tracked object, observed as a bounding box.

count = 0
get_state()

Returns the current bounding box estimate.

predict()

Advances the state vector and returns the predicted bounding box estimate.

update(bbox)

Updates the state vector with observed bbox.
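
A sketch of the predict/update cycle on a single tracklet (the bbox format is assumed to be [x1, y1, x2, y2], matching convert_bbox_to_z below; the values are illustrative):

    import numpy as np
    from videoflow.processors.vision.trackers import KalmanBoxTracker

    tracker = KalmanBoxTracker(np.array([10, 20, 50, 80]))  # first observation
    predicted = tracker.predict()               # advance the state one step
    tracker.update(np.array([12, 22, 52, 82]))  # correct with a new observation
    estimate = tracker.get_state()              # current bounding box estimate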

class videoflow.processors.vision.trackers.KalmanFilterBoundingBoxTracker(max_age=7, min_hits=3, metric_function_type='iou')

Bases: videoflow.processors.vision.trackers.BoundingBoxTracker

  • Arguments:
    • max_age: If no bounding box is matched to an internal tracklet for max_age steps, the internal tracklet is considered dead and is removed.

    • min_hits: A tracklet is considered a valid track if it has a hit streak larger than or equal to min_hits.

    • metric_function_type (str): one of iou or euclidean
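
A minimal sketch of tracking across frames by calling process() directly (the single-frame detection list is illustrative):

    import numpy as np
    from videoflow.processors.vision.trackers import KalmanFilterBoundingBoxTracker

    tracker = KalmanFilterBoundingBoxTracker(max_age=7, min_hits=3,
                                             metric_function_type='iou')
    detections_per_frame = [np.array([[50., 100., 200., 300., 1., 0.9]])]
    for dets in detections_per_frame:   # each dets: (nb_boxes, 6)
        tracks = tracker.process(dets)  # (nb_boxes, [ymin, xmin, ymax, xmax, track_id])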

videoflow.processors.vision.trackers.associate_detections_to_trackers(detections, trackers, metric_function, iou_threshold=0.1)

Assigns detections to tracked objects (both represented as bounding boxes). Returns three lists: matches, unmatched_detections and unmatched_trackers.
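
For example (a sketch; the detection and tracker boxes are illustrative):

    import numpy as np
    from videoflow.processors.vision.trackers import (
        associate_detections_to_trackers, iou)

    dets = np.array([[50., 100., 200., 300.]])   # detection boxes
    trks = np.array([[55., 105., 205., 305.]])   # predicted tracker boxes
    matches, unmatched_dets, unmatched_trks = associate_detections_to_trackers(
        dets, trks, metric_function=iou, iou_threshold=0.1)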

videoflow.processors.vision.trackers.convert_bbox_to_z(bbox)

Takes a bounding box in the form [x1, y1, x2, y2] and returns z in the form [x, y, s, r], where x, y is the centre of the box, s is the scale/area, and r is the aspect ratio.

videoflow.processors.vision.trackers.convert_x_to_bbox(x, score=None)

Takes a bounding box in the form [x, y, s, r] and returns it in the form [y1, x1, y2, x2] where x1, y1 is the top left and x2, y2 is the bottom right
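
A worked example of the round trip, with the values following from the definitions above:

    import numpy as np
    from videoflow.processors.vision.trackers import (
        convert_bbox_to_z, convert_x_to_bbox)

    z = convert_bbox_to_z(np.array([10, 20, 50, 80]))  # [x1, y1, x2, y2]
    # x = 30, y = 50, s = 40 * 60 = 2400 (area), r = 40 / 60 (aspect ratio)
    bbox = convert_x_to_bbox(z)   # back to box form, per the docstring above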

videoflow.processors.vision.trackers.eucl(bb_test, bb_gt)

Computes the euclidean distance between two boxes in the form [x1, y1, x2, y2]

videoflow.processors.vision.trackers.iou(bb_test, bb_gt)

Computes IOU between two bboxes in the form [y1, x1, y2, x2]. IOU is the area of the intersection divided by the area of the union.

videoflow.processors.vision.trackers.metric_factory(metric_type)

Returns the metric function (iou or eucl) corresponding to metric_type.
videoflow.processors.vision.transformers module

class videoflow.processors.vision.transformers.CropImageTransformer(crop_dimensions)

Bases: videoflow.core.node.ProcessorNode

  • Arguments:
    • crop_dimensions (tuple): (ymin, xmin, ymax, xmax)

process(im: numpy.ndarray) → numpy.ndarray

Crops the image according to the coordinates in crop_dimensions. If those coordinates are out of bounds, a ValueError is raised.

  • Arguments:
    • im (np.array): shape of (h, w, 3)

  • Raises:
    • ValueError:
      • If any of crop_dimensions is less than 0

      • If any of crop_dimensions is out of bounds

      • If ymin > ymax or xmin > xmax
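
A minimal sketch (the image and crop box are illustrative):

    import numpy as np
    from videoflow.processors.vision.transformers import CropImageTransformer

    crop = CropImageTransformer(crop_dimensions=(0, 0, 100, 200))  # ymin, xmin, ymax, xmax
    im = np.zeros((480, 640, 3), dtype=np.uint8)
    patch = crop.process(im)   # expected shape: (100, 200, 3)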

class videoflow.processors.vision.transformers.MaskImageTransformer

Bases: videoflow.core.node.ProcessorNode

process(im: numpy.ndarray, mask: numpy.ndarray) → numpy.ndarray

Masks the image according to the given mask.

  • Arguments:
    • im (np.array): shape of (h, w, 3)

    • mask (np.array): (h, w) of type np.float32, with values between zero and one

  • Raises:
    • ValueError:
      • If mask does not have the same height and width as im
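
A minimal sketch (the image and mask are illustrative; the mask must be float32 in [0, 1] with the same height and width as the image):

    import numpy as np
    from videoflow.processors.vision.transformers import MaskImageTransformer

    im = np.zeros((480, 640, 3), dtype=np.uint8)
    mask = np.ones((480, 640), dtype=np.float32)   # keep the whole image
    masked = MaskImageTransformer().process(im, mask)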

class videoflow.processors.vision.transformers.ResizeImageTransformer(maintain_ratio=False)

Bases: videoflow.core.node.ProcessorNode

process(im: numpy.ndarray, new_size) → numpy.ndarray

Resizes the image to the dimensions given in new_size.

  • Arguments:
    • im (np.array): shape of (h, w, 3)

    • new_size (tuple): (new_height, new_width)

  • Raises:
    • ValueError:
      • If new_height or new_width is negative
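
A minimal sketch (the sizes are illustrative):

    import numpy as np
    from videoflow.processors.vision.transformers import ResizeImageTransformer

    resize = ResizeImageTransformer(maintain_ratio=False)
    im = np.zeros((480, 640, 3), dtype=np.uint8)
    small = resize.process(im, (240, 320))   # (new_height, new_width)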