Getting started with Videoflow

The main datastructure of Videoflow is a Flow. A Flow is defined as a directed acyclic graph (DAG) of nodes that can be of three types: producers, processors and consumers. Directed edges in the graph represent dependency relationships: A node B as a child of node A in the computation graph means that node B receives as its input(s) the output(s) of node A.

Producer Node

Producer nodes generate data and “place it” on the flow. They have no parents. Examples of producers are nodes that randomly generate sequences of numbers, or nodes that read data from a file, or nodes that consume a video stream. Notice that producer nodes usually consume data from an external data source, but they are called producers because they are the nodes that originate or produce the data and place it in the flow.

Processor Node

Processor nodes receive data as input. They compute or process on it, and return the result of the computation as output to be used by its child nodes.

Consumer Node

Consumer nodes receive or consume data. They have no children. They usually publish the data to sources external to the flow, but they are called consumer nodes because they are the sinks of the flow since no node in the flow receives data from them.

A first Videoflow application

The sample application that we are going to create does the following: It produces integers from 0 to 40 inclusive, at 0.01 second intervals. It computes the aggregate sum of the produced integers and it prints the result to the command line. You can find the complete example here.

The first section of the example imports the nodes to be used:

from videoflow.core import Flow
from videoflow.producers import IntProducer
from videoflow.processors import IdentityProcessor, JoinerProcessor
from videoflow.consumers import CommandlineConsume

After the imports, the example defines the computation graph of nodes:

producer = IntProducer(0, 40, 0.01)
sum_agg = SumAggregator()(producer)
printer = CommandlineConsumer()(sum_agg)

producer is a producer node. sum_agg is a processor node. And printer is a consumer node. producer does not have parents, and printer does not have children. processors and consumers are callable objects. They accept as arguments the parents that they depend on. In this simple example the computation graph of nodes is very simple, a linear one:

../_images/linear_graph.png

The next lines of code create, start and wait for the the flow to finish:

flow = Flow([producer], [printer])
flow.run()
flow.join()

flow = Flow([producer], [printer]) creates the flow. To create the flow you need at least a list of producers. You can also specify a list of consumers and the type of flow (if it is a realtime flow or a batch processing flow). By default a flow is a realtime flow.

Warning

Videoflow currently supports flows with only one producer. If you pass to the flow a list of more than one producer, an exception is raised.

flow.run() creates tasks for each node in the computation graph. Each task runs in an independent CPU process. These tasks communicate and coordinate between each other using queues, but the user of Videoflow does not need to be aware of how this happens.

flow.join() blocks until all the tasks of the flow finish running.

When you run this example, you should see a sequence of monotonically increasing numbers being printed on your screen.

Note

You can write more complex flows with Videoflow really easily:

../_images/videoflow_logo_annotated.png

The graph shown above can be easily translated to code as:

from videoflow.producers import IntProducer
from videoflow.processors import IdentityProcessor
from videoflow.consumers import CommandlineConsumer

A = IntProducer()
B = IdentityProcessor()(A)
C = IdentityProcessor()(B)
D = IdentityProcessor()(C)
E = IdentityProcessor()(D)
F = CommandlineConsumer()(E)
G = CommandLineConsumer()(D)
H = CommandlineConsumer()(D)