flowmachine.core.dependency_graph

calculate_dependency_graph

calculate_dependency_graph(query_obj: 'Query', analyse: bool = False) -> networkx.classes.digraph.DiGraph
Source: flowmachine/core/dependency_graph.py

Produce a graph of all the queries that go into producing this one, with each query's estimated run cost and stored status recorded as node attributes. The resulting networkx object can then be visualised or analysed. When visualised, nodes corresponding to stored queries are rendered green. See plot_dependency_graph() for a convenient way to plot a dependency graph directly in a Jupyter notebook. Each node carries the estimated cost of the query in the 'cost' attribute and the query object it represents in the 'query_object' attribute; with the analyse parameter set to True, it also carries the actual running time of the query in the 'runtime' attribute.

Parameters

  • query_obj: Query

    Query object to produce a dependency graph for.

  • analyse: bool, default False

    Set to True to get actual runtimes for queries. Note that this will actually run the query!

Returns

  • networkx.classes.digraph.DiGraph

Examples

If you don't want to visualise the dependency graph directly (for example using plot_dependency_graph()), you can export it to a .dot file as follows:

import flowmachine
from flowmachine.features import daily_location
from networkx.drawing.nx_agraph import write_dot
flowmachine.connect()
G = daily_location("2016-01-01").dependency_graph()
write_dot(G, "daily_location_dependencies.dot")
G = daily_location("2016-01-01").dependency_graph(True)
write_dot(G, "daily_location_dependencies_runtimes.dot")
The resulting .dot file can then be converted to a .pdf file using the external tool dot, which comes as part of the Graphviz package (https://www.graphviz.org/):
$ dot -Tpdf daily_location_dependencies.dot -o daily_location_dependencies.pdf

Note

The queries listed as dependencies are not guaranteed to be used in the actual running of a query, only to be referenced by it.

dependencies_eligible_for_store

dependencies_eligible_for_store(query_obj: 'Query') -> Set[ForwardRef('Query')]
Source: flowmachine/core/dependency_graph.py

Get the set of dependencies for this query which may be stored before it is run.

Parameters

  • query_obj: Query

    Query object to get potentially eligible dependencies for

Returns

  • typing.Set[ForwardRef('Query')]

    The set of dependencies of this query which may be stored before it is run.

executing_dependencies

executing_dependencies(eligible_dependencies: Set[ForwardRef('Query')]) -> List[ForwardRef('Query')]
Source: flowmachine/core/dependency_graph.py

Get the query objects from a set which are currently executing.

Parameters

  • eligible_dependencies: typing.Set[ForwardRef('Query')]

    Set of query objects that might be executing

Returns

  • typing.List[ForwardRef('Query')]

    List of query objects currently executing

plot_dependency_graph

plot_dependency_graph(query_obj: 'Query', analyse: bool = False, format: str = 'png', width: Union[int, NoneType] = None, height: Union[int, NoneType] = None) -> Union[ForwardRef('IPython.display.Image'), ForwardRef('IPython.display.SVG')]
Source: flowmachine/core/dependency_graph.py

Plot a graph of all the queries that go into producing this one (see calculate_dependency_graph for more details). This returns an IPython.display object which can be directly displayed in Jupyter notebooks. Note that this requires the IPython and pygraphviz packages to be installed.

Parameters

  • query_obj: Query

    Query object to plot a dependency graph for.

  • analyse: bool, default False

    Set to True to get actual runtimes for queries. Note that this will actually run the query!

  • format: str, default png

    Output format of the resulting image.

  • width: typing.Union[int, NoneType], default None

    Width in pixels to which to constrain the image. Note this is only supported for format="png".

  • height: typing.Union[int, NoneType], default None

    Height in pixels to which to constrain the image. Note this is only supported for format="png".

Returns

  • typing.Union[ForwardRef('IPython.display.Image'), ForwardRef('IPython.display.SVG')]

print_dependency_tree

print_dependency_tree(query_obj: 'Query', show_stored: bool = False, stream: Union[ForwardRef('IOBase'), NoneType] = None, indent_level: int = 0) -> None
Source: flowmachine/core/dependency_graph.py

Print the dependencies of a flowmachine query in a tree-like structure.

Parameters

  • query_obj: Query

    An instance of a query object.

  • show_stored: bool, default False

    If True, show for each query whether it is stored or not. Default: False.

  • stream: typing.Union[ForwardRef('IOBase'), NoneType], default None

    The stream to which the output should be written (default: stdout).

  • indent_level: int, default 0

    The current level of indentation.
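
The tree-like output can be pictured with a minimal sketch that needs no database. FakeQuery, its dependencies and is_stored attributes, and print_tree are hypothetical stand-ins; the exact text that flowmachine prints may well differ.

```python
import io
import sys
from dataclasses import dataclass, field
from typing import List, Optional, TextIO


@dataclass
class FakeQuery:
    """Hypothetical stand-in for a flowmachine Query."""
    name: str
    is_stored: bool = False
    dependencies: List["FakeQuery"] = field(default_factory=list)


def print_tree(query: FakeQuery, show_stored: bool = False,
               stream: Optional[TextIO] = None, indent_level: int = 0) -> None:
    """Write one line per query, indenting each dependency level further."""
    stream = sys.stdout if stream is None else stream
    suffix = ""
    if show_stored:
        suffix = " (stored)" if query.is_stored else " (not stored)"
    stream.write("  " * indent_level + query.name + suffix + "\n")
    # Recurse into dependencies one indentation level deeper
    for dep in query.dependencies:
        print_tree(dep, show_stored, stream, indent_level + 1)


events = FakeQuery("events", is_stored=True)
locations = FakeQuery("daily_location", dependencies=[events])

# Capture the output in a buffer instead of writing to stdout
buffer = io.StringIO()
print_tree(locations, show_stored=True, stream=buffer)
tree_text = buffer.getvalue()
```

Passing a stream such as io.StringIO is a convenient way to capture the tree for logging or testing; leaving it as None writes to stdout.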

query_progress

query_progress(query: 'Query') -> Dict[str, int]
Source: flowmachine/core/dependency_graph.py

Check the progress of a query.

Parameters

  • query: Query

    Query object to check progress of

Returns

  • typing.Dict[str, int]

    eligible: number of subqueries that must be run
    queued: number queued to be run
    executing: number currently running
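
As a rough illustration of how such a summary could be assembled, the sketch below counts dependencies by state. The State enum and summarise_progress are hypothetical; only the three dictionary keys come from the description above, and flowmachine's own state model may differ.

```python
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    """Hypothetical query states, for illustration only."""
    KNOWN = "known"
    QUEUED = "queued"
    EXECUTING = "executing"


def summarise_progress(states: Iterable[State]) -> Dict[str, int]:
    """Count eligible, queued and executing dependencies."""
    states = list(states)
    return {
        "eligible": len(states),  # every dependency passed in counts as eligible
        "queued": sum(s is State.QUEUED for s in states),
        "executing": sum(s is State.EXECUTING for s in states),
    }


progress = summarise_progress(
    [State.QUEUED, State.EXECUTING, State.EXECUTING, State.KNOWN]
)
```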

queued_dependencies

queued_dependencies(eligible_dependencies: Set[ForwardRef('Query')]) -> List[ForwardRef('Query')]
Source: flowmachine/core/dependency_graph.py

Get the query objects from a set which are currently queued.

Parameters

  • eligible_dependencies: typing.Set[ForwardRef('Query')]

    Set of query objects that might be queued

Returns

  • typing.List[ForwardRef('Query')]

    List of query objects currently queued

store_all_unstored_dependencies

store_all_unstored_dependencies(query_obj: 'Query') -> None
Source: flowmachine/core/dependency_graph.py

Store all of the unstored dependencies of a query.

Parameters

  • query_obj: Query

    Query object whose dependencies will be stored.

Note

This function stores only the unstored dependencies of a query, and not the query itself. This is a blocking function. Storing the dependencies happens in background threads, but this function will not return until all the dependencies are stored.
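
The blocking behaviour described in the note can be sketched with the standard library. This is not flowmachine's implementation: fake_store and store_all are hypothetical, and the sleep merely simulates work done in a background thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Iterable, List

stored: List[str] = []


def fake_store(name: str) -> str:
    """Hypothetical stand-in for storing one dependency."""
    time.sleep(0.01)  # simulate work happening in a background thread
    stored.append(name)
    return name


def store_all(names: Iterable[str]) -> None:
    """Submit every store to a background thread, then block until all finish."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fake_store, n) for n in names]
        wait(futures)  # does not return until every future has completed
    # Surface any exception raised inside a worker thread
    for future in futures:
        future.result()


store_all(["events", "subscriber_locations"])
all_stored = sorted(stored)
```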

store_queries_in_order

store_queries_in_order(dependency_graph: networkx.classes.digraph.DiGraph) -> Dict[str, ForwardRef('Future')]
Source: flowmachine/core/dependency_graph.py

Execute queries in an order that ensures each query store is triggered after its dependencies.

Parameters

  • dependency_graph: networkx.classes.digraph.DiGraph

    Dependency graph of query objects to be stored

Returns

  • typing.Dict[str, ForwardRef('Future')]

    Mapping from query nodes to Future objects representing the store tasks
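
One way to picture the ordering guarantee is a topological sort followed by task submission. The sketch below uses the standard library's graphlib (Python 3.9+) rather than networkx, and a plain dict in place of a DiGraph; the query names, fake_store and store_in_order are hypothetical and do not reflect flowmachine's internals.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each key depends on the names in its set.
dependencies: Dict[str, Set[str]] = {
    "daily_location": {"subscriber_locations"},
    "subscriber_locations": {"events"},
    "events": set(),
}

trigger_order: List[str] = []


def fake_store(name: str) -> str:
    """Hypothetical stand-in for storing one query."""
    return name


def store_in_order(deps: Dict[str, Set[str]]) -> Dict[str, Future]:
    """Submit store tasks so that dependencies are always submitted first."""
    futures: Dict[str, Future] = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        # static_order() yields each node only after all of its predecessors
        for name in TopologicalSorter(deps).static_order():
            trigger_order.append(name)
            futures[name] = pool.submit(fake_store, name)
    return futures


store_futures = store_in_order(dependencies)
```

The returned mapping mirrors the documented return value: one Future per query, which a caller could wait on or inspect for errors.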

unstored_dependencies_graph

unstored_dependencies_graph(query_obj: 'Query') -> networkx.classes.digraph.DiGraph
Source: flowmachine/core/dependency_graph.py

Produce a dependency graph of the unstored queries on which this query depends.

Parameters

  • query_obj: Query

    Query object to produce a dependency graph for.

Returns

  • networkx.classes.digraph.DiGraph

Note

If store() or invalidate_db_cache() is called on any query while this function is executing, the resulting graph may not be correct. The queries listed as dependencies are not guaranteed to be used in the actual running of a query, only to be referenced by it.
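
The idea of a graph restricted to unstored dependencies can be sketched as a traversal that skips stored queries. FakeQuery and unstored_edges are hypothetical, a dict of sets stands in for the DiGraph, and two choices here are assumptions rather than documented behaviour: the traversal does not descend below a stored query, and the root query is included as a node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeQuery:
    """Hypothetical stand-in for a flowmachine Query."""
    name: str
    is_stored: bool = False
    dependencies: List["FakeQuery"] = field(default_factory=list)


def unstored_edges(query: FakeQuery) -> Dict[str, Set[str]]:
    """Map each query name to the names of its unstored direct dependencies.

    Assumption: traversal does not descend below a stored query.
    """
    graph: Dict[str, Set[str]] = {}
    stack = [query]
    while stack:
        current = stack.pop()
        if current.name in graph:
            continue  # already visited
        unstored = [d for d in current.dependencies if not d.is_stored]
        graph[current.name] = {d.name for d in unstored}
        stack.extend(unstored)
    return graph


events = FakeQuery("events", is_stored=True)
locations = FakeQuery("subscriber_locations", dependencies=[events])
modal = FakeQuery("modal_location", dependencies=[locations, events])
edges = unstored_edges(modal)
```

Because the stored flag is read once per query during the walk, a store happening mid-traversal could leave the result inconsistent, which is the hazard the note above warns about.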