flowmachine.core.dependency_graph¶
calculate_dependency_graph¶
calculate_dependency_graph(query_obj: 'Query', analyse: bool = False) -> networkx.classes.digraph.DiGraph
Produce a graph of all the queries that go into producing this one. Each query's estimated run cost and stored status are recorded as node attributes.
The resulting networkx object can then be visualised or analysed. When visualised, nodes corresponding to stored queries are rendered green. See plot_dependency_graph() for a convenient way of plotting a dependency graph directly in a Jupyter notebook.
Each node carries the estimated cost of the query in the 'cost' attribute and the query object it represents in the 'query_object' attribute. With the analyse parameter set to True, it also carries the actual running time of the query in the 'runtime' attribute.
Parameters¶
-
query_obj
:Query
Query object to produce a dependency graph for.
-
analyse
:bool
, default False
Set to True to get actual runtimes for queries. Note that this will actually run the query!
Returns¶
networkx.classes.digraph.DiGraph
Examples¶
If you don't want to visualise the dependency graph directly (for example using plot_dependency_graph()), you can export it to a .dot file as follows:
import flowmachine
from flowmachine.features import daily_location
from networkx.drawing.nx_agraph import write_dot

flowmachine.connect()

# Export the dependency graph with estimated costs only.
G = daily_location("2016-01-01").dependency_graph()
write_dot(G, "daily_location_dependencies.dot")

# Passing True requests actual runtimes as well (this runs the query).
G = daily_location("2016-01-01").dependency_graph(True)
write_dot(G, "daily_location_dependencies_runtimes.dot")
The resulting .dot file can then be converted to a PDF using the dot tool, which comes as part of the Graphviz package (https://www.graphviz.org/):
$ dot -Tpdf daily_location_dependencies.dot -o daily_location_dependencies.pdf
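Once you have the graph, its node attributes can be read with ordinary networkx calls. The following is a minimal sketch on a hand-built toy graph rather than a real flowmachine result: the node names, the cost values, the 'stored' key and the edge direction are all assumptions made for illustration. Only the 'cost' attribute name comes from the description above.

```python
import networkx as nx

# Toy stand-in for the graph returned by calculate_dependency_graph().
# Node names, cost values and the "stored" key are made up for illustration.
G = nx.DiGraph()
G.add_node("daily_location", cost=12.5, stored=False)
G.add_node("subscriber_locations", cost=40.0, stored=True)
G.add_node("events_calls", cost=3.0, stored=False)
# Assumed edge direction: from a query to the query it depends on.
G.add_edge("daily_location", "subscriber_locations")
G.add_edge("subscriber_locations", "events_calls")

# Total estimated cost across all nodes.
total_cost = sum(attrs["cost"] for _, attrs in G.nodes(data=True))

# Nodes flagged as stored, sorted for stable output.
stored_nodes = sorted(n for n, attrs in G.nodes(data=True) if attrs["stored"])

# The most expensive node according to the 'cost' attribute.
most_expensive = max(G.nodes, key=lambda n: G.nodes[n]["cost"])
```

On a real dependency graph the same pattern should apply, but check the attribute keys actually present on the nodes before relying on them.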
Note
The queries listed as dependencies are not guaranteed to be used in the actual running of a query, only to be referenced by it.
dependencies_eligible_for_store¶
dependencies_eligible_for_store(query_obj: 'Query') -> Set[ForwardRef('Query')]
Get the set of dependencies for this query which may be stored before it is run.
Parameters¶
-
query_obj
:Query
Query object to get potentially eligible dependencies for
Returns¶
-
typing.Set
The set of dependencies of this query which may be stored before it is run.
executing_dependencies¶
executing_dependencies(eligible_dependencies: Set[ForwardRef('Query')]) -> List[ForwardRef('Query')]
Get the query objects from a set which are currently executing.
Parameters¶
-
eligible_dependencies
:typing.Set
Set of query objects that might be executing
Returns¶
-
typing.List
List of query objects currently executing
plot_dependency_graph¶
plot_dependency_graph(query_obj: 'Query', analyse: bool = False, format: str = 'png', width: Optional[int] = None, height: Optional[int] = None) -> Union[ForwardRef('IPython.display.Image'), ForwardRef('IPython.display.SVG')]
Plot a graph of all the queries that go into producing this one (see calculate_dependency_graph for more details). This returns an IPython.display object which can be directly displayed in Jupyter notebooks.
Note that this requires the IPython and pygraphviz packages to be installed.
Parameters¶
-
query_obj
:Query
Query object to plot a dependency graph for.
-
analyse
:bool
, default False
Set to True to get actual runtimes for queries. Note that this will actually run the query!
-
format
:str
, default png
Output format of the resulting image.
-
width
:typing.Optional
, default None
Width in pixels to which to constrain the image. Note this is only supported for format="png".
-
height
:typing.Optional
, default None
Height in pixels to which to constrain the image. Note this is only supported for format="png".
Returns¶
typing.Union
print_dependency_tree¶
print_dependency_tree(query_obj: 'Query', show_stored: bool = False, stream: Optional[ForwardRef('IOBase')] = None, indent_level: int = 0) -> None
Print the dependencies of a flowmachine query in a tree-like structure.
Parameters¶
-
query_obj
:Query
An instance of a query object.
-
show_stored
:bool
, default False
If True, show for each query whether it is stored or not. Default: False.
-
stream
:typing.Optional
, default None
The stream to which the output should be written (default: stdout).
-
indent_level
:int
, default 0
The current level of indentation.
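To illustrate how the show_stored, stream and indent_level parameters interact, here is a minimal sketch of a recursive tree printer. It is not flowmachine's implementation: ToyQuery, its is_stored and dependencies fields, the print_tree name and the output layout are all hypothetical stand-ins chosen for this example.

```python
import io
import sys
from dataclasses import dataclass, field
from typing import List, Optional, TextIO


@dataclass
class ToyQuery:
    """Hypothetical stand-in for a flowmachine Query (not the real class)."""
    name: str
    is_stored: bool = False
    dependencies: List["ToyQuery"] = field(default_factory=list)


def print_tree(query: ToyQuery, show_stored: bool = False,
               stream: Optional[TextIO] = None, indent_level: int = 0) -> None:
    """Write one line per query, indenting each dependency one level deeper."""
    stream = stream if stream is not None else sys.stdout
    indent = "  " * indent_level
    suffix = ""
    if show_stored:
        suffix = " (stored)" if query.is_stored else " (not stored)"
    stream.write(f"{indent}- {query.name}{suffix}\n")
    # Recurse into dependencies in a stable (name-sorted) order.
    for dep in sorted(query.dependencies, key=lambda q: q.name):
        print_tree(dep, show_stored, stream, indent_level + 1)


events = ToyQuery("events", is_stored=True)
locations = ToyQuery("locations", dependencies=[events])
daily = ToyQuery("daily_location", dependencies=[locations])

# Capture the output in a buffer instead of writing to stdout.
buffer = io.StringIO()
print_tree(daily, show_stored=True, stream=buffer)
tree_text = buffer.getvalue()
```

Passing a stream such as io.StringIO makes the output easy to capture in tests or logs. The indent_level argument exists mainly for the recursive calls.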
query_progress¶
query_progress(query: 'Query') -> Dict[str, int]
Check the progress of a query.
Parameters¶
-
query
:Query
Query object to check progress of
Returns¶
-
typing.Dict
Dictionary with three keys: eligible (number of subqueries that must be run), queued (number queued to be run) and executing (number currently running).
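The shape of the returned dictionary can be sketched as follows. This is only an illustration under stated assumptions: the State enum, the progress function and the toy state mapping are invented here, and the real function derives these counts from flowmachine's own query state tracking.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    """Hypothetical query states, invented for this sketch."""
    KNOWN = "known"
    QUEUED = "queued"
    EXECUTING = "executing"


def progress(states: Dict[str, State]) -> Dict[str, int]:
    """Summarise the states of the eligible dependencies as counts."""
    return {
        "eligible": len(states),
        "queued": sum(1 for s in states.values() if s is State.QUEUED),
        "executing": sum(1 for s in states.values() if s is State.EXECUTING),
    }


# Toy mapping from dependency name to its current (assumed) state.
eligible_states = {
    "events": State.EXECUTING,
    "locations": State.QUEUED,
    "daily_location": State.QUEUED,
    "other": State.KNOWN,
}
result = progress(eligible_states)
```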
queued_dependencies¶
queued_dependencies(eligible_dependencies: Set[ForwardRef('Query')]) -> List[ForwardRef('Query')]
Get the query objects from a set which are currently queued.
Parameters¶
-
eligible_dependencies
:typing.Set
Set of query objects that might be queued
Returns¶
-
typing.List
List of query objects currently queued
store_queries_in_order¶
store_queries_in_order(dependency_graph: networkx.classes.digraph.DiGraph) -> Dict[str, ForwardRef('Future')]
Execute queries in an order that ensures each query's store is triggered only after its dependencies.
Parameters¶
-
dependency_graph
:networkx.classes.digraph.DiGraph
Dependency graph of query objects to be stored
Returns¶
-
typing.Dict
Mapping from query nodes to Future objects representing the store tasks
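The ordering guarantee described here amounts to a topological ordering of the dependency graph. The sketch below shows the idea using only the standard library, with a plain dictionary standing in for the DiGraph and a single-worker thread pool standing in for flowmachine's executor. The function name, the toy data and the choice of executor are assumptions for illustration, not the library's actual mechanism.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_dependency_order(dependencies: Dict[str, Set[str]],
                            task: Callable[[str], None]) -> Dict[str, Future]:
    """Submit one task per node, each only after its dependencies were submitted."""
    # static_order() yields every node after all of its predecessors,
    # where the predecessors here are the node's dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    futures: Dict[str, Future] = {}
    # A single worker runs tasks in submission order, so the ordering is kept.
    with ThreadPoolExecutor(max_workers=1) as executor:
        for node in order:
            futures[node] = executor.submit(task, node)
    return futures


# Toy mapping from each query name to the names it depends on.
toy_dependencies = {
    "daily_location": {"locations"},
    "locations": {"events"},
    "events": set(),
}
stored_order: List[str] = []
futures = run_in_dependency_order(toy_dependencies, stored_order.append)
```

The returned mapping mirrors the documented return value: one Future per node, which a caller can wait on or inspect for errors.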
unstored_dependencies_graph¶
unstored_dependencies_graph(query_obj: 'Query') -> networkx.classes.digraph.DiGraph
Produce a dependency graph of the unstored queries on which this query depends. If a dependency is stored, or is in the queue to be stored, the dependencies of that dependency will not be included in the graph.
Parameters¶
-
query_obj
:Query
Query object to produce a dependency graph for.
Returns¶
networkx.classes.digraph.DiGraph
Note
If store() or invalidate_db_cache() is called on any query while this function is executing, the resulting graph may not be correct.
The queries listed as dependencies are not guaranteed to be used in the actual running of a query, only to be referenced by it.
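The pruning rule described above (do not descend past a dependency that is already stored) can be sketched as a graph traversal. This is a simplified illustration, not the library's code: it uses plain dictionaries and sets instead of Query objects and a DiGraph, treats "stored" as a given set of names, and ignores queued queries. The unstored_edges name and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def unstored_edges(root: str, dependencies: Dict[str, List[str]],
                   stored: Set[str]) -> Set[Tuple[str, str]]:
    """Collect (query, dependency) edges without descending past stored queries."""
    edges: Set[Tuple[str, str]] = set()
    visited: Set[str] = set()
    stack = [root]
    while stack:
        query = stack.pop()
        if query in visited:
            continue
        visited.add(query)
        for dep in dependencies.get(query, []):
            if dep in stored:
                # Stored dependency: leave it and everything beneath it out.
                continue
            edges.add((query, dep))
            stack.append(dep)
    return edges


# Toy dependency structure: "c" is stored, so "c" and "e" are pruned.
toy_dependencies = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
edges = unstored_edges("a", toy_dependencies, stored={"c"})
```

Because the stored set is read during the traversal, changing it mid-traversal could give an inconsistent result, which is the same hazard the note above describes for store() and invalidate_db_cache().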