flowmachine.features.utilities.histogram_aggregation¶

Class HistogramAggregation¶

HistogramAggregation(*, metric: 'Query', bins: Union[List[float], int], range: Union[Tuple[float, float], NoneType] = None, value_column: str = 'value', censor: bool = True) -> None

Source: flowmachine/features/utilities/histogram_aggregation.py

Compute the histogram of another query.

Attributes¶

cache
column_names
column_names_as_string_list
dependencies
fully_qualified_table_name
index_cols
is_stored
query_id
query_state
query_state_str
table_name

Parameters¶

metric: Query

Query to build histogram over
bins: typing.Union[typing.List[float], int]

Either an integer number of equally spaced bins, or a list of bin edges
range: typing.Union[typing.Tuple[float, float], NoneType], default None

Optionally supply inclusive lower and upper bounds to build the histogram over. By default, the histogram will cover the whole range of the data.
value_column: str, default 'value'

Name of the column in metric to construct the histogram over
censor: bool, default True

Set to False to return results even when some bins have counts below 15.
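The two forms of `bins` described above can be illustrated with a minimal pure-Python sketch. This is not flowmachine's implementation (which is not shown on this page); the helper `bin_edges` and its `value_range` argument are hypothetical names used only for illustration, and the equal-width behaviour for an integer `bins` is assumed from the parameter description.

```python
from typing import List, Optional, Sequence, Tuple, Union


def bin_edges(
    values: Sequence[float],
    bins: Union[List[float], int],
    value_range: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """Return histogram bin edges (hypothetical helper, not flowmachine code)."""
    if not isinstance(bins, int):
        # A list is treated as explicit bin edges and returned unchanged.
        return list(bins)
    # Without an explicit range, cover the whole range of the data.
    if value_range is not None:
        lower, upper = value_range
    else:
        lower, upper = min(values), max(values)
    width = (upper - lower) / bins
    # An integer n gives n equal-width bins, which means n + 1 edges.
    return [lower + i * width for i in range(bins + 1)]


print(bin_edges([0, 10], 5))
print(bin_edges([1, 2], 2, value_range=(0, 4)))
```

With `bins=5` and no range, the edges span the minimum to the maximum of the data, which matches the shape of the `lower_edge` and `upper_edge` columns in the example output further down.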

Examples¶

>>> from flowmachine.features import RadiusOfGyration
>>> from flowmachine.features.utilities.histogram_aggregation import HistogramAggregation
>>> radius_of_gyration = RadiusOfGyration("2016-01-01", "2016-01-02")
>>> histogram = HistogramAggregation(metric=radius_of_gyration, bins=5, censor=False)
>>> histogram.head()
   value  lower_edge  upper_edge
0     61    0.000000   70.837717
1    123   70.837717  141.675435
2    192  141.675435  212.513152
3    108  212.513152  283.350869
4     15  283.350869  354.188587

Note

By default, if the count of values for any bin is below 15, then no histogram will be returned.
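The censoring rule in the note can be sketched as a small check over the per-bin counts. This is an illustration under stated assumptions, not the library's code: the helper `censor_histogram` is hypothetical, and representing "no histogram" as `None` is an assumption made here for simplicity.

```python
from typing import List, Optional

# Threshold taken from the note above.
CENSOR_THRESHOLD = 15


def censor_histogram(counts: List[int], censor: bool = True) -> Optional[List[int]]:
    """Return the counts, or None if censoring suppresses the whole histogram.

    Hypothetical helper; None stands in for "no histogram returned".
    """
    if censor and any(count < CENSOR_THRESHOLD for count in counts):
        # A single bin below the threshold withholds the entire result.
        return None
    return counts


print(censor_histogram([61, 123, 192, 108, 15]))
print(censor_histogram([61, 14]))
print(censor_histogram([61, 14], censor=False))
```

Note that the whole histogram is withheld rather than only the small bins, following the wording of the note; a count of exactly 15 is not below the threshold and so does not trigger censoring in this sketch.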

Methods¶

cache¶

cache

Source: flowmachine/core/query.py

Returns¶

bool

True if caching is switched on.

column_names¶

column_names

Source: flowmachine/features/utilities/histogram_aggregation.py

Returns the column names.

Returns¶

typing.List[str]

List of the column names of this query.

column_names_as_string_list¶

column_names_as_string_list

Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns¶

str

Comma-separated list of column names

dependencies¶

dependencies

Source: flowmachine/core/query.py

Returns¶

set

The set of queries which this one is directly dependent on.

fully_qualified_table_name¶

fully_qualified_table_name

Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns¶

str

String form of the table's fully qualified name

index_cols¶

index_cols

Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns¶

ixen: list

By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples¶

>>> daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored¶

is_stored

Source: flowmachine/core/query.py

Returns¶

bool

True if the table is stored, and False otherwise.

query_id¶

query_id

Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.

Returns¶

str

query_id hash string

query_state¶

query_state

Source: flowmachine/core/query.py

Return the current query state.

Returns¶

QueryState

The current query state

query_state_str¶

query_state_str

Source: flowmachine/core/query.py

Return the current query state as a string

Returns¶

str

The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name¶

table_name

Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns¶

str

String form of the table's name