flowmachine.features.utilities.histogram_aggregation

Class HistogramAggregation

HistogramAggregation(*, metric: 'Query', bins: Union[List[float], int], range: Union[Tuple[float, float], NoneType] = None, value_column: str = 'value', censor: bool = True) -> None
Source: flowmachine/features/utilities/histogram_aggregation.py

Compute the histogram of another query.

Parameters

  • metric: Query

    Query to build histogram over

  • bins: typing.Union[typing.List[float], int]

    Either an integer number of equally spaced bins, or a list of bin edges

  • range: typing.Union[typing.Tuple[float, float], NoneType], default None

    Optionally supply inclusive lower and upper bounds to build the histogram over. By default, the histogram will cover the whole range of the data.

  • value_column: str, default 'value'

    Name of the column in metric to construct the histogram over

  • censor: bool, default True

    Set to False to return results even when some bins have counts below 15
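As an illustration of how bins and range interact, the sketch below derives bin edges in plain Python. It is not flowmachine's implementation: bin_edges is a hypothetical helper, and its value_range argument stands in for the range parameter (renamed here only to avoid shadowing Python's built-in range). It assumes that an integer bins splits the covered interval into equal-width bins and that a list of edges is used as given.

```python
from typing import List, Optional, Sequence, Tuple, Union


def bin_edges(
    values: Sequence[float],
    bins: Union[List[float], int],
    value_range: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """Hypothetical helper: return the bin edges implied by bins and range."""
    if not isinstance(bins, int):
        # A list is treated as explicit bin edges and used as given.
        return [float(edge) for edge in bins]
    # With no range supplied, cover the whole range of the data.
    lower, upper = (
        value_range if value_range is not None else (min(values), max(values))
    )
    width = (upper - lower) / bins
    # bins equal-width bins need bins + 1 edges; the last edge is the upper bound.
    return [lower + i * width for i in range(bins)] + [float(upper)]


values = [0.0, 2.5, 5.0, 7.5, 10.0]
print(bin_edges(values, bins=4))                           # edges from the data range
print(bin_edges(values, bins=2, value_range=(0.0, 20.0)))  # edges from an explicit range
print(bin_edges(values, bins=[0, 5, 10]))                  # explicit edges
```

Under these assumptions, supplying range changes only where the equally spaced edges fall. It has no effect when bins is already a list of edges.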

Examples

>>> from flowmachine.features import RadiusOfGyration
>>> from flowmachine.features.utilities.histogram_aggregation import HistogramAggregation
>>> radius_of_gyration = RadiusOfGyration("2016-01-01", "2016-01-02")
>>> histogram = HistogramAggregation(metric=radius_of_gyration, bins=5, censor=False)
>>> histogram.head()
   value  lower_edge  upper_edge
0     61    0.000000   70.837717
1    123   70.837717  141.675435
2    192  141.675435  212.513152
3    108  212.513152  283.350869
4     15  283.350869  354.188587

Note

By default, if the count of values for any bin is below 15, then no histogram will be returned.
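The censoring rule in this note can be sketched as a simple check over the bin counts. This is only an illustration of the stated rule, not flowmachine's code: censor_histogram is a hypothetical helper, and returning None is an assumption standing in for "no histogram will be returned".

```python
from typing import List, Optional

CENSOR_THRESHOLD = 15  # threshold stated in the note


def censor_histogram(counts: List[int], censor: bool = True) -> Optional[List[int]]:
    """Hypothetical helper: return counts, or None if any bin is below the threshold."""
    if censor and any(count < CENSOR_THRESHOLD for count in counts):
        # One small bin is enough to suppress the whole histogram.
        return None
    return counts


print(censor_histogram([61, 123, 192, 108, 15]))    # no bin is below 15, so counts are kept
print(censor_histogram([61, 123, 14]))              # one bin is below 15, so nothing is returned
print(censor_histogram([61, 123, 14], censor=False))  # censoring disabled, counts are kept
```

Note that a count of exactly 15 is not below the threshold, so it does not trigger censoring in this sketch.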

Methods

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/utilities/histogram_aggregation.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list

Returns
  • str

    Comma-separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
>>> daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string
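To illustrate the idea of an identifier derived from a query's parameters and subqueries, here is a minimal sketch using hashlib. The function sketch_query_id, its arguments, and the choice of MD5 are all assumptions made for illustration; this page does not describe flowmachine's actual hashing scheme.

```python
import hashlib
from typing import Any, Dict, Sequence


def sketch_query_id(
    class_name: str,
    params: Dict[str, Any],
    subquery_ids: Sequence[str] = (),
) -> str:
    """Hypothetical helper: hash a class name, its parameters, and subquery ids."""
    parts = [class_name]
    # Sort parameters by name so that argument order does not affect the hash.
    parts += [f"{key}={params[key]!r}" for key in sorted(params)]
    # Subquery ids are sorted for the same reason.
    parts += sorted(subquery_ids)
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


inner = sketch_query_id("RadiusOfGyration", {"start": "2016-01-01", "stop": "2016-01-02"})
outer_a = sketch_query_id("HistogramAggregation", {"bins": 5, "censor": False}, [inner])
outer_b = sketch_query_id("HistogramAggregation", {"censor": False, "bins": 5}, [inner])
outer_c = sketch_query_id("HistogramAggregation", {"bins": 10, "censor": False}, [inner])
print(outer_a == outer_b)  # same parameters in a different order
print(outer_a == outer_c)  # different bins value
```

In this sketch, two queries built with the same parameters and subqueries map to the same identifier, while changing any parameter yields a different one. That is the property a cache keyed on the identifier relies on.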

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's name