flowmachine.features.subscriber.majority_location

Class MajorityLocation

MajorityLocation(*, subscriber_location_weights: flowmachine.core.query.Query, weight_column: str, minimum_total_weight: float = 0.0)
Source: flowmachine/features/subscriber/majority_location.py

A query that produces a list of subscribers along with the location each one visited more than half the time. It takes a 'subscriber location weights' query that includes a 'subscriber' column, location ID column(s), and a column used to weight locations (e.g. a LocationVisits query). A subscriber is assigned a location only if that location accounts for more than half of the subscriber's total weight, so each subscriber is assigned at most one location.

Attributes

Parameters

  • subscriber_location_weights: flowmachine.core.query.Query

    The query object containing subscribers, locations, and weights.

  • weight_column: str

    The column in subscriber_location_weights whose values, when summed per subscriber, give the total used to threshold the majority.

  • minimum_total_weight: float, default 0.0

    If the summed weight for a subscriber is less than minimum_total_weight, that subscriber will only be assigned a location whose weight is greater than minimum_total_weight/2. This is useful if, for example, subscriber_location_weights counts the number of days on which each location was a subscriber's daily location over one week. A subscriber who was not active every day would have a total weight below 7, which would lower the threshold for a majority. Setting minimum_total_weight=7 in this case ensures that a subscriber must have the same daily location on a majority of all days during the week, not just a majority of their active days.

Note

Any rows where weight < 0 in the subscriber_location_weights query will be dropped. This is necessary to ensure the query can return at most one location per subscriber.
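
The rule described above can be sketched in plain Python, independently of flowmachine. This is a minimal illustration, not the library's implementation: the function name majority_locations, the tuple layout, and the sample data are all hypothetical, and it assumes one input row per (subscriber, location) pair.

```python
from collections import defaultdict


def majority_locations(rows, minimum_total_weight=0.0):
    """Return {subscriber: location} for subscribers with an outright majority.

    rows: iterable of (subscriber, location, weight) tuples, assumed to hold
    one row per (subscriber, location) pair (an assumption of this sketch).
    """
    # Drop rows with negative weight, mirroring the note above.
    kept = [(s, loc, w) for s, loc, w in rows if w >= 0]

    # Sum the remaining weights per subscriber.
    totals = defaultdict(float)
    for s, _, w in kept:
        totals[s] += w

    result = {}
    for s, loc, w in kept:
        # The threshold is half of whichever is larger: the subscriber's
        # total weight or minimum_total_weight.
        threshold = max(totals[s], minimum_total_weight) / 2
        if w > threshold:
            result[s] = loc
    return result


# Hypothetical weekly data: number of days each location was the daily location.
rows = [
    ("sub_a", "loc_1", 5), ("sub_a", "loc_2", 2),  # active on all 7 days
    ("sub_b", "loc_1", 2), ("sub_b", "loc_2", 1),  # active on 3 days only
    ("sub_c", "loc_1", 3), ("sub_c", "loc_2", 3),  # tie, so no majority
]

print(majority_locations(rows))
print(majority_locations(rows, minimum_total_weight=7))
```

With the default threshold, sub_b is assigned loc_1 because 2 of its 3 active days were spent there. With minimum_total_weight=7 the threshold becomes 3.5 days, so only sub_a keeps a location. Because a location must exceed half of a non-negative total, at most one row per subscriber can pass the test.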

Methods

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/majority_location.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns
  • str

    Comma-separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string
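
To illustrate the idea of an identifier derived from a query's class, parameters, and subqueries, here is a small sketch. It is purely illustrative: the function sketch_query_id, the JSON payload layout, and the choice of MD5 are assumptions made for this example, and flowmachine's actual hashing scheme may differ.

```python
import hashlib
import json


def sketch_query_id(class_name, params, subquery_ids):
    """Hypothetical helper: hash a class name, parameters and subquery ids."""
    # Serialise deterministically so that equal inputs give equal hashes.
    payload = json.dumps(
        {
            "class": class_name,
            "params": params,
            "subqueries": sorted(subquery_ids),
        },
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Identical definitions give identical ids; changing a parameter changes the id.
id_a = sketch_query_id("MajorityLocation", {"minimum_total_weight": 0.0}, ["abc"])
id_b = sketch_query_id("MajorityLocation", {"minimum_total_weight": 0.0}, ["abc"])
id_c = sketch_query_id("MajorityLocation", {"minimum_total_weight": 7.0}, ["abc"])
print(id_a == id_b, id_a == id_c)
```

The point of such a scheme is that two query objects built with the same parameters and subqueries map to the same identifier, which is what allows a stored result to be found again.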

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

Class MajorityLocationWithUnlocatable

MajorityLocationWithUnlocatable(*, majority_location: flowmachine.features.subscriber.majority_location.MajorityLocation)
Source: flowmachine/features/subscriber/majority_location.py

A query for producing a list of subscribers along with the location that they visited more than half the time. Similar to MajorityLocation, except that subscribers with no majority location will be included in the query result (with NULL location).

Attributes

Parameters

  • majority_location: flowmachine.features.subscriber.majority_location.MajorityLocation

    MajorityLocation query whose result will be augmented with unlocatable subscribers
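
The augmentation step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not flowmachine code: the function with_unlocatable, the tuple layout, and the sample data are hypothetical, and None stands in for a NULL location.

```python
def with_unlocatable(rows, majority):
    """Return one entry per subscriber, using None where no majority exists.

    rows: iterable of (subscriber, location, weight) tuples (the weights input).
    majority: dict of {subscriber: location} for subscribers with a majority.
    """
    # Every distinct subscriber in the weights input gets exactly one entry.
    return {s: majority.get(s) for s, _, _ in rows}


# Hypothetical data: sub_c is tied between two locations, so it has no majority.
rows = [
    ("sub_a", "loc_1", 5), ("sub_a", "loc_2", 2),
    ("sub_c", "loc_1", 3), ("sub_c", "loc_2", 3),
]
majority = {"sub_a": "loc_1"}

print(with_unlocatable(rows, majority))
```

Keeping unlocatable subscribers in the result makes it possible to tell apart subscribers who were absent from the input and subscribers who were present but had no majority location.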

Methods

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/majority_location.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns
  • str

    Comma-separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

majority_location

majority_location(*, subscriber_location_weights: flowmachine.core.query.Query, weight_column: str, minimum_total_weight: float = 0.0, include_unlocatable: bool = False) -> Union[flowmachine.features.subscriber.majority_location.MajorityLocation, flowmachine.features.subscriber.majority_location.MajorityLocationWithUnlocatable]
Source: flowmachine/features/subscriber/majority_location.py

A query that produces a list of subscribers along with the location each one visited more than half the time. It takes a 'subscriber location weights' query that includes a 'subscriber' column, location ID column(s), and a column used to weight locations (e.g. a LocationVisits query). A subscriber is assigned a location only if that location accounts for more than half of the subscriber's total weight, so each subscriber is assigned at most one location. Subscribers with no single location holding an outright majority are either excluded from the query result (if include_unlocatable==False) or included with a NULL value in the location ID column(s) (if include_unlocatable==True).

Parameters

  • subscriber_location_weights: flowmachine.core.query.Query

    The query object containing subscribers, locations, and weights.

  • weight_column: str

    The column in subscriber_location_weights whose values, when summed per subscriber, give the total used to threshold the majority.

  • minimum_total_weight: float, default 0.0

    If the summed weight for a subscriber is less than minimum_total_weight, that subscriber will only be assigned a location whose weight is greater than minimum_total_weight/2. This is useful if, for example, subscriber_location_weights counts the number of days on which each location was a subscriber's daily location over one week. A subscriber who was not active every day would have a total weight below 7, which would lower the threshold for a majority. Setting minimum_total_weight=7 in this case ensures that a subscriber must have the same daily location on a majority of all days during the week, not just a majority of their active days.

  • include_unlocatable: bool, default False

    If True, returns every unique subscriber in the subscriber_location_weights query, with the location column(s) set to NULL where no majority is reached. If False, returns only subscribers that have a majority location.

Returns

  • typing.Union[flowmachine.features.subscriber.majority_location.MajorityLocation, flowmachine.features.subscriber.majority_location.MajorityLocationWithUnlocatable]

    Majority location query object

Note

Any rows where weight < 0 in the subscriber_location_weights query will be dropped. This is necessary to ensure the query can return at most one location per subscriber.