flowmachine.features.subscriber.entropy

Source: flowmachine/features/subscriber/entropy.py

Calculates various entropy metrics for subscribers within a specified time period.

Class BaseEntropy

BaseEntropy(cache=True)
Source: flowmachine/features/subscriber/entropy.py

Base query for calculating entropy of subscriber features.

Attributes

Methods

_absolute_freq_query

_absolute_freq_query
Source: flowmachine/features/subscriber/entropy.py

_relative_freq_query

_relative_freq_query
Source: flowmachine/features/subscriber/entropy.py

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/entropy.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma separated list

Returns
  • str

    Comma separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

Class ContactEntropy

ContactEntropy(start, stop, *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Union[Tuple[int, int], NoneType] = None, subscriber_subset=None, tables='all', exclude_self_calls=True)
Source: flowmachine/features/subscriber/entropy.py

Calculates the entropy of counterparts contacted. For instance, a subscriber who regularly interacts with a few fixed counterparts in a predictable way will have a low contact entropy. Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events with a given counterpart. This formula is a consistent estimate of the true entropy only under certain conditions, among them that the relative frequency is a good approximation to the probability that an event will occur with a given counterpart. If counterparts are strongly correlated, this might not hold.
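The formula above can be illustrated outside the database with a minimal sketch. The helper name `contact_entropy` and the sample events are hypothetical and are not part of flowmachine; the sketch only mirrors the arithmetic of -1 * SUM( relative_freq * LN( relative_freq ) ) for a single subscriber.

```python
import math
from collections import Counter


def contact_entropy(counterparts):
    """Entropy (natural log) of a sequence of counterpart identifiers.

    Hypothetical helper for illustration; not a flowmachine function.
    """
    counts = Counter(counterparts)  # absolute frequency per counterpart
    total = sum(counts.values())
    # relative_freq = events with one counterpart / all events
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Hypothetical events for one subscriber: each entry is the counterpart of one event.
balanced = ["A", "B", "A", "B"]  # two counterparts, contacted equally often
skewed = ["A", "A", "A", "B"]    # one counterpart dominates

print(round(contact_entropy(balanced), 6))  # 0.693147, i.e. ln(2)
print(round(contact_entropy(skewed), 6))    # lower than the balanced case
```

A subscriber who splits events evenly between two counterparts reaches ln(2), and the value falls as contact concentrates on one counterpart.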

Attributes

Parameters

  • start, stop: str

    iso-format start and stop datetimes

  • subscriber_identifier: {'msisdn', 'imei'}, default 'msisdn'

    Either msisdn, or imei, the column that identifies the subscriber.

  • subscriber_subset: flowmachine.core.Table, flowmachine.core.Query, list, str, default None

    If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.

  • direction: typing.Union[str, flowmachine.features.utilities.direction_enum.Direction], default both

    Whether to consider calls made, received, or both. Defaults to 'both'.

  • hours: typing.Union[typing.Tuple[int, int], NoneType], default None

    Restrict the analysis to only a certain set of hours within each day.

  • tables: list of strings, str, default 'all'

    Can be a string naming a single table (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.

  • exclude_self_calls: bool, default True

    Set to False to include calls a subscriber made to themselves.

Examples

s = ContactEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
subscriber        entropy
2ZdMowMXoyMByY07  0.692461
MobnrVMDK24wPRzB  0.691761
0Ze1l70j0LNgyY4w  0.693147
Nnlqka1oevEMvVrm  0.607693
gPZ7jbqlnAXR3JG5  0.686211
...               ...

Methods

_absolute_freq_query

_absolute_freq_query
Source: flowmachine/features/subscriber/entropy.py

_relative_freq_query

_relative_freq_query
Source: flowmachine/features/subscriber/entropy.py

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/entropy.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma separated list

Returns
  • str

    Comma separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

Class LocationEntropy

LocationEntropy(start, stop, *, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit] = CellSpatialUnit(), subscriber_identifier='msisdn', hours: Union[Tuple[int, int], NoneType] = None, subscriber_subset=None, tables='all', ignore_nulls=True)
Source: flowmachine/features/subscriber/entropy.py

Calculates the entropy of locations visited. For instance, a subscriber who regularly makes calls from the same location will have a low location entropy. Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events occurring at a certain location (e.g. cell, site, administrative region, etc.). This formula is a consistent estimate of the true entropy only under certain conditions, among them that the relative frequency is a good approximation to the probability that an event occurs in a given location. In case of strong spatial autocorrelation, this might not hold.
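As a rough illustration of how the value behaves per subscriber, the sketch below groups (subscriber, location) event pairs and applies the same formula to each group. The function `location_entropy`, the subscriber labels and the cell names are all hypothetical; the real query runs in the database and maps locations through the chosen spatial unit.

```python
import math
from collections import Counter, defaultdict


def location_entropy(events):
    """Map each subscriber to the entropy of the locations of their events.

    `events` is an iterable of (subscriber, location) pairs, one per event.
    Hypothetical helper for illustration; not a flowmachine function.
    """
    counts = defaultdict(Counter)
    for subscriber, location in events:
        counts[subscriber][location] += 1  # absolute frequency per location

    result = {}
    for subscriber, per_location in counts.items():
        total = sum(per_location.values())
        # -1 * SUM(relative_freq * LN(relative_freq)) over this subscriber's locations
        result[subscriber] = sum(
            -(n / total) * math.log(n / total) for n in per_location.values()
        )
    return result


# Hypothetical data: "s1" always uses one cell, "s2" spreads evenly over four.
events = [("s1", "cell_a")] * 4
events += [("s2", cell) for cell in ("cell_a", "cell_b", "cell_c", "cell_d")]

entropies = location_entropy(events)
print(entropies["s1"])  # 0.0: a single location carries no uncertainty
print(round(entropies["s2"], 6))
```

For events spread evenly over n locations the formula gives ln(n), which is the largest value possible for n locations, so coarser spatial units (fewer distinct locations) cap the entropy at a lower value.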

Attributes

Parameters

  • start, stop: str

    iso-format start and stop datetimes

  • spatial_unit: typing.Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit], default CellSpatialUnit()

    Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.

  • subscriber_identifier: {'msisdn', 'imei'}, default 'msisdn'

    Either msisdn, or imei, the column that identifies the subscriber.

  • subscriber_subset: flowmachine.core.Table, flowmachine.core.Query, list, str, default None

    If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.

  • hours: typing.Union[typing.Tuple[int, int], NoneType], default None

    Restrict the analysis to only a certain set of hours within each day.

  • tables: list of strings, str, default 'all'

    Can be a string naming a single table (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.

Examples

s = LocationEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
subscriber        entropy
038OVABN11Ak4W5P  2.832747
09NrjaNNvDanD8pk  3.184784
0ayZGYEQrqYlKw6g  3.072458
0DB8zw67E9mZAPK2  2.838989
0Gl95NRLjW2aw8pW  2.997069
...               ...

Methods

_absolute_freq_query

_absolute_freq_query
Source: flowmachine/features/subscriber/entropy.py

_relative_freq_query

_relative_freq_query
Source: flowmachine/features/subscriber/entropy.py

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/entropy.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma separated list

Returns
  • str

    Comma separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

Class PeriodicEntropy

PeriodicEntropy(start, stop, phase='hour', *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Union[Tuple[int, int], NoneType] = None, subscriber_subset=None, tables='all')
Source: flowmachine/features/subscriber/entropy.py

Calculates the recurrence period entropy for events, that is, the entropy associated with the period in which events take place. For instance, if a subscriber's events regularly occur at certain times of day, say at 9:00 and 18:00, then that subscriber will have a low period entropy. Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events occurring at a certain period (e.g. hour of the day, day of the week, month of the year). This formula is a consistent estimate of the true entropy only under certain conditions, among them that the relative frequency is a good approximation to the probability that an event occurs within certain periodic phases. In case of strong autocorrelation, this might not hold.
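The effect of the chosen phase can be sketched in plain Python. Here `periodic_entropy` and the timestamps are hypothetical; the `phase` argument is a callable that stands in for extracting a field from the event time, with `ts.hour` playing the role of the 'hour' phase and `ts.isoweekday()` that of 'isodow' (Monday = 1 through Sunday = 7).

```python
import math
from collections import Counter
from datetime import datetime


def periodic_entropy(timestamps, phase=lambda ts: ts.hour):
    """Entropy of the phase values (hour of day by default) of a set of events.

    Hypothetical helper for illustration; not a flowmachine function.
    """
    counts = Counter(phase(ts) for ts in timestamps)  # events per phase value
    total = sum(counts.values())
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Hypothetical subscriber with events at 09:00 and 18:00 on each of seven days.
regular = [datetime(2016, 1, day, hour) for day in range(1, 8) for hour in (9, 18)]

by_hour = periodic_entropy(regular)
by_isodow = periodic_entropy(regular, phase=lambda ts: ts.isoweekday())

print(round(by_hour, 6))    # 0.693147: only two distinct hours, equally used
print(round(by_isodow, 6))  # 1.94591: spread evenly over seven weekdays
```

The same events score low by hour of day and high by day of week, so the entropy is only meaningful relative to the phase it was computed for.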

Attributes

Parameters

  • start, stop: str

    iso-format start and stop datetimes

  • phase: {"century", "day", "decade", "dow", "doy", "epoch", "hour", "isodow", "isoyear", "microseconds", "millennium", "milliseconds", "minute", "month", "quarter", "second", "week", "year"}, default 'hour'

    The phase of recurrence for which to calculate the entropy. See the [Postgres manual](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) for further information on the allowed phases.
    
  • subscriber_identifier: {'msisdn', 'imei'}, default 'msisdn'

    Either msisdn, or imei, the column that identifies the subscriber.

  • subscriber_subset: flowmachine.core.Table, flowmachine.core.Query, list, str, default None

    If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.

  • direction: typing.Union[str, flowmachine.features.utilities.direction_enum.Direction], default both

    Whether to consider calls made, received, or both. Defaults to 'both'.

  • hours: typing.Union[typing.Tuple[int, int], NoneType], default None

    Restrict the analysis to only a certain set of hours within each day.

  • tables: list of strings, str, default 'all'

    Can be a string naming a single table (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.

Examples

s = PeriodicEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
subscriber        entropy
038OVABN11Ak4W5P  2.805374
09NrjaNNvDanD8pk  2.730881
0ayZGYEQrqYlKw6g  2.802434
0DB8zw67E9mZAPK2  2.476354
0Gl95NRLjW2aw8pW  2.788854
...               ...

Methods

_absolute_freq_query

_absolute_freq_query
Source: flowmachine/features/subscriber/entropy.py

_relative_freq_query

_relative_freq_query
Source: flowmachine/features/subscriber/entropy.py

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/subscriber/entropy.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma separated list

Returns
  • str

    Comma separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fqn