flowmachine.features.subscriber.entropy¶
Source: flowmachine/features/subscriber/entropy.py
Calculates various entropy metrics for subscribers within a specified time period.
Class BaseEntropy¶
BaseEntropy(cache=True)
Base query for calculating entropy of subscriber features.
Attributes¶
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
- bool: True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
- typing.List: List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
- str: Comma-separated list of column names.
dependencies¶
dependencies
Returns¶
- set: The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fully qualified name (fqn).
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
- ixen (list): By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
- bool: True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
- str: query_id hash string.
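The idea of an identifier derived from a query's class, parameters, and subqueries can be illustrated with a small sketch. This is not flowmachine's implementation: the function name `sketch_query_id`, the JSON serialisation, and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib
import json


def sketch_query_id(class_name, params, subquery_ids=()):
    """Hypothetical helper: derive a deterministic ID for a query.

    The ID depends only on the class name, the parameters, and the IDs of
    the subqueries, so two queries built the same way share an ID.
    """
    payload = json.dumps(
        {
            "class": class_name,
            "params": params,
            "subqueries": sorted(subquery_ids),
        },
        sort_keys=True,  # make the result independent of dict ordering
        default=str,  # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


id_a = sketch_query_id("ContactEntropy", {"start": "2016-01-01", "stop": "2016-01-07"})
id_b = sketch_query_id("ContactEntropy", {"stop": "2016-01-07", "start": "2016-01-01"})
id_c = sketch_query_id("ContactEntropy", {"start": "2016-01-01", "stop": "2016-01-08"})
```

Here `id_a` and `id_b` are equal because only the parameter order differs, while `id_c` differs because a parameter value changed. A property of this kind is what allows a cached result to be looked up by its ID.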
query_state¶
query_state
Return the current query state.
Returns¶
- QueryState: The current query state.
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
- str: The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fqn.
Class ContactEntropy¶
ContactEntropy(start, stop, *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all', exclude_self_calls=True)
Calculates the entropy of counterparts contacted. For instance, if an individual regularly interacts with a few specific counterparts in a predictable way, then this user will have a low contact entropy.
Entropy is calculated as:
    -1 * SUM( relative_freq * LN( relative_freq ) )
where relative_freq is the relative frequency of events with a given counterpart.
This formula is a consistent estimate of the true entropy only under certain conditions. One of them is that the relative frequency is a good approximation to the probability that an event will occur with a given counterpart. If counterparts are strongly correlated, this might not hold.
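As a worked illustration of the formula above, the following minimal sketch computes the entropy for one subscriber in plain Python. The helper `entropy_from_counts` and the event list are hypothetical and are not part of flowmachine, which evaluates the formula in SQL.

```python
import math
from collections import Counter


def entropy_from_counts(counts):
    """Entropy (natural log) of a mapping of category -> event count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            relative_freq = n / total
            entropy -= relative_freq * math.log(relative_freq)
    return entropy


# Hypothetical event log: the counterpart of each event for one subscriber.
events = ["A", "A", "A", "A", "A", "A", "B", "C"]
contact_counts = Counter(events)
contact_entropy = entropy_from_counts(contact_counts)

# A subscriber who only ever contacts one counterpart has zero entropy.
single_contact_entropy = entropy_from_counts(Counter(["A"] * 5))
```

In this example the relative frequencies are 0.75, 0.125, and 0.125, so most of the weight sits on one counterpart and the entropy is well below the maximum of LN(3) for three counterparts.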
Attributes¶
Parameters¶
- start, stop (str): ISO-format start and stop datetimes.
- subscriber_identifier ({'msisdn', 'imei'}, default 'msisdn'): Either msisdn or imei, the column that identifies the subscriber.
- subscriber_subset (flowmachine.core.Table, flowmachine.core.Query, list, or str, default None): If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.
- direction (str or Direction, default 'both'): Whether to consider calls made, received, or both.
- hours (tuple of (int, int), optional, default None): Restrict the analysis to only a certain set of hours within each day.
- tables (str or list of str, default 'all'): Either a single table name (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.
- exclude_self_calls (bool, default True): Set to False to include calls a subscriber made to themselves.
Examples¶
s = ContactEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
- bool: True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
- typing.List: List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
- str: Comma-separated list of column names.
dependencies¶
dependencies
Returns¶
- set: The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fully qualified name (fqn).
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
- ixen (list): By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
- bool: True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
- str: query_id hash string.
query_state¶
query_state
Return the current query state.
Returns¶
- QueryState: The current query state.
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
- str: The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fqn.
Class LocationEntropy¶
LocationEntropy(start, stop, *, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit] = CellSpatialUnit(), subscriber_identifier='msisdn', hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all', ignore_nulls=True)
Calculates the entropy of locations visited. For instance, if an individual regularly makes their calls from a certain location, then this user will have a low location entropy.
Entropy is calculated as:
    -1 * SUM( relative_freq * LN( relative_freq ) )
where relative_freq is the relative frequency of events occurring at a certain location (e.g. cell, site, administrative region, etc.).
This formula is a consistent estimate of the true entropy only under certain conditions. One of them is that the relative frequency is a good approximation to the probability that an event occurs in a given location. If there is strong spatial autocorrelation, this might not hold.
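To give a feel for the range of values the formula above produces, the sketch below compares three hypothetical subscribers. The function `location_entropy`, the cell names, and the visit counts are invented for illustration and are not part of flowmachine. The entropy is 0 when all events fall in one location and reaches its maximum, LN(n), when events are spread evenly over n locations.

```python
import math


def location_entropy(visit_counts):
    """Entropy (natural log) of a mapping of location -> event count."""
    total = sum(visit_counts.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in visit_counts.values()
        if n > 0
    )


# Hypothetical event counts per cell for three subscribers.
home_bound = {"cell_1": 10}  # always in the same cell
commuter = {"cell_1": 5, "cell_2": 5}  # evenly split between two cells
roamer = {"cell_1": 1, "cell_2": 1, "cell_3": 1, "cell_4": 1}  # four cells

home_bound_entropy = location_entropy(home_bound)  # lower bound: 0
commuter_entropy = location_entropy(commuter)  # LN(2)
roamer_entropy = location_entropy(roamer)  # LN(4)
```

Because the upper bound grows with the number of distinct locations, entropies computed at different spatial units (for example cells versus administrative regions) are not directly comparable.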
Attributes¶
Parameters¶
- start, stop (str): ISO-format start and stop datetimes.
- spatial_unit (CellSpatialUnit or GeomSpatialUnit, default CellSpatialUnit()): Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.
- subscriber_identifier ({'msisdn', 'imei'}, default 'msisdn'): Either msisdn or imei, the column that identifies the subscriber.
- subscriber_subset (flowmachine.core.Table, flowmachine.core.Query, list, or str, default None): If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.
- hours (tuple of (int, int), optional, default None): Restrict the analysis to only a certain set of hours within each day.
- tables (str or list of str, default 'all'): Either a single table name (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.
Examples¶
s = LocationEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
- bool: True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
- typing.List: List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
- str: Comma-separated list of column names.
dependencies¶
dependencies
Returns¶
- set: The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fully qualified name (fqn).
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
- ixen (list): By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
- bool: True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
- str: query_id hash string.
query_state¶
query_state
Return the current query state.
Returns¶
- QueryState: The current query state.
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
- str: The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fqn.
Class PeriodicEntropy¶
PeriodicEntropy(start, stop, phase='hour', *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all')
Calculates the recurrence period entropy for events, that is, the entropy associated with the period in which events take place. For instance, if a user's events regularly occur at certain times of day, say at 9:00 and 18:00, then this user will have a low period entropy.
Entropy is calculated as:
    -1 * SUM( relative_freq * LN( relative_freq ) )
where relative_freq is the relative frequency of events occurring at a certain period (e.g. hour of the day, day of the week, month of the year).
This formula is a consistent estimate of the true entropy only under certain conditions. One of them is that the relative frequency is a good approximation to the probability that an event occurs within a given periodic phase. If there is strong autocorrelation, this might not hold.
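The role of the phase can be illustrated with a small pure-Python sketch that buckets event timestamps by a phase and then applies the formula above. The names `PHASE_EXTRACTORS` and `periodic_entropy` and the timestamps are hypothetical; flowmachine itself relies on the Postgres date-part extraction referenced in the parameter list, and only three of the allowed phases are mimicked here.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical stand-ins for a few of the Postgres date-part fields.
PHASE_EXTRACTORS = {
    "hour": lambda ts: ts.hour,
    "dow": lambda ts: ts.isoweekday() % 7,  # Sunday = 0, as in Postgres
    "month": lambda ts: ts.month,
}


def periodic_entropy(timestamps, phase="hour"):
    """Entropy of the distribution of events over the chosen phase."""
    extract = PHASE_EXTRACTORS[phase]
    counts = Counter(extract(ts) for ts in timestamps)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Events at 09:00 and 18:00 on four consecutive days.
regular = [
    datetime(2016, 1, day, hour)
    for day in range(1, 5)
    for hour in (9, 18)
]
hour_entropy = periodic_entropy(regular, phase="hour")  # two equally used hours
dow_entropy = periodic_entropy(regular, phase="dow")  # four equally used weekdays
```

The same events give LN(2) when bucketed by hour and LN(4) when bucketed by day of week, which shows that the result is only meaningful relative to the chosen phase.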
Attributes¶
Parameters¶
- start, stop (str): ISO-format start and stop datetimes.
- phase ({'century', 'day', 'decade', 'dow', 'doy', 'epoch', 'hour', 'isodow', 'isoyear', 'microseconds', 'millennium', 'milliseconds', 'minute', 'month', 'quarter', 'second', 'week', 'year'}, default 'hour'): The phase of recurrence for which to calculate the entropy. See the [Postgres manual](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) for further information on the allowed phases.
- subscriber_identifier ({'msisdn', 'imei'}, default 'msisdn'): Either msisdn or imei, the column that identifies the subscriber.
- subscriber_subset (flowmachine.core.Table, flowmachine.core.Query, list, or str, default None): If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically, msisdn), to limit results to.
- direction (str or Direction, default 'both'): Whether to consider calls made, received, or both.
- hours (tuple of (int, int), optional, default None): Restrict the analysis to only a certain set of hours within each day.
- tables (str or list of str, default 'all'): Either a single table name (with the schema) or a list of these. The keyword 'all' selects all subscriber tables.
Examples¶
s = PeriodicEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
- bool: True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
- typing.List: List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
- str: Comma-separated list of column names.
dependencies¶
dependencies
Returns¶
- set: The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fully qualified name (fqn).
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
- ixen (list): By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
- bool: True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
- str: query_id hash string.
query_state¶
query_state
Return the current query state.
Returns¶
- QueryState: The current query state.
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
- str: The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
- str: String form of the table's fqn.