flowmachine.features.subscriber.entropy¶
Source: flowmachine/features/subscriber/entropy.py
Calculates various entropy metrics for subscribers within a specified time period.
Class BaseEntropy¶
BaseEntropy(cache=True)
Base query for calculating entropy of subscriber features.
Attributes¶
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen : list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
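The idea of deriving an identifier from a query's class, parameters, and subqueries can be sketched as follows. This is only an illustration of the principle, not flowmachine's actual hashing scheme: the function name `sketch_query_id`, the JSON serialisation, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib
import json


def sketch_query_id(class_name, params, dependency_ids=()):
    """Hypothetical helper: hash a query's class, parameters and subquery ids."""
    # Serialise deterministically so that equal inputs always give equal ids.
    payload = json.dumps(
        {"class": class_name, "params": params, "deps": sorted(dependency_ids)},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


same_a = sketch_query_id("ContactEntropy", {"start": "2016-01-01", "stop": "2016-01-07"})
same_b = sketch_query_id("ContactEntropy", {"stop": "2016-01-07", "start": "2016-01-01"})
other = sketch_query_id("ContactEntropy", {"start": "2016-01-01", "stop": "2016-01-08"})
```

Because the identifier depends only on the inputs, two queries built with the same parameters map to the same id (and so could share a cached table), while changing any parameter gives a different id.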
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
Class ContactEntropy¶
ContactEntropy(start, stop, *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all', exclude_self_calls=True)
Calculates the entropy of counterparts contacted. For instance, if an individual regularly interacts with a few specific counterparts in a predictable way, then this user will have a low contact entropy.
Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events with a given counterpart. This formula represents a consistent estimate of the true entropy only under certain conditions. Among them, that the relative frequency is a good approximation to the probability that a certain event will occur with a given counterpart. In case of strong correlation between counterparts, this might not be true.
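As a minimal sketch of the formula above, the following computes the entropy directly from a list of counterparts, one entry per event. It runs in plain Python rather than through flowmachine; the function name `contact_entropy` and the sample data are made up for illustration.

```python
from collections import Counter
from math import log


def contact_entropy(counterparts):
    """Entropy (natural log) of the counterparts in a list of events."""
    counts = Counter(counterparts)  # absolute frequency per counterpart
    total = sum(counts.values())
    # relative_freq = count / total
    # entropy = -1 * SUM(relative_freq * LN(relative_freq))
    return -sum((n / total) * log(n / total) for n in counts.values())


# Nine events with counterpart "A" and one with "B": highly predictable.
predictable = ["A"] * 9 + ["B"]
# Events spread evenly over five counterparts: much less predictable.
varied = ["A", "B", "C", "D", "E"] * 2

low = contact_entropy(predictable)
high = contact_entropy(varied)
```

The evenly spread case reaches the maximum possible value for five counterparts, LN(5), while the predictable case stays close to zero.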
Attributes¶
Parameters¶
- start, stop : str
  ISO-format start and stop datetimes.
- subscriber_identifier : {'msisdn', 'imei'}, default 'msisdn'
  Either msisdn or imei; the column that identifies the subscriber.
- subscriber_subset : flowmachine.core.Table, flowmachine.core.Query, list, str, default None
  If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn) to limit results to.
- direction : typing.Union, default both
  Whether to consider calls made, received, or both. Defaults to 'both'.
- hours : typing.Optional, default None
  Restrict the analysis to only a certain set of hours within each day.
- tables : list of strings, str, default 'all'
  Can be a string naming a single table (with the schema) or a list of these. The keyword all selects all subscriber tables.
- exclude_self_calls : bool, default True
  Set to False to include calls a subscriber made to themself.
Examples¶
s = ContactEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen : list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
Class LocationEntropy¶
LocationEntropy(start, stop, *, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit] = CellSpatialUnit(), subscriber_identifier='msisdn', hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all', ignore_nulls=True)
Calculates the entropy of locations visited. For instance, if an individual regularly makes their calls from a certain location, then this user will have a low location entropy.
Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events occurring at a certain location (e.g. cell, site, administrative region, etc.). This formula represents a consistent estimate of the true entropy only under certain conditions. Among them, that the relative frequency is a good approximation to the probability that a certain event occurs in a given location. In case of strong spatial autocorrelation, this might not be true.
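The choice of location granularity affects the result. The sketch below applies the same formula to one set of events, first labelled by cell and then by a coarser region. It is plain Python rather than a flowmachine query, and the `cell_to_region` lookup and event list are hypothetical data invented for the example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """-1 * SUM(relative_freq * LN(relative_freq)) over the distinct labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log(n / total) for n in counts.values())


# Hypothetical cell -> region lookup (not flowmachine data).
cell_to_region = {"cell_1": "north", "cell_2": "north", "cell_3": "south"}

# One entry per event, labelled with the cell that handled it.
event_cells = ["cell_1", "cell_2", "cell_1", "cell_3", "cell_2", "cell_1"]
event_regions = [cell_to_region[c] for c in event_cells]

cell_entropy = entropy(event_cells)      # three locations: 3, 2 and 1 events
region_entropy = entropy(event_regions)  # two locations: 5 and 1 events
```

Merging cells into regions can only keep the entropy the same or lower it, so values computed with different spatial units are not directly comparable.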
Attributes¶
Parameters¶
- start, stop : str
  ISO-format start and stop datetimes.
- spatial_unit : typing.Union, default CellSpatialUnit()
  Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.
- subscriber_identifier : {'msisdn', 'imei'}, default 'msisdn'
  Either msisdn or imei; the column that identifies the subscriber.
- subscriber_subset : flowmachine.core.Table, flowmachine.core.Query, list, str, default None
  If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn) to limit results to.
- hours : typing.Optional, default None
  Restrict the analysis to only a certain set of hours within each day.
- tables : list of strings, str, default 'all'
  Can be a string naming a single table (with the schema) or a list of these. The keyword all selects all subscriber tables.
Examples¶
s = LocationEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen : list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
Class PeriodicEntropy¶
PeriodicEntropy(start, stop, phase='hour', *, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None, tables='all')
Calculates the recurrence period entropy for events, that is the entropy associated with the period in which events take place. For instance, if events regularly occur at a certain time of day, say at 9:00 and 18:00 then this user will have a low period entropy.
Entropy is calculated as: -1 * SUM( relative_freq * LN( relative_freq ) ) where relative_freq is the relative frequency of events occurring at a certain period (e.g. hour of the day, day of the week, month of the year). This formula represents a consistent estimate of the true entropy only under certain conditions. Among them, that the relative frequency is a good approximation to the probability that a certain event occurs within certain periodic phases. In case of strong autocorrelation, this might not be true.
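To illustrate how the chosen phase changes the result, the sketch below extracts a phase from each event timestamp in plain Python and applies the formula above. It stands in for the database-side extraction and is not flowmachine code: the function `periodic_entropy`, the three supported phases, and the sample events are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from math import log


def periodic_entropy(timestamps, phase="hour"):
    """Entropy of the given phase over a list of event datetimes."""
    # Only a few phases are sketched here; the real parameter accepts more.
    extract = {
        "hour": lambda t: t.hour,
        "dow": lambda t: t.isoweekday() % 7,  # Sunday = 0 ... Saturday = 6
        "month": lambda t: t.month,
    }[phase]
    counts = Counter(extract(t) for t in timestamps)
    total = sum(counts.values())
    return -sum((n / total) * log(n / total) for n in counts.values())


# One event at 09:00 and one at 18:00 on each of seven consecutive days.
events = [datetime(2016, 1, day, hour) for day in range(1, 8) for hour in (9, 18)]

hour_entropy = periodic_entropy(events, "hour")  # only two distinct hours
dow_entropy = periodic_entropy(events, "dow")    # seven distinct weekdays
```

The same events are very regular by hour of day (entropy LN(2)) but maximally spread by day of week (entropy LN(7)), so the phase should match the kind of regularity being studied.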
Attributes¶
Parameters¶
- start, stop : str
  ISO-format start and stop datetimes.
- phase : {"century", "day", "decade", "dow", "doy", "epoch", "hour", "isodow", "isoyear", "microseconds", "millennium", "milliseconds", "minute", "month", "quarter", "second", "week", "year"}, default 'hour'
  The phase of recurrence for which to calculate the entropy. See the [Postgres manual](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) for further information on the allowed phases.
- subscriber_identifier : {'msisdn', 'imei'}, default 'msisdn'
  Either msisdn or imei; the column that identifies the subscriber.
- subscriber_subset : flowmachine.core.Table, flowmachine.core.Query, list, str, default None
  If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn) to limit results to.
- direction : typing.Union, default both
  Whether to consider calls made, received, or both. Defaults to 'both'.
- hours : typing.Optional, default None
  Restrict the analysis to only a certain set of hours within each day.
- tables : list of strings, str, default 'all'
  Can be a string naming a single table (with the schema) or a list of these. The keyword all selects all subscriber tables.
Examples¶
s = PeriodicEntropy("2016-01-01", "2016-01-07")
s.get_dataframe()
Methods¶
_absolute_freq_query¶
_absolute_freq_query
_relative_freq_query¶
_relative_freq_query
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen : list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn