flowmachine.features.subscriber.scores¶
Source: flowmachine/features/subscriber/scores.py
Calculates an event score for each event based on a scoring dictionary.
Class EventScore¶
EventScore(*, start: str, stop: str, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType] = None, hours: Union[str, Tuple[int, int]] = 'all', table: Union[str, List[str]] = 'all', score_hour: List[float] = [-1, -1, -1, -1, -1, -1, -1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1], score_dow: Dict[str, float] = {'monday': 1, 'tuesday': 1, 'wednesday': 1, 'thursday': 0, 'friday': -1, 'saturday': -1, 'sunday': -1}, subscriber_identifier: str = 'msisdn', subscriber_subset=None)
Assigns a score to each event based on the hour of the day and the day of the week. The scores can be used to cluster sets of events by their signature. This type of analysis reduces the dimensionality of the problem by projecting a given event pattern onto the real line. The class returns a table with scores averaged across the requested spatial unit per subscriber.
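The per-event scoring and averaging described above can be sketched in plain Python. This is only an illustration and not flowmachine's implementation, which runs as a database query. The helper name `average_event_scores`, the `(subscriber, location, timestamp)` tuple layout and the in-memory event list are assumptions made for the example; the two score tables are the class defaults.

```python
from collections import defaultdict
from datetime import datetime

# Default score tables, copied from the EventScore signature.
SCORE_HOUR = [-1] * 7 + [0] * 2 + [1] * 8 + [0] * 4 + [-1] * 3
SCORE_DOW = {
    "monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 0,
    "friday": -1, "saturday": -1, "sunday": -1,
}
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def average_event_scores(events, score_hour=SCORE_HOUR, score_dow=SCORE_DOW):
    """Hypothetical helper: mean hour and day-of-week score per
    (subscriber, location) pair."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # hour sum, dow sum, count
    for subscriber, location, timestamp in events:
        acc = totals[(subscriber, location)]
        acc[0] += score_hour[timestamp.hour]
        acc[1] += score_dow[DAY_NAMES[timestamp.weekday()]]
        acc[2] += 1
    return {key: (h / n, d / n) for key, (h, d, n) in totals.items()}


# Made-up events for illustration only.
events = [
    ("a", "x", datetime(2016, 1, 4, 10)),  # Monday, 10:00
    ("a", "x", datetime(2016, 1, 1, 3)),   # Friday, 03:00
    ("b", "y", datetime(2016, 1, 4, 10)),  # Monday, 10:00
]
scores = average_event_scores(events)
```

Each event contributes one hour score and one day-of-week score, so every subscriber-location pair ends up as a point in a two-dimensional score space regardless of how many events it has.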
Attributes¶
Parameters¶
-
score_hour: typing.List[float], default [-1, -1, -1, -1, -1, -1, -1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1]. A length-24 list containing numerical scores between -1 and 1, where entry 0 is midnight.
-
score_dow: typing.Dict[str, float], default {'monday': 1, 'tuesday': 1, 'wednesday': 1, 'thursday': 0, 'friday': -1, 'saturday': -1, 'sunday': -1}. A dictionary containing a key for every day of the week, and a numerical score between -1 and 1. Keys should be the lowercase, full name of the day.
-
start: str. ISO format date for the beginning of the time frame, e.g. 2016-01-01 or 2016-01-01 14:03:01
-
stop: str. ISO format date for the end of the time frame, in the same format as start.
-
spatial_unit: typing.Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType], default None. Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.
-
hours: typing.Union[str, typing.Tuple[int, int]], default 'all'. Restrict the result to certain hours, e.g. (4, 17). This restricts the query to these hours only, but across all specified days. Set to 'all' to include all hours.
-
table: typing.Union[str, typing.List[str]], default 'all'. Schema-qualified name of the table on which the analysis is based.
-
subscriber_identifier: str, default 'msisdn'. Either msisdn or imei; the column that identifies the subscriber.
-
subscriber_subset: flowmachine.core.Table, flowmachine.core.Query, list, str, default None. If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn), to limit results to.
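As a rough illustration of the shapes expected for score_hour and score_dow, the sketch below builds a custom pair of score tables and checks them before use. The function `check_scores` is a hypothetical helper, not part of flowmachine, and the check assumes both tables are bounded to the range -1 to 1, as the default values suggest.

```python
DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")


def check_scores(score_hour, score_dow):
    """Hypothetical helper: raise ValueError if the tables do not have
    the documented shape (24 hourly entries, one key per weekday)."""
    if len(score_hour) != 24:
        raise ValueError("score_hour needs 24 entries; entry 0 is midnight.")
    if set(score_dow) != set(DAYS):
        raise ValueError("score_dow needs one lowercase key per day of the week.")
    values = list(score_hour) + list(score_dow.values())
    # Assumption: every score lies between -1 and 1 inclusive.
    if not all(-1 <= float(v) <= 1 for v in values):
        raise ValueError("scores are assumed to lie between -1 and 1.")


# Custom tables: night-time hours (22:00 to 05:59) and weekends score 1,
# everything else scores -1.
night_hours = [1 if (hour >= 22 or hour < 6) else -1 for hour in range(24)]
weekend = {day: (1 if day in ("saturday", "sunday") else -1) for day in DAYS}
check_scores(night_hours, weekend)

# The checked tables could then be passed to the class, e.g.
# EventScore(start=..., stop=..., score_hour=night_hours, score_dow=weekend)
```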
Examples¶
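from flowmachine.core.spatial_unit import make_spatial_unit
from flowmachine.features.subscriber.scores import EventScore

es = EventScore(start='2016-01-01', stop='2016-01-05',
                spatial_unit=make_spatial_unit('versioned-site'))
es.head()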
subscriber location_id version score_hour score_dow
3EgqzplqPYDyGRVK DbWg4K 0 0.0 -1.0
G2DQzae1qOa48jK9 EyZykQ 0 1.0 -1.0
148ZaRZe54wPGQ9r nWM8R3 0 -1.0 -1.0
QrAlXqDbXDkNJe3E pdVVV4 0 1.0 0.0
kjGXLy9lWnZ4V6J7 r9KbQy 0 0.0 1.0
...
Methods¶
cache¶
cache
Returns¶
-
bool: True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List[str]: List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str: Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set: The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str: String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen: list. By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool: True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str: query_id hash string
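The idea of an identifier derived from a query's class, parameters and subqueries can be illustrated with a short sketch. This is not flowmachine's actual hashing scheme: the function `sketch_query_id`, the JSON serialisation and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json


def sketch_query_id(class_name, params, subquery_ids=()):
    """Hypothetical helper: deterministic hash of a query description."""
    payload = json.dumps(
        {
            "class": class_name,
            "params": params,
            "subqueries": sorted(subquery_ids),
        },
        sort_keys=True,  # key order must not change the hash
        default=str,     # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


id_a = sketch_query_id("EventScore", {"start": "2016-01-01", "stop": "2016-01-05"})
id_b = sketch_query_id("EventScore", {"stop": "2016-01-05", "start": "2016-01-01"})
id_c = sketch_query_id("EventScore", {"start": "2016-01-01", "stop": "2016-01-06"})
```

In this sketch, two queries built with the same class and parameters share an identifier (`id_a == id_b`), while changing any parameter yields a different one (`id_c`), which is the property a cache keyed on the identifier relies on.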
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState: The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str: The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str: String form of the table's fqn