flowmachine.features.utilities.sets
Source: flowmachine/features/utilities/sets.py
Utility classes for subsetting CDRs.
Class SubscriberLocationSubset
SubscriberLocationSubset(start, stop, *, min_calls, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType] = None, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None)
Query to get the subset of subscribers who made at least min_calls calls within a given region during the period from start to stop.
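The filtering rule can be illustrated in plain Python. This is a minimal sketch, not flowmachine's implementation (which runs as SQL against the database): it assumes each call has already been mapped to a (subscriber, location) pair within the start/stop window, and that a pair is kept when its count is at least min_calls. The helper name `location_subset` and the sample identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def location_subset(
    events: Iterable[Tuple[str, str]], min_calls: int
) -> List[Tuple[str, str]]:
    """Return sorted (subscriber, location) pairs seen at least min_calls times.

    Hypothetical helper: `events` holds one (subscriber, location) pair per
    call, already restricted to the start/stop window.
    """
    counts = Counter(events)  # number of calls per (subscriber, location)
    return sorted(pair for pair, n in counts.items() if n >= min_calls)


# Illustrative toy data, not real CDRs
events = [
    ("sub_a", "loc_1"),
    ("sub_a", "loc_1"),
    ("sub_a", "loc_1"),
    ("sub_a", "loc_2"),
    ("sub_b", "loc_3"),
]
print(location_subset(events, min_calls=3))  # [('sub_a', 'loc_1')]
```

Because counting is per (subscriber, location) pair, one subscriber can qualify in several locations, which matches the example output below where the same subscriber appears for more than one region.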
Parameters
- start (datetime): Start time to filter the query.
- stop (datetime): Stop time to filter the query.
- spatial_unit (CellSpatialUnit, GeomSpatialUnit or None, default None): Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.
- min_calls (int): Minimum number of calls a subscriber must have made within a location between start and stop to be included.
- direction (str or Direction, default 'both'): Whether to consider calls made, received, or both.
- hours (Tuple[int, int] or None, default None): Restrict the analysis to a certain range of hours within each day.
- subscriber_identifier ({'msisdn', 'imei'}, default 'msisdn'): Either msisdn or imei; the column that identifies the subscriber.
- subscriber_subset (flowmachine.core.Table, flowmachine.core.Query, list or str, default None): If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn) to limit results to.
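The direction parameter accepts either a string or an enum member. The sketch below illustrates that pattern and the made/received/both filter. It assumes each event carries a boolean flag marking it as outgoing (made) rather than incoming (received); `CallDirection`, its 'out'/'in' values and `filter_direction` are hypothetical stand-ins, not flowmachine's Direction class (only 'both' appears in the signature above).

```python
from enum import Enum
from typing import List, Tuple, Union


class CallDirection(Enum):
    """Hypothetical stand-in for a direction setting."""

    OUTGOING = "out"
    INCOMING = "in"
    BOTH = "both"


Event = Tuple[str, bool]  # (subscriber, is_outgoing)


def filter_direction(
    events: List[Event], direction: Union[str, CallDirection]
) -> List[Event]:
    """Keep only events that match the requested direction."""
    # Accept either the plain string value or an enum member.
    direction = CallDirection(direction)
    if direction is CallDirection.BOTH:
        return list(events)
    want_outgoing = direction is CallDirection.OUTGOING
    return [event for event in events if event[1] == want_outgoing]


events = [("sub_a", True), ("sub_a", False), ("sub_b", True)]
print(filter_direction(events, "out"))  # [('sub_a', True), ('sub_b', True)]
```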
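Examples
import flowmachine
from flowmachine.core.spatial_unit import make_spatial_unit
from flowmachine.features.utilities.sets import SubscriberLocationSubset

flowmachine.connect()  # connect to FlowDB with the default settings
sls = SubscriberLocationSubset(
    "2016-01-01", "2016-01-07", min_calls=3,
    direction="both", spatial_unit=make_spatial_unit("admin", level=3),
)
sls.head()
subscriber name
038OVABN11Ak4W5P Dolpa
038OVABN11Ak4W5P Mugu
09NrjaNNvDanD8pk Banke
0ayZGYEQrqYlKw6g Dolpa
0DB8zw67E9mZAPK2 Baglung
...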
Methods
cache
Returns
- bool: True if caching is switched on.
column_names
Returns the column names.
Returns
- typing.List: List of the column names of this query.
column_names_as_string_list
Get the column names as a comma-separated list.
Returns
- str: Comma-separated list of column names.
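For illustration only, the same idea on a plain list of names. This sketch assumes the names are joined with ", "; the exact separator flowmachine uses is an assumption here, and `as_string_list` is a hypothetical name.

```python
from typing import List


def as_string_list(column_names: List[str]) -> str:
    """Join column names into a single comma-separated string."""
    return ", ".join(column_names)


print(as_string_list(["subscriber", "name"]))  # subscriber, name
```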
dependencies
Returns
- set: The set of queries on which this one directly depends.
fully_qualified_table_name
Returns a unique, fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.
Returns
- str: String form of the table's fully qualified name.
index_cols
A list of columns to use as indexes when storing this query.
Returns
- ixen (list): By default, the location columns (if they are present and self.spatial_unit is defined), followed by the subscriber column.
Examples
from flowmachine.features import daily_location

daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
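The default rule for ixen can be sketched in plain Python. This is an approximation, not flowmachine's code: it assumes the location columns are added as one composite index only when a spatial unit is defined and all of them appear among the query's columns, and that the subscriber column is quoted as in the example output above. `default_index_cols` is a hypothetical name.

```python
from typing import List, Optional, Union

IndexSpec = Union[List[str], str]


def default_index_cols(
    column_names: List[str], location_columns: Optional[List[str]]
) -> List[IndexSpec]:
    """Hypothetical restatement of the default index_cols rule.

    location_columns is None when the query has no spatial unit.
    """
    ixen: List[IndexSpec] = []
    # Location columns form one composite index, if all of them are present.
    if location_columns is not None and set(location_columns) <= set(column_names):
        ixen.append(list(location_columns))
    # The subscriber column is quoted, mirroring the documented example.
    if "subscriber" in column_names:
        ixen.append('"subscriber"')
    return ixen


print(default_index_cols(["subscriber", "name"], ["name"]))
# [['name'], '"subscriber"']
```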
is_stored
Returns
- bool: True if the table is stored, and False otherwise.
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns
- str: query_id hash string.
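The idea of an identifier derived from a query's parameters and subqueries can be sketched with hashlib. This is a hypothetical scheme, not flowmachine's actual hashing: it assumes the class name, the parameters (serialized as sorted JSON) and the ids of the subqueries are combined and hashed with MD5. `make_query_id` is a hypothetical name. Identical inputs always produce the same id, which is what lets such a hash act as a cache key.

```python
import hashlib
import json
from typing import Any, Dict, List


def make_query_id(
    class_name: str, params: Dict[str, Any], subquery_ids: List[str]
) -> str:
    """Hypothetical: hash class name, parameters and subquery ids together."""
    # sort_keys makes the JSON independent of dict insertion order;
    # sorting the subquery ids makes it independent of dependency order.
    state = json.dumps(
        {"class": class_name, "params": params, "deps": sorted(subquery_ids)},
        sort_keys=True,
    )
    return hashlib.md5(state.encode("utf-8")).hexdigest()


a = make_query_id("UniqueSubscribers", {"start": "2016-01-01", "stop": "2016-01-02"}, [])
b = make_query_id("UniqueSubscribers", {"stop": "2016-01-02", "start": "2016-01-01"}, [])
c = make_query_id("UniqueSubscribers", {"start": "2016-01-01", "stop": "2016-01-03"}, [])
print(a == b, a == c)  # True False
```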
query_state
Return the current query state.
Returns
- QueryState: The current query state.
query_state_str
Return the current query state as a string.
Returns
- str: The current query state. The possible values are those defined in flowmachine.core.query_state.QueryState.
table_name
Returns a unique name under which the query is stored, based on a hash of the parameters, class, and subqueries.
Returns
- str: String form of the table name.
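The relationship between table_name and fully_qualified_table_name can be sketched as follows. The details are assumptions, not confirmed by this page: the table name is taken to be the query_id hash with an "x" prefix (so it starts with a letter, as an unquoted PostgreSQL identifier must), and the fully qualified name prefixes it with a cache schema assumed to be called "cache". Both helper names are hypothetical.

```python
import hashlib

CACHE_SCHEMA = "cache"  # assumed schema name


def table_name_for(query_id: str) -> str:
    """Hypothetical: prefix the hash so the name starts with a letter."""
    return f"x{query_id}"


def fully_qualified_table_name_for(query_id: str) -> str:
    """Hypothetical: qualify the table name with the cache schema."""
    return f"{CACHE_SCHEMA}.{table_name_for(query_id)}"


qid = hashlib.md5(b"example").hexdigest()
print(table_name_for(qid))
print(fully_qualified_table_name_for(qid))
```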
Class UniqueSubscribers
UniqueSubscribers(start: str, stop: str, *, hours: Optional[Tuple[int, int]] = None, table: Union[str, List[str]] = 'all', subscriber_identifier: str = 'msisdn', subscriber_subset: Optional[flowmachine.core.query.Query] = None)
Class representing the set of all unique subscribers in our interactions table.
Parameters
- start (str): ISO-format date or datetime marking the beginning of the time frame, e.g. 2016-01-01 or 2016-01-01 14:03:01.
- stop (str): ISO-format date or datetime marking the end of the time frame, in the same format as start.
- hours (Tuple[int, int] or None, default None): Restrict the result to certain hours of the day, e.g. (4, 17). The restriction applies across all specified days. Leave as None to include all hours.
- table (str or List[str], default 'all'): Table(s) on which to perform the query. By default ('all'), the query uses every table containing subscriber information, as specified via subscriber_tables in flowmachine.yml. Otherwise, specify a full table name including the schema, such as 'events.calls', or a list of such names.
- subscriber_identifier (str, default 'msisdn'): Either msisdn or imei; the column that identifies the subscriber.
- subscriber_subset (Optional[Query], default None): If provided, a string or list of strings which are msisdns or imeis to limit results to; or a query or table which has a column with a name matching subscriber_identifier (typically msisdn) to limit results to.
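A minimal sketch of how an hours restriction such as (4, 17) applies to every day in the range. It assumes the window includes the start hour and excludes the end hour; that boundary behaviour is an assumption, since this page does not state it. `in_hours` is a hypothetical helper.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def in_hours(ts: datetime, hours: Optional[Tuple[int, int]]) -> bool:
    """Return True if ts falls inside the daily hour window (hypothetical)."""
    if hours is None:  # no restriction: every hour of every day counts
        return True
    start_hour, end_hour = hours
    # Assumed half-open window [start_hour, end_hour), applied to every day.
    return start_hour <= ts.hour < end_hour


events: List[datetime] = [
    datetime(2016, 1, 1, 3),
    datetime(2016, 1, 1, 10),
    datetime(2016, 1, 2, 16),
]
print([e.isoformat() for e in events if in_hours(e, (4, 17))])
# ['2016-01-01T10:00:00', '2016-01-02T16:00:00']
```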
Examples
from flowmachine.features.utilities.sets import UniqueSubscribers

uu = UniqueSubscribers('2016-01-01 13:30:30', '2016-01-02 16:25:00')
uu.as_set()
{'038OVABN11Ak4W5P',
'09NrjaNNvDanD8pk',
'0DB8zw67E9mZAPK2',
...}
Note
- A date without hours and minutes is interpreted as midnight of that day, so to get data within a single day pass '2016-01-01', '2016-01-02'.
- Use 24-hour format.
- Only on-net callers are collected.
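The midnight rule in the note can be checked with the standard library. This sketch assumes the stop bound is exclusive (a half-open interval), which is why passing '2016-01-01', '2016-01-02' covers exactly one day; `in_range` is a hypothetical helper, not part of flowmachine.

```python
from datetime import datetime


def in_range(ts: datetime, start: str, stop: str) -> bool:
    """True if start <= ts < stop; a date-only string parses as midnight."""
    return datetime.fromisoformat(start) <= ts < datetime.fromisoformat(stop)


print(datetime.fromisoformat("2016-01-01"))  # 2016-01-01 00:00:00
print(in_range(datetime(2016, 1, 1, 23, 59), "2016-01-01", "2016-01-02"))  # True
print(in_range(datetime(2016, 1, 2, 0, 0), "2016-01-01", "2016-01-02"))  # False
```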
Methods
as_set
as_set(self)
Returns all unique subscribers as a set.
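Conceptually, the result collapses the subscriber column into a Python set, so repeats across events and across tables disappear. A minimal sketch, assuming the rows have already been fetched as one-element tuples from several hypothetical tables; `rows_as_set` is a hypothetical name.

```python
from itertools import chain
from typing import Iterable, Set, Tuple


def rows_as_set(*tables: Iterable[Tuple[str]]) -> Set[str]:
    """Union the subscriber column of several row sources into one set."""
    return {row[0] for row in chain(*tables)}


calls = [("sub_a",), ("sub_b",), ("sub_a",)]
sms = [("sub_b",), ("sub_c",)]
print(sorted(rows_as_set(calls, sms)))  # ['sub_a', 'sub_b', 'sub_c']
```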
cache
Returns
- bool: True if caching is switched on.
column_names
Returns the column names.
Returns
- typing.List: List of the column names of this query.
column_names_as_string_list
Get the column names as a comma-separated list.
Returns
- str: Comma-separated list of column names.
dependencies
Returns
- set: The set of queries on which this one directly depends.
fully_qualified_table_name
Returns a unique, fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.
Returns
- str: String form of the table's fully qualified name.
index_cols
A list of columns to use as indexes when storing this query.
Returns
- ixen (list): By default, the location columns (if they are present and self.spatial_unit is defined), followed by the subscriber column.
Examples
from flowmachine.features import daily_location

daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored
Returns
- bool: True if the table is stored, and False otherwise.
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns
- str: query_id hash string.
query_state
Return the current query state.
Returns
- QueryState: The current query state.
query_state_str
Return the current query state as a string.
Returns
- str: The current query state. The possible values are those defined in flowmachine.core.query_state.QueryState.
table_name
Returns a unique name under which the query is stored, based on a hash of the parameters, class, and subqueries.
Returns
- str: String form of the table name.