flowmachine.features.utilities.sets¶
Source: flowmachine/features/utilities/sets.py
Utility classes for subsetting CDRs.
Class SubscriberLocationSubset¶
SubscriberLocationSubset(start, stop, *, min_calls, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType] = None, hours: Optional[Tuple[int, int]] = None, subscriber_subset=None)
Query to get the subset of users who have made at least min_calls calls within a given region during the period from start to stop.
Attributes¶
Parameters¶
-
start
:datetime
Start time to filter query.
-
stop
:datetime
Stop time to filter query.
-
spatial_unit
:typing.Union
, default None
Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.
-
min_calls
:int
minimum number of calls a user must have made within a
-
direction
:typing.Union
, default 'both'
Whether to consider calls made, received, or both. Defaults to 'both'.
-
hours
:typing.Optional
, default None
Restrict the analysis to only a certain set of hours within each day.
-
subscriber_identifier
:{'msisdn', 'imei'}
, default 'msisdn'
Either msisdn, or imei, the column that identifies the subscriber.
-
subscriber_subset
:flowmachine.core.Table
,flowmachine.core.Query
,list
,str
, default None
If provided, a string or list of strings giving the msisdns or imeis to limit results to; alternatively, a query or table with a column whose name matches subscriber_identifier (typically msisdn), whose values the results are limited to.
Examples¶
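from flowmachine.core.spatial_unit import make_spatial_unit
from flowmachine.features.utilities.sets import SubscriberLocationSubset

sls = SubscriberLocationSubset("2016-01-01", "2016-01-07", min_calls=3,
                               direction="both", spatial_unit=make_spatial_unit("admin", level=3))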
sls.head()
subscriber name
038OVABN11Ak4W5P Dolpa
038OVABN11Ak4W5P Mugu
09NrjaNNvDanD8pk Banke
0ayZGYEQrqYlKw6g Dolpa
0DB8zw67E9mZAPK2 Baglung
...
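The selection rule described for this class can be sketched in plain Python. This is only an illustration under stated assumptions: the toy records, the subscriber labels and the helper name `location_subset` are hypothetical, and the real query runs in the database rather than over an in-memory list.

```python
from collections import Counter

# Toy call records as (subscriber, location name) pairs. Illustrative data only.
calls = [
    ("A", "Dolpa"), ("A", "Dolpa"), ("A", "Dolpa"),
    ("A", "Mugu"), ("A", "Mugu"), ("A", "Mugu"),
    ("B", "Banke"), ("B", "Banke"),
    ("C", "Baglung"), ("C", "Baglung"), ("C", "Baglung"),
]


def location_subset(records, min_calls):
    """Return sorted (subscriber, location) pairs with at least min_calls records.

    Hypothetical helper: it mimics the documented behaviour, not flowmachine's SQL.
    """
    counts = Counter(records)
    return sorted(pair for pair, n in counts.items() if n >= min_calls)


result = location_subset(calls, min_calls=3)
print(result)
```

As in the example output above, one subscriber can appear in several rows, once for each location in which they reach the threshold.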
Methods¶
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen
:list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
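The idea of an identifier derived from a query's class, parameters and subqueries can be sketched as follows. This is a minimal illustration, not flowmachine's actual hashing scheme: the function name `sketch_query_id`, the JSON serialisation and the choice of MD5 are all assumptions made for the example.

```python
import hashlib
import json


def sketch_query_id(class_name, params, subquery_ids=()):
    """Hash the class name, parameters and subquery ids into a hex string.

    Hypothetical helper; flowmachine's real query_id may be computed differently.
    """
    payload = json.dumps(
        {"class": class_name, "params": params, "subqueries": sorted(subquery_ids)},
        sort_keys=True,  # make the result independent of parameter ordering
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


a = sketch_query_id("UniqueSubscribers", {"start": "2016-01-01", "stop": "2016-01-02"})
b = sketch_query_id("UniqueSubscribers", {"stop": "2016-01-02", "start": "2016-01-01"})
c = sketch_query_id("UniqueSubscribers", {"start": "2016-01-01", "stop": "2016-01-03"})
print(a == b, a == c)
```

The point of such a scheme is that two queries built with the same parameters map to the same identifier, so a stored result can be found again, while any change in parameters yields a different one.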
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
Class UniqueSubscribers¶
UniqueSubscribers(start: str, stop: str, *, hours: Optional[Tuple[int, int]] = None, table: Union[str, List[str]] = 'all', subscriber_identifier: str = 'msisdn', subscriber_subset: Optional[flowmachine.core.query.Query] = None)
Class representing the set of all unique subscribers in our interactions table.
Attributes¶
Parameters¶
-
start
:str
ISO-format date or datetime marking the beginning of the time frame, e.g. 2016-01-01 or 2016-01-01 14:03:01
-
stop
:str
ISO-format date or datetime marking the end of the time frame, in the same format as start.
-
hours
:typing.Optional
, default None
Subset the result within certain hours, e.g. (4, 17). This restricts the query to these hours only, across all specified days. Or set to 'all' to include all hours.
-
table
:typing.Union
, default 'all'
Table on which to perform the query. By default it will look at ALL tables, which are any tables with subscriber information in them, specified via subscriber_tables in flowmachine.yml. Otherwise you need to specify a full table (with a schema) such as 'events.calls'.
-
subscriber_identifier
:str
, default 'msisdn'
Either msisdn, or imei, the column that identifies the subscriber.
-
subscriber_subset
:typing.Optional
, default None
If provided, a string or list of strings giving the msisdns or imeis to limit results to; alternatively, a query or table with a column whose name matches subscriber_identifier (typically msisdn), whose values the results are limited to.
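The effect of the hours parameter, a window of hours applied to every day in the date range, can be sketched in plain Python. The boundary handling here is an assumption, not something this page specifies: the sketch treats the window as half-open (start hour included, end hour excluded) and lets a window such as (22, 4) wrap past midnight. The helper name `within_hours` is hypothetical.

```python
from datetime import datetime


def within_hours(timestamp, hours):
    """Return True if timestamp falls inside the daily hour window.

    Assumed semantics (not confirmed by the source): half-open [start, end),
    wrapping past midnight when start > end. None means no restriction.
    """
    if hours is None:
        return True
    start_hour, end_hour = hours
    if start_hour <= end_hour:
        return start_hour <= timestamp.hour < end_hour
    return timestamp.hour >= start_hour or timestamp.hour < end_hour


events = [
    datetime(2016, 1, 1, 3, 59),
    datetime(2016, 1, 1, 4, 0),
    datetime(2016, 1, 1, 16, 59),
    datetime(2016, 1, 2, 12, 0),
    datetime(2016, 1, 2, 17, 0),
]
kept = [e for e in events if within_hours(e, (4, 17))]
print(kept)
```

The same window is applied on each day, so events on both 1 and 2 January are kept as long as their hour falls inside it.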
Examples¶
from flowmachine.features.utilities.sets import UniqueSubscribers

UU = UniqueSubscribers('2016-01-01 13:30:30',
                       '2016-01-02 16:25:00')
UU.as_set()
{'038OVABN11Ak4W5P',
'09NrjaNNvDanD8pk',
'0DB8zw67E9mZAPK2',
...}
Note
- A date without hours and minutes will be interpreted as midnight of that day, so to get data within a single day pass '2016-01-01', '2016-01-02'.
- Use 24-hour format.
- Only on-net callers will be collected.
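The note on date-only strings can be illustrated with the standard library. This sketch uses Python's own `datetime.fromisoformat` to show the midnight interpretation; it is an analogy, and does not claim that flowmachine parses its arguments with this function.

```python
from datetime import datetime

# A date-only string parses to midnight at the start of that day.
start = datetime.fromisoformat("2016-01-01")
stop = datetime.fromisoformat("2016-01-02")
print(start)         # 2016-01-01 00:00:00
print(stop - start)  # 1 day, 0:00:00

# A full timestamp, written in 24-hour format, keeps its time of day.
with_time = datetime.fromisoformat("2016-01-01 14:03:01")
print(with_time.hour)  # 14
```

Passing the same date for both start and stop would therefore describe an empty window, which is why a single day is expressed as two consecutive dates.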
Methods¶
as_set¶
as_set(self)
Returns all unique subscribers as a set.
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen
:list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn