
flowmachine.features.utilities.sets

Source: flowmachine/features/utilities/sets.py

Utility classes for subsetting CDRs.

Class SubscriberLocationSubset

SubscriberLocationSubset(start, stop, *, min_calls, subscriber_identifier='msisdn', direction: Union[str, flowmachine.features.utilities.direction_enum.Direction] = <Direction.BOTH: 'both'>, spatial_unit: Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType] = None, hours: Union[Tuple[int, int], NoneType] = None, subscriber_subset=None)
Source: flowmachine/features/utilities/sets.py

Query to get the subset of subscribers who made at least min_calls calls within a given region during the period from start to stop.
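The selection rule can be sketched in plain Python. This is a minimal illustration, not flowmachine's SQL. It assumes each event has already been mapped to a region (the job of spatial_unit), treats stop as exclusive, and keeps each (subscriber, region) pair with at least min_calls events. That reading is consistent with the example below, where the same subscriber appears under more than one region. `Event` and `subscriber_location_subset` are hypothetical names.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    # Hypothetical record: one call already mapped to a region.
    subscriber: str
    location: str
    time: datetime


def subscriber_location_subset(
    events: Iterable[Event], start: datetime, stop: datetime, min_calls: int
) -> List[Tuple[str, str]]:
    """Return (subscriber, location) pairs with at least min_calls events in [start, stop)."""
    counts = Counter(
        (e.subscriber, e.location) for e in events if start <= e.time < stop
    )
    # Keep only pairs that reach the threshold, sorted for stable output.
    return sorted(pair for pair, n in counts.items() if n >= min_calls)


events = [
    Event("A", "Dolpa", datetime(2016, 1, 1, 9)),
    Event("A", "Dolpa", datetime(2016, 1, 2, 10)),
    Event("A", "Mugu", datetime(2016, 1, 3, 11)),
    Event("B", "Banke", datetime(2016, 1, 8, 12)),  # outside the window
]
print(subscriber_location_subset(events, datetime(2016, 1, 1), datetime(2016, 1, 7), 2))
# [('A', 'Dolpa')]
```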

Parameters

  • start: datetime

    Start time to filter query.

  • stop: datetime

    Stop time to filter query.

  • spatial_unit: typing.Union[flowmachine.core.spatial_unit.CellSpatialUnit, flowmachine.core.spatial_unit.GeomSpatialUnit, NoneType], default None

    Spatial unit to which subscriber locations will be mapped. See the docstring of make_spatial_unit for more information.

  • min_calls: int

    Minimum number of calls a subscriber must have made within a given region during the period from start to stop.

  • direction: typing.Union[str, flowmachine.features.utilities.direction_enum.Direction], default both

    Whether to consider calls made, received, or both. Defaults to 'both'.

  • hours: typing.Union[typing.Tuple[int, int], NoneType], default None

    Restrict the analysis to only a certain set of hours within each day.

  • subscriber_identifier: {'msisdn', 'imei'}, default 'msisdn'

    Either msisdn or imei: the column that identifies the subscriber.

  • subscriber_subset: flowmachine.core.Table, flowmachine.core.Query, list, str, default None

    If provided, a string or list of strings which are msisdns or IMEIs to limit results to; or a query or table with a column whose name matches subscriber_identifier (typically msisdn), to limit results to.
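A per-day hour filter of the kind described for hours can be sketched as follows. The half-open [first, last) range and the wrap-around for pairs such as (22, 4) are assumptions made for illustration, not confirmed flowmachine behaviour, and `in_hours` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional, Tuple


def in_hours(t: datetime, hours: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical helper: is t's hour inside the [first, last) hour window?"""
    if hours is None:
        return True  # no restriction: keep every hour of the day
    first, last = hours
    if first < last:
        return first <= t.hour < last
    # Otherwise treat the window as wrapping past midnight, e.g. (22, 4).
    return t.hour >= first or t.hour < last


print(in_hours(datetime(2016, 1, 1, 9), (4, 17)))   # True
print(in_hours(datetime(2016, 1, 1, 17), (4, 17)))  # False: the end hour is excluded
print(in_hours(datetime(2016, 1, 1, 23), (22, 4)))  # True: the window wraps midnight
```

Because the check only looks at the hour of each timestamp, the same window applies on every day in the start/stop range.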

Examples

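>>> import flowmachine
>>> from flowmachine.core import make_spatial_unit
>>> from flowmachine.features.utilities.sets import SubscriberLocationSubset
>>> flowmachine.connect()  # connection settings default to environment variables
>>> sls = SubscriberLocationSubset("2016-01-01", "2016-01-07", min_calls=3,
...     direction="both", spatial_unit=make_spatial_unit("admin", level=3))
>>> sls.head()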
      subscriber     name
038OVABN11Ak4W5P    Dolpa
038OVABN11Ak4W5P     Mugu
09NrjaNNvDanD8pk    Banke
0ayZGYEQrqYlKw6g    Dolpa
0DB8zw67E9mZAPK2  Baglung
...

Methods

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/utilities/sets.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns
  • str

    Comma-separated list of column names.
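In effect this is a string join over column_names. A minimal sketch, assuming a ", " separator (the exact separator flowmachine uses is an assumption here); the standalone function is hypothetical.

```python
from typing import List


def column_names_as_string_list(column_names: List[str]) -> str:
    # Join the names with a comma and a space, e.g. for use in a SELECT clause.
    return ", ".join(column_names)


print(column_names_as_string_list(["subscriber", "name"]))
# subscriber, name
```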

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.
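Because dependencies returns only direct dependencies, collecting everything a query ultimately depends on means walking the graph. A minimal sketch, using a hypothetical `Node` class whose `dependencies` attribute stands in for a query's:

```python
from typing import Set


class Node:
    """Hypothetical stand-in for a query; 'dependencies' holds direct dependencies only."""

    def __init__(self, name: str, *deps: "Node") -> None:
        self.name = name
        self.dependencies: Set["Node"] = set(deps)


def all_dependencies(query: Node) -> Set[Node]:
    """Return every direct and indirect dependency of query (iterative depth-first walk)."""
    seen: Set[Node] = set()
    stack = list(query.dependencies)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dep.dependencies)
    return seen


events = Node("events")
subset = Node("subset", events)
locations = Node("locations", subset)
print(sorted(n.name for n in all_dependencies(locations)))
# ['events', 'subset']
```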

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fully qualified name.
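A sketch of how such a name can be composed, assuming (as the description says) a schema named cache and a table name derived from the query's hash. The "x" prefix and both helper functions are hypothetical; a letter prefix keeps the name a valid unquoted PostgreSQL identifier even when the hash starts with a digit.

```python
def table_name(query_id: str) -> str:
    # Hypothetical prefix: unquoted PostgreSQL identifiers must start with a letter or underscore.
    return f"x{query_id}"


def fully_qualified_table_name(query_id: str, schema: str = "cache") -> str:
    # Qualify the table name with the schema it is stored under.
    return f"{schema}.{table_name(query_id)}"


print(fully_qualified_table_name("3f2a9c"))
# cache.x3f2a9c
```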

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, the location columns (if they are present and self.spatial_unit is defined) followed by the subscriber column.

Examples
>>> from flowmachine.features import daily_location
>>> daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
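The default rule can be sketched as a small function. It is a hypothetical reimplementation, not flowmachine's code: `location_columns` and `has_spatial_unit` stand in for the query's spatial unit, and the quoting of "subscriber" mirrors the example output above.

```python
from typing import List, Union

IndexCol = Union[str, List[str]]


def default_index_cols(
    column_names: List[str], location_columns: List[str], has_spatial_unit: bool
) -> List[IndexCol]:
    ixen: List[IndexCol] = []
    # Location columns form a single composite index when all of them are present.
    if has_spatial_unit and set(location_columns) <= set(column_names):
        ixen.append(list(location_columns))
    # The subscriber column is indexed on its own, quoted as in the example output.
    if "subscriber" in column_names:
        ixen.append('"subscriber"')
    return ixen


print(default_index_cols(["subscriber", "name"], ["name"], True))
# [['name'], '"subscriber"']
```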

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.

Returns
  • str

    query_id hash string
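A minimal sketch of such a hash, assuming MD5 over a canonical string built from the class name, the sorted parameters and the ids of the subqueries. The hashing scheme and the `query_id` function shown here are hypothetical; the point is that equal inputs yield equal ids, so identical queries map to the same stored table.

```python
import hashlib
from typing import Any, Dict, List


def query_id(class_name: str, params: Dict[str, Any], subquery_ids: List[str]) -> str:
    # Sort parameters so that keyword order does not change the hash.
    state = [class_name, sorted((k, repr(v)) for k, v in params.items()), subquery_ids]
    return hashlib.md5(repr(state).encode("utf-8")).hexdigest()


a = query_id("UniqueSubscribers", {"start": "2016-01-01", "stop": "2016-01-02"}, [])
b = query_id("UniqueSubscribers", {"stop": "2016-01-02", "start": "2016-01-01"}, [])
print(a == b)  # True
```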

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string.

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
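query_state and query_state_str relate as an enum member and its string value, assuming the string form is the member's value. The enum below is a hypothetical two-member stand-in; the real set of states is defined in flowmachine.core.query_state.QueryState.

```python
from enum import Enum


class ExampleQueryState(str, Enum):
    # Hypothetical subset of states, for illustration only.
    QUEUED = "queued"
    COMPLETED = "completed"


state = ExampleQueryState.COMPLETED  # analogous to query_state
print(state.value)  # completed  (analogous to query_state_str)
```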

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name under which the query is stored, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's name.

Class UniqueSubscribers

UniqueSubscribers(start: str, stop: str, *, hours: Union[Tuple[int, int], NoneType] = None, table: Union[str, List[str]] = 'all', subscriber_identifier: str = 'msisdn', subscriber_subset: Union[flowmachine.core.query.Query, NoneType] = None)
Source: flowmachine/features/utilities/sets.py

Class representing the set of all unique subscribers in our interactions table.

Parameters

  • start: str

    ISO-format date or datetime marking the beginning of the time frame, e.g. 2016-01-01 or 2016-01-01 14:03:01.

  • stop: str

    ISO-format date or datetime marking the end of the time frame, in the same format as start.

  • hours: typing.Union[typing.Tuple[int, int], NoneType], default None

    Subset the result to certain hours of the day, e.g. (4, 17). The restriction applies only to these hours, but across all specified days. Leave as None (the default) to include all hours.

  • table: typing.Union[str, typing.List[str]], default all

    Table on which to perform the query. By default it will look at ALL tables, which are any tables with subscriber information in them, specified via subscriber_tables in flowmachine.yml. Otherwise you need to specify a full table (with a schema) such as 'events.calls'.

  • subscriber_identifier: str, default msisdn

    Either msisdn or imei: the column that identifies the subscriber.

  • subscriber_subset: typing.Union[flowmachine.core.query.Query, NoneType], default None

    If provided, a string or list of strings which are msisdns or IMEIs to limit results to; or a query or table with a column whose name matches subscriber_identifier (typically msisdn), to limit results to.
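Conceptually, with table='all' the result is the union of distinct subscribers over every configured subscriber table, optionally restricted to subscriber_subset. A plain-Python sketch, with hypothetical in-memory tables (a dict of subscriber lists) standing in for the database; `unique_subscribers` and the "events.sms" table are illustrative only.

```python
from typing import Dict, Iterable, List, Optional, Set, Union


def unique_subscribers(
    tables: Dict[str, List[str]],
    table: Union[str, List[str]] = "all",
    subscriber_subset: Optional[Iterable[str]] = None,
) -> Set[str]:
    # Resolve 'all' to every available table, and a single name to a one-element list.
    if table == "all":
        names = list(tables)
    elif isinstance(table, str):
        names = [table]
    else:
        names = list(table)
    result: Set[str] = set()
    for name in names:
        result.update(tables[name])  # a set keeps each subscriber once
    if subscriber_subset is not None:
        result &= set(subscriber_subset)  # restrict to the requested subscribers
    return result


tables = {"events.calls": ["A", "B", "A"], "events.sms": ["B", "C"]}
print(sorted(unique_subscribers(tables)))  # ['A', 'B', 'C']
print(sorted(unique_subscribers(tables, table="events.sms", subscriber_subset=["C", "D"])))  # ['C']
```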

Examples

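>>> import flowmachine
>>> from flowmachine.features.utilities.sets import UniqueSubscribers
>>> flowmachine.connect()  # connection settings default to environment variables
>>> UU = UniqueSubscribers('2016-01-01 13:30:30',
...                        '2016-01-02 16:25:00')
>>> UU.as_set()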
{'038OVABN11Ak4W5P',
 '09NrjaNNvDanD8pk',
 '0DB8zw67E9mZAPK2',
 ...}

Note

  • A date without hours and minutes is interpreted as midnight of that day, so to get data within a single day pass '2016-01-01', '2016-01-02'.
  • Use 24-hour format.
  • Only on-net callers are collected.
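The midnight rule can be checked with the standard library. This sketch assumes ISO strings are parsed as Python's datetime.fromisoformat parses them and that stop is exclusive, which is what the single-day advice implies; flowmachine's own parsing may differ, and `in_window` is a hypothetical helper.

```python
from datetime import datetime

start = datetime.fromisoformat("2016-01-01")  # date only: interpreted as midnight
stop = datetime.fromisoformat("2016-01-02")


def in_window(t: datetime) -> bool:
    # Half-open window: start is included, stop is excluded.
    return start <= t < stop


print(start)  # 2016-01-01 00:00:00
print(in_window(datetime.fromisoformat("2016-01-01 23:59:59")))  # True
print(in_window(stop))  # False: midnight of 2016-01-02 is excluded
```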

Methods

as_set

as_set(self)
Source: flowmachine/features/utilities/sets.py

Returns all unique subscribers as a set.

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/features/utilities/sets.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns
  • str

    Comma-separated list of column names.

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/query.py

Returns a unique fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fully qualified name.

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, the location columns (if they are present and self.spatial_unit is defined) followed by the subscriber column.

Examples
>>> from flowmachine.features import daily_location
>>> daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']

is_stored

is_stored
Source: flowmachine/core/query.py

Returns
  • bool

    True if the table is stored, and False otherwise.

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.

Returns
  • str

    query_id hash string

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string.

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name under which the query is stored, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's name.