flowmachine.features.utilities.group_values
Source: flowmachine/features/utilities/group_values.py
Utility class that allows the user to iterate through arbitrary groups of fields and apply a Python function to the results.
Class GroupValues
GroupValues(group, value, start, stop, **kwargs)
Query that groups rows by one or more columns and collects the values of other columns into arrays, one array per group.
Attributes
Parameters
- group : str or list of str
  Name of the column(s) to group by, e.g. msisdn_from.
- value : str or list of str
  Name of the column(s) whose values should be returned as an array.
- start, stop : str
  Start and stop times of the analysis, in ISO format.
- kwargs : dict
  Passed to flowmachine EventTableSubset.
Examples
gv = GroupValues('msisdn_from', 'datetime', start='2016-01-01', stop='2016-01-02')
for g, v in gv:
    print((g, str(max(v))))
('SubscriberA', '2016-01-01 23:00:01')
('Subscriberb', '2016-01-01 22:12:04')
...
Note
- When the user passes more than one group column or more than one value column, the iterator yields tuples of the form (group1, group2, array(value1), array(value2)).
- This class is mostly used through the ColumnMap method, which maps a user-defined Python function over the output of the iterator.
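To make the shape of the multi-column output concrete, here is a minimal pure-Python sketch of the grouping semantics. It assumes the rows are already in memory as dictionaries and returns Python lists in place of arrays; the function name `group_values` and the row format are hypothetical, and this is not flowmachine's own implementation, which runs as a database query.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Sequence, Tuple, Union


def group_values(
    rows: Sequence[Dict[str, Any]],
    group: Union[str, List[str]],
    value: Union[str, List[str]],
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical sketch: yield (group1, ..., array(value1), ...) tuples."""
    # Accept a single column name or a list of names, like the parameters above.
    group_cols = [group] if isinstance(group, str) else list(group)
    value_cols = [value] if isinstance(value, str) else list(value)

    # Map each distinct combination of group values to one list per value column.
    buckets: Dict[Tuple[Any, ...], List[List[Any]]] = defaultdict(
        lambda: [[] for _ in value_cols]
    )
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        for arr, col in zip(buckets[key], value_cols):
            arr.append(row[col])

    # Flatten each group key and its value arrays into a single tuple.
    for key, arrays in buckets.items():
        yield (*key, *arrays)


rows = [
    {"msisdn_from": "A", "msisdn_to": "B", "duration": 10, "cost": 1},
    {"msisdn_from": "A", "msisdn_to": "B", "duration": 20, "cost": 2},
    {"msisdn_from": "A", "msisdn_to": "C", "duration": 5, "cost": 1},
]
for t in group_values(rows, ["msisdn_from", "msisdn_to"], ["duration", "cost"]):
    print(t)
# ('A', 'B', [10, 20], [1, 2])
# ('A', 'C', [5], [1])
```

With a single group column and a single value column, each tuple collapses to (group, array), which is why the two-name unpacking `for g, v in gv` works in the earlier example.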
Methods
ColumnMap
ColumnMap(self, fn)
Maps a function to each of the returned arrays, and returns an iterator over the results.
Examples
def highest_min(date_list):
    return max(x.minute for x in date_list)
gv = GroupValues('msisdn_from', 'datetime', start='2016-01-01', stop='2016-01-02')
cm = gv.ColumnMap(highest_min)
for c in cm:
    print(c)
('BKMy1nYEZpnoEA7G', 58)
('DzpZJ2EaVQo2X5vM', 56)
('Zv4W9eak2QN1M5A7', 55)
('NQV3J52PeYgbLm2w', 54)
...
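The mapping step can likewise be sketched in plain Python, assuming each input tuple holds `n_groups` group values followed by the value arrays and that the function is applied to every array independently. The names `column_map` and `n_groups` are hypothetical and not part of flowmachine; the in-memory timestamps stand in for the database results.

```python
from datetime import datetime
from typing import Any, Callable, Iterable, Iterator, Tuple


def column_map(
    grouped: Iterable[Tuple[Any, ...]],
    fn: Callable[[Any], Any],
    n_groups: int = 1,
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical sketch: apply fn to each array, keeping group values as-is."""
    for row in grouped:
        groups, arrays = row[:n_groups], row[n_groups:]
        # Lazily yield one result tuple per group, like an iterator over results.
        yield (*groups, *(fn(arr) for arr in arrays))


def highest_min(date_list):
    # Largest minute-of-hour among the timestamps in one group.
    return max(x.minute for x in date_list)


grouped = [
    ("SubscriberA", [datetime(2016, 1, 1, 23, 0, 1), datetime(2016, 1, 1, 9, 41)]),
    ("SubscriberB", [datetime(2016, 1, 1, 22, 12, 4)]),
]
for c in column_map(grouped, highest_min):
    print(c)
# ('SubscriberA', 41)
# ('SubscriberB', 12)
```

Because the sketch is a generator, nothing is computed until the caller iterates, which matches the description of ColumnMap as returning an iterator.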
cache
cache
Returns
-
bool
True if caching is switched on.
column_names
column_names
Returns the column names.
Returns
-
typing.List
List of the column names of this query.
column_names_as_string_list
column_names_as_string_list
Get the column names as a comma-separated list.
Returns
-
str
Comma-separated list of column names.
dependencies
dependencies
Returns
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name
fully_qualified_table_name
Returns a unique, fully qualified name under which the query is stored in the cache schema, based on a hash of the parameters, class, and subqueries.
Returns
-
str
String form of the table's fully qualified name.
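A sketch of how such a schema-qualified name might be assembled. The `cache` schema comes from the description above; the JSON serialisation, the MD5 digest, the `x` prefix, and the helper name `fully_qualified_table_name` as a free function are all assumptions made for illustration, not flowmachine's confirmed scheme.

```python
import hashlib
import json


def fully_qualified_table_name(class_name: str, params: dict, schema: str = "cache") -> str:
    """Hypothetical sketch: build '<schema>.x<hash>' from a query's class and parameters."""
    # Serialise deterministically (sorted keys) so equal parameters give the same hash.
    payload = json.dumps({"class": class_name, "params": params}, sort_keys=True)
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    # Prefix with a letter: an unquoted PostgreSQL identifier cannot start with a digit.
    return f"{schema}.x{digest}"


name = fully_qualified_table_name(
    "GroupValues", {"group": "msisdn_from", "value": "datetime"}
)
print(name)
```

The point of hashing the parameters is that two identical queries map to the same table, so a stored result can be found and reused.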
index_cols
index_cols
A list of columns to use as indexes when storing this query.
Returns
-
ixen : list
By default, the location columns (if they are present and self.spatial_unit is defined) followed by the subscriber column.
Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
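The default rule can be sketched as a plain function, assuming the location columns are passed in as a list and the subscriber column is quoted as in the example output. `default_index_cols` and its arguments are hypothetical; the real property consults `self.spatial_unit`, as described above, and the "only if present" check on the subscriber column is an assumption.

```python
from typing import List, Optional, Union


def default_index_cols(
    column_names: List[str],
    location_columns: Optional[List[str]],
) -> List[Union[List[str], str]]:
    """Hypothetical sketch of the default index-column rule."""
    ixen: List[Union[List[str], str]] = []
    # Location columns are indexed together, but only if a spatial unit supplies
    # them and they all appear in this query's output.
    if location_columns and set(location_columns).issubset(column_names):
        ixen.append(list(location_columns))
    # The subscriber column is indexed on its own (quoted, as in the example).
    if "subscriber" in column_names:
        ixen.append('"subscriber"')
    return ixen


print(default_index_cols(["subscriber", "name"], ["name"]))
# [['name'], '"subscriber"']
```

Grouping the location columns into a nested list reflects the example output, where they form one multi-column index rather than separate indexes.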
is_stored
is_stored
Returns
-
bool
True if the table is stored, and False otherwise.
query_id
query_id
Generate a uniquely identifying hash of this query, based on its parameters and the subqueries it is composed of.
Returns
-
str
query_id hash string
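A minimal sketch of a hash that depends on both a query's own parameters and those of its subqueries, using a hypothetical `ToyQuery` class that stores parameters in a dict and subqueries in a list. It illustrates the idea only; the choice of MD5 and the serialisation format are assumptions, and flowmachine's actual hashing may differ in what it includes.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ToyQuery:
    """Hypothetical stand-in for a query: parameters plus subqueries."""

    def __init__(self, params: Dict[str, Any], subqueries: Optional[List["ToyQuery"]] = None):
        self.params = params
        self.subqueries = subqueries or []

    @property
    def query_id(self) -> str:
        # Include the class name and the parameters in a fixed (sorted) order.
        parts = [type(self).__name__]
        parts += [f"{k}={self.params[k]!r}" for k in sorted(self.params)]
        # Subqueries contribute their own IDs, so a change anywhere in the
        # dependency tree changes this query's ID too.
        parts += [sq.query_id for sq in self.subqueries]
        return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


events = ToyQuery({"start": "2016-01-01", "stop": "2016-01-02"})
gv = ToyQuery({"group": "msisdn_from", "value": "datetime"}, [events])
print(gv.query_id)
```

Recursing into subqueries is what makes the ID reflect the whole query tree rather than just the outermost parameters.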
query_state
query_state
Return the current query state.
Returns
-
QueryState
The current query state
query_state_str
query_state_str
Return the current query state as a string.
Returns
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name
table_name
Returns a unique name under which the query is stored, based on a hash of the parameters, class, and subqueries.
Returns
-
str
String form of the table's name (without the schema).