flowmachine.features.utilities.combine_first¶
Class CombineFirst¶
CombineFirst(*, first_query: flowmachine.core.query.Query, other_query: flowmachine.core.query.Query, join_columns: Union[str, Collection[str]], combine_columns: Union[str, Collection[str]])
Given two queries, 'first_query' and 'other_query', fill null or missing values in the result of 'first_query' using those in the result of 'other_query'. Two kinds of values are filled: rows that are present in 'other_query' but not in 'first_query', and fields that are NULL in 'first_query' for rows present in both queries. Somewhat analogous to pandas.DataFrame.combine_first(), except that here we specify the columns on which the queries will be (full outer) joined.
Attributes¶
Parameters¶
-
first_query
:flowmachine.core.query.Query
Query whose nulls will be filled
-
other_query
:flowmachine.core.query.Query
Query whose values will be used to fill nulls in first_query
-
join_columns
:typing.Union[str, typing.Collection[str]]
Names of columns on which queries will be joined
-
combine_columns
:typing.Union[str, typing.Collection[str]]
Names of columns in which null values will be filled
Note
Relevant column names are assumed to be the same in both queries (i.e. nulls in column 'col1' of first_query are filled with values from column 'col1' of other_query)
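The fill behaviour described above can be sketched in plain Python. This is only an illustration of the semantics (a full outer join on the join columns, then taking the first non-null value for each combine column), not flowmachine's implementation. The function name `combine_first_rows` is hypothetical, `None` stands in for SQL NULL, and the sketch assumes that join keys are unique within each input and that the output contains only the join and combine columns.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def combine_first_rows(
    first_rows: List[Row],
    other_rows: List[Row],
    join_columns: Sequence[str],
    combine_columns: Sequence[str],
) -> List[Row]:
    """Hypothetical helper illustrating the fill semantics; not flowmachine code."""

    def key(row: Row) -> tuple:
        return tuple(row[col] for col in join_columns)

    # Assumption: each join key appears at most once per input.
    first_by_key = {key(row): row for row in first_rows}
    other_by_key = {key(row): row for row in other_rows}

    combined = []
    # Full outer join: visit every key found in either input.
    # Sorting is only there to make the output order deterministic.
    for k in sorted(set(first_by_key) | set(other_by_key)):
        first = first_by_key.get(k, {})
        other = other_by_key.get(k, {})
        row = dict(zip(join_columns, k))
        for col in combine_columns:
            value = first.get(col)
            # Keep the value from the first input unless it is missing or None.
            row[col] = value if value is not None else other.get(col)
        combined.append(row)
    return combined


first_rows = [
    {"subscriber": "a", "value": 1},
    {"subscriber": "b", "value": None},
]
other_rows = [
    {"subscriber": "a", "value": 10},
    {"subscriber": "b", "value": 20},
    {"subscriber": "c", "value": 30},
]
combined = combine_first_rows(first_rows, other_rows, ["subscriber"], ["value"])
```

In this example subscriber "a" keeps its value from the first input, the null for "b" is filled from the second input, and "c", which is absent from the first input, is added as a new row.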
Methods¶
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List[str]
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen
:list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
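As a rough illustration of why a hash over the parameters and subqueries is useful as an identifier, the sketch below derives a deterministic id from a class name, a parameter mapping, and the ids of any subqueries. The function `sketch_query_id`, its arguments, and the choice of SHA-256 over a JSON payload are all assumptions made for this example; they do not describe flowmachine's actual hashing scheme.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def sketch_query_id(
    class_name: str, parameters: Dict[str, Any], subquery_ids: Iterable[str]
) -> str:
    """Hypothetical helper; not flowmachine's implementation."""
    # Serialise with sorted keys so that the same inputs always produce
    # the same payload, regardless of dictionary ordering.
    payload = json.dumps(
        {
            "class": class_name,
            "parameters": parameters,
            "subqueries": sorted(subquery_ids),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


id_a = sketch_query_id(
    "CombineFirst",
    {"join_columns": ["subscriber"], "combine_columns": ["value"]},
    ["subquery-1", "subquery-2"],
)
id_b = sketch_query_id(
    "CombineFirst",
    {"combine_columns": ["value"], "join_columns": ["subscriber"]},
    ["subquery-2", "subquery-1"],
)
id_c = sketch_query_id(
    "CombineFirst",
    {"join_columns": ["subscriber"], "combine_columns": ["other_value"]},
    ["subquery-1", "subquery-2"],
)
```

Here `id_a` and `id_b` are equal because they describe the same inputs, while `id_c` differs because one parameter changed. An identifier with this property lets two equivalent queries map to the same stored result.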
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's name