flowmachine.features.utilities.combine_first¶
Class CombineFirst¶
CombineFirst(*, first_query: flowmachine.core.query.Query, other_query: flowmachine.core.query.Query, join_columns: Union[str, Collection[str]], combine_columns: Union[str, Collection[str]])
Given two queries 'first_query' and 'other_query', fill null or missing values in the result of 'first_query' using those in the result of 'other_query'. Filled values include rows that are present in 'other_query' but not in 'first_query', and fields that are NULL in 'first_query' for rows present in both queries. Somewhat analogous to pandas.DataFrame.combine_first(), except that here we specify the columns on which the queries will be (full outer) joined.
Attributes¶
Parameters¶
-
first_query
:flowmachine.core.query.Query
Query whose nulls will be filled
-
other_query
:flowmachine.core.query.Query
Query whose values will be used to fill nulls in first_query
-
join_columns
:Union[str, Collection[str]]
Names of columns on which queries will be joined
-
combine_columns
:Union[str, Collection[str]]
Names of columns in which null values will be filled
Note
Relevant column names are assumed to be the same in both queries (i.e. nulls in column 'col1' of first_query are filled with values from column 'col1' of other_query)
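The join-and-fill behaviour described above can be illustrated without a database. The following is a minimal pure-Python sketch of the semantics only, not the class's actual implementation (which operates on query results in the database). The helper name combine_first_rows and the sample rows are hypothetical, and the sketch assumes that join keys are unique and non-null within each input and that the output contains only the join and combine columns.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical row representation: one dict per row, None standing in for NULL.
Row = Dict[str, Optional[object]]


def combine_first_rows(
    first_rows: Sequence[Row],
    other_rows: Sequence[Row],
    join_columns: Sequence[str],
    combine_columns: Sequence[str],
) -> List[Row]:
    """Full outer join on join_columns, filling nulls in combine_columns
    of first_rows with the matching values from other_rows."""

    def key(row: Row) -> tuple:
        return tuple(row[c] for c in join_columns)

    # Assumption: join keys are unique within each input.
    first_by_key = {key(r): r for r in first_rows}
    other_by_key = {key(r): r for r in other_rows}

    combined: List[Row] = []

    # Rows present in first_rows: keep their values, fall back to other_rows on None.
    for k, row in first_by_key.items():
        other = other_by_key.get(k, {})
        out: Row = dict(zip(join_columns, k))
        for col in combine_columns:
            value = row.get(col)
            out[col] = value if value is not None else other.get(col)
        combined.append(out)

    # Rows present only in other_rows are carried over unchanged.
    for k, other in other_by_key.items():
        if k not in first_by_key:
            out = dict(zip(join_columns, k))
            for col in combine_columns:
                out[col] = other.get(col)
            combined.append(out)

    return combined


first = [
    {"subscriber": "a", "value": 1},
    {"subscriber": "b", "value": None},  # NULL to be filled from `other`
]
other = [
    {"subscriber": "b", "value": 20},
    {"subscriber": "c", "value": 30},  # row missing from `first`
]

result = combine_first_rows(first, other, ["subscriber"], ["value"])
```

Here subscriber "a" keeps its own value, subscriber "b" has its null filled from the second input, and subscriber "c" is added because it appears only in the second input.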
Methods¶
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fully qualified name
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen
:list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's name