flowmachine.core.table¶
Source: flowmachine/core/table.py
Simple utility class that represents arbitrary tables in the database.
Class Table¶
Table(name: Union[str, NoneType] = None, schema: Union[str, NoneType] = None, columns: Union[Iterable[str], NoneType] = None)
Provides an interface to query any table by name and (optionally) schema.
Attributes¶
Parameters¶
-
name
:typing.Union[str, NoneType]
, default None
Name of the table, may be fully qualified
-
schema
:typing.Union[str, NoneType]
, default None
Optional if name is fully qualified
-
columns
:typing.Union[typing.Iterable[str], NoneType]
, default None
Optional list of columns
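The relationship between `name` and `schema` can be illustrated with a small sketch. The helper `split_qualified_name` below is hypothetical and not part of flowmachine; it assumes a fully qualified name has the form `schema.table` with no quoted dots.

```python
from typing import Optional, Tuple


def split_qualified_name(
    name: str, schema: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split 'schema.table' into (schema, table)."""
    if "." in name:
        # Treat everything before the first dot as the schema part.
        extracted_schema, table = name.split(".", 1)
        if schema is not None and schema != extracted_schema:
            raise ValueError("Two different schemas supplied.")
        return extracted_schema, table
    # The name is not qualified, so fall back to the explicit schema (if any).
    return schema, name


print(split_qualified_name("events.calls"))            # ('events', 'calls')
print(split_qualified_name("calls", schema="events"))  # ('events', 'calls')
```

Under this assumption, `Table(name="events.calls")` and `Table(name="calls", schema="events")` would refer to the same table.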
Examples¶
t = Table(name="calls", schema="events")
t.head()
id outgoing datetime duration
0 5wNJA-PdRJ4-jxEdG-yOXpZ True 2016-01-01 22:38:06+00:00 3393.0
1 5wNJA-PdRJ4-jxEdG-yOXpZ False 2016-01-01 22:38:06+00:00 3393.0
2 ZYK4w-9aAD2-NN7ev-MRnBp True 2016-01-01 07:05:47+00:00 4533.0
3 ZYK4w-9aAD2-NN7ev-MRnBp False 2016-01-01 07:05:47+00:00 4533.0
4 mQjOy-5eVrm-Ll5eE-P4V27 True 2016-01-01 10:18:31+00:00 422.0
...
t = Table(name="calls", schema="events", columns=["id", "duration"])
t.head()
id duration
0 5wNJA-PdRJ4-jxEdG-yOXpZ 3393.0
1 5wNJA-PdRJ4-jxEdG-yOXpZ 3393.0
2 ZYK4w-9aAD2-NN7ev-MRnBp 4533.0
3 ZYK4w-9aAD2-NN7ev-MRnBp 4533.0
4 mQjOy-5eVrm-Ll5eE-P4V27 422.0
...
Methods¶
estimated_rowcount¶
estimated_rowcount(self, include_children=True)
Parameters¶
-
include_children
:bool
Set to False to exclude the rows of child tables
Returns¶
-
int
An estimate of the number of rows in this table.
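In PostgreSQL, one common way to obtain such an estimate without a full `COUNT(*)` is to read the planner statistics in `pg_class.reltuples`. The sketch below only builds an SQL string. The function name `estimated_rowcount_sql` and the query itself are assumptions for illustration, not flowmachine's implementation, and the query only considers direct children recorded in `pg_inherits`.

```python
def estimated_rowcount_sql(qualified_name: str, include_children: bool = True) -> str:
    """Hypothetical sketch: build SQL that reads planner row estimates."""
    # Note: the name is interpolated directly, so it must come from trusted code.
    parent = f"'{qualified_name}'::regclass"
    where = f"c.oid = {parent}"
    if include_children:
        # pg_inherits links each child table (inhrelid) to its parent (inhparent).
        where += (
            f" OR c.oid IN (SELECT inhrelid FROM pg_inherits WHERE inhparent = {parent})"
        )
    return f"SELECT sum(c.reltuples)::bigint FROM pg_class c WHERE {where}"


print(estimated_rowcount_sql("events.calls", include_children=False))
```

Because `reltuples` is only refreshed by operations such as `VACUUM` and `ANALYZE`, a figure obtained this way is an approximation.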
get_query¶
get_query(self)
Returns a string representing an SQL query. The string will point to the database cache of this query if it exists.
Returns¶
-
str
SQL query string.
get_table¶
get_table(self)
If this Query is stored, return a Table object referencing the stored version. If it is not stored, raise an exception.
Returns¶
-
flowmachine.core.Table
The stored version of this Query as a Table object
has_children¶
has_children(self)
Returns¶
-
bool
True if this table has subtables
invalidate_db_cache¶
invalidate_db_cache(self, name=None, schema=None, cascade=True, drop=False)
Helper function for store, optionally drops this table, and (by default) any cached tables that depend on it, as well as removing them from the cache metadata table.
Parameters¶
-
name
:str
Name of the table
-
schema
:str
Schema of the table
-
cascade
:bool
Set to False to remove only this table from cache
-
drop
:bool
Set to True to drop the table in addition to removing from cache
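The effect of `cascade` can be pictured as a walk over the graph of cached tables that depend on one another. The sketch below is a minimal illustration under that assumption; the `dependents` mapping and the `tables_to_invalidate` function are hypothetical and do not reflect how flowmachine tracks cache dependencies.

```python
from typing import Dict, List, Set


def tables_to_invalidate(
    table: str, dependents: Dict[str, Set[str]], cascade: bool = True
) -> List[str]:
    """Hypothetical sketch: list the cache entries that would be removed."""
    if not cascade:
        # Without cascading, only the table itself is removed from the cache.
        return [table]
    seen: Set[str] = set()
    order: List[str] = []
    stack = [table]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        # Visit everything that was cached on top of the current table.
        stack.extend(sorted(dependents.get(current, set())))
    return order


dependents = {"a": {"b"}, "b": {"c"}}
print(tables_to_invalidate("a", dependents))                 # ['a', 'b', 'c']
print(tables_to_invalidate("a", dependents, cascade=False))  # ['a']
```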
random_sample¶
random_sample(self, sampling_method="random_ids", **params)
Draws a random sample from this table.
Parameters¶
-
sampling_method
:{'system', 'system_rows', 'bernoulli', 'random_ids'}
, default 'random_ids'
Specifies the method used to select the random sample.
'system_rows': performs block-level sampling by randomly sampling each physical storage page of the underlying relation. This sampling method is guaranteed to provide a sample of the specified size.
'system': performs block-level sampling by randomly sampling each physical storage page for the underlying relation. This sampling method is not guaranteed to generate a sample of the specified size, but an approximation. This method may not produce a sample at all, so it might be worth running it again if it returns an empty dataframe.
'bernoulli': samples directly on each row of the underlying relation. This sampling method is slower and is not guaranteed to generate a sample of the specified size, but an approximation.
'random_ids': samples rows by randomly sampling the row number.
-
size
:optional
,int
The number of rows to draw. Exactly one of the 'size' or 'fraction' arguments must be provided.
-
fraction
:optional
,float
Fraction of rows to draw. Exactly one of the 'size' or 'fraction' arguments must be provided.
-
estimate_count
:bool
, default True
Whether to estimate the number of rows in the table using information contained in the
pg_class
or whether to perform an actual count of the number of rows.
-
seed
:optional
,float
Optionally provide a seed for repeatable random samples. If using the random_ids method, the seed must be between -1 and +1. Not available in combination with the system_rows method.
Returns¶
-
Random
A special query object which contains a random sample from this table
Note
Random samples may only be stored if a seed is supplied.
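The constraints on `size`, `fraction` and `seed` described above can be summarised in a small validator. This is a sketch only: `check_sample_params` is a hypothetical function, not flowmachine's own validation, and it assumes the seed bounds for `random_ids` are inclusive.

```python
from typing import Optional


def check_sample_params(
    sampling_method: str = "random_ids",
    size: Optional[int] = None,
    fraction: Optional[float] = None,
    seed: Optional[float] = None,
) -> None:
    """Hypothetical validator mirroring the documented constraints."""
    methods = {"system", "system_rows", "bernoulli", "random_ids"}
    if sampling_method not in methods:
        raise ValueError(f"Unknown sampling method: {sampling_method}")
    # Exactly one of size / fraction must be given.
    if (size is None) == (fraction is None):
        raise ValueError("Provide exactly one of 'size' or 'fraction'.")
    if seed is not None:
        if sampling_method == "system_rows":
            raise ValueError("A seed cannot be used with 'system_rows'.")
        # Assumption: the bounds -1 and +1 are themselves allowed.
        if sampling_method == "random_ids" and not -1 <= seed <= 1:
            raise ValueError("For 'random_ids' the seed must lie in [-1, 1].")


check_sample_params("bernoulli", fraction=0.1, seed=42)   # passes silently
check_sample_params("random_ids", size=10, seed=0.5)      # passes silently
```

A call such as `t.random_sample(size=10, sampling_method="bernoulli")` on a `Table` instance `t` satisfies these constraints, whereas passing both `size` and `fraction` does not.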
subset¶
subset(self, col, subset)
Subsets one of the columns to a specified subset of values
Parameters¶
-
col
:str
Name of the column to subset, e.g. subscriber, cell etc.
-
subset
:list
List of values to subset to
Returns¶
Subset
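Conceptually, subsetting a column to a list of values resembles a `WHERE col IN (...)` filter. The sketch below demonstrates that behaviour with Python's built-in `sqlite3` module and a made-up table; it does not use flowmachine, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (subscriber TEXT, duration REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("c", 30.0)],
)

col, values = "subscriber", ["a", "c"]
# One placeholder per value keeps the values themselves parameterised.
placeholders = ", ".join("?" for _ in values)
# The column name is interpolated directly, so it must come from trusted code.
rows = conn.execute(
    f"SELECT * FROM calls WHERE {col} IN ({placeholders}) ORDER BY {col}",
    values,
).fetchall()
print(rows)  # [('a', 10.0), ('c', 30.0)]
conn.close()
```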
cache¶
cache
Returns¶
-
bool
True if caching is switched on.
column_names¶
column_names
Returns the column names.
Returns¶
-
typing.List[str]
List of the column names of this query.
column_names_as_string_list¶
column_names_as_string_list
Get the column names as a comma separated list
Returns¶
-
str
Comma separated list of column names
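As a rough illustration, joining the column names from the second `Table` example gives a single string. The exact separator (a comma followed by a space) is an assumption here.

```python
# Column names as in the Table(..., columns=["id", "duration"]) example.
column_names = ["id", "duration"]

# Assumed separator: comma followed by a space.
column_names_as_string_list = ", ".join(column_names)
print(column_names_as_string_list)  # id, duration
```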
dependencies¶
dependencies
Returns¶
-
set
The set of queries which this one is directly dependent on.
fully_qualified_table_name¶
fully_qualified_table_name
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn
index_cols¶
index_cols
A list of columns to use as indexes when storing this query.
Returns¶
-
ixen
:list
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
Examples¶
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
is_stored¶
is_stored
Returns¶
-
bool
True if the table is stored, and False otherwise.
query_id¶
query_id
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
Returns¶
-
str
query_id hash string
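The idea of hashing a query's class, parameters and subqueries into one identifier can be sketched as follows. The function `sketch_query_id`, the JSON serialisation and the choice of SHA-256 are all assumptions made for illustration; flowmachine's actual hashing scheme may differ.

```python
import json
from hashlib import sha256
from typing import Any, Dict, Iterable


def sketch_query_id(
    class_name: str, params: Dict[str, Any], subquery_ids: Iterable[str] = ()
) -> str:
    """Hypothetical sketch: derive a stable identifier from a query's definition."""
    payload = json.dumps(
        {
            "class": class_name,
            "params": params,
            # Sort so the order in which subqueries are listed does not matter.
            "subqueries": sorted(subquery_ids),
        },
        sort_keys=True,  # parameter order must not change the hash
        default=str,     # fall back to str() for values JSON cannot encode
    )
    return sha256(payload.encode("utf-8")).hexdigest()


a = sketch_query_id("Table", {"name": "calls", "schema": "events"})
b = sketch_query_id("Table", {"schema": "events", "name": "calls"})
c = sketch_query_id("Table", {"name": "sms", "schema": "events"})
print(a == b)  # True
print(a == c)  # False
```

Because identical definitions produce the same identifier, such a hash can double as a cache key.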
query_state¶
query_state
Return the current query state.
Returns¶
-
QueryState
The current query state
query_state_str¶
query_state_str
Return the current query state as a string
Returns¶
-
str
The current query state. The possible values are the ones defined in
flowmachine.core.query_state.QueryState
.
table_name¶
table_name
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
Returns¶
-
str
String form of the table's fqn