
flowmachine.core.table

Source: flowmachine/core/table.py

Simple utility class that represents arbitrary tables in the database.

Class Table

Table(name=None, schema=None, columns=None)
Source: flowmachine/core/table.py

Provides an interface to query any table by name and (optionally) schema.

Parameters

  • name: str

    Name of the table; may be fully qualified (e.g. "events.calls")

  • schema: str

    Schema of the table. Optional if name is fully qualified.

  • columns: list of str

    Optional list of columns to include; if omitted, all columns are used

Examples

t = Table(name="calls", schema="events")
t.head()
                            id outgoing                  datetime  duration
0  5wNJA-PdRJ4-jxEdG-yOXpZ     True 2016-01-01 22:38:06+00:00    3393.0
1  5wNJA-PdRJ4-jxEdG-yOXpZ    False 2016-01-01 22:38:06+00:00    3393.0
2  ZYK4w-9aAD2-NN7ev-MRnBp     True 2016-01-01 07:05:47+00:00    4533.0
3  ZYK4w-9aAD2-NN7ev-MRnBp    False 2016-01-01 07:05:47+00:00    4533.0
4  mQjOy-5eVrm-Ll5eE-P4V27     True 2016-01-01 10:18:31+00:00     422.0
...
t = Table(name="calls", schema="events", columns=["id", "duration"])
t.head()
                        id  duration
0  5wNJA-PdRJ4-jxEdG-yOXpZ    3393.0
1  5wNJA-PdRJ4-jxEdG-yOXpZ    3393.0
2  ZYK4w-9aAD2-NN7ev-MRnBp    4533.0
3  ZYK4w-9aAD2-NN7ev-MRnBp    4533.0
4  mQjOy-5eVrm-Ll5eE-P4V27     422.0
...
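As a rough illustration of how name and schema interact, here is a minimal sketch assuming that a single dot separates the schema from the table name. resolve_table_name is a hypothetical helper, not part of flowmachine, and rejecting an unqualified name with no schema is an assumption about the intended behaviour.

```python
from typing import Optional, Tuple


def resolve_table_name(name: str, schema: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: split a possibly qualified table name into (schema, table)."""
    if "." in name:
        # Fully qualified, e.g. "events.calls": the schema is part of the name.
        name_schema, table = name.split(".", 1)
        if schema is not None and schema != name_schema:
            raise ValueError(f"schema {schema!r} conflicts with qualified name {name!r}")
        return name_schema, table
    if schema is None:
        # Assumption: an unqualified name without a schema is rejected.
        raise ValueError(f"table name {name!r} is not qualified and no schema was given")
    return schema, name


print(resolve_table_name("events.calls"))            # ('events', 'calls')
print(resolve_table_name("calls", schema="events"))  # ('events', 'calls')
```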

Methods

estimated_rowcount

estimated_rowcount(self, include_children=True)
Source: flowmachine/core/table.py

Parameters
  • include_children: bool

    Set to False to exclude the rows of child tables.

Returns
  • int

    An estimate of the number of rows in this table.
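Row estimates of this kind typically come from PostgreSQL's pg_class.reltuples statistics (the same catalog mentioned under random_sample). The sketch below shows one way such a query could be built, with child tables found through the pg_inherits catalog; estimated_rowcount_sql is hypothetical and this is not presented as flowmachine's actual implementation.

```python
def estimated_rowcount_sql(fq_table: str, include_children: bool = True) -> str:
    """Sketch: a PostgreSQL query estimating a table's rows from pg_class statistics."""
    # fq_table is assumed to be a trusted, already-validated identifier.
    # reltuples is the planner's estimate (refreshed by VACUUM/ANALYZE); it can be
    # -1 for a never-analysed table on PostgreSQL 14+, so clamp it at zero.
    sql = (
        "SELECT sum(greatest(reltuples, 0))::bigint FROM pg_class "
        f"WHERE oid = '{fq_table}'::regclass"
    )
    if include_children:
        # Also count direct children (inheritance children or partitions).
        sql += (
            " OR oid IN (SELECT inhrelid FROM pg_inherits "
            f"WHERE inhparent = '{fq_table}'::regclass)"
        )
    return sql


print(estimated_rowcount_sql("events.calls"))
print(estimated_rowcount_sql("events.calls", include_children=False))
```

Only direct children are included here; nested partition hierarchies would need a recursive query over pg_inherits.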

get_query

get_query(self)
Source: flowmachine/core/table.py

Returns a string representing an SQL query. The string will point to the database cache of this query if it exists.

Returns
  • str

    SQL query string.
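A minimal sketch of the kind of SQL a table-backed query might return. It assumes no identifier quoting and uses a hypothetical cached_table argument to stand in for the database cache lookup; table_select_sql is illustrative only, and the string flowmachine actually produces may differ.

```python
from typing import Optional, Sequence


def table_select_sql(
    fq_table: str,
    columns: Optional[Sequence[str]] = None,
    cached_table: Optional[str] = None,
) -> str:
    """Sketch: the kind of SELECT a table-backed query might render."""
    # With no column list, select every column.
    col_list = "*" if not columns else ", ".join(columns)
    # If a cached copy exists, read from it instead of the original source.
    source = cached_table if cached_table is not None else fq_table
    return f"SELECT {col_list} FROM {source}"


print(table_select_sql("events.calls", ["id", "duration"]))
# SELECT id, duration FROM events.calls
```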

get_table

get_table(self)
Source: flowmachine/core/table.py

If this Query is stored, return a Table object referencing the stored version. If it is not stored, raise an exception.

Returns
  • flowmachine.core.Table

    The stored version of this Query as a Table object

has_children

has_children(self)
Source: flowmachine/core/table.py

Returns
  • bool

    True if this table has subtables
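In PostgreSQL, child tables (inheritance children and, since PostgreSQL 10, partitions) are recorded in the pg_inherits catalog. Assuming that is what "subtables" means here, a check could look like the hypothetical has_children_sql helper below, which is a sketch rather than flowmachine's own query.

```python
def has_children_sql(fq_table: str) -> str:
    """Sketch: a PostgreSQL query that is true if any table inherits from fq_table."""
    # pg_inherits has one row per (child, parent) pair; partitions appear here too.
    # fq_table is assumed to be a trusted identifier, since it is interpolated.
    return (
        "SELECT EXISTS (SELECT 1 FROM pg_inherits "
        f"WHERE inhparent = '{fq_table}'::regclass)"
    )


print(has_children_sql("events.calls"))
```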

invalidate_db_cache

invalidate_db_cache(self, name=None, schema=None, cascade=True, drop=False)
Source: flowmachine/core/table.py

Helper function for store. Optionally drops this table and, by default, any cached tables that depend on it, and removes them from the cache metadata table.

Parameters
  • name: str

    Name of the table

  • schema: str

    Schema of the table

  • cascade: bool, default True

    Set to False to remove only this table from the cache

  • drop: bool, default False

    Set to True to drop the table in addition to removing it from the cache
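The cascade behaviour amounts to walking the graph of cached tables that depend on this one. The sketch below assumes a hypothetical in-memory dependents mapping (from a table name to the tables built from it); tables_to_invalidate is not a flowmachine function, and the real method works against the database's cache metadata instead.

```python
from collections import deque
from typing import Dict, List, Set


def tables_to_invalidate(
    start: str, dependents: Dict[str, List[str]], cascade: bool = True
) -> List[str]:
    """Sketch: the cached tables to invalidate, starting from `start`."""
    if not cascade:
        return [start]
    order: List[str] = []
    seen: Set[str] = set()
    queue = deque([start])
    # Breadth-first walk over "X is used by Y" edges, visiting each table once.
    while queue:
        table = queue.popleft()
        if table in seen:
            continue
        seen.add(table)
        order.append(table)
        queue.extend(dependents.get(table, []))
    return order


deps = {"cache.x_a": ["cache.x_b", "cache.x_c"], "cache.x_b": ["cache.x_c"]}
print(tables_to_invalidate("cache.x_a", deps))
# ['cache.x_a', 'cache.x_b', 'cache.x_c']
```

The seen set keeps the walk finite even if the mapping accidentally contains a cycle.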

random_sample

random_sample(self, sampling_method="random_ids", **params)
Source: flowmachine/core/table.py

Draws a random sample from this table.

Parameters
  • sampling_method: {'system', 'system_rows', 'bernoulli', 'random_ids'}, default 'random_ids'

    Specifies the method used to select the random sample.

    'system_rows': performs block-level sampling by randomly sampling each physical storage page of the underlying relation. This method is guaranteed to return a sample of the specified size, unless the table contains fewer rows.

    'system': performs block-level sampling by randomly sampling each physical storage page of the underlying relation. This method is not guaranteed to return a sample of the specified size, only an approximation. It may not produce a sample at all, so it can be worth running again if it returns an empty dataframe.

    'bernoulli': samples each row of the underlying relation directly. This method is slower and is not guaranteed to return a sample of the specified size, only an approximation.

    'random_ids': samples rows by randomly sampling the row number.

  • size: optional, int

    The number of rows to draw. Exactly one of the 'size' or 'fraction' arguments must be provided.

  • fraction: optional, float

    Fraction of rows to draw. Exactly one of the 'size' or 'fraction' arguments must be provided.

  • estimate_count: bool, default True

    Whether to estimate the number of rows in the table from the statistics in the pg_class catalog, or to perform an exact count of the rows.

  • seed: optional, float

    Optionally provide a seed for repeatable random samples. If using the random_ids method, the seed must be between -1 and 1. Not available in combination with the system_rows method.

Returns
  • Random

    A special query object which contains a random sample from this table

Note

Random samples may only be stored if a seed is supplied.
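For the three PostgreSQL-backed methods, the sample can be expressed with a TABLESAMPLE clause. This sketch assumes the row count (estimated or exact, as controlled by estimate_count) has already been obtained, and that a size is converted to a percentage for SYSTEM and BERNOULLI. tablesample_clause is hypothetical, and the 'random_ids' method is not covered.

```python
from typing import Optional


def tablesample_clause(
    method: str,
    row_count: int,
    size: Optional[int] = None,
    fraction: Optional[float] = None,
    seed: Optional[float] = None,
) -> str:
    """Sketch: a PostgreSQL TABLESAMPLE clause for 'system', 'bernoulli' or 'system_rows'."""
    if (size is None) == (fraction is None):
        raise ValueError("Exactly one of 'size' or 'fraction' must be provided.")
    if method == "system_rows":
        if seed is not None:
            raise ValueError("A seed cannot be combined with the system_rows method.")
        # SYSTEM_ROWS (tsm_system_rows extension) takes a number of rows.
        n_rows = size if size is not None else round(fraction * row_count)
        return f"TABLESAMPLE SYSTEM_ROWS ({n_rows})"
    if method in ("system", "bernoulli"):
        # SYSTEM and BERNOULLI take a percentage in [0, 100]; a size is converted
        # using row_count and capped at 100 percent.
        if fraction is not None:
            percent = 100 * fraction
        else:
            percent = min(100.0, 100 * size / row_count)
        clause = f"TABLESAMPLE {method.upper()} ({percent})"
        if seed is not None:
            # REPEATABLE gives the same sample for the same seed and unchanged table.
            clause += f" REPEATABLE ({seed})"
        return clause
    raise ValueError(f"{method!r} is not a TABLESAMPLE method")


print(tablesample_clause("bernoulli", row_count=1000, size=50, seed=42))
# TABLESAMPLE BERNOULLI (5.0) REPEATABLE (42)
```

The clause follows the table name, e.g. SELECT * FROM events.calls TABLESAMPLE BERNOULLI (5.0) REPEATABLE (42). SYSTEM_ROWS requires the tsm_system_rows extension, which does not support REPEATABLE; this matches the restriction on seed above.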

subset

subset(self, col, subset)
Source: flowmachine/core/table.py

Subsets one of the columns to a specified subset of values

Parameters
  • col: str

    Name of the column to subset, e.g. subscriber or cell.

  • subset: list

    List of values to subset to

Returns
  • Subset
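Conceptually, a subset filters the query's rows with an IN list on the chosen column. Below is a minimal sketch, assuming psycopg2-style %s placeholders and a trusted column name; subset_sql is hypothetical and not necessarily how the Subset class builds its SQL.

```python
from typing import Any, Sequence, Tuple


def subset_sql(
    source_sql: str, col: str, values: Sequence[Any]
) -> Tuple[str, Tuple[Any, ...]]:
    """Sketch: keep only rows of source_sql whose `col` value is in `values`."""
    if not values:
        # "IN ()" is not valid SQL, so an empty subset is rejected here.
        raise ValueError("subset must contain at least one value")
    # One placeholder per value; the database driver quotes the values.
    placeholders = ", ".join(["%s"] * len(values))
    sql = f"SELECT * FROM ({source_sql}) AS src WHERE {col} IN ({placeholders})"
    return sql, tuple(values)


sql, params = subset_sql(
    "SELECT * FROM events.calls", "id", ["5wNJA-PdRJ4-jxEdG-yOXpZ", "ZYK4w-9aAD2-NN7ev-MRnBp"]
)
print(sql)
# SELECT * FROM (SELECT * FROM events.calls) AS src WHERE id IN (%s, %s)
```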

cache

cache
Source: flowmachine/core/query.py

Returns
  • bool

    True if caching is switched on.

column_names

column_names
Source: flowmachine/core/table.py

Returns the column names.

Returns
  • typing.List[str]

    List of the column names of this query.

column_names_as_string_list

column_names_as_string_list
Source: flowmachine/core/query.py

Get the column names as a comma-separated list.

Returns
  • str

    Comma-separated list of column names

dependencies

dependencies
Source: flowmachine/core/query.py

Returns
  • set

    The set of queries which this one is directly dependent on.

fully_qualified_table_name

fully_qualified_table_name
Source: flowmachine/core/table.py

Returns a unique fully qualified name, under the cache schema, for storing the query. The name is based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's fully qualified name (fqn)

index_cols

index_cols
Source: flowmachine/core/query.py

A list of columns to use as indexes when storing this query.

Returns
  • ixen: list

    By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.

Examples
daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
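As the example output shows, each entry in index_cols is either a single (possibly quoted) column or a list of columns forming one index. A minimal sketch of turning that into PostgreSQL statements follows; create_index_statements and the table name cache.example are both hypothetical.

```python
from typing import List, Sequence, Union


def create_index_statements(
    fq_table: str, index_cols: Sequence[Union[str, Sequence[str]]]
) -> List[str]:
    """Sketch: one CREATE INDEX statement per entry of index_cols."""
    statements = []
    for ix in index_cols:
        # A string is one (possibly quoted) column; a list is a multi-column index.
        cols = ix if isinstance(ix, str) else ", ".join(ix)
        # PostgreSQL picks an index name automatically when none is given.
        statements.append(f"CREATE INDEX ON {fq_table} ({cols})")
    return statements


for stmt in create_index_statements("cache.example", [["name"], '"subscriber"']):
    print(stmt)
# CREATE INDEX ON cache.example (name)
# CREATE INDEX ON cache.example ("subscriber")
```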

is_stored

is_stored
Source: flowmachine/core/table.py

Returns
  • bool

    True if the table is stored, and False otherwise.
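For a plain table, being stored essentially means the table exists in the database. One way to check this is sketched below with a hypothetical helper and psycopg2-style placeholders; it is not necessarily what flowmachine does.

```python
from typing import Tuple


def table_exists_sql(schema: str, name: str) -> Tuple[str, Tuple[str, str]]:
    """Sketch: a parameterised query returning true if schema.name exists."""
    # information_schema.tables also lists views, and only shows relations the
    # current user has some privilege on.
    sql = (
        "SELECT EXISTS (SELECT 1 FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s)"
    )
    return sql, (schema, name)


sql, params = table_exists_sql("events", "calls")
print(sql, params)
```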

query_id

query_id
Source: flowmachine/core/query.py

Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.

Returns
  • str

    query_id hash string
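The idea of a stable query id can be sketched with a canonical JSON encoding and an MD5 digest. The serialisation, the hash function, and the "x" prefix on the derived table name are all assumptions for illustration (some prefix is needed because unquoted SQL identifiers cannot start with a digit); sketch_query_id is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def sketch_query_id(
    class_name: str, params: Dict[str, Any], subquery_ids: Sequence[str] = ()
) -> str:
    """Sketch: a deterministic hash of a query's class, parameters and subqueries."""
    # sort_keys makes the result independent of keyword order; subquery ids keep
    # their order, since it can matter (e.g. the two sides of a join).
    payload = json.dumps(
        {"class": class_name, "params": params, "subqueries": list(subquery_ids)},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode (e.g. dates)
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


qid = sketch_query_id("Table", {"name": "calls", "schema": "events", "columns": ["id", "duration"]})
table_name = f"x{qid}"  # assumed prefix: unquoted identifiers cannot start with a digit
fully_qualified_table_name = f"cache.{table_name}"
print(fully_qualified_table_name)
```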

query_state

query_state
Source: flowmachine/core/query.py

Return the current query state.

Returns
  • QueryState

    The current query state

query_state_str

query_state_str
Source: flowmachine/core/query.py

Return the current query state as a string

Returns
  • str

    The current query state. The possible values are the ones defined in flowmachine.core.query_state.QueryState.

table_name

table_name
Source: flowmachine/core/query.py

Returns a unique name for storing the query, based on a hash of the parameters, class, and subqueries.

Returns
  • str

    String form of the table's name (without the schema)