Changelog
All notable changes to FlowKit will be documented in this file.
The format is based on Keep a Changelog.
Unreleased
Added
Changed
Fixed
Removed
1.22.0
Added
- FlowETL sensor NRowsPresentSensor, which checks for a specified minimum number of rows.
Changed
- ForeignStagingTableOperator will now error if the underlying file cannot be read or the command returns an error. #5763
- Flowmachine now requires SQLAlchemy >= 2.0.0 #6066
1.21.1
Added
Changed
- Upgraded Python dependencies
Fixed
Removed
1.21.0
Added
- Added new FlowDB tables infrastructure.cell_info and infrastructure.cells_table_versions to keep track of changes to the cell info over time (note: the new tables have not yet replaced infrastructure.cells as the source of cell information for FlowKit queries). #6184
1.20.0
Changed
- Updated flowpyter-task to 1.1.0
Removed
- Removed AutoFlow. #6394
1.19.1
Added
- Added flowpyter-task to the FlowETL container
1.19.0
Added
- FlowETL now updates a new table events.location_ids each time a new day of CDR data is ingested, to record the first and last date that each location ID appears in the data. #5376
- New FlowETL QA check "count_locatable_events", which counts the number of added rows with a location ID corresponding to a cell with a known location. #5289
- flowkit_jwt_generator is now published as a wheel via PyPI
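A minimal sketch of the first/last-date bookkeeping described for events.location_ids, assuming an in-memory dict stands in for the table. The helper name update_location_ids and the (first_date, last_date) layout are hypothetical and do not describe FlowETL's actual schema or SQL.

```python
from datetime import date


def update_location_ids(table, location_ids, cdr_date):
    """Record the first and last date each location ID appears.

    table: dict mapping location_id -> (first_date, last_date); a hypothetical
    stand-in for the events.location_ids table. Call once per ingested day.
    min/max keep the result correct even if days are ingested out of order.
    """
    for location_id in set(location_ids):
        if location_id in table:
            first, last = table[location_id]
            table[location_id] = (min(first, cdr_date), max(last, cdr_date))
        else:
            table[location_id] = (cdr_date, cdr_date)
    return table


location_table = {}
update_location_ids(location_table, ["cell_a", "cell_b", "cell_a"], date(2016, 1, 2))
update_location_ids(location_table, ["cell_a"], date(2016, 1, 1))  # backfilled day
update_location_ids(location_table, ["cell_b"], date(2016, 1, 5))
```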
1.18.4
Changed
- docker-compose has been replaced with docker compose in the Makefile; this might break builds on machines that haven't updated Docker in a while.
Fixed
- SQLAlchemy version installed in the FlowMachine docker image is now compatible with the flowmachine library. #6052
1.18.3
Added
- Quickstart script now supports arbitrary countries via the EXAMPLE_COUNTRY env var. #5796
- FlowDB's maximum locks per transaction setting can now be controlled using the MAX_LOCKS_PER_TRANSACTION env var. #5157
Changed
- Increased FlowDB's default maximum locks per transaction to 365 * 5 * 4 * (1 + 4). #5157
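The new default expression evaluates to 36,500. The snippet below checks that arithmetic and sketches how an override through MAX_LOCKS_PER_TRANSACTION could be resolved; the helper max_locks_per_transaction is hypothetical and is not how the FlowDB container actually reads the variable.

```python
import os

# 365 * 5 * 4 * (1 + 4) = 1825 * 4 * 5 = 36500
DEFAULT_MAX_LOCKS_PER_TRANSACTION = 365 * 5 * 4 * (1 + 4)


def max_locks_per_transaction(env=None):
    """Hypothetical helper: prefer MAX_LOCKS_PER_TRANSACTION, else the default."""
    env = os.environ if env is None else env
    value = env.get("MAX_LOCKS_PER_TRANSACTION")
    return int(value) if value else DEFAULT_MAX_LOCKS_PER_TRANSACTION
```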
Fixed
- Null values in the first column of the first row of ingested data no longer cause FlowETL to skip ingestion. #5090
1.18.2
Fixed
- Fixed migrations missing from the built FlowAuth docker images. #5818
1.18.1
Added
- Added Alembic support via flask-migrate to FlowAuth #5799
1.18.0
Added
- Added views etl.ingested_state, etl.available_dates and etl.deduped_post_etl_queries in FlowDB, for convenient extraction of relevant information from the ETL tables. #5641
- Added MajorityLocationWithUnlocatable query class and majority_location function. #5720
Changed
- Important: tokens issued by previous versions of FlowAuth are not compatible with this version. Users will need to regenerate tokens using the updated FlowAuth.
- Moved from groups to roles in FlowAuth; see here for full details. #5613
- Changed AIRFLOW__CORE__SQL_ALCHEMY_CONN env var to AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
- RoleScopePicker component redesigned and reimplemented.
- Docs now recommend creating a separate bind mount for Airflow scheduler logs, and include this in the secrets quickstart. #3622
- JWT tokens now use sub instead of identity for JWT_IDENTITY_CLAIM.
- A majority_location query with include_unlocatable=True will now include rows for all subscribers in the subscriber_location_weights sub-query, including those for whom all weights are negative (previously, subscribers with only negative weights were excluded).
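To illustrate the include_unlocatable behaviour above, here is a minimal sketch in plain Python. It assumes "majority" means a location holding strictly more than half of a subscriber's total weight, and it uses (subscriber, location, weight) tuples as a stand-in for the subscriber_location_weights sub-query; FlowMachine's own SQL implementation and thresholds may differ.

```python
from collections import defaultdict


def majority_location(weights, include_unlocatable=False):
    """Illustrative sketch, not FlowMachine's implementation.

    weights: iterable of (subscriber, location, weight) rows. A location is
    treated as the majority location if its weight is strictly more than half
    of the subscriber's total weight (an assumed definition).
    """
    per_subscriber = defaultdict(dict)
    for subscriber, location, weight in weights:
        locs = per_subscriber[subscriber]
        locs[location] = locs.get(location, 0) + weight
    result = {}
    for subscriber, loc_weights in per_subscriber.items():
        total = sum(loc_weights.values())
        # A non-positive total (e.g. all weights negative) never yields a majority.
        winner = next(
            (loc for loc, w in loc_weights.items() if total > 0 and w > total / 2),
            None,
        )
        if winner is not None:
            result[subscriber] = winner
        elif include_unlocatable:
            # Every subscriber in the input gets a row, with None as location.
            result[subscriber] = None
    return result


rows = [("a", "X", 3), ("a", "Y", 1), ("b", "X", 1), ("b", "Y", 1), ("c", "X", -1)]
```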
Fixed
- Fixed a potential deadlock when using a small connection pool and storing queries
- AutoFlow can now be run in a docker container with a non-default user. #5574
- Passing an empty list of events tables when creating a query now raises a ValueError with the message "Empty tables list." instead of a MissingDateError. #436
- Flowmachine now looks at only the most recent state (per CDR type per CDR date) in etl.etl_records to determine available dates. #5641
- It is now possible to run API queries that include multiple different aggregation units (e.g. joined_spatial_aggregate with the displacement metric). #4649
- Demo roles can now be used in worked_examples. #5735
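The "most recent state" rule for available dates can be sketched as follows. The record keys (cdr_type, cdr_date, state, timestamp) and the state values "ingested" and "quarantine" are hypothetical stand-ins for the etl.etl_records columns, so treat this as an illustration of the selection logic only.

```python
from datetime import date, datetime


def available_dates(etl_records, cdr_type):
    """Dates whose most recent ETL state for cdr_type is 'ingested'.

    etl_records: iterable of dicts with hypothetical keys 'cdr_type',
    'cdr_date', 'state' and 'timestamp'. Only the latest record per
    (cdr_type, cdr_date) pair is considered, so a later non-ingested
    state supersedes an earlier 'ingested' one.
    """
    latest = {}
    for rec in etl_records:
        key = (rec["cdr_type"], rec["cdr_date"])
        if key not in latest or rec["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rec
    return sorted(
        cdr_date
        for (rec_type, cdr_date), rec in latest.items()
        if rec_type == cdr_type and rec["state"] == "ingested"
    )


records = [
    {"cdr_type": "calls", "cdr_date": date(2016, 1, 1), "state": "ingested",
     "timestamp": datetime(2016, 1, 2, 3, 0)},
    {"cdr_type": "calls", "cdr_date": date(2016, 1, 1), "state": "quarantine",
     "timestamp": datetime(2016, 1, 3, 3, 0)},
    {"cdr_type": "calls", "cdr_date": date(2016, 1, 2), "state": "ingested",
     "timestamp": datetime(2016, 1, 3, 3, 0)},
]
calls_dates = available_dates(records, "calls")
```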
Removed
- Removed the include_unlocatable parameter from the MajorityLocation class (the majority_location function should be used instead if include_unlocatable is required). #5720
1.17.1
Added
- Added get_aggregation_unit server action, for getting the aggregation unit associated with a query specification. #5141
Changed
- nocturnal_events now expects a night_hours parameter with nested sub-fields start_hour and end_hour, instead of two parameters night_start_hour and night_end_hour.
- Spatial units with a mapping table now only include cells that appear in the mapping table. #5360
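A sketch of the parameter reshaping described for nocturnal_events. Only the four parameter names come from this entry; the helper migrate_nocturnal_params and the example start_date key are hypothetical, and the rest of the query specification is not shown.

```python
def migrate_nocturnal_params(params):
    """Hypothetical helper: rewrite pre-1.17.1 nocturnal_events parameters.

    Moves night_start_hour / night_end_hour into a nested night_hours
    object with start_hour / end_hour. Other keys are copied unchanged,
    so already-migrated parameters pass through as-is.
    """
    migrated = {
        key: value
        for key, value in params.items()
        if key not in ("night_start_hour", "night_end_hour")
    }
    if "night_start_hour" in params or "night_end_hour" in params:
        migrated["night_hours"] = {
            "start_hour": params.get("night_start_hour"),
            "end_hour": params.get("night_end_hour"),
        }
    return migrated


old_style = {"start_date": "2016-01-01", "night_start_hour": 20, "night_end_hour": 4}
new_style = migrate_nocturnal_params(old_style)
```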
Fixed
- Invalid sub-query specs nested within a modal_location spec now raise appropriate validation errors, instead of being masked by internal flowmachine server errors. #4816
1.17.0
Added
Changed
- Action needed: Airflow updated to version 2.3.3; back up flowetl_db before applying the update. #4940
- Tables created under the cache schema in FlowDB will automatically be set to be owned by the flowmachine user. #4714
- Query.explain will now explain the query even where it is already stored. #1285
- unstored_dependencies_graph no longer blocks until dependencies are in a determinate state. #4949
- In and out flows no longer return location columns with to/from suffix.
- FlowDB now always creates a role named flowmachine.
- Flowmachine will set the state of a query being stored to cancelled if interrupted while the store is running.
- Flowmachine now supports SQLAlchemy >= 1.4 #5140
Fixed
- Flowmachine now makes the built-in flowmachine role owner of cache tables as a post-action when a query is stored. #4714
- TopupBalance now returns the weighted mode when requested, instead of the weighted median #1412
- Fixed in and out flow GeoJSON for multicolumn location types #5132
- quick_start.sh should no longer raise a misleading error if ss is not installed. #3151
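The TopupBalance fix concerns two statistics that are easy to confuse. The sketch below uses generic textbook definitions of weighted mode and weighted median; it is not FlowMachine's SQL, and its tie-breaking (first occurrence for the mode, lower value for the median) is an assumption.

```python
from collections import defaultdict


def weighted_mode(values, weights):
    """Value with the largest total weight (ties go to the first value seen)."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return max(totals, key=totals.get)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


balances = [10, 20, 30]
balance_weights = [3, 2, 4]
# The two statistics disagree here: mode 30 (weight 4), median 20.
```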
Removed
- use_file_flux_sensor removed entirely. #2812
- Model, ModelResult and Louvain have been removed. #5168
1.16.0
Added
- Most frequent locations is now available via FlowAPI. #3165
- Total active periods is now available via FlowAPI.
- Made hour of day slicing available via FlowAPI. #3165
- Added visited on most days reference location query. #4267
- Added unique value from query list query. #4486
- Added mixin for exposing start_date and end_date internally as datetime objects #4497
- Added CombineFirst and CoalescedLocation queries. #4524
- Added MajorityLocation query. #4522
- Added join_type param to Flows class. #4539
- Added PerSubscriberAggregate query. #4559
- Added FlowETL QA checks 'count_imeis', 'count_imsis', 'count_locatable_location_ids', 'count_null_imeis', 'count_null_imsis', 'count_null_location_ids', 'max_msisdns_per_imei', 'max_msisdns_per_imsi', 'count_added_rows_outgoing', 'count_null_counterparts', 'count_null_durations', 'count_onnet_msisdns_incoming', 'count_onnet_msisdns_outgoing', 'count_onnet_msisdns', 'max_duration' and 'median_duration'. #4552
- Added FilteredReferenceLocation query, which returns only rows where a subscriber visited a reference location the required number of times. #4584
- Added LabelledSpatialAggregate query and redaction, which sub-aggregates by subscriber labels. #4668
- Added MobilityClassification query, to classify subscribers by mobility type based on a sequence of locations. #4666
- Exposed CoalescedLocation via FlowAPI, in the specific case where the fallback location is a FilteredReferenceLocation query. #4585
- Added LabelledFlows query, which returns flows disaggregated by label #4679
- Exposed LabelledSpatialAggregate and LabelledFlows via FlowAPI, with a MobilityClassification query accepted as the 'labels' parameter. #4669
- Added RedactedLabelledAggregate and subclasses for redacting labelled data (see ADR 0011). #4671
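The CombineFirst and CoalescedLocation names suggest "primary location, else fallback" semantics. The following is a minimal sketch of that assumed behaviour using dicts in place of queries; it is not the FlowMachine implementation, and whether subscribers found only in the fallback are kept is itself an assumption here.

```python
def coalesce_locations(primary, fallback):
    """Sketch of combine-first semantics, assumed from the query names.

    primary, fallback: dicts mapping subscriber -> location (None = unknown).
    Each subscriber takes the primary location when it is known, otherwise
    the fallback location (which may also be None).
    """
    subscribers = primary.keys() | fallback.keys()
    return {
        sub: primary.get(sub) if primary.get(sub) is not None else fallback.get(sub)
        for sub in subscribers
    }


home = {"a": "X", "b": None}
reference = {"b": "Y", "c": "Z"}
combined = coalesce_locations(home, reference)
```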
Changed
- Harmonised FlowAPI parameter names for start and end dates. They are now all start_date and end_date.
- Further improvements to token display in FlowAuth. #1124
- Increased the FlowDB quickstart container's timeout to 15 minutes. #782
- Union and Query.union now accept a variable number of queries to concatenate. #4565
Fixed
- Autoflow's prefect version is now current. #2544
- FlowMachine server will now successfully remove cache for queries defined in an interactive flowmachine session during cleanup. #4008
1.15.0
Added
- FlowETL flux check can be turned off by setting use_flux_sensor=False in create_dag. #3603
Changed
- The use_file_flux_sensor argument to create_dag is deprecated. To use the table-based flux check in a file-based DAG, set use_flux_sensor='table'.
- Improvements to token display in FlowAuth. #2812
1.14.6
Added
- A list of additional paths to FlowETL QA checks can now be supplied to create_dag and get_qa_checks. #3484
- FlowETL docker container now includes the upgrade check script for Airflow 2.0.0.
Fixed
- Additional FlowETL QA checks in the dags folder are now picked up. #3484
- Quickstart will no longer raise a warning about unset AutoFlow-related environment variables. #2118
1.14.5
Fixed
- FlowETL QA checks with template sections conditional on the cdr_type argument now render correctly. #3479
1.14.4
Fixed
- Fixed FlowClient ignoring custom SSL certificates #3344
1.14.3
Fixed
- Fixed FlowETL not using the randomly generated secret key to secure sessions with the web interface if one is not explicitly provided using AIRFLOW__WEBSERVER__SECRET_KEY. #3244
1.14.2
Fixed
- Reinstated tabs navigation in the docs #3238
- Removed $ from code snippets in developer docs #3224
- FlowETL now randomly generates a secret key to secure sessions with the web interface if one is not explicitly provided using AIRFLOW__WEBSERVER__SECRET_KEY. #3244
1.14.1
Fixed
- Docs displaying None where they shouldn't
1.14.0
Added
- Previously run or currently running queries can now be referenced as a subscriber subset via FlowAPI. #1009
- total_network_objects, location_introversion, and unique_subscriber_counts now also accept subscriber subsets.
- The validity window for FlowAuth two-factor codes can now be configured using the TWO_FACTOR_VALID_WINDOW env variable. #3203
Changed
- get_cached_query_objects_ordered_by_score is now a generator. #3116
- Flowclient now uses httpx instead of requests, for improved async performance and HTTP/2 support. #1789
Fixed
- FlowAPI now correctly logs all query run, poll, and retrieval requests for matching with FlowMachine. #3071
- Links in the installation docs are now generated correctly. #3152
1.13.0
Changed
- When creating a file-based DAG using create_dag, you can now use the slower, table-based method of checking whether the file is being written. #2857
1.12.0
Added
- The issuer name can now be set for FlowAuth's two-factor authentication using the FLOWAUTH_TWO_FACTOR_ISSUER environment variable.
- FlowAPI's internal port can now be set using the FLOWAPI_PORT environment variable, but continues to default to 9090. #2723
With thanks to JIPS for supporting this work.
- FlowETL's default port can now be set using the FLOWETL_PORT environment variable, but continues to default to 8080. #2724
With thanks to JIPS for supporting this work.
Changed
- Test and synthetic DFS data now uses the same pool of subscribers as CDR data. #2713
With thanks to JIPS for supporting this work.
1.11.1
Added
- FlowDB's SQL synthetic data generator now uses the WorldPop project's 2016 population raster for the country chosen as the basis for generating data.
1.11.0
Added
- Queries run through FlowAPI can now be run on only a subset of the available CDR types, by supplying an event_types parameter. #2631
- FlowETL now includes QA checks for the earliest and latest timestamps in the ingested data. #2627
Fixed
- The FlowETL 'count_duplicates' QA check now correctly counts the number of duplicate rows. #2651
1.10.0
Added
- FlowDB's SQL synthetic data generator can now generate events for any country, not just Nepal. To generate synthetic data for a different country, supply the COUNTRY environment variable when starting the container, and a valid GADM GID code for the region to simulate a disaster.
Changed
- FlowMachine's docker container now uses Python 3.8
- FlowAPI's docker container now uses Python 3.8
- FlowAuth's docker container now uses Python 3.8
- AutoFlow's docker container now uses Python 3.8
- FlowDB's SQL synthetic data generator now uses GADM 3.6 boundaries.
- FlowAuth and FlowAPI now exchange tokens with compressed claims. #2625
Fixed
- FlowAuth will no longer fail to start if there are directories with the same names as the SSL certificate secrets.
1.9.4
Changed
- JoinToLocation is cacheable only if the joined query is also cacheable.
1.9.3
Changed
- SubscriberLocations are no longer cacheable using FlowMachine.
Fixed
- Fixed cache shrinking failing when large numbers of tables have been written. #2462
- Fixed FlowAuth's MySQL support.
1.9.2
Fixed
- Added missing bridge table arguments to several FlowClient methods.
1.9.1
Added
- FlowAuth now supports MySQL as a database backend.
- FlowKit now allows the use of bridge tables to manually specify linkages between cells and geometries.
Fixed
- FlowAuth no longer errors after a period of inactivity due to timed out database connections. #2382
1.9.0
Added
- Added new FlowAPI aggregates: unique_visitor_counts, active_at_reference_location_counts, unmoving_counts, unmoving_at_reference_location_counts, trips_od_matrix, and consecutive_trips_od_matrix
- Added new Flows type query to FlowAPI, unique_locations, which produces the paired regional connectivity COVID-19 indicator
- Added FlowClient function unique_locations_spec, which can be used on either side of a flows query
- Added FlowClient functions: unique_visitor_counts, active_at_reference_location_counts, unmoving_counts, unmoving_at_reference_location_counts, trips_od_matrix, and consecutive_trips_od_matrix. #2333
- FlowClient now has an asyncio API. Use connect_async instead of connect to create an ASyncConnection, and await methods on APIQuery objects. #2199
Fixed
- Fixed FlowMachine server becoming deadlocked under load. #2390
1.8.0
Added
- Added subscriber metrics: ActiveAtReferenceLocation, Unmoving, UnmovingAtReferenceLocation and UniqueLocations
- Added location metrics and their Redacted* equivalents:
    - UniqueVisitorCounts
    - UnmovingAtReferenceLocationCounts (COVID-19 equivalent)
    - ActiveAtReferenceLocationCounts
    - UnmovingCount (COVID-19 equivalent)
    - TripsODMatrix (COVID-19 equivalent)
    - ConsecutiveTripsODMatrix (COVID-19 equivalent)
See https://covid19.flowminder.org for more detail on how Flowminder is supporting the global COVID-19 response.
1.7.0
Changed
- FlowETL is now based on the official apache-airflow docker image. As a result, you should now bind mount your host dags directory to /opt/airflow/dags, and your logs directory to /opt/airflow/logs.
1.6.1
Fixed
- FlowMachine server will now ignore values for the FLOWMACHINE_SERVER_THREADPOOL_SIZE environment variable which can't be cast to int. #2304
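The "ignore values that can't be cast to int" behaviour is the usual tolerant environment-variable parse. A minimal sketch follows; the helper name threadpool_size and the fallback value 5 are assumptions for illustration, not FlowMachine's code or documented default.

```python
import os


def threadpool_size(default=5, env=None):
    """Hypothetical sketch of tolerant env-var parsing.

    Returns FLOWMACHINE_SERVER_THREADPOOL_SIZE as an int when it parses,
    otherwise falls back to `default` (5 is an assumed value, not
    FlowMachine's documented default).
    """
    env = os.environ if env is None else env
    raw = env.get("FLOWMACHINE_SERVER_THREADPOOL_SIZE")
    try:
        return int(raw)
    except (TypeError, ValueError):
        # TypeError: variable unset (raw is None); ValueError: not an integer.
        return default
```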
1.6.0
Added
- histogram_aggregate added to FlowAPI and FlowClient. Allows the user to obtain a histogram over a per-subscriber metric. #1076
1.5.1
Added
- FlowClient now displays a progress bar when waiting for a query to be ready, indicating how many parts of that query still need to be run.
1.5.0
Added
- Added a flowclient Query class to represent a FlowKit query #1980.
- Added method flowclient.Connection.update_token, to replace the API token for an existing connection.
Changed
- Flowclient functions for generating query specifications have been renamed to <previous_name>_spec (e.g. flowclient.modal_location is now flowclient.modal_location_spec).
- flowclient.get_status now returns "not_running" (instead of raising FileNotFoundError) if a query is not running or completed.
- Flowclient functions location_event_counts_spec, meaningful_locations_aggregate_spec, meaningful_locations_between_label_od_matrix_spec, meaningful_locations_between_dates_od_matrix_spec, flows_spec, unique_subscriber_counts_spec, location_introversion_spec, total_network_objects_spec, aggregate_network_objects_spec, spatial_aggregate_spec and joined_spatial_aggregate_spec have moved to the flowclient.aggregates submodule.
1.4.0
Added
- FlowAPI can now return results in CSV and GeoJSON format, and FlowClient now supports getting GeoJSON formatted results. #2003
1.3.3
Added
- FlowAPI now reports the proportion of subqueries cached for a query when polling. #1202
- FlowClient now logs info messages with the proportion of subqueries cached for a query when polling. #1202
Fixed
- Fixed the display of deeply nested permissions for flows in FlowAuth. #2110
1.3.2
Fixed
- Fixed tokens which used the FlowAuth demo data not being accepted by FlowAPI. #2108
1.3.1
Changed
- Flowmachine now uses an enum for interaction direction parameters (but will still accept them as strings). #357
Removed
- Removed unused aggregates, results and features schemas from FlowDB. #587
1.3.0
Added
- Improved UI for API permissions in FlowAuth.
Changed
- The expected format of user claims has changed from a dictionary to a string-based format. FlowAPI now expects the claims key of any token to contain a list of scope strings.
- Permissions for joined spatial aggregates can now be set at a finer level in FlowAuth, to allow administrators to grant access only to specific combinations of query types at different aggregation units.
- FlowAuth no longer requires administrators to manually configure API routes, and will extract them from a FlowAPI server's OpenAPI specification.
- FlowAuth now uses structlog for log messages.
- FlowAPI no longer mandates a top-level aggregation_unit field in query specifications.
- FlowClient's flows and modal_location functions no longer require an aggregation unit.
Removed
- The poll type permission has been removed, and is implicitly granted by both read and get_result rights.
- FlowAuth no longer allows administrators to specify the name of a FlowAPI server, and will instead use the name specified in the server's OpenAPI specification.
1.2.1
Fixed
- Queries which have been removed from Flowmachine's cache, or cancelled, can now be rerun. #1898
1.2.0
Added
- FlowMachine can now use multiple FlowDB backends, redis instances or execution pools via the flowmachine.connections or flowmachine.core.context.context context managers. #391
- flowmachine.core.connection.Connection now has a conn_id attribute, which is unique per database host. #391
Changed
- flowmachine.connect no longer returns a Connection object. The connection should be accessed via flowmachine.core.context.get_db(). #391
- connection, redis, and threadpool are no longer available as attributes of Query, and should be accessed via flowmachine.core.context.get_db(), flowmachine.core.context.get_redis() and flowmachine.core.context.get_executor(). #391
Removed
- Removed Query.connection, Query.redis, and Query.threadpool. #391
1.1.1
Added
- Added a worked example to demonstrate using joined spatial aggregate queries. #1938
1.1.0
Changed
- Connection.available_dates is now a property and returns results based on the etl.etl_records table. #1873
Fixed
- Fixed the run action blocking the FlowMachine server in some scenarios. #1256
Removed
- Removed tables and columns methods from the Connection class in FlowMachine
- Removed the inspector attribute from the Connection class in FlowMachine
1.0.0
Added
- FlowMachine now periodically prunes the cache to below the permitted cache size. #1307
    The frequency of this pruning is configurable using the FLOWMACHINE_CACHE_PRUNING_FREQUENCY environment variable to Flowmachine, and queries are excluded from being removed by the automatic shrinker based on the cache_protected_period config key within FlowDB.
- FlowDB now includes Paul Ramsey's OGR foreign data wrapper, for easy loading of GIS data. #1512
- FlowETL now allows all configuration options to be set using docker secrets. #1515
- Added a new component, AutoFlow, to automate running Jupyter notebooks when new data is added to FlowDB. #1570
- FLOWETL_INTEGRATION_TESTS_SAVE_AIRFLOW_LOGS environment variable added to allow copying the Airflow logs in FlowETL integration tests into the /mounts/logs directory for debugging. #1019
- Added new IterativeMedianFilter query to Flowmachine, which applies an iterative median filter to the output of another query. #1339
- FlowDB now includes the TDS foreign data wrapper. #1729
- Added contributing and support instructions. #1791
- New FlowETL module installable via pip to aid in ETL DAG creation.
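An iterative median filter re-applies a sliding-window median until the series stops changing. The sketch below shows that general technique on a plain list; the window size, the shrinking window at the edges and the iteration cap are assumptions, and IterativeMedianFilter itself runs in SQL over another query's output.

```python
from statistics import median


def median_filter(series, window):
    """One pass of a centred median filter; the window shrinks at the edges."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


def iterative_median_filter(series, window=3, max_iterations=10):
    """Re-apply the median filter until the output stops changing."""
    current = list(series)
    for _ in range(max_iterations):
        filtered = median_filter(current, window)
        if filtered == current:
            break
        current = filtered
    return current


# A single spike is removed after one pass; the second pass confirms convergence.
smoothed = iterative_median_filter([1, 1, 9, 1, 1])
```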
Changed
- FlowDB is now built on PostgreSQL 12 and PostGIS 3. #1396
- FlowETL is now built on Airflow 1.10.6.
- FlowETL now defaults to disabling Airflow's REST API, and enables RBAC for the web UI. #1516
- FlowETL now requires that the FLOWETL_AIRFLOW_ADMIN_USERNAME and FLOWETL_AIRFLOW_ADMIN_PASSWORD environment variables be set, which specify the default web UI account. #1516
- FlowAPI will no longer return a result for rows in spatial aggregate, joined spatial aggregate, flows, total events, meaningful locations aggregate, meaningful locations od, or unique subscriber count where the aggregate would contain fewer than 16 SIMs. #1026
- FlowETL now requires that AIRFLOW__CORE__SQL_ALCHEMY_CONN be provided as an environment variable or secret. #1702, #1703
- FlowAuth now records last used two-factor authentication codes in an expiring cache, which supports either a file-based or redis backend. #1173
- AutoFlow now uses Bundler to manage Ruby dependencies.
- The end_date parameter of flowclient.modal_location_from_dates now refers to the day after the final date included in the range, so is now consistent with other queries that have start/end date parameters. #819
- Date intervals in AutoFlow date stencils are now interpreted as half-open intervals (i.e. including start date, excluding end date), for consistency with date ranges elsewhere in FlowKit.
- The flowmachine user now has read access to ETL metadata tables in FlowDB
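The half-open convention used for end_date and AutoFlow date stencils can be illustrated with a small helper. dates_in_range is a hypothetical function, not part of FlowClient or AutoFlow; it only shows which days a [start_date, end_date) range covers.

```python
from datetime import date, timedelta


def dates_in_range(start_date, end_date):
    """Half-open date range: includes start_date, excludes end_date.

    Illustrates the convention described above; this helper is not part
    of FlowClient or AutoFlow.
    """
    n_days = (end_date - start_date).days
    return [start_date + timedelta(days=i) for i in range(n_days)]


# To cover 1-3 January inclusive, end_date is the day *after* the last day.
covered = dates_in_range(date(2016, 1, 1), date(2016, 1, 4))
```

One advantage of this convention is that consecutive ranges such as [1 Jan, 4 Jan) and [4 Jan, 7 Jan) tile without overlaps or gaps.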
Fixed
- Quickstart should no longer fail on systems which do not include the netstat tool. #1472
- Fixed an error that prevented FlowAuth admin users from resetting users' passwords using the FlowAuth UI. #1635
- The 'Cancel' button on the FlowAuth 'New User' form no longer submits the form. #1636
- FlowAuth backend now sends a meaningful 400 response when trying to create a user with an empty password. #1637
- Usernames of deleted users can now be re-used as usernames for new users. #1638
- RedactedJoinedSpatialAggregate now only redacts rows with too few subscribers. #1747
- FlowDB now uses a more conservative default setting for tcp_keepalives_idle of 10 minutes, to avoid connections being killed after 15 minutes when running in a docker swarm. #1771
- Aggregation units and api routes can now be added to servers. #1815
- Fixed several issues with FlowETL. #1529 #1499 #1498 #1497
Removed
- Removed pg_cron.
0.9.1
Added
- Added new DistanceSeries query to Flowmachine, which produces per-subscriber time series of distance from a reference point. #1313
- Added new ImputedDistanceSeries query to Flowmachine, which produces contiguous per-subscriber time series of distance from a reference point by filling in gaps using the rolling median. #1337
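Gap filling with a rolling median can be sketched as below. This is an assumed variant (a trailing window over the values already emitted, with imputed values included) chosen for brevity; ImputedDistanceSeries may use a different window shape, size or treatment of imputed values.

```python
from statistics import median


def impute_with_rolling_median(series, window=3):
    """Fill gaps (None) with the median of the previous `window` output values.

    Sketch only: values imputed earlier are included in later windows, and
    the trailing window and its size are assumptions, not FlowMachine's method.
    A gap with no earlier values stays None.
    """
    filled = []
    for value in series:
        if value is None:
            recent = [v for v in filled if v is not None][-window:]
            value = median(recent) if recent else None
        filled.append(value)
    return filled


distances = [1.0, 3.0, None, 5.0, None]
imputed = impute_with_rolling_median(distances)
```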
Changed
Fixed
- The FlowETL config file is now always validated, avoiding runtime errors if a config setting is wrong or missing. #1375
- FlowETL now only creates DAGs for CDR types which are present in the config, leading to a better user experience in the Airflow UI. #1376
- The concurrency settings in the FlowETL config are no longer ignored. #1378
- The FlowETL deployment example has been updated so that it no longer fails due to a missing foreign data wrapper for the available CDR dates. #1379
- Fixed error when editing a user in FlowAuth who did not have two-factor authentication enabled. #1374
- Fixed not being able to enable a newly added API route on existing servers in FlowAuth. #1373
Removed
- The default_args section in the FlowETL config file has been removed. #1377
0.9.0
Added
- FlowAuth now makes version information available at /version and displays it in the web UI. #835
- FlowETL now comes with a deployment example (in flowetl/deployment_example/). #1126
- FlowETL now allows running supplementary post-ETL queries. #989
- Random sampling is now exposed via the API, for all non-aggregated query kinds. #1007
- New aggregate added to FlowMachine: HistogramAggregation, which constructs histograms over the results of other queries. #1075
- New IntereventInterval query class, which returns stats over the gap between events as a time interval.
- Added submodule flowmachine.core.dependency_graph, which contains functions related to creating or using query dependency graphs (previously these were in utils.py).
- New config option sql_find_available_dates in FlowETL to provide SQL code to determine the available dates. #1295
Changed
- FlowDB is now based on PostgreSQL 11.5 and PostGIS 2.5.3
- When running queries through FlowAPI, the query's dependencies will also be cached by default. This behaviour can be switched off by setting FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true. #1152
- NewSubscribers now takes a pair of UniqueSubscribers queries instead of the arguments to them
- Flowmachine's default random sampling method is now random_ids rather than the non-reproducible system_rows. #1263
- IntereventPeriod now returns stats over the gap between events in fractional time units, instead of time intervals. #1265
- Attempting to store a query that does not have a standard table name (e.g. EventTableSubset or an unseeded random sample) will now raise an UnstorableQueryError instead of ValueError.
- In the FlowETL deployment example, the external ingestion database is now set up separately from the FlowKit components and connected to FlowDB via a docker overlay network. #1276
- The md5 attribute of the Query class has been renamed to query_id #1288.
- DistanceMatrix no longer returns duplicate rows for the lon-lat spatial unit.
- Previously, Displacement defaulted to returning NaN for subscribers who have a location in the reference location but were not seen in the time period for the displacement query. These subscribers are no longer returned unless the return_subscribers_not_seen argument is set to True.
- PopulationWeightedOpportunities is now available under flowmachine.features.location, instead of flowmachine.models
- PopulationWeightedOpportunities no longer raises an error for incomplete per-location departure rate vectors; instead, any locations not included are omitted from the results
- PopulationWeightedOpportunities no longer requires use of the run() method
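The difference between IntereventInterval (gaps as time intervals) and the new IntereventPeriod output (gaps in fractional time units) can be shown with datetime arithmetic. The helper interevent_gaps and the default unit of hours are illustrative assumptions, not FlowMachine's API.

```python
from datetime import datetime, timedelta


def interevent_gaps(timestamps, unit=timedelta(hours=1)):
    """Gaps between consecutive events, as intervals and as fractional units.

    The first list mirrors an interval-style result (timedelta objects); the
    second expresses each gap as a float number of `unit`s. Both the helper
    and the choice of hours as the default unit are illustrative assumptions.
    """
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    fractional = [gap / unit for gap in intervals]  # timedelta / timedelta -> float
    return intervals, fractional


events = [datetime(2016, 1, 1, 8, 0), datetime(2016, 1, 1, 12, 0), datetime(2016, 1, 1, 9, 30)]
intervals, hours = interevent_gaps(events)
```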
Fixed
- Quickstart will no longer fail if it has been run previously with a different FlowDB data size and not explicitly shut down. #900
Removed
- Flowmachine's subscriber_locations_cluster function has been removed; use HartiganCluster or MeaningfulLocations directly.
- FlowAPI no longer supports the non-reproducible random sampling method system_rows. #1263
0.8.0
Added
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes event counts. #992
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up amount. #967
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes nocturnal events. #1025
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up balance. #968
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes displacement. #1010
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes pareto interactions. #1012
- FlowETL now supports ingesting from a postgres table in addition to CSV files. #1027
- FLOWETL_RUNTIME_CONFIG environment variable added to control which DAG definitions the FlowETL integration tests should use (valid values: "testing", "production").
- FLOWETL_INTEGRATION_TESTS_DISABLE_PULLING_DOCKER_IMAGES environment variable added to allow running the FlowETL integration tests against locally built docker images during development.
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes handset. #1011 and #1029
- JoinedSpatialAggregate now supports "distr" stats, which output the relative distribution of the passed metrics.
- Added SubscriberHandsetCharacteristic to FlowMachine
- FlowAuth now supports optional two-factor authentication #121
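One plausible reading of the "distr" statistic is the share of subscribers at each location taking each value of a categorical metric (for example, handset type). The sketch below follows that assumed interpretation on (subscriber, location, value) tuples; it is not JoinedSpatialAggregate's implementation.

```python
from collections import Counter, defaultdict


def distr(rows):
    """Per-location relative distribution of a categorical metric.

    rows: (subscriber, location, value) tuples, standing in for a metric
    joined to subscriber locations. Returns {location: {value: share}}.
    An assumed interpretation of "distr", for illustration only.
    """
    counts = defaultdict(Counter)
    for _subscriber, location, value in rows:
        counts[location][value] += 1
    return {
        location: {value: n / sum(c.values()) for value, n in c.items()}
        for location, c in counts.items()
    }


handsets = [("a", "L1", "Smart"), ("b", "L1", "Smart"), ("c", "L1", "Feature"), ("d", "L2", "Feature")]
shares = distr(handsets)
```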
Changed
- The flowdb containers for test_data and synthetic_data were split into two separate containers, and quick_start.sh downloads the docker-compose files to a new temporary directory on each run. #843
- Flowmachine now returns more informative error messages when query parameter validation fails. #1055
Removed
- TESTING environment variable was removed (previously used by the FlowETL integration tests).
- Removed SubscriberPhoneType from FlowMachine to avoid redundancy.
0.7.0
Added
- PRIVATE_JWT_SIGNING_KEY environment variable/secret added to FlowAuth, which should be a PEM encoded RSA private key, optionally base64 encoded if supplied as an environment variable.
- PUBLIC_JWT_SIGNING_KEY environment variable/secret added to FlowAPI, which should be a PEM encoded RSA public key, optionally base64 encoded if supplied as an environment variable.
- The dev provisioning Ansible playbook now automatically generates an SSH key pair for the flowkit user. #892
- Added new classes to represent spatial units in FlowMachine.
- Added a Geography query class, to get geography data for a spatial unit.
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes unique location counts. #949
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes subscriber degree. #969
- Flowdb now contains an auxiliary table to record outcomes of queries that can be run as part of the regular ETL process #988
Changed
- The quick-start script now only pulls the docker images for the services that are actually started up. #898
- FlowAuth and FlowAPI are now linked using an RSA keypair, instead of per-server shared secrets. #89
- Location-related FlowMachine queries now take a spatial_unit parameter instead of level.
- The quick-start script now uses the environment variable GIT_REVISION to control the version to be deployed.
- The permission and spatial aggregation checkboxes on the Create Token page are now hidden by default. #834
- The flowetl mounted directories archive, dump, ingest and quarantine were replaced with a single files directory, and files are no longer moved. #946
- FlowDB's postgresql has been updated to 11.4, which addresses several bugs and one major vulnerability.
Fixed
- When creating a new token in FlowAuth, the expiry now always shows the year, seconds till expiry, and timezone. #260
- Distances in Displacement are now calculated with longitude and latitude the correct way around. #913
- The quick-start script now works correctly with branches. #902
- Fixed location_event_counts failing to work when specifying a subset of event types #1015
- FlowAPI will now show the correct version in the API spec; flowmachine and flowclient will show the correct versions in the worked examples. #818
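The Displacement fix is about argument order. The haversine sketch below takes points as (longitude, latitude), the x/y order used by GeoJSON and PostGIS, which makes a swapped pair easy to spot. The function and the 6371 km mean Earth radius are illustrative choices, not FlowMachine's implementation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points.

    Arguments follow the (lon, lat) order used by GeoJSON and PostGIS;
    passing (lat, lon) instead silently yields a wrong distance.
    Not FlowMachine's implementation.
    """
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude is about 111.2 km at the equator, and about half that at 60 degrees N.
equator_km = haversine_km(0.0, 0.0, 1.0, 0.0)
sixty_north_km = haversine_km(0.0, 60.0, 1.0, 60.0)
```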
Removed¶
- 
Removed cell_mappings.py,get_columns_for_levelandBadLevelError.
- 
JWT_SECRET_KEYhas been removed in favour of RSA keys.
- The FlowDB tables infrastructure.countriesandinfrastructure.operatorshave been removed. #958
0.6.4¶
Added¶
- Buttons to copy token to clipboard and download token as file added to token list page. #704
- Two new worked examples: "Cell Towers Per Region" and "Unique Subscriber Counts". #633, #634
Changed¶
- The FLOWDB_DEBUGenvironment variable has been renamed toFLOWDB_ENABLE_POSTGRES_DEBUG_MODE.
- FlowAuth will now automatically set up the database when started without needing to trigger via the cli.
- FlowAuth now requires that at least one administrator account is created by providing env vars or secrets for:- FLOWAUTH_ADMIN_PASSWORD
- FLOWAUTH_ADMIN_USERNAME
 
Fixed¶
- Setting the FLOWDB_DEBUG environment variable previously had no effect; this has been fixed. #811
- Previously, queries could get stuck in an executing state if writing their cache metadata failed; they now correctly show as having errored. #833
- Fixed an issue where Table objects could be in an inconsistent cache state after resetting the cache #832
- FlowAuth's docker container can now be used with a Postgres backing database. #825
- FlowAPI now starts up successfully when following the "Secrets Quickstart" instructions in the docs. #836
- The command to generate an SSL certificate in the "Secrets Quickstart" section in the docs has been fixed and made more robust #837
- FlowAuth will no longer try to initialise the database or create demo data multiple times when running under uwsgi with multiple workers #844
- Fixed an issue where multiple tokens did not line up on the FlowAuth "Tokens" page #849
Removed¶
- The FLOWDB_SERVICES environment variable has been removed from the top-level Makefile, so that DOCKER_SERVICES is now the only environment variable that controls which services are spun up when running make up. #827
0.6.3¶
Added¶
- FlowKit's worked examples are now Dockerized, and available as part of the quick setup script #614
- A skeleton for an Airflow-based ETL system has been added, with a basic ETL DAG specification and tests.
- The docs now contain information about required versions of installation prerequisites #703
- FlowAPI now requires the FLOWAPI_IDENTIFIER environment variable to be set, which contains the name used to identify this FlowAPI server when generating tokens in FlowAuth #727
- flowmachine.utils.calculate_dependency_graph now includes the Query objects in the query_object field of the graph's nodes dictionary #767
- Architectural Decision Records (ADR) have been added and are included in the auto-generated docs #780
- Added FlowDB environment variables SHARED_BUFFERS_SIZE and EFFECTIVE_CACHE_SIZE, to allow manually setting the Postgres configuration parameters shared_buffers and effective_cache_size.
- The function print_dependency_tree() now takes an optional argument show_stored to display whether dependent queries have been stored #804
- A new function plot_dependency_graph() has been added, which makes it easy to plot and visualise a dependency graph in Jupyter notebooks (this requires IPython and pygraphviz to be installed) #786
Changed¶
- Parameter names in flowmachine.connect() have been renamed as follows to be consistent with the associated environment variables #728:
    - db_port -> flowdb_port
    - db_user -> flowdb_user
    - db_pass -> flowdb_password
    - db_host -> flowdb_host
    - db_connection_pool_size -> flowdb_connection_pool_size
    - db_connection_pool_overflow -> flowdb_connection_pool_overflow
- FlowAPI and FlowAuth now expect an audience key to be present in tokens #727
- Dependent queries are now only included once in the md5 calculation of a given query (in particular, this changes the query ids compared to previous FlowKit versions).
- An error is displayed in the add user form of FlowAuth if the username already exists. #690
- An error is displayed in the add group form of FlowAuth if the group name already exists. #709
- FlowAuth's add new server page now shows helper text for bad inputs. #749
- The class SubscriberSubsetterBase in FlowMachine no longer inherits from Query #740 (this changes the query ids compared to previous FlowKit versions).
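Code that still calls flowmachine.connect() with the old db_* keyword names needs updating for the #728 renames. The sketch below does that migration with a hypothetical helper, rename_connect_kwargs. It only translates keyword names and does not import or call flowmachine. It also refuses input where the same parameter is given under both its old and new name.

```python
# Old -> new keyword names for flowmachine.connect(), as listed in this entry (#728).
CONNECT_KWARG_RENAMES = {
    "db_port": "flowdb_port",
    "db_user": "flowdb_user",
    "db_pass": "flowdb_password",
    "db_host": "flowdb_host",
    "db_connection_pool_size": "flowdb_connection_pool_size",
    "db_connection_pool_overflow": "flowdb_connection_pool_overflow",
}


def rename_connect_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: map pre-0.6.3 keyword names to their new names."""
    renamed = {}
    for key, value in kwargs.items():
        new_key = CONNECT_KWARG_RENAMES.get(key, key)  # unknown keys pass through
        if new_key in renamed:
            raise ValueError(f"{new_key!r} given more than once (possibly via an old name)")
        renamed[new_key] = value
    return renamed
```

The result can then be unpacked into the call, e.g. flowmachine.connect(**rename_connect_kwargs(old_kwargs)).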
Fixed¶
- FlowClient docs rendered to website now show the options available for arguments that require a string from some set of possibilities #695.
- The Flowmachine loggers are now initialised only once when flowmachine is imported, with a call to connect() only changing the log level #691
- The FERNET_KEY environment variable for FlowAuth is now named FLOWAUTH_FERNET_KEY
- The quick-start script now correctly aborts if one of the FlowKit services doesn't fully start up #745
- The maps in the worked examples docs pages now appear in any browser
- Example invocations of generate-jwt are no longer uncopyable due to line wrapping #778
- API parameter interval for location_event_counts queries is now correctly passed to the underlying FlowMachine query object #807.
0.6.2¶
Added¶
- Added a new module, flowkit-jwt-generator, which generates test JWT tokens for use with FlowAPI #564
- A new Ansible playbook was added in deployment/provision-dev.yml. In addition to the standard provisioning this installs pyenv, Python 3.7, pipenv and clones the FlowKit repository, which is useful for development purposes.
- Added a 'quick start' setup script for trying out a complete FlowKit system #688.
Changed¶
- FlowAPI's available_dates endpoint now always returns available dates for all event types and does not accept JSON
- Hints are now displayed in the add user form of FlowAuth if the form is not completed #679
- Error messages are now displayed when generating a new token in FlowAuth if the token's name is invalid #799
- The Ansible playbooks in deployment/ now allow configuring the username and password for the FlowKit user account.
- The default compose file no longer includes build blocks; these have been moved to docker-compose-build.yml.
Fixed¶
- FlowDB synthetic data container no longer silently fails to generate data if data generator is not set #654
0.6.1¶
Fixed¶
- Fixed TotalNetworkObjects raising an error when run with a lat-long level #108
- Radius of gyration no longer incorrectly appears as a top-level API query
0.6.0¶
Added¶
- Added new flowclient API entrypoint, aggregate_network_objects, to access equivalent flowmachine query #601
- FlowAPI now exposes the API spec at the spec/openapi.json endpoint, and an interactive version of the spec at the spec/redoc endpoint
- Added Makefile target make up-no_build, to spin up all containers without building the images
- Added resync_redis_with_cache function to cache utils, to allow administrators to align redis with FlowDB #636
- Added new flowclient API entrypoint, radius_of_gyration, to access (with simplified parameters) equivalent flowmachine query RadiusOfGyration #602
Changed¶
- The period argument to TotalNetworkObjects in FlowMachine has been renamed total_by
- The period argument to total_network_objects in FlowClient has been renamed total_by
- The by argument to AggregateNetworkObjects in FlowMachine has been renamed to aggregate_by
- The stop_date argument to the modal_location_from_dates and meaningful_locations_* functions in FlowClient has been renamed end_date #470
- get_result_by_query_id now accepts a poll_interval argument, which allows the polling frequency to be changed
- The start and stop arguments to EventTableSubset are now mandatory.
- RadiusOfGyration now returns a value column instead of an rog column
- TotalNetworkObjects and AggregateNetworkObjects now return a value column, rather than statistic_name
- All environment variables are now in a single development_environment file in the project root, and development environment setup has been simplified
- Default FlowDB users for FlowMachine and FlowAPI have changed from "analyst" and "reporter" to "flowmachine" and "flowapi", respectively
- Docs and integration tests now use top level compose file
- The following environment variables have been renamed:
    - FLOWMACHINE_SERVER (FlowAPI) -> FLOWMACHINE_HOST
    - FM_PASSWORD (FlowDB), FLOWDB_PASS (FlowMachine) -> FLOWMACHINE_FLOWDB_PASSWORD
    - API_PASSWORD (FlowDB), FLOWDB_PASS (FlowAPI) -> FLOWAPI_FLOWDB_PASSWORD
    - FM_USER (FlowDB), FLOWDB_USER (FlowMachine) -> FLOWMACHINE_FLOWDB_USER
    - API_USER (FlowDB), FLOWDB_USER (FlowAPI) -> FLOWAPI_FLOWDB_USER
    - LOG_LEVEL (FlowMachine) -> FLOWMACHINE_LOG_LEVEL
    - LOG_LEVEL (FlowAPI) -> FLOWAPI_LOG_LEVEL
    - DEBUG (FlowDB) -> FLOWDB_DEBUG
    - DEBUG (FlowMachine) -> FLOWMACHINE_SERVER_DEBUG_MODE
- The following Docker secrets have been renamed:
    - FLOWAPI_DB_USER -> FLOWAPI_FLOWDB_USER
    - FLOWAPI_DB_PASS -> FLOWAPI_FLOWDB_PASSWORD
    - FLOWMACHINE_DB_USER -> FLOWMACHINE_FLOWDB_USER
    - FLOWMACHINE_DB_PASS -> FLOWMACHINE_FLOWDB_PASSWORD
    - POSTGRES_PASSWORD_FILE -> POSTGRES_PASSWORD
    - REDIS_PASSWORD_FILE -> REDIS_PASSWORD
- The status enum in FlowDB has been renamed to etl_status
- reset_cache now requires a redis client argument
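The 0.6.0 environment variable renames depend on the service. For example, FLOWDB_PASS becomes FLOWMACHINE_FLOWDB_PASSWORD for FlowMachine but FLOWAPI_FLOWDB_PASSWORD for FlowAPI. A lookup keyed by (service, old name) therefore captures the list more faithfully than a flat mapping. The sketch below uses a hypothetical migrate_env helper. The lowercase service labels are an assumption for illustration. Docker secrets and compose files are not covered.

```python
# (service, old name) -> new name, following the 0.6.0 rename list in this entry.
ENV_RENAMES_0_6_0 = {
    ("flowapi", "FLOWMACHINE_SERVER"): "FLOWMACHINE_HOST",
    ("flowdb", "FM_PASSWORD"): "FLOWMACHINE_FLOWDB_PASSWORD",
    ("flowmachine", "FLOWDB_PASS"): "FLOWMACHINE_FLOWDB_PASSWORD",
    ("flowdb", "API_PASSWORD"): "FLOWAPI_FLOWDB_PASSWORD",
    ("flowapi", "FLOWDB_PASS"): "FLOWAPI_FLOWDB_PASSWORD",
    ("flowdb", "FM_USER"): "FLOWMACHINE_FLOWDB_USER",
    ("flowmachine", "FLOWDB_USER"): "FLOWMACHINE_FLOWDB_USER",
    ("flowdb", "API_USER"): "FLOWAPI_FLOWDB_USER",
    ("flowapi", "FLOWDB_USER"): "FLOWAPI_FLOWDB_USER",
    ("flowmachine", "LOG_LEVEL"): "FLOWMACHINE_LOG_LEVEL",
    ("flowapi", "LOG_LEVEL"): "FLOWAPI_LOG_LEVEL",
    ("flowdb", "DEBUG"): "FLOWDB_DEBUG",
    ("flowmachine", "DEBUG"): "FLOWMACHINE_SERVER_DEBUG_MODE",
}


def migrate_env(service: str, env: dict) -> dict:
    """Hypothetical helper: rename one service's pre-0.6.0 variables, keeping all others."""
    return {ENV_RENAMES_0_6_0.get((service, name), name): value for name, value in env.items()}
```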
Fixed¶
- Fixed being unable to add new users or servers when running FlowAuth with a Postgres database #622
- Resetting the cache using reset_cache will now reset the state of queries in redis as well #650
- Fixed the mode statistic for AggregateNetworkObjects #651
Removed¶
- Removed docker-compose-dev.yml, and docker-compose files in docs/, flowdb/tests/ and integration_tests/.
- Removed Dockerfile-dev Dockerfiles
- Removed ENV defaults from the FlowMachine Dockerfile
- Removed the POSTGRES_DB environment variable from the FlowDB Dockerfile; the database name is now hardcoded as flowdb
0.5.3¶
Added¶
- Added new spatial_aggregate API endpoint and FlowClient function #599
- Added new flowclient API entrypoint, total_network_objects(), to access (with simplified parameters) equivalent flowmachine query #581
- Added new flowclient API entrypoint, location_introversion(), to access (with simplified parameters) equivalent flowmachine query #577
- Added new flowclient API entrypoint, unique_subscriber_counts(), to access (with simplified parameters) equivalent flowmachine query #562
- A new schema aggregates and table aggregates.aggregates have been created for maintaining a record of the process and completion of scheduled aggregates.
- New joined_spatial_aggregate API endpoint and FlowClient function #600
Changed¶
- daily_location and modal_location query types are no longer accepted as top-level queries, and must be wrapped using spatial_aggregate
- JoinedSpatialAggregate no longer accepts positional arguments
- JoinedSpatialAggregate now supports "avg", "max", "min", "median", "mode", "stddev" and "variance" stats
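The sketch below makes the statistic names above concrete. It maps each name JoinedSpatialAggregate accepts to a Python statistics function and applies it per region. This illustrates what each statistic computes, not how FlowMachine implements them. Treating "stddev" and "variance" as sample (rather than population) statistics is an assumption, and the aggregate helper and region names are hypothetical.

```python
import statistics

# Statistic names accepted by JoinedSpatialAggregate (per this entry) mapped to
# illustrative pure-Python equivalents; stddev/variance are assumed to be sample statistics.
STATS = {
    "avg": statistics.mean,
    "max": max,
    "min": min,
    "median": statistics.median,
    "mode": statistics.mode,
    "stddev": statistics.stdev,
    "variance": statistics.variance,
}


def aggregate(values_by_region: dict, stat: str) -> dict:
    """Apply one named statistic to the per-subscriber values in each region."""
    if stat not in STATS:
        raise ValueError(f"Unsupported statistic {stat!r}; expected one of {sorted(STATS)}")
    func = STATS[stat]
    return {region: func(values) for region, values in values_by_region.items()}


# Hypothetical per-subscriber metric values grouped by region.
regions = {"admin3_a": [1, 2, 2, 5], "admin3_b": [3, 3, 4]}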
Fixed¶
- total_network_objects no longer returns results from AggregateNetworkObjects #603
0.5.2¶
Fixed¶
- Fixed #514, which would cause the client to hang after submitting a query that couldn't be created
- Fixed #575, so that events at midnight are now considered to be happening on the following day
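After the #575 fix, an event at exactly 00:00:00 belongs to the day that is starting, not the one that is ending. A half-open interval per day expresses that rule, so every timestamp falls on exactly one day. The sketch below assumes naive timestamps in a single timezone. The day_bounds and happens_on helpers are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def day_bounds(day: date) -> tuple:
    """Half-open interval [day 00:00:00, next day 00:00:00)."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


def happens_on(ts: datetime, day: date) -> bool:
    """True if the event falls on `day`; an event at exactly midnight opens the new day."""
    start, stop = day_bounds(day)
    return start <= ts < stop


midnight = datetime(2016, 1, 2, 0, 0, 0)
```

Using an exclusive upper bound avoids double-counting midnight events in two consecutive days.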
0.5.1¶
Added¶
- Added HandsetStats to FlowMachine.
- Added new ContactReferenceLocationStats query class to FlowMachine.
- A new zmq message get_available_dates was added to the flowmachine server, along with the /available_dates endpoint in flowapi and the function get_available_dates() in flowclient. These make it possible to determine which dates are available in the database for the supported event types.
Changed¶
- FlowMachine's debugging logs are now from a single logger (flowmachine.debug) and include the submodule in the submodule field instead of using it as the logger name
- FlowMachine's query run logger now uses the logger name flowmachine.query_run_log
- FlowAPI's access, run and debug loggers are now named flowapi.access, flowapi.query and flowapi.debug
- FlowAPI's access and run loggers, and FlowMachine's query run logger now log to stdout instead of stderr
- Passwords for Redis and FlowDB must now be explicitly provided to flowmachine via argument to connect, env var, or secret
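The logging changes above can be sketched with the standard library alone. They involve a single flowmachine.debug logger, with the submodule recorded as a field rather than in the logger name and output sent to stdout. The sketch below is not FlowMachine's implementation. The JsonFormatter class, the make_debug_logger helper and every JSON field name other than submodule are illustrative assumptions.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "logger": record.name,
                "level": record.levelname,
                "event": record.getMessage(),
                # The submodule travels as a field, not as part of the logger name.
                "submodule": getattr(record, "submodule", None),
            }
        )


def make_debug_logger(stream=None) -> logging.Logger:
    """Configure the single debug logger; writes to stdout unless a stream is given."""
    logger = logging.getLogger("flowmachine.debug")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # re-running setup must not duplicate output
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Usage: capture output in memory instead of stdout.
buffer = io.StringIO()
log = make_debug_logger(buffer)
log.debug("Cache table written", extra={"submodule": "cache"})
record = json.loads(buffer.getvalue())
```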
Removed¶
- FlowMachine and FlowAPI no longer support logging to a file
0.5.0¶
Added¶
- The flowmachine python library is now pip installable (pip install flowmachine)
- The flowmachine server now supports additional actions: get_available_queries, get_query_schemas, ping.
- Flowdb now contains a new dfs schema and associated tables to process mobile money transactions. In addition, flowdb_testdata contains sample data for DFS transactions.
- The docs now include three worked examples of CDR analysis using FlowKit.
- Flowmachine now supports calculating the total amount of various DFS metrics (transaction amount,
    commission, fee, discount) per aggregation unit during a given date range. These metrics are also
    exposed in FlowAPI via the query kind dfs_metric_total_amount.
Changed¶
- The JSON structure when setting queries running via flowapi or the flowmachine server has changed:
    query parameters are now "inlined" alongside the query_kind key, rather than nested using a separate params key. Example:
    - previously: {"query_kind": "daily_location", "params": {"date": "2016-01-01", "aggregation_unit": "admin3", "method": "last"}}
    - now: {"query_kind": "daily_location", "date": "2016-01-01", "aggregation_unit": "admin3", "method": "last"}
- The JSON structure of zmq reply messages from the flowmachine server was changed.
    Replies now have the form: {"status": "[success|error]", "msg": "...", "payload": {...}}.
- The flowmachine server action get_sql was renamed to get_sql_for_query_result.
- The parameter daily_location_method was renamed to method.
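Clients that still build the pre-0.5.0 nested form can convert it mechanically. The sketch below covers both message changes above. inline_params flattens the params dict into the top level alongside query_kind and applies the daily_location_method -> method rename. unpack_reply reads the {"status", "msg", "payload"} reply shape. Both helper names are hypothetical. inline_params only flattens the top level; any nested sub-query would need the same treatment.

```python
import json


def inline_params(old_query: dict) -> dict:
    """Convert {"query_kind": ..., "params": {...}} to the inlined 0.5.0 form."""
    new_query = {"query_kind": old_query["query_kind"]}
    for key, value in old_query.get("params", {}).items():
        # daily_location_method was renamed to method in the same release.
        new_key = "method" if key == "daily_location_method" else key
        new_query[new_key] = value
    return new_query


def unpack_reply(raw: str) -> dict:
    """Return the payload of a {"status", "msg", "payload"} reply, raising on error."""
    reply = json.loads(raw)
    if reply["status"] != "success":
        raise RuntimeError(reply["msg"])
    return reply["payload"]


old = {
    "query_kind": "daily_location",
    "params": {"date": "2016-01-01", "aggregation_unit": "admin3", "method": "last"},
}
new = inline_params(old)
```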
0.4.3¶
Added¶
- When running integration tests locally, normally pytest will automatically spin up servers for flowmachine and flowapi as part of the test setup.
    This can now be disabled by setting the environment variable FLOWKIT_INTEGRATION_TESTS_DISABLE_AUTOSTART_SERVERS=TRUE.
- The integration tests now use the environment variables FLOWAPI_HOST and FLOWAPI_PORT to determine how to connect to the flowapi server.
- A new data generator has been added to the synthetic data container which supports more data types, simple disaster simulation, and more plausible behaviours as well as increased performance
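A test setup that honours these variables might look like the sketch below. Only the three environment variable names come from this entry. The helper names, the case-insensitive "TRUE" check and the fallback host and port (localhost:9090) are assumptions.

```python
import os


def autostart_servers_enabled(env=None) -> bool:
    """Servers are auto-started unless FLOWKIT_INTEGRATION_TESTS_DISABLE_AUTOSTART_SERVERS=TRUE."""
    env = os.environ if env is None else env
    flag = env.get("FLOWKIT_INTEGRATION_TESTS_DISABLE_AUTOSTART_SERVERS", "")
    return flag.strip().upper() != "TRUE"


def flowapi_url(env=None) -> str:
    """Build the FlowAPI base URL from FLOWAPI_HOST and FLOWAPI_PORT (fallbacks are assumed)."""
    env = os.environ if env is None else env
    host = env.get("FLOWAPI_HOST", "localhost")
    port = int(env.get("FLOWAPI_PORT", "9090"))
    return f"http://{host}:{port}"
```

Passing an explicit env dict keeps the helpers testable without touching the real process environment.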
Changed¶
- FlowAPI now reports queued/running status for queries instead of just accepted
- The following environment variables have been renamed:
    - DB_USER -> FLOWDB_USER
    - DB_HOST -> FLOWDB_HOST
    - DB_PASS -> FLOWDB_PASS
    - DB_PW -> FLOWDB_PASS
    - API_DB_USER -> FLOWAPI_DB_USER
    - API_DB_PASS -> FLOWAPI_DB_PASS
    - FM_DB_USER -> FLOWMACHINE_DB_USER
    - FM_DB_PASS -> FLOWMACHINE_DB_PASS
- Added numerator_direction to ProportionEventType to allow for the proportion of directed events.
Fixed¶
- Server no longer loses track of queries under heavy load
- TopUpBalances no longer always uses the entire topups table
Removed¶
- The environment variable DB_NAME has been removed.
0.4.2¶
Changed¶
- MDSVolume no longer allows specifying the table, and will always use the mds table.
- All FlowMachine logs are now in structured JSON form
- FlowAPI now uses structured logs for debugging messages
0.4.1¶
Added¶
- Added TopUpAmount and TopUpBalance query classes to FlowMachine.
- Added PerLocationEventStats and PerContactEventStats to FlowMachine
Removed¶
- Removed TotalSubscriberEvents from FlowMachine as it is superseded by EventCount.
0.4.0¶
Added¶
- Dockerised development setup, with support for live reload of flowmachine and flowapi after source code changes.
- Pre-commit hook for Python formatting with black.
- Added new IntereventPeriod, ContactReciprocal, ProportionContactReciprocal, ProportionEventReciprocal, ProportionEventType and MDSVolume query classes to FlowMachine.
Changed¶
- CustomQuery now requires column names to be specified
- Query classes are now required to declare the column names they return via the column_names property
- FlowAPI now reports whether a query is queued or running when polling
- FlowDB test data and synthetic data images are now available from their own Docker repos (Flowminder/flowdb-testdata, Flowminder/flowdb-synthetic-data)
- Changed query class name from NocturnalCalls to NocturnalEvents.
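An abstract base class can enforce the requirement that query classes declare their output columns through a column_names property. The sketch below is not FlowMachine's Query hierarchy. QueryBase, EventCounts, Incomplete, the _make_query method and the events_table name are all hypothetical. A subclass that omits column_names cannot be instantiated.

```python
from abc import ABC, abstractmethod
from typing import List


class QueryBase(ABC):
    """Hypothetical base class: subclasses must declare their output columns."""

    @property
    @abstractmethod
    def column_names(self) -> List[str]:
        ...

    @abstractmethod
    def _make_query(self) -> str:
        ...

    def get_query(self) -> str:
        # Expose only the declared columns, in the declared order.
        cols = ", ".join(self.column_names)
        return f"SELECT {cols} FROM ({self._make_query()}) AS q"


class EventCounts(QueryBase):
    @property
    def column_names(self) -> List[str]:
        return ["subscriber", "value"]

    def _make_query(self) -> str:
        # events_table is a hypothetical table name.
        return "SELECT subscriber, count(*) AS value FROM events_table GROUP BY subscriber"


class Incomplete(QueryBase):
    # Forgets to declare column_names, so instantiation raises TypeError.
    def _make_query(self) -> str:
        return "SELECT 1"
```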
Fixed¶
- FlowAPI is now an installable python module
Removed¶
- Query objects can no longer be recalculated to cache and must be explicitly removed first
- Arbitrary Flowmaths
- EdgeList query type
- Removed query class ProportionOutgoing, as it is redundant following the introduction of ProportionEventType.
0.3.0¶
Added¶
- API route for retrieving geography data from FlowDB
- Aggregated meaningful locations are now available via FlowAPI
- Origin-destination matrices between meaningful locations are now available via FlowAPI
- Added new MeaningfulLocations, MeaningfulLocationsAggregate and MeaningfulLocationsOD query classes to FlowMachine
Changed¶
- Constructors for HartiganCluster, LabelEventScore, EventScore and CallDays now have different signatures
- Restructured and extended documentation; added high-level overview and more targeted information for different types of users
0.2.2¶
Added¶
- Support for running FlowDB as an arbitrary user via docker's --user flag
Removed¶
- Support for setting the uid and gid of the postgres user when building FlowDB
0.2.1¶
Fixed¶
- Fixed being unable to build if the port used by git:// is not open
0.2.0¶
Added¶
- Added utilities for managing and inspecting the query cache
0.1.2¶
Changed¶
- FlowDB now requires a password to be set for the flowdb superuser
0.1.1¶
Added¶
- Support for password protected redis
Changed¶
- Changed the default redis image to bitnami's redis (to enable password protection)
0.1.0¶
Added¶
- Added structured logging of access attempts, query running, and data access
- Added CHANGELOG.md
- Added support for Postgres JIT in FlowDB
- Added total location events metric to FlowAPI and FlowClient
- Added ETL bookkeeping schema to FlowDB
Changed¶
- Added changelog update to PR template
- Increased default shared memory size for FlowDB containers
Fixed¶
- Fixed being unable to delete groups in FlowAuth
- Fixed make up not working with defaults
0.0.5¶
Added¶
- Added Python 3.6 support for FlowClient