
Mobile Data Usage

Analysis using FlowMachine directly

In this worked example we assume the role of an analyst within the mobile network operator (MNO) who has been granted direct access to FlowMachine, without going through FlowAPI. Our aim is to investigate how the number of mobile data session (MDS) events varies with the time of day.

The Jupyter notebook for this worked example can be downloaded here, or can be run using the quick start setup.

Load FlowMachine and connect to FlowDB

We start by importing the FlowMachine library. We also import geopandas and mapboxgl, which we will use later to visualise the data.

import flowmachine
from flowmachine.core import make_spatial_unit
import os
import numpy as np
import geopandas as gpd
import mapboxgl
from mapboxgl.utils import (
    create_color_stops,
    create_weight_stops,
    df_to_geojson,
)

Next, we connect FlowMachine to FlowDB. The following configuration options should either be set as environment variables or passed as arguments to flowmachine.connect():

Variable name                Argument name    Purpose
FLOWMACHINE_FLOWDB_USER      flowdb_user      Your username for connecting to FlowDB
FLOWMACHINE_FLOWDB_PASSWORD  flowdb_password  Your password for connecting to FlowDB
REDIS_PASSWORD               redis_password   The password for the Redis instance
FLOWDB_PORT                  flowdb_port      Port on which FlowDB is accessible
REDIS_PORT                   redis_port       Port on which Redis is accessible

Other configuration options can also be set; see the FlowMachine documentation for more details.

flowmachine.connect()
FlowMachine version: 1.9.4
Flowdb running on: localhost:5432/flowdb (connecting user: flowmachine)
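
Before calling flowmachine.connect() without arguments, it can help to check which of the variables in the table above are present in the environment. The following is a minimal sketch using only the standard library; the helper name missing_connection_variables is hypothetical and is not part of FlowMachine. A missing variable is not necessarily an error, since the value can instead be passed as the corresponding argument.

```python
import os

# Environment variable names taken from the configuration table above.
EXPECTED_VARIABLES = [
    "FLOWMACHINE_FLOWDB_USER",
    "FLOWMACHINE_FLOWDB_PASSWORD",
    "REDIS_PASSWORD",
    "FLOWDB_PORT",
    "REDIS_PORT",
]


def missing_connection_variables(environ=os.environ):
    """Return the expected variable names that are unset or empty.

    Hypothetical helper for illustration; not part of FlowMachine.
    """
    return [name for name in EXPECTED_VARIABLES if not environ.get(name)]


missing = missing_connection_variables()
if missing:
    print("Not set in the environment:", ", ".join(missing))
```

Any name reported here would need to be supplied through the matching argument listed in the table.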

Get MDS event counts

We create a TotalLocationEvents query to calculate the number of MDS events per cell tower, at hourly intervals over the first 7 days of 2016.

data_events_query = flowmachine.features.TotalLocationEvents(
    start="2016-01-01",
    stop="2016-01-08",
    table="events.mds",
    spatial_unit=make_spatial_unit("versioned-cell"),
    interval="hour",
)

Then we call the get_dataframe method to run this query and get the result as a pandas DataFrame.

data_events = data_events_query.get_dataframe()

Next, we sum over the seven days to get total hourly counts per cell tower location.

events_per_hour = (
    data_events.groupby(["lon", "lat", "hour"]).sum().reset_index()
)
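
To see what this aggregation does, here is a small sketch on synthetic data rather than real query output. The lon, lat, hour and value column names match those used in this example; the date column and all the numbers are assumptions made for illustration. The sketch selects the value column before summing, so that only the event counts are added together.

```python
import pandas as pd

# Synthetic stand-in for the query result: one tower, two days, two hours.
toy_events = pd.DataFrame(
    {
        "lon": [84.1, 84.1, 84.1, 84.1],
        "lat": [28.4, 28.4, 28.4, 28.4],
        "date": ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"],
        "hour": [0, 1, 0, 1],
        "value": [3, 5, 4, 6],
    }
)

# Collapse the date dimension: one row per (location, hour of day).
toy_per_hour = (
    toy_events.groupby(["lon", "lat", "hour"])["value"].sum().reset_index()
)
print(toy_per_hour)
```

In this toy result, hour 0 has a total of 7 events (3 + 4) and hour 1 has 11 (5 + 6), with one row per tower location and hour of day.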

Visualise data events on a heatmap

We can view the total MDS event count per hour by summing events_per_hour over all locations for each hour, and plotting the result with the pandas plot method.

%matplotlib inline

events_per_hour.groupby("hour").sum().plot(y="value")

We can use the Mapbox GL library to display a heatmap of our MDS event counts for a particular hour.

Note: Mapbox requires an access token, which should be set as the environment variable MAPBOX_ACCESS_TOKEN. The token is only needed to produce the Mapbox visualisations, which are completely separate from FlowKit.
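
If the token is not set, a direct dictionary lookup on os.environ raises a bare KeyError. As a minimal sketch, a small wrapper can give a more readable message instead; the function name get_mapbox_token is hypothetical and is not part of mapboxgl or FlowKit.

```python
import os


def get_mapbox_token(environ=os.environ):
    """Return the Mapbox access token, or raise a readable error if unset.

    Hypothetical helper for illustration only.
    """
    token = environ.get("MAPBOX_ACCESS_TOKEN")
    if not token:
        raise RuntimeError(
            "MAPBOX_ACCESS_TOKEN is not set; it is needed only for the map."
        )
    return token
```

Calling get_mapbox_token() could then stand in for the direct environment lookup used in the next cell.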

hour_to_show = 0

mapbox_token = os.environ["MAPBOX_ACCESS_TOKEN"]

events_per_hour_geodataframe = gpd.GeoDataFrame(
    events_per_hour,
    geometry=gpd.points_from_xy(events_per_hour.lon, events_per_hour.lat),
)

heatmap_viz = mapboxgl.HeatmapViz(
    events_per_hour_geodataframe[
        events_per_hour_geodataframe.hour == hour_to_show
    ].__geo_interface__,
    access_token=mapbox_token,
    weight_property="value",
    weight_stops=create_weight_stops(np.geomspace(0.01, 1000, 9)),
    color_stops=create_color_stops(np.linspace(0.01, 1, 9), colors="RdPu"),
    radius_stops=[[0, 0], [5.5, 25], [15, 300]],  # increase radius with zoom
    opacity=0.8,
    below_layer="waterway-label",
    center=(84.1, 28.4),
    zoom=5.5,
)

heatmap_viz.show()
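
The heatmap above takes its weight breakpoints from np.geomspace and its colour breakpoints from np.linspace. The sketch below shows how the two differ: geomspace spaces values by a constant ratio, which may suit event counts that span several orders of magnitude, while linspace spaces them by a constant step. It uses only NumPy and does not call mapboxgl.

```python
import numpy as np

# Same arguments as in the heatmap above.
weight_breaks = np.geomspace(0.01, 1000, 9)  # constant ratio between values
colour_breaks = np.linspace(0.01, 1, 9)      # constant step between values

# Ratio of each weight breakpoint to the previous one.
ratios = weight_breaks[1:] / weight_breaks[:-1]
# Difference between consecutive colour breakpoints.
steps = np.diff(colour_breaks)

print("weight breakpoints:", np.round(weight_breaks, 3))
print("colour breakpoints:", np.round(colour_breaks, 3))
```

Each weight breakpoint is about 4.2 times the previous one, so the nine values cover the full range from 0.01 to 1000 while still resolving small counts.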