Methods for clustering point collections. The available methods are designed to work with infrastructure elements, but can be applied to any other point collection.
LocationCluster( point_collection="sites", location_identifier="id", geometry_identifier="geom_point", method="kmeans", distance_tolerance=1, density_tolerance=5, number_of_clusters=5, date=None, aggregate=False, return_no_cluster=True, )
Class for computing clusters of points using different algorithms. This class was designed to work with infrastructure elements (i.e. towers/sites), but can also be used with other point collections as long as they are stored as a table in the database. This class currently implements three methods: K-means, DBSCAN, and Area.

K-means is a clustering algorithm that groups points based on each point's distance to a point representing the centroid of its cluster. The algorithm has two steps: (a) point allocation and (b) centroid re-calculation. In (a) it allocates each point to the centroid it is closest to. In (b) it moves each centroid to the mean location of its members. The process repeats until (b) no longer moves any centroid, resulting in a Voronoi tessellation. For more information, refer to the Wikipedia entry on K-means clustering:
* https://en.wikipedia.org/wiki/K-means_clustering
The following resource is also very informative:
* https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that uses the maximum distance between points (denoted by ε) as the inclusion criterion for a cluster. Clusters have to contain a minimum number of members (denoted by density) to be considered valid; otherwise no cluster is assigned to a given member. If any member of a given cluster lies within distance ε of an outside point, that point is subsequently included in the cluster. This process continues until all points have been evaluated. The scientific reference for this algorithm is:
Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M., eds. "A density-based algorithm for discovering clusters in large spatial databases with noise". Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. Available at: http://www.lsi.upc.edu/~bejar/amlt/material_art/DM%20clustring%20DBSCAN%20kdd-96.pdf
This reference is also useful for understanding how the algorithm works:
* https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
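The two K-means steps described above can be sketched in plain Python. This is only an illustration of the textbook algorithm on 2D tuples with Euclidean distance, not the class's database implementation; the function name `kmeans`, the `max_iterations` and `seed` arguments, and the random choice of initial centroids are all assumptions made for the example.

```python
import math
import random


def kmeans(points, number_of_clusters, max_iterations=100, seed=0):
    """Cluster 2D points; returns (labels, centroids). Illustrative only."""
    rng = random.Random(seed)
    # Assumption: start from randomly chosen input points as initial centroids.
    centroids = rng.sample(points, number_of_clusters)
    labels = []
    for _ in range(max_iterations):
        # (a) Point allocation: assign each point to its nearest centroid.
        labels = [
            min(range(number_of_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # (b) Centroid re-calculation: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(number_of_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    (
                        sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members),
                    )
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        # Stop once step (b) no longer moves any centroid.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `kmeans(points, 2)` on two well-separated groups of points returns one label per point and the two final centroids.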
Table to take the point collection from. This parameter may accept dataframes with geographic data in the future.
Location identifier from the point table to use. This identifier must be unique to each location.
Geometry column to use in computations.
Method to use in clustering. Refer to each method's reference for algorithmic information on how it works. Current implementations are:
* kmeans: Uses a K-means algorithm to select clusters. This method requires the parameter number_of_clusters.
* dbscan: Uses the DBSCAN algorithm to select clusters. This method requires the parameters distance_tolerance and density_tolerance.
* area: Clusters points that are within a certain area. Similar to DBSCAN, but without a density value; that is, clusters can be of any size. This method requires the parameter distance_tolerance.
Radius area in km. Area is approximated using a WGS84 degree to meter conversion of (distance_tolerance * 1000) / 111195. This can include a maximum error of ~0.1%.
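The degree conversion quoted above can be written out as a small helper. The function name `km_to_degrees` and the constant name are hypothetical and chosen only for this sketch; the formula and the 111195 figure come from the description.

```python
# Meters per degree, as quoted in the parameter description above.
METERS_PER_DEGREE = 111195


def km_to_degrees(distance_tolerance):
    """Convert a radius in km to an approximate radius in WGS84 degrees."""
    return (distance_tolerance * 1000) / METERS_PER_DEGREE
```

With the default `distance_tolerance=1`, this gives a radius of roughly 0.009 degrees.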
Minimum number of members that a cluster must have in order to exist. If the members of a possible cluster do not meet this criterion, they will not be assigned a cluster. See the return_no_cluster parameter.
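To show how a distance threshold and a minimum density interact, here is a minimal sketch of textbook DBSCAN in plain Python. It is not the PostGIS-based code the class runs. The function name `dbscan` is hypothetical, distances are Euclidean in the units of the input coordinates, and `density_tolerance` is assumed to count the point itself among its neighbours.

```python
import math


def dbscan(points, distance_tolerance, density_tolerance):
    """Return one label per point: a cluster id (0, 1, ...) or None for no cluster."""

    def neighbours(i):
        # All points within distance_tolerance of point i, including i itself.
        return [
            j
            for j in range(len(points))
            if math.dist(points[i], points[j]) <= distance_tolerance
        ]

    labels = [None] * len(points)
    visited = [False] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        seeds = neighbours(i)
        if len(seeds) < density_tolerance:
            # Too sparse to start a cluster; a later cluster may still claim it.
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster_id
            if visited[j]:
                continue
            visited[j] = True
            reachable = neighbours(j)
            # Only dense points keep expanding the cluster.
            if len(reachable) >= density_tolerance:
                queue.extend(reachable)
        cluster_id += 1
    return labels
```

Points left with the label `None` correspond to members that were not assigned a cluster.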
Number of clusters to create with the K-means algorithm.
If point_collection is either 'sites' or 'cells', use this parameter to determine which version of those infrastructure elements to use. If the default None is used, the current date will be used.
If used, a dataframe will be returned with a convex hull geometry per cluster id alongside its centroid. This can be used in conjunction with LocationArea() to create new area representations.
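As a rough picture of what an aggregated result contains, the sketch below computes a convex hull and a centroid per cluster id in plain Python. It is not the class's implementation. The helper names are hypothetical, the hull uses Andrew's monotone chain, and the centroid is assumed here to be the mean position of the cluster's members, which may differ from what the class returns.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def aggregate_clusters(points, labels):
    """Map each cluster id to its hull and the mean position of its members."""
    members = {}
    for p, label in zip(points, labels):
        if label is not None:  # members without a cluster are ignored
            members.setdefault(label, []).append(p)
    return {
        label: {
            "hull": convex_hull(ps),
            "centroid": (
                sum(x for x, _ in ps) / len(ps),
                sum(y for _, y in ps) / len(ps),
            ),
        }
        for label, ps in members.items()
    }
```

Members labelled `None` do not appear in the aggregated output.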
If used, results will include members that have not been assigned to a cluster. If this parameter is used in conjunction with the aggregate parameter, elements with no cluster will be ignored. We do not recommend using this in conjunction with that parameter.
The DBSCAN implementation code was originally sourced from the website of Dan Baston (implementer of the method in PostGIS); the K-means implementation is derived from the DBSCAN implementation:
* http://www.danbaston.com/posts/2016/06/02/dbscan-clustering-in-postgis.html
The Area method code was originally drawn from the GIS Stack Exchange page:
* https://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis
True if caching is switched on.
Returns the column names.
List of the column names of this query.
Get the column names as a comma separated list
Comma separated list of column names
The set of queries which this one is directly dependent on.
Returns a unique fully qualified name for the query to be stored as under the cache schema, based on a hash of the parameters, class, and subqueries.
String form of the table's fqn
A list of columns to use as indexes when storing this query.
By default, returns the location columns if they are present and self.spatial_unit is defined, and the subscriber column.
>>> daily_location("2016-01-01").index_cols
[['name'], '"subscriber"']
True if the table is stored, and False otherwise.
Generate a uniquely identifying hash of this query, based on the parameters of it and the subqueries it is composed of.
query_id hash string
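One way to derive a stable identifier from a query's class, parameters, and subqueries is sketched below. This only illustrates the idea and is not the library's actual hashing scheme. The function name, its arguments, and the choice of MD5 are assumptions made for the example.

```python
import hashlib


def query_id(class_name, parameters, dependency_ids=()):
    """Hash a class name, its parameters, and the ids of its subqueries."""
    parts = [class_name]
    # Sort parameter names so that argument order does not change the hash.
    parts += [f"{name}={parameters[name]!r}" for name in sorted(parameters)]
    # Include the ids of the queries this one depends on.
    parts += sorted(dependency_ids)
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
```

Two queries built from the same class, parameters, and subqueries get the same identifier, while changing any parameter yields a different one.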
Return the current query state.
The current query state
Return the current query state as a string
The current query state. The possible values are the ones defined in
Returns a unique name for the query to be stored as, based on a hash of the parameters, class, and subqueries.
String form of the table's fqn