MODIFICATION OF DBSCAN ALGORITHM USING HYBRID METHODS FOR CLUSTERS BORDER DETECTION TO PROCESS STREAMING DATA
Annotation: This article proposes a new approach to clustering streaming data in a feature space of any dimension that cuts
off outliers, uninformative anomalous data and other information noise while keeping a memory of all processed
data points. To implement this task, an original modification of the DBSCAN
algorithm was developed, using a hybrid approach to finding the boundaries of clusters of arbitrary shape and
determining whether each of the data points is located inside or outside such a boundary. During the
development, both machine learning technologies and mathematical methods were used, in particular
Quickhull, a method of calculating the convex hull of a finite set of points in n-dimensional space. The resulting
algorithm consists of several blocks that are activated depending on the nature of the distribution of data
received from the input stream. The application of the developed algorithm guarantees the creation of a closed
cluster boundary of arbitrary shape. Using the adaptive frame splitting mechanism, it allows clustering of data
of different dimensions and large volumes, with the memory of all incoming points. As a result, the authors
managed to create a modification of the DBSCAN algorithm for streaming data that is efficient in terms of
execution speed and memory usage. To illustrate the efficiency gain of the developed algorithm modification in
comparison with the classic DBSCAN variant, a calculated estimate of performance and memory
requirements was carried out. The correctness of the estimates obtained has been confirmed experimentally.
The presented modification of the DBSCAN algorithm for streaming data not only delivers an overall
performance gain with lower memory requirements than the classic DBSCAN algorithm, but also has
functional advantages associated with the ability to work efficiently with streaming data in the presence of
information noise. These advantages make the presented modification of the DBSCAN algorithm useful for
solving complex problems in streaming data processing systems, such as searching for correlations and
anomalies in statistical indicators of distributed data collection systems or for detecting stable states of queuing
models used in logistics and transport.
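
The boundary test mentioned in the annotation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 2D feature space and a purely convex boundary, whereas the article targets n-dimensional data and boundaries of arbitrary shape. The function names `quickhull` and `inside_hull` are hypothetical. The sketch builds a convex hull with the Quickhull recursion and then checks whether a newly arrived point lies inside that hull.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle o-a-b; positive when b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points: List[Point], a: Point, b: Point) -> List[Point]:
    """Hull vertices strictly left of the directed line a->b, ordered from a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the line a->b is guaranteed to be a hull vertex.
    far = max(left, key=lambda p: cross(a, b, p))
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points: List[Point]) -> List[Point]:
    """Convex hull of 2D points, returned in clockwise order (2D Quickhull)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points are hull vertices
    return [a] + _hull_side(pts, a, b) + [b] + _hull_side(pts, b, a)


def inside_hull(p: Point, hull: List[Point]) -> bool:
    """True if p is inside or on the boundary of a clockwise convex hull."""
    if len(hull) < 3:
        return False  # degenerate hull (fewer than 3 vertices) encloses no area
    n = len(hull)
    # For a clockwise polygon, an inner point is never left of any edge.
    return all(cross(hull[i], hull[(i + 1) % n], p) <= 0 for i in range(n))


# Hypothetical cluster of already processed points and two new arrivals.
cluster = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
hull = quickhull(cluster)
print(hull)
print(inside_hull((1, 1), hull), inside_hull((3, 1), hull))
```

In a streaming setting, a test of this kind would let a point that falls inside an existing boundary be assigned to that cluster without a neighbourhood search over all stored points. Whether the authors use the test in exactly this way is an assumption of the sketch.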
Keywords: machine learning, unsupervised learning, hybrid algorithm, clustering, DBSCAN, streaming data, convex
hull, density based, cluster borders, adaptive data frame, dealing with noise, performance check
Page numbers: 36-57.
For citation: Mitin G.V., Panov A.V.
Modification of DBSCAN algorithm using hybrid methods for clusters border detection to process streaming data // Electronic Scientific Journal IT-Standard. – 2023. – No. 4. – pp. 36-57.