Abstract

MODIFICATION OF DBSCAN ALGORITHM USING HYBRID METHODS FOR CLUSTERS BORDER DETECTION TO PROCESS STREAMING DATA
Abstract: This article proposes a new approach to clustering streaming data in a feature space of arbitrary dimension that cuts off outliers, uninformative anomalous data, and other information noise while retaining a memory of all processed data points. To this end, the authors developed an original modification of the DBSCAN algorithm that uses a hybrid approach to find the boundaries of clusters of arbitrary shape and to determine whether each data point lies inside or outside such a boundary. The development drew on both machine learning techniques and mathematical methods, in particular the Quickhull method for computing the convex hull of a finite set of points in n-dimensional space. The resulting algorithm consists of several blocks that are activated depending on the distribution of the data arriving from the input stream. The algorithm guarantees the construction of a closed cluster boundary of arbitrary shape. Using an adaptive frame-splitting mechanism, it can cluster large volumes of data of different dimensions while retaining a memory of all incoming points. As a result, the authors obtained a modification of the DBSCAN algorithm for streaming data that is efficient in both execution speed and memory usage. To illustrate the efficiency gain of the modification over the classic DBSCAN variant, its performance and memory requirements were estimated by calculation, and the correctness of these estimates was confirmed experimentally. The presented modification not only achieves an overall performance gain with lower memory requirements than the classic DBSCAN algorithm, but also offers functional advantages in working efficiently with streaming data in the presence of information noise.
These advantages make the presented modification of the DBSCAN algorithm useful for solving complex problems in streaming data processing systems, such as searching for correlations and anomalies in statistical indicators of distributed data collection systems or for detecting stable states of queuing models used in logistics and transport.
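The boundary test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest case only: two-dimensional points and a convex boundary built with a hand-written Quickhull. The function names `cross`, `quickhull`, and `contains` are hypothetical and are not taken from the article, whose algorithm works in n dimensions and combines the hull with other methods to handle clusters of arbitrary shape.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points: List[Point], a: Point, b: Point) -> List[Point]:
    """Hull vertices strictly left of the directed line a->b, ordered from a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the line a->b is guaranteed to be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points: List[Point]) -> List[Point]:
    """2-D Quickhull; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points are hull vertices
    clockwise = [a] + _hull_side(pts, a, b) + [b] + _hull_side(pts, b, a)
    return clockwise[::-1]


def contains(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on a counter-clockwise convex hull."""
    if len(hull) < 3:
        return False  # degenerate (point or segment) clusters are not handled here
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))
```

In a streaming setting, one plausible use (an assumption, not a description of the authors' implementation) is to keep only the hull vertices of a dense cluster and call `contains` on each incoming point, so that points falling inside an existing boundary need not trigger a full neighborhood search.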
Page numbers: 36-57.
For citation: Mitin G.V., Panov A.V. Modification of DBSCAN algorithm using hybrid methods for clusters border detection to process streaming data // Electronic Scientific Journal IT-Standard. – 2023. – No. 4. – pp. 36-57.