DENSITY BASED SINGLE DIMENSION STREAM DATA CLUSTERING
Annotation: The article considers an original approach to clustering one-dimensional streaming data, based on the principles of density-based clustering. This makes it possible to work under information noise, cutting off outliers and uninformative anomalous data. To implement this approach, the authors developed an algorithm that consists of several functional blocks and searches for one-dimensional cluster boundaries using machine learning techniques. The algorithm makes effective use of information about the appearance of new clusters and preserves only significant data elements, which reduces its computing resource requirements.
To further improve the algorithm's efficiency, the input stream is adaptively split into frames of varying size. These frames are then processed with a heuristic that takes into account the properties of a one-dimensional feature space and the cumulative nature of information about the presence of clusters. The resulting algorithm is efficient in both processing speed and memory usage, and its time complexity approaches linear. The authors also achieved high clustering quality as measured by cluster compactness and separability, criteria that apply to any clustering algorithm based on the density of the data distribution in the feature space. These advantages were confirmed by an experiment on 20 test datasets, whose results are also presented in this work. The presented algorithm occupies a rare niche: clustering streaming data under information noise, optimized for one-dimensional data.
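Compactness and separability are named here as the quality criteria, but their exact formulas are not given in this abstract. The sketch below therefore uses two simple stand-ins, chosen purely for illustration. They are assumptions, not the authors' definitions:

* Compactness is the mean absolute distance from each clustered point to its cluster mean.
* Separability is the smallest empty gap between neighbouring clusters on the number line.

Noise points (label -1) are excluded from both. The helper names `_clusters`, `compactness` and `separability` are hypothetical.

```python
from statistics import mean


def _clusters(values, labels):
    """Group values by cluster label, skipping noise (label -1)."""
    groups = {}
    for v, c in zip(values, labels):
        if c != -1:
            groups.setdefault(c, []).append(v)
    return groups


def compactness(values, labels):
    """Mean absolute distance from each clustered point to its cluster mean.

    Lower is better (tighter clusters); 0.0 if nothing is clustered.
    """
    dists = []
    for pts in _clusters(values, labels).values():
        centre = mean(pts)
        dists.extend(abs(v - centre) for v in pts)
    return mean(dists) if dists else 0.0


def separability(values, labels):
    """Smallest empty gap between neighbouring clusters on the number line.

    Higher is better; inf if there are fewer than two clusters.
    Assumes clusters occupy non-overlapping intervals, as 1-D density-based
    clustering produces.
    """
    spans = sorted((min(p), max(p)) for p in _clusters(values, labels).values())
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    return min(gaps) if gaps else float("inf")
```

Take the values `[1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.9]` labelled `[0, 0, 0, 1, 1, 1, -1]`. Compactness is about 0.067 and separability about 3.8, since the clusters span [1.0, 1.2] and [5.0, 5.2]. Because both measures depend only on labels and positions, they can be applied to the output of any density-based method, in line with the universality claimed above.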
Clustering of streaming data and clustering of one-dimensional data have each been studied by the scientific community for a long time. Their combination, however, has received little attention despite its obvious benefits, for example in searching for stable states or in removing anomalous and noisy values when analyzing one-dimensional signals, sensor readings, and similar data.
Keywords: machine learning, unsupervised learning, clustering, streaming data, information technologies, density-based, single dimension clustering, adaptive data frame, dealing with noise
Page numbers: 18-33.
For citation: Mitin G.V., Panov A.V. Density based single dimension stream data clustering // Electronic Scientific Journal IT-Standard. – 2024. – No. 1. – pp. 18-33.