SCENARIOS STUDY OF DATA PREPROCESSING AND EMBEDDING
INITIALIZATION IN IMPLEMENTATION OF THE PACMAP ALGORITHM
Abstract: Objectives. This paper addresses nonlinear dimensionality reduction with the PaCMAP algorithm. The goal of the study is to explore different scenarios of data preprocessing and embedding initialization in an implementation of PaCMAP. Methods. The basic version of PaCMAP uses PCA, a linear dimensionality reduction algorithm, for both data preprocessing and embedding initialization. This paper explores scenarios of data preprocessing and embedding initialization that use 11 linear and nonlinear dimensionality reduction algorithms within PaCMAP, and compares them in terms of loss function minimization. Results. Experiments on test and real-world datasets show that several dimensionality reduction algorithms have advantages over PCA when used for data preprocessing and embedding initialization. On the examined datasets, the best results in terms of loss function minimization were obtained, in particular, with the UMAP, MSD, and SE algorithms. However, using the MSD algorithm within PaCMAP incurs significant time costs. Conclusions. A number of linear and nonlinear dimensionality reduction algorithms offer advantages over PCA, in terms of loss function minimization, when used for data preprocessing and embedding initialization in PaCMAP. Using PCA within PaCMAP gives the shortest running time, while using MSD gives the longest.
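The comparison criterion named above, loss function minimization, can be illustrated with a minimal sketch. It assumes the loss form from the original PaCMAP publication (near, mid-near, and further pair terms over the scaled distance d = ||y_i - y_j||^2 + 1) with the final-phase weights; the abstract does not state the exact setup used in the study. The function name `pacmap_loss`, the toy pair lists, and the two candidate embeddings are hypothetical.

```python
# Minimal sketch (assumption: loss form of the original PaCMAP paper,
# final-phase weights w_near=1, w_mid=0, w_far=1). Pure Python, no dependencies.

def pacmap_loss(embedding, near_pairs, mid_pairs, far_pairs,
                w_near=1.0, w_mid=0.0, w_far=1.0):
    """Return the PaCMAP loss of a low-dimensional embedding.

    embedding  -- list of points, each a list of coordinates
    *_pairs    -- lists of (i, j) index pairs of each type
    """
    def scaled_dist(i, j):
        # d = squared Euclidean distance in the embedding, plus 1
        return sum((a - b) ** 2 for a, b in zip(embedding[i], embedding[j])) + 1.0

    near = sum(d / (10.0 + d) for d in (scaled_dist(i, j) for i, j in near_pairs))
    mid = sum(d / (10000.0 + d) for d in (scaled_dist(i, j) for i, j in mid_pairs))
    far = sum(1.0 / (1.0 + d) for d in (scaled_dist(i, j) for i, j in far_pairs))
    return w_near * near + w_mid * mid + w_far * far


# Hypothetical toy data: points 0 and 1 are neighbors, point 2 is far from 0.
near_pairs, mid_pairs, far_pairs = [(0, 1)], [(1, 2)], [(0, 2)]

# Two hypothetical candidate initializations of the same three points.
init_a = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]   # neighbors together, far point apart
init_b = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]   # neighbors apart, far point on top

loss_a = pacmap_loss(init_a, near_pairs, mid_pairs, far_pairs)
loss_b = pacmap_loss(init_b, near_pairs, mid_pairs, far_pairs)
better = "init_a" if loss_a < loss_b else "init_b"
```

In this toy case the initialization that keeps the neighbor pair together and the further pair apart starts from a lower loss. Comparing real scenarios would additionally require running the optimization from each initialization on pairs sampled from the actual dataset.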
Keywords: dimensionality reduction algorithm, PaCMAP, dataset, visualization, data preprocessing, embedding initialization, loss function.
Page numbers: 102-123.
For citation: Andrianova E.G., Demidov N.A. Scenarios study of data preprocessing and embedding
initialization in implementation of the PaCMAP algorithm // Electronic Scientific Journal IT-Standard. – 2025. – No. 4. – pp. 102-123.