Scientific journal

ISSN 1814-2400

INFORMATION SCIENCE AND CONTROL SYSTEMS

Lapko A. V., Lapko V. A.

NONPARAMETRIC PATTERN RECOGNITION ALGORITHM IN TESTING HYPOTHESES ABOUT THE IDENTITY OF THE LAWS OF MULTIDIMENSIONAL RANDOM VARIABLES DISTRIBUTION UNDER LARGE VOLUME OF STATISTICAL DATA

A new method for testing hypotheses about the distributions of large volumes of multidimensional statistical data is considered. It is shown that testing the hypothesis that two distribution laws of multidimensional random variables are identical can be replaced by testing the hypothesis that the pattern recognition error probability equals 0.5. This hypothesis is tested either by confidence estimation of the pattern recognition error probability or by the Kolmogorov criterion. The training sample is formed from the statistical data of the distribution laws being compared. For large volumes of statistical data, the nonparametric pattern recognition algorithm is synthesized from regression estimates of the probability densities of the random variables in each class. The proposed pattern recognition algorithms reduce the size of the training sample by decomposing the range of values of the random variables. A method for selecting the optimal parameters for decomposing the range of values of independent random variables is considered. The closest analog of the proposed approach is the Pearson criterion. The effectiveness of the results is confirmed by their application to the analysis of remote sensing data.

Keywords: statistical hypothesis testing, multidimensional random variables, nonparametric pattern recognition algorithm, kernel probability density estimation, discretization of the range of values of random variables, Pearson's criterion
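
The central idea of the abstract, replacing the identity hypothesis with the hypothesis that the recognition error probability equals 0.5, can be illustrated with a minimal sketch. The sketch is not the authors' algorithm. It assumes a plain Gaussian product-kernel classifier with a shared, hand-picked bandwidth instead of the regression density estimates built on a decomposed value range, a hold-out split for estimating the error, equal class sizes, and a normal-approximation confidence interval. The names `kde_log_score` and `identity_test` and all parameter values are hypothetical.

```python
import numpy as np

def kde_log_score(train, points, bandwidth):
    """Log of a Gaussian product-kernel density estimate, up to a constant.

    The normalizing constant is omitted: it is the same for both classes
    because the bandwidth is shared (an assumption of this sketch).
    """
    diff = (points[:, None, :] - train[None, :, :]) / bandwidth
    log_k = -0.5 * np.sum(diff ** 2, axis=2)       # shape (n_points, n_train)
    m = log_k.max(axis=1, keepdims=True)           # stabilizes the exponent
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()

def identity_test(x, y, bandwidth=0.5, z=1.96, seed=0):
    """Hold-out check of H0: both samples follow the same distribution law.

    Returns the estimated recognition error, its approximate confidence
    interval, and True if 0.5 lies outside that interval (H0 rejected).
    """
    rng = np.random.default_rng(seed)
    n = min(len(x), len(y))                        # equal class sizes, equal priors
    x = x[rng.permutation(len(x))[:n]]
    y = y[rng.permutation(len(y))[:n]]
    half = n // 2
    x_train, x_test = x[:half], x[half:]
    y_train, y_test = y[:half], y[half:]

    test = np.vstack([x_test, y_test])
    labels = np.r_[np.zeros(len(x_test)), np.ones(len(y_test))]

    # Assign each test point to the class with the larger density estimate.
    pred = (kde_log_score(y_train, test, bandwidth)
            > kde_log_score(x_train, test, bandwidth)).astype(float)

    err = float(np.mean(pred != labels))
    half_width = z * np.sqrt(err * (1.0 - err) / len(labels))
    reject = abs(err - 0.5) > half_width
    return err, (err - half_width, err + half_width), bool(reject)

# Usage on synthetic two-dimensional data (illustrative values only).
rng = np.random.default_rng(1)
sample_a = rng.normal(0.0, 1.0, size=(400, 2))
sample_b = rng.normal(3.0, 1.0, size=(400, 2))
err, interval, reject = identity_test(sample_a, sample_b)
```

If the two samples come from the same law, no classifier can do better than guessing between equally likely classes, so the estimated error stays near 0.5 and the interval is expected to cover it. A clear separation between the laws drives the error toward zero and the hypothesis is rejected.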