Anal Chem. 2020 Feb 04;92(3):2656-2664. doi: 10.1021/acs.analchem.9b04611. Epub 2020 Jan 13.

Detection of Outliers in Projection-Based Modeling.

Analytical chemistry

Oxana Ye Rodionova, Alexey L Pomerantsev

Affiliations

  1. N. N. Semenov Federal Research Center for Chemical Physics, RAS, Kosygin str. 4, 119991 Moscow, Russia.

PMID: 31880430 DOI: 10.1021/acs.analchem.9b04611

Abstract

Previously, we introduced an approach for calculating the full object distance within the framework of Principal Component Analysis that can be applied to data exploration and classification. Now, a similar approach has been developed for regression problems, in which a total distance can be calculated for every sample in projection modeling. Based on the total distance, a threshold for outlier detection has been developed by means of a data-driven estimation of the degrees of freedom and scaling parameters for the partial distances in the projection models. A joint threshold is used as the basis for a sequential outlier detection procedure. The iterative nature of the procedure helps to overcome the masking effect among outliers, and a backward step eliminates swamping effects. Two real examples are used for illustration. The first dataset represents capsules filled with specially prepared mixtures of an active pharmaceutical ingredient and a number of excipients. This dataset is used to illustrate the behavior of possible outliers in the regression model and their corresponding locations in the X- and XY-distance plots. The second dataset consists of spectra of 135 whole wheat samples used for the prediction of protein, gluten, and moisture content. This dataset is used to demonstrate the step-by-step application of the sequential procedure for outlier detection.
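
The sequential idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each partial distance is treated as a scaled chi-square variable, its scale and degrees of freedom are estimated by the method of moments, the chi-square quantile is taken from the Wilson-Hilferty approximation, and the partial distances are precomputed and held fixed (a full procedure would presumably refit the projection model after each removal). All function names, the `gamma` level, and the demo data are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance


def moment_estimates(values):
    """Method-of-moments scale and degrees of freedom, assuming the values
    follow a scaled chi-square distribution (an assumption of this sketch)."""
    m = mean(values)
    v = pvariance(values)
    if v == 0:
        raise ValueError("distances have zero variance")
    return m, max(1.0, 2.0 * m * m / v)


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def total_distances(h, q, fit_idx):
    """Total distance of every sample, with parameters estimated on fit_idx only."""
    h0, nh = moment_estimates([h[i] for i in fit_idx])
    q0, nq = moment_estimates([q[i] for i in fit_idx])
    f = [nh * hi / h0 + nq * qi / q0 for hi, qi in zip(h, q)]
    return f, nh + nq


def outlier_threshold(n, dof, gamma):
    """Joint threshold at outlier significance gamma for n retained samples."""
    return chi2_quantile((1.0 - gamma) ** (1.0 / n), dof)


def sequential_outliers(h, q, gamma=0.01):
    """Remove the most extreme sample one at a time (forward pass), then
    re-check every removed sample against the final parameters (backward step)."""
    kept = list(range(len(h)))
    removed = []
    while len(kept) > 3:
        f, dof = total_distances(h, q, kept)
        worst = max(kept, key=lambda i: f[i])
        if f[worst] <= outlier_threshold(len(kept), dof, gamma):
            break
        kept.remove(worst)
        removed.append(worst)
    # Backward step: samples removed too eagerly (swamping) are readmitted.
    f, dof = total_distances(h, q, kept)
    crit = outlier_threshold(len(kept), dof, gamma)
    return sorted(i for i in removed if f[i] > crit)


# Synthetic demo distances (not from the paper): 40 regular samples, 1 gross outlier.
h_demo = [0.5 + (i % 7) * 0.2 for i in range(40)] + [25.0]
q_demo = [0.4 + (i % 5) * 0.3 for i in range(40)] + [25.0]
print(sequential_outliers(h_demo, q_demo))
```

Re-estimating the parameters after every removal is what lets such a loop cope with masking: a gross outlier inflates the variance of the distances and so distorts the estimated degrees of freedom, and once it is removed the threshold tightens around the remaining samples.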

Publication Types