A CLASSIFICATION OF OUTLIERS IN TRANSFORMED VARIABLES
Abstract
The diagnosis of outliers is essential because outliers can cause serious interpretive problems in both linear and nonlinear regression analysis. Much work has been done on identifying outliers in linear regression, but comparatively little in nonlinear regression. In practice, the assumptions of linear regression are often violated, for example when highly influential outliers are present in the dataset, and this adversely affects the validity of the statistical analysis. Detecting outliers matters because they have a disproportionate impact on the computed values of various estimates, which leads to invalid inferences and inaccurate predictions. Outliers must be classified into vertical outliers (VO), good leverage points (GLP), and bad leverage points (BLP), since only vertical outliers and bad leverage points have an undue effect on parameter estimates. We compare several outlier detection techniques that use a robust diagnostic plot to correctly classify good and bad leverage points and vertical outliers while reducing both masking and swamping effects, for both untransformed and transformed variables. The main idea is to detect outliers both before transformation (in the original data) and after transformation. The results of a simulation study and numerical examples indicate that the plot of the modified generalized DFFITS (difference in fits) against the Diagnostic Robust Generalized Potential (MGDFF-DRGP) successfully detects outliers in the data.
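The classification step described above can be sketched as the quadrant rule of a robust diagnostic plot. An observation with a large influence/residual score and low leverage is a vertical outlier, one with low score and high leverage is a good leverage point, and one with both is a bad leverage point. This is a minimal sketch, assuming the two diagnostic measures (for example MGDFF values on the vertical axis and DRGP values on the horizontal axis) and their cutoffs have already been computed elsewhere. The formulas for MGDFF and DRGP are not reproduced here, and the names `classify_points`, `changed_labels`, `fit_cutoff`, and `leverage_cutoff` are hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class PointType(Enum):
    REGULAR = "regular"
    VERTICAL_OUTLIER = "VO"
    GOOD_LEVERAGE = "GLP"
    BAD_LEVERAGE = "BLP"


def classify_points(
    fit_measure: Sequence[float],
    leverage_measure: Sequence[float],
    fit_cutoff: float,
    leverage_cutoff: float,
) -> List[PointType]:
    """Assign each observation to a quadrant of the diagnostic plot.

    fit_measure      -- hypothetical per-observation influence score
                        (stands in for MGDFF; compared in absolute value)
    leverage_measure -- hypothetical per-observation leverage score
                        (stands in for DRGP)
    The cutoffs are assumed to be supplied by the user.
    """
    if len(fit_measure) != len(leverage_measure):
        raise ValueError("both measures must have one value per observation")
    labels: List[PointType] = []
    for f, lev in zip(fit_measure, leverage_measure):
        large_fit = abs(f) > fit_cutoff
        high_lev = lev > leverage_cutoff
        if large_fit and high_lev:
            labels.append(PointType.BAD_LEVERAGE)      # influential leverage point
        elif large_fit:
            labels.append(PointType.VERTICAL_OUTLIER)  # outlying in y only
        elif high_lev:
            labels.append(PointType.GOOD_LEVERAGE)     # outlying in x, fits the model
        else:
            labels.append(PointType.REGULAR)
    return labels


def changed_labels(before: Sequence[PointType], after: Sequence[PointType]) -> List[int]:
    """Indices whose class differs between original and transformed data."""
    if len(before) != len(after):
        raise ValueError("label sequences must have the same length")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]
```

To compare the untransformed and transformed fits, one would compute the two measures on each version of the data, classify both, and pass the two label lists to `changed_labels` to see which observations change class.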
It is the policy of the Journal of Duhok University to own the copyright of the technical contributions. It publishes and facilitates the appropriate reuse of the published materials by others. Photocopying for individual use is permitted, with credit and reference to the source.
Copyright © 2017. All Rights Reserved.



