Authors :
Sanjana Rajamani; Seena Thomas
Volume/Issue :
Volume 8 - 2023, Issue 10 - October
Google Scholar :
https://tinyurl.com/267js72n
Scribd :
https://tinyurl.com/y6r3c2w4
DOI :
https://doi.org/10.5281/zenodo.10053050
Abstract :
Finding ways to impute the missing values in a data set
is an essential part of research. Missingness is often
unavoidable and may arise from natural or non-natural
causes. It is especially common in longitudinal and
multilevel studies, where it can lead to biased estimates,
loss of power, increased variability, and inaccurate
results. This study used a complete data set of resistance
scores of intellectually disabled children who received
behavioral skill training to compare various imputation
techniques. The secondary data were longitudinal: the
resistance score was recorded before the training and at
four time points after it. Missingness was randomly
introduced into the complete data at varying percentages
(5%, 10%, 15%, 20%, and 30%) under the missing at random
(MAR) mechanism. The imputed values were then compared
with the full data using a linear mixed model. Several
models were built with multiple imputation and machine
learning techniques to impute the features used to predict
the resistance score, using coefficients taken from the
real data, and the same procedure was applied to simulated
data. The machine learning methods were best suited to
imputing the missing values and led to a significant
improvement in prognosis accuracy compared with multiple
imputation techniques using linear mixed models.
Keywords :
Multiple Imputation, MAR Mechanisms, Machine Learning Techniques, Linear Mixed Effect Model.
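The abstract does not say how the MAR missingness was generated, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes that the chance of a post-training score being deleted depends on the fully observed pre-training score through a logistic weight, rescaled so that the expected share of deleted values matches the target rate. The function name `make_mar_missing` and the field names `baseline` and `followup` are hypothetical and were chosen for illustration.

```python
import math
import random


def make_mar_missing(rows, target_rate, seed=0):
    """Return copies of rows with some 'followup' values set to None.

    Assumption (not from the paper): missingness in 'followup' depends only
    on the fully observed 'baseline' score, which is what makes it MAR.
    """
    rng = random.Random(seed)
    baselines = [r["baseline"] for r in rows]
    mean = sum(baselines) / len(baselines)
    var = sum((b - mean) ** 2 for b in baselines) / len(baselines)
    sd = math.sqrt(var) or 1.0  # guard against a constant baseline

    # Logistic weight: a higher baseline gives a higher chance of deletion.
    weights = [1.0 / (1.0 + math.exp(-(b - mean) / sd)) for b in baselines]
    # Rescale so the expected fraction of deleted values is target_rate.
    scale = target_rate * len(rows) / sum(weights)

    masked = []
    for row, weight in zip(rows, weights):
        new_row = dict(row)  # leave the complete data untouched
        if rng.random() < min(1.0, weight * scale):
            new_row["followup"] = None
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    # Synthetic scores for illustration only; these are not the study data.
    gen = random.Random(42)
    complete = [
        {"baseline": gen.gauss(50, 10), "followup": gen.gauss(40, 10)}
        for _ in range(500)
    ]
    for rate in (0.05, 0.10, 0.15, 0.20, 0.30):
        masked = make_mar_missing(complete, rate, seed=1)
        share = sum(r["followup"] is None for r in masked) / len(masked)
        print(f"target {rate:.0%}, realised {share:.1%}")
```

Because the deletion probability uses only the observed baseline, the mechanism is MAR rather than MCAR or MNAR. The realised share of missing values varies around the target from one seed to the next, since each value is deleted independently.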