Missing data
There are several ways to deal with missing data, including imputation or removal. Handle missing data in Excel using the XLSTAT add-on statistical software.
Which are the types of missing values?
There are three types of missing values (Allison, 2001): data missing completely at random (MCAR), data missing at random (MAR) and data not missing at random (NMAR).
Data is missing completely at random (MCAR) if the event that leads to a missing data is independent of observable variables and of unobservable parameters. It should occur entirely at random. When data are MCAR, the analyses performed on the data are unbiased.
Data is missing at random (MAR) when the event that leads to a missing data is related to a particular variable, but it is not related to the value of the variable that has missing data. This is the most common case.
Data is not missing at random (NMAR) when data is missing for a particular reason. An example of this is the filtered questions in a questionnaire (the question is only intended for some respondents, the others are missing).
Options for handling missing values with XLSTAT
Imputation methods
Most XLSTAT functions (anova, pca, regression, etc) include options to handle missing data. However, only few approaches are available. This tool allows you to complete or clean your dataset using advanced missing value treatment methods.
The methods available in this tool correspond to the MCAR and MAR cases.
Different methods are available depending on your needs and data:
For quantitative data, XLSTAT allows you to:
Remove observations with missing values.
Use a mean imputation method.
Use a nearest neighbor approach.
Replace missing values by a given numeric value.
Use the NIPALS algorithm.
Use an MCMC multiple imputation algorithm.
Use the EM (Expectation Maximization) algorithm for data following a multivariate normal distribution.
For qualitative data, XLSTAT allows you to:
Remove the observations with missing value.
Use a mode imputation method.
Use a nearest neighbor approach.
Replace missing values by a given textual value.
Use the NIPALS algorithm.
Multiple Correspondence Analysis for missing data
The MCA results option in the Missing data dialog box helps you better understand the patterns of missing values within a data set. To accomplish this, a multiple correspondence analysis (MCA) is performed.
In the output sheet, a factorial map is displayed that illustrates the links between variables with missing data and those without missing data. For each variable, modality '0' represents the present data while modality '1' models the missing data.