

Assumptions for Statistical Tests

As we can see throughout this website, most of the statistical tests we perform are based on a set of assumptions. When these assumptions are violated, the results of the analysis can be misleading or completely erroneous. Typical assumptions are:

• Normality: Data have a normal distribution (or at least are symmetric)
• Homogeneity of variances: Data from multiple groups have the same variance
• Linearity: Data have a linear relationship
• Independence: Data are independent

We explore in detail what it means for data to be normally distributed in Normal Distribution, but in general it means that the graph of the data has the shape of a bell curve. Such data are symmetric around their mean and have kurtosis equal to zero. In Testing for Normality and Symmetry we provide tests to determine whether data meet this assumption.

Some tests (e.g. ANOVA) require that the groups of data being studied have the same variance. In Homogeneity of Variances we provide some tests for determining whether groups of data have the same variance.

Some tests (e.g. Regression) require that there be a linear correlation between the dependent and independent variables. Generally, linearity can be tested graphically using scatter diagrams or via other techniques explored in Correlation, Regression and Multiple Regression.

We touch on the notion of independence in Definition 3 of Basic Probability Concepts. In general, data are independent when there is no correlation between them (see Correlation). Many tests require that data be randomly sampled, with each data element selected independently of those previously selected. For example, if we measure the monthly weight of 10 people over the course of 5 months, these 50 observations are not independent, since repeated measurements from the same people are correlated. Similarly, the IQs of 20 married couples do not constitute 40 independent observations.
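The normality and homogeneity-of-variance checks described above can be sketched in Python. This is a minimal illustration using SciPy's Shapiro-Wilk and Levene tests; the data and group names are invented for the example and are not from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10, scale=2, size=30)  # simulated, roughly normal data
group_b = rng.normal(loc=12, scale=2, size=30)

# Normality: Shapiro-Wilk test (H0: the sample comes from a normal distribution)
w_stat, w_p = stats.shapiro(group_a)
print(f"Shapiro-Wilk p = {w_p:.3f}")  # a large p gives no evidence against normality

# Homogeneity of variances: Levene's test (H0: the groups have equal variances)
l_stat, l_p = stats.levene(group_a, group_b)
print(f"Levene p = {l_p:.3f}")
```

A graphical check (histogram or normal probability plot) is usually worth doing alongside these tests, since with small samples the tests have little power and with large samples they flag trivial departures.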
Almost all of the most commonly used statistical tests rely on adherence to some distribution function (such as the normal distribution). Such tests are called parametric tests. Sometimes when one of the key assumptions of such a test is violated, a non-parametric test can be used instead. Such tests don't rely on a specific probability distribution function (see Non-parametric Tests). Another approach for addressing problems with assumptions is transforming the data (see Transformations).

It's safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses. Nonparametric tests are also called distribution-free tests because they don't assume that your data follow a specific distribution. You may have heard that you should use nonparametric tests when your data don't meet the assumptions of the parametric test, especially the assumption about normally distributed data. That sounds like a nice and straightforward way to choose, but there are additional considerations.

In this post, I'll help you determine when you should use a:

• Parametric analysis to test group means.
• Nonparametric analysis to test group medians.

In particular, I'll focus on an important reason to use nonparametric tests that I don't think gets mentioned often enough!

Hypothesis Tests of the Mean and Median

Nonparametric tests are like a parallel universe to parametric tests. The table shows related pairs of hypothesis tests that Minitab statistical software offers.
Parametric tests (means) | Nonparametric tests (medians)
1-sample t test | 1-sample Sign, 1-sample Wilcoxon
2-sample t test | Mann-Whitney test
One-Way ANOVA | Kruskal-Wallis, Mood's median test
Factorial DOE with one factor and one blocking variable | Friedman test

Reasons to Use Parametric Tests

Reason 1: Parametric tests can perform well with skewed and nonnormal distributions

This may be a surprise, but parametric tests can perform well with continuous data that are nonnormal if you satisfy the sample size guidelines in the table below. These guidelines are based on simulation studies conducted by statisticians here at Minitab. To learn more about these studies, read our Technical Papers.

Parametric analysis | Sample size guidelines for nonnormal data
1-sample t test | Greater than 20
2-sample t test | Each group should be greater than 15
One-Way ANOVA | If you have 2-9 groups, each group should be greater than 15; if you have 10-12 groups, each group should be greater than 20

Reason 2: Parametric tests can perform well when the spread of each group is different

While nonparametric tests don't assume that your data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If your groups have a different spread, the nonparametric tests might not provide valid results. On the other hand, if you use the 2-sample t test or One-Way ANOVA, you can simply go to the Options subdialog and uncheck Assume equal variances. Voilà, you're good to go even when the groups have different spreads!

Reason 3: Statistical power

Parametric tests usually have more statistical power than nonparametric tests. Thus, you are more likely to detect a significant effect when one truly exists.
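To make one of the paired rows in the table concrete, here is a short sketch (assuming SciPy is available; the sample values are simulated, not from the post) that runs a 2-sample t test and its nonparametric counterpart, the Mann-Whitney test, on the same two groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(5.0, 1.0, size=25)   # simulated group 1
y = rng.normal(5.8, 1.0, size=25)   # simulated group 2, with a shifted mean

# Parametric: 2-sample t test on means (Welch's form does not assume equal variances)
t_res = stats.ttest_ind(x, y, equal_var=False)

# Nonparametric counterpart: Mann-Whitney test on the same groups
u_res = stats.mannwhitneyu(x, y, alternative="two-sided")

print(f"t test p = {t_res.pvalue:.4f}, Mann-Whitney p = {u_res.pvalue:.4f}")
```

With well-behaved data like this, the two tests usually agree; the interesting cases, discussed below, are where they diverge.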
Reasons to Use Nonparametric Tests

Reason 1: Your area of study is better represented by the median

This is my favorite reason to use a nonparametric test, and the one that isn't mentioned often enough! The fact that you can perform a parametric test with nonnormal data doesn't imply that the mean is the best measure of the central tendency for your data. For example, the center of a skewed distribution, like income, can be better measured by the median, where 50% are above the median and 50% are below. If you add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn't change. When your distribution is skewed enough, the mean is strongly affected by changes far out in the distribution's tail, whereas the median continues to more closely reflect the center of the distribution. For these two distributions, a random sample of 100 from each distribution produces means that are significantly different, but medians that are not significantly different.

Two of my colleagues have written excellent blog posts that illustrate this point:

• Michelle Paret: Using the Mean in Data Analysis: It's Not Always a Slam-Dunk
• Redouane Kouiden: The Non-parametric Economy: What Does Average Actually Mean?

Reason 2: You have a very small sample size

If you don't meet the sample size guidelines for the parametric tests and you are not confident that you have normally distributed data, you should use a nonparametric test. When you have a really small sample, you might not even be able to ascertain the distribution of your data, because the distribution tests will lack sufficient power to provide meaningful results. In this scenario, you're in a tough spot with no valid alternative. Nonparametric tests have less power to begin with, and it's a double whammy when you add a small sample size on top of that!
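The billionaire effect on the mean versus the median is easy to verify numerically. A minimal sketch with NumPy, using invented income figures:

```python
import numpy as np

# Hypothetical annual incomes, in thousands of dollars
incomes = np.array([35.0, 42.0, 48.0, 51.0, 55.0, 62.0, 70.0, 88.0])
print(np.mean(incomes), np.median(incomes))      # mean and median are close together

# Add one billionaire ($1,000,000 thousand = $1 billion)
with_billionaire = np.append(incomes, 1_000_000.0)
print(np.mean(with_billionaire), np.median(with_billionaire))
# The mean explodes past $100 million, while the median barely moves
```

This is exactly why the median is the conventional summary for income and other heavily right-skewed quantities.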
Reason 3: You have ordinal data, ranked data, or outliers that you can't remove

Typical parametric tests can only assess continuous data, and their results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data without being seriously affected by outliers. Be sure to check the assumptions for the nonparametric test, because each one has its own data requirements. If you have Likert data and want to compare two groups, read my post Best Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney.

Closing Thoughts

It's commonly thought that the need to choose between a parametric and nonparametric test occurs when your data fail to meet an assumption of the parametric test. This can be the case when you have both a small sample size and nonnormal data. However, other considerations often play a role, because parametric tests can often handle nonnormal data. Conversely, nonparametric tests have strict assumptions that you can't disregard. The decision often depends on whether the mean or median more accurately represents the center of your data's distribution.

• If the mean accurately represents the center of your distribution and your sample size is large enough, consider a parametric test, because parametric tests are more powerful.
• If the median better represents the center of your distribution, consider the nonparametric test even when you have a large sample.

Finally, if you have a very small sample size, you might be stuck using a nonparametric test. Please, collect more data next time if it is at all possible! As you can see, the sample size guidelines aren't really that large. Your chance of detecting a significant effect when one exists can be very small when you have both a small sample size and you need to use a less efficient nonparametric test!

Data Not Normal?
Try Letting It Be, with a Nonparametric Hypothesis Test

Eston Martz, 22 August 2016

So the data you nurtured, that you worked so hard to format and make useful, failed the normality test. Time to face the truth: despite your best efforts, that data set is never going to measure up to the assumption you may have been trained to fervently look for. Your data's lack of normality seems to make it poorly suited for analysis. Now what?

Take it easy. Don't get uptight. Just let your data be what they are, go to the Stat menu in Minitab Statistical Software, and choose "Nonparametrics." If you're stymied by your data's lack of normality, nonparametric statistics might help you find answers. And if the word "nonparametric" looks like five syllables' worth of trouble, don't be intimidated; it's just a big word that usually refers to "tests that don't assume your data follow a normal distribution." In fact, nonparametric statistics don't assume your data follow any distribution at all. The following table lists common parametric tests, their equivalent nonparametric tests, and the main characteristics of each.

Nonparametric analyses free your data from the straitjacket of the normality assumption. So choosing a nonparametric analysis is sort of like removing your data from a stifling, conformist environment and putting it into a judgment-free, groovy idyll, where your data set can just be what it is, with no hassles about its unique and beautiful shape. How cool is that, man? Can you dig it?

Of course, it's not quite that carefree. Just as the 1960s encompassed both Woodstock and Altamont, nonparametric tests offer both compelling advantages and serious limitations.
Advantages of Nonparametric Tests

Both parametric and nonparametric tests draw inferences about populations based on samples, but parametric tests focus on sample parameters like the mean and the standard deviation and make various assumptions about your data: for example, that they follow a normal distribution, and that samples include a minimum number of data points. In contrast, nonparametric tests are unaffected by the distribution of your data.

Nonparametric tests also accommodate many conditions that parametric tests do not handle, including small sample sizes, ordered outcomes, and outliers. Consequently, they can be used in a wider range of situations and with more types of data than traditional parametric tests. Many people also feel that nonparametric analyses are more intuitive.

Drawbacks of Nonparametric Tests

But nonparametric tests are not completely free from assumptions; they do require data to be an independent random sample, for example. And nonparametric tests aren't a cure-all. For starters, they typically have less statistical power than their parametric equivalents. Power is the probability that you will correctly reject the null hypothesis when it is false, so you have an increased chance of making a Type II error with these tests. In practical terms, that means nonparametric tests are less likely to detect an effect or association when one really exists. So if you want to draw conclusions with the same confidence level you'd get using an equivalent parametric test, you will need larger sample sizes.

Nonparametric tests are not a one-size-fits-all solution for non-normal data, but they can yield good answers in situations where parametric statistics just won't work.

Is Parametric or Nonparametric the Right Choice for You?

I've briefly outlined differences between parametric and nonparametric hypothesis tests, looked at which tests are equivalent, and considered some of their advantages and disadvantages.
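The power gap between a parametric test and its nonparametric equivalent can be illustrated with a small simulation. This sketch (assuming SciPy; the sample size, mean shift, and trial count are arbitrary choices for illustration) draws many pairs of normal samples with a real mean difference and counts how often each test detects it at the 0.05 level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, shift, trials, alpha = 20, 0.8, 500, 0.05
t_hits = mw_hits = 0

for _ in range(trials):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(shift, 1.0, n)   # a real effect exists in every trial
    if stats.ttest_ind(x, y).pvalue < alpha:
        t_hits += 1                 # t test detected the effect
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        mw_hits += 1                # Mann-Whitney detected the effect

print(f"t test power ~ {t_hits / trials:.2f}")
print(f"Mann-Whitney power ~ {mw_hits / trials:.2f}")
```

For normally distributed data like this, the t test tends to come out slightly ahead; for heavy-tailed data the ranking can reverse, which is why the choice depends on your data.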
If you're waiting for me to tell you which direction you should choose...well, all I can say is, "It depends..." But I can give you some established rules of thumb to consider when you're looking at the specifics of your situation. Keep in mind that nonnormal data do not immediately disqualify your data for a parametric test.

What's your sample size?

As long as a certain minimum sample size is met, most parametric tests will be robust to violations of the normality assumption. For example, the Assistant in Minitab (which uses Welch's t-test) points out that while the 2-sample t-test is based on the assumption that the data are normally distributed, this assumption is not critical when the sample sizes are at least 15. And Bonett's 2-sample standard deviation test performs well for nonnormal data even when sample sizes are as small as 20.

In addition, while they may not require normal data, many nonparametric tests have other assumptions that you can't disregard. For example, the Kruskal-Wallis test assumes your samples come from populations that have similar shapes and equal variances. And the 1-sample Wilcoxon test does not assume a particular population distribution, but it does assume the distribution is symmetric.

In most cases, your choice between parametric and nonparametric tests ultimately comes down to sample size, and whether the center of your data's distribution is better reflected by the mean or the median.

• If the mean accurately represents the center of your distribution and your sample size is large enough, a parametric test offers you better accuracy and more power.
• If your sample size is small, you'll likely need to go with a nonparametric test.
• But if the median better represents the center of your distribution, a nonparametric test may be a better option, even for a large sample.

Comparison Chart

BASIS FOR COMPARISON | PARAMETRIC TEST | NONPARAMETRIC TEST
Meaning | A statistical test in which specific assumptions are made about the population parameter | A statistical test used in the case of non-metric independent variables
Basis of test statistic | Distribution | Arbitrary
Measurement level | Interval or ratio | Nominal or ordinal
Measure of central tendency | Mean | Median
Information about population | Completely known | Unavailable
Applicability | Variables | Variables and attributes
Correlation test | Pearson | Spearman

Definition of Parametric Test

The parametric test is a hypothesis test that provides generalisations for making statements about the mean of the parent population. The t-test, based on Student's t-statistic, is often used in this regard. The t-statistic rests on the underlying assumptions that the variable is normally distributed and the mean is known (or assumed to be known). The population variance is calculated from the sample. It is assumed that the variables of interest in the population are measured on an interval scale.

Definition of Nonparametric Test

The nonparametric test is defined as the hypothesis test which is not based on underlying distributional assumptions, i.e.
it does not require the population's distribution to be characterized by specific parameters. The test is mainly based on differences in medians; hence, it is alternately known as the distribution-free test. The test assumes that the variables are measured on a nominal or ordinal level. It is used when the independent variables are non-metric.

Key Differences Between Parametric and Nonparametric Tests

The fundamental differences between parametric and nonparametric tests are discussed in the following points:

1. A statistical test in which specific assumptions are made about the population parameter is known as a parametric test. A statistical test used in the case of non-metric independent variables is called a nonparametric test.
2. In the parametric test, the test statistic is based on a distribution. On the other hand, the test statistic is arbitrary in the case of the nonparametric test.
3. In the parametric test, it is assumed that the variables of interest are measured on an interval or ratio level, whereas in the nonparametric test the variables of interest are measured on a nominal or ordinal scale.
4. In general, the measure of central tendency in the parametric test is the mean, while in the nonparametric test it is the median.
5. In the parametric test, there is complete information about the population. Conversely, in the nonparametric test, there is no information about the population.
6. The parametric test is applicable to variables only, whereas the nonparametric test applies to both variables and attributes.
7. For measuring the degree of association between two quantitative variables, Pearson's coefficient of correlation is used in the parametric test, while Spearman's rank correlation is used in the nonparametric test.
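The Pearson-versus-Spearman contrast mentioned above is easy to demonstrate with a monotonic but nonlinear relationship. In this sketch (assuming SciPy; the cubic data are purely illustrative), Spearman's rank correlation is exactly 1 because the ranks agree perfectly, while Pearson's coefficient falls short because the relationship is not linear:

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 21.0)   # 1, 2, ..., 20
y = x ** 3                 # monotonic but strongly nonlinear

r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)

print(f"Pearson r  = {r_pearson:.3f}")   # below 1: measures only linear association
print(f"Spearman r = {r_spearman:.3f}")  # exactly 1: measures monotonic association
```

This is why Spearman's coefficient is the usual choice for ordinal or ranked data, where only the ordering is meaningful.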
Hypothesis Tests Hierarchy

Equivalent Tests

PARAMETRIC TEST | NON-PARAMETRIC TEST
Independent samples t test | Mann-Whitney test
Paired samples t test | Wilcoxon signed-rank test
One-way Analysis of Variance (ANOVA) | Kruskal-Wallis test
One-way repeated measures Analysis of Variance | Friedman's ANOVA

Conclusion

The choice between a parametric and a nonparametric test is not easy for a researcher conducting statistical analysis. For hypothesis testing, if the information about the population is completely known, by way of parameters, then the test is a parametric test; if there is no knowledge about the population and the hypothesis must still be tested, then the test conducted is a nonparametric test.

Read more: http://keydifferences.com/difference-between-parametric-and-nonparametric-test.html
