Describing Variables and Making Comparisons
After a concept is operationalized and the data is put into a variable, we need to be able to describe that variable to other people. To do this, we use several measures called summary statistics, or univariate measures. Different summary statistics apply to each type of variable: interval and categorical. Categorical variables in turn come in two types, ordinal and nominal, and each of these has its own appropriate summary statistics. Stata 8 is a very useful tool for gathering summary statistics; the key is knowing which statistics to request for each type of variable.
An interval, or continuous, variable is one in which the data is not placed into groups. Rather, it is measured on a scale on which the distance between values is exactly the same for each gap (1 year, 1 pound, 1 dollar, etc.), so the difference between 20 and 40 is the same size as the difference between 40 and 60. When the scale also has a true zero, as age, weight, and income do, it makes sense to say that a value of 60 is twice that of 30, or that 20 is half of 40. Examples of interval variables include age, weight, income, temperature, and so on.
A categorical, or discrete, variable is one in which the data are placed into groups. There are two types of categorical variables: nominal and ordinal. Nominal variables are ones where the data are placed into groups based on name alone; there is no order to the groups. Thus, a question asking a person's gender places the respondent into a nominal group: either male or female. It doesn't matter whether male or female is coded as 1, because the order of the groups has no meaning.
In ordinal variables, however, the order does matter. Political ideology is a common ordinal variable. Its groups are: 1) strong liberal, 2) weak liberal, 3) moderate, 4) weak conservative, 5) strong conservative. There is an order to these groups; as the group numbers go up, the respondent gets more conservative. However, the distance between groups is not fixed: there is no reason to assume that the difference between a weak liberal (2) and a moderate (3) equals the difference between a weak conservative (4) and a strong conservative (5). Nor does it make sense to say that a weak conservative (4) is twice as conservative as a weak liberal (2).
For each variable, you will want to report its "average" value; in statistics, the concept of an average value is referred to as a variable's central tendency. You will also want to mention the dispersion, or spread, of a variable. Additionally, you should include the number of cases, maximum value, and minimum value for each variable.
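A minimal Python sketch of the three basic counts follows. It is not Stata code; in Stata, `summarize` reports the number of observations, minimum, and maximum along with the mean and standard deviation. The helper name `basic_summary`, the made-up ages, and the convention of storing missing answers as `None` are all assumptions for illustration.

```python
def basic_summary(values):
    """Return the number of cases, minimum, and maximum of a variable.

    Assumption: missing answers are stored as None and are dropped first.
    """
    observed = [v for v in values if v is not None]
    return {"n": len(observed), "min": min(observed), "max": max(observed)}


# Hypothetical ages for eight respondents; one answer is missing.
ages = [23, 35, 41, None, 58, 62, 30, 47]
print(basic_summary(ages))  # {'n': 7, 'min': 23, 'max': 62}
```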
For a nominal variable, the central tendency is the variable's mode: the category with the highest number of cases in it. The dispersion isn't given by a single number; rather, you can describe it by graphing the variable and seeing whether each category has an equal number of cases (fully dispersed), all the data is concentrated in one group (no dispersion), or the result falls somewhere in between.
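To illustrate the mode and a rough look at dispersion, here is a small Python sketch; in Stata, a one-way `tabulate` produces a similar frequency table. The respondent data are invented, the function names are hypothetical, and the rows of asterisks are only a crude stand-in for a bar graph: bars of equal length would suggest full dispersion, a single bar no dispersion.

```python
from collections import Counter


def frequency_table(values):
    """List (category, count) pairs, most common category first."""
    return Counter(values).most_common()


def mode(values):
    """Return the category with the most cases.

    On a tie, Counter lists the category encountered first in the data,
    so a real report should mention any tie explicitly.
    """
    return Counter(values).most_common(1)[0][0]


# Hypothetical nominal variable: gender of ten respondents.
gender = ["female", "male", "female", "female", "male",
          "female", "male", "female", "female", "male"]

# Crude text "bar graph": one asterisk per case in each category.
for category, count in frequency_table(gender):
    print(f"{category:<7} {count:>2} {'*' * count}")
print("mode:", mode(gender))
```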
For an ordinal variable, the central tendency is given by the median: the point where half of the data is above it and half of the data is below it. See pages 9-10 of Module 1 to find the median. The dispersion of an ordinal variable is described in the same way as that of a nominal one.
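The median of an ordinal variable can be sketched in Python as follows, using the ideology codes above and eight invented responses. One assumption matters here: with an even number of cases the two middle responses can fall in different categories, and averaging them (as `statistics.median` does) gives a value such as 3.5 that is not on the scale. This sketch uses `statistics.median_low`, which always returns an observed category; other conventions exist, and Module 1 may use a different one.

```python
import statistics

# Category labels for the 1-5 political ideology scale.
IDEOLOGY = {
    1: "strong liberal",
    2: "weak liberal",
    3: "moderate",
    4: "weak conservative",
    5: "strong conservative",
}


def ordinal_median(codes):
    """Return the median category of an ordinal variable.

    median_low picks the lower of the two middle codes when the number
    of cases is even, so the result is always a real category.
    """
    return statistics.median_low(codes)


# Hypothetical responses from eight people (sorted: 1 2 2 3 4 4 5 5).
responses = [2, 5, 3, 1, 4, 5, 2, 4]
m = ordinal_median(responses)
print(m, IDEOLOGY[m])                  # 3 moderate
print(statistics.median(responses))    # 3.5, not a category on the scale
```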
The central tendency of an interval variable can be given by either its mean or its median. The standard deviation is the measure of an interval variable's spread. When a variable is roughly normally (bell-shaped) distributed, about two-thirds of the data will fall within one standard deviation of the mean and about ninety-five percent will fall within two standard deviations of the mean. In general, a larger standard deviation means more dispersion for the variable.
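The following Python sketch (simulated, hypothetical data; not Stata output) computes the mean, median, and standard deviation, then checks what share of cases actually falls within one and two standard deviations of the mean. The two-thirds and ninety-five percent figures are a property of roughly bell-shaped (normal) data, so the sketch contrasts a simulated bell-shaped variable with a right-skewed one; for the skewed (exponential) shape used here, the one-standard-deviation share comes out well above two-thirds, at about 86 percent.

```python
import random
import statistics


def interval_summary(values):
    """Mean, median, and standard deviation of an interval variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # statistics.stdev is the sample standard deviation (divides by n - 1).
        "sd": statistics.stdev(values),
    }


def share_within(values, k):
    """Fraction of cases lying within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)


random.seed(1)  # fixed seed so the simulated data are reproducible

# Hypothetical bell-shaped variable: 10,000 simulated ages (mean 45, sd 12).
ages = [random.gauss(45, 12) for _ in range(10_000)]
# Hypothetical right-skewed variable: 10,000 simulated incomes (mean about 40,000).
incomes = [random.expovariate(1 / 40_000) for _ in range(10_000)]

for name, data in [("ages", ages), ("incomes", incomes)]:
    summary = {key: round(value, 1) for key, value in interval_summary(data).items()}
    print(name, summary)
    print("  within 1 sd:", round(share_within(data, 1), 3))
    print("  within 2 sd:", round(share_within(data, 2), 3))
```

For the simulated ages the two shares land close to 0.68 and 0.95; for the skewed incomes the mean is pulled above the median and the one-standard-deviation share is noticeably larger than two-thirds.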
-Randy Owen