statistics, the science of collecting, analyzing, presenting, and interpreting data. Governmental needs for census data as well as information about a variety of economic activities provided much of the early impetus for the field of statistics. Currently the need to turn the large amounts of data available in many applied fields into useful information has stimulated both theoretical and practical developments in statistics.
Data are the facts and figures that are collected, analyzed, and summarized for presentation and interpretation. Data may be classified as either quantitative or qualitative.
Quantitative data measure either how much or how many of something, and qualitative data provide labels, or names, for categories of like items. For example, suppose that a particular study is interested in characteristics such as age, gender, marital status, and annual income for a sample of 100 individuals. These characteristics would be called the variables of the study, and data values for each of the variables would be associated with each individual. Thus, the data values of 28, male, single, and $30,000 would be recorded for a 28-year-old single male with an annual income of $30,000. With 100 individuals and 4 variables, the data set would have 100 × 4 = 400 items. In this example, age and annual income are quantitative variables; the corresponding data values indicate how many years and how much money for each individual. Gender and marital status are qualitative variables. The labels male and female provide the qualitative data for gender, and the labels single, married, divorced, and widowed indicate marital status.
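The distinction between quantitative and qualitative variables can be sketched in a few lines of Python. The three records below are hypothetical stand-ins for the 100 individuals of the example, and the `classify` helper is an illustrative heuristic only: it assumes that numeric values are quantitative, which would misclassify numeric labels such as postal codes.

```python
# Hypothetical records, one tuple per individual:
# (age, gender, marital_status, annual_income)
records = [
    (28, "male", "single", 30000),
    (45, "female", "married", 52000),
    (61, "female", "widowed", 41000),
]
variables = ["age", "gender", "marital_status", "annual_income"]


def classify(values):
    """Heuristic: a variable is quantitative if every value is a number."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"


# Classify each variable from the column of values recorded for it.
kinds = {
    name: classify([row[i] for row in records])
    for i, name in enumerate(variables)
}

# Number of items in the data set: individuals x variables.
n_items = len(records) * len(variables)

print(kinds)
print(n_items)
```

With 100 records instead of 3, `n_items` would be the 100 × 4 = 400 items described above.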
Sample survey methods are used to collect data from observational studies, and experimental design methods are used to collect data from experimental studies. The area of descriptive statistics is concerned primarily with methods of presenting and interpreting data using graphs, tables, and numerical summaries. Whenever statisticians use data from a sample (i.e., a subset of the population) to make statements about a population, they are performing statistical inference. Estimation and hypothesis testing are procedures used to make statistical inferences. Fields such as health care, biology, chemistry, physics, education, engineering, business, and economics make extensive use of statistical inference.
Methods of probability were developed initially for the analysis of gambling games. Probability plays a key role in statistical inference; it is used to provide measures of the quality and precision of the inferences. Many of the methods of statistical inference are described in this article. Some of these methods are used primarily for single-variable studies, while others, such as regression and correlation analysis, are used to make inferences about relationships among two or more variables.
Descriptive statistics are tabular, graphical, and numerical summaries of data. The purpose of descriptive statistics is to facilitate the presentation and interpretation of data. Most of the statistical presentations appearing in newspapers and magazines are descriptive in nature. Univariate methods of descriptive statistics use data to enhance the understanding of a single variable; multivariate methods focus on using statistics to understand the relationships among two or more variables. To illustrate methods of descriptive statistics, the previous example in which data were collected on the age, gender, marital status, and annual income of 100 individuals will be examined.
Tabular methods
The most commonly used tabular summary of data for a single variable is a frequency distribution. A frequency distribution shows the number of data values in each of several nonoverlapping classes. Another tabular summary, called a relative frequency distribution, shows the fraction, or percentage, of data values in each class. The most common tabular summary of data for two variables is a cross tabulation, a two-variable analogue of a frequency distribution.
For a qualitative variable, a frequency distribution shows the number of data values in each qualitative category. For instance, the variable gender has two categories: male and female. Thus, a frequency distribution for gender would have two nonoverlapping classes to show the number of males and females. A relative frequency distribution for this variable would show the fraction of individuals that are male and the fraction of individuals that are female.
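As a minimal sketch of both summaries for a qualitative variable, the following Python counts the categories of a short, hypothetical list of gender values. The function names are illustrative and not taken from any particular library.

```python
from collections import Counter

# Hypothetical data values for the qualitative variable "gender".
genders = ["male", "female", "female", "male", "female"]


def frequency_distribution(values):
    """Number of data values in each category."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Fraction of data values in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


freq = frequency_distribution(genders)
rel = relative_frequency_distribution(genders)

print(freq)
print(rel)
```

Multiplying each relative frequency by 100 gives the percentage form mentioned above.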
Constructing a frequency distribution for a quantitative variable requires more care in defining the classes and the division points between adjacent classes. For instance, if the age data of the example above ranged from 22 to 78 years, the following six nonoverlapping classes could be used: 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79. A frequency distribution would show the number of data values in each of these classes, and a relative frequency distribution would show the fraction of data values in each.
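The six age classes can be sketched in Python as follows. The ten ages are hypothetical and are assumed to be recorded in whole years, so the division points between adjacent classes fall cleanly between 29 and 30, 39 and 40, and so on. Class labels use a plain hyphen for simplicity.

```python
# Hypothetical ages in whole years, ranging from 22 to 78.
ages = [22, 25, 31, 38, 44, 47, 49, 53, 66, 78]

LOWER, WIDTH, N_CLASSES = 20, 10, 6  # classes 20-29, 30-39, ..., 70-79


def class_label(age):
    """Return the label of the class that contains the given age."""
    index = (age - LOWER) // WIDTH
    if not 0 <= index < N_CLASSES:
        raise ValueError(f"age {age} falls outside the defined classes")
    low = LOWER + index * WIDTH
    return f"{low}-{low + WIDTH - 1}"


# Start every class at zero so that empty classes still appear in the table.
labels = [
    f"{LOWER + i * WIDTH}-{LOWER + i * WIDTH + WIDTH - 1}"
    for i in range(N_CLASSES)
]
frequency = {label: 0 for label in labels}
for age in ages:
    frequency[class_label(age)] += 1

relative = {label: n / len(ages) for label, n in frequency.items()}

for label in labels:
    print(label, frequency[label], relative[label])
```

Because the classes do not overlap and together cover the whole range of the data, every age lands in exactly one class and the relative frequencies sum to 1.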
A cross tabulation is a two-way table with the rows of the table representing the classes of one variable and the columns of the table representing the classes of another variable. To construct a cross tabulation using the variables gender and age, gender could be shown with two rows, male and female, and age could be shown with six columns corresponding to the age classes 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79. The entry in each cell of the table would specify the number of data values with the gender given by the row heading and the age given by the column heading. Such a cross tabulation could be helpful in understanding the relationship between gender and age.
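A cross tabulation of gender by age class might be built as in the sketch below. The six (gender, age) pairs are hypothetical, and a nested dictionary stands in for the two-way table; a data-analysis library would normally be used for real data.

```python
# Hypothetical (gender, age) pairs, ages in whole years.
people = [
    ("male", 24), ("female", 35), ("female", 28),
    ("male", 52), ("female", 71), ("male", 29),
]

rows = ["male", "female"]
cols = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]


def age_class(age):
    """Map an age to its class label; assumes 20 <= age <= 79."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# table[row][column] holds the count for one cell of the cross tabulation.
table = {g: {c: 0 for c in cols} for g in rows}
for gender, age in people:
    table[gender][age_class(age)] += 1

# Print the two-way table with column headings.
print(f"{'':<8}" + "".join(f"{c:>7}" for c in cols))
for g in rows:
    print(f"{g:<8}" + "".join(f"{table[g][c]:>7}" for c in cols))
```

Summing a row recovers the frequency distribution for gender, and summing a column recovers the frequency distribution for age.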
A number of graphical methods are available for describing data. A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency distribution. Labels for the categories of the qualitative variable are shown on the horizontal axis of the graph. A bar above each label is constructed such that the height of each bar is proportional to the number of data values in the category. A bar graph of the marital status for the 100 individuals in the above example is shown in Figure 1. There are 4 bars in the graph, one for each class. A pie chart is another graphical device for summarizing qualitative data. The size of each slice of the pie is proportional to the number of data values in the corresponding class. A pie chart for the marital status of the 100 individuals is shown in Figure 2.
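The proportionality behind both graphs can be illustrated with a short Python sketch. The marital-status counts below are hypothetical (they are not the values plotted in the figures), and the bars are drawn sideways as text rather than vertically, purely for ease of printing.

```python
# Hypothetical frequency distribution of marital status for 100 individuals.
counts = {"single": 30, "married": 45, "divorced": 15, "widowed": 10}
total = sum(counts.values())

# Bar graph: the length of each bar is proportional to the class count
# (here, one "#" per data value).
bars = {label: "#" * n for label, n in counts.items()}

# Pie chart: the angle of each slice is proportional to the class count.
pie_angles = {label: 360 * n / total for label, n in counts.items()}

for label, bar in bars.items():
    print(f"{label:<9} {bar} ({counts[label]})")

for label, angle in pie_angles.items():
    print(f"{label:<9} {angle:.1f} degrees")
```

In both cases the graph encodes only the frequency distribution, so two data sets with the same class counts produce identical pictures.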
A histogram is the most common graphical presentation of quantitative data that have been summarized in a frequency distribution. The values of the quantitative variable are shown on the horizontal axis. A rectangle is drawn above each class such that the base of the rectangle is equal to the width of the class interval and its height is proportional to the number of data values in the class.
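As a rough sketch of this construction, the Python below turns a frequency distribution into rectangles and prints a coarse text histogram. The class counts are hypothetical, and the `render` helper with its `unit` and `col` parameters is invented for illustration. Each row of the printout stands for `unit` data values, so heights are rounded down.

```python
# Hypothetical frequency distribution: class lower bound -> count (sums to 100).
class_counts = {20: 12, 30: 25, 40: 28, 50: 18, 60: 11, 70: 6}
CLASS_WIDTH = 10

# One rectangle per class: (left edge, base width, height).
rectangles = [
    (lower, CLASS_WIDTH, count) for lower, count in sorted(class_counts.items())
]


def render(rects, unit=4, col=5):
    """Draw the rectangles as text, one row per `unit` data values."""
    top = max(height for _, _, height in rects)
    n_levels = -(-top // unit)  # ceiling division
    lines = []
    for level in range(n_levels, 0, -1):
        cells = [
            ("#" if height >= level * unit else " ") * col
            for _, _, height in rects
        ]
        lines.append(" ".join(cells).rstrip())
    # Label each rectangle with the left edge of its class on the axis.
    lines.append(" ".join(f"{left:<{col}}" for left, _, _ in rects).rstrip())
    return "\n".join(lines)


picture = render(rectangles)
print(picture)
```

All classes here share the same width, so every rectangle has the same base and only the heights differ.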