In medical research, social science, and biology, a cross-sectional study (also known as a cross-sectional analysis, transverse study, prevalence study) is a type of observational study that analyzes data from a population, or a representative subset, at a specific point in time—that is, cross-sectional data.
In economics, cross-sectional studies typically involve the use of cross-sectional regression, in order to sort out the existence and magnitude of causal effects of one independent variable upon a dependent variable of interest at a given point in time. They differ from time series analysis, in which the behavior of one or more economic aggregates is traced through time.
In medical research, cross-sectional studies differ from case-control studies in that they aim to provide data on the entire population under study, whereas case-control studies typically include only individuals who have developed a specific condition and compare them with a matched sample, often a tiny minority, of the rest of the population. Cross-sectional studies are descriptive studies (neither longitudinal nor experimental). Unlike case-control studies, they can be used to describe, not only the odds ratio, but also absolute risks and relative risks from prevalences (sometimes called prevalence risk ratio, or PRR). They may be used to describe some feature of the population, such as prevalence of an illness, but cannot prove cause and effect. Longitudinal studies differ from both in making a series of observations more than once on members of the study population over a period of time.
Cross-sectional studies involve data collected at a defined time. They are often used to assess the prevalence of acute or chronic conditions, but cannot be used to answer questions about the causes of disease or the results of intervention. Cross-sectional data cannot be used to infer causality because temporality is not known. They may also be described as censuses. Cross-sectional studies may involve special data collection, including questions about the past, but they often rely on data originally collected for other purposes. They are moderately expensive, and are not suitable for the study of rare diseases. Difficulty in recalling past events may also contribute bias.
The use of routinely collected data allows large cross-sectional studies to be made at little or no expense. This is a major advantage over other forms of epidemiological study. A natural progression has been suggested from cheap cross-sectional studies of routinely collected data which suggest hypotheses, to case-control studies testing them more specifically, then to cohort studies and trials which cost much more and take much longer, but may give stronger evidence. In a cross-sectional survey, a specific group is looked at to see if an activity, say alcohol consumption, is related to the health effect being investigated, say cirrhosis of the liver. If alcohol use is correlated with cirrhosis of the liver, this would support the hypothesis that alcohol use may be associated with cirrhosis.
Routine data may not be designed to answer the specific question.
Routinely collected data does not normally describe which variable is the cause and which the effect. Cross-sectional studies using data originally collected for other purposes are often unable to include data on confounding factors, other variables that affect the relationship between the putative cause and effect. For example, data only on present alcohol consumption and cirrhosis would not allow the role of past alcohol use, or of other causes, to be explored. Cross-sectional studies are very susceptible to recall bias.
Most case-control studies collect specifically designed data on all participants, including data fields designed to allow the hypothesis of interest to be tested. However, in issues where strong personal feelings may be involved, specific questions may be a source of bias. For example, past alcohol consumption may be incorrectly reported by an individual wishing to reduce their personal feelings of guilt. Such bias may be less in routinely collected statistics, or effectively eliminated if the observations are made by third parties, for example taxation records of alcohol by area.
Weaknesses of aggregated data
Cross-sectional studies can contain individual-level data (one record per individual, for example, in national health surveys). However, in modern epidemiology it may be impossible to survey the entire population of interest, so cross-sectional studies often involve secondary analysis of data collected for another purpose. In many such cases, no individual records are available to the researcher, and group-level information must be used. Major sources of such data are often large institutions like the Census Bureau or the Centers for Disease Control in the United States. Recent census data is not provided on individuals, for example in the UK individual census data is released only after a century. Instead data is aggregated, usually by administrative area. Inferences about individuals based on aggregate data are weakened by the ecological fallacy. Also consider the potential for committing the “atomistic fallacy” where assumptions about aggregated counts are made based on the aggregation of individual level data (such as averaging census tracts to calculate a county average). For example, it might be true that there is no correlation between infant mortality and family income at the city level, while still being true that there is a strong relationship between infant mortality and family income at the individual level. All aggregate statistics are subject to compositional effects, so that what matters is not only the individual-level relationship between income and infant mortality, but also the proportions of low, middle, and high income individuals in each city. Because case-control studies are usually based on individual-level data, they do not have this problem.
In economics, cross-sectional analysis has the advantage of avoiding various complicating aspects of the use of data drawn from various points in time, such as serial correlation of residuals. It also has the advantage that the data analysis itself does not need an assumption that the nature of the relationships between variables is stable over time, though this comes at the cost of requiring caution if the results for one time period are to be assumed valid at some different point in time.
An example of cross-sectional analysis in economics is the regression of money demand—the amounts that various people hold in highly liquid financial assets—at a particular time upon their income, total financial wealth, and various demographic factors. Each data point is for a particular individual or family, and the regression is conducted on a statistical sample drawn at one point in time from the entire population of individuals or families. In contrast, an intertemporal analysis of money demand would use data on an entire country’s holdings of money at each of various points in time, and would regress that on contemporaneous (or near-contemporaneous) income, total financial wealth, and some measure of interest rates. The cross-sectional study has the advantage that it can investigate the effects of various demographic factors (age, for example) on individual differences; but it has the disadvantage that it cannot find the effect of interest rates on money demand, because in the cross-sectional study at a particular point in time all observed units are faced with the same current level of interest rates.
- ^Schmidt, CO; Kohlmann, T (2008). “When to use the odds ratio or the relative risk?”. International Journal of Public Health. 53 (3): 165–167. doi:10.1007/s00038-008-7068-3. PMID 19127890.
- ^Lee, James (1994). “Odds Ratio or Relative Risk for Cross-Sectional Data?”. International Journal of Epidemiology. 23 (1): 201–3. doi:10.1093/ije/23.1.201. PMID 8194918.