Does the CES-D measure a continuum from depression to happiness? Comparing substantive and artifactual models

https://doi.org/10.1016/j.psychres.2010.02.003

Abstract

The Center for Epidemiologic Studies-Depression scale (CES-D) is one of the five most frequently used measures of depressive experiences. Previous research has suggested that the scale may consist of two separate factors, happiness and depression. However, recent methodological research has demonstrated that standard factor analysis cannot be used in this situation to demonstrate that such factors are substantive. The substantive factor structure of the CES-D was therefore tested with two samples of younger (N = 8857; age range 27–35) and older (N = 6125; age range 64–65) people. Using a recent correction to CFA, we demonstrate that a two-factor structure arises for purely artifactual reasons, and that the CES-D actually has only one substantive factor, providing evidence for a single continuum ranging from happiness to depression.

Introduction

The Center for Epidemiologic Studies-Depression Scale (CES-D) (Radloff, 1977) is one of the five most frequently used self-report measures of depressive experiences (Santor et al., 2006) in psychological and psychiatric research (Shaver & Brennan, 1990; McDowell & Kristjansson, 1996). The CES-D is a 20-item measure consisting of 16 negatively worded items (e.g., “I felt sad”; “I felt I could not shake the blues even with help from my family or friends”; “I thought my life had been a failure”) and 4 positively worded items (“I felt happy”; “I enjoyed life”; “I felt that I was just as good as other people”; “I felt hopeful about the future”). Positive items are reverse coded so that scores have a potential range from 0 to 60 (Radloff, 1977), with higher scores indicating greater frequency of depressive experiences.
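The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring routine. It assumes each item is coded 0 to 3 (which follows from 20 items and a 0 to 60 range) and that the positive items sit at positions 4, 8, 12 and 16, as in the standard CES-D item order; the text above does not list the positions, and the names `POSITIVE_ITEMS` and `score_cesd` are hypothetical.

```python
# Positions (1-based) of the four positively worded items. This follows the
# standard CES-D item order and is an assumption; the text does not list them.
POSITIVE_ITEMS = {4, 8, 12, 16}


def score_cesd(responses):
    """Sum 20 responses coded 0-3, reverse coding the positive items."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("responses must be integers from 0 to 3")
    total = 0
    for position, r in enumerate(responses, start=1):
        # Reverse coding maps 0<->3 and 1<->2 for positive items only.
        total += (3 - r) if position in POSITIVE_ITEMS else r
    return total


# Lowest frequency on every negative item, highest on every positive item.
happiest = [3 if p in POSITIVE_ITEMS else 0 for p in range(1, 21)]
# Highest frequency on every negative item, lowest on every positive item.
most_depressed = [0 if p in POSITIVE_ITEMS else 3 for p in range(1, 21)]

print(score_cesd(happiest), score_cesd(most_depressed))
```

Under these assumptions, a respondent can reach the minimum total only by also endorsing every positive item at its highest frequency.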

The CES-D has very strong psychometric properties, showing high convergent validity with both the Beck Depression Inventory (BDI) (r = 0.81) and the Zung measure of depression (r = 0.90), and has high accuracy in detecting depression amongst acute depressives (99% sensitivity), alcoholics (93% sensitivity, 83% specificity), and schizophrenics (93% sensitivity, 86% specificity) (Weissman et al., 1977). The measure is also particularly effective in detecting currently present major depression amongst elderly populations (Beekman et al., 1997). Uniquely amongst the common measures of depression, the CES-D is designed to measure depressive experiences in the general population, and thus conceptualizes depression as a continuum rather than as a dichotomous state (Radloff, 1977).

When studying entire populations, the continuum approach to depression measurement avoids many of the problems associated with restriction of range that occur when using clinical measures in non-clinical samples. This has led to the CES-D becoming the measure of choice in research programs in social psychology, psychiatry, and epidemiology that study well-being in large-scale population surveys (for example, the National Longitudinal Surveys administered by the US Bureau of Labor Statistics) (Shaver and Brennan, 1990).

Despite the widespread use of the CES-D, there are unresolved controversies surrounding the implications of including both positive and negative items in the scale. The inclusion of the positive items poses problems of interpretation. It has been suggested (Joseph & Lewis, 1998; Joseph, 2006, 2007; Joseph & Wood, in press) that a score of zero does not represent the absence of depression, but rather the presence of happiness. For a score of zero to occur, a person would have to give all of the negative items (e.g., “I felt sad”) the lowest possible score (“rarely or none of the time”), and all of the positive items (e.g., “I felt happy”) the highest possible score (“most or all of the time”). For such a person it would seem misleading to state that they have simply indicated an absence of depressive symptoms; such an individual has also clearly indicated the presence of happiness. Thus the CES-D, as conventionally coded, could be conceptualized as ranging from a positive pole (happiness), through a true zero-point, to a negative pole (depression) (Joseph, 2006, 2007; Santor, 2006; Joseph & Wood, in press). While this seems a logical conclusion on the basis of the scale's face validity, such a reconceptualization is controversial, as it questions decades of research that has treated depression and happiness as separate phenomena and led to two distinct literatures.

Traditionally, researchers have questioned the appropriateness of mixing positive and negative items (see Joseph, 2006). Radloff's (1977) original development paper showed that the positive and negative items loaded on separate factors. In a recent review, Shafer (2006) identified 28 studies performing factor analysis of the CES-D between 1977 and 2001 (total N = 22,340) and presented meta-analytical evidence supporting separate positive and negative factors. If the positive and negative items genuinely belonged to different factors, then the normal one-factor coding would seem inappropriate. However, statistical procedures have advanced considerably since the majority of studies reviewed by Shafer were conducted. There is now significant concern about whether factors comprising entirely positive or negative items are genuinely substantive or arise purely as a result of measurement artifact (Schmitt & Stults, 1985; Marsh, 1986, 1996; Woods, 2006).

As already noted, in the usual CES-D coding procedure, the four positive items are reverse coded. Potential problems therefore arise when respondents do not fully respond to the change in item direction. At the most extreme, some respondents may fail to notice that certain items are reversed in content, or may have developed a response set in which they rate all items with the same level of agreement. Alternatively, some respondents could show a slight bias where, for example, they are willing to strongly disagree with a negative item but only slightly agree with positive items. Such effects could even be due to “immune neglect”, a phenomenon where people are up to 27% less likely to select items on the far left compared to the far right of a Likert-type scale (Nicholls et al., 2006). After reverse coding, differential responding to positive and negative items would result in items coded in the same direction being correlated more strongly with each other, biasing subsequent factor analysis. Two Monte Carlo studies have demonstrated that if only 10% of respondents respond carelessly to reverse coded items, then the existence of two factors would be inferred from the normal methods of both exploratory and confirmatory factor analysis (Schmitt & Stults, 1985; Woods, 2006).
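The mechanism can be illustrated with a small simulation. The sketch below is not a reproduction of the cited Monte Carlo studies. It assumes a single latent factor, six continuous items (three negative, three positive) with a hypothetical loading of 0.7, and a hypothetical form of carelessness in which 10% of respondents answer positive items as if they were negatively worded. It then compares correlations between items worded in the same direction with correlations between items worded in opposite directions, after reverse coding.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=4000, careless_rate=0.10, loading=0.7, seed=1):
    """Return six reverse-coded item columns (0-2 negative, 3-5 positive)."""
    rng = random.Random(seed)
    noise_sd = (1 - loading ** 2) ** 0.5
    items = [[] for _ in range(6)]
    for _ in range(n):
        theta = rng.gauss(0, 1)  # one latent depression-happiness score
        careless = rng.random() < careless_rate
        for k in range(6):
            is_positive = k >= 3
            # Careful respondents answer positive items in the opposite
            # direction; careless ones (assumption) ignore the reversal.
            sign = -1 if (is_positive and not careless) else 1
            raw = sign * loading * theta + rng.gauss(0, noise_sd)
            # Reverse code positive items so all items point the same way.
            items[k].append(-raw if is_positive else raw)
    return items


def mean_correlations(items):
    """Average r for same-direction pairs and for opposite-direction pairs."""
    within, cross = [], []
    for i in range(6):
        for j in range(i + 1, 6):
            same_direction = (i < 3) == (j < 3)
            (within if same_direction else cross).append(pearson(items[i], items[j]))
    return sum(within) / len(within), sum(cross) / len(cross)


within_r, cross_r = mean_correlations(simulate())
print(f"same-direction r = {within_r:.2f}, opposite-direction r = {cross_r:.2f}")
```

Under these assumptions the expected same-direction correlation is 0.7² = 0.49, while the expected opposite-direction correlation is (0.9 − 0.1) × 0.49 ≈ 0.39, even though only one latent factor generated the data. A factor analysis of such a matrix would tend to group items by wording direction.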

When two factors respectively contain only positive or negative items, the potential substantive importance of each factor is confounded with potential artifactual effects. This observation has led to a growing consensus amongst methodologists that the normal methods of factor analysis cannot demonstrate the existence of two substantive factors under these conditions (Schmitt & Stults, 1985; Marsh, 1986, 1996; Woods, 2006). Marsh (1986, 1996) suggests two methods for testing whether positive and negative factors are substantive. Traditionally, the substantive importance of positive and negative factors was demonstrated by showing that each factor had different patterns of correlations with other variables. This method is, however, dependent on appropriate variables being selected as outcomes.

As an alternative method, Marsh (1986, 1996) suggested directly comparing the different models with confirmatory factor analysis (CFA). This approach is illustrated in Fig. 1 for a hypothetical measure with three positive and three negative items. In Model 1, positive and negative items load on a single factor. In Model 2, positive and negative items load on separate factors. In Model 3, all of the positive and negative items load on a single factor, but there are correlated errors between the positive items. Correlated errors represent an additional source of covariation between the variables, over and above the latent factor. Correlated errors would be expected if methodological bias were leading to a difference in responding to the positive items. Marsh (1986, 1996) suggests comparing the fit of the three models directly with CFA (after adjusting for parsimony).
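To make the difference between the three models concrete, the sketch below builds the model-implied covariance matrix, Sigma = Lambda Phi Lambda' + Theta, for a hypothetical measure with three negative and three positive items. All numeric values (a loading of 0.7, an error variance of 0.51, a factor correlation of 0.8, an error covariance of 0.10) are assumptions chosen for illustration and are not estimates from this study. The code does not fit any model; it only shows what each model implies.

```python
def implied_covariance(loadings, factor_cov, error_cov):
    """Return Sigma = Lambda * Phi * Lambda^T + Theta as nested lists."""
    n_items, n_factors = len(loadings), len(factor_cov)
    sigma = [row[:] for row in error_cov]  # start from Theta
    for i in range(n_items):
        for j in range(n_items):
            for a in range(n_factors):
                for b in range(n_factors):
                    sigma[i][j] += loadings[i][a] * factor_cov[a][b] * loadings[j][b]
    return sigma


def diagonal(value, size):
    """Square matrix with `value` on the diagonal and zeros elsewhere."""
    return [[value if i == j else 0.0 for j in range(size)] for i in range(size)]


# Hypothetical values, for illustration only.
LOADING, ERROR_VAR, FACTOR_R, ERROR_COV = 0.7, 0.51, 0.8, 0.10
POSITIVE = [3, 4, 5]  # indices of positive items; 0-2 are negative items

# Model 1: every item loads on one factor, errors uncorrelated.
model1 = implied_covariance([[LOADING]] * 6, [[1.0]], diagonal(ERROR_VAR, 6))

# Model 2: negative and positive items load on separate, correlated factors.
two_factor_loadings = [[LOADING, 0.0]] * 3 + [[0.0, LOADING]] * 3
model2 = implied_covariance(
    two_factor_loadings, [[1.0, FACTOR_R], [FACTOR_R, 1.0]], diagonal(ERROR_VAR, 6)
)

# Model 3: one factor, plus correlated errors among the positive items.
theta3 = diagonal(ERROR_VAR, 6)
for i in POSITIVE:
    for j in POSITIVE:
        if i != j:
            theta3[i][j] = ERROR_COV
model3 = implied_covariance([[LOADING]] * 6, [[1.0]], theta3)

for name, sigma in [("Model 1", model1), ("Model 2", model2), ("Model 3", model3)]:
    print(
        name,
        "neg-neg:", round(sigma[0][1], 3),
        "neg-pos:", round(sigma[0][3], 3),
        "pos-pos:", round(sigma[3][4], 3),
    )
```

With these values, Model 2 lowers the covariance between negative and positive items relative to Model 1, whereas Model 3 raises the covariance among the positive items. Both depart from the single-factor pattern in a way that favors same-direction item pairs, which is why the two explanations have to be separated by comparing model fit directly.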

In this paper we use two strategies to determine whether the positive and negative items form two substantively different factors, as Shafer (2006) suggests, or whether they are actually better conceptualized as one single substantive factor, as assumed by the usual method of coding (Radloff, 1977; Joseph, 2006, 2007; Joseph & Wood, in press). First, we use the traditional method of testing whether the positive and negative items have different patterns of correlates, with regard to the Big Five and psychological well-being. There is now reasonable consensus that the Big Five represent most of personality at the highest level of abstraction (John and Srivastava, 1999), and that these factors are useful for orienting a scale within a map of personality psychology (Watson et al., 1994). Psychological well-being represents positive well-being, broadly defined, again at a high level of abstraction (Ryff, 1989). The choice of these correlates represents a strategy of selecting variables with breadth across personality and well-being. If the positive and negative CES-D factors are substantive, then the factors would be expected to have different patterns of correlations with the Big Five and psychological well-being. Second, we more directly test whether the correlated error model (representing methodological bias) is statistically superior to a two-factor model, using the approach of Marsh (1986, 1996).

Older sample

The older sample comprised 6028 people aged between 64 and 65 years from the Wisconsin Longitudinal Study. This sample completed all of the measures reported in this study. These individuals are part of a long-term cohort study that follows a group of people who graduated from high schools in Wisconsin in 1957. The current data were collected through telephone and mail interviews conducted in 2003–2005. Participants completed the CES-D and measures of the Big Five and psychological well-being.

Confirmatory factor analysis

Covariance structural equation modeling was performed using LISREL. As the items involved ordinal level responses, we used a robust weighted least squares estimation of a polychoric correlation matrix (Joreskog, 1990, Lei, 2009). Three CFA models were tested (conceptually similar to Fig. 1). A single factor model with all items loading on a single factor (Model 1), was compared with a two factor model (with positive and negative items loading on separate factors) (Model 2), and a single factor

Discussion

The results suggest that the CES-D should be conceptualized as having one single underlying factor. Although a substantial number of factor analyses have been conducted on the CES-D (Shafer, 2006), all of the previous work has used standard methods of exploratory and confirmatory analyses. These studies have indicated that the CES-D has separate positive and negative factors. Were these separate factors substantive, this would bring into question the very substantial literature on depression

Acknowledgements

This research uses data from the Wisconsin Longitudinal Study (WLS) of the University of Wisconsin-Madison. Since 1991, the WLS has been supported principally by the National Institute on Aging (AG-9775 and AG-21079), with additional support from the Vilas Estate Trust, the National Science Foundation, the Spencer Foundation, and the Graduate School of the University of Wisconsin-Madison. A public use file of data from the Wisconsin Longitudinal Study is available from the Wisconsin

References (30)

  • C.D. Ryff et al. Best news yet on the six-factor model of well-being. Social Science Research (2006)
  • K.W. Springer et al. An assessment of the construct validity of Ryff's scales of psychological well-being: method, mode, and measurement effects. Social Science Research (2006)
  • C.K. Baker et al. NLSY Child Handbook, Rev. Edition: A Guide to the 1986–1990 National Longitudinal Survey of Youth Child Data (1993)
  • A.T.F. Beekman et al. Criterion validity of the Center for Epidemiologic Studies Depression scale (CES-D): results from a community-based sample of older subjects in the Netherlands. Psychological Medicine (1997)
  • J. Cohen. A power primer. Psychological Bulletin (1992)
  • O.P. John et al. The Big Five trait taxonomy: history, measurement, and theoretical perspectives
  • K.G. Joreskog. New developments in LISREL — analysis of ordinal variables using polychoric correlations and weighted least-squares. Quality & Quantity (1990)
  • S. Joseph. Measurement in depression: positive psychology and the statistical bipolarity of depression and happiness. Measurement (2006)
  • S. Joseph. Is the CES-D a measure of happiness? Psychotherapy and Psychosomatics (2007)
  • S. Joseph et al. The depression–happiness scale: reliability and validity of a bipolar self report scale. Journal of Clinical Psychology (1998)
  • Joseph, S., Wood, A.M., in press. Assessment of positive functioning in clinical psychology: Theoretical and practical...
  • P.W. Lei. Evaluating estimation methods for ordinal data in structural equation modeling. Quality & Quantity (2009)
  • P.A. Linley et al. Positive psychology: past, present, and (possible) future. The Journal of Positive Psychology (2006)
  • H.W. Marsh. Negative item bias in ratings scales for preadolescent children — a cognitive developmental phenomenon. Developmental Psychology (1986)
  • H.W. Marsh. Positive and negative global self-esteem: a substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology (1996)