Question

1 Approved Answer

Posted on Oct 13, 2024


The Journal of Forensic Psychiatry & Psychology
Vol. 21, No. 1, February 2010, 1-22

RESEARCH ARTICLE

Confidence and accuracy in assessments of short-term risks presented by forensic psychiatric patients

Sarah L. Desmarais (a, c)*, Tonia L. Nicholls (a, b, d), J. Don Read (b) and Johann Brink (a, d, e)

(a) BC Mental Health & Addiction Services, Forensic Psychiatric Hospital, British Columbia, Canada; (b) Department of Psychology, Simon Fraser University, British Columbia, Canada; (c) School of Population and Public Health, University of British Columbia, Canada; (d) Department of Psychiatry, University of British Columbia, Canada; (e) School of Criminology, Simon Fraser University, British Columbia, Canada

(Received 30 January 2009; final version received 13 July 2009)

Forensic mental health professionals are asked to estimate with appropriate confidence the likelihood of adverse outcomes. But what is an 'appropriate' level of confidence? We examined this question in the context of short-term assessments of risk for violence, suicide, self-harm, and unauthorized leave. Using the Short-Term Assessment of Risk and Treatability (START), treatment team members (n = 23) completed 331 assessments of 137 forensic psychiatric patients appearing before the British Columbia Review Board over a six-month period. Assessors additionally indicated confidence in the accuracy of their risk assessments. Clinical-legal outcome data were collected prospectively for one year using a modified version of the Overt Aggression Scale (OAS). Overall, assessors were highly confident in the accuracy of their assessments; however, analyses revealed few differences in accuracy as a function of confidence. When significant differences were observed, higher confidence was associated with lower predictive accuracy. Findings suggest that assessors may benefit from feedback regarding predictive validity of past assessments and speak to the importance of comprehensive and ongoing training in risk assessment.
Keywords: structured professional judgment; confidence; predictive validity; violence risk assessment; forensic psychiatric patients; START

*Corresponding author. Email: sarah.desmarais@ubc.ca

ISSN 1478-9949 print/ISSN 1478-9957 online. © 2010 Taylor & Francis. DOI: 10.1080/14789940903183932. http://www.informaworld.com

As a matter of daily practice, clinicians working in forensic settings must make decisions regarding the risks presented by forensic psychiatric patients and how best to manage these risks. During the process of assessing risk, clinicians consider the available information to determine the best course of action (Nicholls, Desmarais, Douglas, & Kropp, 2006). However, as in any decision context, their assessments may reflect information relevant to the specific case but could also be influenced by biases or heuristics (Kahneman & Tversky, 1973). These heuristics, or 'rules of thumb', develop over time on the basis of personal knowledge and experience. Although such heuristics improve decision-making efficiency because they reduce the effort needed, they also contribute to judgment errors. In fact, assessments of violence risk appear to be particularly vulnerable to errors resulting from biases and heuristics, including the availability heuristic, diagnostic overshadowing, hindsight bias, and confirmatory bias (Elbogen, 2002; Grove & Meehl, 1996; Nicholls, Brink, Desmarais, Webster, & Martin, 2006). Although it has been studied extensively in human decision-making research (Harvey, 1997), the confidence bias has received very little attention in the field of violence risk assessment. Given the potential severity of risk assessment errors, miscalibration between assessors' confidence and the actual accuracy of their decisions is an important area of study, and it is the focus of the present paper. The relationship between assessor confidence and predictive accuracy is integral to violence risk assessment and management.
A mismatch between actual accuracy and assessor confidence may pose serious threats to both public safety and the patient's well-being. In particular, research demonstrates strong associations between clinicians' assessments and tribunal decisions (Hilton & Simmons, 2001; Whittemore, 1999). Thus, overconfidence in estimates of dangerousness may misdirect the decision-making body (Smith & Dumont, 1997). In the case of overestimated risk (a false positive), the patient's civil liberties may be limited without warrant. In the case of underestimated risk (a false negative), the safety of the community may be jeopardized. With respect to short-term risk in inpatient settings, an erroneous yet highly confident overestimate of risk may result in unjustified restriction of a patient's movement or access to programs. Conversely, a highly confident underestimate of risk may result in failure to implement management strategies and, ultimately, serious injury to staff, co-patients, or the individual patient. Assessor confidence may also affect whether risk management strategies are put into place (cf. Rabinowitz & Garelik-Wyler, 1999). Garb (1986), for example, argued that 'The greater the degree of confidence a clinician has in a judgment, the more heavily that judgment will be weighed when the clinician makes a treatment decision' (p. 194). In general, research suggests that confidence is not a good indicator of judgment accuracy. Instead, people tend to be overconfident in the accuracy of their decisions (Harvey, 1997). Research in the area of clinical decision-making offers similar conclusions (Arkes, 1981; Faust & Ziskin, 1988; Smith & Dumont, 1997). In their review of clinical psychological judgment, Wedding and Faust (1989) noted a lack of evidence that accuracy is related to experience, expertise, or confidence in the correctness of predictions. Additionally, although confidence increases with experience, accuracy does not (Dawson et al., 1993).
High confidence actually appears to be associated with reduced accuracy of clinical predictions (Arkes, 1981; Faust, 1986; Ruscio, 2000). By extension, we might anticipate a similar mismatch between confidence and accuracy in the context of violence risk assessment (Faust & Ziskin, 1988). The findings of the few studies conducted in the context of violence risk assessment, however, suggest that confidence is associated with increases in predictive accuracy. Specifically, when differences are observed, results indicate that assessments are more prone to errors when clinicians are not confident in the accuracy of their assessment (McNiel, Sandberg, & Binder, 1998; Douglas & Ogloff, 2003). Even so, conclusions have been mixed regarding the nature of the accuracy-confidence relationship. For instance, although both studies identified increases in assessment accuracy as a function of assessor confidence, McNiel et al. concluded that confidence was an important moderator of predictive validity, determining the extent to which risk judgments are associated with the outcome of interest (i.e., violence). In contrast, Douglas and Ogloff concluded that confidence appears to be an important mediator of risk assessment accuracy, explaining how or why risk assessments predict violence, rather than when. Moreover, the authors of a third study concluded that confidence is not related to risk assessment accuracy (Rabinowitz & Garelik-Wyler, 1999). In the following sections, we review the extant research in more detail. McNiel et al. (1998) conducted the first empirical examination of accuracy and confidence in the context of violence risk assessment. Specifically, the authors examined the relationship between confidence and accuracy in clinical assessments of psychiatric patients' short-term violence risk.
In the course of routine clinical care, 78 physicians rated the level of violence potential posed by 317 civil psychiatric patients (164 males, 153 females) during their first seven days on a short-term locked inpatient psychiatric unit, from 0% (very low risk) to 100% (very high risk). Clinicians additionally indicated confidence in each risk judgment from 0% (not at all confident) to 100% (absolutely certain). Inpatient violence, coded by nurses using the Overt Aggression Scale (OAS; Yudofsky, Silver, Jackson, Endicott, & William, 1986), was used as the outcome criterion. Results of logistic regressions demonstrated that the strength of association between predicted and actual behavior increased as confidence increased (low confidence: b = .01, p > .05; moderate: b = .03, p < .05; high: b = .08, p < .001). These associations, however, were uniformly weak. Nonetheless, McNiel and colleagues concluded that accuracy improved with increased confidence, asserting that 'predictive validity of clinical estimates of patients' risk of violence varies depending on the confidence that the clinicians have in their evaluations' (p. 665). Soon thereafter, Rabinowitz and Garelik-Wyler (1999) published the results of a study that appear to contradict McNiel et al.'s conclusion. In this study, 13 psychiatric residents assessed the violence risk presented by 99 civil psychiatric patients (60 males, 39 females) during their hospital stay. In addition to rating violence risk on a 5-point scale from 0% to 100%, the residents indicated confidence in their risk judgments from 'Not at all certain' to 'Very certain', again on a 5-point scale. Predictive accuracy was measured as a function of inpatient violence, coded from nursing summary reports completed at the end of each shift. In general, predictive accuracy was low and failed to reach significance (total predictive value of 61%, κ = 0.18).
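Rabinowitz and Garelik-Wyler summarize accuracy as a total predictive value and a kappa coefficient. The following is a minimal Python sketch of both statistics. It assumes that "total predictive value" means the proportion of correct predictions in a 2 x 2 table of predicted versus observed violence. The function name and the counts are hypothetical illustrations, not data from that study.

```python
def agreement_stats(table):
    """Return (proportion correct, Cohen's kappa) for a 2 x 2 prediction table.

    table[i][j] counts cases predicted as i and observed as j,
    where 0 = no violence and 1 = violence (hypothetical coding).
    """
    n = sum(sum(row) for row in table)
    observed = (table[0][0] + table[1][1]) / n                  # total predictive value
    pred = [sum(table[i]) / n for i in range(2)]                # prediction (row) margins
    obs = [(table[0][j] + table[1][j]) / n for j in range(2)]   # outcome (column) margins
    expected = pred[0] * obs[0] + pred[1] * obs[1]              # agreement expected by chance
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 60 correct "no violence", 20 correct "violence", 20 errors.
tpv, kappa = agreement_stats([[60, 10], [10, 20]])
print(f"total predictive value = {tpv:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that the margins alone would produce. This is why a hit rate well above 50% can still correspond to a small kappa when most patients are non-violent.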
Results also showed that the residents were considerably less confident when predicting violence than when predicting non-violence during follow-up: they were certain/very certain in 29% of the cases in which they predicted violence, compared with 54% of the cases in which they predicted non-violence. Total predictive value ranged from 61-62% for patients for whom assessors were certain/very certain (n = 41) or relatively certain (n = 37) to 54% for patients for whom assessors were not at all certain/not very certain (n = 11), a non-significant difference. Despite this positive trend, the authors concluded that predictive accuracy was not associated with assessors' confidence in their specific risk judgments. The McNiel et al. (1998) and Rabinowitz and Garelik-Wyler (1999) studies examined the relationship between confidence and accuracy in unstructured clinical predictions. Over the past 30 years, this approach has come to be seen as highly prone to errors (Ennis & Litwack, 1974; Faust, 1986; Grove & Meehl, 1996). As a result, considerable effort has been devoted to identifying extraneous factors that might influence predictive accuracy and to developing methods that attempt to minimize their influence. First came actuarial approaches, in which assessors estimate risk based on statistical formulas or decision-making trees (Borum, 1996). These were soon followed by structured professional judgment (SPJ) schemes, in which assessors consider a minimum set of theoretically and empirically identified variables to inform their final clinical risk judgment (Douglas & Kropp, 2002). Although proponents of actuarial and SPJ schemes agree on their superiority over unstructured clinical judgment, the debate continues regarding which structured approach yields the greatest predictive accuracy (Hilton, Harris, & Rice, 2006). Of relevance to the current research, these approaches may be differentially affected by confidence bias.
In fact, one of the proposed advantages of the actuarial approach is its 'imperviousness' to human decision-making errors, including the influence of heuristics and biases (Douglas & Ogloff, 2003). To test this proposition, Douglas and Ogloff (2003) examined whether confidence is differentially associated with accuracy as a function of assessment approach, investigating the relationship between confidence and predictive accuracy in actuarial and SPJ risk assessments. Two graduate students in clinical psychology completed the HCR-20 violence risk assessment scheme (Webster, Douglas, Eaves, & Hart, 1997) through file review for 100 forensic psychiatric patients. For each assessment they additionally indicated agreement (from 1 to 10) with the following statement: 'The rater has a feeling of certainty or reliance or trust about the correctness of the rating'. Community violence, coded from criminal and hospital records, was used as the outcome criterion. The predictive accuracy of both the HCR-20 total scores and summary judgments was higher among assessments for which assessors expressed confidence above the median (high confidence group) than among those for which confidence was at or below the median (low confidence group). The point-biserial correlations for assessments in the low confidence group were small and non-significant (.03-.14, p's > .05). In comparison, assessments in the high confidence group demonstrated moderate to large and significant correlations (.43-.62, p's < .01). The areas under the curve (AUCs) of receiver operating characteristic (ROC) curves provided further evidence of the superiority of assessments in the high confidence group: assessments in the low confidence group did not significantly improve upon chance accuracy (i.e., AUC = .50) across categories of violence (AUCs = .52-.63, p's > .05).
Those in the high confidence group demonstrated significant improvements over chance (AUCs = .82-.86, p's < .05). Although they did not report the data in detail, Douglas and Ogloff (2003) also collected ratings of the perceived quality of the information on which assessors based their risk assessments. Finding a very high correlation between confidence and quality (> .90), the authors suggested that the quality of assessment data may be one of the factors influencing confidence ratings and recommended re-evaluating assessment material if assessors have low confidence in their assessment accuracy.

The present study

The field of violence risk assessment is experiencing a zeitgeist shift (Webster, Nicholls, Martin, Desmarais, & Brink, 2006). Specifically, experts have identified the potential benefits of considering dynamic as well as protective factors in the process of violence risk assessment (Douglas & Skeem, 2005; Dvoskin & Heilbrun, 2001; Ogloff & Daffern, 2006; Quinsey, Jones, Book, & Barr, 2006; Rogers, 2000). Professional organizations and regulatory bodies have also advocated such considerations (e.g., American Psychological Association, 2006). Because heuristics and biases are context-bound, these new task characteristics may affect the relationship between accuracy and confidence (Dunning, Griffin, Milojkovic, & Ross, 1990; Kahneman & Tversky, 1982). Accordingly, we sought to re-examine the accuracy-confidence relationship in structured violence risk assessments informed by client strengths and dynamic factors. We also wanted to address limitations of previous investigations of accuracy and confidence in violence risk assessment. Douglas and Ogloff (2003) had just two assessors (graduate students), who completed assessments based on file reviews alone. Conclusions would be strengthened with (1) a larger number of assessors, (2) assessors who more closely resemble practicing clinicians, and (3) assessors who have the opportunity to interview the patient.
The Rabinowitz and Garelik-Wyler (1999) and McNiel et al. (1998) studies were limited in that they examined unstructured clinical judgment. Although unstructured approaches to short-term risk assessment remain the most prevalent method, structured approaches are becoming increasingly common (Webster, Martin, Brink, Nicholls, & Desmarais, 2009). With these issues in mind, we examined predictive accuracy and confidence in assessments of short-term risk for adverse outcomes frequently encountered by clinicians practicing in forensic settings (i.e., violence, suicide, self-harm, and unauthorized absence). We focused specifically on risk assessments conducted by forensic mental health professionals using the Short-Term Assessment of Risk and Treatability (START; Webster et al., 2004; Webster, Martin, Brink, Nicholls, & Desmarais, 2009). Developed by a multidisciplinary team of psychologists, psychiatrists, and nurses, the START is a 20-item SPJ guide for the assessment of short-term risk (i.e., over weeks to months) across seven domains: violence to others, suicide, self-harm, victimization, substance use, unauthorized absences, and self-neglect. The approach represents an advance in risk assessment because it provides for the differential coding of dynamic risk and protective factors (Nicholls et al., 2006; Webster et al., 2006). Although the START is a relatively new guide, emerging research demonstrates that it has good structural reliability (Desmarais, Nicholls, & Brink, 2008; Nicholls et al., 2006) and interrater reliability (Nicholls et al., 2006; Nonstad et al., under review; Wilson, Desmarais, Nicholls, & Brink, under review; but see Crocker et al., n.d.), as well as validity in predicting physical aggression (Crocker et al., n.d.; Nicholls et al., 2006; Nonstad et al., under review; Wilson et al., under review).
Research has also demonstrated significant associations between START scores and both tribunal hearing outcomes (Brink & Livingston, 2004) and security level (Desmarais, Hucker, Brink, & De Freitas, 2008; Nicholls, Webster, Brink, & Martin, 2008). To date, approximately 2000 English manuals have been distributed worldwide, and implementation is underway in at least eight countries.

Method

Subjects

Treatment team members (nine psychiatrists, eight senior nurses, and six social workers) completed 331 risk assessments for 137 forensic psychiatric patients (122 men, 15 women) who appeared before the Review Board in British Columbia, Canada, over a six-month period. All patients were under the authority of this tribunal as a result of being found Not Criminally Responsible on account of Mental Disorder (NCRMD; i.e., Canada's insanity defense; see Desmarais, Hucker, Brink, & De Freitas, 2008). If a patient appeared before the Review Board multiple times during the study period, we used the assessments from the hearing for which the greatest number of team members had returned completed forms. If the number of forms returned was equal across hearings, we used the assessments from the most recent hearing. Of the 331 returned assessments, 93 did not include confidence ratings. As a result, the final sample comprised 237 START assessments and confidence ratings for 119 forensic psychiatric patients (107 males, 12 females). We had hoped to obtain three independent STARTs for each patient (i.e., one completed by each treatment team member). The overall completion rate by treatment team members was 71% (83% by psychiatrists, 70% by nurses, and 59% by social workers), resulting in an average of two risk assessments and confidence ratings returned per patient.
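The rule for patients with more than one hearing is easy to state but easy to get wrong. The hearing with the most returned forms is kept, and ties are broken by taking the most recent hearing. The following is a minimal Python sketch of that rule. The `Hearing` record layout, its field names, and the dates are hypothetical. This is an illustration, not the authors' procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hearing:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    hearing_date: date
    forms_returned: int  # completed START forms returned for this hearing


def select_hearings(hearings):
    """Keep one hearing per patient: most forms returned; ties -> most recent."""
    chosen = {}
    for h in hearings:
        best = chosen.get(h.patient_id)
        # Tuples compare element by element: forms returned first, then date.
        if best is None or (h.forms_returned, h.hearing_date) > (
            best.forms_returned,
            best.hearing_date,
        ):
            chosen[h.patient_id] = h
    return chosen


hearings = [
    Hearing("A", date(2005, 1, 10), 3),
    Hearing("A", date(2005, 4, 2), 2),   # fewer forms returned: dropped
    Hearing("B", date(2005, 2, 5), 2),
    Hearing("B", date(2005, 5, 20), 2),  # tie on forms: most recent wins
]
for pid, h in select_hearings(hearings).items():
    print(pid, h.hearing_date, h.forms_returned)
```

Comparing `(forms_returned, hearing_date)` tuples encodes both criteria in a single comparison, so the tie-break cannot be forgotten.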
In general, the patients whose assessments are included in this study were male (90%) and had primary diagnoses of schizophrenia spectrum disorders (85%), often with comorbid substance use disorders (58%). The average patient age was 38.58 years (SD = 11.44), and index offenses were primarily assaults/violence against person(s) (64%). The mean length of stay in the hospital prior to the Review Board hearings was 1124.64 days (SD = 1912.93; range 24-14502). Following the hearings, 81 (69%) patients remained in custody, 31 (27%) received conditional discharges (i.e., released to the community with conditions), and four (3%) received absolute discharges (i.e., released to the community without conditions).

Measures

Short-Term Assessment of Risk and Treatability (START)

START assessments were conducted as part of a pilot implementation of the scheme within the British Columbia Forensic Psychiatric Hospital. At the time of the study, the START used a continuous 6-point scale ('+++' at one pole indicating considerable strengths and '−−−' at the other pole indicating considerable vulnerabilities) and included only four risk domains: violence to others, self-harm, suicide, and unauthorized leave. Clinicians rated each of the 20 items based on the patient's functioning over the past two to three months. They were also encouraged to complete a careful assessment of historical factors known to be associated with risk for harm to self and others. After coding each item and considering the historical factors, the clinicians estimated risk as low, moderate, or high under hypothesized conditions (i.e., in the absence of supervision) for each of the four risk domains. Total scores reported herein are the sum of the 20 individual item ratings and range from 0 to 120, with higher scores reflecting greater risk. Reliability and validity analyses of the START based on the present sample are described at length elsewhere (Nicholls et al., 2006).
Briefly, mean START total scores for psychiatrists' assessments (M = 82.17; SD = 11.92) were slightly higher than for those completed by nurses (M = 76.94; SD = 12.23) and social workers (M = 76.03; SD = 15.08). However, pairwise comparisons indicated that START total scores did not vary significantly as a function of professional group, p's > .05. An interrater reliability analysis revealed excellent agreement between the three assessor professions: ICC2 = .87, p < .001. Internal consistency of the START was adequate overall (α = .87) and within rater discipline (psychiatry: α = .80; nursing: α = .88; social work: α = .92). Mean START scores of patients who aggressed against others were significantly higher than those of patients who remained incident-free during the follow-up period: any aggression to others (M = 75.66 vs 65.86, p = .001), verbal aggression (M = 75.86 vs 66.82, p = .001), aggression against objects (M = 77.90 vs 68.00, p = .001), physical aggression against others (M = 76.32 vs 68.25, p = .001), violence against others (M = 81.82 vs 69.12, p = .001), and sexual aggression (M = 80.63 vs 70.24, p = .05). Mean START scores also differed significantly between patients who aggressed against themselves and those who did not (t = 2.61, p = .01). With the exception of unauthorized leave, START total scores generally predicted the outcome behaviors at rates greater than chance (e.g., physical aggression: rpb = .23, p < .001; AUC = .65, 95% CI .57-.72, p < .001).

START Outcome Scale (SOS; Nicholls, Gagnon, Crocker, Brink, Desmarais, & Webster, 2007)

Outcome data were collected using a scale developed by modifying the OAS (Yudofsky et al., 1986). The OAS measures observable aggressive or violent behavior in four different categories: (1) verbal aggression, (2) physical aggression against objects, (3) physical aggression against self, and (4) physical aggression against other people. Behaviors in each category are rated according to severity on a 4-point scale from least (1) to most severe (4).
With documented validity and reliability, the OAS is one of the most commonly used instruments for tracking outcomes in studies of psychiatric patients (Silver & Yudofsky, 1991; Yudofsky et al., 1986). We left the original OAS items unchanged (except that verbal aggression toward others and verbal threats to harm self were coded separately) and added the remaining START risk categories (e.g., self-neglect, unauthorized leave) to create the START Outcome Scale (SOS). Because of low base rates, we expanded the categories of suicide and unauthorized leave to include suicide attempts and attempted unauthorized leave. In this study, interrater reliability for the SOS was adequate: ICC2 = 0.70, p < .001.

Procedure

Assessments

Following training workshops by one of the START authors, members of each treatment team (one psychiatrist, one nurse, and one social worker) were asked to complete the START on all of their patients scheduled to appear before the Review Board over a six-month period. STARTs were completed independently by the treatment team members prior to the hearing date. Ratings reflected information gathered through treatment team meetings, discussions with the patients, and file review as part of routine preparation for the Review Board hearings. Clinicians were asked to estimate the likelihood that the adverse outcomes would occur during the follow-up period under the hypothesized condition that patients would be released from the hospital without supervision following the hearings.

Confidence ratings

Treatment team members were also asked to provide global ratings of confidence in the predictive accuracy of their START assessments. They indicated agreement on a 5-point scale from 1 (strongly disagree) to 5 (strongly agree) with the following statement: 'I am confident in the accuracy of my assessment'.
For the purpose of analyses, START assessments were divided into two confidence groups using a median split: (1) assessments for which clinicians indicated low/moderate confidence (ratings below the median value of 4.00); and (2) those for which clinicians indicated high confidence (ratings at or above the median value of 4.00). We grouped the assessments in this manner to be as consistent as possible with past research (Douglas & Ogloff, 2003; McNiel et al., 1998), while still allowing for meaningful comparisons (i.e., a sufficient number of assessments per group to allow for statistical analyses).

Follow-up data collection

Outcome data were collected with the SOS for up to one year (M = 43.28 weeks, SD = 17.50; Mdn = 52 weeks; range 1-52 weeks) following the date of the Review Board hearing, from clinical-legal files at the forensic hospital and community clinics. Because follow-up periods varied widely, we present results for the full sample as well as for a subsample of patients who remained in hospital throughout the study and, therefore, for whom the follow-up period was the same (i.e., 52 weeks). Doing so decreased the likelihood of false negatives and allowed us to control for patient setting. In total, there were 89 START assessments completed for these 45 inpatients, with approximately two assessments returned per inpatient (M = 1.98, SD = 0.66). Like the full sample, these patients typically were men (96%) with schizophrenia spectrum disorders (86%) and comorbid substance use disorders (56%) who were 39.66 years of age (SD = 11.75) at the time of the tribunal hearing. Blind to the START evaluations, a research assistant reviewed social, psychiatric, psychological and legal reports, criminal records, and nursing notes to code the SOS. Patients included in the follow-up were under the care of our forensic service; consequently, any new charges or convictions were reported to their treatment teams and available on file.
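Both effect sizes used in this study relate a continuous START total to a yes/no outcome:

- The point-biserial correlation is the Pearson correlation with a 0/1 variable.
- The AUC is the probability that a randomly chosen case with the outcome scores higher than one without it, with ties counted as one half.

The following is a minimal Python sketch of both measures. It assumes SOS severity is stored as 0 when a behavior is absent and 1-4 when present; the 0 code is our assumption, not the SOS definition. The data are invented.

```python
import math


def dichotomize(severity):
    """Collapse a severity code to presence (1) or absence (0).

    Assumes 0 = behavior absent and 1-4 = rated severity (hypothetical coding).
    """
    return 1 if severity > 0 else 0


def point_biserial(scores, outcomes):
    """Pearson correlation between continuous scores and a 0/1 outcome."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(outcomes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, outcomes))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in outcomes)
    return sxy / math.sqrt(sxx * syy)


def auc(scores, outcomes):
    """Probability that a positive case outscores a negative case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented START totals and SOS severity codes for six assessments.
start_totals = [62, 85, 70, 91, 58, 77]
sos_severity = [0, 2, 0, 3, 0, 1]
outcome = [dichotomize(s) for s in sos_severity]
print(f"rpb = {point_biserial(start_totals, outcome):.2f}")
print(f"AUC = {auc(start_totals, outcome):.2f}")
```

The pairwise AUC loop is quadratic in sample size. That is fine for samples of a few hundred assessments and makes the tie rule explicit.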
Although the SOS categories were coded on a 4-point severity scale, we collapsed them so that each outcome was measured dichotomously as the presence or absence of that category. During follow-up, 53 (44%) patients engaged in aggression of some form. Forty-nine (41%) engaged in verbal aggression, 27 (23%) in physical aggression against objects, 32 (27%) in physical aggression toward others, and 11 (9%) in self-harming behaviors. Unauthorized leaves were attempted or successfully taken by 19 (16%) patients. None of the patients included in this study attempted or completed suicide during the follow-up period.

Results

In the following sections, we first present analyses treating assessments as the unit of analysis, examining individual assessors' confidence ratings for each START assessment. We then present analyses treating patients as the unit of analysis (i.e., one averaged confidence rating and one averaged START score per patient). Within each of these sets of analyses, we examined the relationship between confidence and assessment accuracy in a variety of ways. We calculated both point-biserial correlations and AUCs to measure the strength of associations between the continuous predictor variables (START assessments) and dichotomous outcomes (SOS behaviors). To evaluate assessment accuracy as a function of confidence, we used Z-statistics to compare effect sizes between the confidence groups (Hanley & McNeil, 1983; McNeil & Hanley, 1984). We also examined whether the association between predictive validity and confidence varied as a function of patient gender and length of stay. Our final approach was to test whether confidence acted as a covariate in the prediction of adverse outcomes.

Assessments as the unit of analysis

For both the full and inpatient samples, assessors were confident in the predictive accuracy of their assessments overall. An overwhelming majority of ratings were on the 'confident' end of the scale.
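The modal confidence rating in this study was 4. As a result, the choice between "at or above the median" (the rule used here) and "above the median" (the rule in Douglas and Ogloff, 2003) decides which group the modal rating joins. The following is a minimal Python sketch of the split as described for this study. The function name and ratings are invented for illustration.

```python
from statistics import median


def median_split(ratings):
    """Split 1-5 confidence ratings at the median.

    Ratings at or above the median -> "high"; below -> "low/moderate".
    """
    cut = median(ratings)
    groups = ["high" if r >= cut else "low/moderate" for r in ratings]
    return cut, groups


ratings = [4, 4, 3, 5, 4, 2, 4, 3]  # invented ratings, modal value 4
cut, groups = median_split(ratings)
print(cut, groups.count("high"), groups.count("low/moderate"))
```

Under an "above the median" rule, every rating of 4 in these invented data would move to the low group, leaving a single assessment in the high group.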
The average confidence rating was 3.84 in the full sample (SD = 0.46, Mdn = 4.00, Mode = 4.00, range 2-5) and 3.83 in the inpatient sample (SD = 0.46, Mdn = 4.00, Mode = 4.00, range 2-5). No significant differences were observed in mean confidence ratings as a function of assessor profession in either sample (full: F[1, 234] = 0.92, p = .40, ηp² = .01; inpatient: F[1, 86] = 1.17, p = .31, ηp² = .03). For assessments in the high confidence group, the mean confidence rating was 4.04 (SD = 0.15) in the full sample and 4.03 (SD = 0.16) in the inpatient sample. Ratings of 4 out of 5 were most common in both samples (full: 96%; inpatient: 97%). In comparison, the mean confidence rating for assessments in the low/moderate confidence group was 2.98 (SD = 0.20) and 2.94 (SD = 0.25) for the full and inpatient samples, respectively. Ratings of 3 out of 5 were most common (full: 98%; inpatient: 94%). Confidence group means were significantly different in both samples, t(235) = 33.49, p < .001, d = 6.00, and t(87) = 21.69, p < .001, d = 5.19. Although slightly higher START total scores were observed for assessments in the high confidence group (full: M = 70.10, SD = 17.05 vs M = 68.78, SD = 15.04; inpatient: M = 78.98, SD = 12.24 vs M = 77.27, SD = 15.72), the differences were not statistically significant, t(235) = 0.48, p = .63, d = 0.08, and t(87) = 0.48, p = .63, d = 0.12. Table 1 presents the point-biserial correlations between START total scores and the outcome behaviors as a function of confidence. For assessments in the low/moderate confidence group, correlations generally were significant and moderate to large in magnitude (Cohen, 1988). In contrast, effects in the high confidence group were small to moderate in size. Despite the greater number of assessments and the resulting increase in power, they were also less likely to be significant. For the most part, differences between confidence groups were not statistically significant (see Table 1).
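The source does not state which standardizer was used for d. However, the reported values are reproduced to two decimals when the mean difference is divided by the square root of the average of the two group variances. The following Python sketch therefore uses that form. Treat it as our reconstruction rather than the authors' stated method.

```python
import math


def cohens_d(m1, sd1, m2, sd2):
    """Mean difference divided by the root mean square of the two group SDs."""
    return (m1 - m2) / math.sqrt((sd1**2 + sd2**2) / 2)


# Group means and SDs as reported above (high vs low/moderate confidence).
print(round(cohens_d(4.04, 0.15, 2.98, 0.20), 2))      # confidence, full sample
print(round(cohens_d(4.03, 0.16, 2.94, 0.25), 2))      # confidence, inpatients
print(round(cohens_d(70.10, 17.05, 68.78, 15.04), 2))  # START totals, full sample
```

The standardizer matters with groups this unequal in size. A pooled SD weighted by sample size (192 vs 45 assessments) would give a noticeably larger d for the confidence groups, about 6.6 in the full sample.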
The low/moderate confidence group, however, did demonstrate significantly better accuracy in predicting verbal aggression. We then examined the AUC values for START scores predicting the SOS outcome behaviors (see Table 2). In general, results replicate the findings of the correlational analyses, but they also highlight a couple of interesting differences. For the categories of any aggression, verbal aggression toward others, and physical aggression toward objects, predictive accuracy was greater for the low/moderate confidence group; this was not the case for physical aggression toward others. In the full sample, accuracy in predicting physical aggression toward others was somewhat higher in the high confidence group, although, as shown in Table 2, the difference was not statistically significant. To explore reasons for these findings, we examined whether the association between predictive validity and assessor confidence varied as a function of length of stay and, perhaps, reflected practitioner familiarity with the patient. Analyses did not support this explanation. Correlations between confidence ratings and length of stay were small and non-significant for both samples (full: r = .10, p = .12; inpatient: r = .03, p = .71).

Table 1. Correlations between START total scores and outcomes as a function of confidence.

Assessments as unit of analysis
Outcome behaviors                High confidence r_pb   Low/moderate confidence r_pb   Comparison Z
Full sample (N = 237)            (n = 192)              (n = 45)
  Any aggression                 .30***                 .49***                         1.33
  Verbal aggression – others     .25***                 .55***                         2.13*
  Physical aggression – objects  .25***                 .45**                          1.34
  Physical aggression – others   .19*                   .39**                          1.29
  Self-harm                      .10                    .40**                          1.90
  Unauthorized leave             .11                    .06                            −0.30
Inpatient subsample (N = 89)     (n = 73)               (n = 16)
  Any aggression                 .20                    .16*                           −0.14
  Verbal aggression – others     .21                    .71**                          2.23*
  Physical aggression – objects  .16                    .54*                           1.47
  Physical aggression – others   .24*                   .55*                           1.24
  Self-harm                      .16                    .45                            1.11
  Unauthorized leave             .14                    –                              –

Patients as unit of analysis
Outcome behaviors                High confidence r_pb   Low/moderate confidence r_pb   Comparison Z
Full sample (N = 119)            (n = 81)               (n = 38)
  Any aggression                 .33**                  .49**                          0.95
  Verbal aggression – others     .28**                  .52***                         1.42
  Physical aggression – objects  .24*                   .39*                           0.82
  Physical aggression – others   .21                    .38*                           0.92
  Self-harm                      .01                    .42**                          2.15*
  Unauthorized leave             .05                    .09                            0.20
Inpatient subsample (N = 45)     (n = 31)               (n = 15)
  Any aggression                 .23                    .66**                          1.56
  Verbal aggression – others     .24                    .74**                          1.98*
  Physical aggression – objects  .07                    .54*                           1.50
  Physical aggression – others   .20                    .53*                           1.09
  Self-harm                      .13                    .31                            0.53
  Unauthorized leave             −.14                   –                              –

Note: *p < .05; **p < .01; ***p < .001; – statistic could not be calculated.

Table 2. AUC values for START total scores predicting outcomes as a function of confidence.

Assessments as unit of analysis
Outcome behaviors                High confidence AUC    Low/moderate confidence AUC    Comparison Z
Full sample (N = 237)            (n = 192)              (n = 45)
  Any aggression                 .68***                 .79***                         1.32
  Verbal aggression – others     .66***                 .83***                         2.08*
  Physical aggression – objects  .69***                 .78**                          0.09
  Physical aggression – others   .73**                  .63**                          1.06
  Self-harm                      .61                    .87*                           2.63**
  Unauthorized leave             .58                    .58                            0.08
Inpatient subsample (N = 89)     (n = 73)               (n = 16)
  Any aggression                 .61                    .87*                           2.23*
  Verbal aggression – others     .62                    .90**                          2.59**
  Physical aggression – objects  .60                    .81*                           1.57
  Physical aggression – others   .64                    .83**                          1.45
  Self-harm                      .62                    .82                            1.38
  Unauthorized leave             .38                    .58                            –

Patients as unit of analysis
Outcome behaviors                High confidence AUC    Low/moderate confidence AUC    Comparison Z
Full sample (N = 119)            (n = 81)               (n = 38)
  Any aggression                 .70**                  .83***                         1.31
  Verbal aggression – others     .68**                  .85***                         1.88
  Physical aggression – objects  .70**                  .80**                          0.85
  Physical aggression – others   .65*                   .77**                          1.03
  Self-harm                      .57                    .88**                          2.11*
  Unauthorized leave             .54                    .57                            0.20
Inpatient subsample (N = 45)     (n = 31)               (n = 15)
  Any aggression                 .60                    .88*                           1.86
  Verbal aggression – others     .61                    .94**                          2.63**
  Physical aggression – objects  .56                    .90**                          2.37*
  Physical aggression – others   .60                    .83*                           1.49
  Self-harm                      .63                    .76                            0.68
  Unauthorized leave             .37                    –                              –

Note: *p < .05; **p < .01; ***p < .001; – statistic could not be calculated.
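Table 2 summarizes discrimination with the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen case with the outcome received a higher START total score than a randomly chosen case without it, with ties counted as one half (the Mann-Whitney formulation). The minimal sketch below computes it directly from that definition. The function name and the scores in the usage example are invented for illustration and are not study data; the sketch also does not attempt the between-group comparison Z in Table 2, whose computation is not described in this section.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """AUC as P(score of a case with the outcome > score of a case without it).

    Ties between a positive and a negative case count as 0.5.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        # With no cases in one outcome class, the AUC is undefined.
        raise ValueError("need at least one case with and one without the outcome")
    credit = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                credit += 1.0
            elif p == n:
                credit += 0.5
    return credit / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical START total scores and outcome flags (not study data).
    scores = [55, 62, 70, 81, 90]
    outcomes = [False, False, True, False, True]
    print(round(auc_mann_whitney(scores, outcomes), 2))  # 0.83
```

The nested loop compares every positive case with every negative case, which is adequate for samples of a few hundred assessments; rank-based formulas give the same value more efficiently for larger data. The explicit error for an empty outcome class mirrors the situation in which an AUC simply cannot be computed.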
ANOVAs showed no significant differences in length of stay as a function of confidence (full: F[1, 235] = 0.42, p = .52, ηp² = .00; inpatient: F[1, 87] = 0.02, p = .88, ηp² = .00). We additionally examined whether confidence varied as a function of patient gender. Again, analyses failed to demonstrate significant differences in confidence ratings for assessments of male and female patients (full: M = 3.84, SD = 0.46 vs M = 3.80, SD = 0.50; inpatient: M = 3.84, SD = 0.46 vs M = 3.75, SD = 0.50), F(1, 235) = 0.21, p = .65, ηp² = .00, and F(1, 87) = 0.13, p = .72, ηp² = .00. We conducted further analyses to test confidence as a covariate in the model of START assessments predicting adverse outcomes. Logistic regression models generally failed to support confidence as a predictor of accuracy. Specifically, confidence was not found to be a significant predictor of any aggression (full: b = −0.26, p = .40; inpatient: b = −0.03, p = .96), verbal aggression (full: b = −0.09, p = .77; inpatient: b = 0.34, p = .51), physical aggression toward objects (full: b = 0.01, p = .97; inpatient: b = 0.02, p = .97), self-harm (full: b = 0.02, p = .96; inpatient: b = −0.25, p = .70), or unauthorized leave (full: b = 0.36, p = .39; inpatient: b = 1.26, p = .26). The model approached significance for the prediction of physical aggression toward others in the full sample (full: b = −0.58, p = .07; inpatient: b = −0.36, p = .49).

Patients as the unit of analysis

Treating patients as the unit of analysis, we found results very similar to those obtained with assessments as the unit of analysis. For patients in the high confidence group, the average confidence rating was 4.04 (SD = 0.16) in the full sample and 4.02 (SD = 0.09) in the inpatient sample. In comparison, the average confidence rating for patients in the low/moderate confidence group was 3.42 (SD = 0.24) and 3.38 (SD = 0.26) for the full and inpatient samples, respectively. Again, these group means were significantly different, t(117) = 16.63, p < .001, d = 3.04, and t(43) = 12.31, p < .001, d = 3.29.
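The d values reported with these t tests are not accompanied by a formula. One variant of Cohen's d divides the mean difference by the root mean square of the two group SDs, ignoring group sizes. Applied to the rounded means and SDs reported in these results, this variant reproduces the published values (3.04 and 3.29 for the patient-level groups; 6.00 and 5.19 for the assessment-level groups), which suggests, but does not establish, that it was the formula used. The function name `cohens_d_rms` is hypothetical.

```python
import math


def cohens_d_rms(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Standardized mean difference using the root mean square of two SDs.

    d = (m1 - m2) / sqrt((sd1**2 + sd2**2) / 2); group sizes are ignored.
    """
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)


if __name__ == "__main__":
    # Patients as unit of analysis: high vs low/moderate confidence group.
    print(round(cohens_d_rms(4.04, 0.16, 3.42, 0.24), 2))  # full sample: 3.04
    print(round(cohens_d_rms(4.02, 0.09, 3.38, 0.26), 2))  # inpatient: 3.29
    # Assessments as unit of analysis.
    print(round(cohens_d_rms(4.04, 0.15, 2.98, 0.20), 2))  # full sample: 6.0
    print(round(cohens_d_rms(4.03, 0.16, 2.94, 0.25), 2))  # inpatient: 5.19
```

The denominator choice matters when groups are as unbalanced as these: a pooled SD weighted by group size (n = 192 and 45) would give roughly 6.6 rather than 6.00 for the full assessment sample.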
We observed no significant differences in START scores between the confidence groups (full: t[117] = 0.86, p = .39, d = 0.17; inpatient: t[43] = 0.40, p = .69, d = 0.13). For the low/moderate confidence group, we found significant correlations, moderate to large in magnitude (see Table 1). Effects in the high confidence group were small to moderate in size. The low/moderate confidence group demonstrated significantly stronger associations with both self-harm and verbal aggression than did the high confidence group. Results of ROC analyses echo these findings (see Table 2). Predictive accuracy was greater for the low/moderate confidence group for self-harm and verbal aggression, as well as for physical aggression against objects. As before, we found no evidence that confidence was associated with length of stay or patient gender. Correlations between confidence ratings and length of stay were not significant (full: r = .11, p = .24; inpatient: r = .04, p = .79), nor were ANOVAs comparing mean length of stay as a function of confidence (full: F[1, 117] = 1.40, p = .24, ηp² = .01; inpatient: F[1, 43] = 0.03, p = .87, ηp² = .00) or confidence as a function of patient gender (full: F[1, 117] = 0.00, p = .98, ηp² = .00; inpatient: F[1, 43] = 0.01, p = .95, ηp² = .00). The final set of logistic regression analyses offered no further evidence of confidence as a predictor of accuracy. Across samples, assessment confidence did not contribute significantly to the prediction of any aggression (full: b = −0.50, p = .39; inpatient: b = −0.05, p = .96), verbal aggression (full: b = −0.23, p = .79; inpatient: b = 0.51, p = .60), physical aggression toward objects (full: b = 0.35, p = .60; inpatient: b = 0.03, p = .98), or self-harm (full: b = −0.26, p = .77; inpatient: b = −1.17, p = .33). In the full sample, the model approached significance for the prediction of physical aggression toward others (full: b = −1.08, p = .08; inpatient: b = −0.96, p = .32) and unauthorized leave (full: b = 1.57, p = .07; inpatient: b = 2.75, p = .28).
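Each ANOVA in this section is reported with partial eta squared. Because F is the ratio of the effect and error mean squares, ηp² can be recovered from F and its degrees of freedom as F·df1 / (F·df1 + df2). The sketch below applies this identity to the length-of-stay ANOVAs in the preceding paragraph; the function name is hypothetical, and the check is only as precise as the two-decimal values reported.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom.

    Because F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / (SS_effect + SS_error) = F*df_effect / (F*df_effect + df_error).
    """
    numerator = f * df_effect
    return numerator / (numerator + df_error)


if __name__ == "__main__":
    # Length of stay by confidence group, patients as unit of analysis.
    print(round(partial_eta_squared(1.40, 1, 117), 2))  # full sample: 0.01
    print(round(partial_eta_squared(0.03, 1, 43), 2))   # inpatient: 0.0
```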
Discussion

This study examined the accuracy-confidence relationship in structured assessments of risk for adverse outcomes commonly required of clinicians in forensic practice. Overall, the confidence with which clinicians estimated short-term risk was high. Confidence ratings were higher than might be anticipated given that these assessments required consideration of factors for which there may be less information on file (i.e., dynamic factors and client strengths). It should be noted, however, that (1) these were actual mental health assessments conducted as part of clinical practice (i.e., file review in conjunction with patient interviews and treatment), and (2) our other research suggests that dynamic variables and protective factors can be coded from file with high interrater reliability (Nicholls et al., 2006; Wilson et al., under review). Such short-term evaluations of risk are made by forensic mental health professionals regularly in the course of daily practice, and the START assessments may simply reflect a more structured and formalized reporting of these evaluations. Also, the patients generally were well known by the treatment team members. It is possible that general familiarity with the patient and with assessment processes may have overshadowed true confidence in the specific START assessment. Findings suggest that the association between confidence and accuracy for assessments of short-term risk is minimal when completed using a structured professional judgment (SPJ) approach. Specifically, analyses comparing START assessments as a function of confidence failed to produce significant results, and we found relatively few significant differences in predictive accuracy. These results are consistent with Rabinowitz and Garelik-Wyler's (1999) findings. Where differences were observed, our results suggested an overconfidence bias: predictive validity was greater for assessments in the low/moderate confidence group.
In only two instances were trends in the opposite direction observed: (1) the any aggression correlation for assessments in the low/moderate confidence group (.16) was lower than that observed in the high confidence group (.20); and (2) the physical aggression against others AUC value for assessments in the low/moderate confidence group (.63) was lower than that observed in the high confidence group (.73). Although these differences did not reach statistical significance, the pattern is consistent with the findings of Douglas and Ogloff (2003) and McNiel et al. (1998). These results may reflect the importance of specific domains in daily risk management; in other words, the consequences of physical aggression probably figure more prominently in clinical risk assessment and management. The findings also may indicate clinicians' greater familiarity and practice with the formal prediction of violence compared to the other risk domains. Finally, the improved calibration between confidence and accuracy might be evidence of clinicians' knowledge regarding the base rates of violence. Some research suggests that people are overconfident in difficult tasks and underconfident in tasks that are comparatively easy (e.g., Arkes, Christensen, Lai, & Blumer, 1987; Pulford & Colman, 1997). The overconfidence observed in the present study suggests that the task of assessing risk is (appropriately) perceived as a difficult one. Our findings also may provide insight regarding the perceived difficulty of predicting different adverse outcomes. Specifically, the risk domains for which we observed overconfidence may represent the more difficult assessment tasks. On the one hand, results may be interpreted as indicating that prediction of verbal aggression is the most difficult task. However, this is not our interpretation of the findings. Instead, we believe that results reflect the way in which clinical practice is conducted and the priorities of forensic care. Verbal aggression usually has minimal implications for the well-being of the victim(s).
It is also quite common among forensic psychiatric patients generally and among the patients at the study hospital specifically (Nicholls, Brink, Greaves, Lussier, & Verdun-Jones, 2009). As a result, verbal aggression may be perceived as less important than the more pressing concerns of preventing physical violence to self and others. More importantly, we did not explicitly instruct clinicians to assess for risk of verbal aggression and/or property damage, but rather asked them to rate patients' risk for violence generally. Thus, the finding of no difference between confidence groups in assessments of risk for violence probably more accurately reflects the relationship between confidence and accuracy for START assessments.

Implications

The relationship between predictive accuracy and assessor confidence is integral to the assessment and management of short-term risks presented by forensic psychiatric patients. Forensic mental health professionals are asked to estimate with an 'appropriate' degree of confidence the likelihood of adverse outcomes (Ruscio, 2000; Smith & Dumont, 1997; Tolman & Rotzien, 2007). In fact, legislation requires that clinicians demonstrate with sufficient certainty that patients pose a significant threat to themselves or others to warrant continued detention (Desmarais, Hucker et al., 2008). Yet confidence and accuracy are two separate constructs: one can be very confident about an assessment and be completely wrong. As reviewed in the Introduction, this can have dire consequences in clinical forensic settings, such as overlooked risk to self or others, or a negative impact on the patient-clinician relationship. Our findings add to a well-established literature that suggests assessors would benefit from feedback regarding the validity of past assessments (Arkes et al., 1987; Lichtenstein & Fischhoff, 1980; Smith & Dumont, 1997).
Specifically, a feedback loop may increase the appropriateness of assessor confidence levels (Einhorn & Hogarth, 1978; Smith & Agate, 2004; Smith & Dumont, 1997; Trafimow & Sniezek, 1994). Unfortunately, this is often unrealistic in clinical forensic practice. If risk is identified, strategies will be implemented to manage this risk and prevent the adverse outcomes. If these interventions are successful and the violence is prevented, then the clinician does not receive feedback regarding the accuracy of the prediction. Moreover, after an assessment is completed, the clinician may have no further contact with the patient if the patient does not remain under the jurisdiction of a specific agency. So, how do we provide feedback regarding risk assessment accuracy? The solution may lie within risk assessment education and training programs. Thorough education incorporating theory and research should increase accuracy, thereby improving the calibration between accuracy and confidence (Arkes et al., 1987). Such training programs may additionally afford the opportunity to discuss issues relating to accuracy and to caution clinicians to be wary of their 'expertise' and confidence. Real cases for which outcomes are known could be used as practice cases for coding of the instrument. Trainers would then be able to provide feedback regarding assessment accuracy and avoid inflating assessor confidence.

Limitations

Conclusions should be drawn with a few caveats in mind. First, assessors had attended only a brief training session and the START manual had not yet been published. Second, START in its current form (Version 1.1, Webster et al., 2009) has undergone some significant changes since the pilot version tested in this study. Third, generalizability may be limited because each assessor completed multiple assessments.
Finally, although assessors made specific risk estimates for different outcomes (i.e., violence to others, self-harm, suicide, and unauthorized leave), they provided only one global confidence rating. This is probably the most significant limitation of the present study because this global rating may have obscured differences between the risk domains. Despite these limitations, the strength of the study lies in our evaluation of the confidence with which 23 multidisciplinary clinicians completed structured assessments of risk in the context of routine forensic practice. This design feature increases the ecological validity of our study over previous research that examined 'mock' decision makers or 'mock' decision contexts (de Vogel & de Ruiter, 2004).

General conclusions and future directions

Results of the current and past research suggest that confidence may be an important factor in the assessment of violence risk (Mills & Kroner, 2006). As suggested by McNiel et al. (1998), failure to consider assessor confidence may contribute to biased evaluations of mental health professionals' capacity to assess violence potential. Consequently, further research is needed on the relationship between confidence and accuracy in violence risk assessment. In particular, inconsistent findings across studies may underscore the influence that task characteristics can have on the accuracy-confidence relationship. In the context of violence risk assessment, relevant task characteristics may include the focus of the assessment (violence vs other adverse outcomes), the length of time between assessment and outcome measurement (days, weeks, months, years), the assessment approach (unstructured vs actuarial vs SPJ), the assessment context (civil vs forensic vs correctional), the information available (official records and/or collateral informants), and the assessors themselves (clinicians vs research assistants, years of practice, and how long and in what capacity the assessor has known the patient).
Future research should examine how such characteristics influence the relationship between assessor confidence and risk assessment accuracy. Further, prior investigations demonstrate that clinicians' feelings and attitudes can affect their assessments (Dernevik, Falkheim, Holmquist, & Sandell, 2001; de Vogel & de Ruiter, 2004). Thus, characteristics of both the patient and the clinician-patient relationship may affect the accuracy-confidence relationship in the context of violence risk assessment. Finally, the implications of eliciting risk-specific confidence estimates should not be understated. We asked clinicians to attend to diverse outcomes while providing a single estimate of their confidence level, so it is possible that our results underestimate the accuracy-confidence relationship. For instance, if clinicians were highly confident of their estimate of violence risk but not confident of their estimate of unauthorized leave risk, they may have opted to average the two. We do not know how the study participants resolved this challenging task. Our future research will address this methodological limitation. Violence risk assessment has largely come to be seen as a process best handled by structured team-based approaches (American Psychological Association, 2006). In prior analyses, neither START assessments nor their predictive validity differed significantly by profession (Nicholls et al., 2006). In the present study, confidence did not differ as a function of assessor profession. This interdisciplinary agreement may have been inflated because individual ratings were completed following treatment team meetings, although this procedure is generally consistent with the manner in which START can be used (Webster et al., 2006). Overall, results of the present study support the structured, multidisciplinary, team-based approach to short-term risk assessment proposed by the START.
References

American Psychological Association Presidential Task Force on Evidence-Based Practice (2006). Evidence-based practice in psychology. American Psychologist, 61, 271-285.
Arkes, H.R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, 323-330.
Arkes, H.R., Christensen, C., Lai, C., & Blumer, C. (1987). Two methods of reducing overconfidence. Organizational Behavior and Human Decision Processes, 39, 133-144.
Borum, R. (1996). Improving the clinical practice of violence risk assessment. American Psychologist, 51, 945-956.
Brink, J., & Livingston, J. (2004). Testing the utility and acceptance of the STAR. In C.D. Webster, M.-L. Martin, J. Brink, T.L. Nicholls, & C. Middleton (Eds.), Manual for the Short-Term Assessment of Risk and Treatability (START) (Version 1.0, Consultation Edition) (pp. 88-94). Hamilton, Canada: St. Joseph's Healthcare; Coquitlam, Canada: Forensic Psychiatric Services Commission.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New Jersey: Erlbaum.
Crocker, A.G., Garcia, A., Israel, M., Hindle, Y., Gagnon, D., Venegas, C., et al. (n.d.). Implementing and using a systematic risk assessment scheme to increase patient safety on a risk management unit for individuals with severe mental illness: A demonstration project. Edmonton, Canada: Canadian Patient Safety Institute. Retrieved December 2, 2008, from http://www.patientsafetyinstitute.ca/uploadedFiles/Research/Final%20Report(6).pdf
Dawson, N.V., Connors, A.F., Speroff, T., Kemka, A., Shaw, P., & Arkes, H.R. (1993). Hemodynamic assessment in managing the critically ill: Is physician confidence warranted? Medical Decision Making, 13, 258-266.
Dernevik, M., Falkheim, M., Holmquist, R., & Sandell, R. (2001). Implementing risk assessment procedures in a forensic psychiatric setting: Clinical judgement revisited. In D.P. Farrington, C.R. Hollin, & M. McMurran (Eds.), Sex and violence: The psychology of crime and risk assessment (pp. 83-101). New York: Routledge.
Desmarais, S.L., Hucker, S., Brink, J., & De Freitas, K. (2008). A Canadian example of insanity defence reform: Accused found not criminally responsible before and after the Winko decision. International Journal of Forensic Mental Health, 7, 1-14.
Desmarais, S.L., Nicholls, T.L., & Brink, J. (2008, July). Psychometric properties of the START in practice: Increasing the ecological validity of risk assessment research. Paper presented at the meetings of the International Association of Forensic Mental Health Services, Vienna, Austria.
de Vogel, V., & de Ruiter, C. (2004). Differences between clinicians and researchers in assessing risk of violence in forensic psychiatric patients. The Journal of Forensic Psychiatry & Psychology, 15, 145-164.
Douglas, K.S., & Kropp, P.R. (2002). A prevention-based paradigm for violence risk assessment: Clinical and research applications. Criminal Justice and Behavior, 29, 617-658.
Douglas, K.S., & Ogloff, J.R.P. (2003). The impact of confidence on the accuracy of structured professional and actuarial violence risk judgments in a sample of forensic psychiatric patients. Law and Human Behavior, 27, 573-587.
Douglas, K.S., & Skeem, J.L. (2005). Violence risk assessment: Getting specific about being dynamic. Psychology, Public Policy, and Law, 11, 347-383.
Dunning, D., Griffin, D.W., Milojkovic, J.D., & Ross, L. (1990). The overconfidence effect in social prediction. Journal of Personality and Social Psychology, 58, 568-581.
Dvoskin, J.A., & Heilbrun, K. (2001). Risk assessment and release decision-making: Toward resolving the great debate. Journal of the American Academy of Psychiatry and the Law, 29, 6-10.
Einhorn, H.J., & Hogarth, R.M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395-416.
Elbogen, E.B. (2002). The process of violence risk assessment: A review of descriptive research. Aggression and Violent Behavior, 7, 591-604.
Ennis, B.J., & Litwack, T.R. (1974). Psychiatry and the presumption of expertise: Flipping coins in the courtroom. California Law Review, 62, 693-752.
Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology: Research and Practice, 17, 420-430.
Faust, D., & Ziskin, J. (1988). The expert witness in psychology and psychiatry. Science, 241, 31-35.
Garb, H.N. (1986). The appropriateness of confidence ratings in clinical judgment. Journal of Clinical Psychology, 42, 190-197.
Grove, W.M., & Meehl, P.E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.
Hanley, J.A., & McNeil, B.J. (1983). A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148, 839-843.
Harvey, N. (1997). Confidence in judgement. Trends in Cognitive Sciences, 1, 78-82.
Hilton, Z., Harris, G.T., & Rice, M.E. (2006). Sixty-six years of research on the clinical versus actuarial prediction of violence. Counseling Psychologist, 34, 400-409.
Hilton, Z., & Simmons, J.L. (2001). The influence of actuarial risk assessment in clinical judgments and tribunal decisions about mentally disordered offenders in maximum security. Law and Human Behavior, 25, 393-408.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.
Kahneman, D., & Tversky, A. (1982). On the study of statistical intuition. Cognition, 11, 123-141.
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and Human Performance, 26, 149-171.
McNeil, B.J., & Hanley, J.A. (1984). Statistical approaches to the analysis of Receiver Operating Characteristic (ROC) curves. Medical Decision Making, 4, 137-150.
McNiel, D.E., Sandberg, D.A., & Binder, R.L. (1998). The relationship between confidence and accuracy in clinical assessments of psychiatric patients' potential for violence. Law and Human Behavior, 22, 655-669.
Mills, J.F., & Kroner, D.G. (2006). The effect of discordance among violence and general recidivism risk estimates on predictive accuracy. Criminal Behaviour and Mental Health, 16, 155-166.
Nicholls, T.L., Brink, J., Desmarais, S.L., Webster, C.D., & Martin, M.-L. (2006). The Short-Term Assessment of Risk and Treatability (START): A prospective validation study in a forensic psychiatric sample. Assessment, 13, 313-327.
Nicholls, T.L., Brink, J., Greaves, C., Lussier, P., & Verdun-Jones, S. (2009). Forensic psychiatric inpatients and aggression: An exploration of incidence, prevalence, severity, and interventions by patient gender. International Journal of Law and Psychiatry, 32, 23-30.
Nicholls, T.L., Desmarais, S.L., Douglas, K.S., & Kropp, R.K. (2006). Violence risk assessments with perpetrators of intimate partner abuse. In J. Hamel & T.L. Nicholls (Eds.), Family interventions in domestic violence: A handbook of gender-inclusive theory and treatment (pp. 275-301). London: Springer.
Nicholls, T.L., Gagnon, N., Crocker, A.G., Brink, J., Desmarais, S.L., & Webster, C. (2007). START Outcomes Scale (SOS). Vancouver, Canada: BC Mental Health & Addiction Services.
Nicholls, T.L., Webster, C.D., Brink, J., & Martin, M.-L. (2008). Short-term assessment of risk and treatability. In B. Cutler & P. Zapf (Eds.), Encyclopedia of psychology and law (pp. 744-746). Thousand Oaks, CA: Sage.
Nonstad, K., Nesset, M.B., Pedersen, T.W., Kroppan, E., Nottestad, J.A., Almvik, R., et al. (under review). The predictive validity and other properties of the Short Term Assessment of Risk and Treatability (START).
Ogloff, J.R.P., & Daffern, M. (2006). The dynamic appraisal of situational aggression: An instrument to assess risk for imminent aggression in psychiatric inpatients. Behavioral Sciences and the Law, 24, 799-813.
Pulford, B.D., & Colman, A.M. (1997). Overconfidence: Feedback and item difficulty effects. Personality and Individual Differences, 23, 125-133.
Quinsey, V.L., Jones, G.B., Book, A.S., & Barr, K.N. (2006). The dynamic prediction of antisocial behavior among forensic psychiatric patients: A prospective field study. Journal of Interpersonal Violence, 21, 1539-1565.
Rabinowitz, J., & Garelik-Wyler, R. (1999). Accuracy and confidence in clinical assessments of psychiatric inpatients risk of violence. International Journal of Law and Psychiatry, 22, 99-106.
Rogers, R. (2000). The uncritical acceptance of risk assessment in forensic practice. Law and Human Behavior, 24, 595-605.
Ruscio, J. (2000). The role of complex thought in clinical prediction: Social accountability and the need for cognition. Journal of Consulting and Clinical Psychology, 68, 145-154.
Silver, J.M., & Yudofsky, S.C. (1991). The overt aggression scale: Overview and guiding principles. Journal of Neuropsychiatry & Clinical Neurosciences, 3, S22-S29.
Smith, D., & Agate, J. (2004). Solutions for overconfidence: Evaluation of an instructional module for counsellor trainees. Counsellor Education & Supervision, 44, 31-43.
Smith, D., & Dumont, F. (1997). Eliminating overconfidence in psychodiagnosis: Strategies for training and practice. Clinical Psychology: Science and Practice, 4, 335-345.
Tolman, A.O., & Rotzien, A.L. (2007). Conducting risk evaluations for future violence: Ethical practice is possible. Professional Psychology: Research and Practice, 38, 71-79.
Trafimow, D., & Sniezek, J.A. (1994). Perceived expertise and its effect on confidence. Organizational Behavior & Human Decision Processes, 57, 290-302.
Webster, C.D., Douglas, K.S., Eaves, D., & Hart, S.D. (1997). HCR-20: Assessing risk for violence (Version 2). Vancouver, Canada: Mental Health, Law, & Policy Institute, Simon Fraser University.
Webster, C.D., Martin, M.-L., Brink, J., Nicholls, T.L., & Desmarais, S.L. (2009). Manual for the Short-Term Assessment of Risk and Treatability (START) (Version 1.1). Coquitlam, Canada: BC Mental Health & Addiction Services; Hamilton, Canada: St. Joseph's Healthcare.
Webster, C.D., Martin, M.-L., Brink, J., Nicholls, T.L., & Middleton, C. (2004). Manual for the Short-Term Assessment of Risk and Treatability (START) (Version 1.0, Consultation Edition). Hamilton, Canada: St. Joseph's Healthcare; Coquitlam, Canada: Forensic Psychiatric Services Commission.
Webster, C.D., Nicholls, T.L., Martin, M.-L., Desmarais, S.L., & Brink, J. (2006). Short-Term Assessment of Risk and Treatability (START): The case for a new violence risk structured professional judgment scheme. Behavioral Sciences and the Law, 24, 747-766.
Wedding, D., & Faust, D. (1989). Clinical judgment and decision making in neuropsychology. Archives of Clinical Neuropsychology, 4, 233-265.
Whittemore, K.E. (1999). Releasing the mentally disordered offender: Disposition decisions for individuals found unfit to stand trial and not criminally responsible. Unpublished doctoral dissertation, Simon Fraser University, Burnaby, Canada.
Wilson, C.M., Desmarais, S.L., Nicholls, T.L., & Brink, J. (under review). The role of client strengths in assessments of short-term violence risk.
Yudofsky, S.C., Silver, J.M., Jackson, W., Endicott, J., & Williams, D. (1986). The Overt Aggression Scale for the objective rating of verbal and physical aggression. American Journal of Psychiatry, 143, 45-49.

Copyright of Journal of Forensic Psychiatry & Psychology is the property of Routledge and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
Running head: DATA ANALYSIS AND APPLICATION

Data Analysis and Application (DAA) Template
Learner Name
Capella University

Data File Description
1. Describe the context of the data set. You may cite your previous description if the same data set is used from a previous assignment.
2. Specify the variables used in this DAA and the scale of measurement of each variable.
3. Specify sample size (N).

Testing Assumptions
1. Articulate the assumptions of the statistical test.
2. Paste SPSS output that tests those assumptions and interpret them. Properly embed SPSS output where appropriate. Do not string all output together at the beginning of the section.
3. Summarize whether or not the assumptions are met. If assumptions are not met, discuss how to ameliorate violations of the assumptions.

Research Question, Hypotheses, and Alpha Level
1. Articulate a research question relevant to the statistical test.
2. Articulate the null hypothesis and alternative hypothesis.
3. Specify the alpha level.

Interpretation
1. Paste SPSS output for an inferential statistic and report it. Properly embed SPSS output where appropriate. Do not string all output together at the beginning of the section.
2. Interpret statistical results against the null hypothesis.

Conclusion
1. State your conclusions.
2. Analyze strengths and limitations of the statistical test.

References
Provide references if necessary.