A Self-Report Approach to Screening Police Candidates’ Aggressive Tendencies

This paper introduces the concepts of labeling (detection of aggression severity) and provocation (response to directed aggression) as meaningful dimensions for evaluating police candidates' patterns of aggressive tendencies. The evaluation uses candidates' judgments on instances of suspect behaviors during hypothetical arrest situations. Findings showed that candidates agreed on an ordered continuum of behavior severity, alpha = 0.99. One was able to predict very well candidates' provocation tendencies from knowing their labeling tendencies and vice versa, ? = 0.76. Labeling and provocation tendencies were related to other established measures of aggression (e.g., Buss-Perry Aggression Questionnaire, 1992). I discuss potential applications of candidates' labeling and provocation tendencies for use in police training sessions and employment evaluations.

Acknowledgements

Although this research bears my name, its beginning and completion would not have been possible without the assistance of many other people. I thank the police agencies and training academies that participated, and their administrators and personnel who supported my request to survey police recruits, for their assistance and trust. Members of my dissertation committee, Dr. Jerry L. Cohen, Dr. Patricia L. Gallagher, and Dennis C. Hilliard, helped give substance and direction to this research. Dr. Patricia L. Gallagher was instrumental in sharing information on psychological testing in police employment settings. My major professor, Dr. Charles E. Collyer, has been supportive and has offered valuable perceptions on measuring police candidates' aggressive tendencies. His thoughts on concepts of labeling and provocation were important to this study. Finally, thank you to my wife Kim and sons Frank and Nico; without their unending support this work would not have been possible.


Police have the responsibility to safeguard the well-being of the community, prevent crime, and enforce laws (Adams, 1999). Constitutional provisions recognize that such work demands carry with them the need to use some degree of coercion to effect them (Terry v. Ohio, 1968). As long as some citizens continue to demonstrate their willingness to break the law, the use of force will remain an unavoidable activity of police work (Skolnick & Fyfe, 1993). The importance of aggression in police work demands selection procedures that screen in police candidates who are willing to be forceful, yet screen out those candidates who demonstrate a lack of restraint and self-control. Psychological screening has long been one component of evaluating a candidate's ability to balance aggression (Benner, 1986). Yet, highly publicized abuses of force by the police remind contemporary thinkers that psychological screening efforts might not always identify the right balance. The tasks of this study are to (a) look at screening approaches used to decide which police candidates to select and which to weed out, (b) review psychological tests used for police selection and their links to predicting subsequent abuses of aggression, and (c) propose an aggression assessment to identify which candidates might experience difficulties with using aggression in a law enforcement capacity.

Approaches to Psychological Screening
Most police agencies recognize psychological evaluations as a required component of the police selection process (Detrick, Chibnall, & Rosso, 2001). The courts have looked at failure to provide psychological screening as negligence (Bonsignore v. City of New York, 1981; Conte v. Horcher, 1977; McKenna v. Fargo, 1978). Today, more than 80% of U.S. police agencies require the administration of psychological tests to screen police candidates (Craig, 2005). Traditionally, determining the psychological suitability of police candidates might involve two selection events: a "screen-out" decision or a "select-in" decision (Benner, 1986).
Selecting in desirable police candidates involves choosing the most qualified who demonstrate positive qualities necessary to be successful in the work field. A work task analysis of successful officers is one strategy usually adopted for gaining select-in information that identifies important police functions and the necessary police characteristics to perform them (Craig, 2005; Inwald, Knatz, & Shusman, 1983).
Effective policing and its performance require a mixture of tasks. The evolving nature of policing might cause difficulties in identifying certain qualities necessary to perform such tasks (Cohen & Chaiken, 1973). Consequently, the select-in decision might focus on particular police characteristics that no longer reflect existing police practices (Brengelman, 1982; as cited in Grant & Grant, 1995). For example, in the 1990s law enforcement saw a shift from car-based policing to intra community-based patrols that focused on problem-oriented policing. Police agencies recognized a new need for selecting officers who have certain problem-solving skills.
Although psychologists might use select-in criteria to arrive at an accept decision, there is a lack of consensus among police stakeholders on the qualities needed to be successful in the law enforcement profession (Benner, 1986). Benner recognized that there is more agreement on the unwanted or negative qualities of police candidates and that selection usually involves a screen-out decision. This kind of selection event calls for eliminating those candidates who demonstrate undesirable police characteristics. Psychological stability is the major concern. Psychologists hypothesize that a psychologically unstable officer is more likely to perform poorly in the work field than is a "normal" officer. However, some reviews on the screen-out approach suggest a lack of consistent evidence on predicting which candidates are more likely to experience on-the-job difficulties (e.g., Daley, 1982; Varela, Boccaccini, Scogin, Stump, & Caputo, 2004).
The select-in and screen-out decisions might include evaluations of mental health, which psychologists must carry out in accordance with the Americans with Disabilities Act (ADA), and only after a conditional offer of employment to the police candidate (Hibler & Kurke, 1995). For pre-conditional offers, psychologists use personality tests and other police screening methods that do not include evaluations of mental health (Vetter, 1999). Both conditional and pre-conditional offer psychological evaluations focus on screening for suitable candidates. The President's Commission on Law Enforcement and Administration of Justice (1967) recommends psychological research and development of valid tests for screen-out and select-in procedures.
In practice, pre-employment psychological evaluations should focus on the suitability of police candidates to perform essential job tasks, while screening for characteristics that may adversely affect their job performance (International Association of Chiefs of Police, 2004). Although some psychological screeners might lean toward a select-in or screen-out approach, many screeners favor a psychological test battery that satisfies both selection events: screen-out psychopathology (clinical) and select-in ideal police attitudes, traits, and background.

Tests for Psychological Screening
The International Association of Chiefs of Police (IACP) suggests twenty-two recommendations for the pre-employment psychological evaluation of police candidates (IACP, 2004). Among the criteria, the IACP recommends the use of objective and validated tests that specify what job-related functions they intend to measure. The Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943), the California Psychological Inventory (CPI; Gough, 1975), and the Inwald Personality Inventory (IPI; Inwald et al., 1983) are some of the more commonly used psychological tests for police screening (Wrightsman, 2005).
The MMPI and the CPI are general personality inventories used to assess the relatively stable and enduring characteristics of test takers. They tap a number of dimensions (or factors) thought to make up the respondent's personality, which might affect subsequent uses of job-work aggression. The MMPI is a clinical instrument designed to measure dimensions of psychopathology, whereas the CPI is a nonclinical instrument designed to measure normal personality traits important for social living and interaction. Authors of the MMPI and the CPI did not initially design the instruments to screen police candidates. There are, however, police and public safety reports available for both the MMPI and CPI.
In contrast to the MMPI and CPI, Inwald et al. (1983) developed the IPI to predict normal as well as deviant job-performance patterns of police candidates. Four general content areas of the IPI measure job-related criteria: guardedness, acting out behaviors, internalized conflict, and interpersonal conflict. Psychological tests such as the MMPI, CPI, and IPI capture an objective measure of a sample of the candidate's behaviors (Anastasi & Urbina, 1997). The diagnostic value of these tests is to forecast what the candidate might say or do under work conditions. Therefore, "forming the connection between applicants' test responses and eventual job performance is crucial in the evaluation of a test's general usefulness" (Inwald & Shusman, 1984, p. 1).
Do personality tests do well at predicting which police candidates will have difficulties with on-the-job uses of aggression? Hargrave, Hiatt, and Gaffney (1988) found that elevations on an "aggression index" composed of MMPI scales F (infrequency), 4 (psychopathic deviate), and 9 (hypomania), combined with elevated Cn (control in psychological adjustment) scale scores, correctly classified aggressive incumbent officers who received disciplinary actions for aggressive misconduct against offenders, inmates, co-workers, or family members. Costello, Schneider, and Schoenfeld (1996) observed that elevations on the F+4+9 aggression index predicted suspensions of officers after three years of service.
Although MMPI scale scores have generated some discussion on their usefulness for predicting job-related aggression, there appears to be an absence of research on the utility of the MMPI's current version, the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), for predicting police difficulties with uses of force.
There is some psychological literature showing validity of the CPI for predicting abuses of aggression. Hargrave and Hiatt (1989) reported an association between low CPI scale scores on socialization, self-control, and well-being and disciplinary actions against incumbent police officers. Job difficulties that led to disciplinary actions included unnecessary uses of force. Fitzgerald (1987) found that officers with low CPI Re (responsibility) scores tended to receive citizen complaints, which included uses of unnecessary force. Sarchione, Cutler, Muchinsky, and Nelson-Gray (1998) reported that low CPI scale scores on Re, So (socialization), and Sc (self-control) discriminated officers who received disciplinary action from those officers who did not. Reported job dysfunctions that led to discipline included using excessive force and inappropriate verbal conduct toward the public. Scogin, Schumacher, Gardner, and Chaplin (1995) found that IPI scales Absence Abuse, Anxiety, Substance Abuse, Rigid Type, Critical Items, Undue Suspiciousness, Unusual Experience, and Sexual Concerns best predicted which officers would receive citizen complaints. The authors did not report on the types of citizen complaints filed against officers.
Much of the literature on personality tests used for predicting subsequent police job performance has linked test scores with objective criteria such as disciplinary actions, absenteeism, and citizen complaints, or with subjective criteria such as supervisory performance ratings (Varela et al., 2004). Where disciplinary actions and citizen complaints are job performance criteria, they are often composite measures that might include uses of abusive aggression that authors report or sometimes fail to report. Generally, there is a lack of evidence that ties down specific psychological constructs of personality tests to certain measures of job work aggression (Grant & Grant, 1995).
Which police candidates are prone to unreasonable uses of aggression? The Independent Commission on the Los Angeles Police Department (ICLAPD, 1991) found that officers who had high rates of excessive force complaints also received superior supervisory performance ratings and psychologists rated them as suitable for police work. If personality tests are contributing some knowledge about aggressive tendencies, then being prone to abuses of aggression might be more than a matter of measuring personality traits (Grant & Grant, 1995). Toch (1995) recognized that not all psychologically healthy officers are free from abuses of aggression. Situational factors might contribute to aggressive overreactions (Benner, 1986; Mills & Stratton, 1982). Abuses of aggression might be an artifact of attitudes and belief systems that develop after selection (ICLAPD, 1991) or be independent of them. Police experience and effects of the occupational culture might lead to job-related problems not predicted by candidates' psychological profiles. Megargee (1969) suggests that instigation, inhibition, and situational factors interact to determine some acts of aggression. In short, test responses, in part, help identify aggressive tendencies. The test data, when coupled with the personal history (e.g., legal difficulties, physical altercations, other antisocial or unconventional tendencies), improve identification.
Adding further valid approaches may enhance this process.

Thinking about Aggression Assessment
In this research, I propose rendering an aggression assessment of police candidates to determine their present aggressive tendencies in the management of potential job-related conflict. Consider that psychologists can reproduce important aggressive behavior patterns if they present hypothetical force situations to candidates and ask them to predict their performance. Candidates must use their experience to answer such questions when they have little direct knowledge on how to employ force in the work field. In creating hypothetical force situations, it is important to include conditions in which candidates need to use some force to manage job-related problems. Most police use-of-force incidents result in arrests (Croft & Austin, 1987; as cited in Adams, 1995). Using arrests as a source of data about use of force allows psychological screeners to approximate work conditions in which police might use varying degrees of force against suspects. While suspects might respond to arrests by using firearms, knives, kicks, punches, or profanity against the police, the police have a range of possible forceful responses that might include the use of police equipment such as firearms, batons, chemical agents, stun guns or the use of weaponless tactics such as arm bars, pressure points, or verbal commands. Police trainers usually represent these types of police and suspect behaviors along a continuum of force (Garner, Buchanan, Schade, & Hepburn, 1996; Garner & Maxwell, 1999; McLaughlin, 1992). The continuum categorizes behaviors and orders the categories on their relative severity. These gradations capture important variations in the types of force police and suspects might use in a given encounter. The continuum of force illustrates how police agencies conceptualize the measurement of aggression.
Designing an inventory of force situations that include a continuum of force (a) provides a set of behaviors that reveal levels of aggression, (b) gives psychological screeners guidelines for defining levels of force candidates might use in response to levels of suspect behaviors, and (c) allows screeners to define "excessive force" in terms of police training practices.
Of course, not all candidates are equal in the way they might use force to solve hypothetical arrest situations. Collyer, Gallo, and Boney-McCoy (2004) suggest that individuals may differ in two ways. First, there can be differences in the threshold adopted by two candidates for the use of a given response tactic. For example, if a suspect begins to use profanity, one candidate might begin to employ physical responses; another candidate might require a physical action by the suspect before responding physically him/herself. Note that in this example, both candidates can be in agreement about the underlying ordered continua of force that apply to their own and the suspect's behavior; their disagreement is over where to "draw the line" (or threshold) with respect to physical tactics. Most candidates can probably be trained to have a common conception of the continuum, even if individual differences in thresholds remain.
A more significant kind of difference arises when candidates have different conceptions of the continuum of force itself, that is, when their orderings of actions by severity are not the same. For example, one candidate may rate a threat of physical force, such as shaking a stick, as more provocative than profanity. Another candidate (perhaps for personal or cultural reasons) may regard the profanity as more provocative, and even as the triggering event for a forceful response. In this case, there is a need for psychological screeners and police trainers to recognize that the two candidates have not internalized the same continuum of force.
I have been discussing what could be called provocation: the use of some level of force in response to a suspect's behavior. Another measurable aspect of a candidate's approach to force is labeling. Labeling refers to the way a police candidate assesses the severity of aggression of an action, not necessarily in a threatening situation, but generally. Collyer et al. (2004) found that some people regard profanity as a moderately violent behavior, while others rated profanity as "not violent at all." Again, there are two ways police candidates can differ in their labeling judgments: (1) they may agree on the underlying order of behaviors with respect to severity of aggression, but differ in where they draw the line for applying labels such as "aggressive" and "nonaggressive"; and (2) they may differ in their actual ordering of behaviors by severity.
The data of Collyer et al. so far suggest that individuals have a common conception of the relative severity of aggressive actions, and that this common ordering by severity underlies both labeling and provocation judgments. They argue that threshold differences predominate over differences in ordering, because the correlations among individuals are high (r = +0.85); on the other hand, if it were the case that differences in ordering predominated, these correlations would be low by definition. One may speculate, however, that a shared understanding of how to order behaviors may arise from common cultural experience in a homogeneous group, and that differences in ordering may become more frequent when individuals come from different backgrounds.
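The contrast between threshold differences and ordering differences can be illustrated with a small numeric sketch. The three rating vectors below are hypothetical and are not data from Collyer et al. or from this study. The hypothetical `rater_b` keeps the rank order of `rater_a` (with one tie) but rates behaviors as less severe, whereas `rater_c` reorders the behaviors. Under these assumptions, the inter-rater correlation stays high in the first case and drops in the second.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical severity ratings (1-6) of the same six behaviors, listed from
# least to most severe according to rater A. Illustrative values only.
rater_a = [1, 2, 3, 4, 5, 6]
rater_b = [1, 1, 2, 3, 4, 5]  # same order as A, but a stricter threshold
rater_c = [2, 5, 1, 3, 4, 6]  # a different ordering of the behaviors

r_threshold = pearson_r(rater_a, rater_b)  # threshold difference only
r_ordering = pearson_r(rater_a, rater_c)   # ordering difference

print(f"threshold difference: r = {r_threshold:.2f}")
print(f"ordering difference:  r = {r_ordering:.2f}")
```

A correlation is insensitive to a rater's overall level, which is why high correlations among individuals are compatible with threshold differences but not with widespread ordering differences.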
Are labeling and provocation judgments related? Collyer et al. (2004) observed a relationship between college students' labeling and provocation ratings on instances of violent actions, r = +0.36, p < .01. In a scatterplot, the paired values (individuals' average ratings) using median splits on low and high labeling and provocation ratings define four aggression types: individuals with high labeling and high provocation ratings, high labeling and low provocation, low labeling and low provocation, and low labeling and high provocation ratings. In this research, police candidates with low and high labeling and provocation ratings will be sorted into the four aggression types:
1. HH - high labeling raters and high provocation raters. Candidates see many types of suspect force as aggression, and are easily provoked into being aggressive. Such candidates are more likely to use force when needed. Extreme HH types are more likely to use force in excessive ways.
2. HL - high labeling raters and low provocation raters. Candidates see many types of suspect force as aggression, but are difficult to provoke into being aggressive. Such candidates are more likely to be overly cautious in employing force.
3. LL - low labeling raters and low provocation raters. Candidates discount many types of suspect force as aggression, and are difficult to provoke into being aggressive. Such candidates are more likely to be permissive and have difficulty making decisions to use force.
4. LH - low labeling raters and high provocation raters. Candidates discount many types of suspect force as aggression, but are easily provoked into being aggressive. Such candidates are more likely to be unpredictable in their uses of force.
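The median-split sorting described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function name `aggression_types`, the candidate identifiers, and the ratings are hypothetical. The text does not say how ratings exactly at the median are handled, so the sketch assumes they fall in the "low" group.

```python
from statistics import median

def aggression_types(labeling, provocation):
    """Sort candidates into HH, HL, LL, or LH using median splits.

    labeling, provocation: dicts mapping a candidate id to that candidate's
    average rating. Assumption: a rating equal to the median counts as low.
    """
    labeling_median = median(labeling.values())
    provocation_median = median(provocation.values())
    types = {}
    for candidate in labeling:
        lab = "H" if labeling[candidate] > labeling_median else "L"
        prov = "H" if provocation[candidate] > provocation_median else "L"
        types[candidate] = lab + prov  # first letter labeling, second provocation
    return types

# Hypothetical average ratings for four candidates (illustrative values only).
labeling = {"A": 4.8, "B": 4.5, "C": 2.1, "D": 2.4}
provocation = {"A": 5.0, "B": 2.0, "C": 1.8, "D": 4.6}

print(aggression_types(labeling, provocation))
```

Because the split points are sample medians, a candidate's type is relative to the group being assessed; a different recruit class could place the same ratings in a different quadrant.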
Distinguishing aggression types consists of using candidates' labeling and provocation judgments on instances of suspect behaviors during hypothetical arrest situations. Those judgments might reflect four dispositional subtraits of aggression: "Physical and verbal aggression, which involve hurting or harming others, represent the instrumental or motor component of behavior. Anger, which involves physiological arousal and preparation for aggression, represents the emotional or affective component of behavior. Hostility, which consists of feelings of ill will and injustice, represents the cognitive component of behavior" (Buss and Perry, 1992, p. 457). Self-report measures such as questions on candidates' past behavioral expressions of these subclasses of aggression can alert police screeners to the ways in which candidates' experiences influence their labeling and provocation judgments.

At the time of academy entrance, both state and municipal police candidates had passed general requirements (i.e., minimum education level of a high school degree or its equivalent, written exam, physical agility test, and an oral board interview) of their hiring agencies. All candidates had been through a psychological screening process; psychologists who conducted the evaluations judged them to be suitable for police work. Demographic data collected on police candidates included gender, racial group membership, age, and education achievement. The study consisted of four police recruit classes.
Recruit class 1. The first class consisted of 42 municipal police candidates. Thirty-nine (or 92.9%) recruits were male and three (or 7.1%) were female. The racial composition of the class was 38 (or 90.5%) White recruits, 2 (or 4.8%) Black, 1 (or 2.4%) Asian, and 1 (or 2.4%) Hispanic. On average, recruits were roughly 26 years of age and they had earned around 72 college credits.
Recruit class 2. The second class, which was composed of state police candidates, consisted of 20 (or 83.3%) male and 4 (or 16.7%) female recruits. Sixteen (or 90.5%) recruits were White, 4 (or 4.8%) were Black, and 3 (or 2.4%) were Hispanic. The average age of recruits was around twenty-seven. Fourteen recruits (or 58.3%) had earned 120 or more college credits and 4 (or 22.2%) recruits had earned at least 60 but less than 120 college credits. Information on the academic achievement of six recruits was unavailable.
Recruit class 3. The third class consisted of 30 municipal police candidates who were all male. Most recruits were White (28 or 93.3%) and the remainder was Hispanic (2 or 6.7%). On average, recruits were roughly 27 years of age and they had earned around 86 college credits.
Recruit class 4. The fourth class, which was also composed of municipal police candidates, consisted of 40 (or 83.3%) males and 8 (or 16.7%) females. Most recruits were White (45 or 93.8%), followed by Black (1 or 2.1%), Asian (1 or 2.1%), and Hispanic (1 or 2.1%). The average age of recruits was almost twenty-seven. On average, they had earned about ninety college credits.

Development of a Reaction Inventory - Force
The Reaction Inventory - Force (RIF) was designed to measure the degree to which stimulus situations (or arrest encounters involving force) may reveal labeling and provocation tendencies. A careful review of the police literature did not uncover any uses of hypothetical arrest situations that might have provided a starting point, and so the RIF is a new first-stage instrument. The method of inventory development involved generating suspect behaviors and police responses categorized along a continuum of force. A review of the items by police candidates and a panel of police experts served to assess the adequacy and meaningfulness of the inventory.
Item generation. A review of use of force continua from northeast U.S. police agencies, and a review of the literature on force by the police guided the selection of six categories of force that suspects might use against police during an arrest: (1) nonverbal resistance: the suspect's intentional use of nonverbal behaviors that indicate his or her attitude, appearance, or physical readiness to resist the officer; (2) verbal resistance: the suspect's intentional use of verbal responses that indicate his or her unwillingness to cooperate with the officer; (3) passive resistance: the suspect's intentional use of physical actions not directed against the officer, with no intent to prevent the officer's attempt to take control; (4) defensive resistance: the suspect's intentional use of physical actions to escape, with no intent to cause harm to the officer; (5) assaultive resistance: the suspect's intentional use of physical actions against the officer, with intent to cause harm to the officer; and (6) deadly force resistance: the suspect's intentional use of physical actions or weapons against the officer, with intent to cause serious bodily harm or death to the officer. The categories provided a framework to organize different types of forceful behaviors by their relative degree of severity. Initial categorization was important because there is no one accepted configuration by police agencies. Category labels and definitions were sufficiently broad to include most items police professionals might suggest.
Initial item generation of behaviors within each category involved a content analysis of literature on force by and against the police and my police experience. A tentative item pool of 37 forceful behaviors was comprehensive enough to represent the six categories of suspect force (6 nonverbal resistance behaviors, 5 verbal resistance, 6 passive resistance, 6 defensive resistance, 7 assaultive resistance, and 7 deadly force resistance behaviors).
A five-person focus group of police trainers convened to help examine the properties of the proposed 37 behavior items: category assignment and relevance (i.e., high, moderate, or low), and vocabulary clarity and conciseness (Fowler, 1995).
Labeling task. I arranged the 37 generated suspect behaviors as part of a paper-and-pencil survey called the "labeling task" (see Appendix A). Appendix B shows the constituent behavior items and their assigned force categories. The order of suspect behaviors consisted of non-repeated gradations of force. The labeling task asked police candidates how much force they would associate with each of the 37 suspect behaviors used during an arrest. Candidates used a six-point Likert-type response format, which ranged from no force to maximum force (1=no force, 2=low force, 3=moderate force, 4=intermediate force, 5=high force, 6=maximum force). A number and label assignment to each scale option can improve the reliability of the response task by providing a basis for discriminating between options (Converse & Presser, 1986; Fowler, 1995).
Provocation task. The task was similar to the labeling task, except that I framed the suspect behaviors as directed against candidates; the instructions asked candidates to give their opinions aloud on how they would respond to the behaviors; the response task involved timed conditions; and a computer program (Macromedia Flash Movie) presented the behaviors (i.e., text items) on a projection screen. Appendix C gives the provocation task.
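As a rough illustration, the six-category item pool and the six-point labeling scale described above can be written as a small data structure. The category names, item counts, and scale labels come from the text; the function `mean_rating_by_category` and the example ratings are hypothetical, and averaging within a category is only one assumed way to summarize a candidate's responses.

```python
from statistics import mean

# Item counts per suspect-force category, listed from least to most severe.
ITEMS_PER_CATEGORY = {
    "nonverbal resistance": 6,
    "verbal resistance": 5,
    "passive resistance": 6,
    "defensive resistance": 6,
    "assaultive resistance": 7,
    "deadly force resistance": 7,
}

# Six-point Likert-type response format of the labeling task.
LIKERT_LABELS = {
    1: "no force",
    2: "low force",
    3: "moderate force",
    4: "intermediate force",
    5: "high force",
    6: "maximum force",
}

def mean_rating_by_category(ratings):
    """Average one candidate's ratings within each category.

    ratings: dict mapping a category name to a list of 1-6 ratings.
    Hypothetical summary; the scoring rule is an assumption of this sketch.
    """
    summary = {}
    for category, values in ratings.items():
        if category not in ITEMS_PER_CATEGORY:
            raise ValueError(f"unknown category: {category}")
        if any(value not in LIKERT_LABELS for value in values):
            raise ValueError("ratings must be whole numbers from 1 to 6")
        summary[category] = mean(values)
    return summary

# The item pool should total 37 behaviors.
TOTAL_ITEMS = sum(ITEMS_PER_CATEGORY.values())

# Hypothetical ratings by one candidate for the five verbal resistance items.
example = mean_rating_by_category({"verbal resistance": [1, 2, 2, 3, 2]})
print(TOTAL_ITEMS, example)
```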
Using a timed-condition response procedure artificially increases states of arousal usually experienced in force situations. Under timed conditions, I believe that candidates are more likely to operate on "automatic," put little effort into being guarded or defensive with their responses, and use their experience to choose a response predictive of their response under real conditions.
Field interview. The goal of the field interview was to find out how potential respondents would understand the behavior items and respond to the hypothetical force situations. Available incumbent police officers (2 males) and police candidates (3 males) participated in a presurvey evaluation. They were members of the target population who might complete the actual labeling and provocation tasks. I constructed five pencil-and-paper short-forms of the provocation task in which the responses required no time constraints. Each short-form consisted of six different behaviors from all categories of suspect force. Short-forms are less demanding and allow respondents to recall and elaborate on how they arrive at their responses (Sudman & Bradburn, 1982). Appendix D gives an example of one short-form. I chose the provocation task for the field interview because there is a level of threat in asking about candidates' aggressive behavior that may reveal some undesirable characteristics. Clearly, it is more difficult for inexperienced police candidates to answer questions about using aggression than to answer nonthreatening questions about labeling severity.
Procedures for the presurvey evaluation consisted of having each respondent go twice through a different short-form: respondents first answered the behavior items in the usual way, and then discussed the process they used for answering each item (Forsyth & Lessler, 1991). The question-by-question review followed a standard interview protocol (Fowler, 1995). Appendix E provides the protocol used.
Individual interview sessions lasted roughly 45 minutes. Officers and police candidates expressed that both the instructions and the response scale format were clear and succinct. They agreed that what force meant was not limited to physical actions or the use of weapons, but included officer presence and verbal techniques.
Officers and candidates pointed out that they were able to discriminate well between forceful response categories. Their examples of behaviors linked to response categories confirmed clear gradations along the severity dimension. They were using the response categories correctly.
In discussing each behavior item, the incumbent police officers found that the suspect behaviors were obvious. They went about predicting their responses by using prior experiences with suspects. Officers expressed being very confident when giving their responses. Similarly, police candidates found the suspect behaviors clear. They spoke about visualizing suspect behaviors during arrests, and constructing some strategy from experience (e.g., personal or media) to predict their responses. For example, one candidate mentioned having been punched and used this experience to help him decide on an appropriate response. Police candidates said they were very confident when giving their responses. As expected, for both incumbent officers and police candidates, there was a convergence between their experience and response choices when forecasting their performance in hypothetical force situations. In light of the interview sessions and focus group discussions, I administered the final forms of the labeling and provocation tasks to samples of police recruits.
The definition of the Anger Arousal subscale from the MAI involves physiological arousal, which fits Buss and Perry's (1992) definition of anger (Bryant & Smith, 2001). Cook-Medley's Hostility Scale is a frequently used self-report measure of hostility, which has been shown to be a valid predictor of a person's physiological and interpersonal functioning (Conrada & Jussim, 1992). Bryant and Smith (2001) reported significant correlations of Anger Arousal scores from the MAI and Cook-Medley Hostility Scale scores with AQ short-form subscale scores on ANG and HO respectively.

Survey Schedule
Recruit classes 1, 2, and 3. There were preliminary decisions made regarding the collection of data. First, recruits should be relatively naive to police-endorsed training practices. When recruits have little knowledge of how to use force, they must use prior experiences; their answers will better reflect what they might say or do in force situations (Fowler, 1995; Poland, 1978; Smith & Klein, 1984). Second, a time interval between survey sessions should be long enough to eliminate or reduce sources of extraneous variability, particularly carryover effects. Survey procedures for recruit classes 1, 2, and 3 followed a three-day data collection schedule, which began before recruits received extensive training on uses of force.
1. Day 1: In a group session , recruits completed the AQ short-form. It took them roughly 5 minutes to complete.
2. Day 2 (four days from day 1): In a group session, recruits completed the labeling task. It took roughly 15 minutes to complete.
3. Day 3 (four days from day 2): In single-participant sessions, recruits completed the provocation task. It took each recruit roughly 6 minutes to complete.
Following the task, recruits completed the Marlowe-Crowne 13-item short-form social desirability questionnaire (Reynolds, 1982) because they might lean toward making favorable impressions in an attempt to appear well suited for police work. The questionnaire is psychometrically superior (Zook & Sipps, 1985; Reynolds, 1982; Silverstein, 1983) to most other short-form alternatives to the original Marlowe-Crowne scale (Crowne & Marlowe, 1960). Appendix K gives the Marlowe-Crowne scale. Scoring the scale entailed summing incorrect scores (correct response = 0, incorrect response = 1). At the end of each test session, I asked the recruit not to discuss the survey with other recruits.
Recruit class 4. The survey schedule closely matched the schedule for recruit classes 1, 2, and 3 except that on Day 1 recruits filled out the above set of additional criterion measures including the behavioral experience questionnaire (BEQ; see Appendix L). The questionnaire revealed self-report information about different kinds of life events that might involve aggressive behavior. Questions were drawn from clinical and life history questionnaires and were used to obtain recruits' background information as a source for describing labeling and provocation tendencies.

Analyses
An exploratory data analysis examined whether I could combine data sets collected from recruit classes into a single data set to obtain improved variability estimates. Researchers must be careful to avoid combining data sets that resemble "apples and oranges." There was no expectation that recruit classes were coming from dissimilar pools of police applicants that would require separate analyses.
Suspect behavior as unit of analysis. The correlation of average behavior ratings for the labeling task with those for the provocation task evaluated the extent of recruits' agreement on an ordered continuum of suspect behaviors. I expected a high correlation between the average behavior ratings on the two tasks, which would signify a stable underlying stimulus dimension of severity.
How many categories are necessary to map an ordered dimension of severity?
A principal factor analysis (PFA) with Varimax rotation was performed on the sample inter-item correlation matrix. The data, 6-point scaled items, have the quasi-continuous quality necessary for using analytic techniques such as factor analysis (Floyd & Widaman, 1995). To check whether recruits' average behavior ratings differed from a test model of conservative responses, I constructed an ordered continuum of force for comparison. The philosophy underlying the generated values was a "One-for-One" concept in which recruits would select levels of force that paralleled levels of resistance. Values for items constituting the categories nonverbal and verbal resistance, passive resistance, defensive resistance, assaultive resistance, and deadly force resistance were 2, 3, 4, 5, and 6 respectively. I used a one-sample t-test procedure. The numeric test value (or average rating) was 3.81 (see Appendix M).
Individual recruit as unit of analysis. The correlation of recruits' average ratings for the labeling task with those for the provocation task assessed whether there was a strong relationship between them. I expected that knowing recruits' labeling ratings would provide some information about their provocation ratings and vice versa.
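The one-sample comparison against the One-for-One test value can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function name and the ratings list are hypothetical, and the text does not specify exactly which values entered the test, so item-level average ratings are assumed here. Only the test value of 3.81 comes from the text.

```python
import math
import statistics

# Test value reported in the text: the average rating under the One-for-One model.
TEST_VALUE = 3.81


def one_sample_t(values, test_value):
    """Return (t, df) for a one-sample t-test of mean(values) against test_value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - test_value) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical average severity ratings for a few behaviors (NOT study data).
hypothetical_item_means = [1.9, 2.4, 3.1, 3.6, 4.2, 4.8, 5.5, 5.9]
t_stat, df = one_sample_t(hypothetical_item_means, TEST_VALUE)
print(f"t({df}) = {t_stat:.2f}")
```

A t value near zero would indicate that the observed average ratings do not depart much from the conservative comparison model.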

Exploratory Data Analysis
Recruit responses to the labeling and provocation tasks were initially screened for missing value patterns. There were no items on the labeling task with 5% or more missing values. On the provocation task, there was one item with 5% or more missing: item 2, "suspect runs out of their house away from me" (n = 7). Twenty-four recruits (17.1%) did not respond (within the 3-second interval) to a range of suspect behaviors (n = 38) at arrest. Eighty-four percent (or 32) of those behaviors involved suspects using defensive resistance (16), passive resistance (7), nonverbal resistance (7), or verbal resistance (2) against recruits. Some recruits had difficulty responding to nonphysical directed acts of resistance. For missing values, cases were excluded analysis-by-analysis (pairwise deletion).
On substantive subject matter grounds, the way police agencies selected candidates for academy training was comparable: candidates had a minimum education level of a high school degree or its equivalent and had passed a written exam, physical agility test, oral board interview, and a battery of psychological tests.
The onset of data collection from police recruit classes took place at different times in the course of the research: day 1, recruit class 1; day 4, recruit class 2; day 105, recruit class 3; and day 252, recruit class 4. No apparent historical events as sources of extraneous variability were associated with any of the recruit classes. On statistical grounds, diagnostics to determine if the data sets could be combined as a single set involved obtaining descriptive statistics by recruit classes, testing the normality of the data, testing the equality of variances in recruit classes, and conducting ANOVAs on the labeling and provocation tasks. Table 1 indicated relatively small differences between recruit class data sets. Within the limits of sampling error, results favored equal variances in recruit classes.
The last step to determine whether the recruit classes were a homogeneous set, permitting combination of the classes, was to conduct separate ANOVAs for the labeling and provocation tasks. In this research, however, the recruit class sizes were unequal. Consequently, the actual discrepancy in variances might be magnified, which would affect the probability of making a Type I error (Keppel, 1991). To correct for inflated variance heterogeneity, I applied a more conservative significance level of α = .025 as the criterion value for ANOVA tests. On the labeling task, there was no statistically significant difference among recruit classes, F(3, 139) = 2.83, p > .025.
Likewise, there was no statistically significant difference on the provocation task by recruit classes, F(3, 136) = 1.93, p > .025.
In summary, a review of the exploratory data analyses suggested that the recruit class data sets appeared homogeneous and that I could analyze them as a single set.
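The homogeneity check across recruit classes rests on a one-way ANOVA with unequal group sizes. The sketch below shows how the F ratio and its degrees of freedom are formed; the function name and the four small groups are hypothetical stand-ins, not the recruit data. With four classes the between-groups degrees of freedom equal 3, which matches the first value in the reported F(3, 139) and F(3, 136).

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Group sizes may differ. Each group is a list of numeric scores.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares weights each group mean by its size.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares pools deviations from each group's own mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical average labeling ratings for four classes of unequal size.
hypothetical_classes = [
    [3.4, 3.9, 3.6, 4.1],
    [3.2, 3.8, 3.5],
    [3.7, 4.0, 3.3, 3.6, 3.9],
    [3.5, 3.4, 4.2],
]
f_ratio, df_b, df_w = one_way_anova(hypothetical_classes)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The resulting F would then be compared against the critical value at the chosen criterion (α = .025 in the text); obtaining that critical value requires an F distribution table or a statistics library and is left out of this sketch.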

Suspect Behavior as Unit of Analysis
Labeling task. Appendix M gives the average severity ratings and corresponding ranks of the 37 suspect behaviors. An intraclass correlation (ICC), two-way mixed effect model (consistency definition), revealed that the extent of consensus on rating the severity of suspect behaviors was excellent, alpha = 0.99. Figure 1 shows the average ratings of severity (mean average rating = 3.67) plotted against the rank position corresponding to each behavior as listed in Appendix M. "Suspect swears at officer" anchors the far left of the scale, and "suspect fires a handgun at officer" anchors the far right. Also shown are two individual labeling tendencies (individual recruit as unit of analysis), one for a "high rater" (M = 5.35), and one for a "low rater" (M = 2.49). The correlation of average behavior ratings (or vertical measures) for the labeling task with those for the provocation task was almost perfect (r = +0.99, n = 37, p < .01, 1-tailed).
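For readers unfamiliar with the consistency form of the intraclass correlation, the following sketch computes it from a behaviors-by-raters matrix. It assumes the reported coefficient is the average-measures form, which is numerically the same as Cronbach's alpha; the text does not state this explicitly. The function name and the small matrix are hypothetical.

```python
def icc_consistency_average(ratings):
    """Average-measures ICC, two-way model, consistency definition.

    ratings[i][j] is the rating given to behavior i by rater j.
    Computed as (MS_rows - MS_error) / MS_rows.
    """
    n = len(ratings)        # number of behaviors (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    # Residual variation after removing behavior and rater main effects.
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Hypothetical ratings: 4 behaviors rated by 3 raters on the 6-point scale.
hypothetical_ratings = [
    [1, 2, 1],
    [3, 3, 2],
    [4, 5, 4],
    [6, 6, 5],
]
print(round(icc_consistency_average(hypothetical_ratings), 3))
```

Under the consistency definition, a rater who is uniformly higher or lower than the others does not reduce the coefficient; only disagreement about the ordering and spacing of behaviors does.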
Diagnostics on the labeling data showed that the suspect behavior items were suitable for factor analysis: the 37 × 37 sample inter-item correlation matrix showed evidence of coefficients greater than 0.30; the Kaiser-Meyer-Olkin index was 0.90; and Bartlett's test of sphericity was significant, χ²(666) = 4016.11, p < .01.
On subject matter and statistical grounds, a five-factor solution best explained the pattern of interrelatedness among behavior items along a severity dimension.
Retained factors had moderate to high saturation levels (.60 and .80) with five or more behavior items per factor, which suggested that the sample size was sufficient to obtain a stable factor pattern that approximated the population pattern (Figure 1). Also shown are two individual provocation tendencies.

Individual Recruit as Unit of Analysis
The correlation of individuals' average ratings for the labeling task with those for the provocation task was very strong (r = +0.87, n = 139, p < .01, 1-tailed). One could predict very well a recruit's provocation ratings from knowing his or her labeling ratings and vice versa, r² = 0.76. Figure 4 shows recruits' provocation ratings (or average responses to suspect behaviors) plotted by their labeling ratings (or average severity ratings of suspect behaviors). Using the scale score of 3 to mark the point at which recruits begin to respond with high ratings, the data defined four aggression types: recruits with high labeling and high provocation ratings (HH; n = 106), high labeling and low provocation (HL; n = 17), low labeling and low provocation (LL; n = 13), and low labeling and high provocation ratings (LH; n = 3).
Recruits' labeling and provocation ratings for group separation were as follows: HH = labeling > 3 and provocation > 3; HL = labeling > 3 and provocation ≤ 3; LL = labeling ≤ 3 and provocation ≤ 3; and LH = labeling ≤ 3 and provocation > 3. There were five missing cases.
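The group separation rules and the r-to-r² step can be illustrated with a short sketch. The cutoff of 3 and the four type labels come from the text; the function names and the sample ratings are hypothetical and are not study data.

```python
import math
from collections import Counter

CUTOFF = 3  # scale score above which a rating counts as "high" (from the text)


def aggression_type(labeling_mean, provocation_mean, cutoff=CUTOFF):
    """Return 'HH', 'HL', 'LL', or 'LH' (labeling letter first)."""
    first = "H" if labeling_mean > cutoff else "L"
    second = "H" if provocation_mean > cutoff else "L"
    return first + second


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (labeling, provocation) average ratings for six recruits.
hypothetical_recruits = [
    (4.1, 3.8), (3.6, 2.9), (2.7, 2.5), (2.9, 3.4), (5.0, 4.6), (3.0, 3.0),
]
types = Counter(aggression_type(lab, prov) for lab, prov in hypothetical_recruits)
r = pearson_r(
    [lab for lab, _ in hypothetical_recruits],
    [prov for _, prov in hypothetical_recruits],
)
print(dict(types), f"r = {r:.2f}", f"r^2 = {r * r:.2f}")
```

Squaring the reported correlation reproduces the reported proportion of shared variance: 0.87 × 0.87 = 0.7569, which rounds to 0.76. Note that a rating of exactly 3 on either task falls on the low side under the rules above.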
There was a convergence between recruits' life events and their aggressive tendencies. Appendix O shows recruits having high labeling and high provocation ratings (HH) had a life history of experiences that involved both the use of aggression and possible use of aggression. Although recruits of the HL, LL, and LH aggression types showed some history of aggressive behaviors, further study of these types and linkage to actual life events requires more occupants for each type.
Table 3
Scores on the Marlowe-Crowne social desirability scale exposed recruits' tendencies for giving guarded responses to appear more acceptable or desirable for police work, M = 10.19. A mean score of 13 would have indicated that recruits were extreme in a way that favored making a good impression, but a mean score of zero would have indicated that recruits were not motivated to "fake good."

Perceptions of Aggression
Labeling and provocation judgments are facets of recruits' behavior in which there is agreement on an underlying dimension of perceived aggression (or behavior severity). Average ratings of suspect behaviors for the labeling task with those for the provocation task correlate almost perfectly. When responding to situations such as arrests that might require some degree of coercive action by recruits, they see in like ways the severity of different suspect behaviors at arrest. They also tend to see clusters of suspect behaviors along a severity dimension. A factor analysis revealed that recruits group together different suspect behaviors that they perceive to be related (see Appendix N). Low, moderate, intermediate, high, and maximum labels are conceptually appropriate to describe the pattern of behavior associations in terms of severity or relative degree of potential injury to the recruit. Recruits' sketch of suspect behaviors along this severity dimension, however, raises some concern for police trainers.
Recruits grouped what police experts and trainers would consider a collection of dissimilar behaviors. For example, recruits saw suspects who raised their arms and made fists, clenched their fists, or stood in fighting stances as displaying the same level of threat as those suspects who pushed, kicked, or punched. Although nonverbal types of behavior might serve as preparatory cues of active resistance, qualitatively and quantitatively they might call for different responses. We can see complex factor loadings where suspect behavior items correlate with more than one factor. For example, recruits thought suspects who fold and lock their arms demonstrate the same willingness to avoid arrest as those suspects who shout and curse. We also see cross factor loadings for the suspect behavior "hitting neck with baseball bat," signaling recruits' insensitivity to threat level.
Factor analysis procedures uncover recruits' insensitivity to the finer distinctions of some potential citizen behaviors during a foreseeable task of policing.
Even so, psychological screeners can arm police trainers with such information so that recruits receive training on police continua and avoid using responses that are physical where verbal ones may be reasonable alternatives.
The labeling task is sensitive to detecting police recruits' conception of behavior severity. Overall, recruits' average ratings of behavior severity are not very different from data generated for a One-for-One comparison model of conservative responses (see Figure 2). Recruits' average behavior ratings can serve equally well as a comparison model against which to test individual labeling and provocation differences.

Threshold Measures
How sensitive are recruits at detecting the severity level of behaviors suspects might use during arrest situations? What is the minimum amount of suspect resistance needed to trigger a forceful response? Given that recruits have a commonly understood scale of behavior severity, psychological screeners can treat it as a stable stimulus property. They can observe and speak about where a recruit begins to detect differences in behaviors located along the severity scale and where the recruit begins to respond to behaviors by using tactics that are more forceful. Average labeling and provocation ratings represent the best estimates of where recruits draw these lines (or thresholds) with respect to detecting aggression and using physical tactics. Average
acts of aggression. The data on which I base the use of typologies are not complete, but they are suggestive of certain kinds of candidates. My proposed typologies are a conceptual speculation informed by police practices and both informed and limited by empirical evidence. Police professionalism implies screening the fitness of candidates to manage force events. The HH, HL, LL, and LH typology framework is useful as a way of directing psychological screeners' attention to patterns of aggressive tendencies revealed in the test data. Each aggression type is specifiable by the perceptions and by the behaviors of its occupants. Although the typology framework has screening utility, its theoretical justification requires further empirical investigation. Predictive validation procedures allow screeners to render an at-risk assessment of eventual uses of job-related aggression. Such procedures are best when screeners use longitudinal studies (Beutler, Nussbaum, & Meredith, 1988; Bartol, 1991).
Future work involves studying how police recruits' labeling and provocation ratings behave relative to job performance data such as academy class rank, department disciplinary action, and citizen complaints of verbal discourtesy and excessive physical force.

Methodological Conclusions, Limitations, and Future Directions
Forming connections between the different combinations of labeling and provocation ratings and job performance measures will begin to round out the theoretical utility (construct and criterion validity) of distinguishing aggression types. Well-populated aggression types might emerge through continuing data collection: further analytical descriptions of the aggression types and better discrimination among them are possible. At present, the typology framework is tentative.
The foregoing look at screening police candidates' aggressive tendencies using a labeling and provocation task is encouraging. My proposal has some empirical support and practical justification. The self-report screening approach provides aggression measurements that are very useful to police practitioners. Future work should provide estimates of the predictive validity of the labeling and provocation framework in police employment evaluations. Future directions may well include measuring provocation responses by alternative methods such as a paper-and-pencil test, computer test, and an interactive video situational test.

Appendices
Appendix A: Labeling Task
Using the 6 point scale shown below, indicate how much force you think you would associate with each of the following suspect behaviors during an arrest. Place your rating in the response space to the right of the behavior. There are no right or wrong answers.
Please stand and remain standing six feet from the projection screen during this portion of the survey. We have provided a six-foot floor marker for you.
Using the 6 point scale shown below, indicate how you think you would respond to each of the following suspect behaviors during an arrest.
Each suspect behavior will appear on the screen. You will have only three (3) seconds to read the behavior.
Following each suspect behavior, the 6 point scale shown below will appear on the screen. You will have only three (3) seconds to choose and state your response aloud. There are no right or wrong answers.
Following the 6 point scale, a blank screen will appear. You will have only three (3) seconds to prepare for the next suspect behavior. There are thirty-seven (37) suspect behaviors.
ratings are conceptually similar to the use of thresholds in psychophysics (Collyer et al., 2004).
Consider how psychological screeners could index recruits' labeling thresholds using rating functions in Figure
What police trainers hope to accomplish in academy training sessions is raising aggression detection among police candidates (high labeling raters) and offering reasonable guidelines for choosing appropriate responses to threatening situations.
Response correctness might mean raising the average provocation ratings of those candidates who are difficult to provoke into being aggressive and lowering it for those who are more easily provoked. After training, candidates should be perceptive to threat, but not hyper-receptive, and judicious in their forceful responses to threat, but not guarded.

Responses to Aggression
Although recruits agree on a common scale of behavior severity, the labeling and provocation tasks provide evidence that not every recruit perceives or responds to a given suspect behavior in the same way (see Figures 1 and 3). How recruits perceive aggression, however, is a strong predictor of how they will respond when provoked, r² = 0.76. Prediction was not as strong in Collyer et al.'s (2004) sample of college students, r² = 0.13. Perhaps this difference is a selection phenomenon that is especially true in the police employment setting. Candidates have met strict entry standards and might be like-minded and better equipped to both detect and respond to acts of aggression. Figure 4 shows a linear function: higher average labeling ratings generally imply higher average provocation ratings. Consider a high labeling rater, M = 4. On average, this recruit sees many behaviors as aggressive and discounts few of them as nonaggressive. If we read up to locate the recruit's average response to different provocative behaviors, we find that the recruit generally uses responses that are more forceful. The recruit is physically aggressive to get the job done. Provocation ratings provide a measurement of physical aggression similar to Buss and Perry's Aggression Questionnaire (1992; see Table 3).
Provocation ratings assess the readiness of recruits to be aggressive against a threat of force, but also reveal a degree of anger, defined as "physiological arousal in preparation for aggression" (see Table 3). Anger is facilitative; it primes recruits' defensive mechanisms when they interpret the behaviors of others as threatening or dangerous. Anger prepares recruits to fight. It might also trigger a hot reactive aggression (e.g., Beck, 1999). With this kind of angry aggression, recruits might hold suspects who resist their authority, and thereby show disrespect, more culpable. Recruits' forceful responses might take the form of punishment (or unreasonable force); suspects "must be taught a lesson for being disrespectful." Hot reactive aggression might echo recruits' cultural experiences (e.g., parenting styles, peer group pressure).
Although anger is a normal reaction to threatening conditions, police trainers can educate recruits on anger's functional disadvantages and provide them with techniques to avoid its harmful effects.

LL, LH, HH, and HL Typologies
This paper proposes a framework for distinguishing police candidates
Once in a while I can't control the urge to strike another person.
Given enough provocation, I may hit another person.
If somebody hits me, I hit back.
I get into fights a little more than the average person.
If I have to resort to violence to protect my rights, I will.
There are people who pushed me so far that we came to blows.
I can think of no good reason for ever hitting a person.
If the statement is completely undescriptive of you, circle a 1.
If the statement is mostly undescriptive of you, circle a 2.
If the statement is partly undescriptive and partly descriptive of you, circle a 3.
If the statement is mostly descriptive of you, circle a 4.
If the statement is completely descriptive of you, circle a 5.
Please answer every item.
1. I tend to get angry more frequently than most people.

34. I have often met people who are supposed to be experts who were no better than I. T F
35. It makes me think of failure when I hear of the success of someone I know well. T F
36. I would certainly enjoy beating a crook at his own game. T F
37. I have at times had to be rough with people who were rude or annoying. T F
38. People generally demand more respect for their own rights than they are willing to allow for others. T F
39. There are certain people whom I dislike so much I am inwardly pleased when they are catching it for something they have done. T F
40. I am often inclined to go out of my way to win a point with someone who has opposed me. T F
41. I am quite often not in on the gossip and talk of the group I belong to. T F
42. The man who had the most to do with me when I was a child (such as my father, step-father, etc.) was very strict with me. T F
43. I have often found people jealous of my good ideas just because they had not thought of them first. T F
44. When a man is with a woman, he is usually thinking of things related to her sex. T F
45. I do not try to cover up my poor opinion or pity of a person so that he won't know how I feel. T F
46. I have frequently worked under people who seem to have things arranged so that they get credit for good work, but are able to pass off mistakes to those under them. T F
47. I strongly defend my own opinions as a rule. T F

Have been subject to court-martial or disciplinary proceedings under the Uniform Code of Military Justice (include non-judicial, Captain's Mast, etc.)

37
Received less than an honorable discharge from the military (provide the type of discharge in the following Description of Circumstances and Outcome section)
In the space below, enter the year (or approximate year), the item number, and a brief description of the circumstances and outcome surrounding each activity or event that you listed above as happening to you or as something that you did.
Year Item Description of Circumstances and Outcome
Note. HH = high labeling high provocation raters; HL = high labeling low provocation raters; LL = low labeling low provocation raters; LH = low labeling high provocation raters. See Appendix L for a complete description of life event items.