Measuring and evaluating police performance is not easy. Both technical issues, such as designing the best measures, and normative concerns, such as deciding what tasks the police should be performing, are important to consider (Moore and Braga 2004). The focus here is on methodological issues, not normative ones, but one cannot assess the methodological quality of police performance measures without some consideration of what “good” performance entails.
Police performance traditionally has been evaluated with data on crime-related outcomes (e.g., calls for service, incidents, and arrests). “Measurement of the police’s impact on crime has long been the sine qua non of those endeavors that might be loosely dubbed ‘police performance measurement’” (Wycoff and Manning 1983: 15). This does not mean that crime measures are the only relevant performance measures, but crime is logically one important indicator of police performance. The focus below is on the flaws of using aggregate crime data to assess departments and the prospects for using crime data from evaluations as one means to assess performance.
Measures of arrests, clearance rates, and response time are all crime-related outcomes, but they shift the focus to the activities the police are engaged in. None of these measures, however, gives a true representation of how the police are spending their time. Better data on actual activities and resource allocation are needed. Advances in technology offer possibilities for improving departmental knowledge of officer activities, and methods such as systematic social observation (Mastrofski et al. 2010) allow for a rigorous assessment of policing activities.
One component missing from traditional measures is an assessment of citizen views of the police. This seems particularly important in efforts to evaluate the dual police goals of fairness and effectiveness noted by the National Research Council (2004). What can the police do to assess the fairness of their actions? Interviews and surveys of citizens offer police important feedback, although these methods are not without flaws. Police cannot truly evaluate their performance without understanding the demands and viewpoints of their “customers,” the citizenry at large, and particularly the citizens they have contact with.
Multicomponent measurement systems help generate as valid a picture of police performance as possible. Quantifying success remains a challenge, however, and so proxy measures are often used. Such measures are necessary but problematic (Swindell and Kelly 2000): measuring police outputs does not necessarily provide reliable data on outcomes. The proxy measures that police can use to assess performance are discussed below. Methodological issues in using official data on crime are addressed first, followed by issues in understanding what the police are spending their time on, and then a discussion of measuring citizen perceptions. Finally, concluding remarks on the prospects for evaluating police performance are offered.
Official Data On Crime
If measures of crime are one piece of the puzzle to evaluating police performance, how can official crime data best be used? One major concern in most current assessments of police performance is the focus on citywide crime data and trends. In some ways, this focus makes sense, because such data are easily available. Over 18,000 police agencies provide data to the Federal Bureau of Investigation (FBI), which then presents annual crime totals by agency in the Uniform Crime Reports (UCR).
The problems with the UCR are numerous (see Mosher et al. 2002), but a few important points are worth reviewing. First, the UCR only covers crimes reported to the police, and for crimes other than homicide, a large portion of crime goes unreported. Second, the FBI has specified definitions for each offense type that do not necessarily match up with each jurisdiction’s statutory definitions. As an example, the 2010 Annual Report of the Washington, DC Metropolitan Police Department (MPD) presents FBI versus DC Code definitions for a number of offenses to demonstrate that UCR data and DC data do not always match. Such differences are not always minor. The MPD reported 3,538 aggravated assaults to the FBI in 2010, but using the DC Code definition for the roughly equivalent offense of assault with a deadly weapon, the city had 2,615 offenses. Third, even when focusing on just the most serious Part I crimes, jurisdiction-wide crime rates are dominated by the least serious of these crimes, larceny-thefts. These issues are raised here to emphasize the difficulties of using UCR data to assess performance. If police performance measurement is inherently difficult, starting with a flawed data source makes things even more challenging.
While the FBI strongly discourages using the data from the UCR for ranking cities, this does not prevent annual rankings of the “most dangerous” and “safest” cities from being published by media outlets. As the FBI warns on its website, these rankings do not consider the many variables that affect crime in particular jurisdictions and thus can be misleading and incomplete. Ranking cities based on UCR data is like running a regression model on the correlates of crime and including only a single predictor variable. An approach that looks exclusively at crime at the jurisdiction level and ignores other jurisdiction-level characteristics leaves far too much variance unexplained.
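To make the analogy concrete, the brief sketch below uses entirely synthetic data and hypothetical jurisdiction-level covariates (it is not an analysis of any real city data): a model that predicts crime from a single "police performance" variable leaves most of the variance unexplained compared with a model that also includes other structural characteristics.

```python
import numpy as np

# Synthetic illustration only: hypothetical jurisdiction-level data.
rng = np.random.default_rng(0)
n = 500                                      # hypothetical cities
policing = rng.normal(size=n)                # stand-in for police effectiveness
poverty = rng.normal(size=n)                 # omitted structural factors
density = rng.normal(size=n)
crime = -0.3 * policing + 0.8 * poverty + 0.5 * density + rng.normal(size=n)

def r_squared(X, y):
    """Ordinary least squares R^2, with an intercept column added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("policing only:", round(r_squared(policing[:, None], crime), 2))
print("policing + structural factors:",
      round(r_squared(np.column_stack([policing, poverty, density]), crime), 2))
```

The single-predictor fit mirrors what a crime-rate ranking implicitly does: it attributes all between-city differences to one factor while ignoring everything else that shapes crime.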
Despite the warning from the FBI, agencies are interested in knowing how they compare to their peers. Evaluating performance is difficult without benchmarks, and so comparisons to “similar” departments would ideally be an important part of judging how typical the results in one agency are. Colleges and universities keep a close eye on rankings from US News and World Report, and many evaluate their performance in relation to peer institutions. Should police departments be expected to do the same? Not everyone believes ranking cities is inherently problematic. Sage Publications publishes an annual book ranking cities by crime rate, most recently City Crime Rankings 2011–2012: Crime in Metropolitan America. The International City/County Management Association (ICMA) provides performance measures across 18 areas, including policing, for over 200 cities through its Center for Performance Measurement. Cities report information to ICMA, which then compiles a variety of crime and service-related measures for agencies. The organization actively encourages comparisons between peer organizations. What ICMA offers to cities that the FBI does not is an extensive effort to ensure that collected data are valid. While the issues in comparing across jurisdictions still exist, one improvement here is the review of the data by an outside organization. Similar arguments have been made by Sherman (1998), who points to the benefits of police agencies using auditors, just as private companies do, to ensure that crime data are reported accurately. Alpert and Moore (1993: 122) also argue that “existing measures could be improved to live up to the challenge of professionalism. This would include audited clearance and arrest rates.”
In contrast, in the United Kingdom the government is explicitly involved in assessing and ranking police forces. The UK is not entirely comparable to the USA because statutes and crime definitions are consistent across the UK, and the government has national oversight of the police. Still, the work of Her Majesty’s Inspectorate of Constabulary is far more comprehensive than anything in the USA at the local or national level. The public has access to comparable data for each of the 43 forces on crime, quality of service, and expenditures. Importantly, the data are checked for quality, and each agency has been assigned peer agencies, which are as similar as possible in population and demographics.
While comparing agencies is one challenge, the major issue with the use of crime data to assess performance is with the unit of analysis. As the unit increases in geographic size and scope, it becomes increasingly difficult to estimate to what extent police practices contribute to changes in observed crime. Linking police performance to crime at the jurisdiction level is most difficult of all. In recent years, scholars have increasingly recognized that the police can have some beneficial impact on crime (see National Research Council [NRC] 2004). This is a change from conventional wisdom even as recently as the early 1990s. Still, even the strongest advocates for the police role in reducing crime do not believe that police have complete control over crime rates. A multitude of factors combine to influence crime, and disentangling how each component influences crime is an impossible task. That is what makes evaluating police performance at the city level so challenging. There is no validated way of estimating how much the police contributed to the crime rate in a city for a particular year versus all the other factors that influence crime.
Such efforts to link police performance to crime rates are not hopeless, however. Importantly, when the scope of the geographic analysis is narrowed, it becomes possible to link police activities to crime declines, and this becomes especially true when smaller geographic units are used in concert with rigorous research designs to test the impact of particular police practices. In many ways, this makes more sense than a citywide assessment. While police are responsible for crime across the entire city, in reality their resources tend to be concentrated in a small number of places, and it is in those places where one would expect police performance to make the most difference.
This point is made by Bayley (1996: 49), who argues that police efforts to prevent crime should be evaluated only when they are targeted on specific sorts of crime in specific locations: “To demonstrate their effectiveness, the police have to target their shots.” Similarly, Wycoff and Manning (1983) note that efforts to improve the measurement of police performance should focus on either improving the methodology of measurement or better conceptualizing what is to be measured. Aggregate official crime data should not be used in comparative studies, because the data across cities are not comparable. They point to a model of evaluating performance that relies on rigorous evaluation and close attention to differences across cities: “The most reliable analyses will be experimental studies conducted on a site-by-site basis… The next most reliable method will be longitudinal studies, again done site-by-site, with particular attention to contextual and organizational factors” (Wycoff and Manning 1983: 19).
Wilson (1993) also argues persuasively about this point, recognizing that a problem in police performance measurement is that there are no adequate performance measures, because it is impossible to precisely estimate what portion of any measure of public safety in a city is attributable to police actions. If the police are unable to adequately assess their performance with citywide crime rates, they often turn to proxy measures, such as response time or clearance rates, which have their own flaws and are not necessarily related to the crime rate. Focusing attention on specific police actions in specific geographic areas is Wilson’s (1993) solution. He suggests the usefulness of police targeting particular problem neighborhoods (however the agency defines neighborhood) and then using “microlevel” measures (e.g., changes in calls for service in that neighborhood) to assess the effectiveness of police efforts to address neighborhood problems. As Wilson (1993: 162) argues, “These micro-measures are likely to be among the few valid measures of police performance. They may well lead to conclusions quite at variance with city-wide, aggregate data.” Randomized experiments are particularly useful in combination with these micro-measures because such designs are assumed to have the highest internal validity and allow for the strongest causal statements about the effects of interventions (see Cook and Campbell 1979). This is the case because subjects are randomized into treatment and control conditions, and thus their inclusion in one group or another is determined simply by chance. If an agency uses a randomized design to assess an intervention in a specified geographic area, then a crime decline observed in this area (when comparing the treatment area to a randomly allocated control site) can reasonably be attributed to the efforts of police. Randomized experiments of geographic police interventions offer the opportunity for police to persuasively and validly evaluate how their performance influences crime and disorder. Randomized experiments do not solve the problems with official data and the UCR, such as the vast underreporting of most crimes, but experiments offer a way to more rigorously assess how police performance affects crime.
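A minimal sketch of this logic is shown below, using invented block identifiers and invented call counts rather than any particular study’s data: candidate hot spots are allocated to treatment or control purely by chance, and post-intervention calls for service are then compared across the two groups.

```python
import random
import statistics

# Hypothetical place-based randomized design: because assignment is random,
# post-intervention differences in calls for service can more plausibly be
# attributed to the intervention itself.  All counts below are invented.
random.seed(42)

hot_spots = [f"block_{i}" for i in range(40)]
random.shuffle(hot_spots)
treatment, control = hot_spots[:20], hot_spots[20:]

# Hypothetical post-intervention calls for service per block (treatment
# blocks are given a modest reduction purely for illustration).
post_calls = {b: random.randint(5, 25) - (4 if b in treatment else 0)
              for b in hot_spots}

t_mean = statistics.mean(post_calls[b] for b in treatment)
c_mean = statistics.mean(post_calls[b] for b in control)
print(f"treatment mean = {t_mean:.1f}, control mean = {c_mean:.1f}, "
      f"difference = {c_mean - t_mean:.1f} calls per block")
```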
Hot spots policing is a particularly good example of a policing intervention that takes place in small geographic units and has been rigorously evaluated with a series of randomized trials (NRC 2004). A large body of research suggests that police can have a significant beneficial impact on crime and disorder when they narrow their focus to microgeographic units with high rates of crime (see Weisburd and Eck 2004). Importantly, work examining hot spots of crime in Seattle also suggests that citywide crime trends tend to mask the trends at particular street blocks in the city (Weisburd et al. 2012). While Seattle enjoyed a citywide crime decline in the 1990s, this decline was isolated to a small percentage of blocks, while the vast majority of blocks showed little change and some even showed substantial crime increases. This reinforces the problem of using citywide crime data to assess police performance.
Compstat provides another good example of how crime data can be used as a means of evaluating police performance. Compstat forces police commanders to focus on crime problems occurring over short time periods and to assess how police actions can address these problems quickly and efficiently. Compstat was an effective approach for evaluating and altering performance in the New York Police Department. Moore and Braga (2003: 446) note that in Compstat, “the measures are simple, objective, reliably measured, and continuous so that changes in performance can be observed over time within an operational unit, and across units that are roughly similar.” While Compstat does not make use of randomized experiments in its assessment of performance, it does make important strides in generating measures that are consistently measured over time, allowing for comparisons within and across units and making it possible to more closely link police actions in a specified time period to changes in crime.
What Are The Police Spending Their Time On?
The police rely heavily on crime measures to assess their performance, but these are not the only measures traditionally used. Police also are interested in understanding what officers are doing with their time, and hence, they often track measures of police activity such as response time to 911 calls, arrests made (and citations written), and clearance rates. These measures, however, provide only limited data on what officers are doing with their time, and, if crime control remains a primary measure of police performance, research suggests that rapid response and general increases in arrests are not effective ways to address crime (see Weisburd and Eck 2004). Additionally, these measures give insufficient attention to police resource allocation. Systematic social observation research (see below) suggests that only a very small portion of officer time is spent arresting offenders and a fairly small proportion of time is spent responding to calls for service. Departments are not adequately measuring performance by focusing on these two measures alone.
Clearance rates can also be a problematic measure. The FBI reporting rules allow for situations in which data are presented in a way that is technically correct but misleading. For example, the MPD widely touted a 94 % homicide clearance rate in 2011. One would likely assume that DC police made an arrest in 94 % of 2011 murder cases. The FBI, however, counts homicide closures that occurred in any year in the current year’s clearance rate. DC homicide detectives closed 62 of the 108 homicides in 2011 and also closed about 40 homicides that occurred in earlier years. While the 94 % homicide clearance rate is accurate under FBI reporting rules, it would seem to make more sense to report on the percentage of homicide cases closed within 12 months, rather than report a percentage that is not entirely related to current year data. Additionally, while clearance rates have been on the decline in recent decades, the factors associated with these declines may be outside police control. Ousey and Lee (2010) find that changes in police personnel and workload are not associated with changes in the clearance rate, but that changes in immigration can decrease clearance rates. Like crime rates, certain factors that police have no control over may contribute to decreased clearance rates.
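Laying out the arithmetic with the figures cited above makes the gap visible (treating the roughly 40 prior-year closures as approximate): the FBI-style calculation divides all closures recorded in 2011 by the year’s new homicides, while a within-12-months rate counts only 2011 cases closed in 2011.

```python
# Worked example using the figures cited above; the prior-year closure
# count is approximate, as noted in the text.
current_year_homicides = 108
closed_current_year = 62
closed_prior_year_cases = 40   # approximate

fbi_style_rate = (closed_current_year + closed_prior_year_cases) / current_year_homicides
within_year_rate = closed_current_year / current_year_homicides

print(f"FBI-style clearance rate: {fbi_style_rate:.0%}")      # ~94%
print(f"2011 cases closed in 2011: {within_year_rate:.0%}")   # ~57%
```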
Number of arrests and police response time are both measures, to some extent, of police productivity. These measures, however, give police little information about what officers are spending their time on day to day. How are departmental resources being used, and is resource allocation effective and efficient? These are also important questions for police and present challenges for measurement. Unlike profits in the private sector, there is no single bottom line in policing, and so assessing whether resources are being properly used is not easy.
Ostrom (1983) makes this clear in her discussion of equity in the delivery of police services. What should be the standard for equity, and how should this standard be measured? Should services be distributed based on taxes paid, or based on need, or to all citizens equally, or to ensure that all citizens get equal results? These are normative questions, but the effort to assess equity presents a host of methodological problems. The service that police are providing is ambiguous, making it difficult to measure whether service is being distributed equitably. Expenditure data can be used to assess the level of service, although budgeted resources do not necessarily correspond to services delivered to particular communities. Ostrom (1983) points to four flawed ways researchers typically assess equity in policing. Data on the distribution of police across jurisdictions typically show that higher-crime areas receive more police (i.e., the needier areas get more), but this says little about what police are actually doing in these areas. As Ostrom (1983: 110) notes, “The difference between an occupying army and a service-oriented police force cannot be determined solely by examining personnel allocation.” Similarly, response time data and crime data do not provide much information on the actual delivery of police services in these areas. A final, less commonly used method is citizen surveys to assess fear of crime, which are potentially useful, but survey data do not necessarily provide insights into why citizens feel the way they do (see below).
A recent study in Dallas (Weisburd et al. 2013) was the first randomized experiment to make use of data from automatic vehicle locators (AVL). AVL technology repeatedly records the location of each patrol car throughout a shift. This has obvious benefits for officer safety in terms of monitoring patrol cars, but it also has benefits for assessing where officers are going during their shift. In hot spots policing studies, saturation patrol is often an important component of the intervention, and so AVL technology could allow the department to assess how often officers are visiting hot spots and whether treatment is being delivered as intended. While police unions are reluctant to allow departments to closely monitor AVL data, there is potential here to better identify officers’ spatial locations during their shifts.
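A minimal sketch of how such data might be used to estimate patrol “dosage” is shown below. The ping format, field names, rectangular hot spot boundary, and 30-second ping interval are all assumptions for illustration and do not reflect the Dallas study’s actual procedures.

```python
from dataclasses import dataclass

# Hypothetical sketch: estimate time a patrol unit spends in a hot spot by
# counting AVL pings that fall inside a simple rectangular boundary.

@dataclass
class Ping:
    unit: str
    lat: float
    lon: float

HOT_SPOT = {"lat": (32.776, 32.778), "lon": (-96.799, -96.797)}  # invented bounds
PING_INTERVAL_MIN = 0.5  # assume one AVL ping every 30 seconds

def minutes_in_hot_spot(pings):
    inside = [p for p in pings
              if HOT_SPOT["lat"][0] <= p.lat <= HOT_SPOT["lat"][1]
              and HOT_SPOT["lon"][0] <= p.lon <= HOT_SPOT["lon"][1]]
    return len(inside) * PING_INTERVAL_MIN

shift = [Ping("unit_12", 32.7771, -96.7982), Ping("unit_12", 32.7772, -96.7981),
         Ping("unit_12", 32.7900, -96.8100)]  # last ping falls outside the boundary
print(minutes_in_hot_spot(shift), "minutes of patrol dosage")
```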
To fully evaluate performance, police agencies must know not only where officers are but what they are doing. While agencies have increasingly moved towards a greater embrace of problem solving and working with the community, there is little or no official departmental data on whether officers are actually engaging with the community or using proactive problem solving (Alpert and Moore 1993). One option is to better survey officers about their activities while on duty. Most police agencies currently keep track of how long officers spend on particular calls for service but have much less data on what officers are doing in their uncommitted time. Dadds and Scheide (2000) discuss the use of activity measurement in Australia as a way to better monitor what officers are doing in the field and how these activities may be linked to desired outcomes. Officers spend one or two weeks annually completing a survey detailing exactly how they spend their time on each shift. This method relies on the veracity of officer responses, and it can be difficult to place officer activities into specific categories. Still, the survey provides an opportunity to estimate how officers’ time is divided between traffic services, community police services, crime management, emergency response management and coordination, and criminal justice support.
Similarly, Mastrofski (1983) calls for greater police attention to noncrime services provided to citizens. While incident reports and arrests are well recorded, police devote a substantial portion of their time to providing information or assistance, dealing with nuisances, or breaking up disputes, and unless a report is written, none of this noncrime service is well recorded. Providing officers with a set of response activity codes that could be used to report what actions were taken in a particular citizen encounter would be useful and would provide more data on how police were allocating their noncrime services.
Mastrofski (1996) provides five different methods for evaluating what officers are doing in interactions with citizens. The first, as described above, is officer self-reports of behavior. While inexpensive, this method suffers from potential problems in reporting validity. Reports from citizens who have interactions with the police are another possibility and are discussed more below. A third possibility is using evaluations by other human service professionals who regularly interact with the police. While such evaluations would be less subjective than self-report data, such professionals would not be present during a large proportion of citizen interactions. The last two methods offer the most objective means of evaluating police performance: the use of indirect third-party observation (e.g., video recording) and the use of direct third-party observation. Direct third-party observation is discussed below as a means for systematic social observation. The use of police video recordings has been fairly uncommon to date, although as more departments record police-citizen interactions and as technology improves, this would seem to be a promising and inexpensive route for future research on police performance.
Systematic social observation (SSO) offers an even better way to assess what activities officers are engaged in and provides a more objective measure of police-citizen interactions. SSO makes use of protocols in field observations of the police to ensure that data collection is standardized within and across observers. As Mastrofski et al. (2010: 243–244) note, “it offers enhanced prospects of validity, and in many situations it provides for increased confidence in reliability, because of the researcher’s direct access to the phenomenon of interest and greater control and transparency of data encoding. Further, it affords greater precision in capturing details of the phenomenon and its context.” SSO is not without potential threats to validity. A primary one is the possibility of observer error in recalling events. While multiple observers of the same scene would be ideal to avoid this, it becomes difficult both in terms of cost and logistics to use more than one observer per officer. Another concern is reactivity effects. Are officers responding differently when observers are watching their actions? Research on this topic is limited, but in a long-term project where officers are being observed repeatedly, it seems less likely that reactivity would be a major problem. Systematic social observation has been an important component of a small but growing number of research studies on police behavior. Famega et al. (2005) used systematic social observation in Baltimore to assess both how officer time was allocated and the reason for officer actions over the course of 163 shifts. Their finding that 75 % of officer time is unassigned suggests that simply examining calls for service data is insufficient for understanding what officers are doing during their shifts.
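Where double coding of the same encounters is feasible, inter-observer reliability can be summarized with a chance-corrected agreement statistic such as Cohen’s kappa. The sketch below uses invented activity categories and invented codes purely for illustration; it is one way such a check could be run, not the protocol of any study cited here.

```python
from collections import Counter

# Hypothetical sketch: Cohen's kappa for two observers coding the same
# six encounters with invented category labels.
obs_a = ["assist", "dispute", "assist", "arrest", "dispute", "assist"]
obs_b = ["assist", "dispute", "dispute", "arrest", "dispute", "assist"]

n = len(obs_a)
p_o = sum(a == b for a, b in zip(obs_a, obs_b)) / n               # observed agreement
ca, cb = Counter(obs_a), Counter(obs_b)
p_e = sum(ca[c] * cb[c] for c in set(obs_a) | set(obs_b)) / n**2  # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(f"observed={p_o:.2f}, expected={p_e:.2f}, kappa={kappa:.2f}")
```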
An important question with data on police activities and interactions with citizens is how these data can be used to evaluate police performance. Even with the problems with crime data, it seems clear what the desired outcome is – overall crime reduction. When evaluating what officers are spending their time on and how they interact with citizens, the outcomes become fuzzier. In terms of what officers spend their time on, if the goal is to relate these activities to crime control effectiveness, then it is easier to assess whether or not officers are performing in ways that are evidence based. But what about evaluating how an officer handles a particular domestic violence call, or how officers carry out a traffic stop? One part of evaluating performance here would be assessing levels of citizen satisfaction and beliefs about the fairness of the interaction, an issue discussed in the next section.
But police, like other professionals, do not want their performance assessment based only on the views of outsiders. Mastrofski (1996) suggests that police master craftsmen, chosen through surveys of fellow officers, could work to develop performance criteria for difficult or common police problems. Then, observed officer actions could be compared to these criteria. The development of such criteria would not be easy and would take much deliberation among highly regarded officers, but it could be quite useful for helping to evaluate what good performance should look like. Recent research by Mastrofski et al. (2011) suggests both that officers can identify master craftsmen (i.e., officers whose performance and opinion they respect) and that officers can be interviewed about evaluating police performance in particular situations. While officer views are not always consistent on what constitutes good performance, this is a promising area for future research on how police can use observational data to evaluate police performance based on agreed-upon departmental standards.
Measuring Citizen Perceptions
While innovations such as community policing have stressed the need for police to partner and work closely with community members to effectively address crime and fear of crime, the police have not been quick to add measures that will help gauge the community’s view of the department (Alpert and Moore 1993; Moore et al. 2002). As Mastrofski (1996: 210) notes in arguing for more attention to be placed on police-citizen encounters, “Crime rates, fear levels and citizen satisfaction – though keenly felt at the personal level – are hard to observe directly… It is not at all clear that the public holds the police accountable for the crime rate, but they do hold them accountable for their actions.” This makes assessing how citizens view officer actions an important aspect of evaluating police performance. While departments have traditionally collected data on citizen complaints, such data are not sufficient for evaluating performance (Moore et al. 2002). While complaint data are an important way for departments to learn about and remedy improper police behavior, complaint data do not give a full picture of police-citizen encounters, and importantly for efforts to evaluate “good” performance, complaints provide no data on positive police-citizen interactions.
Data on citizen perceptions can be gathered in multiple ways. Some police departments regularly conduct satisfaction surveys of 911 callers. A number of researchers have used telephone surveys to gather information on recent citizen encounters with the police and how citizens viewed such encounters, or on citizens’ perceptions of their neighborhood and/or recent victimization experiences (see Skogan 1999). Rosenbaum et al. (2008) were successful in assessing citizen views of the police through an Internet survey in Chicago. In addition to large-scale surveys, qualitative interviews can also be an important way to gather detailed information on police-citizen encounters.
Brown and Benedict (2002) point to some methodological issues that can arise in research on citizen perceptions of the police. Broad-based citizen surveys may not capture a sufficient number of witnesses, victims, and suspects and thus may oversample individuals who have not had direct contact with the police. Even when researchers have access to lists of individuals who have had recent police contact, it is not always easy to track these people down and to acquire their consent to participate in research. Such individuals, particularly suspects, are often skeptical of any research related to the police and are reluctant to believe that confidentiality measures are in place. Victims or witnesses may not feel comfortable discussing a sensitive matter, particularly for violent crimes. It may be especially difficult to survey young minority males, a group of particular interest in assessing police performance, because of their typically high levels of contact with the police and low levels of perceived police legitimacy. Wolfer and Baker (2000) suggest the additional problem of “halo effects” in survey research, especially in small and rural agencies. Citizens in these areas may rate the police as satisfactory based on personal relationships and not on actual good performance. In their study of problem-oriented policing, citizens tended to rate the police highly in areas where other objective measures suggested the police were not doing a particularly good job.
Brown and Benedict (2002: 562) also stress the benefits of efforts to “develop objective, independent measures of police activity to better determine the impact of officers’ behaviors on attitudes towards the police.” Efforts to combine SSO and citizen perception data could be particularly beneficial here. While this raises concerns about citizen privacy, an ideal research design would be to use SSO data to objectively assess police-citizen encounters and then conduct follow-up interviews with these citizens to better understand their views of the encounter and the police. Such a design would help better link citizen attitudes and officer performance.
While large-scale survey research has the benefit of generating larger sample sizes, Shilston (2008) stresses the importance of using qualitative data to assess police performance.
General satisfaction surveys will likely be unable to provide detailed depictions of citizen views of and attitudes towards the police. Such surveys can be useful as a quantitative summary of perceptions, but they fail to provide insight into the individual experiences of those who interacted with the police. Interviewing those who had direct contact with the police is a better method for gaining in-depth information on citizen perceptions.
Fielding and Innes (2006: 135) also call for a more qualitative approach to assess public perceptions of the police and argue that “A better approach than using randomly selected respondents to measure change may be to employ judgements made by key informants: individuals who have detailed knowledge of communal life in a locale and are in a position to provide a meaningful assessment of how policing there has improved or worsened.” This purposive sampling technique may provide a richer data source for police. They also point to the benefits of qualitative assessments of the narratives from complaint data to get a more descriptive view of negative police-citizen encounters. This could provide greater insight for police departments than a simple count of the number of complaints.
Conclusions: What Should Police And Researchers Be Measuring?
In efforts to evaluate police performance, it is important for police and researchers to use multiple methods of data collection to create a fuller portrait of police performance. Cordner (1996: 200) sums this up well by noting “The bottom line is that policing must be judged by bottom lines.” Police should continue to collect data on the traditional measures above, but should be cognizant of the problems with these measures. They should also expand the list of measures to include variables that reflect the importance of police activities and interactions with citizens and the views of citizens, particularly in terms of the fairness of police actions. For example, official data on drug calls for service and drug arrests are problematic, because they do not always match up with drug market boundaries. Official data should be supplemented with resident surveys to assess the severity and visibility of problems, as well as data sources from other agencies, such as the location of 911 calls for drug overdoses. Just as multiple data sources are a better way to identify drug markets, they would also be important to assess whether police are having an impact on drug markets. Arrest patterns reflect more about police enforcement priorities than whether police reduced harm from particular drug markets. But the triangulation of official data, health data, and resident data could help indicate whether police actions were having impacts on citizen safety and well-being. Skogan (1999) also points to the benefits of triangulation in his review of methodological challenges in using both survey and official crime data. When victimization data, official crime data, and citizen perception data are all pointing in the same direction, then evaluating police performance can be done with greater confidence.
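A minimal sketch of this triangulation logic, using invented area names and invented change scores rather than any real data, is shown below: an area is flagged as a more confident success only when official crime, victimization, and perception measures all move in the same direction.

```python
# Hypothetical sketch of triangulation across three data sources.  All
# values and field names are invented; negative changes indicate improvement.
areas = {
    "market_a": {"crime_change": -0.20, "victimization_change": -0.15, "perception_change": -0.10},
    "market_b": {"crime_change": -0.25, "victimization_change": +0.05, "perception_change": -0.02},
}

for name, measures in areas.items():
    directions = {value > 0 for value in measures.values()}
    agree = len(directions) == 1   # all sources moved the same way
    print(f"{name}: sources agree on direction = {agree}")
```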
Moore et al. (2002) provide a series of goals for police and measures that would be useful to include in what they describe as a “model annual report.” These include not only traditional measures of efforts to reduce criminal victimization and call offenders to account, but also measures explicitly focused on the views of citizens: efforts to reduce fear of crime, satisfy customer demands, and achieve legitimacy with those policed. Such efforts to view citizens as the customers of police fit in well with the private sector notion of total quality management (TQM). Hoover (1996) describes how TQM focuses on a number of aspects relevant for evaluating police performance, including using routine and constant measurement to enhance performance, focusing on the concerns of customers, and being explicitly concerned with counting and quantifying interactions with customers and the outcomes of these interactions. Management strategies from the private sector do not transfer perfectly to policing. “Good” policing has no clear definition, and so success cannot be gauged as simply as profit. Hoover (1996: 19) points out that “Not all qualitative elements of police work can be quantified. There are too many exigencies, contingencies and intangibles.” Nonetheless, efforts to better measure various aspects of police performance are worthwhile and important to more fully evaluating police fairness and effectiveness.
One final issue is whether police performance measurement should be focused on outputs or outcomes. Stephens (1996: 121) points to the need to supplement standard measurements of police performance with measures of outcomes and notes, “The issue of measuring effectiveness is not entirely one of discarding the old and replacing it with new measures. It is one of putting the traditional measures in perspective (using them when appropriate) and placing greater emphasis on outcomes than process or ‘proxy measures.’” While it seems reasonable to focus on the outcomes of policing, Moore et al. (2002) argue that while outcomes are often seen as more important, one cannot ignore the outputs (and inputs) of policing. Outcome measures can be challenging to collect, and in policing, the concern is not just about outcomes but also about how police are doing their job. While reducing crime and disorder is a desired outcome that should be assessed as well as possible, it is equally important that the police are being fair in their use of force, their allocation of resources, and their interactions with citizens.
Starbuck (2004: 337) summarizes the problems and prospects of evaluating performance in any agency: “Performance measures are everywhere, but they are filled with errors, and these errors are likely to cause faulty inferences. We should distrust performance measures, but we cannot ignore them because they are powerful motivators that can produce dramatic improvements in human and organizational performance.” While evaluating police performance entails a number of methodological challenges, measuring performance is an important way to both bring about desired changes in police organizations (e.g., Compstat) and assess whether police performance fits in with notions of what “good” policing should entail.
Bibliography:
- Alpert G, Moore M (1993) Measuring police performance in the new paradigm of policing. In: DiIulio JJ et al (eds) Performance measures for the criminal justice system. Bureau of Justice Statistics, U.S. Department of Justice, Washington, DC, pp 109–142
- Bayley DH (1996) Measuring overall effectiveness. In: Hoover LT (ed) Quantifying quality in policing. Police Executive Research Forum, Washington, DC, pp 37–54
- Brown B, Benedict W (2002) Perceptions of the police: past findings, methodological issues, conceptual issues and policy implications. Policing: Int J Police Strategies Manag 25:543–580
- Cook TD, Campbell D (1979) Quasi-experimentation: design and analysis issues for field settings. Rand McNally, Chicago
- Cordner GW (1996) Evaluating tactical patrol. In: Hoover LT (ed) Quantifying quality in policing. Police Executive Research Forum, Washington, DC, pp 185–206
- Dadds V, Scheide T (2000) Police performance and activity measurement, vol 180, Trends and issues in crime and criminal justice. Australian Institute of Criminology, Canberra
- Famega CN, Frank J, Mazerolle LG (2005) Managing police patrol time: the role of supervisors. Justice Quart 22:540–559
- Fielding N, Innes M (2006) Reassurance policing, community policing and measuring police performance. Policing Soc 16:127–145
- Hoover LT (1996) Translating total quality management from the private sector to policing. In: Hoover LT (ed) Quantifying quality in policing. Police Executive Research Forum, Washington, DC, pp 1–22
- Mastrofski SD (1983) The police and noncrime services. In: Whitaker GP, Phillips CD (eds) Evaluating performance of criminal justice agencies. Sage, Beverly Hills, pp 33–61
- Mastrofski SD (1996) Measuring police performance in public encounters. In: Hoover LT (ed) Quantifying quality in policing. Police Executive Research Forum, Washington, DC, pp 207–241
- Mastrofski SD, Parks RB, McCluskey JD (2010) Systematic social observation in criminology. In: Piquero AR, Weisburd D (eds) Handbook of quantitative criminology. Springer, New York, pp 225–247
- Mastrofski SD, Willis JJ, Revier L (2011) How police distinguish quality in police work. Paper presented at the Annual Meeting of the American Society of Criminology, Washington, DC
- Moore MH, Braga AA (2003) Measuring and improving police performance: the lessons of Compstat and its progeny. Policing: Int J Police Strategies Manag 26:439–453
- Moore MH, Braga AA (2004) Police performance measurement: a normative framework. Crim Just Ethics 23:3–19
- Moore M, Thacher D, Dodge A, Moore T (2002) Recognizing value in policing: the challenge of measuring police performance. Police Executive Research Forum, Washington, DC
- Mosher CJ, Miethe TD, Phillips DM (2002) The mismeasure of crime. Sage, Thousand Oaks
- National Research Council (2004) Fairness and effectiveness in policing: the evidence. Committee to Review Research on Police Policy and Practices, Skogan W, Frydl K (eds). Committee on Law and Justice, Division of Behavioral and Social Sciences and Education. National Academies Press, Washington, DC
- Ostrom E (1983) Equity in police services. In: Whitaker GP, Phillips CD (eds) Evaluating performance of criminal justice agencies. Sage, Beverly Hills, pp 99–125
- Ousey GC, Lee MR (2010) To know the unknown: the decline in homicide clearance rates, 1980–2000. Crim Just Rev 35:141–158
- Rosenbaum D, Schuck A, Graziano L, Stephens C (2008) Measuring police and community performance using web-based surveys: findings from the Chicago internet project. National Institute of Justice, U.S. Department of Justice, Washington, DC
- Sherman LW (1998) Evidence-based policing. Ideas in American policing. Police Foundation, Washington, DC
- Shilston TG (2008) One, two, three, what are we still counting for? Police performance regimes, public perceptions of service delivery and the failure of quantitative measurement. Policing 2:359–366
- Skogan WG (1999) Measuring what matters: crime, disorder, and fear. In: Langworthy RH (ed) Measuring what matters: proceedings from the police research institute meetings. National Institute of Justice and Office of Community Oriented Policing Services, U.S. Department of Justice, Washington, DC, pp 37–54
- Starbuck WH (2004) Methodological challenges posed by measures of police performance. J Manag Gov 8:337–343
- Stephens D (1996) Community problem-oriented policing: measuring impacts. In: Hoover LT (ed) Quantifying quality in policing. Police Executive Research Forum, Washington, DC, pp 95–129
- Swindell D, Kelly JM (2000) Linking citizen satisfaction data to performance measures: a preliminary examination. Public Perform Manag Rev 24:30–52
- Weisburd D, Eck JE (2004) What can the police do to reduce crime, disorder, and fear? Ann Am Acad Pol Soc Sci 593:42–65
- Weisburd D, Groff ER, Yang SM (2012) The criminology of place: street segments and our understanding of the crime problem. Oxford University Press, New York
- Weisburd D, Groff E, Jones G, Amendola KL, Cave B (2013) The Dallas AVL experiment: evaluating the use of automated vehicle locator technologies in policing. Police Foundation, Washington, DC
- Wilson JQ (1993) The problem of defining agency success. In: DiIulio JJ et al (eds) Performance measures for the criminal justice system. Bureau of Justice Statistics, U.S. Department of Justice, Washington, DC, pp 156–164
- Wolfer L, Baker TE (2000) Evaluating small town policing: methodological issues. J Police Crim Psychol 15:52–63
- Wycoff MA, Manning PK (1983) The police and crime control. In: Whitaker GP, Phillips CD (eds) Evaluating performance of criminal justice agencies. Sage, Beverly Hills, pp 15–32