Behavioral Observation in Schools Research Paper


Behavioral observation refers to a method of assessment whereby human observers objectively record the ongoing behavior of a person or persons in specific environmental circumstances.


  1. Introduction
  2. General Methods of Behavioral Observation
  3. Measuring and Recording Behavior Systematically
  4. Coding Schemes
  5. Observational Instruments
  6. Summarizing Behavioral Observation Data
  7. Reliability and Validity of Behavioral Observation
  8. Summary

1. Introduction

Direct observation is one of the individual assessment procedures most widely used by school-based professionals. In a survey of more than 1000 school-based professionals that listed 26 different types of individual assessment instruments across seven assessment categories (e.g., aptitude, social–emotional, and personality), behavioral observation methods ranked highest in frequency of use. Overall, respondents indicated that they conduct approximately 15 behavioral observations on average during a typical month.

2. General Methods of Behavioral Observation

When conducting behavioral observations, school-based professionals generally rely on one of two basic approaches to gathering information. One form, narrative recording, refers to the observation and collection of information on student behavior in naturally occurring arrangements, with few constraints placed on the observer regarding how and/or what to observe and record. With narrative recording, the observer generally describes the observed situation, making notes of specific behaviors as they occur. The method(s) of observation, the type of information noted, and the manner in which the information is summarized are left to the discretion of the observer. The strength of such procedures lies in the flexibility that the observer has in choosing when and how to observe and in the minimization of the obtrusiveness or reactivity that may result from the presence of the observer. For interpretative purposes, narrative recording is often used to help develop hypotheses about the various behavioral and environmental factors that may be worthy of further observation and analysis. This flexibility, however, is also one of the main weaknesses of narrative recording. Because the observer has great autonomy in choosing how and what to observe, judgments about the worth of such reports are inextricably bound to the subjective judgments of the observer. As such, it would be highly unlikely for two independent observations collected at the same time to appear identical in the information provided. Therefore, narrative recording is generally best used as a precursor to more specific, objective accounts of behavior.

In contrast to narrative recording, systematic direct observation refers to the observation of explicitly defined behavior in predetermined settings. Although this approach is also concerned with observing behavior in naturally occurring environmental contexts, the aim is to define beforehand the behaviors of interest, choose specific recording strategies, and have observers record whenever behavior corresponding to the predefined operational definitions occurs. In particular, systematic direct observation is distinguished by five characteristics: (i) the goal of observation is to record specific behaviors, (ii) the behaviors being observed have been operationally defined a priori in a precise manner, (iii) observations are conducted using standardized procedures and are highly objective in nature, (iv) the times and settings for observation are carefully selected and specified, and (v) scoring and summarizing of data are standardized and do not vary from one observer to another. Unlike narrative recording, a main objective of systematic direct observation is to have independent observers agree on what is observed and recorded, assuming that they have observed the same stream of behavior.

3. Measuring and Recording Behavior Systematically

Various types of data can be collected during systematic direct observation, but all of them depend on a workable definition of the target behavior. A workable definition is one that provides an accurate description of the behavior and clearly defines the boundaries of its existence and nonexistence. Therefore, constructs and reifications do not lend themselves well to direct forms of observation. For example, raising one’s hand to be called on is an observable and measurable behavior. Behaving “off the wall” is not something that can be directly observed (although operational definitions of the behaviors that constitute “off the wall” could be developed). Once behavior is defined, the calibration of the operational definition is determined by the nature of the data: the frequency with which the behavior occurs and the particular interests of the observer. In addition, practical considerations, such as the availability of observers and the amount of time a student is accessible, alone or in combination, help dictate the type of data collected.

4. Coding Schemes

When observing behavior systematically, observers generally use one approach, or a combination of approaches, to recording behavior. The more common approaches involve counting the number of times a behavior occurs during a specified time period and/or noting the presence or absence of behavior at specific time intervals of an observational session. Frequency or event recording involves counting the number of times a behavior is observed during a specified time period. When the time periods vary across multiple observational sessions, frequencies are converted to rate of behavior per unit of time. For example, an observer may report that a child got out of his or her seat at an average rate of one time per minute during three separate observations conducted over the course of 3 days, even though the actual duration of each observation period varied. By using rate of behavior rather than frequency, comparisons can be made across observational sessions that differ in length. Frequency recording is most useful for observing behaviors that have a discrete beginning and ending and are relatively consistent in the length of time that they take to occur. For example, “tantruming” may not lend itself well to frequency recording because the beginning and end of each tantrum might be difficult to discern and the length of each tantrum might vary from a few minutes to hours. Moreover, frequency recording is usually better suited for behaviors that occur at lower rather than higher rates. With higher rates of behavior, accurately detecting each instance of behavior can be challenging.
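The conversion from frequency to rate described above can be sketched in a few lines of Python. This is only an illustration: the function name `rate_per_minute` and the session counts and lengths are hypothetical values chosen so that the rate works out to one per minute, as in the out-of-seat example.

```python
def rate_per_minute(count, minutes):
    """Return the number of behavior occurrences per minute observed."""
    if minutes <= 0:
        raise ValueError("observation length must be positive")
    return count / minutes


# Hypothetical data: (out-of-seat count, minutes observed) for three
# sessions of different lengths. These numbers are assumptions.
sessions = [(10, 10.0), (15, 15.0), (20, 20.0)]

# Raw counts differ (10, 15, 20), but the rates are directly comparable.
rates = [rate_per_minute(count, minutes) for count, minutes in sessions]
mean_rate = sum(rates) / len(rates)
```

Although the raw frequencies differ from session to session, each session yields the same rate, which is what makes sessions of unequal length comparable.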

In addition to frequency recording, another commonly used recording schedule is interval recording. Whereas frequency recording notes each occurrence of behavior, interval recording divides the observational session into a number of equal intervals and simply records the presence or absence of specified behaviors during each interval. For example, a 20 min observation session could be broken down into 120 intervals of 10 s each. Because interval recording is concerned only with whether the targeted behavior occurs during the interval, no distinction is made regarding how many behaviors were observed during the interval. For example, an interval in which there were nine instances of out-of-seat behavior would be coded the same as an interval in which only one instance of out-of-seat behavior occurred.
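As a rough sketch of the idea, the following Python snippet divides a 20 min session into 10 s intervals and marks each interval as scored or not. The function name `interval_presence` and the event times are hypothetical; the events are placed so that one interval contains nine instances and another contains one.

```python
def interval_presence(event_times_s, session_length_s, interval_length_s):
    """Score each equal-length interval True if any event fell inside it."""
    n_intervals = int(session_length_s // interval_length_s)
    scored = [False] * n_intervals
    for t in event_times_s:
        # Ignore events outside the observed intervals.
        if 0 <= t < n_intervals * interval_length_s:
            scored[int(t // interval_length_s)] = True
    return scored


# Hypothetical out-of-seat event times in seconds from session start:
# nine events in the first 10 s interval, one event in the third.
events = [3, 4, 5, 6, 7, 8, 9, 9.5, 9.9, 25]

scored = interval_presence(events, session_length_s=20 * 60, interval_length_s=10)
n_intervals = len(scored)      # 120 intervals of 10 s in a 20 min session
n_scored = sum(scored)         # intervals with at least one event
```

The interval containing nine events and the interval containing one event are coded identically, so ten events produce only two scored intervals.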

In practice, interval-recording data are generally collected using one of three coding techniques. When using whole-interval recording, an interval is scored only when the target behavior is present throughout the entire interval. Since the behavior must be present for the entire interval, whole-interval recording is well suited to behaviors that are continuous and to intervals of short duration. One of the drawbacks of whole-interval recording is that it tends to underestimate the presence of the behavior. For example, if “off-task” behavior were the target behavior of interest and it was observed for 8 s of a 10 s interval, the interval would not be scored for the presence of off-task behavior since it did not occur for the entire 10 s. In contrast to whole-interval recording, partial-interval recording codes the occurrence of a behavior if it occurs during any part of the interval. Partial-interval recording is a good choice for behaviors that occur at a relatively low rate or for behaviors of somewhat inconsistent duration. However, because the interval is scored for any presence of the target behavior, partial-interval recording tends to overestimate the actual occurrence of behavior. Lastly, momentary time sampling codes the presence or absence of the target behavior at only one predefined instant of the interval (usually the instant at which the interval begins). As such, the target behavior is coded as occurring or not on the basis of a fractional observation of the total interval. Once the target behavior is coded, no other behaviors are noted for the balance of the interval. Although momentary time sampling appears counterintuitive because it is based on the smallest sample of behavior, it actually provides the least biased estimate of behavior, generally neither overestimating nor underestimating actual behavior.

In addition to frequency and interval recording, other less commonly used coding methods include duration (i.e., the actual amount of time that a behavior occurs), latency (i.e., the time it takes for a behavior to be initiated following a prompt or directive), and intensity (i.e., the amplitude of behavior).
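The three interval-coding techniques can be compared on a single made-up behavior stream. The sketch below assumes a second-by-second record of whether off-task behavior was present (a convenience for illustration, not something an observer would normally have), and the names `score_intervals` and `percent` are hypothetical.

```python
def score_intervals(stream, interval_len):
    """Score a per-second True/False behavior stream three ways.

    Returns (whole, partial, momentary), one entry per complete interval.
    """
    whole, partial, momentary = [], [], []
    for start in range(0, len(stream) - interval_len + 1, interval_len):
        chunk = stream[start:start + interval_len]
        whole.append(all(chunk))    # present throughout the interval
        partial.append(any(chunk))  # present during any part of the interval
        momentary.append(chunk[0])  # present at the instant the interval begins
    return whole, partial, momentary


def percent(scored):
    """Percentage of entries that are True."""
    return 100.0 * sum(scored) / len(scored)


# Hypothetical 40 s stream: off-task during seconds 0-7, 15-24, and 30-39.
stream = [False] * 40
for second in list(range(0, 8)) + list(range(15, 25)) + list(range(30, 40)):
    stream[second] = True

whole, partial, momentary = score_intervals(stream, interval_len=10)
true_percent = percent(stream)            # share of seconds actually off-task
whole_percent = percent(whole)
partial_percent = percent(partial)
momentary_percent = percent(momentary)
```

For this particular stream the behavior is present for 70% of the seconds, while whole-interval recording scores 25% of intervals, partial-interval recording scores 100%, and momentary time sampling scores 75%. A single short example cannot establish the general pattern, but it is consistent with the tendencies described above.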

5. Observational Instruments

In addition to observer-designed coding schemes, observational instruments have been developed to assess a specific range of behaviors in specific environmental circumstances. For example, an observer might choose to use an observational instrument designed specifically to quantify the percentage of time a student is academically engaged or on-task, the frequency with which the student interacts with other students on the playground, or the number of times a teacher provides directives, opportunities to respond, or positive reinforcement. In comparison to observer-developed coding schemes, observational instruments can be somewhat more limited in their flexibility since they are usually developed with specific standardized operational definitions and recording schedules. On the other hand, since observational instruments are usually developed with a specific purpose in mind, they often provide a more detailed account of the student’s behavior under the specific environmental context of interest.

6. Summarizing Behavioral Observation Data

Once behavioral observation data are coded, they are usually summarized across the entire observational period to provide a description of behavior. As previously indicated, frequency data are reported as a rate (i.e., number of behaviors noted divided by the amount of time observed). By doing so, summary data collected across observational sessions of varying length can be compared. Interval data are expressed as the percentage of intervals in which the target behavior occurred relative to the total number of intervals observed. In summarizing and reporting interval recording results, it is important to note that the behavior was coded for a certain percentage of intervals observed and not of time (e.g., “Off-task behavior was noted for 60% of the intervals observed,” not “Off-task behavior was noted 60% of the time”). This subtle distinction highlights that the coding scheme used was interval recording and that intervals, not actual time, were the unit of analysis. Duration recording is generally reported as the cumulative amount of time that a behavior was observed. However, it is not uncommon to record the duration of each instance of observed behavior separately and then sum the durations for a final cumulative total. In addition to providing total duration, the latter procedure can also provide the average duration per instance of behavior. Latency recording is summarized similarly to duration, with either cumulative latency or average latency per behavior generally noted. Lastly, intensity is usually summarized using some form of subjective ordinal scale (e.g., spoke in a voice loud enough to be heard by a person sitting next to him or her, spoke in a voice loud enough to be heard by a person in the room, spoke in a voice loud enough to be heard by a person outside the room, etc.) or by some mechanical recording device specifically designed to measure the dimension of interest (e.g., amplimeter).
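A minimal Python sketch of the interval and duration summaries follows. The data and the helper names `percent_of_intervals` and `summarize_durations` are hypothetical, chosen so that the interval summary matches the 60% example; latency values could be passed to the second helper in the same way.

```python
def percent_of_intervals(scored):
    """Percentage of observed intervals in which the behavior was coded."""
    return 100.0 * sum(scored) / len(scored)


def summarize_durations(durations_s):
    """Return (cumulative seconds, mean seconds per instance)."""
    total = sum(durations_s)
    return total, total / len(durations_s)


# Hypothetical interval codes: behavior coded in 6 of 10 intervals.
scored = [True, True, False, True, False, True, True, False, True, False]
pct = percent_of_intervals(scored)

# Report intervals, not time, as the unit of analysis.
report = f"Off-task behavior was noted for {pct:.0f}% of the intervals observed."

# Hypothetical durations (in seconds) of three separate instances.
durations = [30, 45, 15]
total_duration, mean_duration = summarize_durations(durations)
```

Recording each instance separately, as in the `durations` list, is what makes the per-instance average available in addition to the cumulative total.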

7. Reliability and Validity of Behavioral Observation

The applicability of reliability and validity to behavioral observation has been equivocal, with some suggesting that classical psychometric concepts based on differences between persons are irrelevant to an assessment methodology that focuses on behavior and its variation within individuals, and others suggesting that the differences between traditional and behavioral assessment are primarily conceptual, not methodological, and as such reliability and validity considerations apply.

Traditionally, behavioral assessors and researchers have approached the issue of reliability using accuracy and interobserver agreement as substitutes. Specifically, accuracy refers to the extent to which observed values of behavior approximate the “true” state of the behavior as it actually occurs. When the true state of the behavior is known, accuracy of measurement is derived by comparing the observed values to the true values. In this case, the behavioral notion of accuracy is the same as reliability in the classic measurement case, in which reliability is represented by the extent to which an observed score represents the true score. As such, in order for a measurement system to be accurate, it must be sensitive to the occurrence of true behavior, its repeated occurrence, and its occurrence in multiple settings.

The problem with accuracy, however, is obtaining measures of behavior that can legitimately be considered true values. In particular, when systematic direct observation by humans is used to collect data, developing an incontrovertible index against which to compare scores generated from such an observation system can prove difficult. When possible, some form of technological reproduction is typically used to generate the true values of the behavior. For example, sample sessions of behavior may be videotaped and studied carefully in order to determine the true values for the target behavior. The ability to replay the tape as many times as necessary or to use stop action and reverse options helps avoid error, although it can never be proven that error is completely absent. In this manner, observed behavioral values can be compared to the true value template and accuracy assessed.

The problems in assessing accuracy are obvious. First, the practicality and feasibility of obtaining true values or an incontrovertible index of behavior are low. Simply put, most behavioral assessors and many researchers are not equipped with the type of technological assistance that is required to capture the true essence of the target behavior. Second, there is the ever-present issue that one can never fully know the true value of behavior, because it is impossible to separate what we know from how we know it. Therefore, accuracy is highly time and context dependent and only represents point estimates of reliable measurement. For these reasons, accuracy is rarely considered or reported in the behavioral assessment literature.

In addition to accuracy, a second commonly used proxy for reliability is interobserver agreement. Here, concurrent observations are made independently by two observers and the degree of association between the observers is assessed. It is important to note, however, that interobserver agreement provides no information with respect to accuracy or reliability. Reliable or accurate observations are those that demonstrate a consistent relationship to the behavior as it actually occurs; interobserver agreement provides no such information. Simply put, the fact that two observers agree on the number of times a target behavior occurred says nothing about whether their observations were accurate or reliable. For example, two observers may both note the occurrence of 10 instances of the target behavior. The level of their agreement would appear to be 100%. However, if 20 instances of the behavior actually occurred, and observer A witnessed only the odd-numbered occurrences and observer B noted only the even-numbered occurrences, their actual agreement would be 0%. As such, although complete agreement between two observers might seem like a comforting piece of information, the only conclusion that can be made is that their total values (not the actual behaviors) for the session were in agreement.
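The odd/even example can be reproduced with two simple agreement indices. Both formulas below are assumptions introduced for illustration, since the text does not specify how agreement is computed: `total_agreement` divides the smaller session total by the larger, and `occurrence_agreement` is the share of occurrences noted by either observer that were noted by both.

```python
def total_agreement(count_a, count_b):
    """Smaller session total divided by larger total, as a percentage."""
    if count_a == count_b == 0:
        return 100.0
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


def occurrence_agreement(seen_a, seen_b):
    """Percentage of occurrences noted by either observer that both noted."""
    either = seen_a | seen_b
    if not either:
        return 100.0
    return 100.0 * len(seen_a & seen_b) / len(either)


# Twenty occurrences, numbered 1-20. Observer A notes only the odd-numbered
# occurrences and observer B only the even-numbered ones.
observer_a = set(range(1, 21, 2))
observer_b = set(range(2, 21, 2))

by_totals = total_agreement(len(observer_a), len(observer_b))
by_occurrence = occurrence_agreement(observer_a, observer_b)
```

Comparing session totals yields 100% agreement, while comparing the occurrences themselves yields 0%, and neither figure reveals that each observer missed half of the 20 occurrences.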

As can be seen, both interobserver agreement and accuracy involve comparing an observer’s data to some other source. However, they differ considerably in the extent to which the source of comparisons can be entrusted to reflect the true behavior as it actually occurred. Accuracy, which makes a direct comparison to an incontrovertible index of the real behavior, is obviously the more reliable (and valid) of the two; however, it is difficult to estimate in applied research settings. Interobserver agreement, although easier to acquire and often used interchangeably with reliability, suffers from an inability to demonstrate accuracy and makes obvious the problems with using consensus as a replacement for reliability.

8. Summary

Behavioral observation is one of the most widely used assessment strategies in schools. Given its flexibility and ease of use, behavioral observation procedures can be used to collect a range of data that provide helpful information and are useful for making a variety of psychoeducational decisions. Because of its direct nature, behavioral observation is particularly well suited for everyday life settings and can provide a systematic record of behavior that can be used in preliminary evaluation, intervention planning and design, the documentation of changes over time, and as part of a multimethod–multisource evaluation that integrates other forms of assessment (e.g., interviews and rating scales) and sources (e.g., teachers, parents, and children).


