ACM Digital Library

Communications of the ACM
Volume 46, Number 12 (2003), Pages 66-72
Using data mining to profile TV viewers
William E. Spangler, Mordechai Gal-Or, Jerrold H. May

Mining thousands of viewing choices and millions of patterns, advertisers and TV networks identify household characteristics, tastes, and desires to create and deliver custom targeted advertising.

The emergence of the digital personal video recorder (PVR) is likely to coincide with profound changes in television viewing, as viewers use the technology to time-shift their viewing and skim over or eliminate "in stream" commercials. This trend represents a significant threat to television advertisers and service providers, jeopardizing the traditional means by which advertising finances so-called free programming. Indeed, due to the rapid mass-market acceptance of PVRs, Forrester Research has predicted the viewing of television commercials could decrease by 20% in just a few years. Forrester analyst Joshua Bernoff suggests this is because the PVR "degrades the value of advertising," predicting the eventual disappearance of an important social element of the television viewing model, that is, "television in which you sit through the commercials is about to be replaced" [6].

Advertisers and service providers must adapt to these changes, as well as to the opportunities, engendered by the PVR. For example, because PVRs are programmable, they can record and transmit household viewing patterns. Data mining techniques can then be used to predict a household's characteristics from its viewing choices, and customized advertising relevant to that type of household can be sent through its PVR. Such targeted advertising is more likely to be viewed and influence purchasing behavior than its non-targeted counterpart; it is also more likely to be more efficient because fewer placements are needed to achieve the same number of exposures to the desired demographic group. From the viewer's perspective, efficient delivery of advertising subsidizes delivery of programming while reducing advertising obtrusiveness.

To deliver targeted advertising, TV networks must be able to identify the members of target groups in a way that is accurate, unobtrusive, and auditable. Toward these ends, we have developed a data mining viewer profiling system—the Advertising Delivery System (ADS)—to identify the demographic and psychographic (behavioral) characteristics of viewers based on their viewing patterns. Our research is based on the premise "you are what you watch," or individuals' viewing habits tend to reveal their most personal characteristics, tastes, and desires. Viewing patterns include types of shows viewed, the frequency each show is viewed, the time each show is viewed, and the duration of each viewing. The related data mining techniques use these components to identify subsets of viewers rich in the target groups of interest to TV advertisers and networks.

The table here outlines the gains from viewer profiling for five target gender/age segments defined by Nielsen Media Services, Inc.; see [1] for a discussion of segmentation strategies. If an ad is sent to everyone, 25.18% of the recipient households will include a female aged 18 to 34. If an ad is sent only to households selected by the profiling system, 58.06% will include such a person. A household selected by the profiling system is therefore 58.06%/25.18%, or about 2.3 times, as likely to be of the desired type as one selected at random. This ratio is the model's "lift"; the lifts in the table differ across demographic groups because some groups' characteristics are easier to predict from their viewing patterns than others.
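The lift arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `lift` is ours, and the two rates are the figures quoted above for households that include a female aged 18 to 34.

```python
def lift(targeted_rate: float, base_rate: float) -> float:
    """Hit rate among selected households divided by the hit rate in the whole population."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return targeted_rate / base_rate


# Rates quoted in the text: 58.06% among selected households, 25.18% overall.
female_18_34 = lift(targeted_rate=0.5806, base_rate=0.2518)
print(f"lift = {female_18_34:.1f}")  # lift = 2.3
```

A lift of 1.0 would mean the profiling system does no better than sending the ad to everyone.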


Our research complements other approaches to identifying individuals' characteristics and targeting content based on these characteristics. Decision-theoretic and collaborative-filtering techniques have been suggested for direct mail [5], for tailoring ads on Web pages [4, 8], and for constructing personalized TV program guides [11]. Key conceptual and technical issues in the development and implementation of viewer profiling systems include:

We have also looked into the privacy concerns engendered by systems designed to monitor and analyze viewing behavior. We address these concerns through privacy policies clearly communicated to the viewer. Key elements of such a policy might include, for example, an "opt-in" feature, providing information to advertisers only in the aggregate, and restricting advertising only to the PVR.

Generating Profiles

The ADS employs a series of supervised learning techniques to analyze labeled viewing data and classify individuals and households. Each classification is a profile and may include demographic (age, gender), geographic (areas, markets, types of markets), and psychographic (interests, lifestyle) information. For simplicity here, we restrict ourselves to viewers' demographic characteristics.

Nielsen maintains up-to-date profiles and complete broadcast viewing records of more than 11,000 viewers in more than 5,000 households. We used Nielsen's database to build and tune statistical models for viewer profiling. We plan to build statistical models directly from the PVR data stream when it is available and use the Nielsen data to validate the models.

We also developed a Profiling Module (PM) for the ADS. Figure 1 outlines how the ADS delivers programming and advertising to the PVR, which records the programs and the ads presented on the TV screen, as well as the time and duration of each presentation. Targeted ads are traditional TV ads stored in the PVR and presented on the TV prior to the relay of a recorded program. We say "presented" because the PVR knows only that something was displayed on the screen, not which humans, if any, viewed it. The PVR communicates ad and presentation information back to the ADS. The ADS's Advertising Management Module records which ads have been presented and maintains the current inventory of available presentation time slots. The PM's data mining component uses the programming data returned by the PVR to create or update individual viewer profiles. The system selects ads from the profile and existing ad inventory to deliver to the household's PVR.

Data miners first consider which tools and implementations are most appropriate to the problem at hand [3, 7]. Their selection depends on several factors, including the nature of the task: classification (disjoint or overlapping [12]), value prediction (or both), and the content and format of the source data. For viewer profiling, a particular implementation of neural networks might be best for identifying certain types of viewers or for profiling certain viewing behaviors. Discriminant analysis might be superior for other types of viewers and patterns. Because it might not be possible to anticipate method performance a priori, we've developed a combination of publicly available and custom-designed candidate algorithms.

The key to our approach is the way the methods are managed within the integrated system. However, managing multiple data mining methods involves several notable challenges, including:

The PM automatically builds models for predicting profiles from viewing records in a fixed training set and refines and validates these models using fixed test sets. The PM begins by cleaning and restructuring the demographic-behavioral data to conform to the models it uses (see Figure 2). Cleaning removes duplicate records and TV programs not displayed long enough to qualify as having been viewed. Restructuring discretizes and categorizes continuous fields, including income, age, and education level.
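The cleaning and restructuring steps might look roughly like the sketch below. It is illustrative only: the record layout, the five-minute viewing threshold, and the age bins (other than the 18 to 34 segment mentioned earlier) are assumptions, not details given in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes records hashable, so duplicates are easy to detect
class ViewingRecord:
    household_id: str
    program: str
    start_minute: int      # minutes since midnight (hypothetical encoding)
    duration_minutes: int


MIN_VIEW_MINUTES = 5  # assumed threshold; the article does not state one


def clean(records):
    """Drop exact duplicates and presentations too short to count as viewed."""
    seen = set()
    kept = []
    for record in records:
        if record in seen or record.duration_minutes < MIN_VIEW_MINUTES:
            continue
        seen.add(record)
        kept.append(record)
    return kept


# Assumed age categories, inclusive on both ends.
AGE_BINS = [(0, 17, "under 18"), (18, 34, "18-34"), (35, 54, "35-54"), (55, 200, "55+")]


def discretize_age(age):
    """Map a continuous age to a category label."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


records = [
    ViewingRecord("hh1", "news", 1080, 30),
    ViewingRecord("hh1", "news", 1080, 30),   # exact duplicate, dropped
    ViewingRecord("hh1", "sitcom", 1110, 2),  # too brief to count as viewed, dropped
    ViewingRecord("hh2", "sports", 840, 95),
]
cleaned = clean(records)
print(len(cleaned), discretize_age(27))  # 2 18-34
```

Income and education level would be categorized the same way, each with its own set of bins.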

The next step in producing data useful to advertisers and TV networks is to create the input data files for the models by separating attributes into independent and dependent variables and partitioning the data into subsets corresponding to different time periods. Independent variables describe viewing behaviors. Dependent variables include household location, income level, and size, as well as education level and the gender and ages of the individuals in the household. The data is divided into subsets because there are a priori reasons for believing the relationships between viewing and demographics might vary from day to day and week to week; for example, significantly more working-age adults watch television at 2 P.M. on weekends than at 2 P.M. on weekdays. Differences in the makeup of a potential audience can lead to differences in algorithm performance; for example, one model might be best at identifying teenage males (dependent variables) on Saturday mornings (an independent variable), while another is best at identifying females age 65 and over on Sunday evenings.
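As a rough illustration of the partitioning step, the sketch below groups viewing events by day type and part of day. The period boundaries and the event layout are assumptions made for the example; the article does not say how the time periods were actually defined.

```python
from collections import defaultdict
from datetime import datetime


def time_period(ts):
    """Return a (day type, daypart) key for a timestamp; boundaries are assumed."""
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    if 6 <= ts.hour < 12:
        daypart = "morning"
    elif 12 <= ts.hour < 18:
        daypart = "afternoon"
    elif 18 <= ts.hour < 23:
        daypart = "evening"
    else:
        daypart = "overnight"
    return day_type, daypart


def partition(events):
    """Group (timestamp, household_id, program) events into per-period subsets."""
    subsets = defaultdict(list)
    for ts, household_id, program in events:
        subsets[time_period(ts)].append((household_id, program))
    return dict(subsets)


events = [
    (datetime(2003, 12, 6, 9, 0), "hh1", "cartoons"),    # Saturday morning
    (datetime(2003, 12, 3, 14, 0), "hh2", "talk show"),  # Wednesday afternoon
    (datetime(2003, 12, 7, 20, 0), "hh1", "drama"),      # Sunday evening
]
subsets = partition(events)
for period, rows in subsets.items():
    print(period, rows)
```

Each subset can then be handed to every candidate model, so that performance is compared period by period rather than over the pooled data.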

Preprocessing produces a stream of data for every household. The PM could then employ any of a number of algorithms involving:

After preprocessing, the PM allows all the models to generate viewer profiles based on the viewing patterns in the training set, as shown in Figure 2. The process begins by standardizing the current viewer data to correspond to the record format of the demographic-behavioral data files, including the manner in which the demographic-behavioral data sets were segmented based on time period. The standardized current viewer data files are processed by the models, producing as output demographic descriptions of each household and its members.


Results Reconciliation

The PM reconciles the results by applying multiple models to all the data subsets using a voting scheme whereby each model indicates its interpretation of a viewing pattern via its classifications and its confidence in these classifications. Using any number of vote-counting strategies, the PM resolves any differences of opinion. Techniques for reconciling the results of multiple classification models are widely studied [9, 10]. Selecting a vote-counting strategy depends on three factors:

The confidence level and strength of the model are both necessary because these two measures might differ under different circumstances. Because we do not know all the circumstances under which this might occur, we cannot determine a priori which measure must take precedence. Therefore, we are required to take an experimental approach that considers both measures.
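One possible vote-counting strategy that takes both measures into account is sketched below. It is not the scheme used in the ADS, which the article does not specify: weighting each vote by the product of confidence and strength is our assumption, chosen only because it is the simplest rule that uses both numbers.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Combine (label, confidence, strength) votes, one per model.

    Each vote counts for confidence * strength (an assumed weighting).
    Returns the winning label and its share of the total weight.
    """
    totals = defaultdict(float)
    for label, confidence, strength in votes:
        totals[label] += confidence * strength
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("no votes with positive weight")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight


# Three hypothetical models classify the same household.
votes = [
    ("teen_male", 0.9, 0.6),
    ("adult_female", 0.6, 0.7),
    ("teen_male", 0.5, 0.8),
]
winner, share = weighted_vote(votes)
print(winner, round(share, 2))
```

Because precedence between the two measures cannot be fixed in advance, alternatives such as ranking by strength first and breaking ties on confidence would need to be compared experimentally in the same way.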

The PM applies all the algorithms against each data subset, and each algorithm in turn reports its subset-specific classification and confidence levels. The PM then determines the strength of each model's classifications using confusion matrices to estimate misclassification rates on training and test set data (derived through a simple data set split). It then chooses models for analysis of live viewing behavior based on their relative performance.
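The sketch below shows how a confusion matrix yields a misclassification rate and how that rate could drive model selection for one data subset. The function names, labels, and predictions are hypothetical; only the general procedure comes from the text.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def misclassification_rate(matrix):
    """Fraction of cases that fall off the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    if total == 0:
        raise ValueError("empty confusion matrix")
    wrong = sum(count for (a, p), count in matrix.items() if a != p)
    return wrong / total


def best_model(test_labels, predictions_by_model):
    """Pick the model with the lowest misclassification rate on the test set."""
    rates = {
        name: misclassification_rate(confusion_matrix(test_labels, preds))
        for name, preds in predictions_by_model.items()
    }
    return min(rates, key=rates.get), rates


# Hypothetical test-set labels and the predictions of two candidate models.
test_labels = ["teen", "adult", "teen", "adult"]
predictions = {
    "neural_net": ["teen", "adult", "adult", "adult"],
    "discriminant": ["adult", "adult", "adult", "adult"],
}
chosen, rates = best_model(test_labels, predictions)
print(chosen, rates)
```

Running the selection separately for each time-period subset allows a different model to be chosen for, say, Saturday mornings than for Sunday evenings.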

Because the ADS presents advertisers with groups of households it claims belong to particular demographic groups, misclassification measures derived from the confusion matrices are important when a model is used in production. Assume, for example, that an advertiser wishes to send ads to a targeted group of 100 male teenagers in affluent households. If the model is 70% accurate in classifying members of this group, the ADS would have to create a pool of 100/0.70 ≈ 143 households that it identifies as affluent and as including a male teenager in order to satisfy the advertiser. That is, when the ad is presented to these 143 households, about 100 of them would be expected to qualify based on the known error rate. This approach allows the system to determine, with reasonable precision, the number of households to which an ad must be presented in order to reach a pre-specified percentage of the target group.
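The pool-size calculation in this example reduces to dividing the desired number of hits by the model's accuracy and rounding up. A minimal sketch, with a function name of our own choosing:

```python
import math


def pool_size(target_hits: int, accuracy: float) -> int:
    """Households to select so that about target_hits of them truly qualify."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    # Round before taking the ceiling so floating-point noise cannot add a household.
    return math.ceil(round(target_hits / accuracy, 9))


print(pool_size(100, 0.70))  # 143
```

The same function makes the cost of a weak model visible: at 1% accuracy, reaching 100 qualifying households would require a pool of 10,000.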

The greater the error rate, the more households to which ads must be served in order to satisfy the advertiser. The key to success is the extent to which the system is able to achieve lift over selecting from a random sample. For example, a 1% accuracy rate, which would require sending 100 ads for each desired member of the target group, might be acceptable if one of 1,000 members of the general population belongs to the target group.

All of this relates to the more general issue of inventory management, including how a system manages the distribution of ads to a limited number of potential recipients, given existing contractual obligations to advertisers. The problem is exacerbated by the "perishability," or time sensitivity, of the ads themselves. These issues have been explored in part in the context of Web site banner advertising [4] and are the subject of our own continuing investigation.

Departure from Traditional Role

Our development of the ADS, along with its data mining component, has generated several insights regarding a general approach to data mining in integrated application systems. The implementation suggests that a data mining approach can be used as an automated component in production systems. This is a departure from data mining's traditional role as an analytic, decision-support tool. The ADS reflects specific strategies for managing the uncertainty of real-world data and the abilities of the various models to classify data. The strategies include the use of multiple data mining models and a means of reconciling outcomes, as well as performance-validation results in the operational use of models. Included, too, is the selection of the proper size of the targeted viewer pool to obtain a prespecified "hit count." This selection step in effect factors the inherent uncertainty of classification into the implementation.

Development of the ADS underscores the constraints the data-collection process places on data mining. In this particular domain—targeted TV advertising to viewers of digital TV and cable TV services—attempting to distinguish among, say, households and individuals within households is highly problematic. The approach we've adopted to develop the ADS and deliver targeted advertising is to defer the issue of individual profiling by building models describing household composition and viewing decisions. These models are used to identify the most desirable households (from the advertiser's point of view) in the live data.

The size and number of demographic groups under consideration have a significant effect on the classification accuracy of the various groups. Predicting the behavior of smaller groups is more difficult than predicting that of larger groups, mainly because smaller groups provide fewer observations per group and thus represent a smaller target. For example, it would be more difficult to predict the viewing behavior of two groups of children ages 2–6 and 7–11 than that of a single group of children ages 2–11. This difficulty strongly suggests that advertising campaigns in which advertisers target specific types of persons or households are most conducive to such an approach. By providing categories to the data mining system prior to the generation of profiles, advertisers limit the number of potential categories, or combinations of demographic groups, the system has to consider, thus improving the system's overall performance.

Finally, the generation and sale of viewer profiles to advertisers for targeted advertising has the potential to raise legitimate privacy concerns among the public at large. It is therefore important to properly manage the collection and use of viewer data, including the development and communication of a general privacy policy statement to viewers and to the public at large. Based on current practices and recommendations, we suggest a privacy policy include at least the following four principles:

If viewers and the general public are to accept the technology of viewer profiling, service providers must communicate not only the protections inherent in a privacy policy but the benefits viewers receive by participating in the process. For example, a viewer might choose to provide personal information in exchange for more personalized programming—perhaps including a personalized programming guide [2, 11]—as well as more relevant and useful advertising. Other benefits might include subsidizing the monthly cost of the PVR and/or other services for viewers who participate and offering coupons for relevant products.

Conclusion

These issues generalize beyond the ADS viewer-profiling application, potentially to any activity involving consumer behavior information. For example, companies routinely capture consumer purchasing transaction data involving credit cards and frequent-shopper cards, in a manner similar to the use of PVRs to capture viewing choices. As with TV viewing data, purchasing data can be used to construct demographic profiles of consumers, enabling retailers to target their marketing efforts and advertising to individual consumers. As computer technologies make consumer behavioral data increasingly accessible, it also becomes increasingly important to address the analytical, business, social/privacy, legal, and ethical considerations in data mining for consumer marketing.

References

1. Apte, C., Liu, B., Pednault, E., and Smyth, P. Business applications of data mining. Commun. ACM 45, 8 (Aug. 2002), 49–53.

2. Baudisch, P. and Brueckner, L. TV Scout: Lowering the entry barrier to personalized TV program recommendation. In Proceedings of the 2nd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (Malaga, Spain, May 29–31). Springer-Verlag, Berlin, 2002, 58–68.

3. Berry, M. and Linoff, G. Mastering Data Mining. John Wiley & Sons, Inc., New York, 1999.

4. Chickering, D. and Heckerman, D. Targeted advertising with inventory management. In Proceedings of the ACM Conference on Electronic Commerce (Minneapolis, Oct. 17–20). ACM Press, New York, 2000, 145–149.

5. Chickering, D. and Heckerman, D. A decision-theoretic approach to targeted advertising. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (Stanford, CA, June 30–July 3). Morgan Kaufmann, San Francisco, 2000, 82–88.

6. Chunovic, L. Change ahead for channels. Electronic Media 20, 31 (July 30, 2001), 1.

7. Duda, R., Hart, P., and Stork, D. Pattern Classification. John Wiley & Sons, Inc., New York, 2000.

8. Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J. GroupLens: Applying collaborative filtering to Usenet news. Commun. ACM 40, 3 (Mar. 1997), 77–87.

9. Ortega, J., Koppel, M., and Argamon, S. Arbitrating among competing classifiers using learned referees. Knowl. Inform. Syst. 3, 4 (Nov. 2001), 470–490.

10. Prodromidis, A. and Stolfo, S. Cost complexity-based pruning of ensemble classifiers. Knowl. Inform. Syst. 3, 4 (Nov. 2001), 449–469.

11. Smyth, B. and Cotter, P. A personalized television listings service. Commun. ACM 43, 8 (Aug. 2000), 107–111.

12. Spangler, W., May, J., and Vargas, L. Choosing data mining methods for multiple classification: Representational and performance measurement implications for decision support. J. Mgmt. Info. Syst. 16, 1 (summer 1999), 37–62.

Authors

William E. Spangler (spangler@duq.edu) is an assistant professor of information technology in the A.J. Palumbo School of Business Administration at Duquesne University, Pittsburgh, PA.

Mordechai Gal-Or (galor@duq.edu) is an assistant professor of information technology in the A.J. Palumbo School of Business Administration at Duquesne University, Pittsburgh, PA.

Jerrold H. May (jerrymay@katz.edu) is a professor of decision sciences and intelligent systems in the J.M. Katz Graduate School of Business, the University of Pittsburgh, Pittsburgh, PA.

Figures

Figure 1. The Advertising Delivery System.

Figure 2. The Profiling Module, a subsystem of the Advertising Delivery System.

Tables

Table. Performance of the Profiling Module; h means a household includes at least one member of the demographic group, and ĥ means the data mining system predicts the household includes at least one such person.

©2003 ACM  0002-0782/03/1200  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
