Best Practices in Data Cleaning
A Complete Guide to Everything You Need to Do Before and After Collecting Your Data

First Edition

January 2012 | 296 pages | SAGE Publications, Inc

Many researchers jump straight from data collection to data analysis without realizing how analyses and hypothesis tests can go profoundly wrong without clean data. This book provides a clear, step-by-step process for examining and cleaning data in order to decrease error rates and increase both the power and replicability of results. Jason W. Osborne, author of Best Practices in Quantitative Methods (SAGE, 2008), provides easily implemented, research-based suggestions and motivates change in practice by empirically demonstrating, for each topic, the benefits of following best practices and the potential consequences of not following them. If your goal is to do the best research you can, draw conclusions that are most likely to be accurate representations of the population(s) you wish to speak about, and report results that are most likely to be replicated by other researchers, then this basic guidebook is indispensable.


Chapter 1. Why Data Cleaning is Important: Debunking the Myth of Robustness
Part 1. Best Practices as you Prepare for Data Collection
Chapter 2. Power and Planning for Data Collection: Debunking the Myth of Adequate Power
Chapter 3. Being True to the Target Population: Debunking the Myth of Representativeness
Chapter 4. Using Large Data Sets with Probability Sampling Frameworks: Debunking the Myth of Equality
Part 2. Best Practices in Data Cleaning and Screening
Chapter 5. Screening your Data for Potential Problems: Debunking the Myth of Perfect Data
Chapter 6. Dealing with Missing or Incomplete Data: Debunking the Myth of Emptiness
Chapter 7. Extreme and Influential Data Points: Debunking the Myth of Equality
Chapter 8. Improving the Normality of Variables through Box-Cox Transformation: Debunking the Myth of Distributional Irrelevance
Chapter 9. Does Reliability Matter? Debunking the Myth of Perfect Measurement
Part 3. Advanced Topics in Data Cleaning
Chapter 10. Random Responding, Motivated Mis-Responding, and Response Sets: Debunking the Myth of the Motivated Participant
Chapter 11. Why Dichotomizing Continuous Variables is Rarely a Good Practice: Debunking the Myth of Categorization
Chapter 12. The Special Challenge of Cleaning Repeated Measures Data: Lots of Pits to Fall into
Chapter 13. Now that the Myths are Debunked... Visions of Rational Quantitative Methodology for the 21st Century

“This book provides the perfect bridge between the formal study of statistics and the practice of statistics. It fills the gap left by many of the traditional texts that focus either on the technical presentation or recipe-driven presentation of topics.”

Elizabeth M. Flow-Delwiche
Community College of Baltimore County

“The first comprehensive and generally accessible text in this area.”

J. Michael Hardin
The University of Alabama
Key features


  • Clear guidance, with a step-by-step process for examining and cleaning data, helps decrease error rates and increase both the power and replicability of results.
  • Easily implementable suggestions are research-based and will motivate change in practice.
  • The author demonstrates the benefits of following best practices through examples of real research data.
  • Debunking ten common research myths helps the reader hone a more accurate view of research and data gathering.


An open-access Student Study Site includes relevant data sets for further research on important chapter topics, as well as useful exercises and activities that both instructors and students can use to extend and reinforce their understanding of the subject matter.

This title is also available on SAGE Research Methods, the ultimate digital methods library. If your library doesn’t have access, ask your librarian to start a trial.