An Introduction to Data Science

September 2017 | 288 pages | SAGE Publications, Inc

An Introduction to Data Science is an easy-to-read, gentle introduction to the world of data science for advanced undergraduate, certificate, and graduate students from a wide range of backgrounds. After introducing the basic concepts of data science, the book builds on these foundations to explain data science techniques using the R programming language and RStudio® from the ground up. Short chapters allow instructors to group concepts together for a semester course and provide students with manageable amounts of information for each concept. By taking students systematically through the R programming environment, the book takes the fear out of data science and familiarizes students with the environment so they can be successful when performing advanced functions.


The authors cover statistics from a conceptual standpoint, focusing on how to use and interpret statistics, rather than the math behind the statistics. This text then demonstrates how to use data effectively and efficiently to construct models, predict outcomes, visualize data, and make decisions. Accompanying digital resources provide code and datasets for instructors and learners to perform a wide range of data science tasks.  

About the Authors
Introduction: Data Science, Many Skills
What Is Data Science?  
The Steps in Doing Data Science  
The Skills Needed to Do Data Science  
Chapter 1 • About Data
Storing Data—Using Bits and Bytes  
Combining Bytes Into Larger Structures  
Creating a Data Set in R  
Chapter 2 • Identifying Data Problems
Talking to Subject Matter Experts  
Looking for the Exception  
Exploring Risk and Uncertainty  
Chapter 3 • Getting Started With R
Installing R  
Using R  
Creating and Using Vectors  
Chapter 4 • Follow the Data
Understand Existing Data Sources  
Exploring Data Models  
Chapter 5 • Rows and Columns
Creating Dataframes  
Exploring Dataframes  
Accessing Columns in a Dataframe  
Chapter 6 • Data Munging
Reading a CSV Text File  
Removing Rows and Columns  
Renaming Rows and Columns  
Cleaning Up the Elements  
Sorting Dataframes  
Chapter 7 • Onward With RStudio®
Using an Integrated Development Environment  
Installing RStudio  
Creating R Scripts  
Chapter 8 • What’s My Function?
Why Create and Use Functions?  
Creating Functions in R  
Testing Functions  
Installing a Package to Access a Function  
Chapter 9 • Beer, Farms, and Peas and the Use of Statistics
Historical Perspective  
Sampling a Population  
Understanding Descriptive Statistics  
Using Descriptive Statistics  
Using Histograms to Understand a Distribution  
Normal Distributions  
Chapter 10 • Sample in a Jar
Sampling in R  
Repeating Our Sampling  
Law of Large Numbers and the Central Limit Theorem  
Comparing Two Samples  
Chapter 11 • Storage Wars
Importing Data Using RStudio  
Accessing Excel Data  
Accessing a Database  
Comparing SQL and R for Accessing a Data Set  
Accessing JSON Data  
Chapter 12 • Pictures Versus Numbers
A Visualization Overview  
Basic Plots in R  
Using ggplot2  
More Advanced ggplot2 Visualizations  
Chapter 13 • Map Mashup
Creating Map Visualizations With ggplot2  
Showing Points on a Map  
A Map Visualization Example  
Chapter 14 • Word Perfect
Reading in Text Files  
Using the Text Mining Package  
Creating Word Clouds  
Chapter 15 • Happy Words?
Sentiment Analysis  
Other Uses of Text Mining  
Chapter 16 • Lining Up Our Models
What Is a Model?  
Linear Modeling  
An Example—Car Maintenance  
Chapter 17 • Hi Ho, Hi Ho—Data Mining We Go
Data Mining Overview  
Association Rules Data  
Association Rules Mining  
Exploring How the Association Rules Algorithm Works  
Chapter 18 • What’s Your Vector, Victor?
Supervised and Unsupervised Learning  
Supervised Learning via Support Vector Machines  
Support Vector Machines in R  
Chapter 19 • Shiny® Web Apps
Creating Web Applications in R  
Deploying the Application  
Chapter 20 • Big Data? Big Deal!
What Is Big Data?  
The Tools for Big Data  


Student Study Site
    • Lab and homework assignments accompany chapter material and are downloadable as R source code.
    • R Code from the book, available as an R source file.
    • Multimedia content includes links to YouTube videos showing demos of using R, audio, data, and web resources.
Instructor Resource Site

Password-protected Instructor Resources include the following:


    • Editable, chapter-specific Microsoft® PowerPoint® slides offer you complete flexibility in easily creating a multimedia presentation for your course. Highlight essential content and features.
    • Lab and homework assignments and their solutions accompany chapter material and are downloadable as R source code.
    • R Code from the book, available as an R source file.
    • Multimedia content includes links to YouTube videos showing demos of using R, audio, data, and web resources that appeal to students with different learning styles and prompt classroom discussion.
Key features


  • Students cement their knowledge of data science with learning objectives, chapter challenge exercises, R code examples throughout, basic summaries of statistics, and a companion site with digital resources and code.
  • The book uses free and open source R and RStudio® with real data examples to illustrate both the challenges of data science and the techniques used to address those challenges.
  • Examples with real data make the book meaningful for readers.

Sample Materials & Chapters


Chapter 6

For instructors

Review and Desk copies for this title are available digitally via VitalSource.

If you require a print review copy, please call: (800) 818-7243 ext. 6140 or email
