Cover: Machine Learning in the AWS Cloud by Abhishek Mishra

Machine Learning in the AWS Cloud

Add Intelligence to Applications with Amazon SageMaker and Amazon Rekognition

Abhishek Mishra

To my wife Sonam, for her love and support through all the years we've been together.

To my daughter Elana, for bringing joy and happiness into our lives.

    —Abhishek

Acknowledgments

This book would not have been possible without the support of the team at Wiley, including Jim Minatel, Kenyon Brown, David Clark, Kim Cofer, and Pete Gaughan. I would also like to thank Chaim Krause for his keen eye for detail. It has been my privilege to work with all of you. Thank you.

About the Author

Abhishek Mishra has been active in the IT industry for over 19 years and has extensive experience with a wide range of programming languages, enterprise systems, service architectures, and platforms.

He holds a master's degree in computer science from the University of London and currently provides consultancy services to Lloyds Banking Group in London as a security and fraud solution architect. He is the author of several books, including Amazon Web Services for Mobile Developers.

About the Technical Editor

Chaim Krause is a lover of computers, electronics, animals, and electronic music. He's tickled pink when he can combine two or more in some project. He has come by the vast majority of his knowledge through independent learning. He jokes with everyone that the only difference between what he does at home and what he does at work is the logon he uses. As a lifelong learner, he is often frustrated by technical errors in documentation that waste valuable time. One of the reasons he works as a technical editor on books is to help others avoid those same pitfalls.

Introduction

Amazon Web Services (AWS) is one of the leading cloud-computing platforms in the industry today. At the time this book was written, AWS offered more than 100 services, each belonging to one of 18 service categories. For someone who is new to cloud computing or to the AWS ecosystem, the sheer number of services on offer can be daunting, and it can be difficult to know where to begin and which services to focus on.

Both developers who are new to machine learning and experienced data scientists are often unaware of the power of the public cloud, and of AWS's machine learning offerings in particular. In the past, cloud-based machine learning offerings were limited in the types of algorithms they could support and the level of customization that was possible. All of this changed when Amazon announced SageMaker, a service that provides the ability to build machine learning models based on Amazon's implementations of cutting-edge algorithms, as well as the option to build custom models with frameworks such as Scikit-learn and Google TensorFlow.

In real-world use cases, cloud-based machine learning models are not used in isolation; they rely on a number of supporting systems such as databases, load balancers, API gateways, and identity providers, all of which are provided by AWS. This book is written to give seasoned machine learning experts and enthusiasts alike an introduction to a selection of AWS machine learning services that are based on pre-trained models, as well as step-by-step examples of how to train and deploy your own custom models on Amazon SageMaker. For enthusiasts who are new to machine learning, this book also provides a selection of chapters that cover the fundamentals of machine learning, such as data preprocessing, visualization, feature engineering, and the use of common Python libraries such as NumPy, Pandas, and Scikit-learn.

This book attempts to strike a balance between theory and practice, giving you enough visibility into the underlying concepts while providing best practices and practical advice that you can apply at your workplace right away. I have also made every attempt to keep the content up-to-date and relevant. Even though this focus on current versions means that a few details may become outdated, I am confident the content will remain useful and relevant through the next versions of the AWS services.

Who This Book Is For

This book is best suited for software developers who wish to learn about machine learning in general and how to leverage the machine learning–specific offerings from AWS. The book is also useful to data scientists, system architects, and application architects who want an introduction to some of the commonly used AWS services in the machine learning space.

If you are new to both machine learning and AWS, I advise that you read all chapters from start to finish. If you are an experienced data scientist, you may want to skip ahead to Part 2 to learn about machine learning–specific AWS services.

What This Book Covers

This book covers building and training machine learning models with Python on the AWS cloud, as well as a number of ready-to-use machine learning services such as Amazon Rekognition, Amazon Comprehend, and Amazon Lex.

The book also covers general high-level machine learning concepts, including feature engineering and data visualization, as well as supporting AWS services that are used to build machine learning systems, such as AWS Identity and Access Management (IAM), Amazon Cognito, Amazon S3, Amazon DynamoDB, and AWS Lambda.

The model-building and evaluation code in this book is written in Python 3. Services provided by Amazon, Apple, and Google are updated frequently, so you may sometimes encounter a newer version of a screen than the one described when you follow the instructions in a chapter.

How This Book Is Structured

This book consists of 18 chapters, grouped into two parts, and four appendices. Part 1 consists of five chapters and covers the fundamentals of machine learning using Python. This part covers techniques for feature engineering, data visualization, model building, and model evaluation using Pandas, NumPy, Matplotlib, and Scikit-learn. The examples developed in this part make use of Jupyter Notebook and are aimed at readers who are new to machine learning.

Part 2 covers building machine learning applications using AWS services. This part starts by introducing the basics of commonly used AWS services such as Amazon S3, Amazon DynamoDB, and AWS Lambda. It then proceeds to AWS services that deal specifically with machine learning, such as Amazon Comprehend, Amazon Lex, Amazon Machine Learning, Amazon SageMaker, and Amazon Rekognition. Two chapters are dedicated to Amazon SageMaker: the first covers building and deploying models using built-in algorithms and Scikit-learn, and the second covers building and deploying a model with Google TensorFlow. Not all chapters in this part include source code, but where applicable, you can download the source code that accompanies each chapter using a GitHub link. Some of the chapters in this part require you to upload files to Amazon S3. Because S3 bucket names must be globally unique, you will need to replace the bucket names in the examples with the names of buckets in your own account.

The chapters in Part 1 include:

  • Introduction to Machine Learning (Chapter 1): This chapter is an introduction to the types of machine learning systems, their applications, and the tools used to build machine learning systems.
  • Data Collection and Preprocessing (Chapter 2): This chapter covers sources that can be used to obtain training data, techniques to explore datasets, and basic feature engineering.
  • Data Visualization with Python (Chapter 3): This chapter covers techniques to visualize datasets using Matplotlib.
  • Creating Machine Learning Models with Scikit-learn (Chapter 4): This chapter covers techniques to build and train classification and regression models using Scikit-learn.
  • Evaluating Machine Learning Models (Chapter 5): This chapter covers techniques to evaluate the quality of a machine learning model.

The chapters in Part 2 include:

  • Introduction to Amazon Web Services (Chapter 6): This chapter is a brief primer on cloud computing and Amazon Web Services. It also covers commonly encountered service and deployment models.
  • AWS Global Infrastructure (Chapter 7): This chapter introduces AWS regions, availability zones, and edge locations.
  • Identity and Access Management (Chapter 8): This chapter introduces one of the key services provided by AWS to secure your resources in the Amazon cloud. It also provides instructions to sign up for an account under the AWS free tier.
  • Amazon S3 (Chapter 9): This chapter introduces one of the most commonly used storage services provided by AWS, Amazon Simple Storage Service (S3).
  • Amazon Cognito (Chapter 10): This chapter introduces Amazon's cloud-based, OAuth 2.0–compliant identity management solution, Amazon Cognito.
  • Amazon DynamoDB (Chapter 11): This chapter introduces Amazon's managed NoSQL database service, Amazon DynamoDB.
  • AWS Lambda (Chapter 12): This chapter introduces AWS Lambda, a service designed to allow you to run code in the Amazon cloud without having to provision or manage any infrastructure.
  • Amazon Comprehend (Chapter 13): This chapter introduces Amazon Comprehend, a cloud-based natural language processing service that you can integrate into your applications to analyze the contents of text documents.
  • Amazon Lex (Chapter 14): This chapter introduces Amazon Lex, a cloud-based service that you can use to create chatbots and integrate them into your applications.
  • Amazon Machine Learning (Chapter 15): This chapter introduces Amazon Machine Learning, a fully managed cloud-based service that you can use to build and deploy simple machine learning models without any programming.
  • Amazon SageMaker (Chapter 16): This chapter introduces Amazon SageMaker, a cloud-based machine learning service that can be used to train and deploy both built-in and custom machine learning models.
  • Using Google TensorFlow with Amazon SageMaker (Chapter 17): This chapter introduces Google's TensorFlow framework and covers the use of Amazon SageMaker to build and deploy TensorFlow models.
  • Amazon Rekognition (Chapter 18): This chapter introduces Amazon Rekognition, a fully managed cloud-based service that can be used to add computer vision capabilities to your applications.

The appendices cover the following topics:

  • Anaconda and Jupyter Notebook Setup (Appendix A): This appendix provides instructions to install the Anaconda distribution and set up a Jupyter Notebook server on your local computer.
  • AWS Resources Needed to Use This Book (Appendix B): This appendix provides information on the AWS resources that you need to set up in your account in order to follow along with the examples in the book.
  • Installing and Configuring the AWS CLI (Appendix C): This appendix provides instructions to download, install, and configure the AWS CLI tool.
  • Introduction to NumPy and Pandas (Appendix D): This appendix provides an introduction to two Python libraries commonly used by data scientists: NumPy and Pandas.

What You Need to Use This Book

  • A suitable Mac or Windows computer for development
  • Basic knowledge of Python programming
  • An AWS account that you can administer

Conventions

To help you get the most from the text and keep track of what's happening, we've used a number of conventions throughout the book.

NOTE: Notes, tips, hints, tricks, and asides to the current discussion are offset like this.

As for styles in the text:

  • We italicize new terms and important words when we introduce them.
  • We show keyboard strokes like this: Ctrl+A.
  • We show filenames, URLs, and code within the text like so: persistence.properties.
  • We present code in two different ways:

    We use a monofont type with no highlighting for most code examples.

    We use bold type to emphasize code that is of particular importance in the present context.

Source Code

As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at www.wiley.com/go/machinelearningawscloud. You can also download the code files from GitHub.

Errata

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.

To report errata, send an email to errata@wiley.com and include:

  • The book's title and ISBN (Machine Learning in the AWS Cloud, 9781119556718)
  • The page number of the relevant content
  • A description of just what's wrong

Part 1
Fundamentals of Machine Learning

  • Chapter 1: Introduction to Machine Learning
  • Chapter 2: Data Collection and Preprocessing
  • Chapter 3: Data Visualization with Python
  • Chapter 4: Creating Machine Learning Models with Scikit-learn
  • Chapter 5: Evaluating Machine Learning Models