Copyright © 2019 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-119-55671-8
ISBN: 978-1-119-55673-2 (ebk.)
ISBN: 978-1-119-55672-5 (ebk.)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions
.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (877) 762-2974, outside the U.S. at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com
. For more information about Wiley products, visit www.wiley.com
.
Library of Congress Control Number: 2019940774
TRADEMARKS: Wiley, the Wiley logo, and the Sybex logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Amazon SageMaker and Amazon Rekognition are registered trademarks of Amazon Technologies, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
To my wife Sonam, for her love and support through all the years we've been together.
To my daughter Elana, for bringing joy and happiness into our lives.
—Abhishek
This book would not have been possible without the support of the team at Wiley, including Jim Minatel, Kenyon Brown, David Clark, Kim Cofer, and Pete Gaughan. I would also like to thank Chaim Krause for his keen eye for detail. It has been my privilege to work with all of you. Thank you.
Abhishek Mishra has been active in the IT industry for over 19 years and has extensive experience with a wide range of programming languages, enterprise systems, service architectures, and platforms.
He holds a master's degree in computer science from the University of London and currently provides consultancy services to Lloyds Banking Group in London as a security and fraud solution architect. He is the author of several books, including Amazon Web Services for Mobile Developers.
Chaim Krause is a lover of computers, electronics, animals, and electronic music. He's tickled pink when he can combine two or more in some project. He has come by the vast majority of his knowledge through independent learning. He jokes with everyone that the only difference between what he does at home and what he does at work is the logon he uses. As a lifelong learner he is often frustrated with technical errors in documentation that waste valuable time and cause unnecessary frustration. One of the reasons he works as the technical editor on books is to help others avoid those same pitfalls.
Amazon Web Services (AWS) is one of the leading cloud-computing platforms in the industry today. At the time this book was written, AWS offered more than 100 services, each of which resided in one of 18 different service categories. For someone who is new to cloud computing or to the AWS ecosystem, the sheer number of services on offer can be daunting. It can be difficult to know where to begin and what services to focus on.
Developers who are new to machine learning as well as experienced data scientists are often not aware of the power of the public cloud and AWS's offerings in the machine learning space in particular. In the past, cloud-based machine learning offerings have been limited in the types of algorithms they could support and the level of customization that was possible. All of this changed when Amazon announced SageMaker—a service that provided the ability to build machine learning models based on Amazon's implementation of cutting-edge algorithms, as well as the option to build custom models with frameworks such as Scikit-learn and Google TensorFlow.
Real-world use cases of cloud-based machine learning models are not based on using the model in isolation, but instead rely on a number of supporting systems such as databases, load balancers, API gateways, and identity providers, all of which are provided by AWS. This book is written to provide both seasoned machine learning experts and enthusiasts alike an introduction to a selection of AWS machine learning services that are based on pre-trained models, as well as step-by-step examples of how to train and deploy your own custom models on Amazon SageMaker. For enthusiasts who are new to machine learning, this book also provides a selection of chapters that cover the fundamentals of machine learning such as data preprocessing, visualization, feature engineering, and the use of common Python libraries such as NumPy, Pandas, and Scikit-learn.
This book at all times attempts to balance between theory and practice, giving you enough visibility into the underlying concepts and providing you with the best practices and practical advice that you can apply at your workplace right away. I have also made every attempt to keep the content up-to-date and relevant. Even though this makes the book susceptible to being outdated in a few rare instances, I am confident the content will remain useful and relevant through the next versions of the AWS services.
This book is best suited for software developers who wish to learn about machine learning in general and how to leverage machine learning–specific offerings from AWS. The book is also useful to data scientists, system architects, and application architects, who want to get an introduction to some of the commonly used AWS services in the machine learning space.
If you are new to both machine learning and AWS, I advise that you read all chapters from start to finish. If you are an experienced data scientist, you may want to skip ahead to Part 2 to learn about machine learning–specific AWS services.
This book covers building and training machine learning models with Python on the AWS cloud, as well as a number of ready-to-use machine learning services such as Amazon Rekognition, Amazon Comprehend, and Amazon Lex.
The book also covers general high-level concepts of machine learning, including feature engineering, data visualization, as well as supporting AWS services that are used to build machine learning systems such as Amazon IAM, Amazon Cognito, Amazon S3, Amazon DynamoDB, and AWS Lambda.
The model-building and evaluation code in this book is written in Python 3. Services provided by Amazon, Apple, and Google are updated frequently and therefore sometimes you may encounter a newer version of a screen when you follow the instructions in a chapter.
This book consists of 18 chapters that are grouped into two parts, and four appendices. The first part consists of five chapters and covers the fundamentals of machine learning using Python. This part covers techniques for feature engineering, data visualization, model building, and model evaluation using Pandas, NumPy, Matplotlib, and Scikit-learn. The examples developed in this part make use of Jupyter Notebook and are aimed at readers who are new to machine learning.
Part 2 covers building machine learning applications using AWS services. This part starts with introducing the basics of commonly used AWS services such as Amazon S3, Amazon DynamoDB, and AWS Lambda. It then proceeds to AWS services that deal specifically with machine learning such as Amazon Comprehend, Amazon Lex, Amazon Machine Learning, and Amazon SageMaker. Two chapters are dedicated to Amazon SageMaker; the first one covers building and deploying models using built-in algorithms and Scikit-learn, and the second one covers building and deploying a model with Google TensorFlow. Not all chapters in this part include source code, but where applicable, you can download the source code that accompanies each chapter using a GitHub link. Some of the chapters in this part require you to upload files to Amazon S3; you will need to substitute the names of buckets in the examples with those from your own account.
The chapters in Part 1 include:
The chapters in Part 2 include:
The appendices cover the following topics:
To help you get the most from the text and keep track of what's happening, we've used a number of conventions throughout the book.
As for styles in the text:
persistence.properties
.We use a monofont type with no highlighting for most code examples.
We use bold type to emphasize code that is of particular importance in the present context.
As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at www.wiley.com/go/machinelearningawscloud. Also, you can download the code files at GitHub.
We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.
To report errata, email to errata@wiley.com and include