Complete Book & Media Supply, LLC.

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

AUTHOR	Karau, Holden; Warren, Rachel
PUBLISHER	O'Reilly Media (07/11/2017)
PRODUCT TYPE	Paperback

Description

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing.

With this book, you'll explore:

How Spark SQL's new interfaces improve performance over Spark's RDD data structure
The choice between data joins in Core Spark and Spark SQL
Techniques for getting the most out of standard RDD transformations
How to work around performance issues in Spark's key/value pair paradigm
Writing high-performance Spark code without Scala or the JVM
How to test for functionality and performance when applying suggested improvements
Using Spark MLlib and Spark ML machine learning libraries
Spark's Streaming components and external community packages

Product Details

ISBN-13: 9781491943205

ISBN-10: 1491943203

Binding: Paperback or Softback (Trade Paperback (US))

Content Language: English

More Product Details

Page Count: 356

Carton Quantity: 11

Product Dimensions: 7.00 x 0.70 x 9.20 inches

Weight: 1.20 pounds

Feature Codes: Price on Product

Country of Origin: US

Subject Information

BISAC Categories

Computers | Data Science - Data Analytics

Computers | Programming - Open Source

Computers | Data Science - Data Warehousing

Descriptions, Reviews, Etc.

Author: Karau, Holden

Holden Karau is a software development engineer at Databricks and is active in open source. She is the author of an earlier Spark book. Prior to Databricks, she worked on a variety of search and classification problems at Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software, she enjoys playing with fire, welding, and hula hooping.

List Price $49.99

Your Price $35.99

Out of Stock
