Over the past couple of years, as Hadoop has become the dominant paradigm for big data processing, several facts have become clear. First, the Hadoop Distributed File System is the right storage platform for big data. Second, YARN is the resource allocation and management framework of choice for big data environments. Third, and maybe most important, there is no single processing framework that will solve every problem. Although MapReduce is an amazing technology, it doesn’t address every situation.Businesses that rely on Hadoop need a variety of analytical infrastructures and processes to find the answers to their critical questions. They need data preparation, descriptive analysis, search, predictive analysis, and more advanced capabilities like machine learning and graph processing. Also, businesses need a tool set that meets them where they are, allowing them to leverage the skill sets and other resources they already have. Until now, a single processing framework that fits all those criteria has not been available. This is the fundamental advantage of Spark.
comments (0)
Leave a reply