Hadoop on MPI
To achieve fault tolerance at massive scale, the MapReduce model makes huge sacrifices in performance by using persistent disk storage for interprocessor communication and synchronization.
The effect of this design decision is that, on exactly the same hardware, MapReduce can be several orders of magnitude slower than the MPI or BSP (bulk synchronous parallel) clusters that have been routinely used in supercomputing for more than 15 years. This not only has a huge negative impact on big data economics, it also rules out using the standard MapReduce model for realtime analytics.
Beyond MapReduce
Google, the inventors of the model, were the first to recognize the throughput and latency problems with the MapReduce model. To get the realtime performance they required, they recently replaced MapReduce in their Google Instant search engine.
Google have also adopted the BSP model as the heart of their Pregel engine for graph analytics at massive scale. The BSP approach to parallel programming, developed by Les Valiant of Harvard and Bill McColl, founder of Cloudscale, enables extremely fast interprocessor communication and synchronization. BSP structures a computation into "supersteps": in each one, every processor computes locally and exchanges messages, and all processors then synchronize at a barrier before the next superstep begins. This structure not only enables the fastest possible global communications using MPI, but also makes checkpointing and recovery from hardware faults much simpler, because the global state of the massively parallel computation is easily defined and captured at superstep boundaries. BSP (in the form of Pregel) is now used for more than 20% of all big data analytics at Google, and is replacing the MapReduce model in a number of key areas.
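The superstep idea can be illustrated with a small sketch. It is a toy, single-process Python simulation, not MPI code and not how Pregel or any Cloudscale product is implemented. The names superstep, run, WorkerFailure and the checkpoint dictionary are all hypothetical, chosen only for this illustration. Four simulated workers sit in a ring and each passes its value to its neighbour until every worker holds the global maximum. A checkpoint is saved at every superstep boundary, and a simulated failure is recovered by resuming from the last one.

```python
class WorkerFailure(Exception):
    """Hypothetical signal that a worker was lost during a superstep."""


def superstep(values):
    """One BSP superstep on a ring of simulated workers."""
    n = len(values)
    # Communication phase: every worker posts its value to its right-hand
    # neighbour. Nothing is consumed until all sends have been posted.
    inbox = [None] * n
    for rank, value in enumerate(values):
        inbox[(rank + 1) % n] = value
    # Barrier reached: each worker now combines its own value with the
    # message it received, producing the state for the next superstep.
    return [max(own, received) for own, received in zip(values, inbox)]


def run(checkpoint, total_steps, fail_at=None):
    """Run supersteps from the checkpoint, updating it at each boundary."""
    values = list(checkpoint["values"])
    for step in range(checkpoint["step"], total_steps):
        if step == fail_at:
            # Simulated hardware fault before this superstep completes.
            raise WorkerFailure(f"worker lost during superstep {step}")
        values = superstep(values)
        # Superstep boundary: no messages are in flight, so the pair
        # (step counter, values) is the complete global state.
        checkpoint["step"] = step + 1
        checkpoint["values"] = list(values)
    return values


# Assumed example data: four workers, three supersteps, fault in the third.
checkpoint = {"step": 0, "values": [3, 9, 1, 4]}
try:
    run(checkpoint, total_steps=3, fail_at=2)
except WorkerFailure:
    pass  # the checkpoint still holds the state after two full supersteps

result = run(checkpoint, total_steps=3)  # resume from the last boundary
print(result)
```

Because the checkpoint is only ever written at a superstep boundary, recovery needs no knowledge of partially delivered messages: the resumed run simply repeats the superstep that failed. A real BSP system would distribute the workers across machines and write the checkpoint to durable storage; this sketch keeps everything in one process to show only the control flow.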
Now every company can experience the competitive advantage that "computing like Google" can offer. HadoopBI brings MPI/BSP computing to the Hadoop ecosystem with a new model and execution engine.
Hadoop on MPI
To overcome the cost and performance limitations of MapReduce, HadoopBI gives customers not only the big data power and scalability of Apache HDFS, Hadoop and HBase, but also Cloudscale's super-fast and super-scalable HRule engine for realtime analytics and automation. The in-memory HRule engine is implemented in C++ and MPI, with smart compression for super-fast, node-to-node, bulk synchronous (BSP) communications.
Besides its blistering performance, the massively parallel "HadoopMPI" model also provides the checkpointing and fault tolerance required for continuous big data analytics, and runs on standard server and networking hardware for maximum price-performance.
HadoopBI gives customers the best possible big data solution, not only in terms of performance (massive throughput and extremely low latency) but also in terms of economics. HadoopBI is not just the fastest Big Data BI solution; it is also the cheapest at scale.
