
Data Cloud/Big Data: Google Introduces DataFlow as Successor to MapReduce

By Dick Weisinger

Do you feel left behind when it comes to technologies like Hadoop and MapReduce?  The great thing about how quickly technology changes and becomes obsolete is that if you miss one trend, it’s not long before something else supersedes it.  That lets you leapfrog directly to the newer technology without having spent time and resources on the older one.  You do have to jump in sometime, though!


Google announced in June that it long ago stopped using MapReduce, the batch processing model on which Hadoop is based.  In fact, it is going to open up its ‘better way’ of analyzing Big Data sets to the public as part of the Google Cloud Platform.  The components behind the new Google technology, called Cloud Dataflow, have cool names like Flume and MillWheel.

The limitation of MapReduce strategies is that they run as batch jobs.  With MapReduce and standard Hadoop, all the data must already exist and have been collected before the job begins.
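
To make the batch limitation concrete, here is a minimal sketch of a MapReduce-style word count in plain Python. It does not use Hadoop or any Google API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample records are made up for illustration. The point to notice is that the shuffle step cannot finish until it has seen every record, which is why the input has to be fully collected first.

```python
from collections import defaultdict


def map_phase(records):
    # Emit a (word, 1) pair for every word in every record.
    for record in records:
        for word in record.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    # Group values by key. This step must consume ALL pairs before
    # it can hand anything to the reducer, hence the batch constraint.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Collapse each group of values into a single total per key.
    return {key: sum(values) for key, values in groups.items()}


def batch_word_count(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical input: the whole data set is already collected.
collected = ["big data big jobs", "data arrives in batches"]
counts = batch_word_count(collected)
print(counts["big"], counts["data"])
```

If new records arrive after the job has started, this style of job has no way to include them short of running again over the enlarged data set.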

Greg DeMichillie, Director of Product Management, wrote that “a decade ago, Google invented MapReduce to process massive data sets using distributed computing.  Since then, more devices and information require more capable analytics pipelines—though they are difficult to create and maintain.  Cloud Dataflow makes it easy for you to get actionable insights from your data while lowering operational costs without the hassles of deploying, maintaining or scaling infrastructure. You can use Cloud Dataflow for use cases like ETL, batch data processing and streaming analytics, and it will automatically optimize, deploy and manage the code and resources required.”

Brian Goldfarb, Google Cloud Platform head of marketing, said of Big Data that “the program models are different. The technologies are different. It requires developers to learn a lot and manage a lot to make it happen.  It [Google Cloud Dataflow] is a fully managed service that lets you create data pipelines for ingesting, transforming and analyzing arbitrary amounts of data in both batch or streaming mode, using the same programming model.”
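
The idea of one programming model for both modes can be sketched in plain Python. This is not the Cloud Dataflow SDK; `count_words`, `windows` and `incoming` are hypothetical names chosen for illustration, and fixed-size windows are just one assumed way of chunking a stream. The same transform is applied once to a complete data set and then to successive windows of a stream.

```python
from collections import Counter
from itertools import islice


def count_words(lines):
    """Shared transform: works on any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def windows(stream, size):
    """Split a possibly unbounded stream into fixed-size windows."""
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return
        yield window


# Batch mode: the whole (bounded) data set is passed in at once.
batch_result = count_words(["a b a", "b c"])


def incoming():
    # Stand-in for a live source; lines arrive one at a time.
    yield "a b a"
    yield "b c"
    yield "c c"


# Streaming mode: the same transform runs on each window as it fills.
stream_results = [count_words(w) for w in windows(incoming(), 2)]
print(batch_result, stream_results)
```

The transform itself never needs to know whether its input is bounded; only the code that feeds it changes.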

Urs Hölzle, senior vice president of technical infrastructure at Google, said that “Cloud Dataflow is the result of over a decade of experience in analytics.  It will run faster and scale better than pretty much any other system out there.”

