Hadoop ignited a "Cambrian explosion," says its creator


Though streaming data tools like Apache Spark get all the press these days, batch-oriented processing tools like Hadoop will be around for a long, long time. While it's easy to assume that streaming will displace batch, the reality is much more nuanced, as Hadoop creator Doug Cutting stressed to me in an interview.

In fact, he notes that while Hadoop sparked a "Cambrian explosion" of big data innovation, we're settling into an era "of more normal evolution, as use of these technologies now spreads through industries."

Batch is(n't) best

While the industry likes to sneer at batch, Cutting and others didn't settle on batch processing because some cabal of coders decided it was the optimal way to deal with data. Rather, as Cutting informed me, it was simply the best place to start:

"It wasn't as though Hadoop was architected around batch because we felt batch was best. Rather, batch, MapReduce in particular, was a natural first step because it was relatively easy to implement and provided great value. Before Hadoop, there was no way to store and process petabytes on commodity hardware using open-source software. Hadoop's MapReduce provided folks with a big step in capability."

Looking back, it's hard to argue with how the industry has evolved. The industry needed to walk with batch before it could run with streaming data. Back in 2012, I joined real-time analytics vendor Nodeable. Less than a year later, we had to sell the company, as the market for real-time analytics hadn't caught up to its promise.

But even in the more comfortable world of batch-oriented Hadoop, the industry is still slow to embrace big data. According to a 2014 Gartner survey, more enterprises are moving into big data pilots and production:

Figure A: 2014 Gartner survey on enterprise big data adoption

At the same time, there remains a dearth of understanding of how to effectively put these big data technologies to use. This helps explain why Hadoop, despite being the most well-known big data technology, continues to account for just 3% of all enterprise storage, as 451 Research details.

As DataStax chief evangelist Patrick McFadin explained to me in an interview, "Google, Yahoo and Facebook make it sound amazing and sadly, enterprises are looking at how to apply that analytics hammer to all the data. First: collect all the data. Second:... Third: Profit!"

If only it were that easy.

A natural evolution

Part of the problem for enterprises is that big data has moved too fast. Spark, Kafka, MongoDB, Impala, Flume... big data today incorporates a dizzying array of weirdly named technologies that demand a CIO's full attention if she wants her company to remain current.

But for those enterprises that feel they've been left behind by big data's blistering pace, Cutting offers solace:

"I expect that major additions to the stack like [Apache] Spark will slow, that over time we'll stabilize on a set of tools that provide the range of capabilities that most folks demand for their big data applications. Hadoop ignited a Cambrian explosion of related projects, but we'll likely now enter a time of more normal evolution, as use of these technologies now spreads through industries."

While it would appear that sexy new technologies like Spark render Hadoop's MapReduce obsolete, the reality is much more nuanced. As Cutting continues, "There's no either/or, no rejection of what came before, but rather a filling out of potential as this open-source ecosystem has matured."

Patrick Wendell, software engineer at Databricks, agrees. As he informed me, while he doesn't "believe [streaming analytics] is over-hyped, per se," he still feels that "we are just at the beginning of what will likely be a major expansion of streaming workloads over the next few years."

Those streaming workloads won't obviate the need for batch, according to Cutting:

"I don't think there will be any giant shift towards streaming. Rather streaming now joins the suite of processing options that folks have at their disposal. When they need interactive BI, they use Impala, when they need faceted search, they use Solr, and when they need real-time analytics, they use Spark Streaming, etc. Folks will still perform retrospective batch analytics too. A mature user of the platform will likely use all of these."

The golden years of big data

Cutting's perspective makes sense in an enterprise world that is slow to embrace new technologies and just as slow to dump them. With COBOL and mainframes still haunting the halls of private datacenters, it's perhaps too much to expect enterprises to immediately adopt and then drop Hadoop as seemingly sexier technologies come along.

Indeed, a Deutsche Bank research note finds that "CIOs are now broadly comfortable with [Hadoop] and see it as a significant part of the future data architecture. We would expect significant $ commitments in [fiscal year 2015]."

Those same CIOs are likely to discover and grow cozy with Spark, too, as Cutting and Wendell suggest. But that coziness won't eliminate reliance on slower-moving Hadoop. Not in this decade, anyway.
