Three reasons you need to run Spark in the cloud

Matt Asay

Updated June 15, 2015 at 4:35 PM

This article, Three reasons you need to run Spark in the cloud, originally appeared on TechRepublic.com.

The open-source project Apache Spark is today perhaps the best-known offspring of UC Berkeley's AMPLab. Working at the intersection of three massive trends (powerful machine learning, cloud computing, and crowdsourcing), the AMPLab is integrating algorithms, machines, and people to make sense of big data.

Spark was originally written to extend the capabilities of another AMPLab project, Apache Mesos. It took off, and in 2013 its co-authors founded Databricks, a startup funded by Andreessen Horowitz, to deliver Spark through a hosted cloud platform that makes it easy for data professionals to harness the power of Spark.

Spark is hugely appealing as an alternative to Hadoop's MapReduce for munging big data. It combines speed, an easy-to-use programming model, and a unified design that enables users to combine interactive queries, streaming analytics, machine learning, and graph computation within a single system.

Put that power in the cloud with a simple, elegant user experience, and you have a killer platform for anyone doing data exploration and building end-to-end data pipelines. Add a visual analytics application built from the ground up for big data, such as Zoomdata, and you have a compelling option for fast business intelligence (BI) visual analytics.

I spoke to Arsalan Tavakoli, VP of Customer Engagement at Databricks, about how Spark-plus-analytics can be a powerful combination.

TechRepublic: Why Spark in the cloud? I can download and run Spark on premises, so why would I rent it from Databricks?

Tavakoli: Obviously, Spark is available as open source. Anyone can download it from a wide variety of vendors and use it. But when we looked at customers whose big data projects were failing, they typically gave one of three explanations.

First, infrastructure management is hard. On premises, you are looking at a six- to nine-month ramp, sometimes more, to get big data infrastructure into production. Even if you are running on Amazon Web Services (AWS), you have to write EC2 scripts and get DevOps people involved. It's brittle.

Remember, infrastructure is hard. And companies turn to Spark in large part because of its rapid innovation cycles. They want the benefits of a technology that improves all the time, with hundreds of people contributing. But that also means it is a technology that moves fast. How long does it take your team to get the latest version deployed and running?

Second, once you get your Spark cluster up and running, what do you do with it? Data scientists tend to work in their favorite languages, like R and Python. Now they have to figure out how to import their data and get a job up and running. The toolchain needed to work with standalone Spark can be hard for these users. And how do you run your analytics and collaborate with your colleagues?

It's not trivial.

Third, after you have tested your queries and models, you want to move into production. What does that process look like? In most companies, it means handing your model over to engineering, and that team goes back and reimplements what it thinks you want on all-new infrastructure.

A cloud platform like Databricks removes these three obstacles to Spark adoption and to the success of your big data initiative by providing an integrated, hosted solution. We give you fully managed and tuned Spark clusters backed by the experts who created Spark. Our platform provides an interactive workspace to explore, visualize, collaborate, and publish. When you are ready for production, you launch a job with a single click, and we automatically create the infrastructure.

Additionally, we provide a rich set of APIs for programmatic access to the platform, which also enables seamless integration with third-party applications.

TechRepublic: Tell me why customers will want to do BI visualizations in the cloud. Are there particular reasons why this delivery is best suited for BI visualization?

Tavakoli: People want to use data to get insights into their business, and data engineers and data scientists are focused on delivering those insights. But unless you are an engineering-oriented company like Pinterest, Netflix, or Facebook, those specialists are just a small part of the organization. There is a much larger user base of business analysts and end users.

Take, for example, the person in marketing who wants to slice and dice data at a high level but doesn't have technical skills. They just want their dashboards, or whatever, within a much more constrained decision space.

Smart companies know that they want to help their workers become self-sufficient. That is where BI visualization comes in: it serves the cases where the questions you have, or want to ask, are not yet clearly understood. If they were, you would likely have a domain-specific application.

TechRepublic: So, that's why you partnered with Zoomdata? What benefits do Databricks Cloud users get with this partnership that they would not get otherwise?

Tavakoli: We have a lot of customer use case overlap with Zoomdata. Many of these organizations are the classic early adopters who rely heavily on data engineers and data scientists. All of these organizations also have a major BI warehouse component.

But the next question these companies are asking themselves is: How can I make this simpler for more users? I have all this data that I'm processing with Spark; how can I make it available to users who are not developers?

For this, a BI visualization application is perfect, and Zoomdata proved a great fit for our cloud.

TechRepublic: What are some common use cases you see around this Databricks/Zoomdata joint offering?

Tavakoli: One common use case is the AdTech vertical, broadly speaking.

AdTech companies typically have the following flow: they build up an internal database by pulling data from a wide variety of sources, running it through an in-depth ETL pipeline, and converting it into processed form.

Then, each of their customers provides data from the CRM and marketing automation systems that needs to be joined with this internal database to answer questions about the effectiveness of their campaigns. This process is handled by the data engineers and data scientists who test out in-depth theories.
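At its core, the AdTech flow described here is a join between a processed internal dataset and customer-supplied CRM data, followed by a per-campaign aggregate. The sketch below illustrates that logic in plain Python. It is an illustration under stated assumptions, not how Databricks or any AdTech company actually implements it: the field names (`user_id`, `campaign`, `channel`, `converted`), the sample records, and the function `campaign_conversion_rates` are all hypothetical, and in practice this step would run as a distributed Spark job rather than over in-memory lists.

```python
from collections import defaultdict

# Hypothetical internal database, already cleaned by an ETL step:
# one record per ad impression, tagged with campaign and channel.
internal_impressions = [
    {"user_id": "u1", "campaign": "spring_sale", "channel": "mobile"},
    {"user_id": "u2", "campaign": "spring_sale", "channel": "desktop"},
    {"user_id": "u3", "campaign": "brand_launch", "channel": "mobile"},
    {"user_id": "u4", "campaign": "brand_launch", "channel": "mobile"},
    {"user_id": "u5", "campaign": "brand_launch", "channel": "desktop"},
]

# Hypothetical extract supplied by one customer's CRM system:
# whether each known user went on to convert.
crm_conversions = [
    {"user_id": "u1", "converted": True},
    {"user_id": "u2", "converted": False},
    {"user_id": "u3", "converted": True},
    {"user_id": "u5", "converted": True},
]


def campaign_conversion_rates(impressions, conversions, channel=None):
    """Left-join impressions to CRM rows on user_id, then compute the
    fraction of impressions per campaign whose user converted.

    Users absent from the CRM extract count as not converted.
    If channel is given (e.g. "mobile"), only those impressions are used.
    """
    # Build a lookup table from the CRM side of the join.
    converted_by_user = {row["user_id"]: row["converted"] for row in conversions}

    shown = defaultdict(int)      # impressions per campaign
    converted = defaultdict(int)  # converting impressions per campaign
    for imp in impressions:
        if channel is not None and imp["channel"] != channel:
            continue
        shown[imp["campaign"]] += 1
        if converted_by_user.get(imp["user_id"], False):
            converted[imp["campaign"]] += 1

    return {campaign: converted[campaign] / shown[campaign] for campaign in shown}


if __name__ == "__main__":
    rates = campaign_conversion_rates(internal_impressions, crm_conversions)
    for campaign, rate in rates.items():
        print(f"{campaign}: {rate:.0%}")
```

The optional `channel` filter mirrors the kind of higher-level question an analyst might ask, such as how mobile ads performed. In PySpark the same idea would typically be expressed as a left join on `user_id` (for example, `impressions.join(crm, on="user_id", how="left")`) followed by a `groupBy("campaign")` aggregation; the plain-Python version simply makes the join semantics explicit without needing a cluster.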

On the other hand, data analysts and product managers want to ask higher-level questions, such as which product feature is most effective or how a mobile ad performed. These users are much more comfortable working through a BI interface like Zoomdata.

Another use case is the Internet of Things (IoT). Companies like Automatic Labs collect data from all the devices in cars. Data scientists look at deeper questions about underlying trends that correlate with the car, cost, and driving patterns.

Non-experts, like account managers, may just want to correlate disparate data with insurance premiums. These people don't want to deal with spinning up a Spark cluster and writing Python or SQL code.

Would your organization consider Spark through a hosted cloud platform? Why or why not?
