Cloudera Announces Real-Time Query Engine for Hadoop

October 24, 2012 1:56PM


There's a new tool for Big Data analysis. On Wednesday, Cloudera announced a real-time query engine for Apache Hadoop, resulting from two years of in-house development efforts.

The engine is an enhancement to Cloudera's Big Data platform, known as Cloudera Enterprise. Describing what makes the query engine unique, Cloudera claims this is the first time that both real-time and batch operations have been available for unstructured and structured data in one massively scalable system.

Cloudera offers a commonly used version of Hadoop, an open-source data framework designed for handling Big Data.

In its announcement, Cloudera said that the new query engine will enable organizations to "process data at petabyte scale and, on the same system, interact with that data in real time to deliver 'speed-of-thought' insights." In short, the company said, the new tool will allow organizations to "ask bigger questions" of their data.

SQL Queries

The Apache-licensed, open-source query engine, Cloudera Impala, is specifically designed for real-time querying of data stored in the Hadoop Distributed File System (HDFS) and in HBase, a non-relational distributed database. Interactive queries for Impala can be expressed in SQL.

The company said that Impala runs queries 10 times as fast as the existing Hive/MapReduce combination, and can be even faster, depending on the workload. It also pointed to the cost savings of analyzing Big Data with real-time queries by running this open-source technology on commodity hardware.

Cloudera said that, in a recent survey it conducted of more than 100 customers, over 70 percent were looking at how to extract value from Big Data. Operational IT efficiency and competitive advantage were cited by the customers as reasons for adopting Hadoop, but the vast majority also indicated they needed faster methods of querying than the batch operations that had been available.

'Most Exciting' Since Hadoop

In its announcement, the company pointed to one of its clients, travel Web site Expedia, which said that it uses the Cloudera Enterprise platform to manage more than 4 petabytes of data. With Impala added, Expedia said, the enhanced Enterprise Real-Time Query platform allows it to create a single platform for Big Data, instead of maintaining several systems for archiving, extracting, transforming, loading, and analytics.

Cloudera CEO Mike Olson said in a statement that, "until now, enterprises had to limit the work they did with Hadoop because batch-mode processing using MapReduce was just too slow for some business problems." Impala, he explained, will enable organizations to store all their data in Hadoop and "use the same hardware to do both powerful analytics and run real-time queries using industry-standard tools and the SQL language."

In fact, Cloudera co-founder and Chief Scientist Jeff Hammerbacher characterized Impala as "the most exciting open-source project since Hadoop," adding that it was "the most important framework beyond MapReduce for analyzing data stored in HDFS and HBase."
