Big Data: Apache Arrow for Faster and More Efficient Management of Columnar Data

By Dick Weisinger

The Apache Software Foundation recently announced Arrow as a new top-level project. Arrow is embeddable software that enables columnar in-memory processing. Structured data is usually managed with SQL operating on the rows of a database. Columnar operations are often tricky to set up, but done right, they allow larger data sets to be processed, and certain operations run as much as 100 times faster than their row-based equivalents.
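
To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It does not use Arrow itself; the sample table, the field names and the helper functions are all made up for illustration. The same three records are stored once as a list of rows and once as one contiguous typed array per column, and an aggregate over a single field reads only that column's buffer in the columnar layout.

```python
from array import array

# Hypothetical sample data: (order_id, quantity, price) records.
rows = [
    (1, 3, 9.50),
    (2, 1, 20.00),
    (3, 7, 4.25),
]

def to_columns(records):
    """Pivot row tuples into one contiguous typed array per column."""
    ids, quantities, prices = zip(*records)
    return {
        "order_id": array("q", ids),         # 64-bit signed integers
        "quantity": array("q", quantities),  # 64-bit signed integers
        "price": array("d", prices),         # 64-bit floats
    }

def total_price_row_based(records):
    # Every row object is visited even though only one field is needed.
    return sum(record[2] for record in records)

def total_price_columnar(columns):
    # Only the "price" buffer is read; the other columns are never touched.
    return sum(columns["price"])

columns = to_columns(rows)
```

Both functions return the same total. The difference is in what has to be read to get there: in the columnar layout the values of one field sit next to each other in memory, which is the property the speedups described in this post depend on.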

Jacques Nadeau, chairman of the Apache Drill project, described Apache Arrow as “an accelerator for processing and storage systems. It’s a set of data representations that are much more CPU-efficient… Doing in-memory columnar is hard. Doing rows is easy.”

Arrow provides a common data format that lets multiple systems share and exchange data. Nadeau said that using Arrow can eliminate the repeated serialization and deserialization of data, which can consume 70 to 80 percent of the processing cycles for some workloads.
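
The overhead Nadeau describes can be sketched with the Python standard library. This illustrates the idea rather than Arrow's API: the two "systems" are just functions, and `pickle` stands in for whatever wire format would otherwise sit between them. In the first path the column is copied into bytes and then rebuilt. In the second, both sides agree on the memory layout, so the consumer reads the producer's buffer through a `memoryview` without copying it.

```python
import pickle
from array import array

# Hypothetical column of 64-bit floats produced by "system A".
prices = array("d", [9.50, 20.00, 4.25])

def consume_via_serialization(column):
    """System B receives a serialized copy and must rebuild the column."""
    wire_bytes = pickle.dumps(column)   # serialize: copies every value
    rebuilt = pickle.loads(wire_bytes)  # deserialize: copies again
    return sum(rebuilt)

def consume_via_shared_buffer(column):
    """System B reads system A's memory directly through a typed view."""
    with memoryview(column) as view:    # no copy; same underlying buffer
        return sum(view)

def shares_memory(column):
    """Check that a write through the view is visible in the original column."""
    with memoryview(column) as view:
        view[0] = 1.0
    return column[0] == 1.0
```

Both consumers compute the same sum, but only the first pays for two full copies of the data. `shares_memory` is there only to show that the view and the column really are the same bytes.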

Nadeau added: “Cache locality, pipelining and superword operations frequently provide 10-100x faster execution performance. Since many analytical workloads are CPU bound, these benefits translate into performance gains, or more plainly, the potential for faster answers and higher levels of user concurrency.”

Ted Dunning, Vice President of the Apache Incubator, said that “an industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead.”

March 18th, 2016

Category: Big Data, Structured Data
