Backstage Blog

You're browsing posts in the Hadoop category

Data pipelines with Apache Crunch and Java 8

June 1st, 2016 by David Whiting

With Java 8 now in the mainstream, Scala and Clojure are no longer the only choices for writing readable, functional code for big data processing on the JVM. In this post we show how SoundCloud uses Apache Crunch and the new Crunch Lambda module to handle the high-volume data processing tasks that are essential to the early stages of our batch data pipeline, efficiently, robustly, and simply, in Java 8.

SoundCloud in Scalding case study by Concurrent Inc.

December 2nd, 2014 by Josh Devins

Recently we teamed up with Concurrent Inc., the backers of the data-processing framework Cascading, on a case study of how we use Scalding for some of our data-driven products, such as Search. Scalding lets us iterate quickly and test easily, and it allows for loose coupling between some of our data-processing pipelines.

Check back for future posts about our use of other data-processing tools and frameworks, such as Spark.