Apache Spark Basics – Accumulators and Broadcast Variables

In my previous post I talked about RDDs as an abstraction for parallel data processing. Today, I’d like to briefly discuss accumulators and broadcast variables and give an example of each. Accumulators: counters or sums that can be reliably used in parallel processing; native support for numeric types, extensions possible via API; workers can modify…

Apache Spark Basics – RDDs and Operation Types

When starting with Apache Spark, a “lightning-fast cluster computing” engine, it is important to understand how Spark fits into the Hadoop ecosystem. This article provides a brief overview of Spark’s distinctive features and its ties to Hadoop. Hadoop has been around for about 12 years and has dominated the Big Data space by providing reliable distributed processing of…


Scala SBT project template ready to be imported into Eclipse

Surprising as it sounds, Eclipse doesn’t support sbt out of the box, not even in the Scala IDE. At least I wasn’t able to find a way to generate an sbt project from within Eclipse. Hence, I wrote my own bash script, which generates a ready-to-use, minimal, Eclipse-compatible sbt project. The script sbt-eclipse.sh (see…