Distributed Computations


Distributed computation becomes relevant once data lives in distributed storage.

Scatter/Gather

Scatter the work across many individual nodes, where the data is processed, then gather the results back together.

Data locality is the key: each node processes the data it already stores.
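
As a rough illustration of the pattern, here is a minimal single-process sketch in Python. The node names, the `NODE_SHARDS` dictionary, and the `process_locally` and `scatter_gather` helpers are all made up for this example; threads stand in for real network calls to remote nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "nodes": each one holds its own local shard of the data.
NODE_SHARDS = {
    "node-a": [3, 8, 1],
    "node-b": [7, 2],
    "node-c": [9, 4, 6],
}

def process_locally(node_name):
    """Runs "on the node": summarize only the shard stored there."""
    shard = NODE_SHARDS[node_name]
    return {"count": len(shard), "total": sum(shard)}

def scatter_gather(node_names):
    # Scatter: send the same request to every node in parallel.
    with ThreadPoolExecutor(max_workers=len(node_names)) as pool:
        partials = list(pool.map(process_locally, node_names))
    # Gather: merge the small partial results into one answer.
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "total": total, "mean": total / count}

result = scatter_gather(list(NODE_SHARDS))
print(result)
```

Only the small per-node summaries travel back to the caller; the raw shards never leave the node that stores them.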

Spark: scatter/gather rather than map-reduce

Map-reduce

Hadoop: the legacy pattern

Apache Storm: event-based (stream) processing rather than batch processing.

Map reduce

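A job is split between mappers, which emit (key, value) pairs, and reducers, which combine all the values that share a key.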

1. **Map Phase:**

   +------------------------+      +------------------------+
   |       Input Data       | ---> |         Mapper         |
   +------------------------+      +------------------------+
                                   |   (Key, Value) Pairs   |
                                   +------------------------+
                                         |            |
                                         |            |
                                         |            |
                                   +------------------------+
                                   |     Shuffle & Sort     |
                                   +------------------------+
                                         |            |
                                         V            V
                                   +----------+  +----------+
                                   |   Key    |  |   Key    |
                                   | Partition|  | Partition|
                                   +----------+  +----------+
                                         |            |
                                         V            V
                                   +------------------------+
                                   |        Reducer         |
                                   +------------------------+
                                         |            |
                                         V            V
                                   +------------------------+
                                   |      Output Data       |
                                   +------------------------+

2. **Reduce Phase:**

   +------------------------+
   |      Intermediate      |
   |    Key-Value Pairs     |
   +------------------------+
               |
               V
   +------------------------+
   |        Reducer         |
   +------------------------+
   | (Key, List of Values)  |
   +------------------------+
               |
               V
   +------------------------+
   |      Output Data       |
   +------------------------+
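
The two phases above can be sketched as a word count in plain Python. This is a single-process toy, not Hadoop's API: the `mapper`, `shuffle_and_sort`, `reducer`, and `map_reduce` names are chosen for this example, and a real framework would run mappers and reducers on different nodes.

```python
from collections import defaultdict

def mapper(line):
    """Map: emit a (key, value) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle_and_sort(pairs):
    """Shuffle & sort: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce: collapse (key, list of values) into one output record."""
    return (key, sum(values))

def map_reduce(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle_and_sort(mapped)
    return [reducer(key, values) for key, values in grouped]

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = map_reduce(lines)
print(counts)
```

The shuffle step is what turns the mappers' flat stream of pairs into the (key, list of values) input that each reducer receives.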

Hadoop

Distributed Computing Framework

MapReduce API
MapReduce job management
HDFS (Hadoop Distributed File System)
Enormous ecosystem
- HBase, Hive, Pig, ZooKeeper, Mahout, Sqoop, Flume

HDFS

files & directories
metadata managed by a replicated master (the NameNode)
files stored in large, immutable, replicated blocks
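
To make the block idea concrete, here is a small sketch of how a file might be cut into fixed-size blocks with several replicas each. The `plan_blocks` function and the `dn1`..`dn4` node names are hypothetical, and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy. The 128 MiB block size and replication factor of 3 match the defaults in recent Hadoop releases, but both are configurable.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, assumed default block size
REPLICATION = 3                 # assumed default replication factor

def plan_blocks(file_size, data_nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, replica_nodes) tuples."""
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        # Toy placement: round-robin over nodes (real HDFS is rack-aware).
        replicas = [data_nodes[(index + r) % len(data_nodes)] for r in range(replication)]
        plan.append((index, length, replicas))
        offset += length
        index += 1
    return plan

nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = plan_blocks(300 * 1024 * 1024, nodes)
for block in plan:
    print(block)
```

Because blocks are immutable and replicated, losing one node in this sketch still leaves two copies of every block on the remaining nodes.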
