Wednesday, December 11, 2013

MODERN PARALLEL PROGRAMMING PARADIGM


   In essence, parallel computing is composed of tasks executed in parallel, where tasks are represented by functions executed asynchronously. Unfortunately, that is not the tricky part of parallelism. The real challenge is synchronizing access to data shared by tasks executing in parallel, which is notoriously difficult to make efficient. 

   To solve the problem, we need to retire traditional memory access. Instead of using the operator ‘new’ and global/static storage, where we eventually store references to allocated memory, we need declarative memory access: memory must be queried. Such memory queries represent the side effects of their enclosing tasks and make it possible to determine how tasks must be executed with regard to each other, based on intersections of their queried data. Tasks that intend to modify the same data (whose side effects intersect) must be executed sequentially; otherwise, tasks can be executed in parallel.

   Consider the following code:

class Foo {
        int X, index;
        int Y;
}

task A() {
        auto q = yield Foo .X = 3;
        ...
}

task B() {
        auto q = yield Foo .X = 3;
        ...
}

task C() {
       auto q = select Foo .X = 3;
       ...
}

task D() {
       auto q = yield Foo .X = 5;
       ...
}

   It’s clear that the side effects of tasks A and B intersect: both intend to modify the same data, since they use the ‘yield’ keyword. Thus, when the query of task A or B hits the memory manager while the other task is executing, the querying task will be suspended and resumed when the other task completes. On suspension, the processing core is yielded to another task. 

   Task C accesses the same data as tasks A and B but does not modify it, since it uses the ‘select’ keyword. Thus, task C will be executed without suspension (in parallel) if the memory manager employs MVCC. The side effect of task D does not intersect with those of the tasks above, so it is safe to execute it in parallel without suspension.

   Let’s look at another example. Suppose we use MapReduce to analyze big data and build a result report. In this example, we simply compute a sum and generate an HTML document. The following figure illustrates the decomposition of the computation:


   Each node of the computation is a task that can be executed in parallel with other tasks, except where two tasks combine their results. The following pseudocode illustrates using memory queries to synchronize the tasks:

class Foo {
   int X, index;
   real Amount;
}

class Node {
   int Index, index;
   int Level, index;
   bool EndLeaf;
   real Sum;
   text HTML;
}

task Main() {
   int index = 0;
   MapReduce<Node> context;
   auto q = select Foo .X > 3 and .X < 1000;
   foreach( Foo f in q ) {
      async Leaf(f, index++, context);
   }
   async EndLeaf(index, 0, context);
   ...
}

task Leaf(data, index, context) {
   auto q = yield context.Node .Index = index and .Level = 0;
   q.insert();
   q.Level = 0;
   q.Index = index;
   q.Sum = data.Amount;
   q.HTML = "<span>@q.Sum</span>";
   async Combine(q, index/2, 1, context);
}

task EndLeaf(count, offset, context) {
   int index = context.EndLeafIndex(count);
   int level = context.EndLeafLevel(count) + offset;
   auto q = yield context.Node .Index = index and .Level = level;
   if( q == NIL ) { /* came first */
      q.insert();
      q.Index = index;
      q.Level = level;
      q.EndLeaf = true;
      q.HTML = "<div class=@level>";
   } else { /* came second */
      q.EndLeaf = true;
      if( q.Index == 0 ) {
         /* done, generate output */
         context.Complete();
      } else {
         async EndLeaf(index, level, context);
      }
   }
}

task Combine(node, index, level, context) {
   auto q = yield context.Node .Index = index and .Level = level;
   if( q == NIL ) { /* came first */
      q.insert();
      q.Index = index;
      q.Level = level;
      q.Sum = node.Sum;
      q.HTML = "<div class=@level>" + node.HTML;
   } else { /* came second */
      q.Sum += node.Sum;
      q.HTML += node.HTML;
      q.HTML += "<span>@q.Sum</span></div>";
      if( q.EndLeaf ) {
         if( q.Index == 0 ) {
             /* done, generate output */
             context.Complete();
         } else {
            async EndLeaf(index, level, context);
         }
      } else {
         async Combine(q, index/2, level+1, context);
      }
   }
}

   The code is meant to be self-descriptive. One thing to notice is the conflict that arises when two nodes query the next-level node. The conflict is resolved automatically by the system, and the tasks are ordered because they have the same side effect. The first task sees that the result of the query is NIL and creates the node. The system automatically commits the changes to shared memory when the task completes, and the changes become visible to other tasks. Then the second task executes, sees the existing node, and updates it. Again, the changes are committed at the end of the task. The following figure illustrates a possible ordering of the tasks on processing cores:


   In essence, the memory query technique is the next generation of automatic memory management. In the traditional approach, programming languages provide the ability to allocate memory dynamically. Then it is the responsibility of the programmer to use an appropriate synchronization mechanism to control access to the allocated memory. Some programming languages add garbage collectors, which keep track of allocated objects and automatically release memory when the objects become unreachable. Modern programming languages also supply data structures (hash tables, red-black trees, etc.) to organize allocated memory into collections. The next generation of automatic memory management combines memory allocation, garbage collection, and data structures in one service (a sort of in-memory database). It can then easily detect tasks’ concurrency conflicts and schedule the tasks appropriately. 

   This modern parallel programming paradigm is called Automatic Transactional Memory and is implemented as the C++ library Transactional Entity Framework. It is used in production for server-side logic development and has proved effective. The current RC version is available for free with technical support.