The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. We present a complexity model for designing algorithms on the Cell processor, along with a systematic procedure for algorithm analysis. To estimate the execution time of an algorithm, we consider its computational complexity, its memory access patterns (DMA transfer sizes and latency), and the complexity of its branching instructions. This model, coupled with the analysis procedure, simplifies algorithm design on the Cell and enables quick identification of potential implementation bottlenecks.

Irregular problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous accesses to global data structures with low degrees of locality. Few parallel graph algorithms on distributed- or shared-memory machines can outperform their best sequential implementation, due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two important combinatorial algorithms, list ranking and connected components, on two types of shared-memory computers: symmetric multiprocessors (SMP) such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. Previous studies show that for SMPs, performance is primarily a function of non-contiguous memory accesses, whereas for the MTA, it is primarily a function of the number of concurrent operations. We present a performance model for each machine, and use it to analyze the performance of the two algorithms. We compare the models for SMPs and the MTA and discuss how the difference affects algorithm development, ease of programming, performance, and scalability.
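
To make the list ranking problem concrete, here is a minimal Python sketch of the classic pointer-jumping approach. It is only an illustration and is not the implementation evaluated in the paper. The function name `list_rank` and the convention that the tail node points to itself are assumptions made for this example. Each round is simulated sequentially, but the reads `rank[nxt[i]]` and `nxt[nxt[i]]` show the kind of non-contiguous, data-dependent memory access that the abstract identifies as the main cost on SMPs.

```python
def list_rank(succ):
    """Return rank[i] = number of links from node i to the tail of the list.

    Assumed input convention (hypothetical, for illustration only):
    succ[i] is the index of node i's successor, and the tail points to itself.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail has rank 0; every other node starts one link from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]

    # After k rounds each pointer spans up to 2**k links, so
    # ceil(log2(n)) rounds are enough to reach the tail from every node.
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Build new arrays from the old ones to mimic one synchronous
        # parallel step; the indirect reads are the irregular accesses.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# List order: 1 -> 0 -> 3 -> 4 -> 2 (tail)
print(list_rank([3, 0, 2, 4, 2]))
```

In a real parallel implementation each index `i` would be handled by a separate processor or thread per round; the sequential loops here stand in for that step.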