Versatile Communication Cost Modelling for Multicomputer Task Scheduling
Programmers face daunting problems when attempting to design portable programs for multicomputers. This is mainly due to the huge variation in communication performance on the range of multicomputer platforms currently in use. These programmers require a computational model that is sufficiently abstract to allow them to ignore machine-specific performance features, and yet is sufficiently versatile to allow the computational structure to be mapped efficiently to a wide range of multicomputer platforms. This dissertation focusses on parallel computations that can be expressed as task graphs: tasks that must be scheduled on the multicomputer's processors. In the past, scheduling models have only considered the message delay as the predominant communication parameter. In the current generation of parallel machines, however, latency is negligible compared to the CPU penalty of communication-related activity associated with inter-processor communication. This CPU penalty cannot be modelled by a latency parameter because the CPU activity consumes time otherwise available for useful computation. In view of this, we consider a model in which the CPU penalty is significant and is associated with communication events that are incurred when applications execute in parallel. In this dissertation a new multi-stage scheduling approach that takes into account these communication parameters is proposed. Initially, in the first stage, the input task graph is transformed into a new structure that can be scheduled with a smaller number of communication events. Task replication is incorporated to produce clusters of tasks. However, a different view of clusters is adopted. Tasks are clustered so that messages are bundled and consequently, the number of communication events is decreased. The communication event tasks are associated with the relationship between the clusters. More specifically, this stage comprises a family of scheduling heuristics that can be customised to classes of parallel machines, according to their communication performance characteristics, through parameterisation and by varying the order in which the heuristics are applied. A second stage is necessary, where the actual schedule on the target machine is defined. The mechanisms implemented analyse carefully the clusters and their relationship so that communication costs are minimised and the degree of parallelism is exploited. Therefore, the aim of the proposed approach is to tackle the min-max problem, considering realistic architectural issues.