Show simple item record

dc.contributor.advisorCole, Murray
dc.contributor.authorHernandez, Jesus Israel
dc.date.accessioned2008-07-09T14:38:09Z
dc.date.available2008-07-09T14:38:09Z
dc.date.issued2008-12-04
dc.identifier.urihttp://hdl.handle.net/1842/2336
dc.descriptionInstitute for Computing Systems Architecture
dc.description.abstractEmerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issues are that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration, it may be that, over time, the results of some task have been copied to several other sites, and so a subsequent migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management(CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies, in subsequent migration of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allow us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.en
dc.contributor.sponsorCONACYTen
dc.format.extent2665142 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoenen
dc.relation.hasversionReliable DAG Scheduling with Rewinding and Migration, Israel Hernandez and Murray Cole, to appear in First International Conference on Networks for Grid Applications (GridNets07), ACM Press, 2007.en
dc.relation.hasversionScheduling DAGs on Grids with Copying and Migration, Israel Hernandez and Murray Cole, to appear in Parallel Processing and Applied Mathematics (PPAM07), Springer LNCS, 2007.en
dc.relation.hasversionReactive Grid Scheduling of DAG applications, Israel Hernandez and Murray Cole, Parallel and Distributed Computing and Networks 2007, 25th IASTED International Multi-Conference on Applied Informatics, pages 92-97, ACTA Press, 2007.en
dc.subjectDAG schedulingen
dc.subjectHeterogenous Computingen
dc.subjectReactive Schedulingen
dc.subjectParallel Processingen
dc.titleReactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systemsen
dc.typeThesis or Dissertationen
dc.type.qualificationlevelDoctoralen
dc.type.qualificationnamePhD Doctor of Philosophyen


Files in this item

This item appears in the following Collection(s)

Show simple item record