dc.contributor.advisor: Cintra, Marcelo
dc.contributor.advisor: Nagarajan, Vijayanand
dc.contributor.author: McPherson, Andrew John
dc.date.accessioned: 2016-01-29T14:59:31Z
dc.date.available: 2016-01-29T14:59:31Z
dc.date.issued: 2015-11-26
dc.identifier.uri: http://hdl.handle.net/1842/14179
dc.description.abstract: Modern computers are based on manycore architectures, with multiple processors on a single silicon chip. In this environment, programmers are required to make use of parallelism to fully exploit the available cores. This can be either within a single chip, normally using shared-memory programming, or at a larger scale on a cluster of chips, normally using message-passing. Legacy programs written using either paradigm face issues when run on modern manycore architectures. In message-passing the problem is performance related, with clusters based on manycores introducing necessarily tiered topologies that unaware programs may not fully exploit. In shared-memory it is a correctness problem, with modern systems employing more relaxed memory consistency models, on which legacy programs were not designed to operate. Solutions to this correctness problem exist, but introduce a performance problem as they are necessarily conservative. This thesis focuses on addressing these problems, largely through compile-time analysis and transformation. The first technique proposed is a method for statically determining the communication graph of an MPI program. This is then used to optimise process placement in a cluster of CMPs. Using the 64-process versions of the NAS parallel benchmarks, we see an average improvement of 28% (7%) in communication localisation over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement. Secondly, we move into the shared-memory paradigm, identifying and proving necessary conditions for a read to be an acquire. This can be used to improve solutions in several application areas, two of which we then explore. We apply our acquire signatures to the problem of fence placement for legacy well-synchronised programs. We find that, by applying our signatures, we can reduce the number of fences placed by an average of 62%, leading to a speedup of up to 2.64x over an existing practical technique. Finally, we develop a dynamic synchronisation detection tool known as SyncDetect. This proof-of-concept tool leverages our acquire signatures to more accurately detect ad hoc synchronisations in running programs and provides the programmer with a report of their locations in the source code. The tool aims to assist programmers with the notoriously difficult problem of parallel debugging and in manually porting legacy programs to more modern (relaxed) memory consistency models. [en]
dc.contributor.sponsor: Engineering and Physical Sciences Research Council (EPSRC) [en]
dc.language.iso: en [en]
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Fence Placement for Legacy Data-Race-Free Programs via Synchronization Read Detection. Andrew J. McPherson, Vijay Nagarajan, Susmit Sarkar, Marcelo Cintra. Principles and Practice of Parallel Programming (PPoPP’15), San Francisco, California, February 2015. (Extended Abstract) [en]
dc.relation.hasversion: Static Approximation of MPI Communication Graphs for Optimised Process Placement. Andrew J. McPherson, Vijay Nagarajan, and Marcelo Cintra. Languages and Compilers for Parallel Computing (LCPC’14), Hillsboro, Oregon, September 2014. [en]
dc.subject: parallel programming [en]
dc.subject: compilers [en]
dc.subject: memory consistency [en]
dc.title: Ensuring performance and correctness for legacy parallel programs [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]

