Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/5664
|
| Title: | Machine learning based mapping of data and streaming parallelism to multi-cores |
| Authors: | Wang, Zheng |
| Supervisor(s): | O’Boyle, Michael |
| Issue Date: | 24-Nov-2011 |
| Publisher: | The University of Edinburgh |
| Abstract: | Multi-core processors are now ubiquitous and are widely seen as the most viable means
of delivering performance with increasing transistor densities. However, this potential
can only be realised if the application programs are suitably parallel. Applications
can either be written in parallel from scratch or converted from existing sequential
programs. Regardless of how applications are parallelised, the code must be efficiently
mapped onto the underlying platform to fully exploit the hardware’s potential.
This thesis addresses the problem of finding the best mappings of data and streaming
parallelism—two types of parallelism that exist in broad and important domains
such as scientific, signal processing and media applications. Despite significant
progress having been made over the past few decades, state-of-the-art mapping approaches
still largely rely upon hand-crafted, architecture-specific heuristics. Developing
a heuristic by hand, however, often requires months of development time. As multi-core
designs become increasingly diverse and complex, manually tuning a heuristic
for a wide range of architectures is no longer feasible. What are needed are innovative
techniques that can automatically scale with advances in multi-core technologies.
In this thesis two distinct areas of computer science, namely parallel compiler design
and machine learning, are brought together to develop new compiler-based mapping
techniques. Using machine learning, it is possible to automatically build high-quality
mapping schemes, which adapt to evolving architectures, with little human
involvement.
First, two techniques are proposed to find the best mapping of data parallelism.
The first technique predicts whether parallel execution of a data parallel candidate is
profitable on the underlying architecture. On a typical multi-core platform, it achieves
almost the same (and sometimes a better) level of performance when compared to the
manually parallelised code developed by independent experts. For a profitable candidate,
the second technique predicts how many threads should be used to execute
the candidate across different program inputs. The second technique achieves, on average,
over 96% of the maximum available performance on two different multi-core
platforms.
Next, a new approach is developed for partitioning stream applications. This approach
predicts the ideal partitioning structure for a given stream application. Based
on the prediction, a compiler can rapidly search the program space (without executing
any code) to generate a good partition. It achieves, on average, a 1.90x speedup over
the already tuned partitioning scheme of a state-of-the-art streaming compiler. |
| Keywords: | compilers multi-cores machine learning parallelism parallel programming languages |
| URI: | http://hdl.handle.net/1842/5664 |
| Appears in Collections: | Informatics thesis and dissertation collection
|
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.