Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/568
| Title: | Compilation Techniques for High-Performance Embedded Systems with Multiple Processors |
| Authors: | Franke, Bjorn |
| Issue Date: | Jul-2004 |
| Publisher: | University of Edinburgh. College of Science and Engineering. School of Informatics. |
| Abstract: | Despite the progress made in developing more advanced compilers for embedded systems,
programming of embedded high-performance computing systems based on Digital
Signal Processors (DSPs) is still a highly skilled manual task. This is true for
single-processor systems, and even more so for embedded systems based on multiple
DSPs. Compilers often fail to optimise existing DSP codes written in C due to the
employed programming style. Parallelisation is hampered by the complex multiple address
space memory architecture, which can be found in most commercial multi-DSP
configurations.
This thesis develops an integrated optimisation and parallelisation strategy that can
deal with low-level C codes and produces optimised parallel code for a homogeneous
multi-DSP architecture with distributed physical memory and multiple logical address
spaces. In a first step, low-level programming idioms are identified and recovered. This
enables the application of high-level code and data transformations well-known in the
field of scientific computing. An iterative, feedback-driven search for “good” transformation
sequences is investigated. A novel approach to parallelisation based on a
unified data and loop transformation framework is presented and evaluated. Performance
optimisation is achieved through exploitation of data locality on the one hand,
and utilisation of DSP-specific architectural features such as Direct Memory Access
(DMA) transfers on the other hand.
The proposed methodology is evaluated against two benchmark suites (DSPstone
and UTDSP) and four different high-performance DSPs, one of which is part of a commercial
four-processor multi-DSP board also used for evaluation. Experiments confirm
the effectiveness of the program recovery techniques as enablers of high-level transformations
and automatic parallelisation. Source-to-source transformations of DSP
codes yield an average speedup of 2.21 across four different DSP architectures. The
parallelisation scheme is – in conjunction with a set of locality optimisations – able to
produce linear and even super-linear speedups on a number of relevant DSP kernels
and applications. |
| Description: | Institute for Computing Systems Architecture |
| URI: | http://hdl.handle.net/1842/568 |
| Appears in Collections: | Informatics PhD thesis collection |
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.