*** This page is not up to date.***
The new projects include
(1) Low-Power High-level Synthesis and Code Generation.
(2) Architectural Synthesis dealing with Imprecise Specifications.
His research summary and contribution can be found in
HTML version
or
postscript file .
Dr. Edwin Sha's Research Projects:
Dr. Sha's research interests are in parallel and distributed
processing and
computer architectures in general. The list presented here
only give the subset he has been doing.
He welcomes anyone who is interested in parallel and distributed processing
or computer architectures to chat with him or to develope new
project with him. He is also interested in developing projects in
network programming and Java systems.
1. Optimal Code Generations for Parallel and Distributed Processing.
Loop nests are usually the most time and
memory consuming parts, and are commonly found in scientific
applications, such as image processing,
computational fluid dynamics, geophysical data analysis,
and computer vision.
We model uniform loop nests by multi-dimensional (MD) data-flow graphs.
Several transformation techniques have been developed, such as
MD retiming, MD rotation and MD interleaving.
We are able to transform any uniform loop with dimensions more than one
to be fully parallel with little increase of memory requirement.
We obtained the best-known results in terms of execution length, memory
requirement and computation complexity.
2. Optimal Data Scheduling and Partitioning for
Parallel and Distributed Systems.
If data are not stored or loaded carefully, memory access may become a
dominant factor of performance.
Since memory hierarchy is commonly used in practice, such as on-chip memory
(local memory) and off-chip memory (remote memory),
techniques need to be developed to maximally use on-chip memory for data access.
We propose a new problem called data scheduling which
decides when and where to store or load data.
Partitioning and execution scheduling is then applied to optimally
store and load data so that local memory misses is minimized.
A technique called carrot-hole data scheduling is now under
development.
The objective of the project is to design optimal data
scheduling algorithms for time and memory consuming
scientific applications for parallel architectures.
3. Architecture Dependent Communication and Computation Scheduling.
The performance of parallel or distributed processing applications is highly
dependent on communication delays in an architecture.
In order to obtain an optimized task/processor assignment, scheduling
algorithms considering the communication delays between processors and
loop-carry dependencies
needs to be investigated. We developed an efficient technique
called cyclo-compaction scheduling which takes into account the data
transmission delays and loop carried dependency
associated with specific target architectures.
This technique uses the retiming technique (loop pipelining),
implicitly applied, and a task remapping to appropriate processors
in order to compact the schedule length and
improve the parallelism iteratively while handling the underlying
imposed communication environment and resource constraints.
4. Real-Time Fault Tolerance and
Real-Time Communication on Parallel Architectures.
An increasing number of applications are demanding
real-time performance from their multiprocessor systems.
For many of these applications, a failure may produce
disastrous results.
Several algorithms have been developed to
selectively build as much fault tolerance as possible into an existing schedule
while observing precedence and hard real-time constraints.
For real-time communication, algorithms to optimally select paths and
transfer time to satisfy real-time constraints
are under development.
5. Hardware/Multi-Software Codesign.
Since the complexity and functionality
of the computer systems being built is increasing at
a dramatic rate, it is very difficult for custom
systems to be designed, built, and tested within an acceptable
time period even with the most
advanced computer-aided design tools. However, many systems
also have time critical parts which must be implemented in
hardware.
Based on this concept, a practical system would like to
use as many as off-shelf processors with the minimum
specialized hardware to satisfy given area and time constraints.
We call the systems built with this design style the
HMS (Hardware/Multi-Software) codesign.
Automation tools have being built to support this design style.
6, Time-Efficient and Memory-Efficient Application-Specific Parallel
Architectures for Solving Computation-Intensive Applications.
7, Optimization Techniques for Fuzzy Logic.
The applications of fuzzy logic have been widely applied
into new industry products recently.
For example, in Asia a system with fuzzy logic implies a system
with intelligence.
Most of the Fuzzy Logic designs, however, are very
ad-hoc and few CAD tools have been developed for this
purpose.
We are focusing on optimization techniques for each stage
of a typical process of a fuzzy logic implementation.
Currently we are working on the algorithms design for fuzzy logic
transformation such as the resulting fuzzy logic has the minimum number of
basic rules to be computed and has the most parallelism.
8, High-Level Synthesis and Code Generation for
Multimedia Applications.
We focus on multimedia applications such as data
compression (JPEG, MPEG, etc.), and 3-D graphics
applications.
The goal is twofold: First, design and study cost effective
hardware (embedding systems)
and new instructions for these real-time applications.
Second, investigating best compiling techniques taking
advantage of the modern
instruction-level parallelism and pipelining CPU.
9, Loop Scheduling for Multi-Dimensional Conditional Applications.
Multi-dimensional systems containing nested loops are widely used to model
scientific applications such as image processing, geophysical signal processing
and fluid dynamics. However, branches within these loops may degrade the
performance of pipelined architectures. This project developes theory,
supporting hardware and experiments of a novel technique, based on
multi-dimensional retiming, for reducing pipeline hazards caused by branches
within nested loops. This technique, called {\it Multi-Dimensional Branch
Anticipation Scheduling}, is able to achieve optimal schedule
length for nested loops containing branch instructions in polynomial time.
We further analyze the intricacies of
branch anticipation control logic and show that the incurred hardware
complexity is affordable.