*** This page is not up to date.***

The new projects include (1) Low-Power High-level Synthesis and Code Generation. (2) Architectural Synthesis dealing with Imprecise Specifications.

His research summary and contribution can be found in HTML version or postscript file .

Dr. Edwin Sha's Research Projects:

Dr. Sha's research interests are in parallel and distributed processing and computer architectures in general. The list presented here only give the subset he has been doing. He welcomes anyone who is interested in parallel and distributed processing or computer architectures to chat with him or to develope new project with him. He is also interested in developing projects in network programming and Java systems.

1. Optimal Code Generations for Parallel and Distributed Processing.

Loop nests are usually the most time and memory consuming parts, and are commonly found in scientific applications, such as image processing, computational fluid dynamics, geophysical data analysis, and computer vision. We model uniform loop nests by multi-dimensional (MD) data-flow graphs. Several transformation techniques have been developed, such as MD retiming, MD rotation and MD interleaving. We are able to transform any uniform loop with dimensions more than one to be fully parallel with little increase of memory requirement. We obtained the best-known results in terms of execution length, memory requirement and computation complexity.

2. Optimal Data Scheduling and Partitioning for Parallel and Distributed Systems.

If data are not stored or loaded carefully, memory access may become a dominant factor of performance. Since memory hierarchy is commonly used in practice, such as on-chip memory (local memory) and off-chip memory (remote memory), techniques need to be developed to maximally use on-chip memory for data access. We propose a new problem called data scheduling which decides when and where to store or load data. Partitioning and execution scheduling is then applied to optimally store and load data so that local memory misses is minimized. A technique called carrot-hole data scheduling is now under development. The objective of the project is to design optimal data scheduling algorithms for time and memory consuming scientific applications for parallel architectures.

3. Architecture Dependent Communication and Computation Scheduling.

The performance of parallel or distributed processing applications is highly dependent on communication delays in an architecture. In order to obtain an optimized task/processor assignment, scheduling algorithms considering the communication delays between processors and loop-carry dependencies needs to be investigated. We developed an efficient technique called cyclo-compaction scheduling which takes into account the data transmission delays and loop carried dependency associated with specific target architectures. This technique uses the retiming technique (loop pipelining), implicitly applied, and a task remapping to appropriate processors in order to compact the schedule length and improve the parallelism iteratively while handling the underlying imposed communication environment and resource constraints.

4. Real-Time Fault Tolerance and Real-Time Communication on Parallel Architectures.

An increasing number of applications are demanding real-time performance from their multiprocessor systems. For many of these applications, a failure may produce disastrous results. Several algorithms have been developed to selectively build as much fault tolerance as possible into an existing schedule while observing precedence and hard real-time constraints. For real-time communication, algorithms to optimally select paths and transfer time to satisfy real-time constraints are under development.

5. Hardware/Multi-Software Codesign.

Since the complexity and functionality of the computer systems being built is increasing at a dramatic rate, it is very difficult for custom systems to be designed, built, and tested within an acceptable time period even with the most advanced computer-aided design tools. However, many systems also have time critical parts which must be implemented in hardware. Based on this concept, a practical system would like to use as many as off-shelf processors with the minimum specialized hardware to satisfy given area and time constraints. We call the systems built with this design style the HMS (Hardware/Multi-Software) codesign. Automation tools have being built to support this design style.

6, Time-Efficient and Memory-Efficient Application-Specific Parallel Architectures for Solving Computation-Intensive Applications.

7, Optimization Techniques for Fuzzy Logic.

The applications of fuzzy logic have been widely applied into new industry products recently. For example, in Asia a system with fuzzy logic implies a system with intelligence. Most of the Fuzzy Logic designs, however, are very ad-hoc and few CAD tools have been developed for this purpose. We are focusing on optimization techniques for each stage of a typical process of a fuzzy logic implementation. Currently we are working on the algorithms design for fuzzy logic transformation such as the resulting fuzzy logic has the minimum number of basic rules to be computed and has the most parallelism.

8, High-Level Synthesis and Code Generation for Multimedia Applications.

We focus on multimedia applications such as data compression (JPEG, MPEG, etc.), and 3-D graphics applications. The goal is twofold: First, design and study cost effective hardware (embedding systems) and new instructions for these real-time applications. Second, investigating best compiling techniques taking advantage of the modern instruction-level parallelism and pipelining CPU.

9, Loop Scheduling for Multi-Dimensional Conditional Applications.

Multi-dimensional systems containing nested loops are widely used to model scientific applications such as image processing, geophysical signal processing and fluid dynamics. However, branches within these loops may degrade the performance of pipelined architectures. This project developes theory, supporting hardware and experiments of a novel technique, based on multi-dimensional retiming, for reducing pipeline hazards caused by branches within nested loops. This technique, called {\it Multi-Dimensional Branch Anticipation Scheduling}, is able to achieve optimal schedule length for nested loops containing branch instructions in polynomial time. We further analyze the intricacies of branch anticipation control logic and show that the incurred hardware complexity is affordable.