## Algorithm Analysis and Mapping Environment for Adaptive Computing Systems: Further Results

Eric K. Pauer, Paul D. Fiore, John M. Smith

Sanders, a Lockheed Martin Company, P.O. Box 868, Nashua, NH 03061

pauer@sanders.com, pfiore@sanders.com, jmsmith@sanders.com

## Extended Abstract

We are developing an integrated algorithm analysis and mapping environment particularly tailored for signal processing applications on Adaptive Computing Systems (ACS). Our environment allows a designer to map signal processing algorithms to an ACS faster, by an order of magnitude, than is currently possible.

Our approach has been to focus on three areas of capability critical to the success of adaptive computing and to integrate these capabilities into an open, extensible software framework [1]. The development of these three areas (algorithm analysis, algorithm mapping, and smart generators) takes advantage of the special characteristics of signal processing algorithms to reduce the time needed to field an ACS. Figure 1 shows a conceptual view of our environment. These capabilities are being implemented as extensions to the Ptolemy design environment developed at the University of California, Berkeley [2].

Algorithm implementation for ACS requires careful consideration of the appropriate signal representation and the costs of operations. The algorithm analysis capabilities being developed on this program will reduce the effort required to find good ACS implementation choices for a signal processing algorithm. The environment will provide algorithm designers with information about operation counts, including adds, multiplies, and memory accesses, and with analyses of quantization effects related to ACS implementations.
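As an illustration of the kind of operation-count information described above, the following minimal sketch tallies adds and multiplies for one output sample of a direct-form FIR filter. The `Counted` wrapper and the helper names are hypothetical and are not part of the environment; memory accesses are not modeled here.

```python
class Counted:
    """Numeric wrapper that tallies the additions and multiplications applied to it."""

    counts = {"add": 0, "mul": 0}

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        Counted.counts["add"] += 1
        return Counted(self.value + other.value)

    def __mul__(self, other):
        Counted.counts["mul"] += 1
        return Counted(self.value * other.value)


def fir_output(taps, window):
    """One output sample of a direct-form FIR filter (sum of tap * sample)."""
    acc = taps[0] * window[0]
    for tap, sample in zip(taps[1:], window[1:]):
        acc = acc + tap * sample
    return acc


def count_fir_operations(num_taps):
    """Run one FIR output through the counting wrapper and return the tally."""
    Counted.counts = {"add": 0, "mul": 0}
    taps = [Counted(1.0) for _ in range(num_taps)]
    window = [Counted(0.5) for _ in range(num_taps)]
    fir_output(taps, window)
    return dict(Counted.counts)
```

For an N-tap filter this reports N multiplies and N - 1 adds per output sample, which is the sort of figure a designer would weigh against the resources of the target device.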

Figure 1: Algorithm Analysis and Mapping Environment.

For many DSP problems, reduced precision arithmetic will maintain acceptable system performance. A mapping of an algorithm to an FPGA architecture will be successful if the designer can limit wordlength growth without sacrificing algorithm performance. Wordlength reduction introduces noise into the data stream, so the designer must balance the need for an efficient implementation with output quality. Our environment supports both analytical and simulation-based wordlength optimization. With these capabilities an algorithm designer will be able to quickly determine the appropriate number of bits for signal representations at all points in the design and to quantify the performance of various implementation choices.
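The trade-off between wordlength and output quality can be sketched as follows. This is only an illustrative example, not the environment's method: it assumes signals scaled to the range [-1, 1), rounding to a fixed number of fractional bits, and the standard uniform rounding-noise model (variance equal to the step size squared over 12) as the analytical estimate. All function names are hypothetical.

```python
import math
import random


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def predicted_noise_power(frac_bits):
    """Analytical estimate: uniform rounding noise has variance step^2 / 12."""
    step = 2.0 ** -frac_bits
    return step * step / 12.0


def measured_noise_power(samples, frac_bits):
    """Simulation-based estimate: mean squared quantization error."""
    errors = [quantize(s, frac_bits) - s for s in samples]
    return sum(e * e for e in errors) / len(errors)


def smallest_frac_bits(samples, target_sqnr_db, max_bits=24):
    """Smallest number of fractional bits whose simulated SQNR meets the target."""
    signal_power = sum(s * s for s in samples) / len(samples)
    for bits in range(1, max_bits + 1):
        noise = measured_noise_power(samples, bits)
        if noise == 0.0:
            return bits
        if 10.0 * math.log10(signal_power / noise) >= target_sqnr_db:
            return bits
    return None


def make_test_signal(count, seed=0):
    """Hypothetical test input: uniform random samples in [-1, 1)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]
```

Comparing `measured_noise_power` with `predicted_noise_power` at a given wordlength shows how closely the analytical model tracks simulation for a particular signal, while `smallest_frac_bits` performs a simple simulation-driven search.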

Signal processing algorithm mapping for ACS involves the assignment of functions to different processing elements. On this program we are developing mapping techniques for ACS tailored to signal processing. These capabilities include performance analysis, partitioning assistance, and automatic scheduling and partitioning. The scheduling and partitioning functions will recognize the coarse-grain nature of signal processing dataflow graphs to optimize partitioning for ACS.

Figure 2: Binary FSK Receiver using Winograd FFT.

Current methods for logic generation for ACS are built around either libraries of functions or general-purpose logic synthesis. As part of this effort, we are implementing "smart generators" [3] that are extensions of the concept of a parameterized library. These generators will be tailored to signal processing functions and will include rules that capture specific implementation techniques and trade-offs. For example, a smart generator for a complex multiplier is able to trade between a three-multiplier implementation and a four-multiplier implementation according to area and latency constraints. Additionally, we are providing mechanisms to automatically generate both hardware and software interfaces for the resultant ACS. Our target system is a Xilinx XC4062XL-based board [4] from Annapolis Micro Systems [5].
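The complex multiplier trade-off mentioned above rests on a well-known identity: the product (a + jb)(c + jd) can be formed with four real multiplies and two adds, or with three real multiplies and five adds. The sketch below shows both forms. The selection function is a hypothetical stand-in for a generator rule and uses a deliberately simplified constraint (a multiplier budget) rather than the area and latency models of an actual smart generator.

```python
def cmul_four(a, b, c, d):
    """(a + jb)(c + jd) using 4 real multiplies and 2 real adds."""
    return a * c - b * d, a * d + b * c


def cmul_three(a, b, c, d):
    """Same product using 3 real multiplies and 5 real adds."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


def select_complex_multiplier(max_multipliers):
    """Hypothetical rule: use the shorter adder chain when 4 multipliers fit."""
    if max_multipliers >= 4:
        return cmul_four
    if max_multipliers == 3:
        return cmul_three
    raise ValueError("a complex multiply needs at least 3 real multipliers")
```

Both forms return the same (real, imaginary) pair for exact arithmetic; in fixed point, the extra pre-additions of the three-multiplier form widen the multiplier inputs by one bit, which is one reason the choice depends on the constraints.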

All of the algorithm analysis, mapping, and logic generation capabilities are being developed as extensions to Ptolemy. Ptolemy provides a well-documented, object-oriented, open software architecture with implementations in C++ and Java. Our extensions to Ptolemy are being captured in a new ACS domain that separates the interface specification from the implementation for each signal processing functional block. The algorithms of interest to this project are represented by dataflow graphs composed of these functional blocks, following a synchronous dataflow model of computation. We are using a Corona/Core architecture, where each block has a common interface known as the Corona, and one or more implementations, known as the Cores. A retargeting mechanism allows users to change Cores, and hence the implementation, which moves the dataflow graph between various simulation models (floating point, fixed point) and implementations (C code, VHDL code).
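The separation of interface from implementation can be pictured with a small sketch. It is written in Python for brevity and does not reflect the actual class structure of the ACS domain; the class name, method names, and the two example Cores are all assumptions made for illustration.

```python
class GainCorona:
    """Common interface for a gain block: one input, one output, one parameter."""

    def __init__(self, gain):
        self.gain = gain
        self._cores = {}
        self._active = None

    def register_core(self, target, core):
        """Attach an implementation (a Core) under a target name."""
        self._cores[target] = core

    def retarget(self, target):
        """Switch the block to a different Core without changing its interface."""
        if target not in self._cores:
            raise KeyError("no Core registered for target: " + target)
        self._active = target

    def fire(self, sample):
        """Process one input sample with the currently selected Core."""
        return self._cores[self._active](self.gain, sample)


def floating_point_core(gain, sample):
    """Simulation Core using full floating-point precision."""
    return gain * sample


def make_fixed_point_core(frac_bits):
    """Simulation Core that rounds the result to `frac_bits` fractional bits."""
    step = 2.0 ** -frac_bits

    def core(gain, sample):
        return round(gain * sample / step) * step

    return core
```

A graph built from such blocks could be moved from a floating-point to a fixed-point simulation by calling `retarget` on each block, leaving the connections between blocks untouched.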

Figure 3: Linear FM Detector.

Recently, these ACS tools have been used to automatically implement a Winograd DFT as part of a channelized FSK receiver in ACS (Figure 2). The Winograd algorithmic structure has the minimum number of multiplications for any DFT approach [6], and is thus ideal for FPGA implementation. The tools have also been used to develop an FPGA implementation of a high-speed linear FM detector (Figure 3). In both cases, our ACS tools were used to simulate the algorithm, select appropriate fixed-point representations, and generate the VHDL implementations. The final FPGA designs were obtained by synthesizing the VHDL and performing place and route with commercial tools. The next release of the ACS domain (May 1999) will be part of Ptolemy 0.7.2 and will include these ACS capabilities and demonstrations.
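To give a sense of why the Winograd structure is economical in multiplications, the sketch below implements the standard 3-point Winograd DFT module and a direct DFT for comparison. The length 3 is chosen purely for illustration; the transform length used in the receiver is not stated here, and the function names are hypothetical.

```python
import cmath
import math


def winograd_dft3(x0, x1, x2):
    """3-point DFT with two non-trivial constant multiplications."""
    u = 2.0 * math.pi / 3.0
    c1 = math.cos(u) - 1.0  # equals -1.5
    c2 = math.sin(u)        # equals sqrt(3) / 2

    t1 = x1 + x2
    m0 = x0 + t1                  # X[0], no multiplication needed
    m1 = c1 * t1                  # first constant multiplication
    m2 = 1j * c2 * (x2 - x1)      # second one; the factor j is only a swap
    s1 = m0 + m1
    return m0, s1 + m2, s1 - m2


def direct_dft(x):
    """Reference DFT computed straight from the definition."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
```

The direct form multiplies every input by a complex twiddle factor for every output, whereas the Winograd module needs only two multiplications by real constants, with the rest of the work done by additions.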

Acknowledgements. Portions of this work were supported by Sanders, a Lockheed Martin Company, through internal research and development funding, and by the Defense Advanced Research Projects Agency (DARPA) and the United States Air Force Research Laboratory (AFRL) under Contract No. F33615-97-C-1174.

## References

- [1] E. Pauer, C. Myers, P. D. Fiore, C. M. Crawford, E. A. Lee, J. A. Lundblad, and C. X. Hylands. Algorithm analysis and mapping environment for adaptive computing system. In Proc. Second Annual Workshop on High Performance Embedded Computing (HPEC). MIT Lincoln Laboratory, September 1998.
- [2] Ptolemy home page, University of California, Berkeley. http://ptolemy.eecs.berkeley.edu, 1999.
- [3] P. D. Fiore, C. Myers, J. M. Smith, and E. Pauer. Rapid implementation of mathematical and DSP algorithms in configurable computing devices. In Proc. Configurable Computing: Technology and Applications, part of SPIE Intl. Symposium on Voice, Video and Data Comm., November 1998.
- [4] Xilinx Corporation. The programmable logic data book. http://www.xilinx.com, 1999.
- [5] Annapolis Micro Systems. Wildfire: A family of reconfigurable computing engines. http://www.annapmicro.com, 1999.
- [6] P. D. Fiore. Low complexity implementation of a polyphase filter bank. Digital Signal Processing, A Review Journal, 8(2):126-135, April 1998.