Faculty

NCTU CSIE

Computer Science and Information Engineering

CHUNG-PING CHUNG, PROFESSOR
鍾崇

EC524 54728

Education:

June 1981 - August 1986

Ph. D. in electrical engineering, major field: digital, Texas A&M University, College Station, Texas, U. S. A.

January 1980 - May 1981

M. E. in electrical engineering, major field: digital, Texas A&M University, College Station, Texas, U. S. A.

October 1972 - June 1976

B. E. in electrical engineering, National Cheng-Kung University, Taiwan, R. O. C.

 

Professional Background:

August 1978 - August 1979

Programmer, Tatung Company, Taipei, Taiwan, R. O. C.

September 1979 - December 1979

Programming Consultant, Computer Center, University of Detroit, Detroit, Michigan, U. S. A.

September 1980 - December 1985

Teaching Assistant, Department of Electrical Engineering, Texas A&M University, College Station, Texas, U. S. A.

January 1986 - August 1986

Lecturer, Department of Electrical Engineering, Texas A&M University, College Station, Texas, U. S. A.

August 1986 to July 1993

Associate Professor, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C.

September 1991 to July 1992

Visiting Associate Professor, Department of Computer Science, Michigan State University, East Lansing, Michigan, U. S. A.

August 1993 to date

Professor, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C.

 

Research Interests:

Computer architecture, parallel processing, VLSI system design, system simulation.

 

Research Plans

1. Computer architecture design:

Contemporary processor design trends will be studied, and feasible new computer architectures will be proposed. Research topics include: uniprocessor architectural design, multiprocessor architectural design, distributed systems architectural design, and massively parallel processor architectural design.

 

2. Study of parallel processing:

Based on the characteristics of different granularities in exploiting parallelism, the principles and methodologies of the various parallel processing techniques will be investigated. Such techniques include multiprocessing, multithreaded processing, concurrent instruction execution, and pipelined instruction execution.

 

3. Study of compiler techniques:

This topic concerns parallelizing compiler design, which poses some of the most difficult and challenging issues in high-performance computer system design. Issues to be tackled include instruction scheduling, register allocation, procedure partitioning, and synchronization. Both modeling and simulation will be used, in different scenarios.

 

4. Parallel memory design:

As the design philosophy of computers evolves rapidly, memory design always falls considerably short. Research interests here include the design of multiple, multilevel, or multibank cache memories, memory allocation, data distribution and consistency, and memory interleaving in the various memory hierarchies.

 

5. System integration:

This topic concerns system design that brings together the research efforts on the many system components. As a result, some basic characteristics of operating systems also need to be studied. Simulation and cost/performance analysis and evaluation will be stressed.

 

Publication List

A. Refereed Papers

    1. Chung-Ping Chung, Shyi-Chyi Cheng, Hong-Chich Chou, and Cheng Chen, “Design of The Dual-ALU CRISC and Its Concurrent Execution,” Journal of Information Science and Engineering, Vol. 5, No. 3, pp. 251-274, July 1989.
    2. Hong-Chich Chou, Chung-Ping Chung and Shyi-Chyi Cheng, “Dual-ALU CRISC Architecture and Its Compiling Technique,” Computers and Electrical Engineering, Vol. 17, No. 4, pp. 297-312, 1991.
    3. Cheng Chen, Chung-Ping Chung, Cheng-Chin Chiang, Hsin-Chia Fu, and S. J. Wang, “An Or-Parallel Inference Model Based on Multi RISC-Style Processing System,” Journal of Information Science and Engineering, Vol. 7, No. 4, pp. 487-512, Dec. 1991.
    4. Hong-Chich Chou and Chung-Ping Chung, “A Bound Analysis of Scheduling Instructions on Pipelined Processors with a Maximal Delay of One Cycle,” Parallel Computing, Vol. 18, No. 4, pp. 393-399, April 1992.
    5. Yuh-Horng Shiau and Chung-Ping Chung, “Adoptability and Effectiveness of Microcode Compaction Algorithms in Superscalar Processing,” Parallel Computing, Vol. 18, No. 5, pp. 497-510, May 1992.
    6. Yuh-Horng Shiau and Chung-Ping Chung, “The Statistical Model of CRAY X-MP Vector Accesses,” Journal of the Chinese Institute of Engineers, Vol. 15, No. 5, pp. 611-616, September 1992.
    7. Chung-Ping Chung and Wen-Yang Lin, “Vectorization of Sorting Algorithms,” International Journal of High Speed Computing, Vol. 4, No. 3, pp. 213-232, September 1992.
    8. Hong-Chich Chou and Chung-Ping Chung, “Upper Bound Analysis of Scheduling Arbitrary-Delay Instructions on Typed Pipelined Processors,” International Journal of High Speed Computing, Vol. 4, No. 4, pp. 301-312, Dec. 1992.
    9. Ren-Liang Cheng and Chung-Ping Chung, “Reaching Approximate Agreement on Hypercube,” Parallel Computing, Vol. 19, No. 7, pp. 765-775, July 1993.
    10. Ruey-Liang Ma, Chung-Ping Chung and Cheng Chen, “A Register Window Scheduling Method for Prolog,” Journal of the Chinese Institute of Engineers, Vol. 16, No. 6, pp. 793-806, November 1993.
    11. Hong-Chich Chou and Chung-Ping Chung, “Modeling of Superscalar Instruction Scheduling and Analysis of a Heuristic Scheduling Algorithm,” BIT, Vol. 33, pp. 354-371, 1993.
    12. Yuh-Horng Shiau and Chung-Ping Chung, “Effects and Handling of Instruction Class Contention in Superscalar Processing,” International Journal of High Speed Computing, Vol. 6, No. 3, pp. 357-373, Sep. 1994.
    13. Yuh-Horng Shiau and Chung-Ping Chung, “Effects of Hardware Enhancements of Superscalar Performance,” Journal of Information Science and Technology, Vol. 2, No. 3, 1993.
    14. Yuh-Horng Shiau and Chung-Ping Chung, “Benchmarking and Analysis of Superscalar Architecture,” Journal of the Chinese Institute of Engineers, Vol. 17, No. 2, pp. 169-177, March 1994.
    15. Hong-Chich Chou and Chung-Ping Chung, “Optimal Multiprocessor Task Scheduling Using Dominance and Equivalence Relations,” Computers and Operations Research, Vol. 21, No. 4, pp. 463-475, 1994.
    16. Hong-Chich Chou and Chung-Ping Chung, “An Optimal Instruction Scheduler for Superscalar Processor,” IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 3, pp. 303-313, March 1995.
    17. Ruey-Liang Ma and Chung-Ping Chung, “Periodic Adaptive Branch Prediction for Superscalar Processing in Prolog,” The Computer Journal, Vol. 38, No. 6, pp. 457-470, 1995.
    18. Ruey-Liang Ma and Chung-Ping Chung, “Branch Prediction for Enhancing Fine-grained Parallelism in Prolog,” Journal of Information Science and Technology, Vol. 4, No. 2, 1995.
    19. Hong-Chich Chou and Chung-Ping Chung, “On the Upper Bound of Scheduling Instructions on Pipelined Processors with Delay,” Journal of the Chinese Institute of Engineers, Vol. 18, No. 1, pp. 101-108, Jan. 1995.
    20. Neng-Pin Lu and Chung-Ping Chung, “Memory System Design in Superscalar Processing,” International Journal of High Speed Computing, Vol. 7, No. 3, pp. 421-443, Sep. 1995.
    21. Ren-Liang Cheng and Chung-Ping Chung, “An Approximate Agreement Algorithm for Wraparound Meshes,” International Journal of High Speed Computing, Vol. 7, No. 3, pp. 407-419, Sep. 1995.
    22. Neng-Pin Lu and Chung-Ping Chung, “A Fault Tolerant Multistage Combining Network,” Journal of Parallel and Distributed Computing, Vol. 34, pp. 14-28, 1996.
    23. Ren-Liang Cheng and Chung-Ping Chung, “Local Interactive Convergence on Hypercube,” International Journal of Computers & Applications, Vol. 19, No. 1, pp. 1-5, 1997.
    24. Tang-Show Hwang, Neng-Pin Lu and Chung-Ping Chung, “Delay Precise Invalidation -- A Software Cache Coherence Scheme,” IEE Proceedings: Computers and Digital Techniques, Vol. 143, No. 5, pp. 337-344, Sep. 1996.

 

B. Conference Papers

    1. C. P. Chung, T. T. Tsai, et al., “VLSI Design of Fast RISC-Style Prolog Machine,” Proceedings of 1987 International Symposium on VLSI Technology, System, and Applications, Taipei, Taiwan, R. O. C., May 13-15, 1987, pp. 369-372.
    2. C. P. Chung, H. C. Fu, et al., “Study of Artificial Intelligence Multiprocessor System,” Proceedings of the Seventh Workshop on Computer System Technology, Nantou, Taiwan, R. O. C., Aug 12-15, 1987, pp. 307-354.
    3. C. P. Chung, C. C. Chiang, et al., “A Further Performance Evaluation on LISCP, A Fast RISC-Style Prolog Machine,” Proceedings of National Computer Symposium 1987, Taipei, R. O. C., Dec. 17-18, 1987, pp. 11-20.
    4. C. P. Chung, T. T. Tsai, et al., “VLSI Design and Implementation of LISCP--A Fast RISC-Style Prolog Machine,” Proceedings of National Computer Symposium 1987, Taipei, R. O. C., Dec. 17-18, 1987, pp. 30-39.
    5. C. P. Chung and R. L. Cheng, “An Efficient Cache Consistency Protocol for the Shared-Bus Multiprocessor Systems,” Proceedings of National Computer Symposium 1987, Taipei, R. O. C., Dec. 17-18, 1987, pp. 95-101.
    6. C. P. Chung, C. C. Chung, et al., “A Study of Parallel Execution Model for Prolog on a RISC-Style Multiprocessor System,” Proceedings of National Computer Symposium 1987, Taipei, R. O. C., Dec. 17-18, 1987, pp. 102-111.
    7. C. P. Chung, H. C. Fu, et al., “A Multiprocessor System for Prolog Processing,” Proceedings of the Second IEEE Conference on Computer Workstations, Santa Clara, CA, U. S. A., Mar. 7-10, 1988, pp. 60-69.
    8. C. P. Chung, H. C. Fu, et al., “The Study of AI Multiprocessor System,” Proceedings of The Eighth Workshop on Computer System Technology, R. O. C., Aug. 7-9, 1988, pp. 101-124.
    9. C. P. Chung, S.C. Jeng and C. Chen, “Design of the Dual-ALU CRISC and Its Dual-Stream Instruction Execution,” Proceedings of Acer Student Thesis Awards, Taipei, R. O. C., Sept 1988, pp. 115-137.
    10. C. P. Chung, Z. C. Hwang, et al., “Memory Subsystem of the MCRISC,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 342-348.
    11. C. P. Chung, H. C. Chow, et al., “The Study and Realization of The CRISC Code Compaction Methodology,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 349-354.
    12. C. P. Chung, C. C. Chiang, et al., “Design and Implementation of a Feasible Run-Time Intelligent Backtracking Scheme for Prolog,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 659-664.
    13. C. P. Chung, T. C. Chang, et al., “Design of LISCP-II: An Improved RISC-Style Processor for Prolog,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 665-670.
    14. C. P. Chung, Y. M. Hsu, et al., “A New AND-Parallel Execution Model for Prolog--Forward Execution Method,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 800-805.
    15. C. P. Chung and S. J. Fu, “Degenerate Corner Stitching — A Data-Structuring Technique for Interactive VLSI Layout Tools,” Proceedings of International Computer Symposium 1988, Taipei, R. O. C., Dec. 15-17, 1988, pp. 1123-1128.
    16. C. P. Chung, Q. Z. Wu, and K. T. Sun, “ANDROR: A Parallel Execution Model for Logic Programs,” Proceedings of IASTED Expert Systems Theory and Applications, Zurich, Switzerland, June 26-28, 1989.
    17. C. P. Chung, “Study and Design of A High-Speed Computer Architecture,” Proceedings of MIST Workshop of 1989, Hsinchu, Taiwan, R. O. C., Oct 17 and 18, 1989, pp. C1-1 to C1-21.
    18. C. P. Chung and R. L. Ma, “Design and Considerations of A Prolog Compiler for LISCP-II,” Proceedings of Acer Student Thesis Awards, Taipei, R. O. C., Oct 1989, pp. 347-358.
    19. C. P. Chung and Y. H. Shiau, “Study of Cray X-MP Vector Accesses Using Different Storage Schemes,” Proceedings of National Computer Symposium 1989, Taipei, R. O. C., Dec 21 and 22, 1989, pp. 385-394.
    20. C. P. Chung and S. W. Tung, “Performance Evaluation of LISCP-II, a Prolog Machine,” Proceedings of National Computer Symposium 1989, Taipei, R. O. C., Dec 21 and 22, 1989, pp. 478-487.
    21. C. P. Chung and S. W. Tung, “Register File Design of LISCP-II, a Prolog Machine,” Proceedings of National Computer Symposium 1989, Taipei, R. O. C., Dec 21 and 22, 1989, pp. 564-573.
    22. C. P. Chung and R. L. Ma, “Dynamic Database Management of LISCP-II Prolog Compiler,” Proceedings of National Computer Symposium 1989, Taipei, R. O. C., Dec 21 and 22, 1989, pp. 584-593.
    23. Q. Z. Wu, K. T. Sun, J. Z. Lin and C. P. Chung, “ANDROR: A Parallel Execution Model for Logic Programs,” Proceedings of National Computer Symposium 1989, Taipei, R. O. C., Dec. 21-22, 1989, pp. 657-665.
    24. I. K. Chou, C. P. Chung and C. Chen, “A Loop Partitioning Method for Multitasking in a High Speed Multiprocessing Architecture,” Proceedings of MIST Workshop of 1990, Hsinchu, Taiwan, R. O. C., Oct. 17, 1990, pp. C5-1-C5-29.
    25. C. P. Chung and C. H. Tsai, “Study of Vectorization of FFT,” Proceedings of International Computer Symposium 1990, Hsinchu, Taiwan, R. O. C., Dec. 17-19, 1990.
    26. I. K. Chou, C. P. Chung and C. Chen, “A Dependence-Based Loop Partitioning Method for Multitasking in a Vector Computer,” Proceedings of International Computer Symposium 1990, Hsinchu, Taiwan, R. O. C., Dec. 17-19, 1990.
    27. C. Chen, C. P. Chung, C. C. Chiang, H. C. Fu, T. C. Chang, R. L. Ou and S. J. Wang, “Parallel Inference Model Based on Multiple RISC-Style Processing System,” Proceedings of First Workshop on Parallel Processing, Hsinchu, Taiwan, R. O. C., Dec. 20-21, 1990.
    28. C. P. Chung, Y. K. Chen and Y. H. Shiau, “A Hardware Approach to Parallel Instruction Decoding and Issuing,” Proceedings of National Computer Symposium 1991, Chungli, Taiwan, R. O. C., Dec. 1991, pp. 117-124.
    29. R. L. Ma, C. P. Chung and C. Chen, “A Register File Management Method for Prolog System,” Proceedings of 1992 International Computer Symposium, Taichung, Taiwan, R. O. C., Dec. 13-15, 1992, pp. 127-134.
    30. H. C. Chou and C. P. Chung, “Optimal Multiprocessor Task Scheduling Using Dominance and Equivalence Relations,” Proceedings of 1992 International Computer Symposium, Taichung, Taiwan, R. O. C., Dec. 13-15, 1992, pp. 707-714.
    31. Y. H. Shiau and C. P. Chung, “Effects of Class Conflicts in Superscalar Processing,” Proceedings of 1992 International Computer Symposium, Taichung, Taiwan, R. O. C., Dec. 13-15, 1992, pp. 1182-1188.
    32. C. Z. Lin, C. C. Tseng and C. P. Chung, “Analyzing Cache Performance on Multi-Stream Execution Processor,” Proceedings of IEEE TENCON'93, Beijing, China, Oct. 19-21, 1993.
    33. N. P. Lu and C. P. Chung, “Memory System Design in Superscalar Processing,” Proceedings of National Computer Symposium 1993, Chaiyi, Taiwan, R. O. C., Dec. 1993, pp. 95-105.
    34. C. P. Chung and Y. H. Shiau, “Constructing Register Live Ranges with Maximum Instruction Parallelism Retained,” Proceedings of National Computer Symposium 1993, Chaiyi, Taiwan, R. O. C., Dec. 1993, pp. 901-911.
    35. N. P. Lu, T. S. Hwang and C. P. Chung, “Design of Memory System Supporting Speculative Store,” Proceedings of 1994 Workshop on Computer System Applications, Nantou, R. O. C., April 22 and 23, 1994, pp. 33-37.
    36. R. L. Ma, D. L. Liu and C. P. Chung, “Reducing Branch Overhead and Enhancing Fine-Grained Parallelism in Prolog System,” Proceedings of 1994 Workshop on Computer System Applications, Nantou, R. O. C., April 22 and 23, 1994, pp. 38-42.
    37. Y. Y. Chiang, C. Wu and C. P. Chung, “Implementation of A Low Cost Distributed Debugger,” Proceedings of 1994 Workshop on Computer System Applications, Nantou, R. O. C., April 22 and 23, 1994, pp. 43-47.
    38. Y. Y. Chiang, C. Wu and C. P. Chung, “The Design of a Computer Assisted Instruction System for Computer Organization Learning,” Proceedings of 1994 International Conference on Engineering Education, Taipei, pp. 229-234.
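    39. C. P. Chung and R. L. Ma, “The Analysis of Instruction Level Parallelism in Prolog Superscalar Processor,” Proceedings of the First Tri-Service Military Academies Symposium on Basic Academic Research (第一屆三軍官校基礎學術研討會論文集), Fengshan, Kaohsiung, R. O. C., June 3, 1994, pp. 699-706.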
    40. T. S. Hwang, N. P. Lu and C. P. Chung, “A Software-Based Cache Coherence Scheme with Delay Invalidation,” Proceedings of 1994 Workshop on Advanced Information Systems, Hsinchu, Taiwan, R. O. C., May 25, 1994, pp. 47-67.
    41. Y. Y. Chiang, C. Wu and C. P. Chung, “A Distributed System Model for the Training Simulator of Marine Diesel Propulsion Systems,” Proceedings of The Third International Conference on Automation Technology, Taipei, Taiwan, R. O. C., July 1994, Vol. 6, pp. 69-73.
    42. R. L. Ma and C. P. Chung, “The Simulation and Analysis of Instruction Level Parallelism in Prolog Superscalar Processor,” Proceedings of International Computer Symposium 1994, Hsinchu, Taiwan, R. O. C., Dec. 12-15, 1994, pp. 55-60.
    43. N. P. Lu and C. P. Chung, “A Cache Coherence Protocol for Speculative Execution in Multiprocessors,” Proceedings of International Computer Symposium 1994, Hsinchu, Taiwan, R. O. C., Dec. 12-15, 1994, pp. 179-186.
    44. C. W. Chen and C. P. Chung, “Time Interval-Based Coloring Approach to Register Allocation,” Proceedings of International Computer Symposium 1994, Hsinchu, Taiwan, R. O. C., Dec. 12-15, 1994, pp. 315-321.
    45. T. S. Hwang and C. P. Chung, “Delay Precise Invalidation -- A Software Cache Coherence Scheme,” Proceedings of the 1994 International Conference on Parallel and Distributed Systems, Hsinchu, Taiwan, R. O. C., Dec. 19-21, 1994, pp. 524-529.
    46. R. L. Ma and C. P. Chung, “Branch Prediction for Enhancing Fine-Grained Parallelism and Speedup Prolog Execution,” Proceedings of the 1994 International Conference on Parallel and Distributed Systems, Hsinchu, Taiwan, R. O. C., Dec. 19-21, 1994, pp. 744-751.
    47. C. P. Chung and N. P. Lu, “A Speculative Memory Access Technique: Speculative Store,” 1995 R. O. C. Military Academy Symposium on Electrical Engineering and Information Science Fundamentals (八十四年度陸軍官校電機資訊基礎學術研討會), 1995, pp. 159-166.
    48. R. L. Ma and C. P. Chung, “Architectural Tradeoffs between SPS--a Superscalar Prolog System and PUMTS--a Parallel Unification Multi-Thread System,” Workshop on CPU Research and Development, 1995, pp. 3-10.
    49. Kelvin Lin, N. P. Lu, Y. C. Ma, Pei Ouyang and C. P. Chung, “A Cache Coherence Protocol for Clustered Multiprocessors,” Workshop on Distributed System Technologies and Applications, 1995, pp. 51-57.
    50. C. P. Chung, N. P. Lu, and T. S. Hwang, “Study of Memory System Design for Superscalar Multiprocessors,” Workshop on High Performance Multiprocessor Systems, 1995, pp. 3-7.
    51. C. C. Liu, R. M. Shiu and C. P. Chung, “Register Renaming for x86 Superscalar Design,” Proceedings of the 1996 International Conference on Parallel and Distributed Systems, Tokyo, Japan, June 3-6, 1996, pp. 336-343.
    52. Neng-Pin Lu and Chung-Ping Chung, “Speculative Store in Distributed-Memory Multiprocessors,” Proceedings of 1996 International Conference Computer Systems Technology for Industrial Applications, Apr. 1996, pp. 81-88.
    53. Neng-Pin Lu and Chung-Ping Chung, “Evaluating Cache-Coherent Write-Policies for Speculative Memory Access,” Proceedings of 1996 Workshop on Distributed System Technologies and Applications, May 1996, pp. 91-98.
    54. Hsiou-Ping Tsai, Yen-Yuan Chiang, Chung-Ping Chung, and Kong Dar Fan, “A Novel Medium Access Control Protocol and Its Implementation for Wireless PCNs,” Second Workshop on Real Time and Media Systems, July 1996, pp. 41-47.
    55. Kelvin Lin, Neng-Pin Lu, Yeong-Chang Maa, and Chung-Ping Chung, “Enhancing The SCI Cache Coherence Protocol for Multiprocessor Clusters,” Proceedings of International Conference on Computer Architecture, December 1996, pp. 185-192.
    56. W. Y. Shieh, Y. U. Chiang, and C. P. Chung, “A Hypercube-Style Video-on-Demand Server Architecture,” The 11th International Conference on Information Networking, January 1997.
    57. Lee-Ren Ton, Lung-Chung Chang, Min-Fu Kao, Han-Min Tseng, Shi-Sheng Shang, Ruey-Liang Ma, Dze-Chaung Wang and Chung-Ping Chung, “Instruction Folding in Java Processor,” Proceedings of the 1997 International Conference on Parallel and Distributed Systems, December 10-13, 1997, pp. 138-143.
    58. Shyh-An Chi, R-Ming Shiu, Jih-Chiang Chiu, Si-En Chang, and Chung-Ping Chung, “Instruction Cache Prefetching with Extended BTB,” Proceedings of the 1997 International Conference on Parallel and Distributed Systems, December 10-13, 1997, pp. 360-365.
    59. Neng-Pin Lu and Chung-Ping Chung, “Parallelism Exploitation in Superscalar Multiprocessing,” Proceedings of National Computer Symposium 1997, pp. C-82 – C-88.

 

C. Other Publications

    1. C. P. Chung, “A VLSI Cache RISC for the C Language,” Ph.D. dissertation, Texas A&M University, Texas, U. S. A., August 1986.
    2. “Study of AI Multiprocessor Systems,” National Science Council research project report, March 1987.
    3. “Design of a Multiprocessor System for the C Language,” ERSO, ITRI research project report, July 1987.
    4. “First Quarter of Research in System Simulation Techniques, on Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, August 1987.
    5. “Second Quarter of Research in System Simulation Techniques, on Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, November 1987.
    6. “Third Quarter of Research in System Simulation Techniques, on Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, February 1988.
    7. “Fourth Quarter of Research in System Simulation Techniques, on Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, May 1988.
    8. “Design of a Multiprocessor System for the C Language,” ERSO, ITRI research project report, July 1988.
    9. “Study of AI Multiprocessor Systems,” National Science Council research project report, August 1988.
    10. “Study and Design of High-Speed Computer System,” ERSO, ITRI research project report, March 1989.
    11. “Research on Parallelism and Parallel Processing in Logic Programming,” ATC, ERSO, ITRI research project report, March 1989.
    12. “High-Speed Computing in Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, May 1989.
    13. “Study and Design of High-Speed Computer Architecture,” ERSO, ITRI research project report, August 1989.
    14. “Study of AI Multiprocessor Systems,” National Science Council research project report, August 1989.
    15. “Study and Design of High-Speed Computer System,” ERSO, ITRI research project report, August 1989.
    16. “High-Speed Computing in Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, August 1989.
    17. “Research on Parallelism and Parallel Processing in Logic Programming,” ATC, ERSO, ITRI research project report, October 1989.
    18. “High-Speed Computing in Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, November 1989.
    19. “Study of High Performance Multiple Functional Unit Computer Architecture,” Sun Yet-Sien Science Institute research project report, February 1990.
    20. “High-Speed Computing in Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, February 1990.
    21. “Study and Design of High-Speed Computer Architecture,” ERSO, ITRI research project report, August 1990.
    22. “Study of AI Multiprocessor Systems,” National Science Council research project report, August 1990.
    23. “Study of High Performance Multiple Functional Unit Computer Architecture,” Sun Yet-Sien Science Institute research project report, August 1990.
    24. “Analysis of Vector Processing Characteristics and Study of Its Library Routine Coding,” Sun Yet-Sien Science Institute research project report, August 1990.
    25. “Study of a Fault-Tolerant, Real-Time Distributed System,” National Science Council research project report, August 1990.
    26. “High-Speed Computing in Continuous System Simulation,” Sun Yet-Sien Science Institute research project report, February 1991.
    27. “Research on a Superscalar and Superpipeline Based High-Performance Computer Architecture Design,” CCL, ITRI research project report, August 1991.
    28. “Study of High-Performance Multiple Functional Unit Computer Architecture,” Sun Yet-Sien Science Institute research project report, August 1991.
    29. “Study of Dataflow Computer Characteristics and Their Applications in Real System Designs,” Sun Yet-Sien Science Institute research project report, August 1991.
    30. “Study of AI Multiprocessor Systems,” National Science Council research project report, August 1991.
    31. “Study of Superscalar Processing Techniques,” National Science Council research project report, February 1992.
    32. “A Debugging System for Concurrent Distributed Processing,” National Science Council research project report, August 1992.
    33. “Study of Superscalar Processing Techniques,” National Science Council research project report, February 1993.
    34. “Study of Fine-Grained Parallel Processing of Logic Programs (I),” National Science Council research project report, August 1993.
    35. “Software Approach to Cache Coherence in Superscalar Multiprocessor (I),” ATC, CCL, ITRI research project semiannual report, January 1994.
    36. “The Study of Memory System Design in Superscalar Processing (I),” National Science Council research project report, February 1994.
    37. “Study of Superscalar Processing Techniques (II),” National Science Council research project report, February 1994.
    38. “Study of Fine-Grained Parallel Processing of Logic Programs (II),” National Science Council research project report, August 1994.
    39. “The Study of Memory System Design in Superscalar Processing (II),” National Science Council research report, March 1995.
    40. “Design of a Multiprocessor Architecture Simulation Environment,” ATC, CCL, ITRI research project report, Jun. 1995.
    41. “Study and Design of Memory System in Superscalar Multiprocessors,” National Science Council research report, Aug. 1995.
    42. “Study of Parallel Processing and Performance Evaluation Environment,” ATC, CCL, ITRI research report, Aug. 1996.
    43. “Study of Memory System Techniques for Clustered Multiprocessors,” National Science Council research report, Aug. 1996.
    44. “Study of Superscalar Processor and Superscalar-based Multiprocessor Code Scheduling Techniques,” National Science Council research report, Aug. 1996.

 

D. Supervised Student Theses

    1. Ren-Liang Cheng, “The Study of Consistency Problem for Bus-Connected Multiprocessor Systems,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1987.
    2. Shi-Jiar Baw, “The VLSI Design and Implementation of LISCP, Part I: The ALU, Tag Manipulator and Interface Circuits,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1987.
    3. Ter-Tsung Tsai, “The VLSI Design and Implementation of LISCP, Part II: The Register File Subsystem and Control Unit,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1987.
    4. Chi-Yu Fu, “A Parallel Execution Model of Concurrent C on Multiprocessors,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    5. Hong-Chich Chou, “The Study and Realization of the CRISC Code Compaction Methodology,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    6. Shyi-Chyi Jeng, “Design of The Dual-ALU CRISC and Its Dual-Stream Instruction Execution,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    7. Zeng-Chen Hwang, “Architectural Specifications of the CRISC and Cache Support for The MCRISC,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    8. Sheng-Jen Fu, “Degenerate Corner Stitching--A Data Structuring Technique for Interactive VLSI Layout Tools,” master thesis, Institute of Applied Mathematics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    9. Cheng-Chin Chiang, “Design of an Efficient OR-Parallel Execution Model with Intelligent Backtracking for Prolog,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    10. Shih-Jay Wang, “Design of MIEP: A Multiple Inference Engines for Prolog,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    11. Tungchi Chang, “Design of LISCP-II: An Improved RISC-Style Processor for Prolog,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    12. Yung-Ming Hsu, “Study and Design of a New AND-Parallel Execution Model for Prolog,” master thesis, Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1988.
    13. Shing-Wu Tung, “Design Considerations About a Prolog RISC Processor: LISCP-II,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    14. Hsi-Long Tsai, “The Study and Implementation of a Prolog Language Processor LISCP-II,” master thesis, Institute of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    15. Ruey-Liang Maa, “Design and Consideration of a Prolog Compiler for LISCP-II,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    16. Wen-Lung Lin, “CSMP--A Cluster-Structured Multiprocessor System for Continuous System Simulation,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1989.
    17. Chin-Wei Chen, “Integrated Considerations of Real-Time and Fault-Tolerant Requirements in Distributed Computing Systems,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1989.
    18. Po-Chi Chen, “Task Allocation on Distributed Computing Systems--A Simulated Annealing Approach,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1989.
    19. Quen-Zong Wu, “ANDROR: A New AND/OR-Parallel Execution Model For Prolog,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1989.
    20. Chun-Hung Wen, “Design of CDFA--A Controlled Data-Flow Architecture,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    21. Yuh-Horng Shiau, “Study of CRAY X-MP Vector Accesses Using Different Storage Schemes,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    22. Yeang-Ming Shih, “Dynamic Parallel Instruction Scheduling and Synchronization in Vector Supercomputing,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    23. Der-Cherng Lee, “A Design Methodology of Vector Unit Architecture and Dynamic Reconfigure Vector Register Design,” master thesis, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1989.
    24. Yeong-Sheng Chen, “An Instrumentation Tool for Parallel Processing--Design, Implementation and Applications,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    25. Chin-Yao Chiang, “A Distributed Hard Real-Time Task Scheduling Based on Criticalness or Alternative Algorithms,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    26. Yau-Shan Chen, “Clock Synchronization in a Hypercube Distributed System,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    27. Che-Hsien Tsai, “The Study of Vectorization of the FFT,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    28. Wen-Yang Lin, “The Study of Vectorization of Sorting Algorithms,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    29. Ruey-Lung Ou, “Realization of OR Parallelism on a Hypercube MIEP,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    30. Chi-Tien Yeh, “Cut and Some Other Considerations on a Hypercube MIEP System,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    31. Cheng-Zen Yang, “Task Assignment on CSMP for Continuous System Simulation--A Grain Aggregating Approach,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    32. Terry Chi, “Design of LISCP-II Prototype,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    33. Yung-Ming Tzeng, “A Data-Driven Based Architectural Approach to Parallel Execution of Sequential Programs,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1990.
    34. Shyh-Ming Wang, “Design and Implementation of LISCP-II Memory System and Its Interface to Host,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1990.
    35. I-Kuang Chou, “A Dependency-Based Loop Partitioning Method for Multitasking in a Vector Computer,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1990.
    36. Chiou-Luenn Lin, “Analysis of Availability and Reliability of AT&T No. 5ESS,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1991.
    37. Yuan-Kai Chen, “A Hardware Approach to Parallel Instruction Decoding and Issuing,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    38. Kuo-Kuang Teng, “Design and Performance Evaluation of a Cluster-Structured Multiprocessor System for Continuous System Simulation,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    39. Yen-Yuan Chiang, “Integration of AT Bus and Multibus in a Single-Board Computer,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    40. Kun-Chen Wu, “Design and Implementation of a Prolog Processor LISCP-II,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    41. Neng-Pin Lu, “A Fault-Tolerant Multistage Combining Network,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    42. Chi-Der Von, “A Study of Multi-Operation Instruction Set Architecture,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., July 1991.
    43. Hong-Chich Chou, “Study of Superscalar Instruction Scheduling,” Ph.D. dissertation, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., October 1992.
    44. Yuh-Horng Shiau, “Study of Superscalar Processing,” Ph.D. dissertation, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., September 1993.
    45. Ren-Liang Cheng, “The Approximate Agreement of Massively Parallel Systems,” Ph.D. dissertation, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., January 1994.
    46. Tang-Show Hwang, “Delayed Invalidation -- A Software-Based Cache Coherence Scheme,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1994.
    47. Ching-Wei Chen, “Time Interval-Based Coloring Approach to Register Allocation,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1994.
    48. Ding-Long Liu, “PUMTS: A Parallel Unification Multi-Threaded Superscalar Processor for Prolog,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1994.
    49. Chung-Cheng Feng, “A Study of Prolog AND Parallel Execution Model on Multi-Threaded Superscalar System,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1995.
    50. Chuan-Cheng Hsu, “Prolog OR Parallel Execution Model on Multi-Threaded Superscalar Processor,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1995.
    51. Chang-Chung Liu, “A Study of Register Renaming in x86 Superscalar Processor,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1995.
    52. Cheng-Shon Kuo, “Design of Instruction Queue for x86 Superscalar Processor with Branch Prediction,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1995.
    53. Ruey-Liang Ma, “A Study of Fine-Grained Parallel Processing for Logic Programs,” Ph.D. dissertation, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1995.
    54. Jih-Shiung Gau, “The Study of Branch Prediction Strategies Used by High Performance Processors,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1996.
    55. Shei-Sin Ju, “A Study of Parallelization in Multi-Processor System Simulation,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1996.
    56. Che-Sheng Cheng, “Performance Evaluation of Various Types of Decoded Instruction Caches for x86 Superscalar Processor,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1996.
    57. Shyh-An Chi, “Instruction Cache Prefetching Based on An Extended BTB,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1997.
    58. Jieh-Nan Yang, “Instruction Fetcher Design for an X86 Superscalar Processor,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1997.
    59. Han-Min Tseng, “ILP analysis on the Java machine,” master thesis, Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C., June 1997.

 

Research and Project Abstracts

Project period: June 1996 – May 1999 (ROC 85.6 – 88.5)

Project title: Design and Implementation of an Advanced Microprocessor, Sub-project 8: Instruction Decode/Schedule Unit and Data Load/Store Unit (前瞻性微處理機設計與製造之子計畫八:指令解讀安排單元及資料存取單元)

Sponsor: National Science Council (國科會)

 

  本計畫的目標,在設計合乎x86指令集特色的指令擷取解讀單元及資料存取單元。指令擷取解讀單元的研究項目包括指令的擷取、解碼及分配;而資料存取單元的研究項目則包括各種定址模式的計算、保護及資料存取的平行化。指令擷取解碼及資料存取的速度,向來為微處理機效率的瓶頸。x86指令集的格式、語意及定址模式均異常複雜,更加重了這些瓶頸對效能的影響。現今之高效能x86指令集處理機,皆用種種特殊的設計來解決這些瓶頸。我們將評估這些設計對效能的影響,並提出新的設計以提供更高的解碼及資料存取頻寬,以配合多重執行單元的運算能力。本計畫為三年之研究發展計畫中的第二年。在第一年的研究中,我們提出了各單元的運作模型設計,包括指令預先解碼器、指令擷取器、指令解碼器、指令分配器、資料存取單元及資料位址產生單元。並以軟體模擬評估重要實作方案的可行性、複雜度、硬體成本及效能,來確定重要的設計決策。並訂定了各單元RTL層次的介面及設計。本年度的研究重點,在延續第一年的研究成果,將設計落實至邏輯閘層次;於總計畫的整合下驗證各單元RTL層次的設計;並更精確的評估實作上的時脈限制、成本限制,回饋改進原來的設計。第三年的研究方向,在以前兩年的研究為基礎,參與實作驗證的整合,以實作的限制及效果,從事系統參數調整甚至結構修正,並參與技術轉移。

 

The objective of this subproject is to design an instruction fetch/decode unit and a data load/store unit that are compatible with the x86 instruction set. The functions of the instruction fetch/decode unit include instruction fetching, decoding, and dispatching. The functions of the data load/store unit include data address computation and protection, and parallel data access. Instruction fetch/decode rates and data access rates have long been performance bottlenecks in ILP microprocessors. The complex formats, semantics, and addressing modes of the x86 instruction set aggravate these bottlenecks. We will design an instruction fetch/decode unit and a data load/store unit with enough bandwidth to match the execution rate of multiple execution units. This is the second year of a three-year project. In the first year, we studied and designed the instruction fetch/decode unit and the data load/store unit. The blocks we designed include the instruction predecoder, the instruction fetcher, the instruction decoder, the instruction dispatcher, the data access unit, and the data address generator. A preliminary software simulation was built to evaluate the effect of each design alternative. Furthermore, we defined the interfaces with other units and refined our designs to the RTL level. This year, we will refine our designs to the logic gate level, taking implementation restrictions into account. In the third year, we will carry the designs forward into chip implementation and system integration.
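The role of the instruction predecoder and fetcher named above can be illustrated with a small sketch. It is not the project's design: it assumes a toy variable-length encoding (hypothetical, not x86) in which the low two bits of an instruction's first byte give its length minus one. The predecoder marks instruction start bytes once, so the fetcher can hand several instructions to the decoder without re-scanning the byte stream.

```python
def toy_length(first_byte: int) -> int:
    """Length in bytes of a toy instruction (hypothetical encoding, not x86)."""
    return (first_byte & 0x03) + 1


def predecode(code: bytes) -> list:
    """Return one flag per byte: True where an instruction starts."""
    starts = [False] * len(code)
    pc = 0
    while pc < len(code):
        starts[pc] = True
        pc += toy_length(code[pc])
    return starts


def fetch_group(code: bytes, starts: list, pc: int, width: int):
    """Fetch up to `width` instructions beginning at `pc` using the start flags.

    Returns the list of instruction byte strings and the next fetch address.
    """
    group = []
    while pc < len(code) and len(group) < width:
        end = pc + 1
        # Advance to the next marked boundary; no length decoding is needed here.
        while end < len(code) and not starts[end]:
            end += 1
        group.append(code[pc:end])
        pc = end
    return group, pc


code = bytes([0x00, 0x01, 0xAA, 0x02, 0xBB, 0xCC, 0x00])
starts = predecode(code)
group, next_pc = fetch_group(code, starts, pc=0, width=3)
```

In this sketch the boundary search in `fetch_group` only reads precomputed flags, which is the property that lets a hardware fetcher locate several instructions in parallel.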

 

Project period: August 1996 – July 1999 (ROC 85.8 – 88.7)

Project title: Study of Single Chip Multiprocessor Design (單晶片多處理機設計之研究)

Sponsor: National Science Council (國科會)

 

  隨著超大型積體電路技術之進步,單晶片中可包含之硬體資源大幅增加。過去的研究者在單指令引線中發掘指令間之平行性以善用硬體資源獲取效益,而發展出如超純量及超長指令計算機之設計。然而隨著積體電路技術之進步,單引線中有限之指令間平行性已不足以完全發揮半導體技術所提供之大量硬體資源,我們必須思考新的設計方向,以善用硬體資源提昇系統效能。
  本計畫嘗試著在半導體技術足以支援之前提下,研究在單晶片上如何建構多個處理單元,並同時執行多個引線之程式。藉著將多個處理單元更緊密的結合在單一晶片上,我們除可在過去之基礎上繼續發掘單引線內指令之平行性外,更可開發單晶片上引線間之平行性;同時我們亦可以此多處理機晶片為基礎延伸而成大量平行性架構。我們將分別以處理單元間之連接、控制機構之設計,處理單元間之工作分配、排程與程式碼之產生,記憶體系統之設計,記憶體效能評估等議題研究多處理機晶片架構設計及以此為核心之相關軟硬體支援,評估其系統整體效能並以FPGA方式實作驗證。
  我們規畫此三年計畫以達成上述目標。在第一年中,我們預定完成排程問題理論模型、晶片上快取記憶體標籤陣列與目錄之設計方案、以及以FPGA展示板實現晶片上連結網路。自下年度起,我們將這個計畫規劃為以下三個子計畫,分別進行其重點研究,並整合之以達成整體研究目標:

一、標竿程式分析、測試與效能評估

二、處理機晶片內記憶體結構之設計

三、單晶片多處理機可程式實驗平台之設計及實現

  各個子計畫之研究重點如下:子計畫一規畫多處理支援機構,發展工作排程之技術,建立完整單晶片多處理機之編譯器環境。子計畫二將進行晶片內快取記憶體設計:規畫晶片內快取記憶體結構,擴展可延伸式記憶體階層設計觀念,實現多處理支援機構。子計畫三則將以FPGA實現連結網路,建立實驗平台雛形,並撰寫軟硬體介面程式,實現PC至網路實驗平台之連結。我們期望藉此整合型計畫能對單晶片多處理機之各項相關軟、硬體架構設計有一完整深入之探討,希望我們整合的研究成果能成為新一代計算機系統之設計觀念,為學術界與工業界所採用。

 

Because of progress in VLSI technology, more and more hardware resources can be built on a single silicon chip. Researchers have developed architectures such as superscalar and VLIW to exploit instruction-level parallelism within a single instruction stream and so benefit from this progress. However, the limited parallelism in a single instruction stream cannot keep up with the growth in available resources as VLSI technology advances. Computer architects are challenged to exploit much more parallelism to benefit from the ever-increasing hardware resources on a single chip.

We will study how to embed more than one processing element on a single chip and execute more than one instruction stream concurrently, provided that circuit density allows it. With several processing elements tightly coupled on a single chip, communication latency can be reduced and mechanisms can be designed to exploit thread-level parallelism as well as instruction-level parallelism. Moreover, we will design mechanisms that make the multiprocessor chip easy to use as a building block for massively parallel machines. The issues in multiprocessor chip design and its hardware and software support include: the interconnection and control of processing elements, task assignment and scheduling on processing elements, object code generation, memory system design, and performance evaluation. Finally, we will evaluate the overall system performance and implement the design using FPGAs.

To achieve this goal, we proposed a three-year project to investigate the above issues. This is the second year of the research. The first-year results include a theoretical model for task assignment, a design scheme for the on-chip cache directory and tag array, and an FPGA demo board that implements the on-chip interconnection network. Starting in the second year, we organize this joint project into three subprojects, each with its own research themes, and integrate them so that the overall system design and performance picture can be formed. The subprojects are:

1. Benchmarking and Performance Evaluation.

2. Design of Scalable Memory Architecture on CPU Chip.

3. Design and Implementation of a Programmable Platform for a Single Chip with Multiple CPUs.

The research topics of the three subprojects are outlined as follows. Subproject 1 will design multiprocessing support mechanisms, develop task scheduling techniques, and construct the compiler environment for the on-chip multiprocessor. Subproject 2 will design the on-chip caches, extend the design concept of a scalable memory hierarchy, and implement the multiprocessing support mechanisms. Subproject 3 will implement the on-chip interconnection network with FPGAs, construct the FPGA experiment platform, develop the interface programs that connect PCs to the interconnection network prototype, and carry out the interconnection between PCs via the prototype. With this integrated project, we hope to propose the related software and hardware architectures for the single-chip multiprocessor. Furthermore, we hope the proposed solutions will become design concepts for next-generation computer systems and be adopted by academia and industry.

 

Project period: August 1996 – July 1999 (ROC 85.8 – 88.7)

Project title: Benchmarking and Performance Evaluation (標竿程式分析.測試與效能評估)

Sponsor: National Science Council (國科會)

 

  藉由將多個處理單元建構於單晶片上,我們可降低處理單元間之通訊延遲;並可建構通訊法則,以提昇處理單元使用率。此多處理機晶片可被更進一步的用以建構階層式多處理機系統,不同階層間具有不同之通訊網路,使得此多處理機系統具有更佳之延展性。為了發揮架構特性以提昇效能,我們將針對其階層特性,開發工作排程技術,並開發多處理支援機構,以進一步降低程式執行時間。

  我們並將建構編譯與模擬環境以評估我們所發展之技術。我們選擇Stanford SUIF作為編譯器之基礎,並做加強與改進,與我們自行發展之排程技術相結合,以編譯標竿程式。我們所選用的標竿程式將同時包含一般應用程式與多處理機系統之平行程式。最後,我們將進行對標竿程式之模擬以評估系統效能。

  本計畫為期三年。在第一年中,我們已建構一模擬環境,架設基本SUIF模組,並建立一靜態排程模型。在往後二年中,我們將持續進行靜態/動態排程之研究,並設計如同步、資料預先擷取等多處理支援機構。

 

By embedding more than one processing element in a single chip, communication latency can be reduced and mechanisms can be constructed to improve processor utilization. Moreover, the MP-chip can be used as the basis for constructing hierarchical multiprocessor systems with different communication latencies in different layers. To benefit from these features, we will develop task scheduling techniques that take the hierarchical structure into account and design on-chip multiprocessing support mechanisms to further reduce program execution time.

A simulation and compilation environment will be built to evaluate the techniques we develop. The Stanford SUIF system has been chosen as the basis of our compilation environment. We will enhance SUIF and integrate it with our task scheduling techniques to compile the benchmark programs. The set of benchmark programs contains general-purpose applications as well as parallel programs for multiprocessor systems. Finally, the benchmark programs will be simulated to evaluate system performance.

This is the second year of the research. In the first year, we constructed a simulation environment, ported SUIF, and developed a theoretical model for static task scheduling. In the second and third years, we will continue to develop static/dynamic scheduling techniques for hierarchical systems and design multiprocessing support mechanisms such as data prefetching and synchronization primitives.
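To make the idea of hierarchy-aware static scheduling concrete, the following is a minimal sketch, not the project's model. It assumes a two-level machine in which processors on the same chip exchange results with latency `intra` and processors on different chips with latency `inter`; the function names, the cost values, and the greedy earliest-finish rule are all illustrative assumptions.

```python
def comm_latency(p: int, q: int, procs_per_chip: int, intra: int, inter: int) -> int:
    """Latency to move a result from processor p to processor q."""
    if p == q:
        return 0
    same_chip = p // procs_per_chip == q // procs_per_chip
    return intra if same_chip else inter


def list_schedule(cost: dict, deps: dict, n_procs: int, procs_per_chip: int,
                  intra: int = 1, inter: int = 10) -> dict:
    """Greedy static scheduling on a two-level machine (assumed model).

    `cost` maps task -> execution time; its key order is assumed to be a
    topological order of the task graph. `deps` maps task -> predecessors.
    Returns task -> (processor, finish time).
    """
    proc_free = [0] * n_procs
    placed = {}
    for task in cost:
        best = None
        for p in range(n_procs):
            # A task may start once every predecessor's result has arrived at p.
            ready = max(
                (placed[d][1] + comm_latency(placed[d][0], p, procs_per_chip, intra, inter)
                 for d in deps.get(task, [])),
                default=0,
            )
            finish = max(ready, proc_free[p]) + cost[task]
            if best is None or finish < best[1]:
                best = (p, finish)
        placed[task] = best
        proc_free[best[0]] = best[1]
    return placed


cost = {"a": 2, "b": 3, "c": 2}
deps = {"c": ["a", "b"]}
schedule = list_schedule(cost, deps, n_procs=4, procs_per_chip=2)
```

With these example values the scheduler keeps task `c` on the same chip as its predecessors, because crossing the chip boundary would add the larger `inter` latency to its start time.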

 

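Project period: July 1997 – June 1998 (ROC 86.7 – 87.6)

Project title: Study of Multimedia Technologies in a Java Processor (Java處理器之多媒體技術研究)

Sponsor: Computer and Communications Research Laboratories, Industrial Technology Research Institute (工研院電通所)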

 

  本Java處理器之多媒體技術研究計畫將承續上年度改良式Java處理器的研究成果,擬以一年時間,針對Java處理器在日益普遍的多媒體應用環境中執行多媒體指令的效能進行改良之研究。主要研究之問題為Stack-based架構的Java處理器與SIMD的多媒體計算核心間各種平行執行與搭配的組合。

  在今年度的計畫中,本研究小組將考慮兩個將Java處理器核心與SIMD的多媒體計算核心融合在一個處理器中的不同研究方向。第一個方向是試著把多媒體暫存器與運算元堆疊合併,並提供高頻寬的堆疊/多媒體暫存器,但必須同時兼顧堆疊存取指標與多媒體資料非循序存取之特性,並分析共用堆疊/多媒體暫存器的優缺點;第二個方向就是採用分離式的設計,但必須另外再加上多媒體暫存器與資料快取記憶體,以及多媒體暫存器到運算元堆疊之間的傳輸界面,以分析Java處理器核心與多媒體計算核心之平行執行、同步、以及多重引線等組合運作模式的可行性。最後希望提出一個SIMD多媒體計算核心與多媒體結構化暫存器的架構在Java處理器中的良好設計,並取得工業的標準多媒體測試程式以驗證其執行效能。

 

This one-year project builds on the results of our previous research project on an enhanced Java processor and aims to further improve its execution performance in today's increasingly common multimedia application environments. The main research direction is to analyze and design ways of combining a stack-based Java processor core with an SIMD multimedia computation core so that the two can execute in parallel.

In this project, our research team will consider two schemes for combining a Java processor core and a multimedia computation core in one processor chip. The first merges the multimedia registers into the operand stack of the Java processor and tries to support both high-bandwidth multimedia data access and flexible stack pointer maintenance. The second adopts separate designs for the operand stack and the multimedia registers. This scheme requires extra interfaces from the multimedia registers to data memory and from the multimedia registers to the operand stack. We will evaluate the advantages and disadvantages of the shared and the separate designs, including the feasibility of parallel execution, synchronization, and multithreading. Finally, we will propose a well-designed Java processor architecture that incorporates the SIMD multimedia computation core and the corresponding structured multimedia registers, and we will simulate its performance with industry-standard multimedia benchmarks.
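The difference between the two schemes can be sketched with a toy model. Everything here is an illustrative assumption rather than the project's design: the packed operation (a saturating add on four unsigned 8-bit lanes in a 32-bit word), the class names, and the register count. The sketch only shows where explicit transfers between the operand stack and the multimedia registers appear.

```python
def padd_u8_sat(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, saturating at 255."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        s = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        out |= min(s, 0xFF) << shift
    return out


class SharedStackCore:
    """Scheme 1 (toy model): packed operands live on the operand stack itself."""

    def __init__(self):
        self.stack = []

    def push(self, value: int) -> None:
        self.stack.append(value & 0xFFFFFFFF)

    def padd(self) -> None:
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(padd_u8_sat(a, b))


class SeparateRegisterCore:
    """Scheme 2 (toy model): separate multimedia registers with explicit transfers."""

    def __init__(self, n_regs: int = 8):
        self.stack = []
        self.mm = [0] * n_regs
        self.transfers = 0  # counts stack <-> multimedia register moves

    def push(self, value: int) -> None:
        self.stack.append(value & 0xFFFFFFFF)

    def stack_to_mm(self, reg: int) -> None:
        self.mm[reg] = self.stack.pop()
        self.transfers += 1

    def mm_to_stack(self, reg: int) -> None:
        self.stack.append(self.mm[reg])
        self.transfers += 1

    def padd(self, rd: int, ra: int, rb: int) -> None:
        self.mm[rd] = padd_u8_sat(self.mm[ra], self.mm[rb])


shared = SharedStackCore()
shared.push(0x01FF10F0)
shared.push(0x01011020)
shared.padd()

separate = SeparateRegisterCore()
separate.push(0x01FF10F0)
separate.push(0x01011020)
separate.stack_to_mm(1)
separate.stack_to_mm(0)
separate.padd(2, 0, 1)
separate.mm_to_stack(2)
```

Both models compute the same packed result; the separate scheme needs three extra transfers in this example, which is the kind of interface cost the evaluation described above would weigh against its potential for parallel execution.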