GENERAL PRINCIPLES

 

 

of improving the "impedance match" between compilers and

the target computer:

 

      Regularity

      Orthogonality

      Composability

      One versus All

      Provide Primitives, not Solutions

      Addressing Principle

      Environment Support Principle

      Deviations Principle

 

 

 

 

VIOLATIONS →

           → CASE ANALYSIS

 

 

ENORMOUS CASE ANALYSIS →

          → RESTRICTED OPTIMIZATION.

 

 

 

FOR MORE DETAILS

     Reference: Wulf, W.A., "Compilers and Computer

     Architecture," IEEE Computer, July 1981.

 

 

 

CISC VERSUS RISC

 

      CISCs provide solutions which are not general enough.

      RISCs provide primitives.

 

      CISCs recompute data.

      RISCs reuse data*.

 

      CISCs provide several ways, but not all ways.

      RISCs provide one way only.

 

 

 

THE RISC APPROACH

 

 

Basic issue:

      How to invest the available VLSI complexity?

 

 

Positive effects:

      Quantity of existing resources increases

      Qualitatively new resources can be incorporated

      Precision increases

 

 

Negative effects:

      Less power per gate → less speed per gate

      Larger chip size → wire delays increase

      Decoding delays increase → CPU speed decreases

 

 

 

INSTRUCTION SET DESIGN PHASES

 

 

 

#1 The most frequent instructions

     are identified.

 

#2 Data path and timing are optimized

     for the most frequent instructions, only.

 

#3 Other frequent operations are included

     only if they can fit into the elaborated scheme.
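
Phase #1 can be sketched as a count over a dynamic instruction trace. The snippet below is a minimal illustration, not the procedure used for any particular processor: the function name most_frequent, the 80% coverage threshold, and the sample trace are all hypothetical.

```python
from collections import Counter


def most_frequent(trace, coverage=0.8):
    """Return the opcodes that together account for `coverage` of the trace.

    `trace` is a list of executed opcode names (a dynamic trace), so an
    instruction inside a loop is counted once per execution.
    """
    counts = Counter(trace)
    total = sum(counts.values())
    selected, covered = [], 0
    for opcode, n in counts.most_common():
        if covered / total >= coverage:
            break  # the chosen opcodes already cover enough of the trace
        selected.append(opcode)
        covered += n
    return selected


# Made-up trace, for illustration only.
sample_trace = ["load"] * 5 + ["add"] * 3 + ["branch"] + ["mul"]
print(most_frequent(sample_trace))
```

The opcodes returned here are the ones whose data path and timing would be optimized in phase #2; the remaining ones are candidates for phase #3.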

 

 

CPU DESIGN PHASES

 

#1 Basically, the same strategy applies here.

 

#2 It is important to simplify the CODE OPTIMIZER,

     not the CODE GENERATOR.

 

 

 

The UCB-RISC

 

HARDWARE SUPPORT FOR CALL/RETURN

 

Call/Return is found to be the most time-consuming operation

in a typical HLL environment.

 

 

 

Figure 6.1. Statistical analysis of benchmark programs that were used as a basis for the UCB-RISC architecture. Symbol P refers to the benchmark written in the PASCAL language, and symbol C refers to the benchmark written in the C language. These results are derived from a dynamic analysis.

 

 

 

 

Support exists in the form of REGISTER WINDOWS!
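
The way overlapping register windows avoid saving and restoring registers on a call can be sketched as a mapping from (window, register number) to a physical register. The sizes below (10 globals, 6 incoming, 10 local, 6 outgoing, 8 windows), the register numbering, and the name physical_register are assumptions chosen for illustration; they are not taken from these slides.

```python
# Illustrative sizes (assumptions, not figures from the slides).
NUM_WINDOWS = 8
GLOBALS = 10   # r0..r9, shared by all procedures
INS = 6        # r10..r15, parameters received from the caller
LOCALS = 10    # r16..r25, private to the procedure
OUTS = 6       # r26..r31, parameters passed to the callee
STRIDE = INS + LOCALS               # new registers contributed by each window
WINDOWED = NUM_WINDOWS * STRIDE     # size of the circular windowed file


def physical_register(window, reg):
    """Map a register number seen by a procedure to a physical register.

    A call advances `window` by one; the outgoing registers of one window
    occupy the same physical cells as the incoming registers of the next.
    """
    if not 0 <= reg < GLOBALS + INS + LOCALS + OUTS:
        raise ValueError("register number out of range")
    if reg < GLOBALS:
        return reg  # globals are not windowed
    offset = reg - GLOBALS  # 0..21 within the window
    return GLOBALS + (window * STRIDE + offset) % WINDOWED


caller, callee = 0, 1
# The caller's first outgoing register is the callee's first incoming one.
print(physical_register(caller, 26) == physical_register(callee, 10))
```

Because the caller's outgoing registers and the callee's incoming registers are the same physical cells, parameters are passed without copying, and a call only has to change the window number.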

 

 

 

 

HARDWARE SUPPORT FOR LOCAL SCALARS

 

Local scalar variables

are found to be the most frequently used type of operand.

 

      Dynamic percentage of operands in Pascal and C.

 

Figure 6.2. Statistical analysis of the frequencies of various data types in the benchmarks that were used as a basis for the UCB-RISC architecture. Symbols P1 through P4 refer to the benchmarks written in the PASCAL language, and symbols C1 through C4 refer to the benchmarks written in the C language. These results are derived from a dynamic analysis.

 

 

 

 

 

Support exists in the form of OPTIMAL WORD COUNT

     per window field!

 

 

OTHER FEATURES

 

PIPELINING

     and DELAYED BRANCHING

 

SINGLE FIXED-SIZE INSTRUCTIONS

     and SIMPLIFIED DECODING

 

LOAD/STORE ARCHITECTURE

     and 2-CYCLE MEMORY ACCESSING

 

SIMPLE ADDRESSING MODES

     and DEFAULT ADDITION

 

VARIABLE-SIZE DATA

     bytes, half-words, and words

 

NO MULTIPLICATION

     neither HW MULTIPLIER nor BOOTH STEP

 

NO STACK

     neither for ARITHMETIC nor for CONTEXT SWITCHING
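
With neither a hardware multiplier nor a Booth step, a product has to be built in software from the shift and add instructions that are available. The following is a minimal sketch of the classic shift-and-add method; the function name shift_add_multiply and the 32-bit word width are assumptions, and the actual library routine of any given RISC compiler may differ.

```python
def shift_add_multiply(a, b, width=32):
    """Multiply two unsigned integers using only shift, add, and test.

    The result is truncated to `width` bits, as it would be in a
    register of that size.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    while b:
        if b & 1:                    # lowest multiplier bit set:
            product = (product + a) & mask   # add the shifted multiplicand
        a = (a << 1) & mask          # next bit weighs twice as much
        b >>= 1                      # move on to the next multiplier bit
    return product


print(shift_add_multiply(6, 7))
```

The loop runs once per significant bit of the multiplier, so the cost is at most `width` iterations of simple one-cycle operations.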

 

 

 

 

 

 

 

 

 

Figure 6.3. An example of the execution of the branch instruction in a pipeline where the interlock mechanism can be realized in hardware or in software.

 

 

 

 

 

 

 

Figure 6.4. Realization of the software interlock mechanism for the pipeline from Figure 6.3: (a) normal branch, in which the program fragment contains a sequencing hazard when executed on a pipelined processor; (b) delayed branch, in which the program fragment contains no sequencing hazard when executed on a processor with the pipeline shown in Figure 6.3, but which is inefficient because it contains an unnecessary nop instruction; (c) optimized delayed branch, in which the previous example has been optimized and the nop instruction has been eliminated (its place is now occupied by the add 1, A instruction).
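
The step from case (b) to case (c) can be sketched as a small compiler pass that moves the instruction preceding a branch into the delay slot when the branch does not depend on it. This is a simplified illustration: the Instr record, the fill_delay_slots function, and the sample fragment (apart from add 1, A, which the caption names) are assumptions, and the pass ignores complications such as branch targets that land on the moved instruction.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset      # names written by the instruction
    reads: frozenset       # names read by the instruction
    is_branch: bool = False


NOP = Instr("nop", frozenset(), frozenset())


def fill_delay_slots(program):
    """Replace 'prev; branch; nop' with 'branch; prev' where it is safe."""
    result = []
    i = 0
    while i < len(program):
        instr = program[i]
        padded = (instr.is_branch and i + 1 < len(program)
                  and program[i + 1] == NOP)
        if padded and result:
            prev = result[-1]
            # prev must not already sit in the delay slot of an earlier branch
            in_slot = len(result) >= 2 and result[-2].is_branch
            independent = not (prev.writes & instr.reads)
            if (not prev.is_branch and prev != NOP
                    and not in_slot and independent):
                result.pop()
                result.extend([instr, prev])   # prev now fills the slot
                i += 2                         # skip the branch and its nop
                continue
        result.append(instr)
        i += 1
    return result


padded_fragment = [
    Instr("load X, A", frozenset({"A"}), frozenset({"X"})),
    Instr("add 1, A", frozenset({"A"}), frozenset({"A"})),
    Instr("jump L", frozenset(), frozenset(), is_branch=True),
    NOP,
    Instr("add A, B", frozenset({"B"}), frozenset({"A", "B"})),
]
for instr in fill_delay_slots(padded_fragment):
    print(instr.text)
```

When the branch reads something the preceding instruction writes (for example a condition code), the pass leaves the nop in place, which corresponds to case (b).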

 

 

 

 

Figure 6.5. An example of the execution of the load instruction in a pipeline with either hardware or software interlock. Symbol X shows that the third stage is not being used (which is true of all instructions except the load instruction).

 

 

 

Figure 6.6. Realization of the software interlock mechanism for the pipeline shown in Figure 6.5: (a) normal data load, in which the program fragment contains a timing hazard when executed on a pipelined processor; (b) delayed data load, in which the program fragment contains no timing hazard but is inefficient because it contains an unnecessary nop instruction; (c) optimized delayed load, in which the previous example has been optimized by eliminating the nop instruction (its place is occupied by the instruction sub R3, R2, R1).
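
The step from case (a) to case (b) can be sketched as a pass that detects a load followed immediately by a use of the loaded register and pads it with a nop. The Instr record, the insert_load_delays function, the one-instruction load delay, and the sample instructions are assumptions made for illustration only.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset      # registers written by the instruction
    reads: frozenset       # registers read by the instruction
    is_load: bool = False


NOP = Instr("nop", frozenset(), frozenset())


def insert_load_delays(program):
    """Insert a nop after every load whose result is used by the next instruction."""
    result = []
    for i, instr in enumerate(program):
        result.append(instr)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if instr.is_load and nxt is not None and instr.writes & nxt.reads:
            result.append(NOP)   # software interlock: wait one slot
    return result


load = Instr("load R1 <- mem", frozenset({"R1"}), frozenset(), is_load=True)
use = Instr("add R4 <- R1 + R2", frozenset({"R4"}), frozenset({"R1", "R2"}))
other = Instr("sub R6 <- R5 - R2", frozenset({"R6"}), frozenset({"R5", "R2"}))

hazard = [load, use]          # case (a): the use follows the load directly
safe = [load, other, use]     # case (c): an independent instruction fills the slot

for instr in insert_load_delays(hazard):
    print(instr.text)
```

Running the same pass on the `safe` fragment adds nothing, because the independent instruction already occupies the slot after the load.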

 

 

 

Figure 6.7. The pipeline structure in the UCB-RISC I and UCB-RISC II processors: R = register read; ALU = arithmetic/logic operation; W = register write; P = precharge; IDLE = idle interval.

 

 

 

 

 

Figure 6.8. An example of the critical data path: I/O BUF = input/output buffer; GRF = general-purpose register file; ALU = arithmetic and logic unit. The figure also labels the three internal busses, read ports 1 and 2, and write ports 1 and 2.