GENERAL PRINCIPLES

Principles for improving the "impedance match" between compilers and the target computer:

Regularity

Orthogonality

Composability

One versus All

Provide Primitives, not Solutions

Addressing Principle

Environment Support Principle

Deviations Principle

VIOLATIONS → CASE ANALYSIS

ENORMOUS CASE ANALYSIS → RESTRICTED OPTIMIZATION.
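A minimal Python sketch of how violations of regularity and orthogonality turn into case analysis inside a code generator. The two toy instruction sets, the operation and addressing-mode names, and the fallback sequence are all hypothetical. With the orthogonal table the generator emits one instruction for every operation/mode pair. With the irregular one, every missing pair becomes a special case.

```python
from itertools import product

# Hypothetical operations and addressing modes, for illustration only.
OPS = ("add", "sub", "mul")
MODES = ("reg", "imm", "mem")

# Hypothetical irregular ISA: these operation/mode pairs do not exist.
MISSING = {("mul", "mem"), ("sub", "imm")}


def legal_orthogonal(op, mode):
    """Orthogonal ISA: every operation accepts every addressing mode."""
    return True


def legal_irregular(op, mode):
    """Irregular ISA: some combinations are missing."""
    return (op, mode) not in MISSING


def select(op, mode, legal):
    """Emit code for 'op' with an operand in 'mode' under the given ISA."""
    if legal(op, mode):
        return [f"{op}.{mode}"]
    # Special case: move the operand into a register, then use the register form.
    return [f"mov.{mode}", f"{op}.reg"]


def special_cases(legal):
    """Number of op/mode pairs the code generator must treat specially."""
    return sum(1 for op, mode in product(OPS, MODES) if not legal(op, mode))


print(special_cases(legal_orthogonal))        # 0
print(special_cases(legal_irregular))         # 2
print(select("mul", "mem", legal_irregular))  # ['mov.mem', 'mul.reg']
```

Each extra hole in the table adds another branch that the optimizer must also reason about. With many such holes the analysis grows until optimizations have to be restricted.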

FOR MORE DETAILS

Reference: Wulf, W.A., "Compilers and Computer Architecture," IEEE Computer, July 1981.

CISC VERSUS RISC

CISCs provide solutions which are not general enough.

RISCs provide primitives.

CISCs recompute data.

RISCs reuse data*.

CISCs provide several ways, but not all ways.

RISCs provide one way only.
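One way to read "recompute" versus "reuse" is in terms of memory traffic. The sketch below is a hypothetical model, not taken from the source. In it, a memory-to-memory machine re-reads its operands from memory on every instruction, while a load/store machine loads each variable into a register once and reuses it. It assumes unlimited registers and ignores instruction fetches.

```python
def memory_to_memory_traffic(ops):
    """Each 'dst += src' reads dst and src from memory and writes dst back."""
    return 3 * len(ops)


def load_store_traffic(ops):
    """Load each variable once, reuse it from a register, store each modified variable once.

    Assumes enough registers to keep every variable resident.
    """
    loaded = set()
    modified = set()
    traffic = 0
    for dst, src in ops:
        for var in (dst, src):
            if var not in loaded:
                loaded.add(var)
                traffic += 1  # one load from memory
        modified.add(dst)
    return traffic + len(modified)  # one store per modified variable


# a = a + b; a = a + c; a = a + d
ops = [("a", "b"), ("a", "c"), ("a", "d")]
print(memory_to_memory_traffic(ops))  # 9
print(load_store_traffic(ops))        # 5
```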

THE RISC APPROACH

Basic issue:

How to invest the available VLSI complexity?

Positive effects:

Quantity of existing resources increases

Qualitatively new resources can be incorporated

Precision increases

Negative effects:

Less power per gate → less speed per gate

Larger chip size → wire delays increase

Decoding delays increase → CPU speed decreases

INSTRUCTION SET DESIGN PHASES

#1 The most frequent instructions are identified.

#2 Data path and timing are optimized for the most frequent instructions only.

#3 Other frequent operations are included only if they can fit into the elaborated scheme.
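Phase #1 amounts to ranking instructions by dynamic frequency. The minimal sketch below assumes a trace of executed opcode names. The trace and the 90% coverage threshold are invented for illustration. The sketch picks the smallest set of most frequent instructions that covers the given share of execution.

```python
from collections import Counter


def most_frequent(trace, coverage_pct=90):
    """Return the most frequent opcodes that together cover coverage_pct% of the trace."""
    counts = Counter(trace)
    total = sum(counts.values())
    chosen, covered = [], 0
    for opcode, n in counts.most_common():
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= coverage_pct * total:
            break
        chosen.append(opcode)
        covered += n
    return chosen


# Hypothetical dynamic trace: 100 executed instructions.
trace = ["load"] * 40 + ["add"] * 30 + ["branch"] * 20 + ["mul"] * 6 + ["div"] * 4
print(most_frequent(trace))  # ['load', 'add', 'branch']
```

Phases #2 and #3 would then tune the data path for this set and admit further operations only if they fit the resulting scheme.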

CPU DESIGN PHASES

#1 Basically, the same strategy applies here.

#2 It is important to simplify the CODE OPTIMIZER, not the CODE GENERATOR.

THE UCB-RISC

HARDWARE SUPPORT FOR CALL/RETURN

Call/return is found to be the most time-consuming operation in a typical HLL (high-level language) environment.

Figure 6.1. Statistical analysis of benchmark programs that were used as a basis for the UCB-RISC architecture. Symbol P refers to the benchmark written in the PASCAL language, and symbol C refers to the benchmark written in the C language. These results are derived from a dynamic analysis.

Support is provided in the form of REGISTER WINDOWS!
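A simplified Python model of overlapping register windows may help here. It is a sketch under stated assumptions, not the UCB-RISC implementation:

* window sizes are parameters,
* global registers are omitted,
* the window pointer advances on a call,
* overflow and underflow simply spill or refill one window to a memory stack.

The key property is that a caller's "out" registers are the same physical registers as the callee's "in" registers, so parameters are passed without copying.

```python
class RegisterWindows:
    """Overlapping register windows over a circular physical register file.

    Window-relative register numbers: ins 0..n_in-1, then locals, then outs.
    The outs of window w are the same physical registers as the ins of w + 1.
    """

    def __init__(self, n_windows=8, n_in=6, n_local=10):
        self.n_windows = n_windows
        self.stride = n_in + n_local          # new physical registers per window
        self.size = n_in + n_local + n_in     # visible registers (outs = n_in)
        self.ring = [0] * (n_windows * self.stride)
        self.cwp = 0          # current window pointer
        self.resident = 1     # windows currently held in the register file
        self.memory = []      # stack of spilled windows (ins + locals)
        self.spills = 0
        self.fills = 0

    def _phys(self, window, r):
        return (window * self.stride + r) % len(self.ring)

    def read(self, r):
        assert 0 <= r < self.size
        return self.ring[self._phys(self.cwp, r)]

    def write(self, r, value):
        assert 0 <= r < self.size
        self.ring[self._phys(self.cwp, r)] = value

    def call(self):
        # At most n_windows - 1 windows fit; otherwise the new outs would
        # overwrite the ins of the oldest window, so spill that window first.
        if self.resident == self.n_windows - 1:
            oldest = (self.cwp - self.resident + 1) % self.n_windows
            self.memory.append([self.ring[self._phys(oldest, r)] for r in range(self.stride)])
            self.resident -= 1
            self.spills += 1
        self.cwp = (self.cwp + 1) % self.n_windows
        self.resident += 1

    def ret(self):
        # If the caller's window was spilled, refill its ins and locals.
        if self.resident == 1:
            caller = (self.cwp - 1) % self.n_windows
            for r, value in enumerate(self.memory.pop()):
                self.ring[self._phys(caller, r)] = value
            self.resident += 1
            self.fills += 1
        self.cwp = (self.cwp - 1) % self.n_windows
        self.resident -= 1
```

With the default of 8 windows, six nested calls run without any memory traffic in this model, and the seventh triggers the first spill.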

HARDWARE SUPPORT FOR LOCAL SCALARS

Local scalar variables are found to be the most frequently used type of operand.

Figure 6.2. Dynamic percentage of operands in Pascal and C: statistical analysis of the frequencies of various data types in the benchmarks that were used as a basis for the UCB-RISC architecture. Symbols P1, P2, P3, and P4 refer to the benchmarks written in the PASCAL language, and symbols C1, C2, C3, and C4 refer to the benchmarks written in the C language. These are the results of a dynamic analysis.

Support is provided in the form of an OPTIMAL WORD COUNT per window field!
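The word count per window field fixes the size of the physical register file. The minimal worked calculation below uses these assumptions:

* W windows and G global registers,
* I in registers and L local registers per window,
* O = I out registers per window, with the outs of one window shared with the ins of the next.

The numbers used are illustrative, not quoted from the source.

```python
def physical_registers(n_windows, n_globals, n_in, n_local):
    """Registers needed when each window's outs overlap the next window's ins: G + W*(I + L)."""
    return n_globals + n_windows * (n_in + n_local)


def physical_registers_no_overlap(n_windows, n_globals, n_in, n_local, n_out):
    """Registers needed if windows did not share their in/out fields: G + W*(I + L + O)."""
    return n_globals + n_windows * (n_in + n_local + n_out)


def visible_registers(n_globals, n_in, n_local, n_out):
    """Registers a single procedure can address at any moment: G + I + L + O."""
    return n_globals + n_in + n_local + n_out


# Illustrative parameters: 8 windows, 10 globals, 6 in, 10 local, 6 out.
print(visible_registers(10, 6, 10, 6))                 # 32
print(physical_registers(8, 10, 6, 10))                # 138
print(physical_registers_no_overlap(8, 10, 6, 10, 6))  # 186
```

With these parameters, a 32-register view fits 5-bit register fields in the instruction, and the overlap saves 48 physical registers. Choosing the field sizes trades register-file area against how often windows overflow.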

OTHER FEATURES

PIPELINING

and DELAYED BRANCHING

SINGLE FIXED-SIZE INSTRUCTIONS

and SIMPLIFIED DECODING

LOAD/STORE ARCHITECTURE

and 2-CYCLE MEMORY ACCESSING

SIMPLE ADDRESSING MODES

and DEFAULT ADDITION

VARIABLE-SIZE DATA

bytes, half-words, and words

NO MULTIPLICATION

neither HW MULTIPLIER nor BOOTH STEP

NO STACK

neither for ARITHMETIC nor for CONTEXT SWITCHING
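Fixed-size instructions let every field be extracted with a shift and a mask. "Default addition" means every memory address is formed as a register plus a second operand. The 32-bit layout below resembles the format usually described for RISC I, but it should be treated as an assumption, as should the opcode value in the example. Its fields are:

* a 7-bit opcode,
* a 1-bit condition-code flag,
* a 5-bit destination,
* a 5-bit source 1,
* a 1-bit immediate flag,
* a 13-bit source 2.

```python
from dataclasses import dataclass

# Assumed field layout (bit positions, most significant first):
# 31..25 opcode | 24 scc | 23..19 dest | 18..14 rs1 | 13 imm | 12..0 s2


@dataclass(frozen=True)
class Fields:
    opcode: int
    scc: int
    dest: int
    rs1: int
    imm: int
    s2: int


def encode(opcode, scc, dest, rs1, imm, s2):
    return (opcode << 25) | (scc << 24) | (dest << 19) | (rs1 << 14) | (imm << 13) | (s2 & 0x1FFF)


def decode(word):
    # Every field sits at a fixed position, so decoding is shifts and masks only.
    return Fields(
        opcode=(word >> 25) & 0x7F,
        scc=(word >> 24) & 0x1,
        dest=(word >> 19) & 0x1F,
        rs1=(word >> 14) & 0x1F,
        imm=(word >> 13) & 0x1,
        s2=word & 0x1FFF,
    )


def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def effective_address(f, regs):
    # Default addition: address = rs1 + (13-bit signed immediate or rs2).
    second = sign_extend(f.s2, 13) if f.imm else regs[f.s2 & 0x1F]
    return regs[f.rs1] + second


regs = [0] * 32
regs[5] = 100
f = decode(encode(opcode=0x12, scc=0, dest=3, rs1=5, imm=1, s2=-4))  # hypothetical load
print(f.dest, effective_address(f, regs))  # 3 96
```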

Figure 6.3. An example of the execution of a branch instruction in a pipeline where the interlock mechanism can be realized either in hardware or in software.

Figure 6.4. Realization of the software interlock mechanism for the pipeline from Figure 6.3: (a) normal branch: the program fragment contains a sequencing hazard when executed on the pipelined processor; (b) delayed branch: the program fragment contains no sequencing hazard when executed on a processor with the pipeline shown in Figure 6.3, but this solution is inefficient because it contains an unnecessary nop instruction; (c) optimized delayed branch: the previous example has been optimized and the nop instruction eliminated (its place is now occupied by the add 1, A instruction).
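The optimization in part (c) can be sketched as a small scheduling pass. This is a simplified model with the following assumptions:

* the code is straight-line and ends in a single branch,
* register or variable names are the only dependences (no memory aliasing),
* there is one delay slot,
* the instruction text is illustrative.

The pass moves the latest instruction that can legally be reordered past everything after it into the slot. If no such instruction exists, it falls back to a nop, as in part (b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()


def ins(text, writes=(), reads=()):
    return Instr(text, frozenset(writes), frozenset(reads))


NOP = ins("nop")


def independent(a, b):
    """True if a and b may be swapped: no read-after-write, write-after-read or write-after-write."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def fill_delay_slot(block):
    """block: straight-line instructions whose last element is a branch with one delay slot."""
    *body, branch = block
    for i in range(len(body) - 1, -1, -1):
        candidate = body[i]
        # The candidate ends up after everything that followed it, including the branch.
        if all(independent(candidate, later) for later in body[i + 1:] + [branch]):
            return body[:i] + body[i + 1:] + [branch, candidate]
    return body + [branch, NOP]  # nothing movable: the slot is wasted


block = [ins("load X, A", writes={"A"}, reads={"X"}),
         ins("add 1, A", writes={"A"}, reads={"A"}),
         ins("jump B")]
print([x.text for x in fill_delay_slot(block)])
# ['load X, A', 'jump B', 'add 1, A']
```

The moved instruction still executes, because a delayed branch always executes the instruction in its slot, whichever way the branch goes.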

Figure 6.5. An example of the execution of the load instruction in a pipeline with either hardware or software interlock. Symbol X shows that the third stage is not being used (which is true of all instructions except the load instruction).

Figure 6.6. Realization of the software interlock mechanism for the pipeline shown in Figure 6.5: (a) normal data load: the program fragment contains a timing hazard when executed on a pipelined processor; (b) delayed data load: the program fragment contains no timing hazard, but it is inefficient because it contains an unnecessary nop instruction; (c) optimized delayed load: the previous example has been optimized by eliminating the nop instruction (its place is occupied by the instruction sub R3, R2, R1).
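Part (c) can be sketched in the same spirit. This minimal model assumes:

* straight-line code,
* one load-delay slot,
* dependences tracked by register name only,
* illustrative register names.

When the instruction right after a load uses the loaded register, the pass pulls a later independent instruction up into the slot. Otherwise it inserts a nop, as in part (b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()
    is_load: bool = False


def ins(text, writes=(), reads=(), is_load=False):
    return Instr(text, frozenset(writes), frozenset(reads), is_load)


NOP = ins("nop")


def independent(a, b):
    """True if a and b may be swapped (no RAW, WAR or WAW dependence)."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def schedule_loads(code):
    """Software interlock for a one-slot load delay."""
    out = list(code)
    i = 0
    while i < len(out):
        load = out[i]
        if load.is_load and i + 1 < len(out) and load.writes & out[i + 1].reads:
            # Find a later instruction that can move up past out[i..j-1]; being
            # independent of the load, it cannot read the value still in flight.
            filler = next((j for j in range(i + 2, len(out))
                           if all(independent(out[j], out[k]) for k in range(i, j))), None)
            out.insert(i + 1, NOP if filler is None else out.pop(filler))
        i += 1
    return out


code = [ins("load R1, X", writes={"R1"}, reads={"X"}, is_load=True),
        ins("add R5, R1, R4", writes={"R5"}, reads={"R1", "R4"}),
        ins("sub R3, R6, R7", writes={"R3"}, reads={"R6", "R7"})]
print([x.text for x in schedule_loads(code)])
# ['load R1, X', 'sub R3, R6, R7', 'add R5, R1, R4']
```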

Figure 6.7. The pipeline structure in the UCB-RISC I and UCB-RISC II processors: R = register read; ALU = arithmetic/logic operation; W = register write; P = precharge; IDLE = idle interval.

Figure 6.8. An example of the critical data path: I/O BUF = input/output buffer; GRF = general purpose register file; ALU = arithmetic and logic unit; the figure also shows the internal busses, read ports 1 and 2, and write ports 1 and 2.