GENERAL PRINCIPLES
of improving the "impedance match" between compilers and
the target computer:
Regularity
Orthogonality
Composability
One versus All
Provide Primitives, not Solutions
Addressing Principle
Environment Support Principle
Deviations Principle
VIOLATIONS
=> CASE ANALYSIS
ENORMOUS CASE ANALYSIS
=> RESTRICTED OPTIMIZATION.
FOR MORE DETAILS
Reference: Wulf, W.A., "Compilers and Computer
Architecture," IEEE Computer, July 1981.
CISC VERSUS RISC
CISCs provide solutions which are not general enough.
RISCs provide primitives.
CISCs recompute data.
RISCs reuse data.
CISCs provide several ways, but not all ways.
RISCs provide one way only.
THE RISC APPROACH
Basic issue:
How to invest the available VLSI complexity?
Positive effects:
Quantity of existing resources increases
Qualitatively new resources can be incorporated
Precision increases
Negative effects:
Less power per gate -> less speed per gate
Larger chip size -> wire delays increase
Decoding delays increase -> CPU speed decreases
INSTRUCTION SET DESIGN PHASES
#1 The most frequent instructions
are identified.
#2 Data path and timing are optimized
for the most frequent instructions, only.
#3 Other frequent operations are included
only if they can fit into the elaborated scheme.
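Phase #1 can be illustrated with a minimal sketch in Python. The trace, the opcode names, and the 80% coverage threshold below are hypothetical and are not taken from the UCB-RISC measurements; the sketch only shows how a dynamic instruction count could rank candidates for the optimized core.

```python
from collections import Counter

def frequent_opcodes(trace, coverage=0.80):
    """Return (opcodes, share): the most frequent opcodes, taken in
    descending order of dynamic count until they cover at least
    `coverage` of the trace, and the share they actually cover."""
    if not trace:
        return [], 0.0
    total = len(trace)
    chosen, covered = [], 0
    for opcode, count in Counter(trace).most_common():
        if covered / total >= coverage:
            break
        chosen.append(opcode)
        covered += count
    return chosen, covered / total

# Hypothetical dynamic trace: one entry per executed instruction.
trace = (["load"] * 30 + ["add"] * 25 + ["branch"] * 20 +
         ["store"] * 15 + ["mul"] * 6 + ["div"] * 4)
core, share = frequent_opcodes(trace)
print(core, share)
```

In this made-up trace the four most frequent opcodes cover the threshold, so they would drive the data path and timing in phase #2, while the rest would be candidates for phase #3.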
CPU DESIGN PHASES
#1 Basically, the same strategy applies here.
#2 It is important to simplify the CODE OPTIMIZER,
not the CODE GENERATOR.
The UCB-RISC
HARDWARE SUPPORT FOR CALL/RETURN
Call/Return is found to be the most time consuming operation
in a typical HLL environment
Figure 6.1.
Statistical analysis of benchmark programs that were used as a basis for the UCB-RISC architecture. Symbol P refers to the benchmark written in the PASCAL language, and symbol C refers to the benchmark written in the C language. These results are derived from a dynamic analysis.
Support exists in the form of REGISTER WINDOWS!
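The idea behind register windows can be sketched as a mapping from the register number seen by a procedure to a physical register. This is a simplified model, not the UCB-RISC implementation: the class name, the window sizes (8 windows, 10 global, 6 overlapping, 10 local registers), and the register numbering are assumptions chosen for illustration, and window overflow simply raises an error instead of spilling to memory.

```python
class RegisterWindows:
    """Overlapping register windows; all sizes are illustrative assumptions."""

    def __init__(self, n_windows=8, n_global=10, n_overlap=6, n_local=10):
        self.n_windows = n_windows
        self.n_global = n_global
        self.stride = n_overlap + n_local      # registers unique to one window
        self.visible = n_global + 2 * n_overlap + n_local
        self.regs = [0] * (n_global + n_windows * self.stride)
        self.cwp = 0                           # current window pointer
        self.depth = 0                         # current call nesting depth

    def _physical(self, r):
        """Map a visible register number to a physical register index."""
        if not 0 <= r < self.visible:
            raise IndexError(f"no such register r{r}")
        if r < self.n_global:
            return r                           # globals are shared by all windows
        ring = self.n_windows * self.stride
        offset = r - self.n_global             # outgoing, then local, then incoming
        return self.n_global + (self.cwp * self.stride + offset) % ring

    def read(self, r):
        return self.regs[self._physical(r)]

    def write(self, r, value):
        self.regs[self._physical(r)] = value

    def call(self):
        # A real machine would trap and spill a window to memory here.
        if self.depth >= self.n_windows - 2:
            raise OverflowError("register window overflow")
        self.cwp = (self.cwp - 1) % self.n_windows
        self.depth += 1

    def ret(self):
        if self.depth == 0:
            raise RuntimeError("register window underflow")
        self.cwp = (self.cwp + 1) % self.n_windows
        self.depth -= 1

rw = RegisterWindows()
rw.write(10, 42)       # caller: argument in its first outgoing register
rw.write(16, 7)        # caller: a local variable
rw.call()              # no registers are copied or saved
arg = rw.read(26)      # callee: same physical register, seen as incoming
rw.write(16, 99)       # callee local, distinct from the caller's local
rw.write(26, 43)       # callee: return value through the overlap
rw.ret()
result, local = rw.read(10), rw.read(16)
print(arg, result, local)
```

Call and return only move the window pointer, so parameters are passed through the overlapping registers without any memory traffic, which is the point of the hardware support.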
HARDWARE SUPPORT FOR LOCAL SCALARS
Local scalar variables
are found to be the most frequently used type of operands.
Dynamic percentage of operands in Pascal and C.
Figure 6.2.
Statistical analysis of the frequencies of various data types, in benchmarks that were used as a basis for the UCB-RISC architecture. Symbols (P1, P2, P3, P4) refer to the benchmarks written in the PASCAL language, and symbols (C1, C2, C3, C4) refer to the benchmarks written in the C language. These results are derived from a dynamic analysis.
Support exists in the form of OPTIMAL WORD COUNT
per window field!
OTHER FEATURES
PIPELINING
and DELAYED BRANCHING
SINGLE FIXED-SIZE INSTRUCTIONS
and SIMPLIFIED DECODING
LOAD/STORE ARCHITECTURE
and 2-CYCLE MEMORY ACCESSING
SIMPLE ADDRESSING MODES
and DEFAULT ADDITION
VARIABLE-SIZE DATA
bytes, half-words, and words
NO MULTIPLICATION
neither HW MULTIPLIER nor BOOTH STEP
NO STACK
neither ARITHMETIC nor CONTEXT SWITCHING
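With neither a hardware multiplier nor a Booth step (the NO MULTIPLICATION item above), multiplication has to be synthesized in software from the instructions that remain. A minimal sketch of the classic shift-and-add routine follows; the function name and the 32-bit width are assumptions, and this is not claimed to be the routine actually used with the UCB-RISC.

```python
def multiply(a, b, width=32):
    """Low `width` bits of a * b, using only shift, add, and test."""
    mask = (1 << width) - 1
    a &= mask                      # treat operands as fixed-width registers
    b &= mask
    product = 0
    while b:
        if b & 1:                  # lowest multiplier bit set: add multiplicand
            product = (product + a) & mask
        a = (a << 1) & mask        # shift multiplicand left
        b >>= 1                    # shift multiplier right
    return product

print(multiply(6, 7))
```

The loop runs once per significant bit of the multiplier, so a compiler can often do better for constant multipliers by emitting a fixed sequence of shifts and adds.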
Figure 6.3.
An example of the execution of a branch instruction in a pipeline where the interlock mechanism can be realized in hardware or in software.
Figure 6.4.
Realization of the software interlock mechanism, for the pipeline from Figure 6.3: (a) normal branch: the program fragment contains a sequencing hazard, in the case of the pipelined processor; (b) delayed branch: the program fragment does not contain any sequencing hazard, if it is executed on a processor with the pipeline as shown in Figure 6.3; this solution is inefficient, because it contains an unnecessary nop instruction; (c) optimized delayed branch: the previous example has been optimized, and the nop instruction has been eliminated (its place is now occupied by the add 1, A instruction).
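The three cases of Figure 6.4 can be reproduced with a small interpreter that has one branch delay slot. This is a sketch under stated assumptions: the instruction encoding, the jump targets, and every instruction other than add 1, A are hypothetical, since the figure's program is not reproduced here.

```python
def run(program, delay_slot=True, max_steps=100):
    """Execute (op, *args) tuples; return (variables, instructions executed).
    With delay_slot=True the instruction after a jump always executes
    before control reaches the jump target."""
    env = {"A": 0, "B": 0}
    pc, steps, pending = 0, 0, None
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        next_pc = pc + 1
        if pending is not None:            # this instruction sits in a delay slot
            next_pc, pending = pending, None
        if op == "add":                    # add const, var
            const, var = args
            env[var] += const
        elif op == "jump":                 # jump to absolute index
            if delay_slot:
                pending = args[0]
            else:
                next_pc = args[0]
        pc = next_pc
    return env, steps

# (a) normal branch: correct only on a machine without a delay slot
normal = [("add", 1, "A"), ("jump", 3), ("add", 10, "B"), ("add", 100, "B")]
# (b) delayed branch: a nop fills the slot
delayed = [("add", 1, "A"), ("jump", 4), ("nop",), ("add", 10, "B"),
           ("add", 100, "B")]
# (c) optimized delayed branch: add 1, A moved into the slot
optimized = [("jump", 3), ("add", 1, "A"), ("add", 10, "B"), ("add", 100, "B")]

reference = run(normal, delay_slot=False)
hazard = run(normal)
with_nop = run(delayed)
with_fill = run(optimized)
print(reference, hazard, with_nop, with_fill)
```

Running fragment (a) on the delayed-branch machine executes the instruction after the jump and produces a wrong value of B; fragments (b) and (c) both give the reference result, and (c) does so without the extra instruction.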
Figure 6.5.
An example of the execution of the load instruction in a pipeline with either hardware or software interlock. Symbol X shows that the third stage is not being used (which is true of all instructions except the load instruction).
Figure 6.6.
Realization of the software interlock mechanism, for the case of the pipeline shown in Figure 6.5: (a) normal data load: the program fragment contains a timing hazard, in the case of execution on a pipelined processor; (b) delayed data load: the program fragment does not contain a timing hazard, but it is inefficient, because it contains an unnecessary nop instruction; (c) optimized delayed load: the previous example has been optimized by eliminating the nop instruction (its place is occupied by the instruction sub R3, R2, R1).
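A software interlock for loads amounts to a small scheduling pass in the compiler. The sketch below is an assumption-laden illustration, not the UCB-RISC code optimizer: the Instr record, the load and add instructions, and the choice of the first operand of sub R3, R2, R1 as its destination are all hypothetical. The pass either inserts a nop after a load whose result is used immediately, or moves the preceding independent instruction into that slot.

```python
from collections import namedtuple

# dest: register written; srcs: registers read (operand roles are assumed).
Instr = namedtuple("Instr", "op dest srcs")
NOP = Instr("nop", None, ())

def load_use(first, second):
    """True if `second` reads the register that `first` is still loading."""
    return first.op == "load" and first.dest in second.srcs

def has_timing_hazard(prog):
    return any(load_use(a, b) for a, b in zip(prog, prog[1:]))

def can_fill_slot(cand, load):
    """Conservative test: may `cand` move from just before `load` to just after?"""
    return (cand.op not in ("load", "nop", "branch")
            and cand.dest not in load.srcs     # load must not need cand's result
            and cand.dest != load.dest         # keep the order of writes
            and load.dest not in cand.srcs)    # cand must not read the loaded register

def schedule(prog, optimize=True):
    """Remove load-use hazards by filling each load delay slot."""
    out = []
    for i, ins in enumerate(prog):
        if not (i + 1 < len(prog) and load_use(ins, prog[i + 1])):
            out.append(ins)
            continue
        prev = out[-1] if out else None
        before = out[-2] if len(out) > 1 else None
        movable = (optimize and prev is not None
                   and can_fill_slot(prev, ins)
                   # moving prev must not place this load right after a load it uses
                   and not (before is not None and load_use(before, ins)))
        if movable:
            out.pop()
            out += [ins, prev]                 # prev now occupies the delay slot
        else:
            out += [ins, NOP]
    return out

# (a) normal data load: add uses R4 right after the load
normal = [Instr("sub", "R3", ("R2", "R1")),
          Instr("load", "R4", ("R6",)),
          Instr("add", "R5", ("R4", "R7"))]
delayed = schedule(normal, optimize=False)     # (b) nop in the slot
optimized = schedule(normal)                   # (c) sub moved into the slot
print([i.op for i in delayed], [i.op for i in optimized])
```

The conservative independence test may insert a nop where a smarter scheduler could find a filler further away; it is kept simple on purpose.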
Figure 6.7.
The pipeline structure in the UCB-RISC I and UCB-RISC II processors: R-register read; ALU-arithmetic/logic operation; W-register write; P-precharge; IDLE-idle interval.
Figure 6.8.
An example of the critical data path: I/O BUF-input/output buffer; three internal busses; GRF-general purpose register file; ALU-arithmetic and logic unit; two read ports (read port 1, read port 2); two write ports (write port 1, write port 2).