Veljko Milutinovic
IOB:
Understanding the Essence
UNDERSTANDING THE IOB
The overall system performance is frequently limited by I/O devices!
Often, monitoring the I/O process consumes a substantial share of the processor's capacity.
Types of I/O:
(a) Data presentation devices (user interface: processor to user)
(b) Data transport devices (network interface: processor to processor)
(c) Data storage devices (storage interface: processor to storage)
Devices with a dual or triple role are not uncommon!
Devices | Data Rates
Sensors | 1 Bps–1 KBps
Keyboard Entry | 10 Bps
Communications Line | 30 Bps–200 KBps
CRT Display | 2 KBps
Line Printer | 1–5 KBps
Tape Cartridge | 0.5–2 MBps
Figure IOBU1:
Data rates for some traditional data presentation devices (source: [Flynn95])
Legend:
CRT—Cathode Ray Tube.
Devices | Data Rates | Maximum Delay
Graphics | 1 MBps | 1–5 s
Voice | 64 KBps | 50–300 ms
Video | 100 MBps | ≈ 20 ms
Figure IOBU2:
Data rates for some traditional data transport devices (source: [Flynn95])
Legend:
s—second.
Devices | Access Time | Data Rate | Capacity
Disk | 20 ms | 4.5 MBps | 1 GB
Tape | O(s) | 3–6 MBps | 0.6–2.4 GB (per cartridge)
Figure IOBU3:
Data rates for some traditional data storage devices (source: [Flynn95])
Legend:
s—second.
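As a rough illustration of how access time and data rate combine, the service time of one request can be approximated as the access time plus the transfer time, t = t_access + S / R. The sketch below applies this simple model to the disk and tape figures of Figure IOBU3; the function name `service_time_s` is hypothetical, queueing is ignored, and the one-second tape access time is an assumed stand-in for the table's O(s).

```python
def service_time_s(size_mb: float, access_s: float, rate_mb_per_s: float) -> float:
    """Approximate time to serve one request: access (positioning) time
    plus transfer time. Queueing and controller overheads are ignored."""
    return access_s + size_mb / rate_mb_per_s


# Disk figures from Figure IOBU3; the tape access time of 1 s is an
# assumed value standing in for "O(s)" (on the order of seconds).
disk = service_time_s(size_mb=1.0, access_s=0.020, rate_mb_per_s=4.5)
tape = service_time_s(size_mb=1.0, access_s=1.0, rate_mb_per_s=3.0)

print(f"1 MB from disk: {disk:.3f} s")  # access is about 8% of the total
print(f"1 MB from tape: {tape:.3f} s")  # access dominates the total
```

For small requests the access time, not the data rate, sets the service time, which is one reason caching and prefetching matter for storage devices.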
Types of I/O organization:
(a) Program-controlled I/O
(b) Interrupt-driven I/O
(c) DMA-managed I/O
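The three organizations differ mainly in how much processor time a transfer consumes. A minimal cost model makes this concrete; all parameters are hypothetical (a device delivering one word every `period` cycles, an interrupt service cost `isr_cost`, and a DMA setup cost `dma_setup`), and real overheads depend on the processor and the device.

```python
def processor_cycles(words: int, period: int, isr_cost: int, dma_setup: int) -> dict:
    """Processor cycles consumed by one transfer of `words` words under a
    simplified model of the three I/O organizations (parameters hypothetical)."""
    return {
        # Program-controlled: the processor busy-waits on the device status
        # flag, so it is tied up for the whole duration of the transfer.
        "program-controlled": words * period,
        # Interrupt-driven: the processor runs other work and pays one
        # interrupt service routine per word.
        "interrupt-driven": words * isr_cost,
        # DMA-managed: the processor only sets up the transfer and handles
        # a single completion interrupt for the whole block.
        "DMA-managed": dma_setup + isr_cost,
    }


costs = processor_cycles(words=1000, period=100, isr_cost=20, dma_setup=50)
for scheme, cycles in costs.items():
    print(f"{scheme:20s} {cycles:8d} cycles")
```

The model assumes `isr_cost` is smaller than `period`; otherwise interrupt-driven I/O would cost more processor time than polling.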
Most of the I/O-related processor workload comes from data storage devices.
The internal design of the microprocessor is crucial for supporting data storage devices!
The P/M (processor/memory) bus forms the major processor interface to the I/O controller/device.
The P/M bus is the major design challenge!
Issues of importance:
(a) Finding the physical location
(b) Finding the path
(c) Finding the data, with no or minimal central processor involvement
I/O coprocessor types:
(a) Multiplexer channel (many low-speed devices)
(b) Selector channel
(one high-speed device, with high-rate data assembly/disassembly)
(c) Block channel (combination of the above two)
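The difference between a multiplexer channel and a selector channel is essentially one of scheduling. The sketch below models each device as a byte string and returns the order in which (device, byte) pairs cross the channel; the device names are hypothetical, and neither word assembly nor the block channel is modeled.

```python
from collections import deque


def multiplexer_channel(devices: dict) -> list:
    """Round-robin: take one byte from each device with data pending,
    which suits many low-speed devices sharing one channel."""
    queues = {name: deque(data) for name, data in devices.items()}
    transferred = []
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                transferred.append((name, queue.popleft()))
    return transferred


def selector_channel(devices: dict) -> list:
    """Serve one device at a time, transferring its whole block before
    moving on, which suits a single high-speed device."""
    transferred = []
    for name, data in devices.items():
        transferred.extend((name, byte) for byte in data)
    return transferred


devices = {"keyboard": b"ab", "terminal": b"c"}
print(multiplexer_channel(devices))
print(selector_channel(devices))
```

Interleaving keeps every slow device moving; serving one device at a time avoids per-byte switching, which matters once a single device can saturate the channel.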
I/O for multiprocessors
(devices accessible to all processors in the system):
(a) via a single low-speed asynchronous bus
(b) via multiple high-speed synchronous buses
(c) same as above, with intelligence
STORAGE SYSTEM DESIGN
Disk cache buffers: Access time reducers
Spatial locality increased
Prediction (prefetching) during the silence periods
Figure IOBU4:
Three possible locations for disk cache buffers (source: [Flynn95])
Legend:
P—Processor,
IOP—Input/Output Processor.
Figure IOBU5:
Miss ratios for three different locations of disk cache buffers (source: [Flynn95])
Legend:
CD—Disk,
CIOP—Storage controller,
CU—Cache in memory.
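The effect of a disk cache buffer on the miss ratio can be explored with a small simulation. The sketch below assumes an LRU replacement policy and models "prediction during the silence periods" as a one-block sequential prefetch issued right after each access. Both choices, and the class name `DiskCacheBuffer`, are assumptions for illustration, not the policies evaluated in [Flynn95].

```python
from collections import OrderedDict


class DiskCacheBuffer:
    """Fixed-capacity LRU cache of disk block numbers (hypothetical model)."""

    def __init__(self, capacity: int, prefetch: bool = False):
        self.capacity = capacity
        self.prefetch = prefetch
        self.blocks = OrderedDict()  # block number -> None, in LRU order
        self.hits = 0
        self.misses = 0

    def _install(self, block: int) -> None:
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used

    def access(self, block: int) -> None:
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
        else:
            self.misses += 1
            self._install(block)
        # Assumed idle-time prediction: fetch the next sequential block.
        if self.prefetch and block + 1 not in self.blocks:
            self._install(block + 1)

    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


trace = list(range(10))  # a purely sequential read of ten blocks
for prefetch in (False, True):
    cache = DiskCacheBuffer(capacity=4, prefetch=prefetch)
    for block in trace:
        cache.access(block)
    print(f"prefetch={prefetch}: miss ratio {cache.miss_ratio():.2f}")
```

On a purely sequential trace, plain LRU misses on every block, while sequential prefetch misses only on the first one; real traces fall somewhere in between.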
Disk arrays: Files distributed across several synchronized disks
Bytes of each block are distributed across all disks;
Time to read a file and buffer-to-processor transfer rate: N times better with N disks
Figure IOBU6:
Structure of a disk array (source: [Flynn95])
Legend:
P—Processor,
s—number of disks acting as a single unit.
For (1, s) configurations, n = E(f), and
For (1, 16):
For (1, 8), we would have:
Figure IOBU7:
Numerics of a disk array (source: [Flynn95])
Legend:
n—number of blocks in a file.
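Byte-level distribution across the disks of an array, and why the transfer part of the read time shrinks with the number of disks, can be sketched as follows. The function names are hypothetical. The model assumes synchronized disks, uses the 4.5 MBps disk rate from Figure IOBU3, and ignores seek and rotational latency, which this kind of distribution does not reduce.

```python
def stripe(block: bytes, n_disks: int) -> list:
    """Distribute the bytes of a block round-robin across n_disks disks."""
    return [block[i::n_disks] for i in range(n_disks)]


def unstripe(pieces: list) -> bytes:
    """Reassemble a block from the per-disk pieces produced by stripe()."""
    n_disks = len(pieces)
    block = bytearray(sum(len(p) for p in pieces))
    for i, piece in enumerate(pieces):
        block[i::n_disks] = piece
    return bytes(block)


def transfer_time_s(size_bytes: int, n_disks: int, rate_bytes_per_s: float) -> float:
    """Transfer time when all disks deliver their share in parallel."""
    per_disk = -(-size_bytes // n_disks)  # ceiling division
    return per_disk / rate_bytes_per_s


pieces = stripe(b"abcdefg", 3)
print(pieces)            # [b'adg', b'be', b'cf']
print(unstripe(pieces))  # b'abcdefg'
print(transfer_time_s(4_500_000, 1, 4.5e6), transfer_time_s(4_500_000, 4, 4.5e6))
```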
I/O IN MULTIPROCESSOR SYSTEMS:
THE BOTTLENECK
OF TERAFLOP COMPUTING
Some processing-related issues:
Contribution of the virtual memory traffic to I/O
Contribution of metacomputing in heterogeneous systems to I/O
Application | I/O Requirements | Storage
Environmental and Earth sciences | |
Eulerian air-quality modeling | Current: 1 Gbyte/model, 100 Gbytes/application; projected: 1 Tbyte/application, 10 Tbytes at 100 model runs/application | S, A
4D data assimilation | 100 Mbytes–1 Gbytes/run; 3-Tbyte database | S, A
Computational physics | |
Particle algorithms | 1–10 Gbytes/file; 10–100 files/run; 20–200 Mbps | S, IOB
Radio synthesis imaging | 1–10 Gbytes; HiPPI bandwidths minimum; 1 Tbyte | S, IOB, A
Computational biology | |
Computational quantum materials | 150 Mbytes (time-dependent code); 40–100 Mbps | S, IOB
Computational fluid and plasma dynamics | |
High-performance aircraft simulation | 4 Gbytes of data/4 h; 40 Mbytes to 2 GBps disk, 50–100 MBps disk | S, IOB
Computational fluid | 1 Tbyte; 0.5 Gbps to disk, 45 MBps to disk for visualization | A, IOB
Figure IOBU8:
I/O requirements of Grand Challenge applications (source: [Patt94])
Legend:
S—Secondary,
A—Archival,
IOB—I/O Bandwidth.
Some disk interface models:
Intel Touchstone Delta
A 2D array (16 × 32) of processing-element (PE) nodes, with 16 I/O nodes on the sides
Intel Paragon
Inexpensive I/O nodes can be placed anywhere within a mesh
Thinking Machines CM-5
Fewer high-bandwidth I/O processors
Encore InfinitySP
The highest-performance I/O pump on the planet
Figure IOBU9:
Parallel I/O subsystem architecture (source: [Patt94])
Legend:
n—number of drives.
Some network interface technologies:
Network capacity may become a critical consideration
Relevant for multimedia applications
Type | Bandwidth | Distance | Technology
Fibre Channel | 100–1,000 Mbps | LAN, WAN | Fiber optics
HiPPI | 800 Mbps or 1.6 Gbps | ≤ 25 m | Copper cables (32 or 64 lines)
Serial-HiPPI | 800 Mbps or 1.6 Gbps | ≤ 10 km | Fiber-optic channel
SCI | 8 Gbps | LAN | Copper cables
SONET/ATM | 55 Mbps–4.8 Gbps | LAN, WAN | Fiber optics
N-ISDN | 64 Kbps, 1.5 Mbps | WAN | Copper cables
B-ISDN | ≤ 622 Mbps | WAN | Copper cables
Figure IOBU10:
Network capacities and characteristics (source: [Patt94])
Legend:
LAN—Local Area Networks (up to several meters),
WAN—Wide Area Networks (up to several kilometers).
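To see why network capacity may become a critical consideration, the time to move a 1-Gbyte data set over several links in Figure IOBU10 can be estimated as T = 8S / B, with S in bytes and B in bits per second. This sketch assumes decimal units (1 Gbyte = 10^9 bytes), uses the peak bandwidths from the table, and ignores protocol overhead and latency, so real transfers would be slower.

```python
def transfer_seconds(size_bytes: float, bandwidth_bps: float) -> float:
    """Time to move size_bytes at a sustained bandwidth given in bits/s."""
    return size_bytes * 8 / bandwidth_bps


# Peak bandwidths taken from Figure IOBU10 (one value per technology).
links_bps = {
    "N-ISDN": 64e3,
    "B-ISDN": 622e6,
    "HiPPI": 800e6,
    "SCI": 8e9,
}

for name, bps in links_bps.items():
    print(f"{name:8s} {transfer_seconds(1e9, bps):12.1f} s")
```

The spread is five orders of magnitude: seconds over HiPPI or SCI, but well over a day over a 64 Kbps N-ISDN channel.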
REFERENCES
[Flynn95] Flynn, M.J.,
Computer Architecture: Pipelined and Parallel Processor Design,
Jones and Bartlett Publishers,
Boston, Massachusetts, 1995.
[Patt94] Patt, Y.N.,
“The I/O Subsystem—A Candidate for Improvement,”
IEEE Computer, Vol. 27, No. 3, March 1994 (special issue).
IOB:
State of the Art
DISK CACHING DISK
Problem:
Optimizing the I/O write performance
Solution:
Using a small log disk (cache disk) as a secondary disk cache
to build a disk hierarchy
A RAM buffer collects write requests and passes them to the cache disk when the cache disk is idle
Performance close to that of a RAM of the same size, at the cost of a disk
Conditions:
The higher the temporal locality, the higher the performance
Reference:
[Hu96] Hu, Y., Yang, Q.,
“DCD—Disk Caching Disk:
A New Approach for Boosting I/O Performance,”
Proceedings of the ISCA-96, Philadelphia, Pennsylvania, May 1996,
pp. 169–178.
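The write path described above can be sketched as a three-level hierarchy: RAM buffer, cache disk (used as a log), and data disk. In this minimal model a write touches only the RAM buffer. When the cache disk is idle, the buffered blocks are appended to the log as one sequential segment. A separate destage step, assumed here to run when the whole system is idle, copies the latest versions to the data disk. The class and method names are hypothetical, the destage policy is an assumption, and disk timing is not modeled.

```python
class DiskCachingDisk:
    """Toy model of a DCD-style hierarchy (names and policies are assumptions)."""

    def __init__(self):
        self.ram_buffer = {}   # block -> data, writes not yet logged
        self.cache_log = []    # segments written sequentially to the cache disk
        self.cache_index = {}  # block -> latest data held on the cache disk
        self.data_disk = {}    # block -> data on the data disk

    def write(self, block: int, data: str) -> None:
        # Writes complete at RAM speed; overwrites of a hot block replace
        # the buffered copy, which is where temporal locality pays off.
        self.ram_buffer[block] = data

    def cache_disk_idle(self) -> None:
        # Flush the whole buffer as one large sequential log write.
        if self.ram_buffer:
            segment = dict(self.ram_buffer)
            self.cache_log.append(segment)
            self.cache_index.update(segment)
            self.ram_buffer.clear()

    def system_idle(self) -> None:
        # Assumed destage step: move the latest versions to the data disk.
        self.data_disk.update(self.cache_index)
        self.cache_index.clear()
        self.cache_log.clear()

    def read(self, block: int):
        # Look up the newest copy first: RAM buffer, cache disk, data disk.
        for level in (self.ram_buffer, self.cache_index, self.data_disk):
            if block in level:
                return level[block]
        return None


dcd = DiskCachingDisk()
dcd.write(7, "v1")
dcd.write(7, "v2")  # absorbed in RAM, no extra disk write
dcd.cache_disk_idle()
print(len(dcd.cache_log[0]), dcd.read(7))
dcd.system_idle()
print(dcd.data_disk)
```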
POLLING WATCHDOG
Problem:
Efficient handling of incoming messages in message-passing systems
Solution:
A hardware extension that limits the generation of interrupts
to the cases where polling fails to handle a message quickly
Conditions:
Message arrival frequency is the criterion for selecting between interrupts and polling
Reference:
[Maquelin96] Maquelin, O., Gao, G.R., Hum, H.H.J., Theobald, K., Tian, X.,
“Polling Watchdog: Combining Polling & Interrupts
for Efficient Message Handling,”
Proceedings of the ISCA-96, Philadelphia, Pennsylvania,
May 1996, pp. 12–21.
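The selection rule can be illustrated with a discrete-time sketch. A message that a poll picks up within a watchdog timeout is handled by polling; otherwise the watchdog raises an interrupt when the timeout expires. The function name, the `timeout` parameter, and the assumption that one poll drains every pending message are illustrative choices, not details taken from [Maquelin96].

```python
import bisect


def watchdog_dispatch(arrivals, poll_times, timeout):
    """Return (arrival, mechanism, handled_at) for each message.

    A message is handled by the first poll at or after its arrival if that
    poll comes within `timeout` cycles; otherwise the watchdog timer expires
    and an interrupt handles it at arrival + timeout.
    """
    polls = sorted(poll_times)
    outcome = []
    for t in sorted(arrivals):
        i = bisect.bisect_left(polls, t)
        if i < len(polls) and polls[i] - t <= timeout:
            outcome.append((t, "poll", polls[i]))
        else:
            outcome.append((t, "interrupt", t + timeout))
    return outcome


# The first message is caught by a nearby poll; the second arrives during a
# long stretch without polls, so the watchdog interrupts.
print(watchdog_dispatch(arrivals=[0, 10], poll_times=[3, 30], timeout=5))
```

With frequent messages and frequent polls, nearly everything is handled by polling; with rare messages, the interrupt path bounds the worst-case delay at `timeout` without constant polling.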
IOB:
IFACT
A Distributed Shared I/O
on Top
of Distributed Shared Memory
Essence:
References:
[Milutinovic96a] Milutinovic, V.,
“Some Solutions for Critical Problems
in Distributed Shared Memory,”
IEEE TCCA Newsletter, September 1996.
[Milutinovic96b] Milutinovic, V.,
“The Best Method for Presentation of Research Results,”
IEEE TCCA Newsletter, September 1996.