Tech! archive #40

D_iBR

an ordinary nightmarish
home page

A whenever-it-happens computer-ish review

This is a fairly disorganized archive of messages from FidoNet conferences that seemed useful or interesting to me at the time I read them. Many are outdated, many are narrowly specialized and of little interest, but something new may turn up...

- __techs (2:5015/42) ----------------------------------------------- __techs -
Msg : 40 of 1000                          Scn
From : Peter Sobolev                       2:5030/84       24 Jun 96 23:59:00
To   : Mike Shirobokov                                     25 Jun 96 22:34:36
Subj : RISC (was: Optimization)
-------------------------------------------------------------------------------
@AREA:RU.HACKER
Hello Mike!

24 Jun 96 01:06, Mike Shirobokov wrote to Peter Sobolev:

PS>> I'm not bashing RISC! You're already the second person I'm telling this.
PS>> I like RISC chips more than CISC ones myself. And MISC is the ultimate
PS>> dream :)

MS> misc - mono instruction ... ? ;-)

minimal (instruction set computer ;) Forth processors, Moore's children

here's a little text (I posted it here once before, but anyway, it's informative and pleasant)

       MuP21--A High Performance MISC Processor

    _________________________________________________________________

                   Chen-hanson Ting and Charles H. Moore

     ______________________________________________________________________

   Offete Enterprises, Inc.

   1. MISC vs. RISC vs. CISC

   The controversy between RISC (Reduced Instruction Set Computer) and CISC
   (Complex Instruction Set Computer) has pretty much been settled, and RISC
   has won. Most of the newer and more powerful processors developed recently
   are RISC processors, like SPARC, MIPS, Alpha from DEC, PA from HP, and
   PowerPC from IBM. However, CISC processors persist due to momentum, like
   the Intel x86 family, and in the microcontroller area where raw speed is
   not an important factor.

   The basic principles behind the original RISC processors are valid, such
   as:

   a. Simple instruction set is faster
   b. Complicated memory accessing instructions are not necessary
   c. A large register file facilitates software
   d. Complicated functions are best handled in compiler
   e. Simpler processor is easier to design and to build

   However, RISC is a good idea that fell into the wrong hands. The emphasis on
   simplicity is all but forgotten. The RISC processors we see now are more
   complicated than many of the CISC processors. The relentless push towards
   higher speed left behind a bloody trail. Some of the problems in the RISC
   architecture are quite evident:

   a. RISC processors are inherently slow, because each instruction still
   needs many machine cycles to execute. Instruction pipelines are used to
   accelerate the execution. However, the pipeline must be flushed and
   refilled when a branch instruction is encountered.

   b. Increasing speed in the RISC processor creates a large disparity between
   the processor and the slower memory. To increase the memory accessing
   speed, it is necessary to use cache memory to buffer instruction and data
   streams. The cache memory brings in a whole set of problems which
   complicate the system design and render the system more expensive.

   c. RISC processors are very inefficient in handling subroutine calls and
   returns. Efficient subroutine mechanism is critical to the performance of a
   processor in supporting high level languages. Many RISC processors use a
   large register file, which is windowed to facilitate subroutine call and
   return. However, the register window must be big enough to handle a large
   set of input, output, and local parameters. The large register window
   wastes the most precious resource in the RISC processor. A large register
   file also slows down the computer system during a context switch, which
   must save the register file and later restore it.

   Our opinion is that in RISC, reducing the size of the instruction set is
   effective in reducing the complexity of the processor and improving its
   performance. However, the principle of simplicity was not enforced
   strongly enough to realize its full benefit. In the MISC architecture, we
   would like to explore the power of simplicity to its limit, to see how far
   we can push CMOS technology in reducing the costs of building computer
   systems and increasing their performance. We would like to have answers to
   the following questions:

   a. What is the minimum set of instructions in a microprocessor to make it
   useful in solving practical programming problems?

   b. What will be the performance of a microprocessor with such a minimum set
   of instructions?

   c. What facilities in a microprocessor are necessary to reduce the
   complexity and the system costs of a computer?

   d. How to best utilize the current CMOS technology to build such MISC
   processors?

   2. The MISC Instruction Set

   What is the minimum set of instructions in a practical microprocessor? The
   CISC processors generally have 100 or more instructions. The RISC
   processors have about 50 instructions. In our investigations, it was
   obvious that 16 instructions are not sufficient to support all the
   necessary functions required in a microprocessor. 50 instructions are too
   many. The minimum number of instructions is somewhere between 16 and 32. A
   convenient choice is to limit the number of instructions to 32 and
   implement a microprocessor with 5 bit instructions.

   Here is the instruction set implemented in MuP21:

   MuP21 INSTRUCTION SET

Transfer Instructions:  JUMP, CALL, RET, JZ, JCZ
Memory Instructions:    LOAD, STORE, LOADP, STOREP, LIT
ALU Instructions:       COM, XOR, AND, ADD, SHL, SHR, ADDNZ
Register Instructions:  LOADA, STOREA, DUP, DROP, OVER, NOP

   So far, we have implemented only 24 instructions, leaving some room for
   future expansion. This MISC instruction set seems to be adequate in the
   applications we have coded, including quite elaborate operating systems and
   demonstration programs.

   It is interesting that we have ADD instruction but not subtraction, that we
   have XOR but not OR instruction, and that we have OVER but not SWAP.
   Obviously, subtraction can be synthesized by complement and addition. OR
   can be synthesized by complement, AND, and XOR. OVER and SWAP are very
   similar, in that they allow accessing the top of the data stack. However,
   it is difficult to determine which is more fundamental in a stack machine.
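
   The synthesis of the missing operations can be sketched as follows. This
   is only an illustration in Python, not MuP21 code: the helper names (com,
   add, sub, or_demorgan, or_xor) are made up for the example, words are
   assumed to be plain 20-bit values, and the carry out of ADD is simply
   discarded.

```python
MASK = (1 << 20) - 1  # 20-bit word, as in MuP21


def com(t):
    """COM: one's complement of a 20-bit word."""
    return ~t & MASK


def add(t, s):
    """ADD: 20-bit addition; the carry out is discarded in this sketch."""
    return (t + s) & MASK


def sub(a, b):
    """a - b built from complement and addition: a + COM(b) + 1."""
    return add(add(a, com(b)), 1)


def or_demorgan(a, b):
    """a OR b from COM and AND (De Morgan's law)."""
    return com(com(a) & com(b))


def or_xor(a, b):
    """a OR b from XOR and AND: the two terms never share a set bit."""
    return (a ^ b) ^ (a & b)
```

   For example, sub(5, 3) gives 2, and both OR variants agree with the
   built-in | operator on 20-bit inputs.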

   3. MuP21 Architecture

   MuP21 is the first member of a series of MISC microprocessors. The primary
   constraints on the design of this microprocessor were that it had to be
   housed in a 40 pin DIP package, and that the silicon die had to be less
   than 100 mils square. We determined that a 20 bit microprocessor could be
   implemented within these physical constraints. There would not be enough
   I/O pins to support a processor with wider data and address buses.

   MuP21 must use DRAM as its primary memory, as DRAM offers the best bit
   density and the lowest cost per bit. However, it has to boot from ROM or
   other 8 bit memory devices, and it also has to address various I/O devices.
   Therefore, we need a memory coprocessor to handle the buses and to generate
   the proper control signals to the memory and I/O devices.

   A unique feature of MuP21 is its ability to generate NTSC signals to drive
   a color TV monitor, because it will be targeted at many applications which
   use the TV monitor as the principal display device. A video coprocessor was
   designed to run in parallel with the main processor to display video frames
   stored in the main DRAM memory.

   The main CPU in MuP21 thus includes the following components:

   a. A Return Stack to nest subroutine return addresses
   b. A Data Stack to store parameters passing between subroutines
   c. A T (Top) Register as the central holding register for operands
   d. An ALU which takes operands from T and the top of Data Stack and returns
   the results of ALU operation to T Register
   e. An A (Address) Register to hold a memory address for fetching or storing
   data from/to memory
   f. A PC (Program Counter) Register to hold the address of the next
   instruction
   g. An Instruction Latch which holds four 5-bit instructions to be executed
   in sequence

   The memory and data buses are 20 bits wide. The instructions are 5 bits wide.
   Therefore, four instructions can be packed in each 20-bit word fetched from
   memory. This is a natural instruction pipeline. After 4 instructions are
   executed, the slower external memory is ready to supply the next set of 4
   instructions. The processor can be four times faster than the memory. Fast
   cache memory and the associated control circuitry are not needed.
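
   The packing of four instructions into one word can be sketched like this.
   It is a minimal illustration, assuming that slot 0 occupies the most
   significant five bits of the word; the text does not state the slot order,
   and the function names pack and unpack are invented for the example.

```python
def pack(slots):
    """Pack four 5-bit instructions into one 20-bit word (slot 0 highest)."""
    if len(slots) != 4 or any(not 0 <= s < 32 for s in slots):
        raise ValueError("need exactly four 5-bit values")
    word = 0
    for s in slots:
        word = (word << 5) | s
    return word


def unpack(word):
    """Split a 20-bit word back into its four 5-bit instructions."""
    return [(word >> shift) & 0x1F for shift in (15, 10, 5, 0)]
```

   One memory fetch thus delivers unpack(word), a list of four instructions
   to be executed in sequence before the next fetch is needed.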

   The execution speed of MuP21 is very fast because of the simple instruction
   set and the dual stack architecture. The ALU instructions can be executed
   very fast because operands are taken from the T register and the top of the
   data stack, and the results are returned to the T register. There is no
   need to decode the source and destination registers. Actually, the ALU
   operates continuously. Once the data in the T register and the top of the data
   stack are stable, ALU results from COM (complement of T), SHL, SHR, XOR,
   AND, ADD, and conditional ADD are generated spontaneously. The ALU
   instruction only selects the proper results and gates them back into the T
   register. The operations of the MuP21 processor can thus be summarized in
   two steps:

   a. Read a 20-bit word from memory and latch it into the instruction latch.
   b. Execute the 5-bit instructions by latching proper results into the T
   register.

   MuP21 is thus much faster than RISC machines, because the RISC processor
   must follow the following sequence to execute one instruction:

   a. Read an instruction from memory and latch it.
   b. Decode the instruction and select the operand registers.
   c. Execute the instruction.
   d. Store results back into the selected destination register.

   A stack based processor is more advantageous than a register based
   processor because the source and destination registers are defined in
   hardware and no register decoding is necessary.
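
   The idea of an ALU that computes every result continuously, with the
   instruction merely selecting one of them, can be modelled roughly as
   below. This is a sketch under assumptions that the text does not confirm:
   SHR is treated as a logical shift, the two-operand instructions are
   assumed to pop the data stack, the carry bit is ignored, and ADDNZ is left
   out because its condition is not spelled out here. The names alu_results
   and step are hypothetical.

```python
MASK = (1 << 20) - 1  # 20-bit data path; the carry bit is ignored here


def alu_results(t, s):
    """All candidate results, available as soon as T and S are stable."""
    return {
        "COM": ~t & MASK,
        "SHL": (t << 1) & MASK,
        "SHR": t >> 1,            # assumed to be a logical shift
        "XOR": t ^ s,
        "AND": t & s,
        "ADD": (t + s) & MASK,
    }


def step(op, t, stack):
    """Execute one ALU instruction: select a ready result as the new T."""
    s = stack[-1] if stack else 0
    new_t = alu_results(t, s)[op]
    if op in ("XOR", "AND", "ADD") and stack:
        stack.pop()               # assumed: binary ops consume the stack top
    return new_t
```

   Note that step takes only an opcode: there are no source or destination
   register fields to decode, which is the point made above.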

   MuP21 executes instructions at a speed of 10 ns per instruction. The peak
   execution rate is thus 100 MIPS. It achieves this remarkable performance
   using only the now outdated 1.2 micron CMOS process, because of the
   simplicity in its architecture and the MISC instruction set. Accessing the
   slower DRAM memory derates its performance to about 80 MIPS.
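
   The arithmetic behind these figures can be checked with a few lines. The
   peak rate follows directly from the 10 ns instruction time. The text does
   not say how the 80 MIPS figure is derived; one reading that happens to
   match it, offered here only as an assumption, is that sustained
   throughput is bounded by one four-instruction word per 50 ns DRAM RAS
   cycle (the cycle time quoted in the Memory Coprocessor section).

```python
INSTRUCTION_NS = 10   # from the text: 10 ns per instruction
SLOTS_PER_WORD = 4    # four 5-bit instructions per 20-bit word
DRAM_CYCLE_NS = 50    # DRAM RAS cycle quoted in the Memory Coprocessor section


def mips(instructions, nanoseconds):
    """Millions of instructions per second for a given count and duration."""
    return instructions * 1000 / nanoseconds


peak = mips(1, INSTRUCTION_NS)
# Assumption: one word fetch per DRAM cycle limits the sustained rate.
dram_bound = mips(SLOTS_PER_WORD, DRAM_CYCLE_NS)
print(peak, dram_bound)
```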

   4. Video Coprocessor

   MuP21 has a video coprocessor which runs in parallel with the main CPU. The
   video coprocessor reads 20-bit words from the DRAM memory and interprets a
   20-bit word as four 5-bit instructions, similar to the main CPU. However,
   the video coprocessor instructions change the output voltage at the VIDEO
   output pin to generate an NTSC color video signal suitable for displaying on a
   standard TV monitor.

   The video processor is synchronized to a 14.39 MHz external clock to
   maintain precise timing of the video output. Whenever it is ready to fetch
   a new word from the DRAM memory, it gets a word via the memory coprocessor
   without delay, because the video coprocessor has a higher priority over the
   main CPU, and the memory coprocessor will grant its memory request as soon
   as possible. After the video coprocessor gets a word from DRAM, it will
   execute four instructions before fetching the next word. During this
   interval, the main CPU can request memory access from the memory
   coprocessor. Hence, when the video coprocessor is turned on, it consumes
   25% of the memory bandwidth of MuP21.

   The instruction set of the video coprocessor is as follows:

Opcode  Hex  Name     Slot  Cycles
B       00   Black    x     1
S       17   Sync     x     1
R       1F   Refresh  2     1
K       13   Skip     0     1
C       15   Burst    x     1
P       0x   Pixel    x     1
J       18   Jump     0     0

   When the MSB in a 5-bit video instruction is set, the instruction causes
   special action in the video signal generator. When the MSB in an
   instruction is reset, the other four bits specify the color of one pixel to
   be displayed on the monitor. The assignments of bits are: 0 I G R B
   where G, R, B stand for green, red and blue, and I stands for intensity.
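
   Decoding a video instruction according to this description might look as
   follows. The hex codes come from the table above; the function name
   decode_video, the returned tuple layout, and the "unknown" label for codes
   not listed in the table are assumptions made for this sketch.

```python
# Special instructions (MSB set), with hex codes taken from the table.
SPECIAL = {0x17: "Sync", 0x1F: "Refresh", 0x13: "Skip", 0x15: "Burst", 0x18: "Jump"}


def decode_video(instr):
    """Decode one 5-bit video instruction into a (kind, detail) pair."""
    if not 0 <= instr < 32:
        raise ValueError("video instructions are 5 bits wide")
    if instr & 0x10:  # MSB set: special action in the signal generator
        return ("special", SPECIAL.get(instr, "unknown"))
    # MSB clear: the remaining bits are I G R B, from high to low
    return ("pixel", {
        "intensity": (instr >> 3) & 1,
        "green": (instr >> 2) & 1,
        "red": (instr >> 1) & 1,
        "blue": instr & 1,
    })
```

   Under this reading, code 00 (Black in the table) is simply a pixel with
   all four color bits cleared.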

   A video frame is first constructed in DRAM memory from the video
   instructions. When the video coprocessor is turned on by setting the LSB in
   the Configuration Register, the video coprocessor fetches the instructions
   in sequence and executes them. The result is a continuous stream of analog
   signals at the VIDEO output pin. When this pin is connected to the input of
   a video monitor, color pictures will be shown on the monitor. The main
   processor can change the pixel instructions in the video frame to cause the
   picture to change dynamically.

   Since the video frame is completely constructed in the DRAM memory, it is
   easy to produce video signals either in NTSC format or in PAL format. This
   feature makes MuP21 a very powerful and versatile device to produce TV
   images. It will thus find many applications where video output is needed.

   5. Memory Coprocessor

   The Memory coprocessor in MuP21 is mostly hidden from the user. It performs
   the following tasks in the background:

   a. It arbitrates DRAM access requests from the video coprocessor and the
   main CPU. The memory request from the video coprocessor has priority over
   that from the main CPU.

   b. It generates the proper control signals to DRAM and SRAM memories, and
   also the I/O enable signal to I/O devices. A DRAM RAS cycle is 50 ns. SRAM
   and I/O have two access speeds: a slow cycle of 250 ns and a fast cycle of
   15 ns. The memory coprocessor allows MuP21 to use a variety of memory and
   I/O devices without additional interface circuitry.

   c. It controls the address and data buses to the memory and I/O devices.
   When accessing DRAM memory, the 20-bit addresses are multiplexed over pins
   A0-A9, and data bus consists of D0-D9 and AD10-AD19. When accessing SRAM
   memory during booting, the address bus consists of A0-A9 and AD10-AD19,
   while the 8-bit data bus is on D0-D7. When accessing I/O devices, the
   addresses are on A0-A9, and data are on D0-D9 and AD10-AD19.

   Memory and I/O accesses are controlled by address lines and two bits in the
   Configuration Register. The memory map of the different memory and I/O
   devices is:

       Address             Device
       0-FFFFF             20-bit DRAM memory
       120000-1203FF       Slow 20-bit I/O devices
       140000              Configuration Register
       160000-1603FF       Fast 20-bit I/O devices
       180000-1BFFFF       Fast 8-bit SRAM memory
       1C0000-1FFFFF       Slow 8-bit SRAM memory

   Internally, MuP21 maintains a 21 bit data/address bus. The MSB bit 20 is
   the carry bit in ALU operations. It also selects DRAM memory when low, and
   SRAM or I/O when high. According to the memory map, MuP21 addresses
   directly only 256 KB of SRAM memory. However, Bits 18-19 in the
   Configuration Register are forced on the address bus when reading or
   writing SRAM. This paging mechanism allows MuP21 to access 1 MB of external
   SRAM memory.
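
   The address decoding and the SRAM paging can be sketched as below. This is
   an illustration only. It assumes that every non-DRAM address in the map is
   a six-digit hex value with bit 20 set (for example 120000 for the start of
   slow I/O), as the bit 20 rule above requires, and that the page bits
   simply replace address bits 18-19 on the external bus. The function names
   are hypothetical.

```python
def decode(addr):
    """Classify a 21-bit internal address according to the memory map."""
    if not 0 <= addr < (1 << 21):
        raise ValueError("address must fit in 21 bits")
    if not addr & (1 << 20):
        return "DRAM"  # bit 20 low selects DRAM
    if 0x120000 <= addr <= 0x1203FF:
        return "slow I/O"
    if addr == 0x140000:
        return "configuration register"
    if 0x160000 <= addr <= 0x1603FF:
        return "fast I/O"
    if 0x180000 <= addr <= 0x1BFFFF:
        return "fast SRAM"
    if 0x1C0000 <= addr <= 0x1FFFFF:
        return "slow SRAM"
    return "unmapped"


def sram_external_address(addr, config):
    """Combine an 18-bit SRAM offset with page bits 18-19 of the config register."""
    page = (config >> 18) & 0x3
    return (page << 18) | (addr & 0x3FFFF)
```

   Each SRAM window spans 2^18 addresses (256 K), and the two page bits
   select one of four such pages, which gives the 1 M total mentioned above.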

   6. Applications

   MuP21 is a very powerful microprocessor because it is fast, and it has a
   fairly large addressing space. It also uses very little power. It is
   therefore suitable for a wide variety of applications in which high speed,
   low power consumption, and large addressing space are important factors in
   the design. Here is a list of potential applications for MuP21:

   Advanced video games
   TV signage
   Video test pattern generators
   CAD design system
   Telephone switching system
   Handheld computers
   High speed communications systems
   Intelligent hard disk controllers
   Robotic controllers

   7. Conclusion

   MuP21 is the first member of a family of microprocessors based on the MISC
   principles. It proves that there is still room to improve on the RISC
   architecture. By insisting on the minimum set of instructions,
   microprocessors can be further simplified and their performance improved.
   We were amazed that MuP21 can run at a peak speed of 100 MIPS, using the
   currently outdated 1.2 micron CMOS process. With the more advanced 0.8
   micron process, MuP21 can be made to run at a 200 MIPS rate. Moving on to
   0.5 micron, the speed can be increased further to 300 MIPS without much
   effort.

   MuP21 is a 20-bit microprocessor, constrained by the 40-pin DIP package.
   Using packages with more pins, the design can be easily expanded to 32 bits
   and beyond. A wider data/address bus will improve the throughput and also
   allow greater addressable memory space for applications dealing with
   massive amounts of data. This is another direction to evolve the MISC
   architecture.

   With a simpler and more efficient architecture, the MISC processors can be
   built with smaller silicon dies and thus the yield will be much higher than
   the more complicated RISC and CISC processors. The MISC processors will
   also consume much less power when running at equivalent speed. MISC
   processors will be much cheaper than RISC and CISC processors, and can
   compete effectively against them on the basis of favorable
   price/performance ratio.

     ______________________________________________________________________

  Related Info on MISC Chips
    * Computer Cowboys
    * Offete Enterprises
    * MISC Chips
    * P8 Microcontroller
    * P21 Microprocessor Press Release
    * F21 Microprocessor
    * S21 Simulator for MuP21 and F21
    * P32 Microprocessor

     ______________________________________________________________________

   Information compiled by:
   Jeff Fox
   Ultra Technology
   2510 10th Street
   Berkeley, CA 94710
   (510) 848-2149 or 848-0565
   jfox@netcom.com
   jfox@dnai.com
   http://www.dnai.com/~jfox

PS>> It's just that compilers generate code for RISC chips that, if not less
PS>> efficient, at least gives no gain over CISC.

MS> oh yes it does. There are loads of pipelines and registers there,
MS> optimizing is sheer pleasure.

that's called a potential gain. A real one is nowhere to be seen.

MS> I was left with exactly the opposite impression.

I can't say anything ;)

CodeRipper

--- DiskEdit 7.02 under OS/2 v2.2
* Origin: frq RUHACKER.ZIP - RU.HACKER FAQ. 00.00-07.00msk, 9600+ (2:5030/84)
