A Virtualizable Machine for Multiprogrammed Operation Based on Non-Virtualizable Microprocessors

Microcomputers are proliferating in dedicated applications and as single-user general-purpose digital computers. Many common applications on larger machines are inherently multi-user and require a multiprogrammed mode of operation. Multiprogrammed operating systems, although desirable for this reason and to maximize utilization of expensive system components, have not yet been satisfactorily implemented on microcomputers. It is shown that a typical microprocessor -- the Intel 8080 -- is inherently incapable of supporting a multiprogrammed operating system due to a lack of any privileged instruction set whatsoever. Other disadvantages of microprocessor-based systems that affect their capability for multiprogramming are discussed, including the limited memory address space, lack of relocation aids and lack of a "test and set" instruction for synchronization purposes. A machine architecture is proposed that utilizes two or more 8080s in a master/slave relationship to effectively implement a privileged instruction set. The architecture is shown to be virtualizable -- that is, capable of supporting a virtual machine monitor -- and to have good storage protection and fault-tolerance characteristics. A "Dynamic Memory Banking" system is included in the architecture that relieves the 64K limitation on memory resources, makes program relocation unnecessary and allows the assignment of memory to whatever process requires it at whatever address. This memory system simplifies problems involved in implementation of virtual storage; the central concepts are applicable to larger machines as well. Required and optional aspects of operating system software for the proposed architecture are discussed and specific suggestions for implementation are made.
This would be a mistake. In some ways they are a "throwback" to earlier machines.
To understand their limitations and the reasons for them, a brief review of pertinent aspects of computer evolution is useful.

Machine evolution
The primary "evolutionary force" in computer systems design has been the advancement in solid state physics [Osborne 1975].
The earliest stored-program computers were essentially one-of-a-kind designs based on vacuum tube circuitry -- the "state of the art" of the day. They were slow and consumed impressive amounts of electrical power.
Programs were written strictly in machine language, usually "toggled in" through the front panel switches or read in from punched cards. They were successes nonetheless, as no combination of man and desk calculator could approach their raw "number-crunching" power. Due to their construction, of course, they were extremely expensive. For primarily this reason, their use was limited to applications where a high premium was put on speed of computation, or where the sheer mass of computation required made manual calculation totally impractical [Rosen 1969]. Several manufacturers introduced vacuum tube based commercial machines such as the IBM 709.
These machines gradually gained acceptance, but their high cost and unreliability limited their range of practical applications.
The development of the semiconductor made the use of the computer practical for tasks without massive computational and data-handling needs. Discrete semiconductors (transistors) and other discrete components were used to construct computers like the IBM 7070 and IBM 1401 that brought the capabilities of data processing within the economic reach of most larger businesses. Development of programming aids such as assemblers and high-level languages received much emphasis. Computer systems were still, like their ancestors, single-job machines -- they ran one program at a time.
Solid-state technology continued to advance. Methods for "microminiaturizing" components were developed; these methods had great impact in the computer design field.
Small and Medium Scale Integration (tens or hundreds of devices on a single "chip") made two kinds of computers possible. It has been observed that each technological advance in solid-state devices results in two directions of development -- systems comparable in performance to already existing machines can be made smaller and less expensive, and machines of about the same size and cost can be developed with greater capabilities [Osborne 1975]. Microminiaturized circuitry resulted in machines like the IBM 360 with capabilities significantly advanced from earlier machines and, eventually, a new class of machines called minicomputers, which had the same characteristics as earlier machines but which, as implied by their name, were much smaller in physical size and also cost significantly less.
Large Scale Integration (LSI), or the combining of the equivalent of literally thousands of components onto a single chip, brought the next stage in computer hardware development. Using LSI, it was possible to develop a complete CPU in one package. The essential features of a CPU -- an Arithmetic & Logical Unit (ALU), Accumulator, Registers and Control Unit -- were all included, although the chip still required significant support circuitry depending on the particular application involved. Microprocessors will undoubtedly find their most common application as intelligent controllers of devices heretofore limited to control by logic circuits designed specifically for the application or even by mechanical switching assemblies.
Designers find that it is often easier to adapt a microprocessor to the task with software and a minimum of hardware than it is to design, debug and implement essentially new control circuitry. "It has been suggested that any digital system employing more than fifty gates is a candidate for application of a microcomputer" [Hilburn 1976]. Examples of these applications include microwave ovens and sewing machines, to name two applications already converted to CPU control. Heating systems, traffic signals, washing machines and many automotive systems are among those which will be seeing control by microprocessors before long.
In addition to its advantages as an inexpensive alternative to custom logic designs for controlling common devices, the microprocessor has always held promise as a general-purpose digital computer. In fact, the first 8-bit microprocessor -- the Intel 8008 -- was contracted for by the Datapoint Corporation, a manufacturer of intelligent terminals and small computer systems [Osborne 1975]. Since 1975, when MITS, Inc., of Albuquerque, New Mexico, introduced the first microcomputer system in kit form, general purpose microcomputers have proliferated. As software -- both systems and applications -- has been developed, these new entries at the low end of the computer spectrum have enabled businesses which could not before afford a computer to automate. In addition, they have been making inroads in what used to be minicomputer areas, forcing minicomputer manufacturers to upgrade the performance and versatility of their machines to that of former medium-scale digital computers [Rao 1978].

Computer architecture economics
Large scale integration of computer CPUs and memory has resulted in a significant reversal of economic considerations in computer use.

Earlier Economics
Until rather recently, the central electronics (CPU circuitry and main memory) of a computer system accounted for most of the system's cost. On-line disk storage accounted for a significant proportion of the remainder, with mechanical peripherals making up the difference. Operating systems were written to maximize usage of the most expensive parts of the system, primarily the CPU and main memory. In earlier systems --the IBM 1401, for example --the CPU was waiting for a mechanical peripheral much of the time. In heavily "I/O-bound" jobs, the CPU was idle for a very high percentage of the time.
In the next generation of computers -- the IBM System 360, for example -- features were included in the design that made it possible for more than one program to be in memory and executing at the same time. In this way, one program could be waiting for a card to be read or a line to be printed while the other program was receiving CPU time. This process is called "multiprogramming"; it shall be discussed in more detail later.

Microcomputers
Large scale integration has resulted in extremely inexpensive CPUs. In addition, it has had great effect on main memory costs. Today, a 16K (bytes) memory board for a microcomputer can be purchased for less than four hundred dollars.
Mechanical peripherals, on the other hand, cost close to what they did several years ago. The result is a real inversion of relative costs. It is not uncommon to find a complete microcomputer system with 64K of main memory attached to a printer that costs more than the system itself! Disk systems, with their mechanical aspects, remain expensive; a microcomputer system with a single hard disk drive may cost several times the amount of the same system without the disk.
Complicating the economic comparison of microcomputer systems with earlier computers is the fact that microcomputers, as they exist today, are primarily single-user machines. In most cases, the user's software has complete control of the machine; no "supervisory program" exists in memory that is able to, for example, distribute CPU time between two or more programs in memory at the same time.
There are two primary reasons for this. First, systems software in the form of good high-level language processors and single-user disk operating systems has only become available during 1977. It is a simple fact that good software for a new machine takes significant time and effort to develop. Multi-user systems software is more complex than single-user software, and will take more time to develop.
Secondly -- and most importantly -- microprocessors in common use today do not readily support multi-user systems. This will later be shown in detail.

Multi-user capabilities desirable
The current inability of microcomputers to support multiprogramming blunts the sharp cost advantages of such a system. Multiprogramming is still desirable for the same reason it was many years ago -- maximum utilization of expensive system components. Merely the identity of the expensive components has changed.
It would be advantageous, for example, if several users could share access to an expensive hard disk system, as an alternative to each user requiring his own, dedicated disk system. The same comment holds for mechanical "unit record" devices like card readers, punches and line printers. Although main memory has come down in price due to large scale integration, it is not cheap; 64K of memory still costs approximately $1500.00. As mentioned above, a CPU can be obtained for less than twelve dollars. Utilization of main memory is still a consideration that urges implementation of multiprogramming.
As multiprogramming has been commonly available on large machines since the mid 1960's, some applications have been developed which take advantage of shared auxiliary storage resources. Instead of merely sharing the hardware, these applications share the use of information available from auxiliary storage. This process is referred to as sharing a "common data base." There are two ways of accommodating this class of applications on microcomputers.
One method is to provide a machine controlling the auxiliary storage devices. The sole purpose of this machine would be to service requests from other machines for auxiliary storage operations. This approach involves an additional system and additional I/O interfaces (for communications between the data base machine and the machines being served). The other method, of course, is to implement a multiprogramming capability on microcomputers.

Intent of the study

Overcome limitations of microprocessors
The intent of this study, therefore, is to discuss the difficulties that would be encountered in the implementation of a multiprogramming capability on a microprocessor-based system. The concepts of multiprogramming and multiprocessing will be defined and described in detail. The concept of machine "virtualizability" will be described, and the implications of the concept for multiprogramming will be discussed. The Intel 8080, a typical microprocessor, will be closely examined as to its virtualizability and inferences drawn regarding its ability to support true multiprogramming.
A computer system architecture using the 8080 will be introduced and defined, and it will be shown that it meets the requirements for virtualizability and can therefore support multiprogramming.
After the architecture has been defined, basic requirements for multiprogrammed operating systems software will be specified and discussed.

Main memory architecture
In the course of specifying the machine architecture, a main memory organization will have been described. It will be shown that this organization, referred to as "Dynamic Memory Banking," makes several "classical" computer science problems trivial and reduces the complexity of solutions to other problems. Although it will have been structured for microcomputer use, it will be apparent that the basic concept is adaptable to large machines as well, providing the same benefits as those provided to a microcomputer system.

It is a natural tendency for users to see a computer from their point of view -- as a machine to do their job. The user may well think of the system as a strictly sequential device; it processes one job after another the same way it processes instructions in the user's program -- in consecutive fashion. Indeed it is possible for a computer to process jobs in exactly this way -- early computers were limited to this mode.
One very noticeable result of this method of processing jobs was a waste of CPU time. Mechanical peripherals, for example, were slow. When a program directed that a card be read, the CPU's processing power was suspended while the input operation took place. For jobs requiring a large number of unit record operations, it was not unusual for the CPU to be idle well over ninety percent of the total job time. "Even when processors are kept busy most of the time, the utilization of other computer resources is often poor; for example, any main storage not occupied by the current job (and some minimal part of the operating system) is essentially a wasted resource" [Shaw 1974]. The search for a method to make use of the wasted resources of CPU time and main storage -- both very expensive commodities in those days -- was what resulted in multiprogramming.
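The idle-time figure quoted above can be made concrete with a small calculation. The sketch below is an idealized model of our own, not one taken from the cited sources: it assumes each job alternates a fixed burst of computation with a fixed wait for a peripheral, that the waits of different jobs overlap perfectly, and that switching between jobs costs nothing. The function name and the 5 ms / 95 ms figures are hypothetical.

```python
def cpu_utilization(cpu_ms: float, io_ms: float, jobs: int = 1) -> float:
    """Idealized fraction of elapsed time during which the CPU is busy.

    Assumptions (simplifications, not measurements): every job needs
    cpu_ms of computation followed by io_ms of waiting, the waits of
    different jobs overlap completely, and process switches are free.
    """
    if cpu_ms < 0 or io_ms < 0 or cpu_ms + io_ms == 0 or jobs < 1:
        raise ValueError("invalid arguments")
    single_job = cpu_ms / (cpu_ms + io_ms)
    # The CPU cannot be more than fully busy, hence the cap at 1.0.
    return min(1.0, jobs * single_job)


# Hypothetical unit-record job: 5 ms of computation per 95 ms card read.
print(cpu_utilization(5, 95))       # one job in main storage
print(cpu_utilization(5, 95, 4))    # four such jobs multiprogrammed
```

Under these assumptions a single job of this kind leaves the CPU idle ninety-five percent of the time, and each additional job in main storage recovers a further share of that idle time until the CPU saturates.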

More than one independent process
The basic concept underlying multiprogramming is maintaining more than one independent sequential process in an active state in main storage [Shaw 1974]. For the purposes of understanding, the informal definition of "process" given in Shaw is acceptable: "A sequential process (sometimes called 'task') is the activity resulting from the execution of a program with its data by a sequential processor." Examples of sequential processes (hereafter referred to as simply "processes") are jobs, programs, or even special-purpose routines in the operating system itself.
A key characteristic of a multiprogrammed mode of operation is an appearance of simultaneity in running of the processes in main memory. All jobs or programs seem to be executing at the same time. Depending on the time scale an observer wishes to use, this simultaneity may disappear.
The single-processor multiprogrammed system can still execute only one instruction at a time. The illusion of simultaneous execution of all programs is due to the rapid "multiplexing" of the CPU between the various active processes awaiting processing. The CPU may execute instructions from one process for a few milliseconds, stop and save registers and status from that process, load registers and status from a second process, and then commence executing the instructions of that second process. A few milliseconds later, a "process switch" to a third process may occur, and so on. Depending on the design philosophy of the operating system, the process switches may occur only when a process that is executing requests an I/O operation or an I/O operation requested by a higher priority process is completed. Alternatively, these switches may occur whenever a timer preset by the operating system signals. The latter case is referred to as a time-slice system.

Other benefits
Although the increased utilization of expensive CPU time and main memory was by itself sufficient motivation for the implementation of multiprogramming, other benefits are also realized. Since more than one job can be entered into main memory at one time, multiple "entry points" are possible -- users can enter jobs at remote job entry stations or from their own individual terminals [Shaw 1974].
In addition, hardware is not the only "resource" that can be shared -- principal software resources, such as language compilers, I/O routines and system utilities, can be utilized by more than one process in main storage. Another benefit of multiprogramming is that processing time can be scheduled; more CPU time can be allocated to higher-priority programs by the multiprogramming operating system [Shaw 1974].

Operating systems software
In the preceding discussion, we have mentioned an "operating system" several times. Processes occupying main storage simultaneously cannot be expected to allocate CPU time and main memory resources to themselves. A usually complex piece of software is written for this purpose and essential parts of it remain present, or "resident," in main storage at all times. This software is referred to as the "operating system," and more specifically as the "multiprogrammed operating system." For the purposes of this thesis, we shall use the term "operating system" to refer to control programs having responsibility for task (process), job and data management, and shall exclude processing programs like language translators, service programs and user programs, although a complete definition of the term may include them [Katzan 1973].
Managing resources of the machine. As "multiprogramming involves the sharing of time on processors and the sharing of space in main storage, as well as the potential sharing of other resources" [Shaw 1974], the prime purpose of the operating system must be to manage the resources of the system. CPU time must be distributed among the active processes in main storage in accordance with some allocation strategy; we shall refer to the collection of operating system routines which perform this function as the "processor scheduler." The main storage resources belonging to the system must be allocated to jobs requesting memory. This may develop into a major system bottleneck.
It has been recognized that poor use of direct access storage is a common cause of inefficiency in multiprogrammed systems, and much work has been done in developing algorithms to make optimal use of these devices [Teorey 1972].
Isolation of users from each other. An essential aspect of a multiprogrammed system is the measure of isolation it provides between processes. Remember that the user still views the system as a sequential device, and the operation of his program must be identical to what would be expected were it to have exclusive use of the machine, with the exception of running time.
If other processes that happen to be in main storage at the same time can affect in any way the operation of a user's job, the above condition is not met.
Since several processes coexist in storage, one obvious function of isolation is to prevent one process from inadvertently or willfully altering the memory belonging to any other process or the operating system itself. This function is referred to as "memory protection." It may take the form of store protection, the most common form, in which a process may "see" what is in the storage allocated to another process but is prevented from changing it. This protection is sufficient to ensure that a process is not destroyed by another process, but can not satisfy privacy considerations. As confidential data (business and personnel records) are often temporarily in storage for processing, another process could continuously read this data from memory and store it on its own auxiliary storage for later perusal.
When memory protection is implemented such that it prevents reading of other processes' memory, it is known as "store-and-fetch protection."
Another readily understood requirement for isolation deals with I/O devices. Peripherals allocated to one process must not be used by another process until the first process has released them. One can easily imagine the irritation that would ensue if User A were to cause a printed message to appear in the middle of a long printed report being produced by User B's job! User files on direct access auxiliary storage must also be protected in much the same way that main storage is.
The privacy concerns referred to above are even more important in preventing unauthorized access to files; the owner of the file must have control over which, if any, classes of users other than himself may use or modify it [Shaw 1974].
To summarize the "management" and "isolation" functions of the multiprogramming operating system, we can say that the system must first allocate resources to processes and then enforce those allocations.
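The "allocate, then enforce" pattern and the two forms of memory protection described above can be sketched in a few lines. The following is a software model for illustration only; in a real machine the checks are made by hardware under operating system control. All class and method names here are hypothetical.

```python
from enum import Enum


class Protection(Enum):
    STORE = "store"                      # other processes may read, not write
    STORE_AND_FETCH = "store-and-fetch"  # other processes may do neither


class ProtectionException(Exception):
    """Raised when a process violates a storage allocation."""


class ProtectedStorage:
    """Toy main storage in which every cell has at most one owner."""

    def __init__(self, size, mode):
        self.mode = mode
        self.cells = [0] * size
        self.owner = [None] * size

    def allocate(self, process, start, length):
        """Operating system action: assign a block of storage to a process."""
        block = range(start, start + length)
        if any(self.owner[a] is not None for a in block):
            raise ProtectionException("storage already allocated")
        for a in block:
            self.owner[a] = process

    def store(self, process, address, value):
        """Enforcement: a process may alter only its own storage."""
        if self.owner[address] != process:
            raise ProtectionException("store outside allocation")
        self.cells[address] = value

    def fetch(self, process, address):
        """Enforcement: reads are refused only under store-and-fetch protection."""
        if (self.mode is Protection.STORE_AND_FETCH
                and self.owner[address] != process):
            raise ProtectionException("fetch outside allocation")
        return self.cells[address]
```

With `Protection.STORE`, a second process can still read a value the first has stored, which is exactly the privacy weakness noted above; with `Protection.STORE_AND_FETCH` the same read raises `ProtectionException`.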
Can be simple or sophisticated in concept. It was mentioned above that storage allocation routines can be simple or very complex. This "continuum of complexity" can be carried throughout most of the operating system structure.
The more sophisticated the desired allocation strategies, or the more flexible the enforcement of those allocations, the more complex must be the operating system. The advantages of multiprogramming -- even in its simplest realization -- are not without their price, and that price increases with sophistication.
A processor scheduler, for example, can use a simple or very complex algorithm to determine which process will receive CPU time next, and how much it will receive. Overhead caused by the operating system can be significant during execution [Shaw 1974]. It is well to remember that increased sophistication in operating system capabilities requires a "trade off" in the form of increased overhead.
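At the simple end of this continuum lies a time-slice scheduler that serves processes strictly in turn. The sketch below is a minimal illustration under our own assumptions: each process is represented only by the CPU time it still needs, process switches are free, and no process ever waits for I/O. The function name and the burst figures are hypothetical.

```python
from collections import deque


def round_robin(burst_ms, slice_ms):
    """Return the dispatch order as a list of (process, ms_run) pairs.

    burst_ms maps a process name to the CPU time it still requires;
    slice_ms is the interval after which the timer forces a switch.
    """
    if slice_ms <= 0:
        raise ValueError("slice_ms must be positive")
    ready = deque(burst_ms.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        run = min(slice_ms, remaining)
        trace.append((name, run))
        if remaining > run:
            # Not finished: rejoin the end of the ready queue.
            ready.append((name, remaining - run))
    return trace


for name, ran in round_robin({"A": 5, "B": 2, "C": 4}, slice_ms=2):
    print(name, ran)
```

A more sophisticated scheduler would replace the single queue with priority classes or adjust the slice length per process; each such refinement adds work to every process switch, which is the overhead trade-off described above.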
"Perfect" processes. It is certainly possible for any computer system to support a multiprogramming system, provided some very stringent conditions are enforced. The first condition is that all processes actively cooperate. A process must not attempt to destroy other processes nor interfere with them in any way. After some short amount of processing, the process must call an operating system routine that will determine the next process to be run. All requests for I/O should be forwarded to the operating system via a call. There must be no attempts to "monopolize" the CPU or any other system resource.
The second condition is that all processes coexisting in main storage be fully debugged.
The need for this is obvious, as it is apparent that if this condition is not fulfilled, there is no way to guarantee that the first condition will be, despite the best of intentions. It is difficult at best to ensure that these conditions -- particularly the second -- exist.
"Imperfect" processes. Imperfect programs could, for example, enter a hard loop. In this situation, the "scheduler routine" would never be called; the imperfect process would have halted the other processes. An imperfect process, of course, can attempt to write into memory not allocated to it, as any programmer whose job cancelled due to a "protection exception" can attest! Systems programmers have long realized that there are occasionally presented to the system, processes that can only be described as "hostile." The primary intent of such processes seems to be to "break" the system -- to circumvent its protection and allocation mechanisms. These processes often severely test the capabilities of the best operating systems; not all computers have the hardware features necessary to support an operating system that can repel any of them.

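Multiprogramming that depends entirely on cooperating processes can be illustrated with a short sketch. Here each "perfect" process is modelled as a Python generator that does a little work and then voluntarily returns control; the `yield` statement stands in for the call to the operating system routine that selects the next process. The names and the modelling choice are our own assumptions.

```python
from collections import deque


def process(name, steps, log):
    """A 'perfect' process: a short piece of work, then a voluntary yield."""
    for i in range(steps):
        log.append(f"{name}{i}")   # stands in for a short burst of processing
        yield                      # voluntary call to the scheduler routine


def run_cooperatively(processes):
    """Give each ready process one turn at a time until all have finished."""
    ready = deque(processes)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the process yields
        except StopIteration:
            continue               # process finished; drop it
        ready.append(current)


log = []
run_cooperatively([process("A", 2, log), process("B", 3, log)])
print(log)
```

Nothing in `run_cooperatively` can force a switch. A process whose body contained a loop with no `yield` would never return control, and every other process would be halted, which is why this arrangement is acceptable only when all processes are known to be well behaved.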
Hardware enhancements required
In practice, multiprogramming is not often attempted on hardware lacking certain essential features; the stringent conditions imposed on processes admissible to such machines make the effort impractical except for occasional dedicated applications. Shaw [1974] specifies some hardware features that are required for or that simplify multiprogrammed operation.
We will briefly discuss each.
in line must be started and, second, the processor scheduler must be "notified" that the process that was waiting for the completed operation is no longer waiting for I/O. Interrupts can occur from "external sources" -- the operator pressing the "interrupt key" on the front panel, for example. It is not essential, but is advantageous, for the interrupts to be "prioritized." This allows interrupts to be assigned to different priority classes; an interrupt that requests a service that can be delayed can be assigned to a lower priority class than one that deals with a distinctly time-sensitive one. An example of the former class would be an interrupt signifying that a line printer is ready to accept more output. An interrupt caused by the receipt of a character from a console keyboard might belong to the latter class; it is possible that the character might be lost if another arrives before the interrupt is "serviced."
Storage and Instruction Protection. We have already discussed the need for memory not allocated to a process to be protected from the actions of that process. There must be hardware features, controllable by the operating system, to protect and unprotect areas of main storage. "Instruction protection" refers to the need to limit the use of certain machine instructions to operating system use only.
As an obvious example, consider the instructions used by the operating system to protect other users' memory, as described above. If any user can use these particular instructions, the protection mechanism is rendered ineffective. From previous discussion it can be deduced that instructions that inhibit the interrupt system of the machine or directly affect I/O devices should also be placed in this class of instructions, commonly called "privileged instructions."
Dynamic Address Relocation. Address relocation in itself can certainly be termed a "classical" problem in operating systems design. In early computers, machine code was produced that would work only at one location in memory.
This situation persisted for some time even into the multiprogramming era. Early versions of IBM's Disk Operating System (DOS) required that several copies of the same program be available -- one copy assembled or compiled for each of the possible memory areas in which it might be required to run. This was clearly an unacceptable situation. Language processors and loaders were developed that permitted the "binding" of a process to a particular address to be delayed until the time when it was to be loaded into memory for execution. We can reasonably say that this much capability is a necessity for a practical implementation of a multiprogramming system. For reasons that will be discussed later, there are further advantages to be gained by delaying the binding time until the address is actually referenced by the CPU in the execution of the process. This capability is known as "dynamic address relocation" and is not strictly necessary but may be very convenient for implementation of a multiprogrammed system.
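Binding at the moment of reference can be illustrated with a base-and-limit register, one common way of providing dynamic address relocation. This is offered only as a familiar example of the idea; it is not the Dynamic Memory Banking scheme developed in this thesis, and the class name, method name and addresses below are hypothetical.

```python
class RelocationFault(Exception):
    """Raised when a process references an address outside its region."""


class RelocationRegister:
    """Base-and-limit translation applied to every memory reference."""

    def __init__(self, base, limit):
        self.base = base     # physical address where the process was loaded
        self.limit = limit   # size of the region allocated to the process

    def translate(self, logical):
        """Map a program-relative address to a physical address."""
        if not 0 <= logical < self.limit:
            raise RelocationFault(f"address {logical:#06x} out of range")
        return self.base + logical


region = RelocationRegister(base=0x4000, limit=0x1000)
print(hex(region.translate(0x0010)))   # reference while loaded at 0x4000

# The operating system moves the process; the program itself is unchanged.
region.base = 0x8000
print(hex(region.translate(0x0010)))   # same reference after the move
```

Because the program only ever issues addresses relative to zero, a single copy of it runs unchanged wherever the operating system places it, and the limit check doubles as a simple form of storage protection.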
Timer. The existence of a timer capable of generating interrupts to the CPU is convenient in the implementation of multiprogramming, and is essential for a time-slice system.
Jobs can be ranked by priority by the system rather than by the operator and processed in that order. Printed and punched output can be stored temporarily on disk until the appropriate peripheral is available for use -- this is referred to as "output spooling." Although direct access auxiliary storage is not strictly necessary for multiprogramming, it is certainly extremely useful, and maximizes efficiency of the peripheral devices.

Summary
To summarize, multiprogramming is that condition that exists when several independent processes occupy main storage simultaneously and receive "multiplexed" service from the CPU to achieve an appearance of simultaneity in execution.
An operating system is required to allocate the resources of the machine to processes competing for them and to enforce those allocations; the more sophisticated the allocation strategies and the more flexible the enforcement of allocations, the more overhead can be expected. Provided that processes meet certain stringent requirements, any computer can be multiprogrammed; in practice, however, several hardware features are required or advantageous for implementation of multiprogramming.

Given a fixed technology, parallel execution of hardware units can, in principle, dramatically improve system performance as compared with sequential operation. Several independent processors are often connected to common storage and control circuitry; these include central processors, I/O processors, such as data channels, and special purpose processors, such as arithmetic units." [Shaw 1974]
More than one CPU for user processes. For the purposes of this thesis, we shall define multiprocessing as the use of two or more CPUs to operate on user processes simultaneously. This may or may not, depending on the degree of sophistication of the operating system, include the operation of two or more CPUs on the same user job.
More processing power. An expected consequence of multiprocessing is that the raw processing power of the system is multiplied by approximately c, where c is the number of CPUs. The factor cannot be exactly c, as there will be occasions when both CPUs wish to fetch from the same memory module; this will result in a momentary delay for one of the CPUs due to "memory contention."
True simultaneity of execution. Another consequence is, of course, true simultaneity of process execution. Two processes, each being executed by its own processor, are executing literally simultaneously. This true simultaneity can actually cause difficulties, especially as regards "critical sections" of code in the operating system. Critical sections are areas of code usually dealing with resource allocation or enforcement that, due to their function, should only be "occupied" by one process at a time.
Although a discussion of critical sections sufficient to impart understanding would be too lengthy to include here, a simple example may help to give an intuitive feeling for the problem. Suppose that both CPUs in a dual CPU system remove themselves from their respective processes simultaneously for some reason (just a coincidence, for example) and enter the CPU scheduling routine simultaneously. If one or the other is not prevented from proceeding, it is inevitable that both CPUs will assign themselves to the same user process simultaneously. The result of this will, in general, be disastrous to the user process -- for example, any additions of one area in memory to another will be done twice.
This example alone should convince one of the need for exclusion from "critical sections" like the CPU scheduling routine. Processor synchronization (as mentioned above) and scheduling in general are areas where the operating system must be significantly more complex [Baer 1976].
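The scheduling example can be replayed as a small deterministic sketch. The two-step split of the scheduler into "pick" and "take," and the names `schedule`, `ready`, and `steps`, are assumptions made for illustration; they do not describe any particular operating system.

```python
def schedule(ready, steps):
    """Replay an interleaving of scheduler steps for two CPUs.

    Each CPU performs two steps: "pick" (read the head of the ready queue)
    and "take" (remove the picked process and start running it).
    """
    picked, running = {}, {}
    for cpu, step in steps:
        if step == "pick":
            picked[cpu] = ready[0]
        else:  # "take"
            if picked[cpu] in ready:
                ready.remove(picked[cpu])
            running[cpu] = picked[cpu]
    return running

# No exclusion: both CPUs pick before either one takes.
unsafe = schedule(["P1", "P2"],
                  [(0, "pick"), (1, "pick"), (0, "take"), (1, "take")])
# Exclusion: CPU 1 cannot enter until CPU 0 has left the critical section.
safe = schedule(["P1", "P2"],
                [(0, "pick"), (0, "take"), (1, "pick"), (1, "take")])
print(unsafe)  # {0: 'P1', 1: 'P1'}
print(safe)    # {0: 'P1', 1: 'P2'}
```

In the first interleaving both CPUs end up running P1, which is exactly the failure described above; in the second, each CPU completes both steps before the other begins.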

Hardware enhancements required
Multiprocessing requires some additional hardware features over and above those required for multiprogramming.
Control of processors. Since there are two or more CPUs, the relationship between them must be defined and arranged in hardware. CPUs may be arranged in a master/slave relationship, where one serves as a processing peripheral to the other, or may be arranged as equals in what is referred to as a "symmetrical" multiprocessing system.
Memory access. There must be some hardware provision made to enable both CPUs to access a common memory. The memory system must be able to resolve contention conflicts.

Synchronization.
Exclusion from critical sections can be done via software mechanisms. This, however, involves complex and confusing routines, as well as an increase in overhead [Shaw 1974]. A hardware solution is much more desirable.
To enable exclusion from critical areas as defined above, an instruction must exist that tests a byte and sets it in one instruction. This is necessary even in uniprocessor systems, but the multiprocessor environment puts even tighter requirements on this instruction. In a uniprocessor system it was sufficient to ensure that this operation could be completed within one machine instruction, as interrupts (and, therefore, process switches) could occur only between instructions. In a multiprocessor environment, however, it is conceivable that two CPUs could be executing a "test-and-set" at the same time on the same byte in memory.
The hardware must therefore ensure that the entire test-and-set is performed without any possibility of a memory access from the other processor(s) during it.
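A minimal sketch of exclusion built on an indivisible test-and-set follows. Python threads stand in for the two CPUs, and the `_bus` lock stands in for the hardware guarantee that no other memory access can occur during the operation; the class and function names are hypothetical.

```python
import threading

ROUNDS = 200  # arbitrary number of entries into the critical section per CPU

class Memory:
    """One shared byte plus a 'bus lock' that models the hardware guarantee."""
    def __init__(self):
        self.flag = 0
        self._bus = threading.Lock()  # stands in for an indivisible memory cycle

    def test_and_set(self):
        # Read the old value and write 1 with no other access in between.
        with self._bus:
            old = self.flag
            self.flag = 1
            return old

    def clear(self):
        self.flag = 0

def worker(mem, counter):
    for _ in range(ROUNDS):
        while mem.test_and_set() == 1:
            pass                 # busy-wait until the flag was found clear
        counter[0] += 1          # critical section: one CPU at a time
        mem.clear()              # leave the critical section

mem, counter = Memory(), [0]
cpus = [threading.Thread(target=worker, args=(mem, counter)) for _ in range(2)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(counter[0] == 2 * ROUNDS)
```

If the read and the write inside `test_and_set` could be separated by an access from the other processor, both could observe a clear flag and enter together, which is the case the hardware must rule out.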
Summary
To summarize, multiprocessing differs from uniprocessor multiprogramming in one primary way -- more than one CPU is in use working on user jobs. The amount of raw processing power is therefore multiplied without having to include the peripheral and memory resources that would be required in simply obtaining a second independent computer system.

The computer that is presented to the user, then, could be thought of as a different machine than the "bare" hardware he is actually using -- one possessing a subset of the instructions available on the bare machine. Some instructions are provided to alert the operating system to the fact that service is desired that only the operating system, with its access to the privileged instruction set, can provide -- I/O, for example. The machine that the user "sees" is sometimes referred to as a "virtual machine" [Shaw 1974] and sometimes as an "extended machine" [Goldberg, R. 1974]. This paper shall adopt the convention of referring to this machine as an "extended machine," as the term "virtual machine" will be given a more stringent definition later.
In most cases, the user should be quite satisfied with his extended machine; he has access to almost the entire instruction set and can "call" the operating system to perform those functions for him that he is prohibited from performing for himself. The lack of a few machine instructions would seem to be a small price to pay for the advantages to all users provided by multiprogramming.
What of the user, however, who is writing an operating system, or other software that requires use of the privileged instruction set? How is he to test his system? Under a multiprogrammed operating system, only one body of privileged software can be run [Goldberg, R. 1974].

Installations wishing to convert from one system to another to gain the advantages of a more sophisticated extended machine are usually faced with a monumental task in rewriting and/or retranslating their entire program library. The process sometimes takes months, during which one operating system must be "up" some of the time, and the other the rest of the time.
As the time at which it is desirable to run a particular program does not always coincide with the time at which the operating system it can run under is up, practical difficulties of great magnitude can appear, especially in a heavily utilized facility. The root of this problem is also the inability to run more than one privileged software "nucleus" at one time.
Obviously, what would solve this problem is a special operating system capable of supporting multiple extended machines that would each look like the real machine, complete with the full instruction set. The operating systems writer could then test his software during normal operating hours.
The installation converting from one operating system to another could run both, each on a different extended machine provided by this special operating system. This type of extended machine -- appearing to the user to be identical to the real machine -- will be termed a "virtual machine" (VM). The special operating system that creates these virtual machines is called a "virtual machine monitor" (VMM) [Popek 1974].
The concept of machine "virtualizability" can now be introduced.
One can readily see that a VMM --an operating system that allows users access to the full instruction set and still maintains control of the machine resources --must be different in concept from a simple multiprogramming system.
It should not be surprising to learn that this operating system requires a greater level of hardware support than is required for a multiprogrammed system.

Support of Virtual Machine Monitor (VMM)
Simply put, a machine is "virtualizable" if it is capable of supporting a VMM [Popek 1974]. This might appear to be too simple a definition, as it would seem to admit a "loophole" -- a software simulator.

Simulation as a method
It is possible for any machine to run a "simulator." A simulator is software that can run as a user program that simulates a complete computer system. In fact, such a method is often used in software development for a machine that has not yet been physically constructed. Several copies of the simulator can run on the current machine under an ordinary multiprogramming system. Either way, the current system is able to support several "virtual machines" identical to the bare hardware comprising the new machine, and isolate them from each other.
The factor that will prevent us from calling the simulator a VMM is speed -- or lack of it. When a new machine is being simulated by a different machine, it is not uncommon to find the virtual machines thus provided slowed down by as much as 1000 to 1 [Goldberg, R. 1974].
The same situation still applies if we have a current machine simulate multiple copies of itself. Due to the fact that the instruction set of the "host" extended machine is identical (except for the privileged instructions) to the instruction set of the machine being simulated, routines to simulate operation can be much more efficient. It may be possible to construct a simulator that exhibits only a 20 to 1 slowdown [Goldberg, R. 1974].
This may be a significant improvement over a 1000 to 1 slowdown, but it is still excessive.
Purely software-based simulators, then, are inappropriate as VMMs, and properties of a virtual machine can be specified that close the "loophole" of software simulators.

Then the equivalence to be guaranteed is that between running on an actual smaller hardware machine and the environment we have created." [Popek 1974]

Definitions
It is now possible to present better definitions for "virtual machine monitor" and "virtual machine."

"We say that a virtual machine monitor (VMM) is any control program that satisfies the three properties of efficiency, resource control, and equivalence. Then functionally, the environment which any program sees when running with a virtual machine monitor present is called a virtual machine." [Popek 1974]

Overhead considerations
Just as multiprogramming exacts a price, in the form of "overhead," so too does a VMM.
The VMM is, of course, software. CPU time is required to run it, and main memory to hold it and its data. A collection of jobs, or "jobstream," requires additional time to run under a VMM as compared to the bare machine.
Robert P. Goldberg, in his paper titled Survey of Virtual Machine Research, describes some principal sources of overhead concerned with VMMs:

"Maintaining the status of the virtual processor. The complete integrity of all visible registers, status bits, and reserved memory locations (interrupt control) must be preserved.

Support of privileged instructions. Third-generation virtual machine systems have expended processor overhead to trap and simulate privileged instructions.

Support of Paging Within Virtual Machines. Software techniques are currently used to transform a paged address in a VM into an address in the VM and finally into a real memory address.

Console Functions. The operator's panel and lights are simulated in software. This overhead is not invoked as frequently as the others cited above.

Additional sources of overhead include the reflection of exceptions and I/O interrupts to the virtual machines, support of virtual timers and clocks, and the translation of I/O channel programs before the VMM initiates I/O." [Goldberg, R. 1974]

A few areas in the above quotation may require some clarification, as they involve concepts not already discussed in this thesis.
In regard to "paging," the term refers to a

Yet it is also required that the actions of a program running on a virtual machine only have an effect on that virtual machine; therefore, access to that real area of memory cannot be permitted. The solution is to "map" an address referenced within a VM to an address in real, physical storage. It is therefore possible for a system to provide an address range starting at zero to each virtual machine it supports.
The responsibility for this mapping, in the IBM 370, falls to both the hardware and the VMM. A hardware feature referred to as "Dynamic Address Translation (DAT)" maps memory addresses specified by processes to physical memory addresses in main storage by reference to tables in main storage or very fast memory buffers. The responsibility of the VMM in regard to DAT is to keep these translation tables updated with the proper correspondence between VM addresses and physical addresses.
The term "page" itself refers to a subdivision of main memory for which this address translation is indivisible; that is, an entire page range of addresses in a VM is translated to a corresponding page range of addresses in physical storage.
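The page-wise mapping can be sketched as a table lookup. The page size of 4096 bytes and the table contents below are assumptions chosen for illustration, not values taken from the IBM 370 or from this thesis; `translate` is a hypothetical name.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

def translate(vm_address, page_table):
    """Map an address inside a VM to a physical address, one page at a time."""
    page, offset = divmod(vm_address, PAGE_SIZE)
    frame = page_table[page]   # KeyError models a page the VMM has not mapped
    return frame * PAGE_SIZE + offset

# Hypothetical translation tables kept up to date by the VMM, one per VM.
vm_a = {0: 5, 1: 9}
vm_b = {0: 2}

print(hex(translate(0x0010, vm_a)))  # 0x5010
print(hex(translate(0x0010, vm_b)))  # 0x2010
print(hex(translate(0x1234, vm_a)))  # 0x9234
```

Both virtual machines reference address 0x0010, yet each reaches a different physical location, and the offset within the page is carried over unchanged.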
The VMM must therefore translate addresses in the commands to the channels into physical addresses. "For virtual machines supported with paged memory mapping, channel program translation can be a significant source of overhead" [Goldberg, R. 1974].

Implications for multiprogramming
Virtual machine monitors and the virtual machines they create go rather beyond the modest requirements of a simple multiprogramming system.
It is clear, however, that if it can be shown that a machine architecture can support a virtual machine monitor, it can support a multiprogrammed system. At the least, each job could be "run" on a separate virtual machine.
Practically that would not be necessary; this point, however, emphasizes that machine virtualizability is a sufficient condition for support of a multiprogrammed system.

Hardware requirements
Popek and Goldberg's paper goes on to define precisely the requirements a machine must fulfill in order to be capable of supporting a VMM.

Definition: third generation machine
Popek and Goldberg's theorems regarding virtualizability apply specifically to third generation machines, so it would be wise to review their definition of this class of computer systems:

"The processor is a conventional one with two modes of operation, supervisor and user. In supervisor mode, the complete instruction repertoire is available to the processor. In user mode, it is not. Memory addressing is done relative to the contents of a relocation register. The instruction set consists of the usual complement of instructions for doing arithmetic, testing, branching, moving data in memory, and the like. . . . After superficial complexities in such systems are removed, what remains is generally a primitive protection system built around a supervisor/user mode concept, and a simple memory allocation system built around a relocation-bounds system." [Popek 1974]

A brief explanation of the concept of a "relocation register" is required. In a machine with a single relocation register, all memory accesses are done relative to the contents of that register. In other words, the address specified by the CPU is added to the contents of the relocation register in order to obtain the physical address.
Different virtual machines could each refer to address zero, but, provided that the relocation register was loaded with the beginning address of the memory block allocated to each virtual machine before it was given control of the CPU (via a process switch), they would each be referencing different areas of physical storage.
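The relocation-bounds scheme amounts to one comparison and one addition per memory access. In the sketch below the block addresses, the bound value, and the name `physical_address` are hypothetical.

```python
def physical_address(cpu_address, relocation, bound):
    """Add the relocation register to the CPU address, after a bounds check."""
    if not 0 <= cpu_address < bound:
        raise MemoryError("address outside the allocated block (trap)")
    return relocation + cpu_address

# Two virtual machines, each loaded into its own block of physical storage.
# The register values are loaded by the monitor at each process switch.
vm_a = {"relocation": 0x4000, "bound": 0x2000}
vm_b = {"relocation": 0x8000, "bound": 0x2000}

print(hex(physical_address(0x0000, **vm_a)))  # 0x4000
print(hex(physical_address(0x0000, **vm_b)))  # 0x8000
```

Each virtual machine refers to address zero, but the two references land in different areas of physical storage; an address at or beyond the bound is refused.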

Instruction classification
Popek and Goldberg define three classes of instructions on the basis of their behavior. The relationship between these classes determines whether or not the machine is virtualizable.
Privileged. An instruction is privileged if and only if it "traps" when executed in the user mode and does not "trap" when executed in the supervisor mode. Briefly, a trap is an interrupt generated by the CPU when it attempts to access out-of-range memory addresses or, as in this case, attempts to execute an instruction reserved for operating system use while it is in the user mode [Popek 1974].
Sensitive. The next class of instructions is called "sensitive" instructions. There are two types of sensitivity: "control sensitivity" and "behavior sensitivity." An instruction is control sensitive if it can affect the amount of resources allocated or can change the processor mode.
Control sensitive instructions are those that can affect the control that a VMM must have over the resources of the system. An example of a control sensitive instruction would be "Mask Timer Interrupt." Execution of this instruction would enable a process to continue indefinitely, monopolizing a resource -- CPU time -- the VMM must be able to control.
Behavior sensitive instructions are those whose effect depends on the value in the relocation-bounds register or the mode (supervisor or user). In short, they do not always do the same thing, depending on their location in physical storage or the mode [Popek 1974].

Virtualizability theorem
Theorem 1 in Popek and Goldberg's paper provides a criterion, based upon the above instruction classification, for determining whether a specific machine is virtualizable:

"For any conventional third generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions." [Popek 1974]

It should be pointed out that the converse -- if any sensitive instruction is not privileged, a VMM cannot be constructed -- does not necessarily hold. Popek and Goldberg indicate that it may be possible to work around certain types of deficiencies in an ad hoc manner to implement a VMM on a machine not quite satisfying the requirements of the theorem.
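The criterion of Theorem 1 reduces to a subset test on two sets of instructions. The instruction names below belong to a hypothetical machine and are assumptions made for illustration, as is the function name `satisfies_theorem_1`.

```python
def satisfies_theorem_1(sensitive, privileged):
    """Return True if every sensitive instruction is also privileged."""
    return set(sensitive) <= set(privileged)

def offending(sensitive, privileged):
    """List the sensitive instructions that would not trap in user mode."""
    return sorted(set(sensitive) - set(privileged))

# Hypothetical machine in which every sensitive instruction traps in user mode.
good_sensitive = {"LOAD_RELOCATION", "MASK_TIMER_INTERRUPT", "SET_MODE"}
good_privileged = {"LOAD_RELOCATION", "MASK_TIMER_INTERRUPT", "SET_MODE", "HALT"}

# Hypothetical machine in which one sensitive instruction does not trap.
bad_privileged = {"LOAD_RELOCATION", "SET_MODE", "HALT"}

print(satisfies_theorem_1(good_sensitive, good_privileged))  # True
print(satisfies_theorem_1(good_sensitive, bad_privileged))   # False
print(offending(good_sensitive, bad_privileged))             # ['MASK_TIMER_INTERRUPT']
```

A result of False means only that the theorem does not apply; as noted above, it does not by itself prove that no VMM can be constructed.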

Recursive virtualizability theorem
Since an important purpose of the VMM is to permit more than one body of privileged code to run on a single machine simultaneously, an interesting question can be asked: Is it possible to run a virtual machine monitor as a user program under a virtual machine monitor? Put another way, since a virtual machine generated by a VMM is supposed to be an efficient duplicate of the real machine, can another (or a copy of the same) VMM run on that virtual machine? A computer whose hardware permits this operation is called recursively virtualizable.

Theorem 2 in Popek and Goldberg specifies requirements for recursive virtualizability:

"A conventional third generation computer is recursively virtualizable if it is: (a) virtualizable and (b) a VMM without any timing dependencies can be constructed for it." [Popek 1974]

The restriction on timing dependencies arises from the equivalence property described earlier.
The virtual machine is equivalent to the real machine with two exceptions: timing and resource availability.
If a VMM includes time-sensitive code, it almost certainly will not perform on a VM as if it were running on the bare machine; therefore, the timing restriction must be included in Theorem 2 above.
The resource availability exception, incidentally, implies that indefinitely repeated recursive virtualization will result in the virtual machines at each successive level being smaller and smaller, until there is insufficient main storage available to continue the recursion.
Recursive virtualization, as might be intuitively realized, can grossly increase the overhead on the machine, and is of little practical value.

3 Summary
The concept of virtualizability is a useful one, not only for its primary benefits, but also because it is a sufficient condition for support of a multiprogrammed operating system. A theorem has been discussed that enables one to determine whether a machine is virtualizable on the basis of a defined classification of the instruction set. This will be of significant use in the following chapters.

CHAPTER 4 A TYPICAL MICROPROCESSOR: INTEL 8080
The Intel 8080 is a typical microprocessor in good supply.
It is among the earliest microprocessors to be designed, and has been sold in sufficient quantities to have brought the price down to below twelve dollars for a single microprocessor.
Most other microprocessors, having been designed after the 8080, have at least its capabilities, so it would be reasonable to suppose that if a virtualizable machine architecture can be constructed using the 8080, it should be possible to do so with most other microprocessors as well.
The Intel 8080, therefore, is the microprocessor that will be examined as to its virtualizability, and shall be the device upon which the architecture in Chapter 5 will be based.

period or less, instruction length will vary from four to eighteen clock periods (2.0 to 9.0 microseconds); slow memory will increase the number of clock periods required for a machine cycle and, therefore, the instruction execution time [Osborne 1975].

The data lines are also used to output data from the CPU or input it from the device interface. The same lines used to specify a memory address are also used to specify an "I/O Port" number; these I/O Port numbers are eight bits wide (a range of 0-255 decimal) and are duplicated on the high-order and low-order 8 bits of the address lines.
Naturally, the status during these operations differs from memory operations; there are status lines that indicate an input or output operation.

Interrupt system
The 8080 may be interrupted between instructions, and there is a line into the CPU that indicates whether an interrupt is pending (INT). Provided interrupts have not been disabled within the CPU (by execution of a disable interrupt (DI) instruction), this line is checked by the CPU between each instruction and, if the line is "high" (binary value 1), the CPU will cease obtaining instructions from memory and output a status indication that the interrupt has been acknowledged. It is then the responsibility of the interrupting I/O interface to "jam" an instruction operation code onto the data lines, where the CPU expects to find it.
This instruction will in most cases be a "restart" instruction -- a one-byte "call" to an implicit address.

OUT (Output).
The Output instruction (OUT) is also sensitive because of two normal uses. The obvious use of an OUT instruction is to output data to an output device. There is nothing to prevent a process from printing on any terminal it wishes -- any I/O interface could be addressed.
As I/O devices are system resources, OUT is control-sensitive.
The OUT instruction has another function in many microcomputer systems -- it is used to write-protect blocks of memory. An I/O port is implemented on each memory board, and an OUT instruction addressing that port is used to set a block of memory to write-protect status (read but no write) or to unprotected status (reads and writes permitted). An imperfect or hostile process could use this function of the OUT instruction to unprotect other processes' assigned memory and destroy its contents. This is another function that qualifies the OUT instruction as control-sensitive.

IN (Input).
While the reason is not as obvious as for the OUT instruction, the IN instruction is also control sensitive. The status indication made available by an input interface to indicate that a character has been received and is available to be read from the data port lasts only until an IN instruction reads the character from that data port.
If a process performs an IN instruction from a data port assigned to a second process, that second process may miss a character.
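The lost-character problem can be sketched with a toy model of an input interface. The class `InputPort` and its methods are hypothetical and do not describe any particular interface chip; the only behavior taken from the text is that reading the data port clears the status indication.

```python
class InputPort:
    """Hypothetical serial input interface with a 'character ready' status bit."""
    def __init__(self):
        self.ready = False
        self.data = None

    def receive(self, ch):
        # Device side: a character arrives and the status bit is raised.
        self.data, self.ready = ch, True

    def read_data(self):
        # Models IN from the data port: reading clears the status bit.
        self.ready = False
        return self.data

port = InputPort()          # assigned by the operating system to process B
port.receive("x")
stolen = port.read_data()   # process A executes IN on B's port first
seen_by_b = port.ready      # B polls the status and finds nothing waiting
print(stolen, seen_by_b)    # x False
```

Because nothing in the 8080 prevents process A from executing the IN instruction, process B never learns that the character arrived.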
Other possible problems
Obviously, on the basis of its non-privileged sensitive instructions, the 8080 does not satisfy the requirements of Popek and Goldberg's theorem on virtualizability.
The same problem indicates that it will be practically impossible to

Current multi-user systems
It would therefore appear that the Intel 8080 is badly suited for a multiprogrammed operating system. There are, however, some "time-sharing" systems based on the 8080 or a similar processor available. These systems were investigated to determine the method used to implement multi-user capabilities.
Cromemco system
The time-sharing system vended by Cromemco, Inc., a manufacturer of an S-100 bus system based on the Zilog Z-80 microprocessor (a CPU upwardly software compatible with the 8080), is constructed as a typical multiprogrammed system.
An operating system is resident and controls CPU time allocated to each process; a timer-generated interrupt initiates the process switches.
Mr. Brian G. Job, Sales Manager at Cromemco, provided the following information regarding their system.
The Cromemco system uses a memory bank system to alleviate the problem of the limited address range.
As is the case in essentially all S-100 bus memory boards, a Cromemco memory board has a switch bank to set the address range to which the board will respond. Unlike other boards, however, a Cromemco board also has a set of eight switches that allow the board to be assigned to one or more of eight "memory banks." If only switch 6 is on, for example, that board is assigned to bank 6 only. If both switches 2 and 4 are on, that board is shared by banks 2 and 4 -- it "belongs" to both banks.

A given board will respond to a memory operation initiated by the CPU only if the specified address is within the set address range of the board and the board is "bank enabled." An output port is implemented on each board -- all boards have the same port number. In order to enable the boards belonging to bank 2, for example, the CPU must execute an OUT instruction to that port number with bit 2 in the accumulator on (1).

Each user has his own bank of memory, which can coexist with other banks in the same address range, since only one bank will be enabled at any one time.
Each user can therefore have up to 64K of memory (if 64K of memory was assigned to that bank when the memory boards were set up).
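The bank-enable rule described above can be modeled in a few lines. The class `Board`, its fields, and the address ranges are hypothetical; the only behavior taken from the description is that every board listens to the same port and is enabled when the accumulator has a bit on for one of the banks to which the board belongs.

```python
class Board:
    """Hypothetical model of one bank-switched memory board."""
    def __init__(self, start, end, banks):
        self.start, self.end = start, end
        self.switches = sum(1 << b for b in banks)  # eight bank switches as a mask
        self.enabled = False

    def out(self, accumulator):
        # Every board sees the same OUT; it is enabled if any of its banks is selected.
        self.enabled = bool(self.switches & accumulator)

    def responds(self, address):
        return self.enabled and self.start <= address <= self.end

# Two boards occupying the same address range but assigned to different banks.
board_a = Board(0x0000, 0x3FFF, banks=[6])
board_b = Board(0x0000, 0x3FFF, banks=[2, 4])

for board in (board_a, board_b):
    board.out(1 << 2)        # OUT with bit 2 of the accumulator on

print(board_a.responds(0x1000), board_b.responds(0x1000))  # False True
```

The same sketch also shows the weakness discussed next: `out` can be executed by any program, so nothing prevents a user process from enabling a bank that belongs to someone else.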
This would seem to be a good way of protecting the memory allocated to one process from disruption by another process.
The Z-80 has an instruction set somewhat expanded from that of the 8080, but there are still no privileged instructions. The Z-80 is therefore no better than the 8080 in regard to its capability to support a true multiprogramming system.
How, then, is the Cromemco time-sharing system supported by the hardware?
The answer is that it is not. Interrupts instruction or, perhaps, turn other banks of memory on and alter their contents.
Cromemco does not limit the user to use of this time-sharing BASIC. As the operating system has been designed as a multiprogramming system, the "owner" of the system can allow anything, including machine language programs, to be run.
In this case, Cromemco warns, there is no way to guarantee that a malfunctioning process will not disrupt other users and the operating system itself.
In short, this is a return to many of the stringent conditions mentioned in Chapter 1 that guaranteed that a multiprogramming operating system could run on any machine. Unfortunately, they were impractical when they were discussed, and it must be concluded that the Cromemco time-sharing system is largely impractical for the same reason. Cromemco made a step forward with the

ShackShare system
Mr. Connor also provided information on his own timesharing system, called "ShackShare." He indicated that the system was developed primarily for use by his own programmers.
The need for the system was dictated by a demand for multi-user business applications.
ShackShare uses a memory bank system much like the

CHAPTER 5 PROPOSED MACHINE ARCHITECTURE
The material presented in Chapter 4 would seem to make an 8080-based true multiprogramming system an impracticality. The prime difficulty is apparently a lack of any privileged instruction set whatsoever; "instruction protection," as described earlier, is non-existent. As there is no "supervisor mode" of operation (or perhaps it would be better to say no "user mode"), it is obvious that there cannot be any instructions whose use is limited to the operating system. In short, there is no hardware distinction between the environment in which the operating system runs and the environment in which the user processes run.

Supervisory computer concept
There is a way, however, in which this distinction can be created. This method will be discussed now.
In the design of operating systems, it has been suggested that visualizing a multiprogrammed operating system and the user programs as a collection of processes being run on logically separate computers can be useful in clarifying the design of the operating system software [Gaines 1972].
It has also been often expressed that the software/hardware boundary, especially in operating systems, has been shifting in the direction of implementing more of the central operating system as hardware functions [Tanenbaum 1976] [Shaw 1974], and that, in fact, "hardware and software are logically equivalent" [Tanenbaum 1976].
It has already been determined that the 8080 can provide not even the minimal support required for a multiprogrammed system; the "software/hardware boundary" has been moved as far as possible toward software, and yet it is not enough.
If one 8080 will not support a multiprogrammed system, perhaps two should be tried.
Dedicated CPU for operating system
The concept of a "supervisory" computer has been discussed in relation to multiprocessing systems for some time [Shaw 1974] [Gagliardi 1975] [Baer 1976]. An architecture which employs this concept, where one processor has control over the others, is called an "asymmetric multiprocessor"; the type of processor control is referred to as "fixed." This distinguishes it from the architecture in which all processors are equal, have access to the operating system code and schedule themselves; this architecture is called a "symmetric multiprocessor" and the type of control referred to as "floating" [Baer 1976].
A recent book on microprocessors describes a cellular computer in which each node consists of two microprocessors: one handles a single user's program, and the other oversees communications with other nodes and operating system routines [Rao 1978].
The effect on virtualizability, however, has apparently not been addressed. By placing two microprocessors in a master-slave relationship, two hardware "modes" have effectively been created. The mode is either "master" or "slave," depending on the processor on which the software is running.
This architecture moves a distinction normally made in the instruction set of a single CPU into the structure of the machine.
This master/slave arrangement of processors is a central concept of the architecture being proposed. The master processor executes only operating system code. As the master processor is in control of the real resources of the machine, this provision is essential.

One or more CPUs for user processes
The slave processor executes, generally, user programs. It is probable that many areas of the operating system could be executed by the slave processor, but, since there is a processor dedicated to operating system functions, it is likely that very little of this will be necessary.
This is a boundary that can be moved if necessary to maximize throughput of the system.
There is no reason why there cannot be more than one slave processor. Increasing this number would make the system fit the definition of a multiprocessing system introduced in Chapter 2, which required that more than one processor be used on user processes.
The master-slave concept implies that one processor is strongly under the control of the other. Means must be provided in the architecture to implement this control. For example, the master must be able to interrupt the slave.
Since we know that a user could mask off interrupts, the master must be able to interrupt the slave even when the slave's interrupts are masked! This effectively would make Disable Interrupts (DI) a privileged instruction, since only the master processor would have the ability to truly mask interrupts.
As it is desirable for the multiprocessing system to multiprogram, all slave CPUs should have access to a common main storage. Otherwise, each process would have to be moved into a different processor's "local" memory for each slice of its execution time, producing excessive overhead.
The master processor should have access to the common main storage as well, as it will occasionally have to examine register save areas, etc., as well as perform I/O from common memory.

Input/Output
The master processor, having control over real machine resources, must have control over I/O devices. All I/O interfaces will be on the master processor's bus.

Moreover, what memory there is is subject to "fragmentation." This is a phenomenon that is common to multiprogrammed systems using static memory allocation. The operating system is occupying the bottom 4K of memory, leaving 60K unused and available for user programs.
At B, the first three programs have been loaded into memory and are active (in execution). The first program to be loaded required 16K and was loaded from the 4K point in memory (just above the operating system) to the 20K point. The second program, which required 12K, has been loaded from 20K to 32K. Over a period of time, storage can become fragmented into a "checkerboard of unused (and often unusable) 'holes'" [Shaw 1974].

In a static allocation system, there are really only two ways out of this problem. One is to cease accepting jobs and run until all jobs in memory have terminated; the memory is then as it was after initialization and the process of fragmentation can begin again. The other method consists of moving active programs around in memory to "compact" the unused memory into one block. Compaction generates significant overhead; in addition, instruction formats, addressing modes and register usage must allow moving of code after execution has commenced. Intel 8080 machine code is not amenable to this activity, partially because of the lack of relocation aids.
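Fragmentation and compaction can be illustrated with a simple memory map. The operating system and the first two programs use the sizes given above; the third program (20K, loaded at 32K) and the order of termination are assumptions made for illustration, as are the function names.

```python
K = 1024
TOTAL = 64 * K

def holes(blocks):
    """Return (start, size) for every unused gap between allocated blocks."""
    gaps, cursor = [], 0
    for start, size, _ in sorted(blocks):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = start + size
    if cursor < TOTAL:
        gaps.append((cursor, TOTAL - cursor))
    return gaps

def compact(blocks):
    """Slide every block down so that all free memory forms one hole at the top."""
    moved, cursor = [], 0
    for start, size, name in sorted(blocks):
        moved.append((cursor, size, name))
        cursor += size
    return moved

blocks = [(0, 4 * K, "OS"), (4 * K, 16 * K, "P1"),
          (20 * K, 12 * K, "P2"), (32 * K, 20 * K, "P3")]
blocks = [b for b in blocks if b[2] != "P1"]   # the first program terminates

before = [(s // K, n // K) for s, n in holes(blocks)]
after = [(s // K, n // K) for s, n in holes(compact(blocks))]
print(before)  # [(4, 16), (52, 12)]
print(after)   # [(36, 28)]
```

Before compaction 28K is free, but a program needing more than 16K cannot be loaded; after compaction the same 28K is one usable block, at the cost of moving two active programs.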

Memory system development
What is required is a memory system architecture which relieves the limitation of 64K for the entire system, makes relocation unnecessary, and allows the assignment of whatever memory is available to any process that needs memory at any address. Each bank can contain up to 64K of memory, so the entire system can contain up to 512K, a respectable amount for a multi-user system. An output port is implemented on the board --the port number is the same for every board in the system. To make a bank active (enable it to respond when an address in its assigned range appears on the address bus), it is only necessary to perform an OUT instruction to that port, specifying (on the data lines) the bank number.
Boards not assigned to that bank will be disabled and boards assigned to that bank will be enabled.
Bank assignment. Several alterations will be made to the Cromemco concept. For reasons that will become apparent later, there will be only four banks. The bank "membership" will be set via an OUT instruction to the port on the board containing the memory block, rather than by means of switches. The port number will not be the same for all boards, but will be different for each.
A new row of eight switches will be used to set the "block number"; this will be identical to the port number of the I/O port on the board.
To summarize the operation of the banking, an OUT instruction to a particular board will be able to set the bank membership of that board only. Note that nothing has been mentioned about turning banks "on" and "off," as in the Cromemco system. Four bus lines will be indicators of which bank is being accessed for each memory operation. If, for example, a memory read is being performed from location 5BA6 (hexadecimal) and there is a board assigned to the address range 5000 to 5FFF whose bank membership is banks 0 and 2, the board will respond, putting the contents of that memory location on the data bus, if either bank line 0 or 2 is high.
The motivation for this operation will become clear soon.
Addressing. The mechanism for addressing a block of memory will also be altered. At present, the address range of a typical microcomputer memory board is set via a row of switches. If it is desired to be able to address the board on 4K boundaries, four switches (to set the first hexadecimal digit of the address) are required. If the switches are set for a hexadecimal C (binary 1100) on a 4K board, for example, the board will respond to addresses in the range C000 through CFFF. This arrangement is more flexible than in early machines, where the address was designed right into the basic memory structure.
It is necessary, however, to make the addressing mechanism a good deal more flexible.
There is no reason why the address range of a board has to be manually fixed in any way. If the operating system software can at any time specify the address range of a given memory board, it will facilitate memory allocation and use in ways that will be discussed later. This concept is not unprecedented; the idea of including a "page comparator" on a memory module has been advanced to serve the needs of advanced distributed computer architectures [Anderson 1975]. The application of the concept in conjunction with a memory banking system is, however, believed to be novel.
The combination of a dynamically reconfigurable memory bank system with dynamically readdressable memory blocks forms the basis of the "Dynamic Memory Banking" concept.
As shall be shown later, this memory architecture has characteristics that will enable it to fulfill the desired requirements; it will relieve the 64K limitation on memory resources, make relocation unnecessary and allow the assignment of whatever memory is available to any process that needs memory at any address.
This section will define the functions of each of the major components, or "modules," of the system. Following this, examples sufficient to provide an intuitive understanding of system operation will be presented.
It is believed that this approach will result in understanding superior to that which would result if operation of the system was discussed prior to adequately defining hardware functions.
The specifications which follow are not meant to completely define the hardware involved. Problems such as providing proper circuit timing are left to the implementor.
Module is circuitry designed to allow access to the memory system by all processors and to implement the control the master processor must have over the slave processor(s) and memory bank configuration and addressing. Where the detail is appropriate, the identity of signals being passed between modules is specified. Although omitted from the diagram for purposes of clarity, a common internal clock is used by all components of the system.

Master Processor and bus
The Master Processor and its bus greatly resemble a standard 8080-based S-100 bus microcomputer available from several manufacturers today.
Only one SPM is shown. The architecture being described, however, provides for three SPMs; the two not shown are connected to the M&PCM in the same manner as the one shown.
The following description of an SPM therefore applies to all SPMs in the system.
The SPM is a circuit board containing an 8080 and additional circuitry necessary to provide status signals, addresses and data to the M&PCM and to receive command signals and data from the M&PCM.
The sixteen address lines from the 8080 are buffered (isolated and strengthened) and "sent" to the M&PCM. The eight data lines are tri-state buffered and "sent" to the M&PCM.
A "tri-state buffer" is intuitively a device that allows a signal to either control a line with a high (1) or low (0) value, or have no effect on the line. This indication will provide the timing necessary for the M&PCM to "jam" in an instruction other than the one that would normally be fetched from the program code in memory. In this architecture, incidentally, all MMs are intended to have unique module numbers --an I/O output to a given port should affect no more than one MM.
Each MM has associated with it an 8-bit "address/bank latch"; a latch can be visualized as a very small memory.
The latch associated with the MM is used to hold the current address to which the MM will respond (as a 4-bit value equal to the high-order hexadecimal digit of the address) and also the current bank membership (as four bits). The value is set by means of an I/O output operation from the M&PCM.
The high-order four bits in the latch will represent the address and the low-order four bits the bank membership. The contents of the latch are used to determine whether the MM is "selected," or active, for the memory operation taking place. The design of the MM does not differ in many respects from the design of a typical S-100 bus memory board on the market today. What is novel in the MM concept is making the bank membership and, particularly, the addressing dynamically alterable. This frees main storage from many of the constraints of a "uniquely addressed" memory architecture.
Different processes can literally occupy the same address space at the same time. The advantages of this architecture shall be discussed later in this chapter. The correspondence between the number of processors and the number of physical memory banks (four of each) is no coincidence.
Each memory bank serves a specific processor.
MMs to be accessed by SPMs 1, 2 and 3 must be assigned to banks 1, 2 and 3 respectively. MMs assigned to bank 0 can be accessed by the Master Processor. A given MM may be assigned to one or more processes at a given point in time.
In fact, a MM may be assigned to no physical bank for most of the time.
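The selection rule implied by the address/bank latch can be sketched as follows. This is only an illustration of the behavior described above, not a hardware specification; the function names are hypothetical, and the assumption that bit n of the low-order four bits stands for bank n is not stated in the text.

```python
def make_latch(address_digit, banks):
    """Pack a latch value: high-order four bits are the address digit,
    low-order four bits are the bank membership (assumed: bit n = bank n)."""
    membership = 0
    for bank in banks:
        membership |= 1 << bank
    return (address_digit << 4) | membership


def mm_selected(latch, address, bank_line):
    """True if an MM with this latch responds to a 16-bit address
    while the given bank line (0-3) is high."""
    address_match = (latch >> 4) == (address >> 12)
    bank_match = bool(latch & 0x0F & (1 << bank_line))
    return address_match and bank_match


# The board from the earlier example: range 5000-5FFF, member of banks 0 and 2.
latch = make_latch(0x5, {0, 2})
```

Under these assumptions the board responds to a read of 5BA6 when bank line 0 or 2 is high, and stays silent for bank lines 1 and 3 or for any address outside 5000 to 5FFF. An MM whose low-order four bits are all zero belongs to no physical bank and never responds.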
In order to clarify the effect of the Dynamic Memory
It is recognized that some of the operations discussed below will require more than one clock period to accomplish, both because of their complexity and because of the occasional necessity to "wait" until specific conditions are fulfilled.
[Table: input status port bit assignments, including "Memory error," "SPM suspended" and unused bits]
One input data port will contain the "port number" for an I/O operation initiated by the SPM that has been trapped by the M&PCM. Obviously, this data port will contain valid information only when Bit 1 or Bit 2 of the input status port is on. The other data port will contain (for OUT instructions executed by the SPM) the data to be output.
The generation of "IN operation pending," "OUT operation pending," "Memory error" and "SPM suspended" signals will be described later. At that time, the latched address is put on the memory system address bus, the bank line corresponding to the processor making the request is made high, and, after a pause to ensure signal stability, OUTPUT-SYNC is made high.
The MM whose address/bank latch matches properly with the high-order four bits of the address bus and the bank bus obtains the contents of the required byte of memory and, after OUTPUT-SYNC goes high, puts that byte on the data bus and makes READY high and keeps it high until OUTPUT-SYNC drops.
This completes the operation as far as the MM is concerned. The M&PCM waits until READY goes high, pauses to ensure signal stability, and then passes the contents of the memory data bus through to the data bus of the requesting SPM and makes the READY signal to the SPM high. This is the normal completion of the operation for the system. If, however, the READY signal on the memory bus never goes high, an error condition exists. Either the addressed MM has malfunctioned, or the SPM has attempted to read from memory that was not assigned to its bank. If the READY signal does not go high by the end of the clock period, the M&PCM turns on the "Memory Error" (Bit 4) and "SPM suspended" (Bit 5) bits in the input status port on the Master Processor bus, creating an interrupt in the process, and makes the high-order eight bits of the memory address available to the Master Processor at the "port number" data port. The READY line to the SPM is left low pending action by the operating system running on the Master Processor.
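The outcome of a memory read from an SPM, including the error case, can be modelled in a few lines. This is a behavioral sketch only: signal timing (OUTPUT-SYNC, READY) is omitted, the class and function names are hypothetical, and it is assumed that bits are numbered from 0 at the low-order end and that bit n of the latch's low-order four bits stands for bank n.

```python
MEMORY_ERROR = 1 << 4    # "Memory Error" (Bit 4), assuming bit 0 is the low-order bit
SPM_SUSPENDED = 1 << 5   # "SPM suspended" (Bit 5)


class MemoryModule:
    """A 4K MM with its 8-bit address/bank latch."""

    def __init__(self, latch):
        self.latch = latch
        self.data = bytearray(4096)

    def responds(self, address, bank):
        return (self.latch >> 4) == (address >> 12) and bool(self.latch & (1 << bank))


def spm_read(modules, bank, address, ports):
    """Return the byte read, or None if no MM raised READY.

    On failure the status and "port number" ports seen by the
    Master Processor are updated as described in the text."""
    for mm in modules:
        if mm.responds(address, bank):
            return mm.data[address & 0x0FFF]
    ports["status"] |= MEMORY_ERROR | SPM_SUSPENDED
    ports["port_number"] = address >> 8   # high-order eight bits of the address
    return None


# A process on SPM 1 (bank 1) holding address space 0000-5FFF.
modules = [MemoryModule((digit << 4) | 0b0010) for digit in range(6)]
modules[5].data[0xBA6] = 0x3E
ports = {"status": 0, "port_number": 0}
good = spm_read(modules, 1, 0x5BA6, ports)
bad = spm_read(modules, 1, 0x8048, ports)
```

The second read addresses memory that no MM in bank 1 covers, so nothing responds and the Master Processor finds both error bits set and the value 80 (hexadecimal) at the "port number" data port.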
Memory write from SPM. This operation is very similar to the memory read described above; therefore, only the differences will be described. The source of the data is, of course, the SPM. The 8080 does not put out the data onto the bus until the second half of the clock cycle. When this finally occurs, the contents will be immediately sent direct to the data bus of the memory system, provided, of course, that higher priority memory operations have not delayed the operation. If a delay has occurred, the data bus will be latched by the M&PCM for use as soon as the operation can proceed. In any case, the OUTPUT-SYNC line of the memory bus is not made high until the data from the SPM has been put on the data bus of the memory system and has been allowed to stabilize. Needless to say, it is not necessary for any data to be passed back to the SPM.
Simultaneous commands: Release SPM and Suspend SPM at instruction boundary. If simultaneous "Release SPM" and "Suspend SPM at instruction boundary" commands are given to the M&PCM, the "Release SPM" will take precedence until M1 goes low (off). When M1 again goes high for the fetch of the following instruction, the SPM will again be suspended.
This "lock-step" mode of SPM operation is required for proper functioning of process switches as described in Chapter 6.

Examples of operation
The preceding discussion has defined the operational behavior of the system and, in particular, the M&PCM. What has actually been described is a set of hardware "primitive

IN instruction trap. It is clear by now that IN and OUT instructions in user processes must be "trapped" and performed by the Master Processor. As IN requires more complex activity, it will be described; the OUT operation is performed in a similar but simpler way.
The operation of the IN instruction trap has been described above. It shall therefore be used as a starting point. When an SPM executes an IN instruction, the operation is trapped and the Master Processor is interrupted by the M&PCM. This interrupt creates a call to one of the restart locations in low memory; the particular purpose of the routine at that location is to service interrupts from the M&PCM. Depending on the design of the M&PCM, the particular restart generated may implicitly provide the identity of the SPM for which service is required; if this is not done, the interrupt routine must first determine (by inputting from each status port) which status port generated the interrupt, i.e., which SPM requires service. Once finding the port and determining that it is an "IN operation pending" that requires service, the Master Processor "reads" the I/O port number specified by the SPM from the "Port Number" port on the M&PCM. This port number will probably have to be "mapped" to another number. For example, a user program may be written for use of a terminal that can be addressed at port 3B (hexadecimal). When the program is actually being executed, however, the user may be at a terminal addressed at 6F. It is the responsibility of the operating system running on the Master Processor to maintain a port-to-port correspondence list and to use it to map a port number in a user program to a physical port number. The Master Processor then performs an IN operation from the physical port and an OUT operation to the data port appropriate to the SPM being serviced on the M&PCM. The Master Processor then performs a "Release SPM" primitive operation.
Create interrupt to SPM. It is often necessary for the operating system to interrupt a user process running on an SPM.
There are two possible reasons. One is that the process has received its quantum of processor time and must be suspended for a process switch. The other is that the user process itself may be interrupt driven, and requires interrupts for its proper operation.
In general, "simulation" of an interrupt to a user process proceeds as follows.
If it is desired not to interrupt a process while interrupts are disabled, the Master it did not, the operation code of the desired restart instruction is sent to the M&PCM data port by the Master Processor, which then issues a combination "Jam Data" and "Release SPM" command. The M&PCM, as described earlier, will then route the restart byte to the SPM data bus and make READY high.
An option in the design of the M&PCM is presented here. It is possible that certain interrupt-driven user processes may be "difficult" to interrupt using the above procedure.
If the process operates the great preponderance of the time in interrupts disabled mode, and only enables interrupts for a short period, it is conceivable that the Master Processor could miss the "window" during which interrupts are enabled. Note, of course, that this problem does not affect the capability of the Master Processor to interrupt an SPM for a process switch; interrupts can be created to an SPM whether or not interrupts are disabled.
The problem arises only in regard to interrupts "simulated" to the user process. This would ensure that the user process operated exactly as if its processor was receiving the interrupt itself.

Characteristics of the architecture
In light of the behavioral operation of the system, it is possible to discuss the characteristics of the machine whose architecture has been specified.

Virtualizability
One might at first ask whether the machine is virtualizable. In Chapter 4, it was noted that three instructions in the 8080 instruction set are sensitive. Disable Interrupts (DI), I/O Output (OUT) and I/O Input (IN) can each affect the enforcement of operating system allocation decisions.
The proposed architecture has made those instructions "privileged" by restricting their effect on the system. DI may be executed, and may in fact be used for its original purpose by a user testing interrupt-driven software. It does not, however, prevent the Master Processor from actually interrupting the process to enforce its allocation of CPU time.
The DI could therefore be said to be "simulated" by the system with ultimate efficiency: it is simulated in the same amount of time it would take to execute it! IN and OUT instructions have also been made privileged by ensuring they are trapped by the system and simulated by the Master Processor.
The simulation is of course not as efficient as the "simulation" of the DI instruction, nor does it have to be in order to allow the machine to be virtualizable. As discussed in Chapter 3, the essential characteristics of a "third generation machine" are a dual (supervisor-user) operation mode and addressing done relative to a "relocation" register. The dual operation mode has effectively been created as described above. The requirement for a "relocation" register must be interpreted. The purpose of the relocation register is to define the address area allocated to the user process and to provide a means for the process to run as if it were running relative to location zero, irrespective of where in main storage it actually exists. The hardware in the architecture that has been described effectively simulates a relocation register without any degradation in performance whatsoever. The "relocation register" is actually represented in the address/bank latches in the MMs.
The "relocation" of memory accesses, instead of taking place in the addressing circuitry of the machine, actually takes place in the MMs. A user has no way of even attempting to access memory assigned to another user, as it would be in another bank of MMs.
If he attempts to read from or write to memory not assigned to his process, no MM will respond and a memory error condition will be detected and trapped by the M&PCM; an example of this would be a process allocated memory from 0000 to 5FFF attempting to read from location 8048.
In fact, the "relocation register" simulation employed in this architecture is significantly more powerful than a simple register. For example, it is not necessary to assign a user process a contiguous address space. Some 8080 programs available today use memory in a couple of non-contiguous pieces, 0000-3FFF and D000-FFFF for example.
This architecture allows the system to assign real memory to only those areas of address space where memory is actually required.
It would seem that the requirements for virtualizability have been fulfilled. The machine should be able to support a VMM. It can therefore support a multiprogramming system.
The architecture also presents other advantages, not only to systems implemented via a VMM but also to processes running under a simple multiprogramming operating system.

Relocation unnecessary
The problems of program relocation are effectively bypassed in this architecture. Since, in most cases, a process will be assigned only one block of contiguous addresses, n is roughly equivalent to the number of processes p. As, on the average, half of the 4K memory block will be wasted for each instance of internal fragmentation, the amount of memory wasted due to internal fragmentation will be roughly 2pK.
Although this waste is not considered serious (and becomes less and less serious with falling memory costs), there are ways to reduce the effect of internal fragmentation.
Both methods described below involve using all or some MMs with smaller amounts of memory.
All MMs could have less memory --1K for example.
The waste due to each instance of internal fragmentation would then average .5K, one quarter of the value in the proposed architecture. This, of course, would require that capabilities for addressing MMs on 1K boundaries be provided, resulting in either a reduction in the number of processors in the system or a more inefficient procedure for altering address/bank membership of MMs. This method would also require many more MMs to make up a block of memory for use by a process.
The other method is to provide, for example, a few MMs of 3K, 2K and 1K capacity, which the operating system would assign as "tail end" MMs for processes. This would require a more sophisticated operating system, and, it is believed, would not be worth the additional hardware cost; it may well cost less to allocate a full 4K MM where only 1K is required than to provide the extra special-purpose MMs.
In addition, these special-purpose MMs could not be combined into a single contiguous block due to addressing restrictions; they would therefore not satisfy the very desirable concept of being able to consider a block of memory as simply a block of memory.
The conclusion is that it is best to ignore the relatively small waste due to internal fragmentation.
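The arithmetic behind the 2pK estimate can be checked with a short sketch. The function names and the sample figures (a 9K program, six processes) are hypothetical; the 4K module size and the "half a module wasted on average" assumption are taken from the discussion above.

```python
import math

MODULE_K = 4  # capacity of one MM in the proposed architecture, in K


def modules_needed(program_k, module_k=MODULE_K):
    """Number of MMs required to hold a program of the given size."""
    return math.ceil(program_k / module_k)


def wasted_k(program_k, module_k=MODULE_K):
    """Memory left unused in the last MM assigned to the program."""
    return modules_needed(program_k, module_k) * module_k - program_k


def expected_total_waste_k(processes, module_k=MODULE_K):
    """Average waste: half a module per process (one instance each)."""
    return processes * module_k / 2
```

A 9K program occupies three 4K MMs and wastes 3K. For six processes the expected waste is 12K with 4K MMs (that is, 2pK with p = 6) and 3K with 1K MMs, one quarter as much, in agreement with the figures quoted above.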

Number of processes
It should be emphasized that the number of processes that may exist simultaneously in main memory is not limited to the number of physical memory banks, as it was in the Cromemco system. While there can only be four physical banks of memory in the machine, there can be any number of logical banks. When a process initially enters the system, the required memory is assigned by the operating system from a "free list" of MMs. A table maintained by the operating system lists the logical banks of MMs assigned to each process with their respective address assignments, much as illustrated in Figure 4 in Section 5.3. When a process is to receive CPU time, the operating system looks up the identity of the MMs in the logical bank and assigns the proper address and bank membership (appropriate to the particular SPM that will be running it for this quantum of CPU time) to each of them. When a process is not actually running on an SPM, the MMs assigned to it are not assigned to any physical bank.
The number of processes that can be contained within main storage simultaneously is limited only by the total amount of memory available.
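The bookkeeping this implies for the operating system might look roughly like the following sketch. It is not a specification: the class, its method names and the sixteen-module configuration are hypothetical, the latch layout follows the earlier description (address digit in the high-order four bits, bank membership in the low-order four bits, with bit n assumed to mean bank n), and the dictionary of latch values merely stands in for the OUT instructions the Master Processor would issue.

```python
class BankManager:
    """Hypothetical operating system bookkeeping for logical banks of MMs."""

    def __init__(self, module_count):
        self.free_list = list(range(module_count))  # module numbers of unassigned MMs
        self.process_table = {}  # process -> {address digit: module number}
        self.latches = {}        # module number -> latch value last output to it

    def admit(self, process, address_digits):
        """Assign one MM from the free list to each 4K block the process needs."""
        if len(address_digits) > len(self.free_list):
            raise MemoryError("not enough free MMs")
        self.process_table[process] = {
            digit: self.free_list.pop(0) for digit in address_digits
        }
        self.unbank(process)

    def bank(self, process, spm):
        """Make the logical bank a physical one for the SPM about to run it."""
        for digit, module in self.process_table[process].items():
            self.latches[module] = (digit << 4) | (1 << spm)

    def unbank(self, process):
        """Keep the address assignment but remove all bank membership."""
        for digit, module in self.process_table[process].items():
            self.latches[module] = digit << 4

    def terminate(self, process):
        """Return the MMs of a finished process to the free list."""
        self.unbank(process)
        self.free_list.extend(self.process_table.pop(process).values())
```

With sixteen MMs, five processes of 8K each can be resident at once even though only three SPM banks exist; only the process currently running on an SPM has its MMs banked.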

Storage protection
Protection of memory assigned to one process from the actions of another process is implicit in the system architecture.
As in the Cromemco and ShackShare systems described in Chapter 4, there is no way for a process to even "see" (address) the memory assigned to other processes.
Hence, there is no way in which a memory write of any kind can affect memory not assigned to the process. Unlike the Cromemco and ShackShare systems, there is no way for a user process to switch banks by itself and "get" to memory owned by another process. Storage protection is therefore complete.
A byproduct of this complete storage protection is a high degree of privacy and security between processes in the system. Virtualizable architectures have for some time been an area of interest for designers of "secure" systems [Goldberg 1974].

Fault-tolerance
Another area in which this architecture has strength is tolerance to system faults.
In contrast to conventional systems, in which any fault effectively renders the entire system useless, this architecture is fault-tolerant in two areas: main storage and slave processors. Jack Goldberg, Karl N. Levitt and John H. Wensley, in a paper titled "An Organization for a Highly Survivable Memory," stated that "Main memory is typically the most unreliable system unit (except for mechanical peripherals), but is also the system function that benefits most from fault-tolerance techniques" [Goldberg, J. 1974].
In the architecture proposed above, a failing memory board can simply be removed from the list of usable boards and the system operator notified via the console.
A process might also be implemented in ROM whose function it would be to test a suspect memory board and print a diagnostic report. In addition, the master processor, for CPU time, ranging from very simple round-robin methods to sophisticated algorithms for specific aims [Shaw 1974].
A trade-off between algorithm performance and overhead has always influenced the choice of scheduler. As might be expected, the "better" the algorithm at distributing CPU time optimally, the more overhead it generates.
The proposed architecture includes a CPU (the Master Processor) dedicated to operating system functions. Any time this processor spends on more complex scheduling algorithms will not be taken from CPU time used to process user programs. The overhead cost in implementation of sophisticated scheduling algorithms is greatly reduced. It must again be pointed out, however, that running of the Central Processor does result in some degree of degradation of user processes due to memory contention.

Reduced need for synchronization
In Chapter 2, it was mentioned that multiprocessing requires a significant degree of process "synchronization" to prevent more than one processor from entering "critical sections" of code at the same time. This asymmetric multiprocessing architecture provides that only one processor (the Master Processor) may execute critical sections of operating system code. This greatly reduces the need for synchronization.

Shared memory
It was mentioned above that the complete isolation between processes provided by this architecture is an advantage where privacy and security are required. Some processes, however, may have a valid need to share storage.
It is desirable to be able to provide such a capability for those processes requiring it.

Read/Write memory modules.
There is no reason why the operating system could not assign an individual MM to more than one logical --indeed, more than one physical --bank. Provision of this capability may make the process table and its use more complex, but that is purely an operating system software problem --the hardware is fully capable.
In this regard, it should be mentioned that in the case of two processes working on the same area of memory it may be necessary to exclude one process from using the memory during a period when the other process is changing data in that area, as the updated area may not always include valid data during the update [Liskov 1972] [Shaw 1974]. This exclusion must be the basic responsibility of the processes involved, but it may be possible to implement is required in main storage, a little-used page in main memory is written back to its location on the auxiliary device, and the needed page is read in in its place.
Various algorithms are used to determine which page will be eliminated from main storage.
A key characteristic of a virtual memory system is the need for the system to "map" an address in virtual memory to an equivalent address in real memory. In addition, if the required address does not exist in main memory at the time, action must be taken to initiate a page swap. The detection of the fact that the required page is not in memory is termed a "page fault." Once a page is in real memory, an address translation mechanism must be invoked on period was a memory operation by each processor, the system could support two processors before its memory service capacity was exhausted.
An examination of the clock period:memory operation column in Appendix A shows that the ratio varies between 3.2 and 11.0. The majority of instructions have a ratio value between 3.33 and 4.0. Four processors would therefore seem to be the maximum that the memory system can practically support in its present form. In fact, the lowest priority processor can expect significant degradation due to memory contention.
The conclusion is that the system can support four processors; adding more would probably not increase the throughput.
The system is "memory speed bound." It might be noted at this point that any time the Master Processor is halted, awaiting an interrupt, is time that the SPMs will enjoy better memory performance. Much work has been done in the area of operating system design; there are many concepts upon which this piece of software may be built. This chapter will not attempt to provide detailed specifications for such software, nor will it even attempt to determine which of the many operating system concepts is most appropriate to this architecture.
There are, however, certain aspects of required operating system functions peculiar to this architecture. This chapter will describe these aspects and attempt to suggest methods by which they may be implemented in the operating system design.
An initial decision must be made between a VMM and an "ordinary" multiprogramming system.
Due to machine virtualizability and the other hardware characteristics discussed in Chapter 5, either type of software can be supported. A multiprogrammed operating system fulfills the aims of a multi-user system, which was the primary motivation for the study. This thesis will limit itself to examining the pertinent aspects of a multiprogrammed operating system as they apply to this machine. VMM specifications will be left for future work.

Required Functions
All multiprogrammed systems must have certain basic functions dealing with resource allocation and enforcement of allocations.
Operating system routines that allocate resources are known as "allocators"; "when the resource is an active unit such as a central processor or data channel, the allocator is usually called a scheduler" [Shaw 1974].
Schedule SPMs
The operating system will require a scheduler for the SPMs. This scheduler may be simple or complex in concept, simply rotating around the active processes in main storage or taking into account such things as job priority,
The system must maintain process lists that specify the MMs (module numbers and corresponding address ranges) and I/O resources assigned to each process. These lists may also carry externally generated process information, such as job priority, and internally generated process history, such as CPU time used. In short, the process list for each process should contain all information required by the operating system to properly service the job.

Process switches
If an appearance of simultaneity in process execution is to be maintained, each process must be given quanta of CPU time on an SPM regularly. After a quantum of time has been allowed, the process must be interrupted, register contents stored, and a new process given control. the address zero MM for the process to be run, and issues a "Release SPM" command. Housekeeping chores that must now be done by the operating system include transferring the saved register contents from the old process to the appropriate area in the process list, identifying the next process to be run, and transferring the register contents for that process to the "new process" RAM area on the ROM-RAM MM, completing the setting up for the next process switch.
It is estimated that overhead due to a process switch should never exceed two milliseconds, and should in most cases be less than a millisecond. The primary source of variation in process switch time is the number of MMs that must be unbanked and banked.

Run input/output
A major function of the operating system is running input/output for processes. This includes allocation of the I/O resources and the actual performance of the I/O operations. Port numbers specified by the user process must be mapped to the physical port. In some cases, status bits may differ between physical and "virtual" I/O devices; these must be mapped also. Spooling of output to printers, etc., may also be implemented in the operating system.
Isolation of user processes from physical I/O also enables the operating system to "translate" I/O requests for a particular device to another device with entirely different operational characteristics. This resembles the ability of virtual machines to "retrofit" new features [Goldberg 1974]; a user process can thereby benefit from improved devices that were not available when the process was written.
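The port mapping performed when the Master Processor services a trapped IN or OUT instruction can be sketched as follows. The function and parameter names are hypothetical, as is the assignment of "IN operation pending" to Bit 1 and "OUT operation pending" to Bit 2 of the status port (the text says only that one of these two bits is on); physical I/O is represented by two caller-supplied functions rather than real ports.

```python
IN_PENDING = 1 << 1    # assumed position of "IN operation pending"
OUT_PENDING = 1 << 2   # assumed position of "OUT operation pending"


def service_io_trap(status, virtual_port, out_data, port_map, read_port, write_port):
    """Perform the trapped I/O operation on behalf of a user process.

    Returns the byte to be placed on the M&PCM data port for the SPM
    (for IN), or None (for OUT). The caller would then issue the
    "Release SPM" primitive."""
    physical_port = port_map[virtual_port]  # port-to-port correspondence list
    if status & IN_PENDING:
        return read_port(physical_port)
    if status & OUT_PENDING:
        write_port(physical_port, out_data)
    return None


# The example from Chapter 5: the program addresses its terminal at port 3B,
# but the user is actually at the terminal addressed at 6F.
port_map = {0x3B: 0x6F}
written = []
byte_in = service_io_trap(IN_PENDING, 0x3B, None, port_map,
                          read_port=lambda port: {0x6F: ord("A")}[port],
                          write_port=lambda port, data: written.append((port, data)))
service_io_trap(OUT_PENDING, 0x3B, ord("B"), port_map,
                read_port=lambda port: 0,
                write_port=lambda port, data: written.append((port, data)))
```

Because the correspondence list is consulted on every operation, translating requests to a different device amounts to replacing the functions behind a physical port; the user process is unaffected.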
Maintain input buffer for processes. Input on the 8080 is normally done character by character. An input character must be "serviced" before the next character arrives at the input interface, or the first character will be lost.
Two keys can be pressed in rapid succession on a keyboard, and unless sufficiently frequent quanta of processing time can be allocated to each process, some method must be provided in the operating system to handle this problem. It is recommended that an input buffer be maintained for each process in the form of a FIFO queue. This should also result in more efficient use of CPU time by the process.
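A per-process FIFO input buffer of the kind recommended here could be as simple as the sketch below. The class name and the 64-character capacity are hypothetical; the text does not specify a buffer size or an overflow policy, so refusing further characters when the buffer is full is an assumption.

```python
from collections import deque


class InputBuffer:
    """FIFO queue of input characters held by the operating system for one process."""

    def __init__(self, capacity=64):  # capacity is an assumed figure
        self.capacity = capacity
        self.queue = deque()

    def put(self, byte):
        """Called when a character arrives at the physical interface.
        Returns False if the buffer is full and the character is dropped."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(byte)
        return True

    def get(self):
        """Called when the process's IN instruction is serviced.
        Returns the oldest character, or None if nothing is waiting."""
        return self.queue.popleft() if self.queue else None
```

Characters typed in rapid succession are queued by the Master Processor as they arrive and handed to the process in order whenever its input requests are serviced, so none is lost between quanta.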
Detect end-of-process
There must be some way for the operating system to detect the fact that an end-of-process has occurred. This could be implemented via an output operation to a special port that the operating system would recognize as an end-of-process signal. Alternately, the process could halt with interrupts disabled; as there is no way for the process to restart itself in this position, it is a clear indication that it is done. Both of these methods are appropriate for batch processing. A user on a terminal could indicate termination of his session by turning off his terminal (which the interface could be wired to recognize), or could issue a "signoff" command that the Master Processor would interpret as termination.
When a process terminates, the operating system must return all MMs assigned to it to the free list and delete the process list from its queue of active processes. History information concerning the operation of the process may optionally be posted to a job accounting file.
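The termination steps above can be sketched as follows. The names (`ModuleAllocator`, `free_list`, `accounting`) and the contents of the accounting record are hypothetical; the thesis does not prescribe a data layout.

```python
class ModuleAllocator:
    """Tracks which Memory Modules (MMs) belong to which active process."""

    def __init__(self, free_modules):
        self.free_list = list(free_modules)
        self.active = {}      # pid -> list of MMs assigned to that process
        self.accounting = []  # optional job accounting records

    def start(self, pid, module_count):
        """Assign MMs from the free list to a new process."""
        if module_count > len(self.free_list):
            raise RuntimeError("not enough free Memory Modules")
        self.active[pid] = [self.free_list.pop() for _ in range(module_count)]

    def terminate(self, pid, quanta_used=0):
        """Return the process's MMs to the free list, remove it from the
        set of active processes, and post an accounting record."""
        modules = self.active.pop(pid)
        self.free_list.extend(modules)
        self.accounting.append((pid, len(modules), quanta_used))
```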

Optional functions
Although the functions discussed above are those essential to operation of the system, there are several optional functions which may be implemented.

Simulate interrupts to processes
If it is desirable to allow user processes to be interrupt driven, provision must be made for simulation of interrupts to user processes. As discussed in Chapter 5, the hardware to allow this is already provided. Since the particular restart operation code to be used is under the control of the Master Processor, there is no need for the priority of interrupts simulated to the process to be the same as the priority of interrupts for the Master Processor.
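To illustrate how the Master Processor could choose the restart code independently of its own interrupt priorities, the sketch below encodes the 8080 RST n instruction (binary 11NNN111, which calls address 8n) and keeps a per-process table of which restart number each interrupt source should produce. The table class and the source names are hypothetical.

```python
def rst_opcode(n: int) -> int:
    """Return the one-byte 8080 RST n opcode (binary 11NNN111)."""
    if not 0 <= n <= 7:
        raise ValueError("restart number must be in 0..7")
    return 0xC7 | (n << 3)


def rst_vector(n: int) -> int:
    """Return the address that RST n calls (8 * n)."""
    if not 0 <= n <= 7:
        raise ValueError("restart number must be in 0..7")
    return 8 * n


class SimulatedInterrupts:
    """Hypothetical table: which restart each source presents to a process."""

    def __init__(self):
        self._restart_for = {}  # (pid, source name) -> restart number

    def assign(self, pid, source, restart_number):
        rst_opcode(restart_number)  # validates the restart number
        self._restart_for[(pid, source)] = restart_number

    def opcode_for(self, pid, source):
        """Opcode to present to the process when `source` interrupts."""
        return rst_opcode(self._restart_for[(pid, source)])
```

Two processes can thus see the same physical source at different restart levels, since the assignment is made per process.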

Provide extended machine interface
It was stated in Chapter 2 that a multiprogrammed operating system provides "extended machines" for user processes. The extended machine the operating system being discussed will provide is actually very much like a virtual machine. In fact, it "looks" to user processes like "bare hardware"; the bare hardware it simulates is a uniprocessor 8080-based microcomputer system. This extended machine is rather inefficient where I/O is concerned. The reason for this concerns the way in which an 8080-based system normally does I/O. Figure 5 presents a flowchart for a typical input "driver." The status port is continuously interrogated until the bit indicating a character has been received turns on.
The character is then read from the data port. This makes it difficult to make use of the time a process is waiting for input. A multiprogrammed operating system on a large machine simply notes the fact that the process is waiting for input, and blocks the process from further CPU time until that input arrives.
There are complications which make this procedure impractical in processing current 8080 software. Some processes test the status port every so often to see if the operator wishes to interrupt processing. If no character has been received, processing continues. So it is impossible to suspend a process until input is received simply because it requests information from a status port.
A solution may be to implement a more extended machine for use by processes that have been written for it.
Such processes would signal via an I/O operation to a special port that they wish to be suspended until input is available. Actually, this more extended machine would suffice for running all processes, as it would retain the ability to handle the currently standard I/O methods. Processes that use the more extended features would improve concurrency in use of system resources.
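A sketch of this "more extended machine" follows. The port number `SUSPEND_PORT` and all class and method names are assumptions made for illustration; the thesis does not assign a specific port. The point shown is that a conventional status-port poll never blocks a process, while an output to the special port does.

```python
SUSPEND_PORT = 0xFF  # hypothetical port number reserved for "suspend me"


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.blocked = False
        self.pending_input = []


class ExtendedMachine:
    """Operating system view of process I/O requests (hypothetical)."""

    def __init__(self):
        self.processes = {}

    def add(self, pid):
        self.processes[pid] = Process(pid)

    def handle_output(self, pid, port):
        """Return True if the operating system consumed the request."""
        proc = self.processes[pid]
        if port == SUSPEND_PORT:
            # Block only if no input is already waiting for the process.
            if not proc.pending_input:
                proc.blocked = True
            return True
        return False  # an ordinary port, handled by normal port mapping

    def handle_status_read(self, pid):
        """Conventional polling: report readiness, never block."""
        return 1 if self.processes[pid].pending_input else 0

    def deliver_input(self, pid, char):
        """Input arrived for the process; make it runnable again."""
        proc = self.processes[pid]
        proc.pending_input.append(char)
        proc.blocked = False

    def runnable(self):
        return [pid for pid, p in self.processes.items() if not p.blocked]
```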

Output buffers
In discussing essential features of the operating system, provision of input buffers for processes was suggested.
While not essential for proper operation of the system, output buffers could also be established. As this would eliminate waiting for slow I/O devices, a process could do significantly more work in its quantum of processing time. If a buffer became full, the process could be blocked from further CPU time until the buffer was almost empty, enabling other processes to make use of more CPU time. This feature could be implemented even if the "more extended machine" described above were not.
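The blocking behavior described above amounts to a buffer with a "full" threshold and an "almost empty" threshold. A minimal sketch, with hypothetical names and arbitrarily chosen default sizes:

```python
from collections import deque


class OutputBuffer:
    """Per-process output queue drained by a slow device (hypothetical)."""

    def __init__(self, capacity=8, low_water=2):
        self.capacity = capacity
        self.low_water = low_water  # "almost empty" threshold
        self._queue = deque()
        self.blocked = False        # True while the process may not run

    def write(self, byte):
        """Process side. Returns False if the process is blocked."""
        if self.blocked:
            return False
        self._queue.append(byte)
        if len(self._queue) >= self.capacity:
            self.blocked = True     # buffer full: deny further CPU time
        return True

    def drain_one(self):
        """Device side. Returns the next byte to output, or None."""
        if not self._queue:
            return None
        byte = self._queue.popleft()
        if self.blocked and len(self._queue) <= self.low_water:
            self.blocked = False    # almost empty: process may run again
        return byte
```

Unblocking at a low threshold rather than as soon as one slot frees up avoids switching the process in for only a byte or two of work.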
Multi-bank memory for single processes
Thus far, discussion has been limited to handling of processes requiring no more than 64K of memory. It may be desirable to implement a means by which user processes can request more than one logical bank of storage and initiate (through the operating system) bank switches. This, although simple in concept, would require a more complex process list and, therefore, more overhead in process switches. It should therefore be looked at carefully to ensure the benefits are sufficient; planned applications will determine the necessity for this feature.

Summary
This chapter has discussed required and optional aspects of multiprogrammed operating systems for the proposed machine architecture. Other than those aspects discussed above, the specific design concept is not limited, and the implementor may feel free to use any design philosophy he desires.
CHAPTER 7 CONCLUSIONS AND EXTENSIONS

Conclusions
This study is merely the initial step in the realization of a microprocessor-based virtualizable machine.
Much work, including design and construction of the hardware and implementation of an appropriate operating system, remains to be done. Nonetheless, certain conclusions can be drawn at this point.
It has been shown that two distinctly non-virtualizable microprocessors can be combined in such a way as to produce a machine whose total architecture is virtualizable. This is the central conclusion of this thesis, and is clearly interesting from a theoretical viewpoint. A conclusion of practical interest is that true multiprogrammed systems are possible and practical with currently available 8-bit microprocessors. The estimated typical system cost computed in Chapter 5 is very reasonable considering the processing power and multi-user capabilities provided.
The Dynamic Memory Banking System, an integral component of the proposed architecture, is based on concepts applicable to larger machines as well. A possible topic for future investigation is the set of design alterations required to implement the system on a 16-bit minicomputer or a larger "full-scale" machine.

Recommendations for future work
An initial study of this type naturally leaves many actions and additional investigations for the future. Some of this further work is obvious, and some is not.
Much work remains to be done before a working prototype is realized. Working from the rough specifications in this study, the implementor must finalize the design of the functional components. The machine must then be physically constructed and debugged. It should be mentioned that several components of this architecture --the Master Processor and bus, Slave Processor Modules and Memory Modules --are identical or very similar in design to current S-100 bus microcomputer products. Much design and debugging time can be avoided if advantage is taken of this similarity by simply modifying current product designs. This similarity is actually an advantage of the architecture, as it should significantly reduce the implementation time.
Operating system software must be designed and tested. It may be worthwhile to write a simulator for the machine that will run on an 8080-based system or even a large system. This would facilitate testing of the software prior to completion of the hardware.
As discussed briefly in Chapter 5, the system architecture is well suited for implementation of virtual memory. This area would be primarily a software task, as the essential hardware is already provided for in the basic architecture.
As described in Chapter 5, memory contention is the limiting factor in the processing power of the system.
There are at least two ways in which memory contention can be reduced. The operating system code, since it is executed by the Master Processor, could reside in memory on the Master Processor's bus, rather than in the Dynamic Memory Banking system. The great preponderance of memory accesses by the Master Processor would therefore not go through the M&PCM, resulting in significantly less contention being experienced by the SPMs. This method would require that the Master Processor have the capability to "turn off" the memory access "connection" from its bus to the M&PCM.
The second method involves the use of faster memory.
If Memory Modules capable of responding in one half of a clock period were used, the memory system could perform two memory operations in the time it formerly took to do one, and two processors could therefore access the memory system during each clock period. This would also result in significantly reduced memory contention; in fact, the number of SPMs in the system could easily be increased to four or five, if an increase in the size of MMs to 8K or 16K were permissible.
It should be noted that this method would require a more sophisticated M&PCM design, as well as more expensive memory components. One limitation should be discussed.
As mentioned earlier, an 8080 performing a memory write does not make the data to be written available until the second half of the clock period. A memory write would therefore not compete for a memory operation until the second half of the clock period during which it was initiated.
A potentially valuable extension to the proposed architecture for certain applications would be the inclusion of more than one variety of microprocessor on the SPMs. For example, each SPM could contain an 8080, a Motorola 6800 and a MOS 6502. A process switch would then also have to select the processor to be used during the next quantum of processing time.
As microprocessors are comparatively inexpensive, this would not increase the cost of the system significantly.
Such a system would be capable of running object code for any of the installed microprocessors.
Although not at all addressed in this study, it is conceivable that a more sophisticated Memory