Artemis Operating System


Artemis

The "Original" Artemis Project

Early in 1995 I'd been doing some performance tuning work on VSTa across a variety of PC hardware platforms. As part of that I'd been reading just about every paper I could find on microkernel design and optimisation (and there were a very large number of them!), looking for new ways to improve performance.

At about the same time I also started to develop some code for embedded control systems (386SX based) and felt that VSTa might not offer the best solution for my needs in terms of stability, speed and size (whilst it was much smaller than, say, Linux in a minimal configuration, it was still too large for my liking). Many of the ideas I had learned from VSTa seemed ideal as a basis for a new portable embedded OS, so, armed with my new understanding of performance issues, I started to design the core of a new system, "Artemis".

Artemis and VSTa had many similarities; however, there were significant differences. The most radical of these concerned the way IPC, security and device drivers were implemented. VSTa embedded the idea of security and "user IDs" within its kernel, whereas Artemis simply required that a given process be either a user or a supervisor process. Only processes with supervisor privilege could perform device I/O or handle interrupts. Interrupt handling was also completely different from VSTa (where the kernel delivered interrupts to a process as a message on a specific port): Artemis ran some "trusted" user code at kernel privilege.
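The two-level privilege model can be illustrated with a small Python simulation. This is a minimal sketch under stated assumptions, not Artemis code: the `Kernel`, `sys_outb`, `sys_attach_irq` and `dispatch_irq` names are hypothetical, I/O ports are modelled as a dictionary, and "running trusted code at kernel privilege" is modelled simply as the kernel calling the registered handler directly from its interrupt path instead of posting a message.

```python
from enum import Enum

# Hypothetical model of the Artemis privilege rules; not Artemis source.
class Privilege(Enum):
    USER = "user"
    SUPERVISOR = "supervisor"

class PrivilegeError(Exception):
    pass

class Process:
    def __init__(self, pid, privilege):
        self.pid = pid
        self.privilege = privilege

class Kernel:
    def __init__(self):
        self.io_ports = {}      # simulated I/O port space
        self.irq_handlers = {}  # irq number -> trusted handler

    def _require_supervisor(self, proc, what):
        # The only security check: user vs supervisor (no user IDs in the kernel).
        if proc.privilege is not Privilege.SUPERVISOR:
            raise PrivilegeError(f"pid {proc.pid}: {what} needs supervisor privilege")

    def sys_outb(self, proc, port, value):
        self._require_supervisor(proc, "device I/O")
        self.io_ports[port] = value & 0xFF

    def sys_attach_irq(self, proc, irq, handler):
        self._require_supervisor(proc, "interrupt handling")
        self.irq_handlers[irq] = handler

    def dispatch_irq(self, irq):
        # Call the registered handler directly from the interrupt path,
        # rather than queueing a message on a port as VSTa did.
        handler = self.irq_handlers.get(irq)
        if handler is not None:
            handler(irq)
```

Keeping the check to a single user/supervisor bit means the kernel never needs to consult per-user credentials on the I/O path.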

IPC was the single greatest difference. Whilst Artemis still used the idea of message passing (and connections were still established between client and server processes), the exchange of actual messages was designed to improve "usual case" performance. VSTa allowed a single message header and up to 4 buffers of data to be passed per IPC. The buffers were memory mapped into the recipient process's address space, and effectively 3 types of system call were performed for each transaction between client and server: "send", "receive" and "reply". Artemis separated the handling of a received message's header from the handling of its data, which was restricted to a single buffer (the most common case, and one that allowed a number of performance optimisations). This reduced the cost of forwarding messages from one server process to another (useful for fast hierarchies of servers), since any buffer operations could be deferred until they were actually needed. Whilst the Artemis approach allowed great flexibility, it did require more system calls to achieve any given task; the trade-off was that these system calls were much cheaper. Artemis provided 6 types of system call to VSTa's 3: "request", "receive", "peek", "poke", "relay" and "reply" (relay being something that VSTa could only achieve by having a server process send a previously received message on to another server). The final difference between VSTa's and Artemis' message handling was that Artemis copied message buffer data between processes' address spaces, whereas VSTa used memory mapping; on the x86 architecture, mapping was the more expensive option!
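The split between header and buffer handling can be modelled roughly as follows. This sketch assumes semantics the text does not spell out: that `peek` copies bytes out of the client's buffer, `poke` copies bytes into it, and `relay` hands the whole message to another server without touching the buffer. The `Message` and `Server` types and the `bytes_copied` counter are hypothetical, added to make the cost of each call visible; blocking and address-space checks are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the six Artemis IPC calls; semantics are assumed.
@dataclass
class Message:
    header: dict                  # small; all that receive() looks at
    buffer: bytearray             # stays in the client's address space
    status: Optional[int] = None  # filled in by reply()
    bytes_copied: int = 0         # counts buffer bytes actually copied

class Server:
    def __init__(self, name):
        self.name = name
        self.queue = deque()      # messages waiting to be received

def request(server, header, buffer):
    """Client: send a header plus one buffer (a real client would block)."""
    msg = Message(dict(header), buffer)
    server.queue.append(msg)
    return msg

def receive(server):
    """Server: take the next message; the buffer is not touched."""
    return server.queue.popleft()

def peek(msg, offset, length):
    """Copy bytes out of the client's buffer on demand."""
    data = bytes(msg.buffer[offset:offset + length])
    msg.bytes_copied += len(data)
    return data

def poke(msg, offset, data):
    """Copy bytes into the client's buffer on demand."""
    msg.buffer[offset:offset + len(data)] = data
    msg.bytes_copied += len(data)

def relay(msg, other_server):
    """Forward the message to another server without copying the buffer."""
    other_server.queue.append(msg)

def reply(msg, status):
    """Complete the transaction, releasing the client."""
    msg.status = status
```

In this model, forwarding through a chain of servers costs one `relay` per hop, and the buffer is copied only by whichever server finally calls `peek` or `poke`.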

Despite these differences, there were a number of similarities:

Virtual memory management
Processes and heavyweight threads
"Port" based inter-process communications (IPC).
"Standard" connection-oriented IPC interface for all servers
User-space device drivers
Fine-grained kernel locking (spinlocks and semaphores)
Per-process event (fault/signal) handling.

Whilst this particular variant of Artemis never reached anything like the same level of maturity as VSTa (in terms of features and porting of large applications/utilities), some simple IPC tests indicated that it was around 3 times faster than VSTa at null IPC and significantly faster still for non-null IPC. The addition of swap handling and more security checks would have reduced the benefits, but I believe it would still be significantly quicker. In fairness to VSTa, some of the optimisations used in Artemis could be implemented in VSTa too, and on non-x86 systems, where memory mapping operations may well be cheaper, the gap would close!

Artemis-II

Once the original Artemis had reached a level sufficient to support embedded applications, it seemed worthwhile to try some slightly different implementation ideas. The main aim of the changes was to simplify the kernel whilst continuing to improve performance. Many of the new ideas were influenced by papers on developments such as the MIT Exokernel and QSSL's QNX (amongst others):

Processes and threads become domains and contexts
"Domain" based message communication (not connection-oriented any more)
Kernel redesigned to avoid the use of semaphores.
Huge simplification of the VM subsystem.
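A rough model of connectionless, domain-addressed messaging follows. It assumes, purely for illustration, that a message names its destination by domain ID and that the kernel tags it with the sender's ID, so a server can reply without holding any connection state. `Kernel`, `Domain`, `send` and `receive` are hypothetical names, and contexts are not modelled.

```python
from collections import deque

# Hypothetical model of domain-based (connectionless) messaging.
class Domain:
    def __init__(self, domain_id):
        self.domain_id = domain_id
        self.inbox = deque()   # (sender_id, payload) pairs

class Kernel:
    def __init__(self):
        self.domains = {}      # domain id -> Domain

    def create_domain(self, domain_id):
        domain = Domain(domain_id)
        self.domains[domain_id] = domain
        return domain

    def send(self, sender, dest_id, payload):
        # No connect/open step: the destination is named directly by ID,
        # and the kernel stamps the sender's ID so the receiver can reply.
        self.domains[dest_id].inbox.append((sender.domain_id, payload))

    def receive(self, domain):
        # Non-blocking in this model: returns None when the inbox is empty.
        return domain.inbox.popleft() if domain.inbox else None
```

Dropping connections removes the per-connection bookkeeping from the kernel; the cost is that any access control must be based on the sender ID the kernel supplies.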

Whilst these changes were being implemented it became apparent that the kernel was becoming more specialised towards a particular processor architecture. In fact, the amount of processor-specific code was roughly the same as in Artemis-I, but the simplifications shrank the processor-independent code, shifting the balance towards processor-specific code.

The downside of the new developments was that the kernel code had ceased to be completely architecture independent, particularly with regard to VM. To let user-level code take advantage of VM features (for example, to use specialised paging algorithms), some of the characteristics of the system MMU were now exposed to user-level code. In most cases these details would be handled by a standard library; however, this made it harder to port the kernel to another architecture (the kernel no longer presented an abstract virtual machine).
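As an illustration of why exposing MMU details can help, the sketch below runs a second-chance ("clock") page-replacement policy at user level over raw page-table entries. The export mechanism is an assumption (the text does not describe the Artemis-II interface), and clock replacement is just one example of a "special paging algorithm"; the flag values are the standard 32-bit x86 PTE Present, Accessed and Dirty bits.

```python
# Standard 32-bit x86 page-table-entry flag bits.
PTE_PRESENT = 1 << 0
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6

class ClockPager:
    """Second-chance replacement run at user level (hypothetical interface).

    `ptes` stands in for the raw PTEs of resident pages, which the kernel is
    assumed to export; the pager is assumed to be allowed to clear Accessed.
    """

    def __init__(self, ptes):
        self.ptes = ptes
        self.hand = 0

    def choose_victim(self):
        """Return (index, needs_writeback) for the page to evict."""
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.ptes)
            if self.ptes[i] & PTE_ACCESSED:
                # Recently used: clear the bit and give it a second chance.
                self.ptes[i] &= ~PTE_ACCESSED
            else:
                return i, bool(self.ptes[i] & PTE_DIRTY)
```

Because the hardware bits are read directly, the pager needs no kernel call per page scanned, which is the kind of benefit that came at the cost of portability.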

Artemis-3

Artemis-3 abandoned any attempt to present a virtual machine and instead provided a very high-speed, CPU/system-specific kernel. Messaging was replaced by the ability to send a software interrupt to a given domain, specifying a shared memory page that could be used to communicate between the two domains (this page might contain a list of other shared pages). The objective was to provide the fastest possible way of communicating between two protected user-level tasks. In addition, other system abstractions were eliminated:

Kernel contexts (threads) replaced by user-level code.
Timer exported in hardware ticks instead of seconds and nanoseconds.
Greater control over FPU handling
CPU traps simply redirected to user-level trap handlers.
Kernel locking simplified to a single lock
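The inter-domain software interrupt can be modelled as an upcall that carries nothing but a shared page, which may itself list further shared pages. In this sketch the page layout (a little-endian 32-bit count followed by that many 32-bit page numbers), the 4 KiB page size and the `Kernel`/`soft_interrupt` names are assumptions for illustration; real page mapping is replaced by both sides indexing the same `bytearray`.

```python
import struct

PAGE_SIZE = 4096  # assumed page size

class Kernel:
    def __init__(self):
        self.pages = {}  # page number -> page contents shared by both domains

    def alloc_shared_page(self):
        page_no = len(self.pages)
        self.pages[page_no] = bytearray(PAGE_SIZE)
        return page_no

    def soft_interrupt(self, target_handler, page_no):
        # In this model the kernel copies nothing: it just upcalls the
        # target domain's handler with the shared page.
        target_handler(self.pages[page_no])

def write_page_list(page, page_numbers):
    """Store a count followed by a list of further shared page numbers."""
    count = len(page_numbers)
    struct.pack_into(f"<I{count}I", page, 0, count, *page_numbers)

def read_page_list(page):
    """Read back the list written by write_page_list."""
    (count,) = struct.unpack_from("<I", page, 0)
    return list(struct.unpack_from(f"<{count}I", page, 4))
```

Any higher-level protocol (requests, replies, bulk data) is then a user-level convention about what the shared pages contain.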

Eliminating kernel-supported contexts was a simple extension of the non-blocking I/O made possible by the inter-domain software interrupt. As timer "tick" interrupts could now be delivered to each domain, thread scheduling could be handled entirely at user level.
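Scheduling driven by delivered ticks might look like the following round-robin sketch, in which Python generators stand in for user-level threads and `on_tick` stands in for the domain's timer-tick handler. The `UserScheduler` name, the quantum measured in ticks, and the "one step of work per tick" model are assumptions for illustration.

```python
from collections import deque

class UserScheduler:
    """Round-robin thread scheduling done entirely inside one domain."""

    def __init__(self, quantum_ticks):
        self.run_queue = deque()       # user-level threads; the head is running
        self.quantum = quantum_ticks   # ticks a thread may run before preemption
        self.ticks_left = quantum_ticks

    def spawn(self, thread):
        self.run_queue.append(thread)

    def on_tick(self):
        """Handler for a timer tick delivered to this domain by the kernel."""
        if not self.run_queue:
            return
        try:
            next(self.run_queue[0])    # let the current thread run one step
        except StopIteration:
            self.run_queue.popleft()   # thread finished; next one starts fresh
            self.ticks_left = self.quantum
            return
        self.ticks_left -= 1
        if self.ticks_left == 0:
            self.run_queue.rotate(-1)  # quantum used up: move thread to the back
            self.ticks_left = self.quantum
```

Since the policy lives in the domain, swapping round-robin for a priority or deadline scheme would need no kernel change in this model.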

Whilst most *nix kernels had recently striven for ever finer-grained locking (to improve SMP scalability), those locks required more and more code, slowing the overall system. With the minimalist kernel that Artemis-3 represented, no system call needed to hold a lock for more than a few tens of cycles, so extra locking complexity would not have yielded better performance (indeed, cache and memory effects could potentially have caused an overall performance loss).
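The single-lock design amounts to every system call taking one global lock for a short critical section. The sketch below models that with Python threads; `BigLockKernel` and its calls are hypothetical, and it only demonstrates the correctness side (no lost updates under concurrency), not the cycle counts discussed above.

```python
import threading

class BigLockKernel:
    """Every system call runs under one global lock (hypothetical model)."""

    def __init__(self):
        self._lock = threading.Lock()  # the single kernel lock
        self.domains = {}              # domain id -> list of posted items
        self.next_id = 0

    def sys_create_domain(self):
        with self._lock:               # short critical section
            domain_id = self.next_id
            self.next_id += 1
            self.domains[domain_id] = []
            return domain_id

    def sys_post(self, domain_id, item):
        with self._lock:
            self.domains[domain_id].append(item)
```

With critical sections this short, contention on the one lock stays low, which is the argument for not splitting it into finer-grained locks.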

Future Developments

Maintenance and enhancement work is ongoing for the original Artemis kernel, but it is unlikely that any new work will be done with either Artemis-II or Artemis-3. Many of the lessons learned with these two derivatives will, however, be incorporated into any of my future OS projects!


Last Updated: 24th September 1996 DJH
