Modern CPU architectures provide hardware support for protecting data from arbitrary code. They do this by separating trusted code from untrusted code. Trusted code has direct access to I/O ports, memory, the CPU, and special instructions. Untrusted code cannot access system resources directly; it must go through the APIs exposed by the trusted code, which verifies access rights and controls concurrent access.
To achieve this, the CPU provides four privilege levels (rings), numbered 0 to 3, where 0 is the most privileged and 3 the least. The most trusted code runs at Ring 0 and the least trusted code at Ring 3; thus kernel code runs at Ring 0 and all user code runs at Ring 3.
The suggested purpose of each ring is:
- Ring 0 – OS Kernel
- Ring 1 – OS Services
- Ring 2 – Custom Extensions
- Ring 3 – User Applications
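On x86, the ring that code is currently running at is visible in the low two bits of the code segment (CS) selector. The following is a minimal sketch of how a 16-bit segment selector splits into its fields. The function name and the two sample selector values are illustrative assumptions; the actual values depend on the OS.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return {
        "rpl": selector & 0b11,                         # bits 0-1: privilege level (ring)
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: table indicator
        "index": selector >> 3,                         # bits 3-15: descriptor index
    }


# Hypothetical selector values, chosen only for illustration.
kernel_cs = 0x08
user_cs = 0x1B

print(decode_selector(kernel_cs)["rpl"])  # 0
print(decode_selector(user_cs)["rpl"])    # 3
```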
A transition from an outer ring to an inner ring goes through a special control structure called a ‘call-gate’. The gate is a data structure held in system memory, so it cannot be modified from outside Ring 0. An attempt to reach a more privileged level without going through a gate, or with insufficient access rights, results in a general protection fault (GPF).
Rings 1 and 2 are almost never used by OS implementations, though exceptions do exist. Code at Ring 0 can access hardware directly, which is why the kernel is implemented at this ring. Running device driver code at Ring 1 was one idea, but such code would still raise a GPF whenever it used a privileged instruction, so most operating systems do not do this.
Another interesting point: when a transition is made from a less privileged ring to a more privileged one (as in a system call), the functions on both sides need to share data space, mostly on the stack. If the same stack were used, the caller could inspect that space after the return from the privileged ring, creating a security hole. For this reason each privilege level has its own stack, and a stack switch is mandatory when switching rings. The CPU automatically copies the parameter values from the old stack to the new one when the stack switch happens.
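The access-rights check for a call through a gate can be sketched as below. This is a simplified model, not the full set of checks the CPU performs (it ignores the present bit, segment types, and limits). CPL is the caller's current privilege level, RPL is the privilege level requested in the gate selector, and DPL is the privilege level stored in a descriptor; the function name is an assumption made for this example.

```python
def gate_call_allowed(cpl: int, rpl: int, gate_dpl: int, target_dpl: int) -> bool:
    """Simplified privilege check for a far CALL through a call gate.

    Lower numbers mean more privilege (0 is the kernel ring).
    """
    for level in (cpl, rpl, gate_dpl, target_dpl):
        if level not in (0, 1, 2, 3):
            raise ValueError("privilege levels must be in the range 0..3")
    # The caller must be privileged enough to use the gate at all.
    can_use_gate = max(cpl, rpl) <= gate_dpl
    # A gate may only lead to equally or more privileged code.
    target_ok = target_dpl <= cpl
    return can_use_gate and target_ok


# Ring 3 code calling Ring 0 code through a gate opened to Ring 3.
print(gate_call_allowed(cpl=3, rpl=3, gate_dpl=3, target_dpl=0))  # True
# The same call through a gate reserved for Ring 0 would fault.
print(gate_call_allowed(cpl=3, rpl=3, gate_dpl=0, target_dpl=0))  # False
```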
So what does a call-gate look like?
- Access rights information
- Segment selector for code segment of called procedure
- Offset in code segment
- Parameter count
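The four fields above fit into an 8-byte descriptor. The sketch below packs and unpacks a 32-bit call-gate descriptor following the IA-32 layout as I understand it: offset bits 15:0, segment selector, a 5-bit parameter count, an access byte (present bit, DPL, type 0xC), and offset bits 31:16. The function names and the sample values are assumptions made for illustration.

```python
import struct

CALL_GATE_32 = 0xC  # type field value for a 32-bit call gate


def pack_call_gate(selector, offset, dpl, param_count, present=True):
    """Build the 8-byte descriptor for a 32-bit call gate."""
    if not (0 <= selector <= 0xFFFF and 0 <= offset <= 0xFFFFFFFF):
        raise ValueError("selector is 16 bits, offset is 32 bits")
    if not (0 <= dpl <= 3 and 0 <= param_count <= 31):
        raise ValueError("dpl is 0..3, param_count is 0..31")
    # Access byte: P (bit 7), DPL (bits 5-6), S=0 (bit 4), type (bits 0-3).
    access = (int(present) << 7) | (dpl << 5) | CALL_GATE_32
    return struct.pack("<HHBBH", offset & 0xFFFF, selector,
                       param_count, access, offset >> 16)


def unpack_call_gate(raw):
    """Decode an 8-byte call-gate descriptor back into its fields."""
    off_lo, selector, count, access, off_hi = struct.unpack("<HHBBH", raw)
    return {
        "selector": selector,
        "offset": (off_hi << 16) | off_lo,
        "param_count": count & 0x1F,
        "dpl": (access >> 5) & 0b11,
        "present": bool(access >> 7),
        "type": access & 0xF,
    }


# Hypothetical gate: enter code segment 0x08 at offset 0x00123456,
# callable from Ring 3, copying two parameters.
raw = pack_call_gate(selector=0x08, offset=0x00123456, dpl=3, param_count=2)
print(raw.hex())
print(unpack_call_gate(raw))
```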
When a call is made from a lower to a higher privilege level, the CPU loads the new stack segment and stack pointer from the Task State Segment (TSS), a special system segment. The caller's stack segment selector and stack pointer are saved on the new stack, the specified number of parameters is copied to the new stack, and then the caller's code segment and current instruction pointer (IP) are saved.
Upon return, the saved code segment, instruction pointer, stack segment, and stack pointer are restored.
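The call and return sequence can be modelled with a short simulation. This is only a sketch under simplifying assumptions: stacks are Python lists whose last element is the top, every entry is one 32-bit word, and the return path does not model the extra adjustment a real far return makes to the caller's stack pointer. The dictionary keys (ss0, esp0, and so on) and all sample values are hypothetical.

```python
def call_through_gate(regs, caller_stack, gate, tss):
    """Model the stack switch for a call to a more privileged ring."""
    n = gate["param_count"]
    if n > len(caller_stack):
        raise ValueError("caller stack holds fewer words than the gate copies")
    # The top n words of the caller's stack are the parameters to copy.
    params = caller_stack[len(caller_stack) - n:]
    # Push order on the new stack: old SS, old ESP, parameters, old CS, old EIP.
    inner_stack = [regs["ss"], regs["esp"], *params, regs["cs"], regs["eip"]]
    inner_regs = {
        "cs": gate["selector"],   # target code segment from the gate
        "eip": gate["offset"],    # entry point from the gate
        "ss": tss["ss0"],         # inner-ring stack segment from the TSS
        "esp": tss["esp0"] - 4 * len(inner_stack),
    }
    return inner_regs, inner_stack


def return_from_gate(inner_stack, param_count):
    """Model the far return: pop EIP and CS, drop parameters, pop ESP and SS."""
    stack = list(inner_stack)
    eip = stack.pop()
    cs = stack.pop()
    for _ in range(param_count):
        stack.pop()
    esp = stack.pop()
    ss = stack.pop()
    return {"cs": cs, "eip": eip, "ss": ss, "esp": esp}


# Hypothetical register and descriptor values for a Ring 3 to Ring 0 call.
user_regs = {"cs": 0x1B, "eip": 0x00401000, "ss": 0x23, "esp": 0x7FFF0000}
user_stack = [0xAAAA, 0x1111, 0x2222]
gate = {"selector": 0x08, "offset": 0xC0001000, "param_count": 2}
tss = {"ss0": 0x10, "esp0": 0x80000000}

kernel_regs, kernel_stack = call_through_gate(user_regs, user_stack, gate, tss)
restored = return_from_gate(kernel_stack, gate["param_count"])
print(restored == user_regs)  # True
```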
The next development in CPU architecture came through virtualization: the capability to run one OS (the guest) within another (the host). This was first achieved through emulation. The hardware was virtualized and the guest OS was tweaked to run in Ring 1. Guest instructions that needed direct hardware access were not executed directly; they ran in an emulated environment under the control of the host OS, which controlled access to all resources. This carried a serious performance hit.
In the current generation of virtualization, CPUs support special instructions for virtualization. Host and guest run side by side on the same hardware, with a small piece of hypervisor code slipped in underneath. This code lets the host OS control the guests, while the guests get direct access to the hardware. To support this, a new and more privileged level was introduced, informally called Ring -1. Guest and host thus co-exist and both run at Ring 0. No code patching or binary translation is required.
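Whether a CPU offers these virtualization instructions can be checked from software. As a minimal sketch, assuming a Linux system: the kernel lists CPU feature flags in /proc/cpuinfo, where vmx indicates Intel VT-x and svm indicates AMD-V. The function name and the sample text are assumptions for this example; on a real machine you would pass in the contents of /proc/cpuinfo.

```python
def virtualization_flags(cpuinfo_text: str) -> set:
    """Return the hardware virtualization flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # vmx = Intel VT-x, svm = AMD-V
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Made-up excerpt in the style of /proc/cpuinfo, for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse tsc msr vmx sse sse2\n"
print(virtualization_flags(sample))  # {'vmx'}

# On a Linux machine:
# with open("/proc/cpuinfo") as f:
#     print(virtualization_flags(f.read()))
```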
This concept of slipping hypervisor code underneath to virtualize both guest and host was famously explored as a rootkit: http://en.wikipedia.org/wiki/Blue_Pill_(software). The approach makes it possible to intercept anything, including hardware interrupts (even the timer). Given that the guest OS runs at Ring 0, the performance impact is negligible.
However, the host OS needs to be made aware (enlightened) of the hypervisor code in order to control the guest OS.
PS: This is based on the Intel architecture; other processor architectures may have different implementations.