system call - Everything2.com

System call (often shortened to syscall) refers to a piece of code in user space making a call into the kernel to perform a service. Most operating system kernels attempt to protect themselves from the processes running on them, as well as to protect the processes from each other. In addition, to manage hardware effectively, the kernel has to mediate all access to it. The way to do this is to run each process in its own virtual machine: a private address space from which the only way to talk to the machine itself, or to other processes, is by going through the kernel. This also allows for the implementation of various security constraints, ranging from simple Unix permissions to complex Mandatory Access Control models.
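
A minimal sketch of this isolation, in Python on a Unix-like system (the function name is hypothetical): after fork, parent and child each have their own copy of memory, so a change made by the child is invisible to the parent, and the only way for the child to report the change is through a kernel-mediated channel such as a pipe. Each os function used below roughly corresponds to the system call of the same name.

```python
import os


def child_changes_are_private():
    """Return (value seen by the child, value seen by the parent) after the
    child modifies a variable that both processes started with."""
    counter = [0]
    read_fd, write_fd = os.pipe()  # kernel-managed channel between processes
    pid = os.fork()
    if pid == 0:
        # Child: runs in its own copy of the parent's address space.
        os.close(read_fd)
        counter[0] = 42
        os.write(write_fd, str(counter[0]).encode())
        os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    # Parent: its copy of counter is untouched by the child's assignment.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return child_value, counter[0]


print(child_changes_are_private())  # (42, 0)
```
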

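The details of how to perform a system call depend on the operating system and processor in use. On classic 32-bit Linux/x86, it is done by placing the system call number into one register (eax) and as many of the call's arguments as possible into the remaining general-purpose registers. Some calls, like the original mmap, have too many parameters to pass entirely in registers, in which case the arguments are placed in memory (typically on the stack) and a pointer to them is passed in a register instead. At this point, all code is still executing within the process, not the kernel. The process then raises a software interrupt (int $0x80), at which point control passes to the kernel, which handles the system call, places the result in a register, and returns to the process. Newer x86 processors also provide dedicated, faster instructions for this purpose (sysenter and syscall).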
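
The register convention described above can be modelled in a few lines of Python. This is a toy simulation, not a real system call: the register names follow the classic 32-bit Linux int $0x80 convention (eax for the number, then ebx, ecx, edx, esi and edi for arguments; later kernels also use ebp for a sixth argument), and the numbers 4 and 90 are write and the original mmap on that platform. The function name and the list standing in for user memory are illustrative assumptions.

```python
# Toy model of how a 32-bit Linux/x86 process sets up registers before
# executing "int $0x80". Nothing here actually enters the kernel.
ARG_REGISTERS = ["ebx", "ecx", "edx", "esi", "edi"]  # early kernels: five argument registers


def marshal_syscall(number, *args, memory):
    """Return the register contents for a system call.

    `memory` is a list standing in for the process's user memory. When there
    are more arguments than argument registers, the arguments are copied into
    that memory and their simulated address is passed in ebx instead.
    """
    regs = {"eax": number}  # the system call number always goes in eax
    if len(args) <= len(ARG_REGISTERS):
        regs.update(zip(ARG_REGISTERS, args))
    else:
        address = len(memory)  # simulated address of the argument block
        memory.extend(args)
        regs["ebx"] = address
    return regs


user_memory = []
# write(1, buf, 5): three arguments fit in registers.
print(marshal_syscall(4, 1, 0x8049000, 5, memory=user_memory))
# Original mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0):
# six arguments, so they are passed through memory instead.
print(marshal_syscall(90, 0, 4096, 3, 0x22, -1, 0, memory=user_memory))
print(user_memory)
```

After the trap, the kernel would read the same registers and, on return, leave its result in eax.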

System calls are significantly more expensive than ordinary function calls on most systems. The cost varies with the implementation, but 10 to 1000 times slower seems to be the usual range. For one thing, the kernel has to validate every parameter involved in the call: bugs in system call implementations are a great way for a malicious process to take over the system, and most of these bugs come down to trusting input from the caller. In addition, each call involves a switch from user mode to kernel mode and back (often loosely called a context switch), which is expensive. Finally, in many cases a large block of data is transferred between user space and kernel space (in either direction), and the kernel must make an actual copy of that data before doing anything with it, which can be quite slow. It is therefore preferable to avoid unnecessary system calls. One simple benchmark, described by W. Richard Stevens in his seminal work Advanced Programming in the Unix Environment, showed that making N system calls that each write one byte to a file costs roughly N times as much as a single system call that writes all N bytes.
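
A sketch of the kind of experiment Stevens describes, in Python: the hypothetical helper below writes the same data with one write() call per chunk and reports how many calls it made. The printed timings depend heavily on the system, and Python adds its own per-call overhead, so treat them as a rough illustration rather than a measurement of raw system call cost.

```python
import os
import tempfile
import time


def write_in_chunks(fd, data, chunk_size):
    """Write `data` to `fd` with one os.write() per chunk; return the call count."""
    view = memoryview(data)
    offset = calls = 0
    while offset < len(view):
        # os.write may write fewer bytes than asked, so advance by what it reports.
        offset += os.write(fd, view[offset:offset + chunk_size])
        calls += 1
    return calls


data = b"x" * 100_000
for chunk_size in (1, len(data)):
    with tempfile.TemporaryFile() as f:
        start = time.perf_counter()
        calls = write_in_chunks(f.fileno(), data, chunk_size)
        elapsed = time.perf_counter() - start
        print(f"chunk={chunk_size:>6}  write() calls={calls:>6}  {elapsed:.4f}s")
```

Both runs leave identical file contents; only the number of trips into the kernel differs.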

Some operating systems, such as Unix, have a fairly well-defined set of system calls, such as read, write, mmap, fork, and exec. Higher-level services (such as I/O buffering) are provided by user-space libraries that run on top of the basic system call interface. At the time of writing, Linux/x86 had fewer than 300 system calls, many of which are obsolete or obscure. One problem with system call interfaces is that, once created, they are set in stone. For example, Linux has several implementations of the stat system call, each newer one extending the old one to support larger filesystems or more functionality, but the old versions can't be removed, because doing so would break existing programs. Similarly, one could replace the time system call (which returns the current time with a granularity of one second) with gettimeofday (microsecond granularity), and that in turn could be replaced by the newer POSIX realtime function clock_gettime. The time stub in libc would still be there, but instead of invoking the time system call, it would call clock_gettime. However, because old programs are hardwired to use those particular system calls, removing them would break those programs, and backwards compatibility is a major focus of most non-toy operating systems.
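
The layering described above, where an old interface survives as a library stub on top of a newer call, can be sketched in Python using time.clock_gettime_ns (a wrapper for clock_gettime, available on Unix in Python 3.7 and later). Both functions are hypothetical stand-ins for libc stubs, not real library APIs: legacy_gettimeofday returns (seconds, microseconds) like gettimeofday, and legacy_time returns whole seconds like time.

```python
import time


def legacy_gettimeofday():
    """Hypothetical stub: gettimeofday()-style (seconds, microseconds),
    computed from clock_gettime(CLOCK_REALTIME)."""
    ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    seconds, microseconds = divmod(ns // 1000, 1_000_000)
    return seconds, microseconds


def legacy_time():
    """Hypothetical stub: time()-style whole seconds, built on the layer above."""
    seconds, _ = legacy_gettimeofday()
    return seconds


print(legacy_gettimeofday(), legacy_time())
```

Callers of the old names keep working unchanged; only the stubs' implementation moves to the newer call.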

A recent trend is to avoid creating new system calls and instead expose operations through device interfaces: programs either open a device file and write commands to it, or perform ioctl calls on the file descriptor. This approach is most common when a physical device is actually being manipulated, but it also makes it easier to extend or remove functionality later on. For example, traditional Unix systems had no "get random bytes" system call (Linux only added getrandom in version 3.17), but many provide /dev/random, a pseudo-device that supplies random bytes when it is opened and read. A program that relied on a dedicated system call would only work on systems that provided it, while any program can attempt to open and read /dev/random using the normal Unix file-handling system calls. Another advantage is that /dev/random could even be added to a system after an application had been built and installed, and the application (assuming it was well written) would start using it immediately.
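
A minimal Python sketch of the device-file approach: it uses nothing but the ordinary open, read and close calls, and simply reports failure when the device is missing, so the same program runs whether or not the system provides one. The function name is hypothetical, and the default path is /dev/urandom rather than /dev/random because on older Linux kernels reads from /dev/random could block until enough entropy had been gathered. In everyday Python code, os.urandom() is the usual way to get random bytes.

```python
import os


def read_random_bytes(count, device="/dev/urandom"):
    """Read `count` bytes from a random-number pseudo-device.

    Returns None if the device does not exist on this system.
    """
    try:
        fd = os.open(device, os.O_RDONLY)
    except FileNotFoundError:
        return None  # no such device: the caller can fall back to something else
    try:
        chunks = []
        remaining = count
        while remaining > 0:
            chunk = os.read(fd, remaining)  # may return fewer bytes than asked
            if not chunk:
                break  # end of file: the device produced no more data
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


print(read_random_bytes(16))
```
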

In summary, system calls are good. Without them, you wouldn't be able to do anything interesting (on a computer, anyway; there is always the big blue room). A fun little experiment is to play around with strace (on Linux) or truss (on Solaris and some BSDs), which show the system calls a program makes as it runs.
