List of Linux Syscalls

By Robert Oliver, Linux Hint (https://linuxhint.com/list_of_linux_syscalls/). Published Tue, 25 Feb 2020.

In this guide you’ll find a full list of Linux syscalls along with their definition, parameters, and commonly used flags.

You can combine multiple flags with a bitwise OR (|) and pass the result as the argument in question.

Some notes about this guide:

  • Calls that have long been deprecated or removed are omitted.
  • Items pertaining to outdated or infrequently used architectures (e.g. MIPS, PowerPC) are generally omitted.
  • Structures are defined only once. If a struct is mentioned but not defined in that syscall's entry, please search the document for its definition.

Source materials include man pages, kernel source, and kernel development headers.

read

Reads from a specified file using a file descriptor. Before using this call, you must first obtain a file descriptor using the open syscall. Returns the number of bytes read.

ssize_t read(int fd, void *buf, size_t count)
  • fd – file descriptor
  • buf – pointer to the buffer to fill with read contents
  • count – number of bytes to read

write

Writes to a specified file using a file descriptor. Before using this call, you must first obtain a file descriptor using the open syscall. Returns bytes written successfully.

ssize_t write(int fd, const void *buf, size_t count)
  • fd – file descriptor
  • buf – pointer to the buffer to write
  • count – number of bytes to write

open

Opens or creates a file, depending on the flags passed to the call. Returns an integer with the file descriptor.

int open(const char *pathname, int flags, mode_t mode)
  • pathname – pointer to a buffer containing the full path and filename
  • flags – integer with operation flags (see below)
  • mode – (optional) defines the permissions mode if file is to be created

open flags

  • O_APPEND – append to existing file
  • O_ASYNC – use signal-driven IO
  • O_CLOEXEC – use close-on-exec (avoid race conditions and lock contentions)
  • O_CREAT – create file if it doesn’t exist
  • O_DIRECT – bypass cache (slower)
  • O_DIRECTORY – fail if pathname isn’t a directory
  • O_DSYNC – ensure output is sent to hardware and metadata written before return
  • O_EXCL – used with O_CREAT, fail if the file already exists (ensures this call creates the file)
  • O_LARGEFILE – allows use of file sizes represented by off64_t
  • O_NOATIME – do not update access time when the file is read
  • O_NOCTTY – if pathname is a terminal device, don’t become controlling terminal
  • O_NOFOLLOW – fail if pathname is symbolic link
  • O_NONBLOCK – if possible, open file with non-blocking IO
  • O_NDELAY – same as O_NONBLOCK
  • O_PATH – open descriptor for obtaining permissions and status of a file but does not allow read/write operations
  • O_SYNC – wait for IO to complete before returning
  • O_TMPFILE – create an unnamed, unreachable (via any other open call) temporary file
  • O_TRUNC – if file exists, truncate it to zero length, discarding its contents (careful!)

close

Close a file descriptor. After successful execution, it can no longer be used to reference the file.

int close(int fd)
  • fd – file descriptor to close

stat

Returns information about a file in a structure named stat.

int stat(const char *path, struct stat *buf);
  • path – pointer to the name of the file
  • buf – pointer to the structure to receive file information

On success, the buf structure is filled with the following data:


struct stat {
    dev_t     st_dev;     /* device ID of device with file */
    ino_t     st_ino;     /* inode */
    mode_t    st_mode;    /* permission mode */
    nlink_t   st_nlink;   /* number of hard links to file */
    uid_t     st_uid;     /* owner user ID */
    gid_t     st_gid;     /* owner group ID */
    dev_t     st_rdev;    /* device ID (only if device file) */
    off_t     st_size;    /* total size (bytes) */
    blksize_t st_blksize; /* blocksize for I/O */
    blkcnt_t  st_blocks;  /* number of 512 byte blocks allocated */
    time_t    st_atime;   /* last access time */
    time_t    st_mtime;   /* last modification time */
    time_t    st_ctime;   /* last status change time */
};

fstat

Works exactly like the stat syscall except a file descriptor (fd) is provided instead of a path.

int fstat(int fd, struct stat *buf);
  • fd – file descriptor
  • buf – pointer to stat buffer (described in stat syscall)

Return data in buf is identical to the stat call.

lstat

Works exactly like the stat syscall, but if the file in question is a symbolic link, information on the link is returned rather than its target.

int lstat(const char *path, struct stat *buf);
  • path – full path to file
  • buf – pointer to stat buffer (described in stat syscall)

Return data in buf is identical to the stat call.

poll

Wait for an event to occur on one or more file descriptors.

int poll(struct pollfd *fds, nfds_t nfds, int timeout);
  • fds – pointer to an array of pollfd structures (described below)
  • nfds – number of pollfd items in the fds array
  • timeout – sets the number of milliseconds the syscall should block (zero forces poll to return immediately, a negative value blocks indefinitely)

struct pollfd {
    int   fd;         /* file descriptor */
    short events;     /* events requested for polling */
    short revents;    /* events that occurred during polling */
};
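
A minimal sketch of polling a single descriptor for readability follows. The helper name wait_readable is hypothetical; a timeout of 0 checks readiness without blocking.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, and -1 on error or if only
 * an error/hangup condition was reported. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;                          /* 0 = timeout, -1 = error */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```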

lseek

This syscall repositions the read/write offset of the associated file descriptor. Useful for setting the position to a specific location to read or write starting from that offset.

off_t lseek(int fd, off_t offset, int whence)
  • fd – file descriptor
  • offset – offset to read/write from
  • whence – specifies offset relation and seek behavior

whence flags

  • SEEK_SET – offset is the absolute offset position in the file
  • SEEK_CUR – new offset is the current offset location plus offset
  • SEEK_END – new offset is the file size plus offset
  • SEEK_DATA – set offset to next location greater than or equal to offset that contains data
  • SEEK_HOLE – set offset to next hole in file greater than or equal to offset

Returns resulting offset in bytes from the start of the file.
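
To illustrate whence, the sketch below reads the last n bytes of a file by seeking relative to the end with a negative offset. The helper name read_tail is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads the last n bytes of the file at path into out.
 * Returns bytes read, or -1 on failure (including files shorter than n). */
ssize_t read_tail(const char *path, char *out, size_t n)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    /* SEEK_END: new offset = file size + offset, so a negative offset
     * lands n bytes before the end. A result before byte 0 is an error. */
    if (lseek(fd, -(off_t)n, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }
    ssize_t got = read(fd, out, n);
    close(fd);
    return got;
}
```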

mmap

Maps files or devices into memory.

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
  • addr – location hint for mapping location in memory, otherwise, if NULL, kernel assigns address
  • length – length of the mapping
  • prot – specifies memory protection of the mapping
  • flags – control visibility of mapping with other processes
  • fd – file descriptor
  • offset – file offset

Returns a pointer to the mapped file in memory.

prot flags

  • PROT_EXEC – allows execution of mapped pages
  • PROT_READ – allows reading of mapped pages
  • PROT_WRITE – allows mapped pages to be written
  • PROT_NONE – prevents access of mapped pages

flags

  • MAP_SHARED – allows other processes to use this mapping
  • MAP_SHARED_VALIDATE – same as MAP_SHARED but ensures all flags are valid
  • MAP_PRIVATE – prevents other processes from using this mapping
  • MAP_32BIT – tells the kernel to locate mapping in the first 2 GB of the process address space
  • MAP_ANONYMOUS – lets the mapping not be backed by any file (thus ignoring fd)
  • MAP_FIXED – treats addr argument as an actual address and not a hint
  • MAP_FIXED_NOREPLACE – same as MAP_FIXED but prevents clobbering existing mapped ranges
  • MAP_GROWSDOWN – tells the kernel to expand mapping downward in memory (useful for stacks)
  • MAP_HUGETLB – forces use of huge pages in mapping
  • MAP_HUGE_2MB – use with MAP_HUGETLB to set 2 MB pages
  • MAP_HUGE_1GB – use with MAP_HUGETLB to set 1 GB pages
  • MAP_LOCKED – maps the region to be locked (similar behavior to mlock)
  • MAP_NONBLOCK – prevents read-ahead for this mapping
  • MAP_NORESERVE – prevents allocation of swap space for this mapping
  • MAP_POPULATE – tells the kernel to populate page tables for this mapping (causing read-ahead)
  • MAP_STACK – tells the kernel to allocate address suitable for use in a stack
  • MAP_UNINITIALIZED – prevents clearing of anonymous pages

mprotect

Sets or adjusts protection on a region of memory.

int mprotect(void *addr, size_t len, int prot)
  • addr – pointer to region in memory (must be page-aligned)
  • len – length of the region in bytes
  • prot – protection flag

Returns zero when successful.

prot flags

  • PROT_NONE – prevents access to memory
  • PROT_READ – allows reading of memory
  • PROT_EXEC – allows execution of memory
  • PROT_WRITE – allows memory to be modified
  • PROT_SEM – allows memory to be used in atomic operations
  • PROT_GROWSUP – sets protection mode upward (for architectures that have a stack that grows upward)
  • PROT_GROWSDOWN – sets protection mode downward (useful for stack memory)

munmap

Unmaps mapped files or devices.

int munmap(void *addr, size_t len)
  • addr – pointer to mapped address
  • len – size of mapping

Returns zero when successful.
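
The mmap, mprotect, and munmap calls are commonly used together. The sketch below, with the hypothetical helper name anon_page_demo, maps one anonymous page, writes to it, drops write permission, and unmaps it. It assumes the glibc wrappers and a page size obtained from sysconf.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS on stricter toolchains */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one anonymous page, fills it, makes it read-only, checks the
 * contents, and unmaps it. Returns 0 on success, -1 on failure. */
int anon_page_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* fd is -1 and offset is 0 because MAP_ANONYMOUS has no backing file. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);
    /* After this call, reads still work but a write would raise SIGSEGV. */
    int ok = mprotect(p, len, PROT_READ) == 0
             && p[0] == 0xAB && p[len - 1] == 0xAB;
    if (munmap(p, len) == -1)
        return -1;
    return ok ? 0 : -1;
}
```

Note that mmap reports failure with MAP_FAILED rather than NULL.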

brk

Allows for altering the program break that defines end of process’s data segment.

int brk(void *addr)
  • addr – new program break address pointer

Returns zero when successful.

rt_sigaction

Change action taken when process receives a specific signal (except SIGKILL and SIGSTOP).

int rt_sigaction(int signum, const struct sigaction *act, struct sigaction *oldact)
  • signum – signal number
  • act – structure for the new action
  • oldact – structure for the old action

struct sigaction {
    void     (*sa_handler)(int);
    void     (*sa_sigaction)(int, siginfo_t *, void *);
    sigset_t   sa_mask;
    int        sa_flags;
    void     (*sa_restorer)(void);
};

siginfo_t {
    int      si_signo;      /* signal number */
    int      si_errno;      /* errno value */
    int      si_code;       /* signal code */
    int      si_trapno;     /* trap that caused hardware signal (unused on most architectures) */
    pid_t    si_pid;        /* sending PID */
    uid_t    si_uid;        /* real UID of sending program */
    int      si_status;     /* exit value or signal */
    clock_t  si_utime;      /* user time consumed */
    clock_t  si_stime;      /* system time consumed */
    sigval_t si_value;      /* signal value */
    int      si_int;        /* POSIX.1b signal */
    void    *si_ptr;        /* POSIX.1b signal */
    int      si_overrun;    /* count of timer overrun */
    int      si_timerid;    /* timer ID */
    void    *si_addr;       /* memory location that generated fault */
    long     si_band;       /* band event */
    int      si_fd;         /* file descriptor */
    short    si_addr_lsb;   /* LSB of address */
    void    *si_lower;      /* lower bound when address violation occurred */
    void    *si_upper;      /* upper bound when address violation occurred */
    int      si_pkey;       /* protection key on PTE causing fault */
    void    *si_call_addr;  /* address of system call instruction */
    int      si_syscall;    /* number of attempted syscall */
    unsigned int si_arch;   /* arch of attempted syscall */
}

rt_sigprocmask

Retrieve and/or set the signal mask of the thread.

int sigprocmask(int how, const sigset_t *set, sigset_t *oldset)
  • how – flag to determine call behavior
  • set – new signal mask (NULL to leave unchanged)
  • oldset – previous signal mask

Returns zero upon success.

how flags

  • SIG_BLOCK – set mask to block according to set
  • SIG_UNBLOCK – set mask to allow according to set
  • SIG_SETMASK – set mask to set

rt_sigreturn

Return from signal handler and clean the stack frame.

int sigreturn(unsigned long __unused)

ioctl

Set parameters of device files.

int ioctl(int d, int request, ...)
  • d – open file descriptor of the device file
  • request – request code
  • ... – untyped pointer

Returns zero upon success in most cases.

pread64

Read from file or device starting at a specific offset.

ssize_t pread64(int fd, void *buf, size_t count, off_t offset)
  • fd – file descriptor
  • buf – pointer to read buffer
  • count – bytes to read
  • offset – offset to read from

Returns bytes read.

pwrite64

Write to file or device starting at a specific offset.

ssize_t pwrite64(int fd, const void *buf, size_t count, off_t offset)
  • fd – file descriptor
  • buf – pointer to buffer
  • count – bytes to write
  • offset – offset to start writing

Returns bytes written.
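
The sketch below shows positional I/O leaving the descriptor's own offset untouched. It uses the glibc names pread and pwrite, which map to the pread64 and pwrite64 syscalls on 64-bit Linux; the helper name patch_byte is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrites one byte at offset pos in the file behind fd and reads it
 * back, without moving the descriptor's file offset.
 * Returns the byte read back, or -1 on failure. */
int patch_byte(int fd, off_t pos, unsigned char value)
{
    assert(fd >= 0 && pos >= 0);
    if (pwrite(fd, &value, 1, pos) != 1)
        return -1;
    unsigned char check;
    if (pread(fd, &check, 1, pos) != 1)
        return -1;
    return check;
}
```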

readv

Read from file or device into multiple buffers.

ssize_t readv(int fd, const struct iovec *iov, int iovcnt)
  • fd – file descriptor
  • iov – pointer to array of iovec structures
  • iovcnt – number of buffers (described by iovec)
struct iovec {
    void  *iov_base;    /* Starting address */
    size_t iov_len;     /* Number of bytes to transfer */
};

Returns bytes read.

writev

Write to file or device from multiple buffers.

ssize_t writev(int fd, const struct iovec *iov, int iovcnt)
  • fd – file descriptor
  • iov – pointer to array of iovec structures
  • iovcnt – number of buffers (described by iovec)
struct iovec {
    void  *iov_base;    /* Starting address */
    size_t iov_len;     /* Number of bytes to transfer */
};

Returns bytes written.
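
As a sketch of scatter/gather I/O, the following sends a header and a body through a pipe with one writev call and splits them back into two buffers with one readv call. The helper name gather_scatter and the buffer contents are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes a 4-byte header and a 5-byte body with a single writev, then
 * scatters them into two buffers with a single readv.
 * Returns total bytes read, or -1 on failure. */
ssize_t gather_scatter(char header_out[4], char body_out[5])
{
    assert(header_out != NULL && body_out != NULL);
    int p[2];
    if (pipe(p) == -1)
        return -1;

    char header[4] = { 'H', 'D', 'R', ':' };
    char body[5]   = { 'h', 'e', 'l', 'l', 'o' };
    struct iovec out[2] = {
        { .iov_base = header, .iov_len = sizeof header },
        { .iov_base = body,   .iov_len = sizeof body   },
    };
    struct iovec in[2] = {
        { .iov_base = header_out, .iov_len = 4 },
        { .iov_base = body_out,   .iov_len = 5 },
    };

    ssize_t n = -1;
    if (writev(p[1], out, 2) == 9)       /* buffers are written in order */
        n = readv(p[0], in, 2);          /* buffers are filled in order */
    close(p[0]);
    close(p[1]);
    return n;
}
```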

access

Check permissions of current user for a specified file or device.

int access(const char *pathname, int mode)
  • pathname – file or device
  • mode – permissions check to perform

Returns zero on success.

pipe

Create a pipe.

int pipe(int pipefd[2])
  • pipefd – array of file descriptors with two ends of the pipe

Returns zero on success.

select

Wait for file descriptors to become ready for I/O.

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
struct timeval *timeout)
  • nfds – highest-numbered file descriptor in any of the three sets, plus 1
  • readfds – fixed buffer with list of file descriptors to wait for read access
  • writefds – fixed buffer with list of file descriptors to wait for write access
  • exceptfds – fixed buffer with list of file descriptors to wait for exceptional conditions
  • timeout – timeval structure with time to wait before returning
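
On Linux, fd_set is an opaque, fixed-size bit set that can hold descriptors numbered below FD_SETSIZE. Manipulate it only through the FD_ZERO, FD_SET, FD_CLR, and FD_ISSET macros.

struct timeval {
    long    tv_sec;         /* seconds */
    long    tv_usec;        /* microseconds */
};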

Returns number of ready file descriptors, or zero if timeout occurs.
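
A minimal sketch of waiting on one descriptor with select follows. The helper name select_readable is hypothetical; the descriptor set is built with the standard FD_ZERO, FD_SET, and FD_ISSET macros.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is readable within ms milliseconds, 0 on timeout,
 * and -1 on error. */
int select_readable(int fd, long ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE && ms >= 0);
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = ms / 1000,
                          .tv_usec = (ms % 1000) * 1000 };
    /* First argument: highest descriptor in any set, plus one. */
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                        /* 0 = timeout, -1 = error */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

Because select modifies both the sets and (on Linux) the timeout, they must be re-initialized before each call in a loop.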

sched_yield

Yield CPU time back to the kernel or other processes.

int sched_yield(void)

Returns zero on success.

mremap

Shrink or enlarge a memory region, possibly moving it in the process.

void *mremap(void *old_address, size_t old_size, size_t new_size, int flags, ... /* void *new_address */)
  • old_address – pointer to the old address to remap
  • old_size – size of old memory region
  • new_size – size of new memory region
  • flags – define additional behavior

flags

  • MREMAP_MAYMOVE – allow the kernel to move the region if there isn’t enough room at its current address
  • MREMAP_FIXED – move the mapping to new_address (must also specify MREMAP_MAYMOVE)

msync

Synchronize a memory-mapped file previously mapped with mmap.

int msync(void *addr, size_t length, int flags)
  • addr – address of memory-mapped file
  • length – length of memory mapping
  • flags – define additional behavior

flags

  • MS_ASYNC – schedule sync but return immediately
  • MS_SYNC – wait until sync is complete
  • MS_INVALIDATE – invalidate other mappings of same file

Returns zero on success.

mincore

Check if pages are in memory.

int mincore(void *addr, size_t length, unsigned char *vec)
  • addr – address of memory to check
  • length – length of memory segment
  • vec – pointer to array of at least (length+PAGE_SIZE-1) / PAGE_SIZE bytes; on return, the least significant bit of each byte is set if the corresponding page is resident in memory

Returns zero on success; inspect vec to see which pages are resident.

madvise

Advise kernel on how to use a given memory segment.

int madvise(void *addr, size_t length, int advice)
  • addr – address of memory
  • length – length of segment
  • advice – advice flag

advice

  • MADV_NORMAL – no advice (default)
  • MADV_RANDOM – pages can be in random order (read-ahead performance may be hampered)
  • MADV_SEQUENTIAL – pages should be in sequential order
  • MADV_WILLNEED – will need pages soon (hinting to kernel to schedule read-ahead)
  • MADV_DONTNEED – do not need anytime soon (discourages read-ahead)

shmget

Allocate System V shared memory segment.

int shmget(key_t key, size_t size, int shmflg)
  • key – an identifier for the memory segment
  • size – length of memory segment
  • shmflg – behavior modifier flag

shmflg

  • IPC_CREAT – create a new segment
  • IPC_EXCL – ensure creation happens, else call will fail
  • SHM_HUGETLB – use huge pages when allocating segment
  • SHM_HUGE_1GB – use 1 GB hugetlb size
  • SHM_HUGE_2MB – use 2 MB hugetlb size
  • SHM_NORESERVE – do not reserve swap space for this segment

shmat

Attach shared memory segment to calling process’s memory space.

void *shmat(int shmid, const void *shmaddr, int shmflg)
  • shmid – shared memory segment id
  • shmaddr – shared memory segment address
  • shmflg – define additional behavior

shmflg

  • SHM_RDONLY – attach segment as read-only
  • SHM_REMAP – replace existing mapping

shmctl

Get or set control details on shared memory segment.

int shmctl(int shmid, int cmd, struct shmid_ds *buf)
  • shmid – shared memory segment id
  • cmd – command flag
  • buf – shmid_ds structure buffer for return or set parameters
struct shmid_ds {
    struct ipc_perm shm_perm;    /* Ownership and permissions */
    size_t          shm_segsz;   /* Size of shared segment (bytes) */
    time_t          shm_atime;   /* Last attach time */
    time_t          shm_dtime;   /* Last detach time */
    time_t          shm_ctime;   /* Last change time */
    pid_t           shm_cpid;    /* PID of shared segment creator */
    pid_t           shm_lpid;    /* PID of last shmat(2)/shmdt(2) syscall */
    shmatt_t        shm_nattch;  /* Number of current attaches */
    ...
};
struct ipc_perm {
    key_t          __key;    /* Key provided to shmget */
    uid_t          uid;      /* Effective UID of owner */
    gid_t          gid;      /* Effective GID of owner */
    uid_t          cuid;     /* Effective UID of creator */
    gid_t          cgid;     /* Effective GID of creator */
    unsigned short mode;     /* Permissions and SHM_DEST + SHM_LOCKED flags */
    unsigned short __seq;    /* Sequence */
 };

Successful IPC_INFO or SHM_INFO syscalls return index of highest used entry in the kernel’s array of shared memory segments. Successful SHM_STAT syscalls return id of memory segment provided in shmid. Everything else returns zero upon success.

cmd

  • IPC_STAT – get shared memory segment info and place in buffer
  • IPC_SET – set shared memory segment parameters defined in buffer
  • IPC_RMID – mark shared memory segment to be removed

dup

Duplicate file descriptor.

int dup(int oldfd)
  • oldfd – file descriptor to copy

Returns new file descriptor.

dup2

Same as dup except dup2 uses file descriptor number specified in newfd.

int dup2(int oldfd, int newfd)
  • oldfd – file descriptor to copy
  • newfd – new file descriptor

pause

Wait for a signal, then return.

int pause(void)

Returns -1 when signal received.

nanosleep

Same as sleep but with time specified in nanoseconds.

int nanosleep(const struct timespec *req, struct timespec *rem)
  • req – pointer to syscall argument structure
  • rem – pointer to structure with remaining time if interrupted by signal
struct timespec {
    time_t tv_sec;        /* time in seconds */
    long   tv_nsec;       /* time in nanoseconds */
};

Returns zero upon successful sleep. If interrupted by a signal, returns -1 and copies the remaining time into the rem structure.

getitimer

Get value from an interval timer.

int getitimer(int which, struct itimerval *curr_value)
  • which – which kind of timer
  • curr_value – pointer to itimerval structure with argument details
struct itimerval {
    struct timeval it_interval; /* Interval for periodic timer */
    struct timeval it_value;    /* Time until next expiration */
 };

Returns zero on success.

which timers

  • ITIMER_REAL – timer uses real time
  • ITIMER_VIRTUAL – timer uses user-mode CPU execution time
  • ITIMER_PROF – timer uses both user and system CPU execution time

alarm

Set an alarm for delivery of signal SIGALRM.

unsigned int alarm(unsigned int seconds)
  • seconds – send SIGALRM in x seconds

Returns number of seconds remaining until a previously set alarm will trigger, or zero if no alarm was previously set.

setitimer

Arm or disarm the interval timer specified by which.

int setitimer(int which, const struct itimerval *new_value, struct itimerval *old_value)
  • which – which kind of timer
  • new_value – pointer to itimerval structure with new timer details
  • old_value – if not null, pointer to itimerval structure with previous timer details
struct itimerval {
    struct timeval it_interval; /* Interval for periodic timer */
    struct timeval it_value;    /* Time until next expiration */
 };

Returns zero on success.

getpid

Get PID of current process.

pid_t getpid(void)

Returns the PID of the process.

sendfile

Transfer data between two files or devices.

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
  • out_fd – file descriptor for destination
  • in_fd – file descriptor for source
  • offset – position to begin read
  • count – bytes to copy

Returns bytes written.

socket

Create an endpoint for network communication.

int socket(int domain, int type, int protocol)
  • domain – flag specifying type of socket
  • type – flag specifying socket specifics
  • protocol – flag specifying protocol for communication

domain flags

  • AF_UNIX – Local communication
  • AF_LOCAL – Same as AF_UNIX
  • AF_INET – IPv4 Internet protocol
  • AF_AX25 – Amateur radio AX.25 protocol
  • AF_IPX – IPX (Novell protocols)
  • AF_APPLETALK – AppleTalk
  • AF_X25 – ITU-T X.25 / ISO-8208 protocol
  • AF_INET6 – IPv6 Internet protocol
  • AF_DECnet – DECnet protocol sockets
  • AF_KEY – key management protocol (originally for IPsec)
  • AF_NETLINK – Kernel user interface device
  • AF_PACKET – Low-level packet interface
  • AF_RDS – Reliable Datagram Sockets (RDS)
  • AF_PPPOX – Generic PPP transport layer for L2 tunnels (L2TP, PPPoE, etc.)
  • AF_LLC – Logical link control (IEEE 802.2 LLC)
  • AF_IB – InfiniBand native addressing
  • AF_MPLS – Multiprotocol Label Switching
  • AF_CAN – Controller Area Network automotive bus protocol
  • AF_TIPC – TIPC (cluster domain sockets)
  • AF_BLUETOOTH – Bluetooth low-level socket protocol
  • AF_ALG – Interface to kernel cryptography API
  • AF_VSOCK – VSOCK protocol for hypervisor-guest communication (VMware, etc.)
  • AF_KCM – KCM (kernel connection multiplexor) interface
  • AF_XDP – XDP (express data path) interface

type flags

  • SOCK_STREAM – sequenced, reliable byte streams
  • SOCK_DGRAM – datagrams (connectionless and unreliable messages, fixed maximum length)
  • SOCK_SEQPACKET – sequenced, reliable transmission for datagrams
  • SOCK_RAW – raw network protocol access
  • SOCK_RDM – reliable datagram layer with possible out-of-order transmission
  • SOCK_NONBLOCK – socket is non-blocking (avoid extra calls to fcntl)
  • SOCK_CLOEXEC – set close-on-exec flag

Returns file descriptor on success.

connect

Connect to a socket.

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
  • sockfd – socket file descriptor
  • addr – pointer to socket address
  • addrlen – size of address

Returns zero on success.

accept

Accept connection on socket.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
  • sockfd – socket file descriptor
  • addr – pointer to socket address
  • addrlen – size of address

Returns file descriptor of accepted socket on success.

sendto

Send message on a socket.

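ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen)
  • sockfd – socket file descriptor
  • buf – buffer with message to send
  • len – length of message
  • flags – additional parameters
  • dest_addr – pointer to destination address (may be NULL on a connected socket)
  • addrlen – size of destination address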

flags

  • MSG_CONFIRM – informs link layer a reply has been received
  • MSG_DONTROUTE – do not use gateway in transmission of packet
  • MSG_DONTWAIT – perform non-blocking operation
  • MSG_EOR – end of record
  • MSG_MORE – more data to send
  • MSG_NOSIGNAL – do not generate SIGPIPE signal if peer closed connection
  • MSG_OOB – sends out-of-band data on supported sockets and protocols

recvfrom

Receive message from socket.

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen)
  • sockfd – socket file descriptor
  • buf – buffer to receive message
  • len – size of buffer
  • flags – additional parameters
  • src_addr – pointer to buffer that receives the source address
  • addrlen – pointer to the size of the src_addr buffer; updated to the actual address length on return

flags

  • MSG_CMSG_CLOEXEC – set close-on-exec flag for socket file descriptor
  • MSG_DONTWAIT – perform operation in a non-blocking manner
  • MSG_ERRQUEUE – queued errors should be received in socket error queue

Returns bytes received successfully.

sendmsg

Similar to the sendto syscall but allows sending additional data via the msg argument.

ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags)
  • sockfd – socket file descriptor
  • msg – pointer to msghdr structure with message to send (with headers)
  • flags – same as sendto syscall
struct msghdr {
    void         *msg_name;       /* optional address */
    socklen_t     msg_namelen;    /* address size */
    struct iovec *msg_iov;        /* scatter/gather array */
    size_t        msg_iovlen;     /* number of array elements in msg_iov */
    void         *msg_control;    /* ancillary data */
    size_t        msg_controllen; /* ancillary data length */
    int           msg_flags;      /* flags on received message */
};

recvmsg

Receive message from socket.

ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags)
  • sockfd – socket file descriptor
  • msg – pointer to msghdr structure (defined in sendmsg above) to receive
  • flags – define additional behavior (see sendto syscall)

shutdown

Shut down full-duplex connection of a socket.

int shutdown(int sockfd, int how)
  • sockfd – socket file descriptor
  • how – flag defining which direction(s) to shut down

Returns zero on success.

how

  • SHUT_RD – prevent further receptions
  • SHUT_WR – prevent further transmissions
  • SHUT_RDWR – prevent further reception and transmission

bind

Bind name to a socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
  • sockfd – socket file descriptor
  • addr – pointer to sockaddr structure with socket address
  • addrlen – length of address
struct sockaddr {
    sa_family_t sa_family;
    char        sa_data[14];
};

Returns zero on success.

listen

Listen on a socket for connections.

int listen(int sockfd, int backlog)
  • sockfd – socket file descriptor
  • backlog – maximum length for pending connection queue

Returns zero on success.

getsockname

Get socket name.

int getsockname(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
  • sockfd – socket file descriptor
  • addr – pointer to buffer where socket name will be returned
  • addrlen – length of buffer

Returns zero on success.

getpeername

Get the name of the connected peer socket.

int getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
  • sockfd – socket file descriptor
  • addr – pointer to buffer where peer name will be returned
  • addrlen – length of buffer

Returns zero on success.

socketpair

Create pair of sockets already connected.

int socketpair(int domain, int type, int protocol, int sv[2])

Arguments are identical to socket syscall except fourth argument (sv) is an integer array that is filled with the two socket descriptors.

Returns zero on success.
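
A connected pair is a convenient way to try the socket calls without a network. The sketch below, with the hypothetical helper name socketpair_echo, sends a message into one end and receives it on the other using send and recv (the connected-socket forms of sendto and recvfrom).

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a connected AF_UNIX stream pair, sends msg one way, and
 * receives it on the other end. Returns bytes received, or -1 on failure. */
ssize_t socketpair_echo(const void *msg, size_t len, void *out, size_t outsz)
{
    assert(msg != NULL && out != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t n = -1;
    /* MSG_NOSIGNAL: report a closed peer as an error instead of SIGPIPE. */
    if (send(sv[0], msg, len, MSG_NOSIGNAL) == (ssize_t)len)
        n = recv(sv[1], out, outsz, 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```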

setsockopt

Set options on a socket.

int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen)
  • sockfd – socket file descriptor
  • level – protocol level at which the option resides (e.g. SOL_SOCKET)
  • optname – option to set
  • optval – pointer to the value of the option
  • optlen – length of option

Returns zero on success.

getsockopt

Get current options of a socket.

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen)
  • sockfd – socket file descriptor
  • level – protocol level at which the option resides (e.g. SOL_SOCKET)
  • optname – option to get
  • optval – pointer to receive option value
  • optlen – length of option

Returns zero on success.
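
As a sketch of the set/get pair, the following enables SO_REUSEADDR at the SOL_SOCKET level on a fresh TCP socket and reads the option back. The helper name reuseaddr_demo is hypothetical, and the example assumes the environment permits creating AF_INET sockets.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Enables SO_REUSEADDR on a new TCP socket and reads the option back.
 * Returns 1 if the option reads back enabled, 0 if not, -1 on failure. */
int reuseaddr_demo(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int on = 1, got = 0;
    socklen_t len = sizeof got;          /* in: buffer size, out: value size */
    int rc = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
    if (rc == 0)
        rc = getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &got, &len);
    close(fd);
    if (rc == -1)
        return -1;
    assert(len == sizeof got);
    return got != 0;
}
```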

clone

Create child process.

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ... /* pid_t *parent_tid, void *tls, pid_t *child_tid */)
  • fn – pointer to the function where the child begins execution
  • stack – pointer to child process’s stack
  • flags – define behavior of clone syscall
  • arg – pointer to arguments for child process

flags

  • CLONE_CHILD_CLEARTID – clear id of child thread at location referenced by child_tid when the child exits
  • CLONE_CHILD_SETTID – store id of child thread at location referenced by child_tid
  • CLONE_FILES – parent and child process share same file descriptors
  • CLONE_FS – parent and child process share same filesystem information
  • CLONE_IO – child process shares I/O context with parent
  • CLONE_NEWCGROUP – child is created in new cgroup namespace
  • CLONE_NEWIPC – child process created in new IPC namespace
  • CLONE_NEWNET – create child in new network namespace
  • CLONE_NEWNS – create child in new mount namespace
  • CLONE_NEWPID – create child in new PID namespace
  • CLONE_NEWUSER – create child in new user namespace
  • CLONE_NEWUTS – create child process in new UTS namespace
  • CLONE_PARENT – child gets the same parent as the calling process (it becomes a sibling rather than a child of the caller)
  • CLONE_PARENT_SETTID – store id of child thread at location referenced by parent_tid
  • CLONE_PID – child process is created with same PID as parent
  • CLONE_PIDFD – PID file descriptor of child process is placed in parent’s memory
  • CLONE_PTRACE – if parent process is traced, trace child as well
  • CLONE_SETTLS – thread local storage (TLS) descriptor is set to the tls argument
  • CLONE_SIGHAND – parent and child share signal handlers
  • CLONE_SYSVSEM – child and parent share same System V semaphore adjustment values
  • CLONE_THREAD – child is created in same thread group as parent
  • CLONE_UNTRACED – if parent is traced, child is not traced
  • CLONE_VFORK – parent process is suspended until child calls execve or _exit
  • CLONE_VM – parent and child run in same memory space

fork

Create child process.

pid_t fork(void)

Returns PID of child process in the parent, and zero in the child.

vfork

Create child process without copying page tables of parent process.

pid_t vfork(void)

Returns PID of child process in the parent, and zero in the child.

execve

Execute a program.

int execve(const char *pathname, char *const argv[], char *const envp[])
  • pathname – path to program to run
  • argv – pointer to array of arguments for program
  • envp – pointer to array of strings (in key=value format) for the environment

Does not return on success, returns -1 on error.

exit

Terminate calling process.

void _exit(int status)
  • status – status code to return to parent

Does not return a value.

wait4

Wait for a process to change state.

pid_t wait4(pid_t pid, int *wstatus, int options, struct rusage *rusage)
  • pid – PID of process
  • wstatus – pointer to an int that receives the child's status (may be NULL)
  • options – options flags for call
  • rusage – pointer to structure with usage about child process filled on return

Returns PID of terminated child.

options

  • WNOHANG – return immediately if no child has exited
  • WUNTRACED – also return if child stops (but is not traced with ptrace)
  • WCONTINUED – also return if stopped child resumed with SIGCONT

wstatus macros

The following macros are applied to the value stored in wstatus; they are not options flags.

  • WIFEXITED – true if child terminated normally
  • WEXITSTATUS – exit status of child
  • WIFSIGNALED – true if child was terminated by a signal
  • WTERMSIG – number of signal that caused child to terminate
  • WCOREDUMP – true if child dumped core
  • WIFSTOPPED – true if child was stopped by a signal
  • WSTOPSIG – number of signal that caused child to stop
  • WIFCONTINUED – true if child was resumed with SIGCONT
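
A minimal sketch tying fork, _exit, and wait4 together follows. The helper name fork_and_reap is hypothetical; the status value written by wait4 is decoded with WIFEXITED and WEXITSTATUS rather than compared directly.

```c
#define _GNU_SOURCE          /* wait4 is a BSD/GNU extension */
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with code, waits for it with wait4, and
 * returns the exit status seen by the parent, or -1 on failure. */
int fork_and_reap(int code)
{
    assert(code >= 0 && code <= 255);    /* only the low 8 bits survive */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                     /* child: terminate immediately */

    int wstatus;
    struct rusage ru;                    /* filled with child resource usage */
    if (wait4(pid, &wstatus, 0, &ru) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```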

kill

Send a signal to process.

int kill(pid_t pid, int sig)
  • pid – PID of process
  • sig – number of signal to send to process

Returns zero on success.
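
One common use of kill that sends nothing at all is passing signal number 0, which performs only the existence and permission checks. The sketch below relies on that behavior; the helper name process_exists is made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this PID exists,
 * 0 if it does not, and -1 on any other error. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;                   /* exists and we may signal it */
    if (errno == EPERM)
        return 1;                   /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;                   /* no such process */
    return -1;
}
```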

getppid

Get PID of calling process’s parent.

pid_t getppid(void)

Returns the PID of parent of calling process.

uname

Get information about the kernel.

int uname(struct utsname *buf)
  • buf – pointer to utsname structure to receive information

Returns zero on success.

struct utsname {
    char sysname[];    /* OS name (e.g. "Linux") */
    char nodename[];   /* node name */
    char release[];    /* OS release (e.g. "4.1.0") */
    char version[];    /* OS version */
    char machine[];    /* hardware identifier */
    #ifdef _GNU_SOURCE
        char domainname[]; /* NIS or YP domain name */
    #endif
};
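
As an illustration, the following sketch reads three of the utsname fields into a single string. The helper name kernel_summary and the output format are assumptions for this example.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: write "sysname release machine" into out.
 * Returns 0 on success, -1 if uname fails or the buffer is too small. */
int kernel_summary(char *out, size_t outlen)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;

    int n = snprintf(out, outlen, "%s %s %s",
                     u.sysname, u.release, u.machine);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```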

semget

Get System V semaphore set identifier.

int semget(key_t key, int nsems, int semflg)
  • key – key of identifier to retrieve
  • nsems – number of semaphores per set
  • semflg – semaphore flags

Returns value of semaphore set identifier.

semop

Perform operation on specified semaphore(s).

int semop(int semid, struct sembuf *sops, size_t nsops)
  • semid – id of semaphore set
  • sops – pointer to sembuf structure for operations
  • nsops – number of operations
struct sembuf {
    ushort  sem_num;        /* semaphore index in array */
    short   sem_op;         /* semaphore operation */
    short   sem_flg;        /* flags for operation */
};

Returns zero on success.

semctl

Perform control operation on semaphore.

int semctl(int semid, int semnum, int cmd, ...)
  • semid – semaphore set id
  • semnum – number of semaphore in set
  • cmd – operation to perform

Optional fourth argument is a semun union, which the calling program must define itself:

union semun {
    int              val;    /* SETVAL value */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;  /* array for GETALL, SETALL */
    struct seminfo  *__buf;  /* buffer for IPC_INFO */
};

Returns non-negative value corresponding to cmd flag on success, or -1 on error.

cmd

  • IPC_STAT – copy information from kernel associated with semid into semid_ds referenced by arg.buf
  • IPC_SET – write values of semid_ds structure referenced by arg.buf
  • IPC_RMID – remove semaphore set
  • IPC_INFO – get information about system semaphore limits into seminfo structure
  • SEM_INFO – return seminfo structure with same info as IPC_INFO except some fields are returned with info about resources consumed by semaphores
  • SEM_STAT – return semid_ds structure like IPC_STAT but semid argument is index into kernel’s semaphore array
  • SEM_STAT_ANY – return semid_ds structure with same info as SEM_STAT but sem_perm.mode isn’t checked for read permission
  • GETALL – return semval for all semaphores in set specified by semid into arg.array
  • GETNCNT – return value of semncnt for the semaphore of the set indexed by semnum
  • GETPID – return value of sempid for the semaphore of the set indexed by semnum
  • GETVAL – return value of semval for the semaphore of the set indexed by semnum
  • GETZCNT – return value of semzcnt for the semaphore of the set indexed by semnum
  • SETALL – set semval for all the semaphores in the set using arg.array
  • SETVAL – set value of semval to arg.val for the semaphore of the set indexed by semnum
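
The three semaphore calls are usually used together. The sketch below is one possible round trip under stated assumptions: a private one-element set (IPC_PRIVATE), an initial value of 2, and a hypothetical helper name sem_roundtrip. The Linux-specific __buf member of the union is left out because the example does not use IPC_INFO.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* The calling program has to define union semun itself. */
union semun {
    int              val;    /* SETVAL value */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;  /* array for GETALL, SETALL */
};

/* Hypothetical helper: create a set with one semaphore, set it to 2,
 * decrement it once with semop, and return the value read back with
 * GETVAL (expected to be 1), or -1 on failure. */
int sem_roundtrip(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    int result = -1;
    union semun arg;
    arg.val = 2;

    if (semctl(semid, 0, SETVAL, arg) == 0) {
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        if (semop(semid, &op, 1) == 0)
            result = semctl(semid, 0, GETVAL);
    }

    semctl(semid, 0, IPC_RMID);     /* always remove the set again */
    return result;
}
```

Removing the set with IPC_RMID matters because System V objects outlive the process that created them.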

shmdt

Detach shared memory segment referenced by shmaddr.

int shmdt(const void *shmaddr)
  • shmaddr – address of shared memory segment to detach

Returns zero on success.

msgget

Get System V message queue identifier.

int msgget(key_t key, int msgflg)
  • key – key identifying the message queue
  • msgflg – creation flags and permission bits; if IPC_CREAT and IPC_EXCL are both specified and a queue already exists for key, msgget fails with errno set to EEXIST

Returns message queue identifier.

msgsnd

Send a message to a System V message queue.

int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg)
  • msqid – message queue id
  • msgp – pointer to msgbuf structure
  • msgsz – size of the mtext member in bytes (the mtype field is not counted)
  • msgflg – flags defining specific behavior
struct msgbuf {
    long mtype;       /* msg type, must be greater than zero */
    char mtext[1];    /* msg text */
};

Returns zero on success, or -1 on error.

msgflg

  • IPC_NOWAIT – return immediately instead of blocking (msgsnd: the queue is full; msgrcv: no message of requested type in queue)
  • MSG_EXCEPT – msgrcv only; use with msgtyp > 0 to read first message in queue with type different from msgtyp
  • MSG_NOERROR – msgrcv only; truncate message text if longer than msgsz bytes

msgrcv

Receive message from a System V message queue.

ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg)
  • msqid – message queue id
  • msgp – pointer to msgbuf structure
  • msgsz – maximum size in bytes of the mtext member of the msgbuf structure
  • msgtyp – read first msg if 0, read first msg of msgtyp if > 0, or if negative, read first msg in queue with type less or equal to absolute value of msgtyp
  • msgflg – flags defining specific behavior

Returns number of bytes copied into mtext on success, or -1 on error.

msgctl

System V message control.

int msgctl(int msqid, int cmd, struct msqid_ds *buf)
  • msqid – message queue id
  • cmd – command to execute
  • buf – pointer to msqid_ds structure
struct msqid_ds {
    struct ipc_perm msg_perm;     /* ownership/permissions */
    time_t          msg_stime;    /* last msgsnd(2) time */
    time_t          msg_rtime;    /* last msgrcv(2) time */
    time_t          msg_ctime;    /* last change time */
    unsigned long   __msg_cbytes; /* bytes in queue */
    msgqnum_t       msg_qnum;     /* messages in queue */
    msglen_t        msg_qbytes;   /* max bytes allowed in queue */
    pid_t           msg_lspid;    /* PID of last msgsnd(2) */
    pid_t           msg_lrpid;    /* PID of last msgrcv(2) */
};
struct msginfo {
    int msgpool; /* kb of buffer pool used */
    int msgmap;  /* max # of entries in message map */
    int msgmax;  /* max # of bytes per single message */
    int msgmnb;  /* max # of bytes in the queue */
    int msgmni;  /* max # of message queues */
    int msgssz;  /* message segment size */
    int msgtql;  /* max # of messages on queues */
    unsigned short int msgseg; /* max # of segments unused in kernel */
};

Returns zero on success, or a modified return value based on cmd.

cmd

  • IPC_STAT – copy data structure from kernel by msqid into msqid_ds structure referenced by buf
  • IPC_SET – update msqid_ds structure referenced by buf to kernel, updating its msg_ctime
  • IPC_RMID – remove message queue
  • IPC_INFO – returns information about message queue limits into msginfo structure referenced by buf
  • MSG_INFO – same as IPC_INFO except msginfo structure is filled with usage vs. max usage statistics
  • MSG_STAT – same as IPC_STAT except msqid is an index into kernel’s internal array
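
To show how msgget, msgsnd, msgrcv, and msgctl fit together, here is a small round trip. The struct text_msg type, the 64-byte payload size, the private queue, and the helper name msg_roundtrip are all assumptions for this sketch. Note that the sizes passed to msgsnd and msgrcv count only the payload, not the leading long.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Application-defined message: a long type followed by the payload. */
struct text_msg {
    long mtype;
    char mtext[64];
};

/* Hypothetical helper: send `text` through a private queue and receive
 * it back into `out`. Returns the number of payload bytes received,
 * or -1 on failure. */
long msg_roundtrip(const char *text, char *out, size_t outlen)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    long got = -1;
    size_t len = strlen(text) + 1;          /* include the terminator */
    struct text_msg m = { .mtype = 1 };

    if (len <= sizeof m.mtext && len <= outlen) {
        memcpy(m.mtext, text, len);
        if (msgsnd(qid, &m, len, 0) == 0) {
            struct text_msg r;
            /* msgtyp = 1: first message of type 1; do not block */
            got = msgrcv(qid, &r, sizeof r.mtext, 1, IPC_NOWAIT);
            if (got > 0)
                memcpy(out, r.mtext, (size_t)got);
        }
    }

    msgctl(qid, IPC_RMID, NULL);            /* remove the queue */
    return got;
}
```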

fcntl

Manipulate a file descriptor.

int fcntl(int fd, int cmd, ... /* arg */ )
  • fd – file descriptor
  • cmd – cmd flag
  • /* arg */ – additional parameters based on cmd

Return value varies based on cmd flags.

cmd

The value in () is the type of the optional /* arg */ that the command takes.

  • F_DUPFD – find lowest-numbered available file descriptor greater than or equal to (int) and make it a duplicate of fd, returning new file descriptor
  • F_DUPFD_CLOEXEC – same as F_DUPFD but sets close-on-exec flag
  • F_GETFD – return file descriptor flags
  • F_SETFD – set file descriptor flags based on (int)
  • F_GETFL – get file access mode and file status flags
  • F_SETFL – set file status flags based on (int)
  • F_GETLK – get record locks on file (pointer to struct flock)
  • F_SETLK – set lock on file (pointer to struct flock)
  • F_SETLKW – set lock on file with wait (pointer to struct flock)
  • F_GETOWN – return process id receiving SIGIO and SIGURG
  • F_SETOWN – set process id to receive SIGIO and SIGURG (int)
  • F_GETOWN_EX – return file descriptor owner settings (struct f_owner_ex *)
  • F_SETOWN_EX – direct IO signals on file descriptor (struct f_owner_ex *)
  • F_GETSIG – return signal sent when IO is available
  • F_SETSIG – set signal sent when IO is available (int)
  • F_SETLEASE – obtain or release lease on file descriptor (int), where arg is F_RDLCK, F_WRLCK, or F_UNLCK
  • F_GETLEASE – get current lease on file descriptor (F_RDLCK, F_WRLCK, or F_UNLCK is returned)
  • F_NOTIFY – notify when dir referenced by file descriptor changes (int), where arg is a bit mask of DN_ACCESS, DN_MODIFY, DN_CREATE, DN_DELETE, DN_RENAME, and DN_ATTRIB
  • F_SETPIPE_SZ – change size of pipe referenced by file descriptor to (int) bytes
  • F_GETPIPE_SZ – get size of pipe referenced by file descriptor

flock

struct flock {
    ...
    short l_type;    /* lock type: F_RDLCK, F_WRLCK, or F_UNLCK */
    short l_whence;  /* interpret l_start with SEEK_SET, SEEK_CUR, or SEEK_END */
    off_t l_start;   /* offset for lock */
    off_t l_len;     /* bytes to lock */
    pid_t l_pid;     /* PID of blocking process (F_GETLK only) */
    ...
};

f_owner_ex

struct f_owner_ex {
    int   type;
    pid_t pid;
};
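
A frequent fcntl pattern is read-modify-write of the status flags with F_GETFL and F_SETFL. The sketch below turns on O_NONBLOCK while keeping the other flags; the helper name set_nonblocking is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: enable O_NONBLOCK on fd without disturbing the
 * other status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);         /* current status flags */
    if (flags == -1)
        return -1;

    /* Flags are combined with bitwise OR. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```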

flock

Apply or remove advisory lock on open file.

int flock(int fd, int operation)
  • fd – file descriptor
  • operation – operation flag

Returns zero on success.

operation

  • LOCK_SH – place shared lock
  • LOCK_EX – place exclusive lock
  • LOCK_UN – remove existing lock
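
The following sketch demonstrates that an exclusive lock held through one descriptor blocks a second, independently opened descriptor. It assumes a writable /tmp and uses LOCK_NB (a flag not listed above) so that the second attempt fails with EWOULDBLOCK instead of waiting. The helper name flock_conflict_demo is made up.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a second exclusive lock attempt is
 * refused while the first lock is held, 0 if it was not refused, and
 * -1 if the setup failed. */
int flock_conflict_demo(void)
{
    char path[] = "/tmp/flock-demo-XXXXXX";
    int a = mkstemp(path);                  /* first open file description */
    if (a < 0)
        return -1;

    int refused = -1;
    int b = open(path, O_RDWR);             /* second, independent one */
    if (b >= 0 && flock(a, LOCK_EX) == 0) {
        refused = (flock(b, LOCK_EX | LOCK_NB) == -1 &&
                   errno == EWOULDBLOCK);
        flock(a, LOCK_UN);
    }

    if (b >= 0)
        close(b);
    close(a);
    unlink(path);
    return refused;
}
```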

fsync

Sync file’s data and metadata in memory to disk, flushing all write buffers and completing pending I/O.

int fsync(int fd)
  • fd – file descriptor

Returns zero on success.

fdatasync

Sync file’s data (but not metadata, unless needed) to disk.

int fdatasync(int fd)
  • fd – file descriptor

Returns zero on success.

truncate

Truncate file to a certain length.

int truncate(const char *path, off_t length)
  • path – pointer to path of file
  • length – length to truncate to

Returns zero on success.

ftruncate

Truncate file referenced by file descriptor to a certain length.

int ftruncate(int fd, off_t length)
  • fd – file descriptor
  • length – length to truncate to

Returns zero on success.
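
Despite the name, ftruncate can also grow a file: if the new length is larger than the current one, the file is extended with zero bytes. A small sketch, assuming a writable /tmp and a hypothetical helper name:

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty temporary file, resize it to
 * `length` with ftruncate, and return the size reported by fstat
 * (or -1 on failure). */
long long resized_temp_file_size(off_t length)
{
    char path[] = "/tmp/trunc-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    long long size = -1;
    struct stat st;
    if (ftruncate(fd, length) == 0 && fstat(fd, &st) == 0)
        size = (long long)st.st_size;

    close(fd);
    unlink(path);
    return size;
}
```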

getdents

Get directory entries from a specified file descriptor.

int getdents(unsigned int fd, struct linux_dirent *dirp, unsigned int count)
  • fd – file descriptor of directory
  • dirp – pointer to linux_dirent structure to receive return values
  • count – size of dirp buffer

Returns bytes read on success, or zero at end of directory.

struct linux_dirent {
    unsigned long  d_ino;     /* number of inode */
    unsigned long  d_off;     /* offset to next linux_dirent */
    unsigned short d_reclen;  /* length of this linux_dirent */
    char           d_name[];  /* filename (null terminated) */
    /* The variable-length name is followed within the record by:
    char           pad;          padding byte
    char           d_type;       type of file (see types below), at offset d_reclen - 1
    */
};

types

  • DT_BLK – block device
  • DT_CHR – char device
  • DT_DIR – directory
  • DT_FIFO – FIFO named pipe
  • DT_LNK – symlink
  • DT_REG – regular file
  • DT_SOCK – UNIX socket
  • DT_UNKNOWN – unknown

getcwd

Get current working directory.

char *getcwd(char *buf, size_t size)
  • buf – pointer to buffer to receive path
  • size – size of buf

Returns pointer to string containing current working directory.

chdir

Change the current directory.

int chdir(const char *path)
  • path – pointer to string with name of path

Returns zero on success.

fchdir

Change the current directory to the one referenced by the supplied file descriptor.

int fchdir(int fd)
  • fd – file descriptor

Returns zero on success.

rename

Rename or move a file.

int rename(const char *oldpath, const char *newpath)
  • oldpath – pointer to string with old path/name
  • newpath – pointer to string with new path/name

Returns zero on success.

mkdir

Make a directory.

int mkdir(const char *pathname, mode_t mode)
  • pathname – pointer to string with directory name
  • mode – file system permissions mode

Returns zero on success.

rmdir

Remove a directory.

int rmdir(const char *pathname)
  • pathname – pointer to string with directory name

Returns zero on success.

creat

Create a file, or truncate it if it already exists (equivalent to open with flags O_CREAT|O_WRONLY|O_TRUNC).

int creat(const char *pathname, mode_t mode)
  • pathname – pointer to string with file name
  • mode – file system permissions mode

Returns a file descriptor on success.

link

Creates a hard link for a file.

int link(const char *oldpath, const char *newpath)
  • oldpath – pointer to string with old filename
  • newpath – pointer to string with new filename

Returns zero on success.

unlink

Remove a file.

int unlink(const char *pathname)
  • pathname – pointer to string with path name

Returns zero on success.

symlink

Create a symlink.

int symlink(const char *oldpath, const char *newpath)
  • oldpath – pointer to string with path name the symlink points to
  • newpath – pointer to string with path name of the symlink to create

Returns zero on success.

readlink

Return the target of a symbolic link.

ssize_t readlink(const char *path, char *buf, size_t bufsiz)
  • path – pointer to string with symlink name
  • buf – pointer to buffer to receive result
  • bufsiz – size of buffer for result

Returns number of bytes placed in buf; the result is not null-terminated.
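
Since readlink does not append a terminating null byte, callers usually reserve one byte and add it themselves. The sketch below creates a symlink in a temporary directory and reads it back; the /tmp location and the helper name symlink_roundtrip are assumptions for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a symlink pointing at `target`, read it
 * back into `out` as a C string, and clean up. Returns the number of
 * bytes read, or -1 on failure. */
long symlink_roundtrip(const char *target, char *out, size_t outlen)
{
    char dir[] = "/tmp/link-demo-XXXXXX";
    if (outlen == 0 || mkdtemp(dir) == NULL)
        return -1;

    char linkpath[64];
    snprintf(linkpath, sizeof linkpath, "%s/link", dir);

    long result = -1;
    if (symlink(target, linkpath) == 0) {
        /* Leave room for the terminator that readlink does not add. */
        ssize_t n = readlink(linkpath, out, outlen - 1);
        if (n >= 0) {
            out[n] = '\0';
            result = (long)n;
        }
        unlink(linkpath);
    }

    rmdir(dir);
    return result;
}
```

The target does not need to exist; a symlink stores only the path string.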

chmod

Set permission on a file or device.

int chmod(const char *path, mode_t mode)
  • path – pointer to string with name of file or device
  • mode – new permissions mode

Returns zero on success.

fchmod

Same as chmod but sets permissions on file or device referenced by file descriptor.

int fchmod(int fd, mode_t mode)
  • fd – file descriptor
  • mode – new permissions mode

Returns zero on success.

chown

Change owner of file or device.

int chown(const char *path, uid_t owner, gid_t group)
  • path – pointer to string with name of file or device
  • owner – new owner of file or device
  • group – new group of file or device

Returns zero on success.

fchown

Same as chown but sets owner and group on a file or device referenced by file descriptor.

int fchown(int fd, uid_t owner, gid_t group)
  • fd – file descriptor
  • owner – new owner
  • group – new group

Returns zero on success.

lchown

Same as chown but doesn’t dereference symlinks.

int lchown(const char *path, uid_t owner, gid_t group)
  • path – pointer to string with name of file or device
  • owner – new owner
  • group – new group

Returns zero on success.

umask

Sets the mask used to create new files.

mode_t umask(mode_t mask)
  • mask – mask for new files

System call will always succeed and returns previous mask.
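
Because umask can only report the old mask by installing a new one, reading the current mask takes two calls. A minimal sketch (the helper name current_umask is hypothetical; note the brief window in which the mask is zero, which matters in multithreaded programs):

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the current file creation mask without
 * leaving it changed. */
mode_t current_umask(void)
{
    mode_t old = umask(0);      /* temporarily set mask to 0, get old one */
    umask(old);                 /* put the original mask back */
    return old;
}
```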

gettimeofday

Get the current time of day.

int gettimeofday(struct timeval *tv, struct timezone *tz)
  • tv – pointer to timeval structure to receive time
  • tz – pointer to timezone structure to receive time zone
struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};
struct timezone {
    int tz_minuteswest;     /* minutes west of GMT */
    int tz_dsttime;         /* DST correction type */
};

Returns zero on success.

getrlimit

Get current resource limits.

int getrlimit(int resource, struct rlimit *rlim)
  • resource – resource flag
  • rlim – pointer to rlimit structure
struct rlimit {
    rlim_t rlim_cur;  /* soft limit */
    rlim_t rlim_max;  /* hard limit */
};

Returns zero on success and fills rlim structure with results.

resource flags

  • RLIMIT_AS – max size of process virtual memory
  • RLIMIT_CORE – max size of core file
  • RLIMIT_CPU – max CPU time, in seconds
  • RLIMIT_DATA – max size of process’s data segment
  • RLIMIT_FSIZE – max size of files that process is allowed to create
  • RLIMIT_LOCKS – max combined number of flock locks and fcntl leases allowed
  • RLIMIT_MEMLOCK – max bytes of RAM allowed to be locked
  • RLIMIT_MSGQUEUE – max size of POSIX message queues
  • RLIMIT_NICE – max nice value
  • RLIMIT_NOFILE – one greater than the max file descriptor number the process may open
  • RLIMIT_NPROC – max number of processes or threads
  • RLIMIT_RSS – max resident set pages
  • RLIMIT_RTPRIO – real-time priority ceiling
  • RLIMIT_RTTIME – limit in microseconds of real-time CPU scheduling
  • RLIMIT_SIGPENDING – max number of queued signals
  • RLIMIT_STACK – max size of process stack
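
As an example of reading a limit pair, this sketch queries RLIMIT_NOFILE and hands back both the soft and hard values. The helper name nofile_limits is an assumption.

```c
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard limits on open file
 * descriptors. Returns 0 on success, -1 on error. */
int nofile_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;        /* limit currently enforced */
    *hard = rl.rlim_max;        /* ceiling for the soft limit */
    return 0;
}
```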

getrusage

Obtain resource usage.

int getrusage(int who, struct rusage *usage)
  • who – target flag
  • usage – pointer to rusage structure
struct rusage {
    struct timeval ru_utime; /* used user CPU time */
    struct timeval ru_stime; /* used system CPU time */
    long   ru_maxrss;        /* maximum RSS */
    long   ru_ixrss;         /* shared memory size */
    long   ru_idrss;         /* unshared data size */
    long   ru_isrss;         /* unshared stack size */
    long   ru_minflt;        /* soft page faults */
    long   ru_majflt;        /* hard page faults */
    long   ru_nswap;         /* swaps */
    long   ru_inblock;       /* block input operations */
    long   ru_oublock;       /* block output operations */
    long   ru_msgsnd;        /* sent # of IPC messages */
    long   ru_msgrcv;        /* received # IPC messages */
    long   ru_nsignals;      /* number of signals received */
    long   ru_nvcsw;         /* voluntary context switches */
    long   ru_nivcsw;        /* involuntary context switches */
};

Returns zero on success.

who target

  • RUSAGE_SELF – get usage statistics for calling process
  • RUSAGE_CHILDREN – get usage statistics for all children of calling process
  • RUSAGE_THREAD – get usage statistics for calling thread

sysinfo

Return information about the system.

int sysinfo(struct sysinfo *info)
  • info – pointer to sysinfo structure
struct sysinfo {
    long uptime;             /* seconds since boot */
    unsigned long loads[3];  /* 1/5/15 minute load avg */
    unsigned long totalram;  /* total usable memory size */
    unsigned long freeram;   /* available memory */
    unsigned long sharedram; /* shared memory amount */
    unsigned long bufferram; /* buffer memory usage */
    unsigned long totalswap; /* swap space size */
    unsigned long freeswap;  /* swap space available */
    unsigned short procs;    /* total number of current processes */
    unsigned long totalhigh; /* total high memory size */
    unsigned long freehigh;  /* available high memory size */
    unsigned int mem_unit;   /* memory unit size in bytes */
    char _f[20-2*sizeof(long)-sizeof(int)];  /* padding to 64 bytes */
};

Returns zero on success and places system information in sysinfo structure.
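
The memory fields in sysinfo are expressed in multiples of mem_unit, so they need to be scaled before use. A brief sketch, with a hypothetical helper name:

```c
#include <sys/sysinfo.h>

/* Hypothetical helper: total usable RAM in bytes, or 0 on error. */
unsigned long long total_ram_bytes(void)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return 0;

    /* totalram counts units of mem_unit bytes */
    return (unsigned long long)si.totalram * si.mem_unit;
}
```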

times

Get process times.

clock_t times(struct tms *buf)
  • buf – pointer to tms structure
struct tms {
    clock_t tms_utime;  /* user time */
    clock_t tms_stime;  /* system time */
    clock_t tms_cutime; /* children user time */
    clock_t tms_cstime; /* children system time */
};

Returns clock ticks since arbitrary point in past and may overflow. tms structure is filled with values.

ptrace

Trace a process.

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data)
  • request – determine type of trace to perform
  • pid – process id to trace
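  • addr – address in the traced process’s memory, or another request-specific value
  • data – pointer to buffer used in certain types of traces

Returns zero on success for most requests. The PTRACE_PEEK* requests return the requested word, and other requests place trace data into addr and/or data, depending on trace details in request flags.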

request flags

  • PTRACE_TRACEME – indicate process traced by parent
  • PTRACE_PEEKTEXT and PTRACE_PEEKDATA – read word at addr and return as result of call
  • PTRACE_PEEKUSER – read word at addr in USER area of the traced process’s memory
  • PTRACE_POKETEXT and PTRACE_POKEDATA – copy data into addr in traced process’s memory
  • PTRACE_POKEUSER – copy data into addr in the traced process’s USER area in memory
  • PTRACE_GETREGS – copy traced program’s general registers into data
  • PTRACE_GETFPREGS – copy traced program’s floating-point registers into data
  • PTRACE_GETREGSET – read traced program’s registers in architecture-agnostic way
  • PTRACE_SETREGS – modify traced program’s general registers
  • PTRACE_SETFPREGS – modify traced program’s floating-point registers
  • PTRACE_SETREGSET – modify traced program’s registers (architecture-agnostic)
  • PTRACE_GETSIGINFO – get info about signal that caused stop into siginfo_t structure
  • PTRACE_SETSIGINFO – set signal info by copying siginfo_t structure from data into traced program
  • PTRACE_PEEKSIGINFO – get siginfo_t structures without removing queued signals
  • PTRACE_GETSIGMASK – copy mask of blocked signals into data which will be a sigset_t structure
  • PTRACE_SETSIGMASK – change blocked signals mask to value in data which should be a sigset_t structure
  • PTRACE_SETOPTIONS – set options from data, where data is a bit mask of the following options:
    • PTRACE_O_EXITKILL – send SIGKILL to traced program if tracing program exits
    • PTRACE_O_TRACECLONE – stop traced program at next clone syscall and start tracing new process
    • PTRACE_O_TRACEEXEC – stop traced program at next execve syscall
    • PTRACE_O_TRACEEXIT – stop the traced program at exit
    • PTRACE_O_TRACEFORK – stop traced program at next fork and start tracing forked process
    • PTRACE_O_TRACESYSGOOD – set bit 7 in signal number (SIGTRAP|0x80) when sending system call traps
    • PTRACE_O_TRACEVFORK – stop traced program at next vfork and start tracing new process
    • PTRACE_O_TRACEVFORKDONE – stop traced program at completion of next vfork
    • PTRACE_O_TRACESECCOMP – stop traced program when seccomp rule is triggered
    • PTRACE_O_SUSPEND_SECCOMP – suspend traced program’s seccomp protections
  • PTRACE_GETEVENTMSG – get message about most recent ptrace event and put in data of tracing program
  • PTRACE_CONT – restart traced process that was stopped and if data is not zero, send number of signal to it
  • PTRACE_SYSCALL – restart traced process that was stopped, but stop at entry to or exit from next syscall
  • PTRACE_SINGLESTEP – restart traced process that was stopped, but stop after a single instruction
  • PTRACE_SYSEMU – continue, then stop on entry for next syscall (but don’t execute it)
  • PTRACE_SYSEMU_SINGLESTEP – same as PTRACE_SYSEMU but single step if instruction isn’t a syscall
  • PTRACE_LISTEN – restart traced program but prevent from executing (similar to SIGSTOP)
  • PTRACE_INTERRUPT – stop the traced program
  • PTRACE_ATTACH – attach to process pid
  • PTRACE_SEIZE – attach to process pid but do not stop process
  • PTRACE_SECCOMP_GET_FILTER – allows for dump of traced program’s classic BPF filters, where addr is the index of filter and data is pointer to structure sock_filter
  • PTRACE_DETACH – detach then restart stopped traced program
  • PTRACE_GET_THREAD_AREA – reads TLS entry from GDT with index specified by addr, placing copy of struct user_desc at data
  • PTRACE_SET_THREAD_AREA – sets TLS entry in GDT with index specified by addr, assigning it struct user_desc at data
  • PTRACE_GET_SYSCALL_INFO – get information about syscall that caused stop and place struct ptrace_syscall_info into data, where addr is size of buffer
struct ptrace_peeksiginfo_args {
    u64 off;    /* queue position to start copying signals */
    u32 flags;  /* PTRACE_PEEKSIGINFO_SHARED or 0 */
    s32 nr;     /* # of signals to copy */
};
struct ptrace_syscall_info {
    __u8 op;                    /* type of syscall stop */
    __u32 arch;                 /* AUDIT_ARCH_* value */
    __u64 instruction_pointer;  /* CPU instruction pointer */
    __u64 stack_pointer;        /* CPU stack pointer */
    union {
        struct {                /* op == PTRACE_SYSCALL_INFO_ENTRY */
            __u64 nr;           /* syscall number */
            __u64 args[6];      /* syscall arguments */
        } entry;
        struct {                /* op == PTRACE_SYSCALL_INFO_EXIT */
            __s64 rval;         /* syscall return value */
            __u8 is_error;      /* syscall error flag */
        } exit;
        struct {                /* op == PTRACE_SYSCALL_INFO_SECCOMP */
            __u64 nr;           /* syscall number */
            __u64 args[6];      /* syscall arguments */
            __u32 ret_data;    /* SECCOMP_RET_DATA part of SECCOMP_RET_TRACE return value */
        } seccomp;
    };
};

getuid

Get UID of calling process.

uid_t getuid(void)

Returns the UID. Always succeeds.

syslog

Read or clear kernel message buffer.

int syslog(int type, char *bufp, int len)
  • type – function to perform
  • bufp – pointer to buffer (used for reading)
  • len – length of buffer

Returns bytes read, available to read, total size of kernel buffer, or 0, depending on type flag.

type flag

  • SYSLOG_ACTION_READ – read len bytes of kernel message log into bufp, returns number of bytes read
  • SYSLOG_ACTION_READ_ALL – read entire kernel message log into bufp, reading last len bytes from kernel, returning bytes read
  • SYSLOG_ACTION_READ_CLEAR – read kernel message log into bufp, up to len bytes, then clear it, returning bytes read
  • SYSLOG_ACTION_CLEAR – clear the kernel message log buffer, returns zero on success
  • SYSLOG_ACTION_CONSOLE_OFF – prevents kernel messages being sent to the console
  • SYSLOG_ACTION_CONSOLE_ON – enables kernel messages being sent to the console
  • SYSLOG_ACTION_CONSOLE_LEVEL – sets the log level of messages (values 1 to 8 via len) to allow message filtering
  • SYSLOG_ACTION_SIZE_UNREAD – returns number of bytes available for reading in kernel message log
  • SYSLOG_ACTION_SIZE_BUFFER – returns size of kernel message buffer

getgid

Get GID of calling process.

gid_t getgid(void)

Returns the GID. Always succeeds.

setuid

Set UID of calling process.

int setuid(uid_t uid)
  • uid – new UID

Returns zero on success.

setgid

Set GID of calling process.

int setgid(gid_t gid)
  • gid – new GID

Returns zero on success.

geteuid

Get effective UID of calling process.

uid_t geteuid(void)

Returns the effective UID. Always succeeds.

getegid

Get effective GID of calling process.

gid_t getegid(void)

Returns the effective GID. Always succeeds.

setpgid

Set process group ID of a process.

int setpgid(pid_t pid, pid_t pgid)
  • pid – process ID
  • pgid – process group ID

Returns zero on success.

getppid

Get PID of calling process’s parent.

pid_t getppid(void)

Returns the PID of the parent of the calling process.

getpgrp

Get process group ID of calling process.

pid_t getpgrp(void)

Returns process group ID.

setsid

Create session if calling process isn’t leader of a process group.

pid_t setsid(void)

Returns created session ID.

setreuid

Set both real and effective UID for calling process.

int setreuid(uid_t ruid, uid_t euid)
  • ruid – the real UID
  • euid – the effective UID

Returns zero on success.

setregid

Set both real and effective GID for calling process.

int setregid(gid_t rgid, gid_t egid)
  • rgid – the real GID
  • egid – the effective GID

Returns zero on success.

getgroups

Get a list of supplementary group IDs for calling process.

int getgroups(int size, gid_t list[])
  • size – size of array list
  • list – array of gid_t to receive list

Returns number of supplementary group IDs retrieved into list.
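
When size is zero, getgroups leaves list untouched and just reports how many IDs there are, which allows a two-step sizing pattern. The sketch below relies on that behavior; the helper name supplementary_groups is invented, and the caller is expected to free the returned array.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a malloc'd array with the caller's
 * supplementary group IDs and store its length in *count.
 * Returns NULL on failure. */
gid_t *supplementary_groups(int *count)
{
    int n = getgroups(0, NULL);             /* ask only for the count */
    if (n < 0)
        return NULL;

    /* Allocate at least one element so the pointer is always valid. */
    gid_t *list = malloc((size_t)(n > 0 ? n : 1) * sizeof *list);
    if (list == NULL)
        return NULL;

    n = getgroups(n, list);                 /* now fetch the IDs */
    if (n < 0) {
        free(list);
        return NULL;
    }

    *count = n;
    return list;
}
```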

setgroups

Set list of supplementary group IDs for calling process.

int setgroups(size_t size, const gid_t *list)
  • size – size of array list
  • list – array of gid_t to set list

Returns zero on success.

setresuid

Sets real, effective, and saved UID.

int setresuid(uid_t ruid, uid_t euid, uid_t suid)
  • ruid – the real UID
  • euid – the effective UID
  • suid – the saved UID

Returns zero on success.

setresgid

Sets real, effective, and saved GID.

int setresgid(gid_t rgid, gid_t egid, gid_t sgid)
  • rgid – the real GID
  • egid – the effective GID
  • sgid – the saved GID

Returns zero on success.

getresuid

Get the real, effective, and saved UID.

int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid)
  • ruid – the real UID
  • euid – the effective UID
  • suid – the saved UID

Returns zero on success.

getresgid

Get the real, effective, and saved GID.

int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid)
  • rgid – the real GID
  • egid – the effective GID
  • sgid – the saved GID

Returns zero on success.

getpgid

Get process group ID of a process.

pid_t getpgid(pid_t pid)
  • pid – process ID

Returns process group ID.

setfsuid

Set UID for filesystem checks.

int setfsuid(uid_t fsuid)

Always returns previous filesystem UID.

setfsgid

Set GID for filesystem checks.

int setfsgid(gid_t fsgid)

Always returns previous filesystem GID.

getsid

Get session ID.

pid_t getsid(pid_t pid)

Returns session ID.

capget

Get capabilities of a thread.

int capget(cap_user_header_t hdrp, cap_user_data_t datap)
  • hdrp – capability header structure
  • datap – capability data structure
typedef struct __user_cap_header_struct {
    __u32 version;
    int pid;
} *cap_user_header_t;
typedef struct __user_cap_data_struct {
    __u32 effective;
    __u32 permitted;
    __u32 inheritable;
} *cap_user_data_t;

Returns zero on success.

capset

Set capabilities of a thread.

int capset(cap_user_header_t hdrp, const cap_user_data_t datap)
  • hdrp – capability header structure
  • datap – capability data structure

Returns zero on success.

rt_sigpending

Return the set of signals that are pending delivery to calling process or thread.

int sigpending(sigset_t *set)
  • set – pointer to sigset_t structure to receive mask of signals

rt_sigtimedwait

Suspend execution (until timeout) of calling process or thread until a signal referenced in set is pending.

int sigtimedwait(const sigset_t *set, siginfo_t *info, const struct timespec *timeout)
  • set – pointer to sigset_t structure to define signals to wait for
  • info – if not null, pointer to siginfo_t structure with info about signal
  • timeout – a timespec structure setting a maximum time to wait before resuming execution
struct timespec {
    long    tv_sec;         /* time in seconds */
    long    tv_nsec;        /* time in nanoseconds */
};

rt_sigqueueinfo

Queue a signal.

int rt_sigqueueinfo(pid_t tgid, int sig, siginfo_t *info)
  • tgid – thread group id
  • sig – signal to send
  • info – pointer to structure siginfo_t

Returns zero on success.

rt_sigsuspend

Wait for a signal.

int sigsuspend(const sigset_t *mask)
  • mask – pointer to sigset_t structure (defined in sigaction)

Always returns with -1.

sigaltstack

Set/get signal stack context.

int sigaltstack(const stack_t *ss, stack_t *oss)
  • ss – pointer to stack_t structure representing new signal stack
  • oss – pointer to stack_t structure used for getting information on current signal stack
typedef struct {
    void  *ss_sp;     /* stack base address */
    int    ss_flags;  /* flags */
    size_t ss_size;   /* bytes in stack */
} stack_t;

Returns zero on success.

utime

Change the last access and modification time of a file.

int utime(const char *filename, const struct utimbuf *times)
  • filename – pointer to string with filename
  • times – pointer to utimbuf structure
struct utimbuf {
    time_t actime;       /* access time */
    time_t modtime;      /* modification time */
};

Returns zero on success.

mknod

Create a special file (usually used for device files).

int mknod(const char *pathname, mode_t mode, dev_t dev)
  • pathname – pointer to string with full path of file to create
  • mode – permissions and type of file
  • dev – device number

Returns zero on success.

uselib

Load a shared library.

int uselib(const char *library)
  • library – pointer to string with full path of library file

Returns zero on success.

personality

Set process execution domain (personality).

int personality(unsigned long persona)
  • persona – domain of persona

Returns previous persona on success unless persona is set to 0xFFFFFFFF.

ustat

Get filesystem statistics.

int ustat(dev_t dev, struct ustat *ubuf)
  • dev – number of device with mounted filesystem
  • ubuf – pointer to ustat structure for return values
struct ustat {
    daddr_t f_tfree;      /* free blocks */
    ino_t   f_tinode;     /* free inodes */
    char    f_fname[6];   /* filesystem name */
    char    f_fpack[6];   /* filesystem pack name */
};

Returns zero on success and ustat structure referenced by ubuf is filled with statistics.

statfs

Get filesystem statistics.

int statfs(const char *path, struct statfs *buf)
  • path – pointer to string with filename of any file on the mounted filesystem
  • buf – pointer to statfs structure
struct statfs {
    __SWORD_TYPE    f_type;     /* filesystem type */
    __SWORD_TYPE    f_bsize;    /* optimal transfer block size */
    fsblkcnt_t      f_blocks;   /* total blocks */
    fsblkcnt_t      f_bfree;    /* free blocks */
    fsblkcnt_t      f_bavail;   /* free blocks available to unprivileged user */
    fsfilcnt_t      f_files;    /* total file nodes */
    fsfilcnt_t      f_ffree;    /* free file nodes */
    fsid_t          f_fsid;     /* filesystem id */
    __SWORD_TYPE    f_namelen;  /* maximum length of filenames */
    __SWORD_TYPE    f_frsize;   /* fragment size */
    __SWORD_TYPE    f_spare[5];
};

Returns zero on success.

fstatfs

Works just like statfs, except that the filesystem is identified by an open file descriptor instead of a path.

int fstatfs(int fd, struct statfs *buf)
  • fd – file descriptor
  • buf – pointer to statfs structure

Returns zero on success.

sysfs

Get filesystem type information.

int sysfs(int option, const char *fsname)
int sysfs(int option, unsigned int fs_index, char *buf)
int sysfs(int option)
  • option – selects the operation: 1 translates a filesystem name to its index, 2 translates an index to a name, and 3 returns the number of filesystem types in the kernel
  • fsname – pointer to string with name of filesystem (set option to 1)
  • fs_index – filesystem type index; the matching null-terminated identifier string is written to buf (set option to 2)
  • buf – pointer to buffer that receives the filesystem identifier string

Returns filesystem index when option is 1, zero when option is 2, and number of filesystem types in kernel when option is 3.

getpriority

Get priority of a process.

int getpriority(int which, int who)
  • which – flag determining which priority to get
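  • who – ID interpreted according to which (process ID, process group ID, or user ID); zero denotes the calling process, its process group, or its real user ID

Returns priority of specified process.

which

  • PRIO_PROCESS – process
  • PRIO_PGRP – process group
  • PRIO_USER – user ID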

setpriority

Set priority of a process.

int setpriority(int which, int who, int prio)
  • which – flag determining which priority to set
  • who – ID interpreted according to which (process ID, process group ID, or user ID)
  • prio – priority value (-20 to 19)

Returns zero on success.

sched_setparam

Set scheduling parameters of a process.

int sched_setparam(pid_t pid, const struct sched_param *param)
  • pid – PID of process
  • param – pointer to sched_param structure

Returns zero on success.

sched_getparam

Get scheduling parameters of a process.

int sched_getparam(pid_t pid, struct sched_param *param)
  • pid – PID of process
  • param – pointer to sched_param structure

Returns zero on success.

sched_setscheduler

Set scheduling policy and parameters for a process.

int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param)
  • pid – PID of process
  • policy – policy flag
  • param – pointer to sched_param structure

Returns zero on success.

policy

  • SCHED_OTHER – standard time-sharing policy
  • SCHED_FIFO – first-in-first-out scheduling policy
  • SCHED_RR – round-robin scheduling policy
  • SCHED_BATCH – executes processes in a batch-style schedule
  • SCHED_IDLE – marks a process as very low priority (background)

sched_getscheduler

Get scheduling policy for a process.

int sched_getscheduler(pid_t pid)
  • pid – PID of process

Returns policy flag (see sched_setscheduler).

sched_get_priority_max

Get static priority maximum.

int sched_get_priority_max(int policy)
  • policy – policy flag (see sched_setscheduler)

Returns maximum priority value for provided policy.

sched_get_priority_min

Get static priority minimum.

int sched_get_priority_min(int policy)
  • policy – policy flag (see sched_setscheduler)

Returns minimum priority value for provided policy.

sched_rr_get_interval

Get SCHED_RR interval for a process.

int sched_rr_get_interval(pid_t pid, struct timespec *tp)
  • pid – PID of process
  • tp – pointer to timespec structure

Returns zero on success and fills tp with the round-robin time quantum for pid if SCHED_RR is the scheduling policy.

mlock

Lock all or part of calling process’s memory.

int mlock(const void *addr, size_t len)
  • addr – pointer to start of address space
  • len – length of address space to lock

Returns zero on success.

munlock

Unlock all or part of calling process’s memory.

int munlock(const void *addr, size_t len)
  • addr – pointer to start of address space
  • len – length of address space to unlock

Returns zero on success.

mlockall

Lock all address space of calling process’s memory.

int mlockall(int flags)
  • flags – flags defining additional behavior

flags

  • MCL_CURRENT – lock all pages as of time of calling this syscall
  • MCL_FUTURE – lock all pages that are mapped to this process in the future
  • MCL_ONFAULT – used together with MCL_CURRENT, MCL_FUTURE, or both: lock the affected pages only when they are faulted in

munlockall

Unlock all address space of calling process’s memory.

int munlockall(void)

Returns zero on success.

vhangup

Send a "hangup" signal to the current terminal.

int vhangup(void)

Returns zero on success.

modify_ldt

Read or write to the local descriptor table for a process.

int modify_ldt(int func, void *ptr, unsigned long bytecount)
  • func – 0 for read, 1 for write
  • ptr – pointer to LDT
  • bytecount – bytes to read, or for write, size of user_desc structure
struct user_desc {
    unsigned int  entry_number;
    unsigned int  base_addr;
    unsigned int  limit;
    unsigned int  seg_32bit:1;
    unsigned int  contents:2;
    unsigned int  read_exec_only:1;
    unsigned int  limit_in_pages:1;
    unsigned int  seg_not_present:1;
    unsigned int  useable:1;
};

Returns bytes read or zero for success when writing.

pivot_root

Change root mount.

int pivot_root(const char *new_root, const char *put_old)
  • new_root – pointer to string with path to new mount
  • put_old – pointer to string with path for old mount

Returns zero on success.

prctl

Perform operations on a process or thread.

int prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4,
 unsigned long arg5)
  • option – specify operation flag
  • arg2, arg3, arg4, and arg5 – variables used depending on option, see option flags

option

  • PR_CAP_AMBIENT – read or change the ambient capability set of the calling thread, according to the value of arg2:
    • PR_CAP_AMBIENT_RAISE – capability in arg3 is added to ambient set
    • PR_CAP_AMBIENT_LOWER – capability in arg3 is removed from ambient set
    • PR_CAP_AMBIENT_IS_SET – returns 1 if capability in arg3 is in the ambient set, 0 if not
    • PR_CAP_AMBIENT_CLEAR_ALL – remove all capabilities from ambient set, set arg3 to 0
  • PR_CAPBSET_READ – return 1 if capability specified in arg2 is in calling thread’s capability bounding set, 0 if not
  • PR_CAPBSET_DROP – if calling thread has CAP_SETPCAP capability in user namespace, drop capability in arg2 from capability bounding set for calling process
  • PR_SET_CHILD_SUBREAPER – if arg2 is not zero, set "child subreaper" attribute for calling process, if arg2 is zero, unset
  • PR_GET_CHILD_SUBREAPER – return "child subreaper" setting of calling process in location pointed to by arg2
  • PR_SET_DUMPABLE – set state of dumpable flag via arg2
  • PR_GET_DUMPABLE – return current dumpable flag for calling process
  • PR_SET_ENDIAN – set endian-ness of calling process to arg2 via PR_ENDIAN_BIG, PR_ENDIAN_LITTLE, or PR_ENDIAN_PPC_LITTLE
  • PR_GET_ENDIAN – return endian-ness of calling process to location pointed by arg2
  • PR_SET_KEEPCAPS – set state of calling process’s "keep capabilities" flag via arg2
  • PR_GET_KEEPCAPS – return current state of calling process’s "keep capabilities" flag
  • PR_MCE_KILL – set machine check memory corruption kill policy for calling process via arg2
  • PR_MCE_KILL_GET – return current per-process machine check kill policy
  • PR_SET_MM – modify kernel memory map descriptor fields of calling process, where arg2 is one of the following options and arg3 is the new value to set
    • PR_SET_MM_START_CODE – set address above which program text can run
    • PR_SET_MM_END_CODE – set address below which program text can run
    • PR_SET_MM_START_DATA – set address above which initialized and uninitialized data are placed
    • PR_SET_MM_END_DATA – set address below which initialized and uninitialized data are placed
    • PR_SET_MM_START_STACK – set start address of stack
    • PR_SET_MM_START_BRK – set address above which program heap can be expanded with brk
    • PR_SET_MM_BRK – set current brk value
    • PR_SET_MM_ARG_START – set address above which command line is placed
    • PR_SET_MM_ARG_END – set address below which command line is placed
    • PR_SET_MM_ENV_START – set address above which environment is placed
    • PR_SET_MM_ENV_END – set address below which environment is placed
    • PR_SET_MM_AUXV – set new aux vector, with arg3 providing new address and arg4 containing size of vector
    • PR_SET_MM_EXE_FILE – supersede /proc/pid/exe symlink with a new one pointing to file descriptor in arg3
    • PR_SET_MM_MAP – provide one-shot access to all addresses by passing struct prctl_mm_map pointer in arg3 with size in arg4
    • PR_SET_MM_MAP_SIZE – returns size of prctl_mm_map structure, where arg4 is pointer to unsigned int
  • PR_MPX_ENABLE_MANAGEMENT – enable kernel management of memory protection extensions
  • PR_MPX_DISABLE_MANAGEMENT – disable kernel management of memory protection extensions
  • PR_SET_NAME – set name of calling process to null-terminated string pointed to by arg2
  • PR_GET_NAME – get name of calling process in null-terminated string into buffer sized to 16 bytes referenced by pointer in arg2
  • PR_SET_NO_NEW_PRIVS – set calling process no_new_privs attribute to value in arg2
  • PR_GET_NO_NEW_PRIVS – return value of no_new_privs for calling process
  • PR_SET_PDEATHSIG – set parent-death signal of calling process to arg2
  • PR_GET_PDEATHSIG – return value of parent-death signal in location pointed to by arg2
  • PR_SET_SECCOMP – set "seccomp" mode for calling process via arg2
  • PR_GET_SECCOMP – get "seccomp" mode of calling process
  • PR_SET_SECUREBITS – set "securebits" flags of calling thread to value in arg2
  • PR_GET_SECUREBITS – return "securebits" flags of calling process
  • PR_GET_SPECULATION_CTRL – return state of speculation misfeature specified in arg2
  • PR_SET_SPECULATION_CTRL – set state of speculation misfeature specified in arg2
  • PR_SET_THP_DISABLE – set state of "THP disable" flag for calling process
  • PR_TASK_PERF_EVENTS_DISABLE – disable all performance counters for calling process
  • PR_TASK_PERF_EVENTS_ENABLE – enable performance counters for calling process
  • PR_GET_THP_DISABLE – return current setting of "THP disable" flag
  • PR_GET_TID_ADDRESS – return clear_child_tid address set by set_tid_address
  • PR_SET_TIMERSLACK – sets current timer slack value for calling process
  • PR_GET_TIMERSLACK – return current timer slack value for calling process
  • PR_SET_TIMING – set statistical process timing or accurate timestamp-based process timing by flag in arg2 (PR_TIMING_STATISTICAL or PR_TIMING_TIMESTAMP)
  • PR_GET_TIMING – return process timing method in use
  • PR_SET_TSC – set state of flag determining if timestamp counter can be read by process in arg2 (PR_TSC_ENABLE or PR_TSC_SIGSEGV)
  • PR_GET_TSC – return state of flag determining whether timestamp counter can be read in location pointed to by arg2

Returns zero on success or value specified in option flag.

arch_prctl

Set architecture-specific thread state.

int arch_prctl(int code, unsigned long addr)
  • code – defines additional behavior (see below)
  • addr or *addr – address, or pointer in the case of "get" operations

Returns zero on success.

code

  • ARCH_SET_FS – set 64-bit base for FS register to addr
  • ARCH_GET_FS – return 64-bit base value for FS register of current process in memory referenced by addr
  • ARCH_SET_GS – set 64-bit base address for GS register to addr
  • ARCH_GET_GS – return 64-bit base value for GS register of current process in memory referenced by addr

adjtimex

Tunes kernel clock.

int adjtimex(struct timex *buf)
  • buf – pointer to buffer with timex structure
struct timex {
    int  modes;             /* mode selector */
    long offset;            /* time offset in nanoseconds if STA_NANO flag set, otherwise microseconds */
    long freq;              /* frequency offset */
    long maxerror;          /* max error in microseconds */
    long esterror;          /* est. error in microseconds */
    int  status;            /* clock command / status */
    long constant;          /* PLL (phase-locked loop) time constant */
    long precision;         /* clock precision in microseconds, read-only */
    long tolerance;         /* clock frequency tolerance, read-only */
    struct timeval time;    /* current time (read-only, except ADJ_SETOFFSET) */
    long tick;              /* microseconds between clock ticks */
    long ppsfreq;           /* PPS (pulse per second) frequency, read-only */
    long jitter;            /* PPS jitter, read-only, in nanoseconds if STA_NANO flag set, otherwise microseconds */
    int  shift;             /* PPS interval duration in seconds, read-only */
    long stabil;            /* PPS stability, read-only */
    long jitcnt;            /* PPS count of jitter limit exceeded events, read-only */
    long calcnt;            /* PPS count of calibration intervals, read-only */
    long errcnt;            /* PPS count of calibration errors, read-only */
    long stbcnt;            /* PPS count of stability limit exceeded events, read-only */
    int tai;                /* TAI offset set by previous ADJ_TAI operations, in seconds, read-only */
    /* padding bytes to allow future expansion */
};

Returns clock state, either TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, or TIME_ERROR.

setrlimit

Set resource limits.

int setrlimit(int resource, const struct rlimit *rlim)
  • resource – type of resource to set (see getrlimit for list)
  • rlim – pointer to rlimit structure
struct rlimit {
    rlim_t rlim_cur;  /* soft limit */
    rlim_t rlim_max;  /* hard limit */
};

Returns zero on success.

chroot

Change root directory.

int chroot(const char *path)
  • path – pointer to string containing path to new root directory

Returns zero on success.

sync

Flush filesystem caches to disk.

void sync(void)

This call always succeeds and returns no value.

acct

Toggle process accounting.

int acct(const char *filename)
  • filename – pointer to string with path of existing file to which accounting records are appended, or NULL to turn accounting off

Returns zero on success.

settimeofday

Set the time of day.

int settimeofday(const struct timeval *tv, const struct timezone *tz)
  • tv – pointer to timeval structure of new time (see gettimeofday for structure)
  • tz – pointer to timezone structure (see gettimeofday for structure)

Returns zero on success.

mount

Mount a file system.

int mount(const char *source, const char *target, const char *filesystemtype,
unsigned long mountflags, const void *data)
  • source – pointer to string containing device path
  • target – pointer to string containing mount target path
  • filesystemtype – pointer to filesystem type (see /proc/filesystems for supported filesystems)
  • mountflags – flags or mount options
  • data – usually a comma-separated list of options understood by the filesystem type

Returns zero on success.

mountflags

  • MS_BIND – perform bind mount, making file or subtree visible at another point within file system
  • MS_DIRSYNC – make dir changes synchronous
  • MS_MANDLOCK – allow mandatory locking
  • MS_MOVE – move subtree, source specifies existing mount point and target specifies new location
  • MS_NOATIME – don’t update access time
  • MS_NODEV – don’t allow access to special files
  • MS_NODIRATIME – don’t update access times for directories
  • MS_NOEXEC – don’t allow programs to be executed
  • MS_NOSUID – don’t honor SUID or SGID bits when running programs
  • MS_RDONLY – mount read-only
  • MS_RELATIME – update last access time if current value of atime is less than or equal to mtime or ctime
  • MS_REMOUNT – remount existing mount
  • MS_SILENT – suppress display of printk() warning messages in kernel log
  • MS_STRICTATIME – always update atime when accessed
  • MS_SYNCHRONOUS – make writes synchronous

umount2

Unmount a filesystem.

int umount2(const char *target, int flags)
  • target – pointer to string with filesystem to unmount
  • flags – additional options

Returns zero on success.

flags

  • MNT_FORCE – force unmount even if busy, which can cause data loss
  • MNT_DETACH – perform lazy unmount and make mount point unavailable for new access, then actually unmount when mount isn’t busy
  • MNT_EXPIRE – mark mount point as expired
  • UMOUNT_NOFOLLOW – do not dereference target if symlink

swapon

Start swapping to specified device.

int swapon(const char *path, int swapflags)
  • path – pointer to string with path to device
  • swapflags – flags for additional options

Returns zero on success.

swapflags

  • SWAP_FLAG_PREFER – new swap area will have a higher priority than the default priority level
  • SWAP_FLAG_DISCARD – discard or trim freed swap pages (for SSDs)

swapoff

Stop swapping to specified device.

int swapoff(const char *path)
  • path – pointer to string with path to device

Returns zero on success.

reboot

Reboot the system.

int reboot(int magic, int magic2, int cmd, void *arg)
  • magic – must be set to LINUX_REBOOT_MAGIC1 for this call to work
  • magic2 – must be set to LINUX_REBOOT_MAGIC2 (LINUX_REBOOT_MAGIC2A, LINUX_REBOOT_MAGIC2B, and LINUX_REBOOT_MAGIC2C are also accepted) for this call to work
  • cmd – command flag (see below)
  • arg – pointer to additional argument, used only by LINUX_REBOOT_CMD_RESTART2

For commands that stop or restart the system, does not return on success. Returns zero on success for other commands and -1 on failure.

cmd

  • LINUX_REBOOT_CMD_CAD_OFF – CTRL+ALT+DELETE is disabled, and CTRL+ALT+DELETE will send SIGINT to init
  • LINUX_REBOOT_CMD_CAD_ON – CTRL+ALT+DELETE enabled
  • LINUX_REBOOT_CMD_HALT – halt system and display "System halted."
  • LINUX_REBOOT_CMD_KEXEC – execute a previously loaded kernel with kexec_load, requires CONFIG_KEXEC in kernel
  • LINUX_REBOOT_CMD_POWER_OFF – power down system
  • LINUX_REBOOT_CMD_RESTART – restart system and display "Restarting system."
  • LINUX_REBOOT_CMD_RESTART2 – restart system and display "Restarting system with command '%s'.", where the command string is taken from arg

sethostname

Set hostname of machine.

int sethostname(const char *name, size_t len)
  • name – pointer to string with new name
  • len – length of new name

Returns zero on success.

setdomainname

Set NIS domain name.

int setdomainname(const char *name, size_t len)
  • name – pointer to string with new name
  • len – length of new name

Returns zero on success.

iopl

Change I/O privilege level.

int iopl(int level)
  • level – new privilege level

Returns zero on success.

ioperm

Set I/O permissions.

int ioperm(unsigned long from, unsigned long num, int turn_on)
  • from – starting port address
  • num – number of consecutive ports (one permission bit each), starting at from
  • turn_on – non-zero enables access to the ports, zero disables it

Returns zero on success.

init_module

Load module into kernel from a module image held in a memory buffer.

int init_module(void *module_image, unsigned long len, const char *param_values)
  • module_image – pointer to buffer with binary image of module to load
  • len – size of buffer
  • param_values – pointer to string with parameters for module

Returns zero on success.

delete_module

Unload a kernel module.

int delete_module(const char *name, int flags)
  • name – pointer to string with name of module
  • flags – modify behavior of unload

Returns zero on success.

flags

  • O_NONBLOCK – immediately return from syscall
  • O_NONBLOCK | O_TRUNC – unload module immediately even if reference count is not zero

quotactl

Change disk quotas.

int quotactl(int cmd, const char *special, int id, caddr_t addr)
  • cmd – command flag
  • special – pointer to string with path to mounted block device
  • id – user or group ID
  • addr – address of data structure, optional to some cmd flags

cmd

  • Q_QUOTAON – turn on quotas for filesystem referenced by special, with id specifying quota format to use:
    • QFMT_VFS_OLD – original format
    • QFMT_VFS_V0 – standard VFS v0 format
    • QFMT_VFS_V1 – format with support for 32-bit UIDs and GIDs
  • Q_QUOTAOFF – turn off quotas for filesystem referenced by special
  • Q_GETQUOTA – get quota limits and usage for a user or group id, referenced by id, where addr is pointer to dqblk structure
  • Q_GETNEXTQUOTA – same as Q_GETQUOTA but returns info for next id greater than or equal to id that has quota set, where addr points to nextdqblk structure
  • Q_SETQUOTA – set quota info for user or group id, using dqblk structure referenced by addr
  • Q_GETINFO – get info about quotafile, where addr points to dqinfo structure
  • Q_SETINFO – set information about quotafile, where addr points to dqinfo structure
  • Q_GETFMT – get quota format used on filesystem referenced by special, where addr points to 4 byte buffer where format number will be stored
  • Q_SYNC – update on-disk copy of quota usage for filesystem
  • Q_GETSTATS – get statistics about quota subsystem, where addr points to a dqstats structure
  • Q_XQUOTAON – enable quotas for an XFS filesystem
  • Q_XQUOTAOFF – disable quotas on an XFS filesystem
  • Q_XGETQUOTA – on XFS filesystems, get disk quota limits and usage for user id specified by id, where addr points to fs_disk_quota structure
  • Q_XGETNEXTQUOTA – same as Q_XGETQUOTA but returns fs_disk_quota referenced by addr for next id greater than or equal to id that has quota set
  • Q_XSETQLIM – on XFS filesystems, set disk quota for UID, where addr points to fs_disk_quota structure
  • Q_XGETQSTAT – returns XFS specific quota info in fs_quota_stat referenced by addr
  • Q_XGETQSTATV – returns XFS specific quota info in fs_quota_statv referenced by addr
  • Q_XQUOTARM – on XFS filesystems, free disk space used by quotas, where addr references unsigned int value containing flags (same as d_flags field of fs_disk_quota structure)
struct dqblk {
    uint64_t dqb_bhardlimit;  /* absolute limit on quota blocks alloc */
    uint64_t dqb_bsoftlimit;  /* preferred limit on quota blocks */
    uint64_t dqb_curspace;    /* current space used in bytes */
    uint64_t dqb_ihardlimit;  /* max number of allocated inodes */
    uint64_t dqb_isoftlimit;  /* preferred inode limit */
    uint64_t dqb_curinodes;   /* current allocated inodes */
    uint64_t dqb_btime;       /* time limit for excessive use over quota */
    uint64_t dqb_itime;       /* time limit for excessive files */
    uint32_t dqb_valid;       /* bit mask of QIF_* constants */
};
struct nextdqblk {
    uint64_t dqb_bhardlimit;
    uint64_t dqb_bsoftlimit;
    uint64_t dqb_curspace;
    uint64_t dqb_ihardlimit;
    uint64_t dqb_isoftlimit;
    uint64_t dqb_curinodes;
    uint64_t dqb_btime;
    uint64_t dqb_itime;
    uint32_t dqb_valid;
    uint32_t dqb_id;
};
struct dqinfo {
    uint64_t dqi_bgrace;  /* time before soft limit becomes hard limit */
    uint64_t dqi_igrace;  /* time before soft inode limit becomes hard limit */
    uint32_t dqi_flags;   /* flags for quotafile */
    uint32_t dqi_valid;
};
struct fs_disk_quota {
    int8_t   d_version;         /* version of structure */
    int8_t   d_flags;           /* XFS_{USER,PROJ,GROUP}_QUOTA */
    uint16_t d_fieldmask;       /* field specifier */
    uint32_t d_id;              /* project, UID, or GID */
    uint64_t d_blk_hardlimit;   /* absolute limit on disk blocks */
    uint64_t d_blk_softlimit;   /* preferred limit on disk blocks */
    uint64_t d_ino_hardlimit;   /* max # allocated inodes */
    uint64_t d_ino_softlimit;   /* preferred inode limit */
    uint64_t d_bcount;          /* # disk blocks owned by user */
    uint64_t d_icount;          /* # inodes owned by user */
    int32_t  d_itimer;          /* zero if within inode limits */
    int32_t  d_btimer;          /* as above for disk blocks */
    uint16_t d_iwarns;          /* # warnings issued regarding # of inodes */
    uint16_t d_bwarns;          /* # warnings issued regarding disk blocks */
    int32_t  d_padding2;        /* padding */
    uint64_t d_rtb_hardlimit;   /* absolute limit on realtime disk blocks */
    uint64_t d_rtb_softlimit;   /* preferred limit on realtime disk blocks */
    uint64_t d_rtbcount;        /* # realtime blocks owned */
    int32_t  d_rtbtimer;        /* as above, but for realtime disk blocks */
    uint16_t d_rtbwarns;        /* # warnings issued regarding realtime disk blocks */
    int16_t  d_padding3;        /* padding */
    char     d_padding4[8];     /* padding */
};
struct fs_quota_stat {
    int8_t   qs_version;            /* version for future changes */
    uint16_t qs_flags;              /* XFS_QUOTA_{U,P,G}DQ_{ACCT,ENFD} */
    int8_t   qs_pad;                /* padding */
    struct fs_qfilestat qs_uquota;  /* user quota storage info */
    struct fs_qfilestat qs_gquota;  /* group quota storage info */
    uint32_t qs_incoredqs;          /* number of dquots in core */
    int32_t  qs_btimelimit;         /* limit for blocks timer */
    int32_t  qs_itimelimit;         /* limit for inodes timer */
    int32_t  qs_rtbtimelimit;       /* limit for realtime blocks timer */
    uint16_t qs_bwarnlimit;         /* limit for # of warnings */
    uint16_t qs_iwarnlimit;         /* limit for # of warnings */
};
struct fs_qfilestatv {
    uint64_t qfs_ino;       /* inode number */
    uint64_t qfs_nblks;     /* number of BBs (512-byte blocks) */
    uint32_t qfs_nextents;  /* number of extents */
    uint32_t qfs_pad;       /* pad for 8-byte alignment */
};
struct fs_quota_statv {
    int8_t   qs_version;             /* version for future changes */
    uint8_t  qs_pad1;                /* pad for 16-bit alignment */
    uint16_t qs_flags;               /* XFS_QUOTA_.* flags */
    uint32_t qs_incoredqs;           /* number of dquots incore */
    struct fs_qfilestatv qs_uquota;  /* user quota info */
    struct fs_qfilestatv qs_gquota;  /* group quota info */
    struct fs_qfilestatv qs_pquota;  /* project quota info */
    int32_t  qs_btimelimit;          /* limit for blocks timer */
    int32_t  qs_itimelimit;          /* limit for inodes timer */
    int32_t  qs_rtbtimelimit;        /* limit for realtime blocks timer */
    uint16_t qs_bwarnlimit;          /* limit for # of warnings */
    uint16_t qs_iwarnlimit;          /* limit for # of warnings */
    uint64_t qs_pad2[8];             /* padding */
};

Returns zero on success.

gettid

Get thread ID.

pid_t gettid(void)

Returns thread ID of calling thread.

readahead

Read file into page cache.

ssize_t readahead(int fd, off64_t offset, size_t count)
  • fd – file descriptor of file to read ahead
  • offset – offset from start of file to read
  • count – number of bytes to read

Returns zero on success.

setxattr

Set extended attribute value.

int setxattr(const char *path, const char *name, const void *value,
 size_t size, int flags)
  • path – pointer to string with filename
  • name – pointer to string with attribute name
  • value – pointer to string with attribute value
  • size – size of value
  • flags – set to XATTR_CREATE to create attribute, XATTR_REPLACE to replace

Returns zero on success.

lsetxattr

Set extended attribute value of symbolic link.

int lsetxattr(const char *path, const char *name, const void *value,
 size_t size, int flags)
  • path – pointer to string with symlink
  • name – pointer to string with attribute name
  • value – pointer to string with attribute value
  • size – size of value
  • flags – set to XATTR_CREATE to create attribute, XATTR_REPLACE to replace

Returns zero on success.

fsetxattr

Set extended attribute value of file referenced by file descriptor.

int fsetxattr(int fd, const char *name, const void *value, size_t size, int flags)
  • fd – file descriptor of file in question
  • name – pointer to string with attribute name
  • value – pointer to string with attribute value
  • size – size of value
  • flags – set to XATTR_CREATE to create attribute, XATTR_REPLACE to replace

Returns zero on success.

getxattr

Get extended attribute value.

ssize_t getxattr(const char *path, const char *name, void *value, size_t size)
  • path – pointer to string with filename
  • name – pointer to string with attribute name
  • value – pointer to buffer that receives the attribute value
  • size – size of value buffer

Returns size of extended attribute value.

lgetxattr

Get extended attribute value from symlink.

ssize_t lgetxattr(const char *path, const char *name, void *value, size_t size)
  • path – pointer to string with symlink
  • name – pointer to string with attribute name
  • value – pointer to buffer that receives the attribute value
  • size – size of value buffer

Returns size of extended attribute value.

fgetxattr

Get extended attribute value from file referenced by file descriptor.

ssize_t fgetxattr(int fd, const char *name, void *value, size_t size)
  • fd – file descriptor of file in question
  • name – pointer to string with attribute name
  • value – pointer to buffer that receives the attribute value
  • size – size of value buffer

Returns size of extended attribute value.

listxattr

List extended attribute names.

ssize_t listxattr(const char *path, char *list, size_t size)
  • path – pointer to string with filename
  • list – pointer to list of attribute names
  • size – size of list buffer

Returns size of name list.

llistxattr

List extended attribute names for a symlink.

ssize_t llistxattr(const char *path, char *list, size_t size)
  • path – pointer to string with symlink
  • list – pointer to list of attribute names
  • size – size of list buffer

Returns size of name list.

flistxattr

List extended attribute names for file referenced by file descriptor.

ssize_t flistxattr(int fd, char *list, size_t size)
  • fd – file descriptor of file in question
  • list – pointer to list of attribute names
  • size – size of list buffer

Returns size of name list.

removexattr

Remove an extended attribute.

int removexattr(const char *path, const char *name)
  • path – pointer to string with filename
  • name – pointer to string with name of attribute to remove

Returns zero on success.

lremovexattr

Remove an extended attribute of a symlink.

int lremovexattr(const char *path, const char *name)
  • path – pointer to string with symlink
  • name – pointer to string with name of attribute to remove

Returns zero on success.

fremovexattr

Remove an extended attribute of a file referenced by a file descriptor.

int fremovexattr(int fd, const char *name)
  • fd – file descriptor of file in question
  • name – pointer to string with name of attribute to remove

Returns zero on success.

tkill

Send a signal to a thread.

int tkill(int tid, int sig)
  • tid – thread id
  • sig – signal to send

Returns zero on success.

time

Get time in seconds.

time_t time(time_t *t)
  • t – if not NULL, return value is also stored in referenced memory address

Returns time (in seconds) since UNIX Epoch.

futex

Fast user-space locking.

int futex(int *uaddr, int op, int val, const struct timespec *timeout,
 int *uaddr2, int val3)
  • uaddr – pointer to address of value to monitor for change
  • op – operation flag
  • val – value whose meaning depends on op (see below)
  • timeout – pointer to timespec structure with timeout
  • uaddr2 – pointer to integer used for some operations
  • val3 – additional argument in some operations

Return value depends on the operation, as detailed below.

op

  • FUTEX_WAIT – atomically verifies that uaddr still contains value val and sleeps awaiting FUTEX_WAKE on this address
  • FUTEX_WAKE – wakes at most val processes waiting on futex address
  • FUTEX_REQUEUE – wakes up to val processes and requeues remaining waiters onto the futex at address uaddr2
  • FUTEX_CMP_REQUEUE – similar to FUTEX_REQUEUE but first checks if location uaddr contains value of val3

sched_setaffinity

Set process CPU affinity mask.

int sched_setaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask)
  • pid – PID of process
  • cpusetsize – length of data at mask
  • mask – pointer to mask

Returns zero on success.

sched_getaffinity

Get process CPU affinity mask.

int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask)
  • pid – PID of process
  • cpusetsize – length of data at mask
  • mask – pointer to mask

Returns zero on success with mask placed in memory referenced by mask.
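
As a rough illustration of the two affinity calls, the sketch below reads the calling thread's mask, pins the thread to the first CPU it is already allowed to use, and then restores the original mask. It assumes glibc with _GNU_SOURCE (for the CPU_* macros) and a machine with no more than CPU_SETSIZE CPUs; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Number of CPUs the calling thread may run on, or -1 on error.
 * A pid of 0 means "the calling thread". */
int allowed_cpu_count(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Pin the calling thread to the lowest-numbered CPU it is allowed on,
 * read the mask back, then restore the original mask.
 * Returns the CPU count seen while pinned (expected: 1), or -1 on error. */
int pin_to_one_cpu_and_restore(void)
{
    cpu_set_t original, single, check;

    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return -1;

    int cpu = 0;
    while (cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &original))
        cpu++;
    if (cpu == CPU_SETSIZE)
        return -1;

    CPU_ZERO(&single);
    CPU_SET(cpu, &single);
    if (sched_setaffinity(0, sizeof(single), &single) != 0)
        return -1;

    int count = -1;
    CPU_ZERO(&check);
    if (sched_getaffinity(0, sizeof(check), &check) == 0)
        count = CPU_COUNT(&check);

    /* Put the original mask back so the caller is not left pinned. */
    sched_setaffinity(0, sizeof(original), &original);
    return count;
}
```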

set_thread_area

Set thread local storage area.

int set_thread_area(struct user_desc *u_info)
  • u_info – pointer to user_desc structure

Returns zero on success.

io_setup

Create async I/O context.

int io_setup(unsigned nr_events, aio_context_t *ctx_idp)
  • nr_events – total number of events to receive
  • ctx_idp – pointer reference to created handle

Returns zero on success.

io_destroy

Destroy async I/O context.

int io_destroy(aio_context_t ctx_id)
  • ctx_id – ID of context to destroy

Returns zero on success.

io_getevents

Read async I/O events from queue.

int io_getevents(aio_context_t ctx_id, long min_nr, long nr, struct io_event
*events, struct timespec *timeout)
  • ctx_id – AIO context ID
  • min_nr – minimum number of events to read
  • nr – maximum number of events to read
  • events – pointer to array of io_event structures
  • timeout – pointer to timespec timeout structure

Returns number of events read, or zero if no events are available or are less than min_nr.

io_submit

Submit async I/O blocks for processing.

int io_submit(aio_context_t ctx_id, long nr, struct iocb **iocbpp)
  • ctx_id – AIO context ID
  • nr – number of structures
  • iocbpp – pointer to array of pointers to iocb structures

Returns number of iocb submitted.

io_cancel

Cancel previously submitted async I/O operation.

int io_cancel(aio_context_t ctx_id, struct iocb *iocb, struct io_event *result)
  • ctx_id – AIO context ID
  • iocb – pointer to iocb structure
  • result – pointer to io_event structure

Returns zero on success and copies event to memory referenced by result.

get_thread_area

Get a thread local storage area.

int get_thread_area(struct user_desc *u_info)
  • u_info – pointer to user_desc structure to receive data

Returns zero on success.

lookup_dcookie

Return directory entry’s path.

int lookup_dcookie(u64 cookie, char *buffer, size_t len)
  • cookie – unique identifier of a directory entry
  • buffer – pointer to buffer with full path of directory entry
  • len – length of buffer

Returns bytes written to buffer with path string.

epoll_create

Open epoll file descriptor.

int epoll_create(int size)
  • size – ignored, but must be greater than 0

Returns file descriptor.

getdents64

Get directory entries.

int getdents64(unsigned int fd, struct linux_dirent64 *dirp, unsigned int count)
  • fd – file descriptor of directory
  • dirp – pointer to buffer that receives linux_dirent64 structures
  • count – size of the dirp buffer
struct linux_dirent64 {
    ino64_t        d_ino;     /* 64-bit inode number */
    off64_t        d_off;     /* 64-bit offset to next structure */
    unsigned short d_reclen;  /* length of this linux_dirent64 */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated filename */
};

Returns bytes read, and at end of directory returns zero.
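
The loop below is a small sketch of how a caller might walk the variable-length records that getdents64 returns. It assumes a Linux system where SYS_getdents64 is defined and declares the record layout locally, since glibc does not export it under the name linux_dirent64. The function name count_dir_entries is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout used by getdents64, declared locally for this sketch. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to next structure */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated filename */
};

/* Count the entries in a directory (including "." and ".." when the
 * filesystem reports them), or return -1 on error. */
long count_dir_entries(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    _Alignas(struct linux_dirent64) char buf[4096];
    long total = 0;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)            /* end of directory */
            break;
        /* Records are packed back to back; d_reclen gives the stride. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            total++;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return total;
}
```

Most programs should prefer readdir from the C library; the raw call is mainly useful when the d_type field or bulk reads matter.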

set_tid_address

Set pointer to thread ID.

long set_tid_address(int *tidptr)
  • tidptr – pointer to thread ID

Returns thread ID of the calling thread.

restart_syscall

Restart a syscall.

long restart_syscall(void)

Returns value of system call it restarts.

semtimedop

Same as the semop syscall, except that if the calling thread would sleep, the duration is limited to timeout.

int semtimedop(int semid, struct sembuf *sops, unsigned nsops, const struct timespec *timeout)
  • semid – id of semaphore
  • sops – pointer to array of sembuf structures for operations
  • nsops – number of operations
  • timeout – pointer to timespec structure with the maximum time the calling thread may sleep, or NULL for no limit

Returns zero on success.

fadvise64

Predeclare access pattern for file data to allow kernel to optimize I/O operations.

int posix_fadvise(int fd, off_t offset, off_t len, int advice)
  • fd – file descriptor of file in question
  • offset – offset at which access will begin
  • len – length of anticipated access, or 0 to end of file
  • advice – advice to give kernel

Returns zero on success.

advice

  • POSIX_FADV_NORMAL – application has no specific advice
  • POSIX_FADV_SEQUENTIAL – application expects to access data sequentially
  • POSIX_FADV_RANDOM – data will be accessed randomly
  • POSIX_FADV_NOREUSE – data will be accessed only once
  • POSIX_FADV_WILLNEED – data will be needed in near future
  • POSIX_FADV_DONTNEED – data will not be needed in near future

timer_create

Create POSIX per-process timer.

int timer_create(clockid_t clockid, struct sigevent *sevp, timer_t *timerid)
  • clockid – type of clock to use
  • sevp – pointer to sigevent structure explaining how caller will be notified when timer expires
  • timerid – pointer to buffer that will receive timer ID

Returns zero on success.

union sigval {
    int     sival_int;
    void   *sival_ptr;
};
struct sigevent {
    int          sigev_notify; /* method of notification */
    int          sigev_signo;  /* notification signal */
    union sigval sigev_value;  /* data to pass with notification */
    void       (*sigev_notify_function) (union sigval); /* Function used for thread notification */
    void        *sigev_notify_attributes; /* attributes for notification thread */
    pid_t        sigev_notify_thread_id;  /* id of thread to signal */
};

clockid

  • CLOCK_REALTIME – settable system wide real time clock
  • CLOCK_MONOTONIC – nonsettable monotonically increasing clock measuring time from unspecified point in past
  • CLOCK_PROCESS_CPUTIME_ID – clock measuring CPU time consumed by the calling process and its threads
  • CLOCK_THREAD_CPUTIME_ID – clock measuring CPU time consumed by calling thread

timer_settime

Arm or disarm POSIX per-process timer.

int timer_settime(timer_t timerid, int flags, const struct itimerspec *new_value,
 struct itimerspec *old_value)
  • timerid – id of timer
  • flags – specify TIMER_ABSTIME to process new_value->it_value as an absolute value
  • new_value – pointer to itimerspec structure defining new initial and new interval for timer
  • old_value – pointer to structure to receive previous timer details
struct itimerspec {
    struct timespec it_interval;  /* interval */
    struct timespec it_value;     /* expiration */
};

Returns zero on success.

timer_gettime

Returns time until next expiration from POSIX per-process timer.

int timer_gettime(timer_t timerid, struct itimerspec *curr_value)
  • timerid – id of timer
  • curr_value – pointer to itimerspec structure where current timer values are returned

Returns zero on success.

timer_getoverrun

Get overrun count on a POSIX per-process timer.

int timer_getoverrun(timer_t timerid)
  • timerid – id of timer

Returns overrun count of specified timer.

timer_delete

Delete POSIX per-process timer.

int timer_delete(timer_t timerid)
  • timerid – id of timer

Returns zero on success.

clock_settime

Set specified clock.

int clock_settime(clockid_t clk_id, const struct timespec *tp)
  • clk_id – clock id
  • tp – pointer to timespec structure with clock details

Returns zero on success.

clock_gettime

Get time from specified clock.

int clock_gettime(clockid_t clk_id, struct timespec *tp)
  • clk_id – clock id
  • tp – pointer to timespec structure returned with clock details

Returns zero on success.

clock_getres

Obtain resolution of specified clock.

int clock_getres(clockid_t clk_id, struct timespec *res)
  • clk_id – clock id
  • res – pointer to timespec structure returned with details

Returns zero on success.

clock_nanosleep

High-resolution sleep with specifiable clock.

int clock_nanosleep(clockid_t clock_id, int flags, const struct timespec
*request, struct timespec *remain)
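  • clock_id – type of clock to use
  • flags – specify TIMER_ABSTIME to interpret request as an absolute value
  • request – pointer to timespec structure with the sleep interval, or with the absolute wake-up time if TIMER_ABSTIME is set
  • remain – pointer to timespec structure to receive remaining time on sleep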

Returns zero after sleep interval.

exit_group

Exit all threads in a process.

void exit_group(int status)
  • status – status code to return

Does not return.

epoll_wait

Wait for I/O event on epoll file descriptor.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout)
  • epfd – epoll file descriptor
  • events – pointer to epoll_event structure with events available to calling process
  • maxevents – maximum number of events, must be greater than zero
  • timeout – timeout in milliseconds
typedef union epoll_data {
    void    *ptr;
    int      fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;
struct epoll_event {
    uint32_t     events;    /* epoll events */
    epoll_data_t data;      /* user data variable */
};

Returns number of file descriptors ready for requested I/O or zero if timeout occurred before any were available.

epoll_ctl

Control interface for epoll file descriptor.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event)
  • epfd – epoll file descriptor
  • op – operation flag
  • fd – file descriptor for target file
  • event – pointer to epoll_event structure with event, purpose altered by op

Returns zero on success.

op

  • EPOLL_CTL_ADD – add fd to interest list
  • EPOLL_CTL_MOD – change settings associated with fd in interest list to new settings specified in event
  • EPOLL_CTL_DEL – remove target file descriptor fd from interest list, with event argument ignored
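
The three epoll calls are usually used together: create the instance, register a descriptor with EPOLL_CTL_ADD, then block in epoll_wait. The sketch below does this for the read end of a pipe. The function name epoll_pipe_demo is hypothetical, and the one-second timeout is an arbitrary choice for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with epoll, make it readable, and
 * wait for the event. Returns 1 if epoll_wait reported the expected
 * descriptor with EPOLLIN, 0 if it reported something else, and -1 on
 * error or timeout. */
int epoll_pipe_demo(void)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    int result = -1;
    int epfd = epoll_create(1);   /* size is ignored but must be > 0 */
    if (epfd >= 0) {
        struct epoll_event ev = {
            .events = EPOLLIN,
            .data = { .fd = pipefd[0] },   /* user data returned with the event */
        };
        struct epoll_event ready;

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1 &&
            epoll_wait(epfd, &ready, 1, 1000) == 1)
            result = (ready.data.fd == pipefd[0] &&
                      (ready.events & EPOLLIN)) ? 1 : 0;

        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```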

tgkill

Send signal to a thread.

int tgkill(int tgid, int tid, int sig)
  • tgid – thread group id
  • tid – thread id
  • sig – signal to send

Returns zero on success.

utimes

Change file last access and modification times.

int utimes(const char *filename, const struct timeval times[2])
  • filename – pointer to string with file in question
  • times – array of timeval structures where times[0] specifies new access time and times[1] specifies new modification time

Returns zero on success.

mbind

Set NUMA memory policy on a memory range.

long mbind(void *addr, unsigned long len, int mode, const unsigned long
 *nodemask, unsigned long maxnode, unsigned flags)
  • addr – pointer to starting memory address
  • len – length of memory segment
  • mode – NUMA mode
  • nodemask – pointer to mask defining nodes that mode applies to
  • maxnode – max number of bits for nodemask
  • flags – set MPOL_F_STATIC_NODES to specify physical nodes, MPOL_F_RELATIVE_NODES to specify node ids relative to set allowed by thread’s current cpuset

Returns zero on success.

mode

  • MPOL_DEFAULT – remove any nondefault policy and restore default behavior
  • MPOL_BIND – specify policy restricting memory allocation to node specified in nodemask
  • MPOL_INTERLEAVE – specify page allocations be interleaved across set of nodes specified in nodemask
  • MPOL_PREFERRED – set preferred node for allocation
  • MPOL_LOCAL – mode specifies "local allocation" – memory is allocated on the node of the CPU that triggers allocation

set_mempolicy

Set default NUMA memory policy for thread and its offspring.

long set_mempolicy(int mode, const unsigned long *nodemask,
 unsigned long maxnode)
  • mode – NUMA mode
  • nodemask – pointer to mask defining node that mode applies to
  • maxnode – max number of bits for nodemask

Returns zero on success.

get_mempolicy

Get NUMA memory policy for thread and its offspring.

long get_mempolicy(int *mode, unsigned long *nodemask, unsigned long maxnode,
 void *addr, unsigned long flags)
  • mode – pointer to integer that receives NUMA mode
  • nodemask – pointer to buffer that receives mask of nodes the mode applies to
  • maxnode – max number of bits for nodemask
  • addr – pointer to memory region
  • flags – defines behavior of call

Returns zero on success.

flags

  • 0 – get information about calling thread’s default policy and store it in mode and nodemask
  • MPOL_F_NODE – return a node id in mode instead of the policy mode
  • MPOL_F_MEMS_ALLOWED – mode argument is ignored and the set of nodes the thread is allowed to specify is returned in nodemask
  • MPOL_F_ADDR – get information about policy for addr

mq_open

Create a new POSIX message queue or open an existing one.

mqd_t mq_open(const char *name, int oflag)
mqd_t mq_open(const char *name, int oflag, mode_t mode, struct mq_attr *attr)
  • name – pointer to string with name of queue
  • oflag – define operation of call
  • mode – permissions to place on queue
  • attr – pointer to mq_attr structure to define parameters of queue
struct mq_attr {
    long mq_flags;       /* flags (not used for mq_open) */
    long mq_maxmsg;      /* max messages on queue */
    long mq_msgsize;     /* max message size in bytes */
    long mq_curmsgs;     /* messages currently in queue (not used for mq_open) */
};

oflag

  • O_RDONLY – open queue to only receive messages
  • O_WRONLY – open queue to send messages
  • O_RDWR – open queue for both send and receive
  • O_CLOEXEC – set close-on-exec flag for message queue descriptor
  • O_CREAT – create message queue if it doesn’t exist
  • O_EXCL – if O_CREAT specified and queue already exists, fail with EEXIST
  • O_NONBLOCK – open queue in nonblocking mode

mq_unlink

Remove message queue.

int mq_unlink(const char *name)
  • name – pointer to string with queue name

Returns zero on success.

mq_timedsend

Send message to message queue.

int mq_timedsend(mqd_t mqdes, const char *msg_ptr, size_t msg_len, unsigned msg_prio,
 const struct timespec *abs_timeout)
  • mqdes – descriptor pointing to message queue
  • msg_ptr – pointer to message
  • msg_len – length of message
  • msg_prio – priority of message
  • abs_timeout – pointer to timespec structure defining timeout

Returns zero on success.

mq_timedreceive

Receive a message from a message queue.

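ssize_t mq_timedreceive(mqd_t mqdes, char *msg_ptr, size_t msg_len, unsigned *msg_prio,
 const struct timespec *abs_timeout)
  • mqdes – descriptor pointing to message queue
  • msg_ptr – pointer to buffer to receive message
  • msg_len – size of buffer at msg_ptr
  • msg_prio – if not NULL, receives priority of message
  • abs_timeout – pointer to timespec structure defining timeout

Returns number of bytes in received message.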

mq_notify

Register to receive notification when message is available in a message queue.

int mq_notify(mqd_t mqdes, const struct sigevent *sevp)
  • mqdes – descriptor pointing to message queue
  • sevp – pointer to sigevent structure

Returns zero on success.

kexec_load

Load new kernel for execution at a later time.

long kexec_load(unsigned long entry, unsigned long nr_segments, struct
kexec_segment *segments, unsigned long flags)
  • entry – entry address in kernel image
  • nr_segments – number of segments referenced by segments pointer
  • segments – pointer to kexec_segment structure defining kernel layout
  • flags – modify behavior of call
struct kexec_segment {
    void   *buf;        /* user space buffer */
    size_t  bufsz;      /* user space buffer length */
    void   *mem;        /* physical address of kernel */
    size_t  memsz;      /* physical address length */
};

Returns zero on success.

flags

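  • KEXEC_ON_CRASH – execute new kernel automatically on a system crash, loading it into the memory region reserved for the crash kernel
  • KEXEC_PRESERVE_CONTEXT – preserve system hardware and software state before executing new kernel

The KEXEC_FILE_UNLOAD, KEXEC_FILE_ON_CRASH, and KEXEC_FILE_NO_INITRAMFS flags belong to the related kexec_file_load syscall, not to kexec_load.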

waitid

Wait for change of state in process.

int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options)
  • idtype – defines id scope, specifying P_PID for process id, P_PGID process group id, or P_ALL to wait for any child where id is ignored
  • id – id of process or process group, defined by idtype
  • infop – pointer to siginfo_t structure filled in by return
  • options – modifies behavior of syscall

Returns zero on success.

options

  • WEXITED – wait for terminated children
  • WSTOPPED – wait for children stopped by delivery of a signal
  • WUNTRACED – also return if child has stopped but is not traced (the waitpid spelling of WSTOPPED)
  • WCONTINUED – wait for previously stopped children that were resumed via SIGCONT
  • WNOHANG – return immediately if no child has changed state
  • WNOWAIT – leave child in waitable state

The following macros are not options. They decode the status value filled in by wait and waitpid; waitid reports the same information in the si_code and si_status fields of infop.

  • WIFEXITED – returns true if child was terminated normally
  • WEXITSTATUS – returns exit status of child
  • WIFSIGNALED – returns true if child process terminated by signal
  • WTERMSIG – returns signal that caused child process to terminate
  • WCOREDUMP – returns true if child produced core dump
  • WIFSTOPPED – returns true if child process stopped by delivery of signal
  • WSTOPSIG – returns number of signal that caused child to stop
  • WIFCONTINUED – returns true if child process was resumed via SIGCONT

add_key

Add key to kernel’s key management.

key_serial_t add_key(const char *type, const char *description, const void
*payload, size_t plen, key_serial_t keyring)
  • type – pointer to string with type of key
  • description – pointer to string with description of key
  • payload – key to add
  • plen – length of key
  • keyring – serial number of keyring or special flag

Returns serial number of created key.

keyring

  • KEY_SPEC_THREAD_KEYRING – specifies caller’s thread-specific keyring
  • KEY_SPEC_PROCESS_KEYRING – specifies caller’s process-specific keyring
  • KEY_SPEC_SESSION_KEYRING – specifies caller’s session-specific keyring
  • KEY_SPEC_USER_KEYRING – specifies caller’s UID-specific keyring
  • KEY_SPEC_USER_SESSION_KEYRING – specifies caller’s UID-session keyring

request_key

Request key from kernel’s key management.

key_serial_t request_key(const char *type, const char *description,
 const char *callout_info, key_serial_t keyring)
  • type – pointer to string with type of key
  • description – pointer to string with description of key
  • callout_info – pointer to string set if key isn’t found
  • keyring – serial number of keyring or special flag

Returns serial number of key found on success.

keyctl

Manipulate kernel’s key management.

long keyctl(int cmd, ...)
  • cmd – command flag modifying syscall behavior
  • ... – additional arguments per cmd flag

Return value on success depends on cmd.

cmd

  • KEYCTL_GET_KEYRING_ID – ask for keyring id
  • KEYCTL_JOIN_SESSION_KEYRING – join or start named session keyring
  • KEYCTL_UPDATE – update key
  • KEYCTL_REVOKE – revoke key
  • KEYCTL_CHOWN – set ownership of key
  • KEYCTL_SETPERM – set permissions on a key
  • KEYCTL_DESCRIBE – describe key
  • KEYCTL_CLEAR – clear contents of keyring
  • KEYCTL_LINK – link key into keyring
  • KEYCTL_UNLINK – unlink key from keyring
  • KEYCTL_SEARCH – search for key in keyring
  • KEYCTL_READ – read key or keyring’s contents
  • KEYCTL_INSTANTIATE – instantiate partially constructed key
  • KEYCTL_NEGATE – negate partially constructed key
  • KEYCTL_SET_REQKEY_KEYRING – set default request-key keyring
  • KEYCTL_SET_TIMEOUT – set timeout on a key
  • KEYCTL_ASSUME_AUTHORITY – assume authority to instantiate key

ioprio_set

Set I/O scheduling class and priority.

int ioprio_set(int which, int who, int ioprio)
  • which – flag specifying target of who
  • who – id determined by which flag
  • ioprio – bit mask specifying scheduling class and priority to assign to who process

Returns zero on success.

which

  • IOPRIO_WHO_PROCESS – who is process or thread id, or 0 to use calling thread
  • IOPRIO_WHO_PGRP – who is a process group id identifying all members of a process group, or 0 to operate on process group where calling process is member
  • IOPRIO_WHO_USER – who is UID identifying all processes that have a matching real UID

ioprio_get

Get I/O scheduling class and priority.

int ioprio_get(int which, int who)
  • which – flag specifying target of who
  • who – id determined by which flag

Returns ioprio value of process with highest I/O priority of matching processes.

inotify_init

Initialize an inotify instance.

int inotify_init(void)

Returns file descriptor of new inotify event queue.

inotify_add_watch

Add watch to an initialized inotify instance.

int inotify_add_watch(int fd, const char *pathname, uint32_t mask)
  • fd – file descriptor referring to inotify instance with watch list to be modified
  • pathname – pointer to string with path to monitor
  • mask – mask of events to be monitored

Returns watch descriptor on success.

inotify_rm_watch

Remove existing watch from inotify instance.

int inotify_rm_watch(int fd, int wd)
  • fd – file descriptor associated with watch
  • wd – watch descriptor

Returns zero on success.

migrate_pages

Move pages in process to another set of nodes.

long migrate_pages(int pid, unsigned long maxnode, const unsigned long
 *old_nodes, const unsigned long *new_nodes)
  • pid – PID of process in question
  • maxnode – max nodes in old_nodes and new_nodes masks
  • old_nodes – pointer to mask of node numbers to move from
  • new_nodes – pointer to mask of node numbers to move to

Returns number of pages that couldn’t be moved.

openat

Open file relative to directory file descriptor.

int openat(int dirfd, const char *pathname, int flags)
int openat(int dirfd, const char *pathname, int flags, mode_t mode)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with path name
  • flags – see open syscall
  • mode – see open syscall

Returns new file descriptor on success.

mkdirat

Create directory relative to directory file descriptor.

int mkdirat(int dirfd, const char *pathname, mode_t mode)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with path name
  • mode – see mkdir syscall

Returns zero on success.

mknodat

Create a special file relative to directory file descriptor.

int mknodat(int dirfd, const char *pathname, mode_t mode, dev_t dev)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with path name
  • mode – see mknod syscall
  • dev – device number

Returns zero on success.

fchownat

Change ownership of file relative to directory file descriptor.

int fchownat(int dirfd, const char *pathname, uid_t owner, gid_t group, int flags)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with path name
  • owner – user id (UID)
  • group – group id (GID)
  • flags – if AT_SYMLINK_NOFOLLOW is specified, do not dereference symlinks

Returns zero on success.

unlinkat

Delete name and possibly file it references.

int unlinkat(int dirfd, const char *pathname, int flags)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with path name
  • flags – 0 to behave like unlink, or AT_REMOVEDIR to behave like rmdir

Returns zero on success.

renameat

Change name or location of file relative to directory file descriptor.

int renameat(int olddirfd, const char *oldpath, int newdirfd, const char *newpath)
  • olddirfd – file descriptor of directory with source
  • oldpath – pointer to string with path name to source
  • newdirfd – file descriptor of directory with target
  • newpath – pointer to string with path name to target

Returns zero on success.

linkat

Create a hard link relative to directory file descriptor.

int linkat(int olddirfd, const char *oldpath, int newdirfd, const char *newpath, int flags)
  • olddirfd – file descriptor of directory with source
  • oldpath – pointer to string with path name to source
  • newdirfd – file descriptor of directory with target
  • newpath – pointer to string with path name to target
  • flags – 0, or AT_SYMLINK_FOLLOW to dereference oldpath if it is a symlink

Returns zero on success.

symlinkat

Create a symbolic link relative to directory file descriptor.

int symlinkat(const char *target, int newdirfd, const char *linkpath)
  • target – pointer to string with target the new symlink will point to
  • newdirfd – file descriptor of directory that linkpath is relative to
  • linkpath – pointer to string with path name of the new symlink

Returns zero on success.

readlinkat

Read contents of symbolic link pathname relative to directory file descriptor.

ssize_t readlinkat(int dirfd, const char *pathname, char *buf, size_t bufsiz)
  • dirfd – file descriptor of directory that pathname is relative to
  • pathname – pointer to string with symlink path
  • buf – pointer to buffer receiving symlink pathname
  • bufsiz – size of buf

Returns number of bytes placed into buf on success.

fchmodat

Change permissions of file relative to a directory file descriptor.

int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with file in question
  • mode – permissions mask
  • flags – 0, or AT_SYMLINK_NOFOLLOW to operate on a symlink itself rather than its target

Returns zero on success.

faccessat

Check user’s permissions for a given file relative to a directory file descriptor.

int faccessat(int dirfd, const char *pathname, int mode, int flags)
  • dirfd – file descriptor of directory
  • pathname – pointer to string with file in question
  • mode – specify check to perform
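  • flags – 0, AT_EACCESS to check using effective instead of real user and group ids, or AT_SYMLINK_NOFOLLOW to not dereference symlinks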

Returns zero if permissions are granted.

pselect6

Synchronous I/O multiplexing. Works just like select with a modified timeout and signal mask.

int pselect6(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
const struct timespec *timeout, const sigset_t *sigmask)
  • nfds – highest-numbered file descriptor in any of the three sets, plus 1
  • readfds – fixed buffer with list of file descriptors to wait for read access
  • writefds – fixed buffer with list of file descriptors to wait for write access
  • exceptfds – fixed buffer with list of file descriptors to wait for exceptional conditions
  • timeout – pointer to timespec structure with time to wait before returning
  • sigmask – pointer to signal mask

Returns number of file descriptors contained in returned descriptor sets.

ppoll

Wait for an event on a file descriptor like poll but allows for a signal to interrupt timeout.

int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *timeout_ts,
 const sigset_t *sigmask)
  • fds – pointer to an array of pollfd structures (see poll)
  • nfds – number of pollfd items in the fds array
  • timeout_ts – pointer to timespec structure with the maximum time the syscall should block, or NULL to block indefinitely
  • sigmask – signal mask

Returns number of structures having nonzero revents fields, or zero upon timeout.

unshare

Disassociate parts of process execution context.

int unshare(int flags)
  • flags – define behavior of call

flags

  • CLONE_FILES – unshare file descriptor table so calling process no longer shares file descriptors with other processes
  • CLONE_FS – unshare file system attributes so calling process no longer shares its root directory, current directory, or umask with other processes
  • CLONE_NEWIPC – unshare System V IPC namespace so calling process has private copy of System V IPC namespace not shared with other processes
  • CLONE_NEWNET – unshare network namespace so calling process is moved to a new network namespace not shared with other processes
  • CLONE_NEWNS – unshare mount namespace
  • CLONE_NEWUTS – unshare UTS namespace
  • CLONE_SYSVSEM – unshare System V semaphore undo values

set_robust_list

Set list of robust futexes.

long set_robust_list(struct robust_list_head *head, size_t len)
  • head – pointer to location of list head
  • len – size of the robust_list_head structure

Returns zero on success.

get_robust_list

Get list of robust futexes.

long get_robust_list(int pid, struct robust_list_head **head_ptr, size_t *len_ptr)
  • pid – thread/process id, or if 0 current process id is used
  • head_ptr – pointer that receives location of list head
  • len_ptr – pointer that receives size of list head

Returns zero on success.

splice

Splice data to/from a pipe.

ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags)
  • fd_in – file descriptor referring to a pipe for input
  • fd_out – file descriptor referring to a pipe for output
  • off_in – null if fd_in refers to a pipe, otherwise points to offset for read
  • off_out – null if fd_out refers to a pipe, otherwise points to offset for write
  • len – total bytes to transfer
  • flags – defines additional behavior related to syscall

Returns number of bytes spliced to or from pipe.

flags

  • SPLICE_F_MOVE – try to move pages instead of copying
  • SPLICE_F_NONBLOCK – try not to block I/O
  • SPLICE_F_MORE – advise kernel that more data coming in subsequent splice
  • SPLICE_F_GIFT – only for vmsplice, gift user pages to kernel
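
As a small illustration of splice, the sketch below writes a message into one pipe, asks the kernel to move it into a second pipe, and reads it back to check that the bytes arrived intact. It assumes glibc with _GNU_SOURCE and a message short enough to fit in a pipe buffer; the function name splice_roundtrip_ok is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Push msg through pipe A, splice it into pipe B, and read it back.
 * Returns 1 if the bytes read from B match msg, 0 if they differ or a
 * step failed, and -1 if the arguments or pipe setup are unusable. */
int splice_roundtrip_ok(const char *msg)
{
    int a[2], b[2];
    char out[256];
    size_t len = strlen(msg);

    if (len == 0 || len > sizeof(out))
        return -1;
    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int ok = 0;
    /* Both descriptors are pipes, so both offset pointers must be NULL. */
    if (write(a[1], msg, len) == (ssize_t)len &&
        splice(a[0], NULL, b[1], NULL, len, SPLICE_F_MOVE) == (ssize_t)len &&
        read(b[0], out, sizeof(out)) == (ssize_t)len)
        ok = (memcmp(out, msg, len) == 0);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok;
}
```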

tee

Duplicate pipe content.

ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags)
  • fd_in – file descriptor referring to a pipe for input
  • fd_out – file descriptor referring to a pipe for output
  • len – total bytes to transfer
  • flags – defines additional behavior related to syscall (see flags for splice)

Returns number of bytes duplicated between pipes.

sync_file_range

Sync file segment with disk.

int sync_file_range(int fd, off64_t offset, off64_t nbytes, unsigned int flags)
  • fd – file descriptor of file in question
  • offset – offset to begin sync
  • nbytes – number of bytes to sync
  • flags – defines additional behavior

Returns zero on success.

flags

  • SYNC_FILE_RANGE_WAIT_BEFORE – wait for write-out of all pages in range already submitted to device driver before performing any write
  • SYNC_FILE_RANGE_WRITE – initiate write-out of all dirty pages in range not already submitted for write
  • SYNC_FILE_RANGE_WAIT_AFTER – wait for write-out of all pages in range after performing any write

vmsplice

Splice user pages into pipe.

ssize_t vmsplice(int fd, const struct iovec *iov, unsigned long nr_segs, unsigned int
 flags)
  • fd – file descriptor of pipe
  • iov – pointer to array of iovec structures
  • nr_segs – number of iovec structures in iov
  • flags – defines additional behavior (see splice)

Returns number of bytes transferred into pipe.

move_pages

Move pages of process to another node.

long move_pages(int pid, unsigned long count, void **pages, const int
*nodes, int *status, int flags)
  • pid – process id
  • count – number of pages to move
  • pages – array of pointers to pages to move
  • nodes – array of integers specifying location to move each page
  • status – array of integers to receive status of each page
  • flags – defines additional behavior

Returns zero on success.

flags

  • MPOL_MF_MOVE – move only pages in exclusive use
  • MPOL_MF_MOVE_ALL – pages shared between multiple processes can also be moved

utimensat

Change timestamps with nanosecond precision.

int utimensat(int dirfd, const char *pathname, const struct timespec
 times[2], int flags)
  • dirfd – directory file descriptor
  • pathname – pointer to string with path of file
  • times – array of timestamps, where times[0] is new last access time and times[1] is new last modification time
  • flags – if AT_SYMLINK_NOFOLLOW specified, update timestamps on symlink

Returns zero on success.

epoll_pwait

Wait for I/O event on epoll file descriptor. Same as epoll_wait with a signal mask.

int epoll_pwait(int epfd, struct epoll_event *events, int maxevents, int timeout,
 const sigset_t *sigmask)
  • epfd – epoll file descriptor
  • events – pointer to epoll_event structure with events available to calling process
  • maxevents – maximum number of events, must be greater than zero
  • timeout – timeout in milliseconds
  • sigmask – signal mask to catch

Returns number of file descriptors ready for requested I/O or zero if timeout occurred before any were available.

signalfd

Create file descriptor that can receive signals.

int signalfd(int fd, const sigset_t *mask, int flags)
  • fd – if -1, create new file descriptor, otherwise use existing file descriptor
  • mask – signal mask
  • flags – set to SFD_NONBLOCK to assign O_NONBLOCK on new file descriptor, or SFD_CLOEXEC to set FD_CLOEXEC flag on new file descriptor

Returns file descriptor on success.

timerfd_create

Create timer that notifies a file descriptor.

int timerfd_create(int clockid, int flags)
  • clockid – specify CLOCK_REALTIME or CLOCK_MONOTONIC
  • flags – set to TFD_NONBLOCK to assign O_NONBLOCK on new file descriptor, or TFD_CLOEXEC to set FD_CLOEXEC flag on new file descriptor

Returns new file descriptor.

eventfd

Create file descriptor for event notification.

int eventfd(unsigned int initval, int flags)
  • initval – initial value of counter maintained by kernel
  • flags – define additional behavior

Returns new eventfd file descriptor.

flags

  • EFD_CLOEXEC – set close-on-exec flag on new file descriptor (FD_CLOEXEC)
  • EFD_NONBLOCK – set O_NONBLOCK on new file descriptor, saving extra call to fcntl to set this status
  • EFD_SEMAPHORE – perform semaphore-like semantics for reads from new file descriptor
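
The counter semantics can be sketched as follows. This is an illustration only, with a hypothetical function name eventfd_roundtrip: each write adds to the kernel counter, and a read without EFD_SEMAPHORE returns the accumulated value and resets it.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Adds 3 and then 4 to an eventfd counter and reads the total back.
 * Returns the value read, or 0 on error. */
uint64_t eventfd_roundtrip(void)
{
    int fd = eventfd(0, EFD_CLOEXEC);   /* counter starts at initval = 0 */
    if (fd == -1)
        return 0;

    uint64_t add = 3, total = 0;
    if (write(fd, &add, sizeof add) != sizeof add) { close(fd); return 0; }
    add = 4;
    if (write(fd, &add, sizeof add) != sizeof add) { close(fd); return 0; }

    /* Without EFD_SEMAPHORE, read returns the whole counter and resets it */
    if (read(fd, &total, sizeof total) != sizeof total)
        total = 0;

    close(fd);
    return total;
}
```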

fallocate

Allocate file space.

int fallocate(int fd, int mode, off_t offset, off_t len)
  • fd – file descriptor in question
  • mode – defines behavior
  • offset – starting range of allocation
  • len – length of allocation

mode

  • FALLOC_FL_KEEP_SIZE – do not change file size even if offset+len is greater than the original file size
  • FALLOC_FL_PUNCH_HOLE – deallocate space in specified range, zeroing blocks

timerfd_settime

Arms or disarms timer referenced by fd.

int timerfd_settime(int fd, int flags, const struct itimerspec *new_value,
 struct itimerspec *old_value)
  • fd – file descriptor
  • flags – set to 0 to start relative timer, or TFD_TIMER_ABSTIME to use absolute timer
  • new_value – pointer to itimerspec structure to set value
  • old_value – pointer to itimerspec structure to receive previous value after successful update

Returns zero on success.

timerfd_gettime

Get current setting of timer referenced by fd.

int timerfd_gettime(int fd, struct itimerspec *curr_value)
  • fd – file descriptor
  • curr_value – pointer to itimerspec structure with current timer value

Returns zero on success.

accept4

Same as accept but adds a flags argument (SOCK_NONBLOCK, SOCK_CLOEXEC) applied to the new file descriptor.

signalfd4

Same as signalfd; this is the variant of the underlying syscall that accepts the flags argument.

eventfd2

Same as eventfd but adds flags argument.

epoll_create1

Same as epoll_create but takes a flags argument (EPOLL_CLOEXEC) in place of the obsolete size argument.

dup3

Same as dup2 except calling program can force close-on-exec flag to be set on new file descriptor.

pipe2

Same as pipe but adds flags argument (O_CLOEXEC, O_NONBLOCK, O_DIRECT).

inotify_init1

Same as inotify_init but adds flags argument (IN_NONBLOCK, IN_CLOEXEC).

preadv

Same as readv but adds offset argument to mark start of input.

pwritev

Same as writev but adds offset argument to mark start of output.
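
To illustrate the offset argument, here is a small sketch that scatters two buffers into a temporary file at offset 4 and gathers them back from the same offset. The function name vector_io_roundtrip and the sample strings are made up for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes "hello " and "world" at offset 4 of a temporary file with one
 * pwritev call, then reads the 11 bytes back with preadv.
 * Returns 0 if the data round-trips, -1 otherwise. */
int vector_io_roundtrip(void)
{
    FILE *tmp = tmpfile();          /* removed automatically on close */
    if (!tmp)
        return -1;
    int fd = fileno(tmp);

    char a[] = "hello ", b[] = "world";
    struct iovec out[2] = {
        { .iov_base = a, .iov_len = 6 },
        { .iov_base = b, .iov_len = 5 },
    };
    char first[6], second[5];
    struct iovec in[2] = {
        { .iov_base = first,  .iov_len = sizeof first },
        { .iov_base = second, .iov_len = sizeof second },
    };

    /* Neither call moves the descriptor's own file position */
    int ok = pwritev(fd, out, 2, 4) == 11 &&
             preadv(fd, in, 2, 4) == 11 &&
             memcmp(first, "hello ", 6) == 0 &&
             memcmp(second, "world", 5) == 0;

    fclose(tmp);
    return ok ? 0 : -1;
}
```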

rt_tgsigqueueinfo

Not intended for application use. Instead, use sigqueue or pthread_sigqueue.

perf_event_open

Start performance monitoring.

int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu, int group_fd,
 unsigned long flags)
  • attr – pointer to perf_event_attr structure for additional configuration
  • pid – process id
  • cpu – cpu id
  • group_fd – create event groups
  • flags – defines additional behavior options
struct perf_event_attr {
    __u32     type;         /* event type */
    __u32     size;         /* attribute structure size */
    __u64     config;       /* type-specific configuration */

   union {
        __u64 sample_period;    /* sampling period */
        __u64 sample_freq;      /* sampling frequency */
    };

   __u64      sample_type;  /* specify values included in sample */
   __u64      read_format;  /* specify values returned in read */

   __u64      disabled       : 1,   /* off by default */
              inherit        : 1,   /* inherited by children */
              pinned         : 1,   /* must always be on PMU */
              exclusive      : 1,   /* only group on PMU */
              exclude_user   : 1,   /* don't count user */
              exclude_kernel : 1,   /* don't count kernel */
              exclude_hv     : 1,   /* don't count hypervisor */
              exclude_idle   : 1,   /* don't count when idle */
              mmap           : 1,   /* include mmap data */
              comm           : 1,   /* include comm data */
              freq           : 1,   /* use freq, not period */
              inherit_stat   : 1,   /* per task counts */
              enable_on_exec : 1,   /* next exec enables */
              task           : 1,   /* trace fork/exit */
              watermark      : 1,   /* wakeup_watermark */
              precise_ip     : 2,   /* skid constraint */
              mmap_data      : 1,   /* non-exec mmap data */
              sample_id_all  : 1,   /* sample_type all events */
              exclude_host   : 1,   /* don't count in host */
              exclude_guest  : 1,   /* don't count in guest */
              exclude_callchain_kernel : 1, /* exclude kernel callchains */
              exclude_callchain_user   : 1, /* exclude user callchains */
              __reserved_1 : 41;

              union {
                __u32 wakeup_events;    /* every x events, wake up */
                __u32 wakeup_watermark; /* bytes before wakeup */
              };
 
              __u32 bp_type; /* breakpoint type */

             union {
                __u64 bp_addr; /* address of breakpoint*/
                __u64 config1; /* extension of config */
                };

             union {
                __u64 bp_len; /* breakpoint length */
                __u64 config2; /* extension of config1 */
             };
            
             __u64 branch_sample_type;  /* enum perf_branch_sample_type */
             __u64 sample_regs_user;    /* user regs to dump on samples */
             __u32 sample_stack_user;   /* stack size to dump on samples */
             __u32 __reserved_2;        /* align to u64 */

};

Returns new open file descriptor on success.

flags

  • PERF_FLAG_FD_NO_GROUP – allows creating event as part of event group without a leader
  • PERF_FLAG_FD_OUTPUT – reroute output from event to group leader
  • PERF_FLAG_PID_CGROUP – activate per-container full system monitoring

recvmmsg

Receive multiple messages on a socket using single syscall.

int recvmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen, unsigned int flags,
 struct timespec *timeout)
  • sockfd – socket file descriptor
  • msgvec – pointer to array of mmsghdr structures
  • vlen – size of msgvec array
  • flags – specify flags from recvmsg or specify MSG_WAITFORONE to activate MSG_DONTWAIT after receipt of first message
  • timeout – pointer to timespec structure specifying timeout

Returns number of messages received in msgvec on success.

fanotify_init

Create fanotify group.

int fanotify_init(unsigned int flags, unsigned int event_f_flags)
  • flags – defines additional parameters
  • event_f_flags – defines file status flags set on file descriptors created for fanotify events

Returns new file descriptor on success.

flags

  • FAN_CLASS_PRE_CONTENT – allow receipt of events notifying access or attempted access of a file before containing final content
  • FAN_CLASS_CONTENT – allow receipt of events notifying access or attempted access of a file containing final content
  • FAN_REPORT_FID – allow receipt of events containing info about filesystem related to an event
  • FAN_CLASS_NOTIF – default value, allowing only for receipt of events notifying file access

event_f_flags

  • O_RDONLY – read-only access
  • O_WRONLY – write-only access
  • O_RDWR – read/write access
  • O_LARGEFILE – support files exceeding 2 GB
  • O_CLOEXEC – enable close-on-exec flag for file descriptor

fanotify_mark

Add/remove/modify a fanotify mark on a file.

int fanotify_mark(int fanotify_fd, unsigned int flags, uint64_t mask,
int dirfd, const char *pathname)
  • fanotify_fd – file descriptor from fanotify_init
  • flags – defines additional behavior
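  • mask – mask of events to add, remove, or ignore
  • dirfd – use depends on flags and pathname, see dirfd below
  • pathname – pointer to string with path of file or directory to mark, or NULL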

Returns zero on success.

dirfd

  • If pathname is NULL, dirfd is a file descriptor to be marked
  • If pathname is NULL and dirfd is AT_FDCWD then current working directory is marked
  • If pathname is an absolute path, dirfd is ignored
  • If pathname is a relative path and dirfd is not AT_FDCWD, then pathname and dirfd define the file to be marked
  • If pathname is a relative path and dirfd is AT_FDCWD, then pathname is used to determine file to be marked

flags

  • FAN_MARK_ADD – events in mask are added to mark or ignore mask
  • FAN_MARK_REMOVE – events in mask are removed from mark or ignore mask
  • FAN_MARK_FLUSH – remove all masks for filesystems, for mounts, or all marks for files and directories from fanotify group
  • FAN_MARK_DONT_FOLLOW – if pathname is a symlink, mark link instead of file it refers
  • FAN_MARK_ONLYDIR – if object marked is not a directory, then raise error
  • FAN_MARK_MOUNT – mark mount point specified by pathname
  • FAN_MARK_FILESYSTEM – mark filesystem specified by pathname
  • FAN_MARK_IGNORED_MASK – events in mask will be added or removed from ignore mask
  • FAN_MARK_IGNORED_SURV_MODIFY – ignore mask will outlast modify events

mask

  • FAN_ACCESS – create event when file or dir is accessed
  • FAN_MODIFY – create event when file is modified
  • FAN_CLOSE_WRITE – create event when file that is writable is closed
  • FAN_CLOSE_NOWRITE – create event when a file that is read-only or a directory is closed
  • FAN_OPEN – create event when file or dir opened
  • FAN_OPEN_EXEC – create event when file is opened to be executed
  • FAN_ATTRIB – create event when file or dir metadata is changed
  • FAN_CREATE – create event when file or dir is created in marked directory
  • FAN_DELETE – create event when file or dir is deleted in marked directory
  • FAN_DELETE_SELF – create event when marked file or dir is deleted
  • FAN_MOVED_FROM – create event when file or dir has been moved from a marked directory
  • FAN_MOVED_TO – create event when file or dir has been moved to a marked directory
  • FAN_MOVE_SELF – create event when marked file or directory is moved
  • FAN_Q_OVERFLOW – create event when overflow of event queue occurs
  • FAN_OPEN_PERM – create event when a process requests permission to open file or directory
  • FAN_OPEN_EXEC_PERM – create event when a process requests permission to open a file to execute
  • FAN_ACCESS_PERM – create event when a process requests permission to read a file or directory
  • FAN_ONDIR – create events when directories themselves are accessed
  • FAN_EVENT_ON_CHILD – create events applying to the immediate children of marked directories

name_to_handle_at

Returns file handle and mount ID for file specified by dirfd and pathname.

int name_to_handle_at(int dirfd, const char *pathname, struct file_handle
*handle, int *mount_id, int flags)
  • dirfd – directory file descriptor
  • pathname – pointer to string with full path to file
  • handle – pointer to file_handle structure
  • mount_id – pointer to int that receives ID of filesystem mount containing pathname
  • flags – 0, or a combination of AT_EMPTY_PATH and AT_SYMLINK_FOLLOW

Returns zero on success and mount_id is populated.

open_by_handle_at

Opens file corresponding to handle that is returned from name_to_handle_at syscall.

int open_by_handle_at(int mount_fd, struct file_handle *handle, int flags)
  • mount_fd – file descriptor
  • handle – pointer to file_handle structure
  • flags – same flags for open syscall
struct file_handle {
    unsigned int  handle_bytes;   /* size of f_handle (in/out) */
    int           handle_type;    /* type of handle (out) */
    unsigned char f_handle[0];    /* file id (sized by caller) (out) */
};

Returns a file descriptor.

syncfs

Flush filesystem cache specified by a file descriptor.

int syncfs(int fd)
  • fd – file descriptor residing on disk to flush

Returns zero on success.

sendmmsg

Send multiple messages via socket.

int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen, int flags)
  • sockfd – file descriptor specifying socket
  • msgvec – pointer to array of mmsghdr structures
  • vlen – number of messages to send
  • flags – flags defining operation (same as sendto flags)
struct mmsghdr {
    struct msghdr msg_hdr;  /* header of message */
    unsigned int  msg_len;  /* bytes transmitted (set on return) */
};

Returns number of messages sent from msgvec.

setns

Reassociate a thread with namespace.

int setns(int fd, int nstype)
  • fd – file descriptor specifying a namespace
  • nstype – specify type of namespace (0 allows any namespace)

Returns zero on success.

nstype

  • CLONE_NEWCGROUP – file descriptor must reference cgroup namespace
  • CLONE_NEWIPC – file descriptor must reference IPC namespace
  • CLONE_NEWNET – file descriptor must reference network namespace
  • CLONE_NEWNS – file descriptor must reference a mount namespace
  • CLONE_NEWPID – file descriptor must reference descendant PID namespace
  • CLONE_NEWUSER – file descriptor must reference user namespace
  • CLONE_NEWUTS – file descriptor must reference UTS namespace

getcpu

Return CPU/NUMA node for calling process or thread.

int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
  • cpu – pointer to the CPU number
  • node – pointer to the NUMA node number
  • tcache – set to NULL (no longer used)

Returns zero on success.

process_vm_readv

Copy data from a remote (another) process to the local (calling) process.

ssize_t process_vm_readv(pid_t pid, const struct iovec *local_iov, unsigned long liovcnt,
const struct iovec *remote_iov, unsigned long riovcnt, unsigned long flags)
  • pid – source process ID
  • local_iov – pointer to iovec structure with details about local address space
  • liovcnt – number of elements in local_iov
  • remote_iov – pointer to iovec structure with details about remote address space
  • riovcnt – number of elements in remote_iov
  • flags – unused, set to 0

Returns number of bytes read.

process_vm_writev

Copy data from the local (calling) process to a remote (another) process.

ssize_t process_vm_writev(pid_t pid, const struct iovec *local_iov, unsigned long liovcnt,
 const struct iovec *remote_iov, unsigned long riovcnt, unsigned long flags)
  • pid – destination process ID
  • local_iov – pointer to iovec structure with details about local address space
  • liovcnt – number of elements in local_iov
  • remote_iov – pointer to iovec structure with details about remote address space
  • riovcnt – number of elements in remote_iov
  • flags – unused, set to zero
struct iovec {
    void  *iov_base;    /* start address */
    size_t iov_len;     /* bytes to transfer */
};

Returns number of bytes written.

kcmp

Compare two processes to see if they share resources in the kernel.

int kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2)
  • pid1 – the first process ID
  • pid2 – the second process ID
  • type – type of resource to compare
  • idx1 – type-specific resource index
  • idx2 – type-specific resource index

Returns zero if processes share the same resource.

type flags

  • KCMP_FILE – check if file descriptors specified in idx1 and idx2 are shared by both processes
  • KCMP_FILES – check if the two processes share the same set of open file descriptors (idx1 and idx2 are not used)
  • KCMP_FS – check if the two processes share the same filesystem information (for example, the filesystem root, mode creation mask, working directory, etc.)
  • KCMP_IO – check if processes share the same I/O context
  • KCMP_SIGHAND – check if processes share same table of signal dispositions
  • KCMP_SYSVSEM – check if processes share same semaphore undo operations
  • KCMP_VM – check if processes share same address space
  • KCMP_EPOLL_TFD – check if file descriptor referenced in idx1 of process pid1 is present in epoll referenced by idx2 of process pid2, where idx2 is a structure kcmp_epoll_slot describing target file
struct kcmp_epoll_slot {
    __u32 efd;
    __u32 tfd;
    __u64 toff;
};

finit_module

Load module into kernel with module file specified by file descriptor.

int finit_module(int fd, const char *param_values, int flags)
  • fd – file descriptor of kernel module file to load
  • param_values – pointer to string with parameters for the module
  • flags – flags for module load

Returns zero on success.

flags

  • MODULE_INIT_IGNORE_MODVERSIONS – ignore symbol version hashes
  • MODULE_INIT_IGNORE_VERMAGIC – ignore kernel version magic
Linux System Call Tutorial with C https://linuxhint.com/linux_system_call_tutorial_c/ Sun, 24 Nov 2019 19:41:41 +0000 https://linuxhint.com/?p=50606

In our last article on Linux System Calls, I defined a system call, discussed the reasons one might use them in a program, and delved into their advantages and disadvantages. I even gave a brief example in assembly within C. It described how to make the call and illustrated the point, but did nothing productive. Not exactly a thrilling development exercise.

In this article, we’re going to use actual system calls to do real work in our C program. First, we’ll review if you need to use a system call, then provide an example using the sendfile() call that can dramatically improve file copy performance. Finally, we’ll go over some points to remember while using Linux system calls.

Do You Need a System Call?

While it’s inevitable you’ll use a system call at some point in your C development career, unless you are targeting high performance or a particular type of functionality, the glibc library and other basic libraries included in major Linux distributions will take care of the majority of your needs.

The glibc standard library provides a cross-platform, well-tested framework to execute functions that would otherwise require system-specific system calls. For example, you can read a file with fscanf(), fread(), getc(), etc., or you can use the read() Linux system call. The glibc functions provide more features (e.g. better error handling, formatted IO, etc.) and will work on any system glibc supports.

On the other hand, there are times where uncompromising performance and exact execution are critical. The wrapper that fread() provides is going to add overhead, and although minor, isn’t entirely transparent. Additionally, you may not want or need the extra features the wrapper provides. In that case, you’re best served with a system call.

You can also use system calls to perform functions not yet supported by glibc. If your copy of glibc is up to date, this will hardly be an issue, but developing on older distributions with newer kernels might require this technique.

Now that you’ve read the disclaimers, warnings, and potential detours, let’s dig into some practical examples.

What CPU Are We On?

A question that most programs probably don’t think to ask, but a valid one nonetheless. This is an example of a system call that older versions of glibc (before 2.29) do not wrap, so it has to be invoked directly. In this code, we’ll call the getcpu() call directly via the syscall() function. The syscall function works as follows:

syscall(SYS_call, arg1, arg2, …);

The first argument, SYS_call, is a definition that represents the number of the system call. When you include sys/syscall.h, these are included. The first part is SYS_ and the second part is the name of the system call.

Arguments for the call go into arg1, arg2 above. Some calls require more arguments, and they’ll continue in order from their man page. Remember that arguments used to return results must be pointers to memory your program owns, such as a local variable, an array, or a buffer allocated via the malloc function.

example1.c

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
 
int main() {
 
    unsigned cpu, node;
 
    // Get current CPU core and NUMA node via system call
    // Invoked directly through syscall(), which returns -1 on failure
    if (syscall(SYS_getcpu, &cpu, &node, NULL) == -1) {
        perror("getcpu");
        return 1;
    }
 
    // Display information
    printf("This program is running on CPU core %u and NUMA node %u.\n\n", cpu, node);
 
    return 0;
 
}
 
To compile and run:
 
gcc example1.c -o example1
./example1

For more interesting results, you could spin threads via the pthreads library and then call this function to see on which processor your thread is running.

Sendfile: Superior Performance

Sendfile provides an excellent example of enhancing performance through system calls. The sendfile() function copies data from one file descriptor to another. Rather than using multiple fread() and fwrite() calls, sendfile performs the transfer in kernel space, reducing overhead and thereby increasing performance.

In this example, we’re going to copy 64 MB of data from one file to another. In one test, we’re going to use the standard read/write methods in the standard library. In the other, we’ll use system calls and the sendfile() call to blast this data from one location to another.

test1.c (glibc)

#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/random.h>
 
#define BUFFER_SIZE 67108864
#define BUFFER_1 "buffer1"
#define BUFFER_2 "buffer2"
 
int main() {
 
    FILE *fOut, *fIn;
 
    printf("\nI/O test with traditional glibc functions.\n\n");
 
    // Grab a BUFFER_SIZE buffer.
    // The buffer will have random data in it but we don't care about that.
    printf("Allocating 64 MB buffer:                     ");
    char *buffer = (char *) malloc(BUFFER_SIZE);
    printf("DONE\n");
 
    // Write the buffer to fOut
    printf("Writing data to first buffer:                ");
    fOut = fopen(BUFFER_1, "wb");
    fwrite(buffer, sizeof(char), BUFFER_SIZE, fOut);
    fclose(fOut);
    printf("DONE\n");
 
    printf("Copying data from first file to second:      ");
    fIn = fopen(BUFFER_1, "rb");
    fOut = fopen(BUFFER_2, "wb");
    fread(buffer, sizeof(char), BUFFER_SIZE, fIn);
    fwrite(buffer, sizeof(char), BUFFER_SIZE, fOut);
    fclose(fIn);
    fclose(fOut);
    printf("DONE\n");
 
    printf("Freeing buffer:                              ");
    free(buffer);
    printf("DONE\n");
 
    printf("Deleting files:                              ");
    remove(BUFFER_1);
    remove(BUFFER_2);
    printf("DONE\n");
 
    return 0;
 
}

test2.c (system calls)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/sendfile.h>
#include <sys/random.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
 
#define BUFFER_SIZE 67108864
 
int main() {
 
    int fOut, fIn;
 
    printf("\nI/O test with sendfile() and related system calls.\n\n");
 
    // Grab a BUFFER_SIZE buffer.
    // The buffer will have random data in it but we don't care about that.
    printf("Allocating 64 MB buffer:                     ");
    char *buffer = (char *) malloc(BUFFER_SIZE);
    printf("DONE\n");
 

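    // Write the buffer to fOut
    printf("Writing data to first buffer:                ");
    // Create the file for writing; O_RDONLY would fail on a missing file
    fOut = open("buffer1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    // Pass the buffer itself, not the address of the pointer variable
    write(fOut, buffer, BUFFER_SIZE);
    close(fOut);
    printf("DONE\n");
 
    printf("Copying data from first file to second:      ");
    fIn = open("buffer1", O_RDONLY);
    fOut = open("buffer2", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    // NULL offset: start reading at fIn's current file position
    sendfile(fOut, fIn, NULL, BUFFER_SIZE);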
    close(fIn);
    close(fOut);
    printf("DONE\n");
 
    printf("Freeing buffer:                              ");
    free(buffer);
    printf("DONE\n");
 
    printf("Deleting files:                              ");
    unlink("buffer1");
    unlink("buffer2");
    printf("DONE\n");
 
    return 0;
 
}

Compiling and Running Tests 1 & 2

To build these examples, you will need the development tools installed on your distribution. On Debian and Ubuntu, you can install this with:

apt install build-essential

Then compile with:

gcc test1.c -o test1 && gcc test2.c -o test2

To run both and test the performance, run:

time ./test1 && time ./test2

You should get output like this, though the timings will vary from system to system:

I/O test with traditional glibc functions.

Allocating 64 MB buffer:                     DONE
Writing data to first buffer:                DONE
Copying data from first file to second:      DONE
Freeing buffer:                              DONE
Deleting files:                              DONE
real    0m0.397s
user    0m0.000s
sys     0m0.203s
I/O test with sendfile() and related system calls.
Allocating 64 MB buffer:                     DONE
Writing data to first buffer:                DONE
Copying data from first file to second:      DONE
Freeing buffer:                              DONE
Deleting files:                              DONE
real    0m0.019s
user    0m0.000s
sys     0m0.016s

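In this run, the code that uses the system calls finishes much faster than the glibc equivalent. Treat the size of the gap as indicative only, since it depends on the kernel, the filesystem, and the state of the page cache.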
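
One detail worth keeping in mind is that sendfile() may transfer fewer bytes than requested, so robust code checks its return value and loops. The following is a minimal sketch under that assumption; the function name copy_file is hypothetical and is not part of the test programs above.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies src to dst with sendfile(), looping until every byte is sent.
 * Returns the number of bytes copied, or -1 on error. */
long long copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;

    struct stat st;
    int out = -1;
    long long copied = -1;

    if (fstat(in, &st) == 0 &&
        (out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) != -1) {
        off_t remaining = st.st_size;
        copied = 0;
        while (remaining > 0) {
            /* NULL offset: the kernel advances in's file position */
            ssize_t n = sendfile(out, in, NULL, (size_t)remaining);
            if (n <= 0) {          /* error or unexpected end of file */
                copied = -1;
                break;
            }
            remaining -= n;
            copied += n;
        }
    }

    if (out != -1)
        close(out);
    close(in);
    return copied;
}
```

Checking the return values of open() and sendfile() also guards against a benchmark that appears fast only because a call failed early.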

Things to Remember

System calls can increase performance and provide additional functionality, but they are not without their disadvantages. You’ll have to weigh the benefits system calls provide against the lack of platform portability and sometimes reduced functionality compared to library functions.

When using some system calls, you must take care to use resources returned from system calls rather than library functions. For example, the FILE structure used by glibc’s fopen(), fread(), fwrite(), and fclose() functions is not the same as the file descriptor number returned (as an integer) by the open() system call. Mixing these can lead to issues.
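
When the two really do have to be combined, one common approach is to get the descriptor behind a stream with fileno() and flush the stream before writing to the descriptor directly. The sketch below illustrates that ordering; the function name mixed_write_size and the sample strings are made up for the example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes through a FILE stream and then through its underlying descriptor.
 * Returns the final file size, or -1 on error. */
long long mixed_write_size(void)
{
    FILE *fp = tmpfile();
    if (!fp)
        return -1;

    long long size = -1;
    struct stat st;
    int fd = fileno(fp);            /* descriptor behind the stream */

    if (fputs("buffered ", fp) != EOF &&
        fflush(fp) == 0 &&          /* push stdio's buffer to the kernel first */
        write(fd, "direct", 6) == 6 &&
        fstat(fd, &st) == 0)
        size = (long long)st.st_size;

    fclose(fp);                     /* also closes fd; do not close() it again */
    return size;
}
```

Without the fflush() call, the buffered text could reach the file after the direct write, or not at all before the size is checked.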

In general, Linux system calls have fewer bumper lanes than glibc functions. While it’s true that system calls have some error handling and reporting, you’ll get more detailed functionality from a glibc function.
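
As an illustration of what that error reporting looks like, most system call wrappers return -1 and leave the reason in errno. A small sketch, with a hypothetical function name open_error:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Tries to open path read-only. Returns 0 on success, otherwise the errno
 * value that was reported (for example ENOENT for a missing file). */
int open_error(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;    /* save errno before any other call can change it */
    close(fd);
    return 0;
}
```

The returned value can be passed to strerror() to get a human-readable message.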

And finally, a word on security. System calls directly interface with the kernel. The Linux kernel does have extensive protections against shenanigans from user land, but undiscovered bugs exist. Don’t trust that a system call will validate your input or isolate you from security issues. It is wise to ensure the data you hand to a system call is sanitized. Naturally, this is good advice for any API call, but you cannot be too careful when working with the kernel.

I hope you enjoyed this deeper dive into the land of Linux system calls. For a full list of Linux System Calls, see our master list.

Debian: debian_frontend=noninteractive https://linuxhint.com/debian_frontend_noninteractive/ Mon, 27 May 2019 17:45:33 +0000 https://linuxhint.com/?p=41020

In this guide we’ll discuss the advantages of Debian’s configuration engine, how configuration dialogs work, how to reactivate them after use, and how to suppress them with the DEBIAN_FRONTEND=noninteractive environment variable.

An Introduction to Debian’s Configuration Engine

Debian’s package management system is easily Linux’s most popular, powering Debian, Ubuntu, Linux Mint, MX Linux, and a host of other Debian-derivatives. The DEB package format contains far more than just the software binary files. It contains a wide assortment of control files that tell the package manager about software dependencies, start and stop instructions for daemon control, versions, license, authors, and a digital signature to guarantee integrity and authenticity.

These control files can be set up by the software publisher or maintainer to prompt the user for important configuration variables. These options save the user considerable time by keeping them from the sometimes tedious task of editing possibly multiple configuration files. If you’re a frequent user of Debian or its derivatives, you have probably seen screens (either text or graphical) asking for configuration details after installing a new or updated package.

Configure it Again, Apt

These scripts aren’t just meant for install time, either. If you wish to reconfigure the package, you can run:

dpkg-reconfigure package-name

Where package-name is the name of the package. If a configuration profile is present, you will be presented with those options again and given a chance to make changes.

For example, on a new Debian install, I run:

dpkg-reconfigure console-setup

This configures the text terminal console font, size, and character set. It’s far easier than setting these items manually.

Automation, Automation, Automation

Configuration prompts are great if you are interacting as a knowledgeable user, but in some cases, particularly in automation or scripting, you don’t want to prompt the user at all. In this case, quieting the configuration prompts is likely advantageous. To do this, run your apt command with the environment variable specified before it.

DEBIAN_FRONTEND=noninteractive apt-get -q -y install postfix

In this case, all configuration questions will be suppressed: the default is selected where one is specified, and otherwise no configuration is performed on the package. The -q switch reduces the output apt-get prints, and the -y switch answers yes to perform the installation or upgrade unattended.

To make the environment variable persist for your session, run:

export DEBIAN_FRONTEND=noninteractive

Once you log out or exit your shell, the environment variable will disappear or reset to the default. If you want to set it permanently, you can add it to your .bashrc or .zshrc file; however, I don’t recommend this because you may miss important configuration questions in the future. That said, if you intend for the Debian system to never require user configuration, this may be desirable.

Preserving Configuration Files

During package installation or upgrade, Debian may wish to prompt the user on overwriting a configuration file. This preference can be appended to the installation command.

apt-get install -q -y \
-o Dpkg::Options::="--force-confdef" \
-o Dpkg::Options::="--force-confold" \
postfix

In this command, the installer is told to quiet its output and assume yes. The --force-confdef option lets dpkg take its default action for a configuration file whenever one is defined. Where there is no default, --force-confold keeps the existing configuration file; the package’s version is installed alongside it with a .dpkg-dist suffix rather than overwriting the old one.

If you don’t care about the configuration file and want to overwrite it, you can use:

apt-get install -q -y -o Dpkg::Options::="--force-confnew" postfix

Take care when using this option. If you’re not absolutely certain that you don’t need the existing configuration and something goes wrong, you can create significant issues on your system or lose access to a remote system upon reboot or service restart.

Changing the Frontend

Though the primary purpose of this article is to explain the noninteractive switch, there are other parameters you can specify for DEBIAN_FRONTEND.

noninteractive

Do not ask any questions and assume the defaults.

dialog

Presents the user with the familiar text gray window on blue background. This is the default.

text

This removes the dialog interface and asks the configuration questions in a pure text-based format. This is well suited for slow connections or terminal emulators that don’t cooperate well with the dialog-based input and windowing system.

gtk

Prompts the user graphically using the GTK libraries. This may not work correctly on KDE. Also requires the packages cdebconf-gtk and gkdebconf to be installed before use.

Conclusion

I hope this guide has helped you with your system administration and automation tasks through use of the DEBIAN_FRONTEND environment variable.

Understanding Load Average on Linux https://linuxhint.com/load_average_linux/ Mon, 14 Jan 2019 10:19:24 +0000 https://linuxhint.com/?p=35393

Load average is a measurement of the amount of work versus free CPU cycles available on a system processor. In this article I’ll define the term, demonstrate how Linux calculates this value, then provide insight into how to interpret system load.

Different Methods of Calculating Load

Before we dive into Linux load averages, we must explore the different ways load is calculated and address the most common measurement of CPU load – a percentage.

Windows calculates load differently from Linux, and since Windows has been historically more popular on the desktop, the Windows definition of load is generally understood by most computer users. Most Windows users have seen the system load in the task manager displayed as a percentage ranging from 0% to 100%.

In Windows this is derived by examining how “busy” the System Idle Process is and using the inverse to represent the system load. For example, if the idle thread is executing 99% of the time, CPU load in Windows would be 1%. This value is easy to understand but provides less overall detail about the true status of the system.

In Linux, the load average is instead represented by a decimal number starting at 0.00. The value can be roughly defined as the number of processes over the past minute that had to wait their turn for execution. Unlike Windows, Linux load average is not an instant measurement. Load is given in three values – the one minute average, the five minute average, and the fifteen minute average.

Understanding Load Average in Linux

At first, this extra layer of detail seems unnecessary if you simply want to know the current state of CPU load in your system. But since the averages of three time periods are given, rather than an instant measurement, you can get a more complete idea of the change of system load over time in a single glance of three numbers.

Displaying the load average is simple. On the command line, you can use a variety of commands. I simply use the “w” command:

root@virgo [~]# w
21:08:43 up 38 days,  4:34,  4 users,  load average: 3.11, 2.75, 2.70

The rest of the command will display who’s logged on and what they’re executing, but for our purposes this information is irrelevant so I’ve clipped it from the above display.

In an ideal system, no process should be held up by another process (or thread), but in a single processor system, this occurs when the load goes above 1.00.

The words “single processor system” are incredibly important here. Unless you’re running an ancient computer, your machine probably has multiple CPU cores. In the machine I’m on, I have 16 cores:

root@virgo [~]# nproc
16

In this case, a load average of 3.11 is not alarming at all. It simply means that a bit more than three processes were ready to execute and CPU cores were present to handle their execution. On this particular system, the load would have to reach 16 to be considered at “100%”.

To translate this to a percent-based system load, you could use this simple, if somewhat obtuse, command:

cat /proc/loadavg | cut -c 1-4 | echo "scale=2; ($(</dev/stdin)/`nproc`)*100" | bc -l

This command sequence isolates the 1-minute average via cut and echoes it, divided by the number of CPU cores, through bc, a command-line calculator, to derive the percentage.

This value is by no means scientific but does provide a rough approximation of CPU load in percent.
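
The same approximation can be sketched in Python. This is a rough sketch rather than a definitive tool: the function names parse_loadavg and load_percent are my own, and it assumes the standard /proc/loadavg layout in which the first three fields are the 1, 5, and 15 minute averages.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5, and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_percent(load, cores):
    """Express a load average as a rough percentage of total CPU capacity."""
    return round(load / cores * 100, 2)


# /proc/loadavg exists on Linux only, so skip the demo elsewhere.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as handle:
        averages = parse_loadavg(handle.read())
    cores = os.cpu_count() or 1  # cpu_count() can return None
    for label, load in zip(("1 min", "5 min", "15 min"), averages):
        print(f"{label}: {load:.2f} -> {load_percent(load, cores):.2f}%")
```

Unlike the shell pipeline, this version reports all three averages, which makes it easier to see whether the load is rising or falling.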

A Minute to Learn, a Lifetime to Master

In the previous section I put the “100%” example of a load of 16.0 on a 16 CPU core system in quotes because the calculation of load in Linux is a bit more nebulous than Windows. The system administrator must keep in mind that:

  • Load is expressed in waiting processes and threads,
  • It is not an instantaneous value but an average,
  • Its interpretation must take the number of CPU cores into account, and
  • It may over-inflate I/O waits like disk reads

Because of this, getting a handle on CPU load on a Linux system is not entirely an empirical matter. Even if it were, CPU load alone is not an adequate measurement of overall system resource utilization. As such, an experienced Linux administrator will consider CPU load in concert with other values such as I/O wait and the percentage of user versus kernel (system) time.

I/O Wait

I/O wait is most easily seen via the “top” command, where it appears as the “wa” value on the CPU summary line.

This is a percentage of time that the CPU was waiting on input or output commands to complete. This is usually indicative of high disk activity. While a high wait percentage alone may not significantly degrade CPU-bound tasks, it will reduce I/O performance for other tasks and will make the system feel sluggish.

High I/O wait without any obvious cause might indicate a problem with a disk. Use the “dmesg” command to see if any errors have occurred.

User vs. Kernel (System) Time

The “us” and “sy” values on top’s CPU summary line represent the user and kernel (system) time. This is a breakdown of the overall consumption of CPU time by users (i.e. applications, etc.) and the kernel (i.e. interaction with system devices). Higher user time will indicate more CPU usage by programs, whereas higher kernel time will indicate more system-level processing.
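
The figures that top reports can also be derived from /proc/stat. The sketch below is illustrative only: cpu_breakdown and read_cpu_line are hypothetical helper names, and it assumes the field order documented in proc(5) for the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal). It compares two samples and reports user, system, and I/O wait time as percentages of the interval.

```python
import os
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")


def cpu_breakdown(before, after):
    """Return user, system, and iowait percentages between two cpu lines."""
    old = [int(v) for v in before.split()[1:len(FIELDS) + 1]]
    new = [int(v) for v in after.split()[1:len(FIELDS) + 1]]
    delta = dict(zip(FIELDS, (n - o for n, o in zip(new, old))))
    total = sum(delta.values()) or 1  # avoid division by zero
    return {
        "us": 100.0 * delta["user"] / total,
        "sy": 100.0 * delta["system"] / total,
        "wa": 100.0 * delta["iowait"] / total,
    }


# /proc/stat exists on Linux only, so skip the demo elsewhere.
if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)  # sample interval in seconds
    print(cpu_breakdown(first, read_cpu_line()))
```

A consistently high “wa” figure alongside a high load average suggests the load is coming from disk waits rather than from CPU-bound work.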

A Fairly Average Load

Learning the relationship of load average to actual system performance takes time, but before long you’ll see a distinct correlation. Armed with the intricacies of system performance metrics, you’ll be able to make better decisions about hardware upgrades and program resource utilization.

]]>
Debian AppArmor Tutorial https://linuxhint.com/debian_apparmor_tutorial/ Sat, 23 Jun 2018 17:42:52 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=27475 AppArmor is a mandatory access control system for Linux. In a mandatory access control system (MAC), the kernel imposes restrictions on paths, sockets, ports, and various input/output mechanisms. It was developed by Immunix and is now maintained primarily by Canonical. It has been part of the Linux kernel since version 2.6.36.

While the Linux kernel provides good isolation of users and strong file permission control, a MAC like AppArmor provides more finely-grained permissions and protection against many unknown threats. If a security vulnerability is found in the Linux kernel or other system daemon, a well-configured AppArmor system can prevent access to critical paths that could be vulnerable to the issue.

AppArmor effectively works in two modes – enforce and complain. Enforce is the default production status of AppArmor, while complain is useful for developing a rule set based on real operation patterns and for logging violations. It is configured via plain text files in a relatively friendly format and has a shorter learning curve than most other mandatory access control systems.

Installation

To install AppArmor on Debian, run (as root):

apt install apparmor apparmor-utils auditd

You may omit auditd if you don’t need profile generation tools.

If you wish to install starter and additional profiles, run:

apt install apparmor-profiles apparmor-profiles-extra

 

Since AppArmor is a Linux Security Module built into the kernel, you must enable it at boot with the following steps:

mkdir -p /etc/default/grub.d

Create the file /etc/default/grub.d/apparmor.cfg with the following contents:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT apparmor=1 security=apparmor"

Save and exit, then run:

update-grub

Then reboot.

There is debate if this should be done automatically. You may wish to consult the end of this bug report to see if this has been changed since the time of this writing.

Once you reboot, you can check to see if AppArmor is enabled by running:

aa-status

This command lists the loaded AppArmor profiles and their current mode (enforce, complain, etc.).

If you run:

ps auxZ | grep -v '^unconfined'

You’ll see a list of programs that are confined by an AppArmor profile. A confined program is one that is affected and limited (either passively, in complain mode, or actively in enforced mode) by AppArmor.

Changing Modes / Disabling AppArmor

If you wish to disable AppArmor because a program isn’t working, you may wish to consider placing the profile in complain mode instead of enforced mode. To do this, run (as root, or via sudo):

aa-complain /path/to/program

For example, if ping won’t work correctly, use:

aa-complain /usr/bin/ping

Once a profile is in complain mode you can examine the logging via /var/log/syslog or with journalctl -xe on systemd systems (Debian 8.x, Jessie, and higher).

Once you have edited the profile to remove or adjust the restriction, you can turn on enforce mode again for the binary with:

aa-enforce /path/to/program

In the example above, replace /path/to/program with the full path to the binary affected by the profile in question.

If you have a problem with a program and it’s in complain mode, the logs will provide specific information about what action was denied. The operation field will explain what the program tried to do, the profile field the specific profile affected, name will specify the target of the action (i.e. what file was stopped from a read or write operation), and the requested and denied masks indicate if the operation, both requested by the program and denied per the profile, was read or read-write.
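
To pull those fields out of a log line programmatically, something like the following sketch can help. It assumes the usual key="value" layout of AppArmor audit messages. The function name parse_apparmor_event is hypothetical, and the sample line is made up for illustration rather than captured from a real system.

```python
import shlex


def parse_apparmor_event(line):
    """Split an AppArmor log line into a dict of its key=value fields."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# Illustrative line only, not captured from a real system.
SAMPLE = (
    'audit: type=1400 audit(1529775000.123:45): apparmor="ALLOWED" '
    'operation="open" profile="/usr/bin/ping" name="/etc/hosts" pid=1234 '
    'comm="ping" requested_mask="r" denied_mask="r"'
)

if __name__ == "__main__":
    event = parse_apparmor_event(SAMPLE)
    for key in ("operation", "profile", "name", "requested_mask", "denied_mask"):
        print(f"{key}: {event[key]}")
```

Real log lines may carry extra prefixes or unbalanced quotes, so treat this as a starting point and add error handling before relying on it.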

You can disable a profile entirely by running:

aa-disable /path/to/program

Or, you can disable AppArmor completely by editing the file: /etc/default/grub.d/apparmor.cfg to contain:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT apparmor=0"

Then running:

update-grub

And rebooting your system.

Working with AppArmor Profiles

AppArmor profiles reside in the /etc/apparmor.d/ directory. If you install the apparmor-profiles and apparmor-profiles-extra packages, you’ll find profiles in /usr/share/doc/apparmor-profiles and /usr/share/doc/apparmor-profiles/extra. To activate them, copy the files into /etc/apparmor.d then edit them to ensure they contain the values you want, save, then run:

service apparmor reload

If you wish to reload just one profile, run:

apparmor_parser -r /etc/apparmor.d/profile

Where “profile” is the name of the profile in question.

It is not recommended to just copy the profiles and extra profiles into the /etc/apparmor.d directory without hand-editing them. Some profiles may be old and some most certainly will not contain the values you want. If you do copy them all, at least set them to complain so that you can monitor violations without breaking programs in production:

cd /etc/apparmor.d
for f in *.* ; do aa-complain "/etc/apparmor.d/$f"; done

You can use the aa-enforce command individually to enable profiles you wish to keep, tune the ones that cause issues and enforce those, or remove ones you don’t need by running aa-disable or removing the profile file from /etc/apparmor.d.

Creating an AppArmor Profile

Before you create a custom profile, you will want to search the /etc/apparmor.d and /usr/share/doc/apparmor-profiles directories for an existing profile that covers the binary in question. To search these, run:

find /usr/share/doc/apparmor-profiles | grep -i "program"

Replace program with the program you want to protect with AppArmor. If you find one, copy it to /etc/apparmor.d and then edit the file in your favorite text editor.

Each profile comprises three main sections: includes, capabilities, and paths. You can find a helpful reference in SuSE’s documentation.

Includes

Includes pull shared rules into the profile. They use the C/C++ #include <> syntax and usually reference abstractions found in the /etc/apparmor.d/abstractions directory.

Capabilities

The capabilities section, typically found after the includes, lists specific capabilities that the program can perform. For example, you can let a program perform a setuid operation with:

capability setuid,

The capability net_bind_service allows a program to bind to a privileged network port (below 1024). If you don’t grant this, a server daemon like Apache can’t open port 80 and listen. However, omitting this capability can provide excellent security for processes you don’t trust on the network.

Paths

You may list paths that the program is able to read (and possibly write). For example, if you want to allow the program to access the /etc/passwd file, add the following line to the profile:

/etc/passwd r,

Note the “r” – this means read only. If you change this to “w”, writing to this path or file will be allowed. Like every rule in a profile, the line ends with a comma.

Even if you allow a path in AppArmor, it is still subject to Linux file system restrictions (i.e. set with chmod, chgrp, and chown). However, AppArmor will still provide an extra layer of protection should those mechanisms be compromised.

Conclusion

The key to a successful AppArmor deployment is to set profiles to complain, then enforce. Careful log examination will give you the minimal paths and capabilities needed for successful program operation. By assigning these and no more you will dramatically increase your system security.

]]>
What Is a Linux System Call? https://linuxhint.com/what-is-a-linux-system-call/ Thu, 18 Jan 2018 06:47:22 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=21626

First Things First

Before we delve into the definition of a Linux system call and examine the details of its execution, it is best to start with defining the various software layers of a typical Linux system.

The Linux kernel is a specialized program that boots and runs at the lowest available level on your hardware. It has the task of orchestrating everything that runs on the computer, from handling keyboard, disk, and network events to providing time slices for executing multiple programs in parallel.

When the kernel executes a user-level program, it virtualizes the memory space so that programs believe they are the only process running in memory. This protective bubble of hardware and software isolation increases security and reliability. An unprivileged application cannot access memory belonging to other programs, and if that program crashes, the kernel terminates it so that it cannot harm the rest of the system.

Breaching the Barrier with Linux System Calls

This layer of isolation between unprivileged applications provides an excellent boundary to protect other applications and users on the system. However, without some way to interface with the other elements in the computer and the outside world, programs wouldn’t be able to accomplish much of anything.

To facilitate interaction, the kernel designates a software gate that allows the running program to request that the kernel act on its behalf. This interface is known as a system call.

Since Linux follows the UNIX philosophy of “everything is a file”, many functions can be performed by opening and reading or writing to a file, which could be a device. On Windows, for example, you might use a function called CryptGenRandom to access random bytes. But on Linux, this can be done by simply opening the “file” /dev/urandom and reading bytes from it using standard file input/output system calls. This crucial difference allows for a simpler system call interface.
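
As a small illustration of the /dev/urandom example, the sketch below reads random bytes using Python’s os.open, os.read, and os.close, which map closely onto the underlying open, read, and close system calls. The helper name read_random_bytes is my own, and the code assumes a Linux system where /dev/urandom is present.

```python
import os


def read_random_bytes(count, path="/dev/urandom"):
    """Read count random bytes using the open, read, and close system calls."""
    fd = os.open(path, os.O_RDONLY)  # open(2)
    try:
        chunks = []
        remaining = count
        while remaining > 0:
            chunk = os.read(fd, remaining)  # read(2) may return fewer bytes
            if not chunk:
                raise OSError("unexpected end of file")
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)  # close(2), runs even if the read fails


if __name__ == "__main__":
    print(read_random_bytes(16).hex())
```

No random-number-specific call is involved: the same three generic file calls would work on a regular file or any other device node.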

Wafer-Thin Wrapper

In most applications, system calls are not made directly to the kernel. Virtually all programs link in the standard C library, which provides a thin but important wrapper around Linux system calls. The library makes sure that the function arguments are copied into the correct processor registers then issues the corresponding Linux system call. When data is received from the call, the wrapper interprets the results and returns them to the program in a consistent way.
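
One way to watch the wrapper at work without writing C is to call it through Python’s ctypes module. This is only a sketch, assuming a Linux system where the C library can be loaded from the running process. The helper name close_via_libc is hypothetical. It shows the consistent error convention: a failed call returns -1 and the reason is stored in errno.

```python
import ctypes
import errno
import os

# Load the C library already linked into the Python process.
libc = ctypes.CDLL(None, use_errno=True)


def close_via_libc(fd):
    """Call libc's close() wrapper and return (result, errno)."""
    ctypes.set_errno(0)
    result = libc.close(fd)
    return result, ctypes.get_errno()


if __name__ == "__main__":
    # getpid() is a thin wrapper, so both routes report the same process ID.
    print(libc.getpid() == os.getpid())
    # Closing an invalid descriptor fails: the wrapper returns -1 and
    # stores the error code in errno.
    result, err = close_via_libc(-1)
    print(result, errno.errorcode.get(err))
```

The kernel itself reports failure as a negative error number in a register. Translating that into -1 plus errno is the wrapper’s job.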

Behind the Scenes

Every function in a program that interacts with the system is eventually translated into a system call. To see this in action, let’s start with a basic example.

void main() {
}

This is probably the most trivial C program you will ever see. It simply gains control via the main entry point and then exits. It doesn’t even return a value since main is defined as void. Save the file as ctest.c and let’s compile it:

gcc ctest.c -o ctest

Once it’s compiled, we can see the file size as 8664 bytes. It may vary slightly on your system, but it should be around 8k. That’s a lot of code just to enter and exit! The reason it’s 8k is that the libc runtime is being included. Even if we strip the symbols, it’s still a tad over 6k.

In an even simpler example, we can make the Linux system call to exit rather than depending on the C runtime to do that for us.

void _start() {
    asm("movl $1,%eax;"
    "xorl %ebx,%ebx;"
    "int  $0x80");
}

Here we move 1 into the EAX register, clear out the EBX register (which would otherwise contain the return value) then call the Linux system call interrupt 0x80 (or 128 in decimal). This interrupt triggers the kernel to process our call.

If we compile our new example, called asmtest.c, and strip out the symbols and exclude the standard library:

gcc -s -nostdlib asmtest.c -o asmtest

we’ll produce a binary less than 1k (on my system, it yields 984 bytes). Most of this code is executable headers. We are now making the Linux system call directly.

For All Practical Purposes

In nearly all cases, you won’t ever have to make direct system calls in your C programs. If you use assembly language, however, the need may arise. Even when optimizing, it is best to let the C library functions make the system calls and to embed only your performance-critical code in assembly directives.

How to Program System Call Tutorials

List of All System Calls

If you want to see a list of all available system calls for Linux you can check these reference pages: Full List of System Calls on LinuxHint.com, filippo.io/linux-syscall-table/ or syscalls.kernelgrok.com

]]>
Running Your Own Production Email Server https://linuxhint.com/running-your-own-production-email-server/ Fri, 22 Dec 2017 01:23:58 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=20881 Email is hard.

You should know that up front. It isn’t for the faint of heart. Turn around and don’t look back, and all that stuff.

Now that the proper warnings are out of the way, let’s explore the most common available options in running your own email server. I’ll step through the pros and cons of each approach and hopefully give you the insight you need in making this difficult decision.

Why Run Your Own Mail Server?

Privacy is the main concern. Google scans your email to show related advertising. Even though this is done automatically and supposedly no human ever sees it, this still doesn’t sit well with some. Microsoft and others claim not to do this, but the Edward Snowden leaks about the NSA’s links to most major email providers, including Google and Microsoft, make it clear that your email with one of these providers will be scanned.

It is worth mentioning that privacy is a tricky thing to achieve in email. If you send an email to someone using Google or Microsoft’s mail servers, your communication with that person will be scanned and analyzed just the same. Privacy, at least as much as it is possible in email without the use of PGP encryption, is only somewhat guaranteed as long as you communicate with someone who either uses the same server or uses a server with similar levels of data privacy.

Cost is often a concern as well, though providers generally offer mailboxes from less than $10 USD per month. This may seem expensive compared to the cost of a small virtual server, especially if you have many users, but it’s worth taking into consideration the administration time in setup and maintenance, as well as the cost of any involved commercial software.

Potential Hurdles

It’s important to know what you’re getting into with running your own production email server. While you gain privacy and can reduce costs, you do have to do maintenance, even in a fully automatic system. In addition to normal system administration duties like security and bugfix updates, you’ll have to deal with diagnosing bounce notifications, adjusting mailbox quotas, and dealing with blacklists.

Blacklists are both a blessing and a curse for mail administrators. By validating incoming mail against them, you can reduce a lot of spam. However, you also must be careful to not end up on one through the actions of your users. A mail-only server is not likely to run into this issue unless you have a compromised account or rogue user, but if you do web hosting on the same server you must make absolutely sure that all web scripts are kept up to date. A server that hosts WordPress sites, for example, makes a poor choice to host email unless you are diligent about keeping your sites updated and secure.

Should I Run My Own Email Server?

If you aren’t comfortable with running your own server, don’t know how to fix email server issues, and can’t tolerate reception and delivery issues, running your own email server isn’t for you. In this case, I’d recommend checking out offerings from Google or Microsoft, or one of the many smaller providers.

It’s worth mentioning that you may still have the occasional reliability issue even with small providers. The presence of Google and Microsoft in this market is strong and they both tend to run the show. Other providers must constantly adapt to the standards they use and enforce.

Despite the warnings and pitfalls, there are some solid advantages to running your own mail server. Let’s explore the options.

Option 1 – Use Commercial Software Like cPanel

cPanel is a web hosting platform that installs on Red Hat Enterprise Linux or CentOS and reconfigures the system to provide a full array of services, including email. cPanel uses the Exim mail transfer agent (MTA) and has a very advanced configuration engine and spam detection system via SpamAssassin.

The number of options available for customization via an easy-to-use graphical interface is large and can be overwhelming. However, the default configuration is very functional and will work for most users right out of the box. Users are offered a pre-packaged configuration of three webmail systems – Horde, SquirrelMail, and Roundcube. Also included is excellent support for the POP3, IMAP, and SMTP protocols, mobile support, calendar and contact sharing on iOS devices, and even full-text mailbox searching.

Licenses for virtual dedicated servers cost around $10 to $20 USD per month, depending on license vendor. It may come bundled with your server at no cost. You’ll also receive support from both your datacenter license provider and, as a last resort for more complex issues, cPanel.

Option 2 – Webmin / Virtualmin

Webmin provides an easy-to-install and configure solution for web and email hosting via a dual license plugin called Virtualmin. While similar in scope to cPanel, it doesn’t have as much user interface polish. That said, with simple configuration via a web interface, it is entirely useable and provides a significant shortcut to live production email.

Webmin/Virtualmin are aimed at a more advanced audience. While a novice could certainly install Webmin via the simple installer script provided, more command line and hands-on configuration is required over a system like cPanel. Webmin does provide far more customization options than cPanel, but this flexibility is generally provided via SSH commands and editing configuration files rather than via the graphical interface.

Webmin is open source and can be installed on a wide variety of Linux systems, including Red Hat Enterprise Linux, CentOS, Debian, Ubuntu, and Arch. It also has an optional commercial license and support.

Option 3 – Rolling Your Own

cPanel and Webmin provide excellent default configurations, but these systems are resistant to extreme customization. Webmin tolerates this better than cPanel, but with either solution you’d be better off using the workflow and methods described in the documentation. Custom integrations with other systems may break cPanel or Webmin.

If you need something more custom, or prefer to avoid having a third-party software solution managing your email system, rolling your own is probably the best way to go. This isn’t difficult, but for installations at scale it does require a bit of tool creation to ensure new users are added correctly, existing users are maintained, passwords are reset with secure values, and new virtual domains are routed correctly.

This option does require the most up-front work and knowledge. Building your own email server from scratch also requires more maintenance to ensure system updates don’t break your workflow and management system. That said, you’ll end up with a system that is truly your own and is configured in the precise manner you need.

Conclusion

If you are considering running your own mail server, I strongly recommend weighing the pros and cons before committing to the project. It’s a lot of work, both up-front and on an ongoing basis, but the benefits to privacy, security, and customization are hard to beat.

]]>
GPU Programming with C++ https://linuxhint.com/gpu-programming-cpp/ Sat, 09 Dec 2017 10:57:09 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=20666

Overview

In this guide, we’ll explore the power of GPU programming with C++. Developers can expect incredible performance with C++, and accessing the phenomenal power of the GPU with a low-level language can yield some of the fastest computation currently available.

Requirements

While any machine capable of running a modern version of Linux can support a C++ compiler, you’ll need an NVIDIA-based GPU to follow along with this exercise. If you don’t have a GPU, you can spin up a GPU-powered instance in Amazon Web Services or another cloud provider of your choice.

If you choose a physical machine, please ensure you have the NVIDIA proprietary drivers installed. You can find instructions for this here: https://linuxhint.com/install-nvidia-drivers-linux/

In addition to the driver, you’ll need the CUDA toolkit. In this example, we’ll use Ubuntu 16.04 LTS, but there are downloads available for most major distributions at the following URL: https://developer.nvidia.com/cuda-downloads

For Ubuntu, you would choose the .deb based download. The downloaded file will not have a .deb extension by default, so I recommend renaming it to have a .deb at the end. Then, you can install with:

sudo dpkg -i package-name.deb

You will likely be prompted to install a GPG key, and if so, follow the instructions provided to do so.

Once you’ve done that, update your repositories:

sudo apt-get update
sudo apt-get install cuda -y

Once done, I recommend rebooting to ensure everything is properly loaded.

The Benefits of GPU Development

CPUs handle many different inputs and outputs and contain a large assortment of functions for not only dealing with a wide assortment of program needs but also for managing varying hardware configurations. They also handle memory, caching, the system bus, segmenting, and IO functionality, making them a jack of all trades.

GPUs are the opposite – they contain many individual processors that are focused on very simple mathematical functions. Because of this, they process highly parallel tasks many times faster than CPUs. By specializing in scalar functions (a function that takes one or more inputs but returns only a single output), they achieve extreme performance at the cost of extreme specialization.

Example Code

In the example code, we add vectors together. I have added a CPU and GPU version of the code for speed comparison.
gpu-example.cpp contents below:

#include "cuda_runtime.h"
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <cstdio>
#include <chrono>

typedef std::chrono::high_resolution_clock Clock;

#define ITER 65535

// CPU version of the vector add function
void vector_add_cpu(int *a, int *b, int *c, int n) {
    int i;

    // Add the vector elements a and b to the vector c
    for (i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// GPU version of the vector add function
__global__ void vector_add_gpu(int *gpu_a, int *gpu_b, int *gpu_c, int n) {
    // Each thread handles one element; compute its global index
    // from the block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // No for loop is needed because the CUDA runtime launches one
    // thread per element. The bounds check guards the last block,
    // which may contain more threads than remaining elements.
    if (i < n) {
        gpu_c[i] = gpu_a[i] + gpu_b[i];
    }
}

int main() {

    int *a, *b, *c;
    int *gpu_a, *gpu_b, *gpu_c;

    a = (int *)malloc(ITER * sizeof(int));
    b = (int *)malloc(ITER * sizeof(int));
    c = (int *)malloc(ITER * sizeof(int));

    // We need variables accessible to the GPU,
    // so cudaMallocManaged provides these
    cudaMallocManaged(&gpu_a, ITER * sizeof(int));
    cudaMallocManaged(&gpu_b, ITER * sizeof(int));
    cudaMallocManaged(&gpu_c, ITER * sizeof(int));

    for (int i = 0; i < ITER; ++i) {
        a[i] = i;
        b[i] = i;
        c[i] = i;
        // Initialize the GPU-accessible copies as well
        gpu_a[i] = i;
        gpu_b[i] = i;
        gpu_c[i] = i;
    }

    // Call the CPU function and time it
    auto cpu_start = Clock::now();
    vector_add_cpu(a, b, c, ITER);
    auto cpu_end = Clock::now();
    std::cout << "vector_add_cpu: "
    << std::chrono::duration_cast<std::chrono::nanoseconds>(cpu_end - cpu_start).count()
    << " nanoseconds.\n";

    // Call the GPU function and time it
    // The triple angle brackets are a CUDA extension that passes the
    // launch configuration of a kernel call: <<<blocks, threads per block>>>.
    // A block can hold at most 1024 threads, so we split the ITER
    // elements across as many 256-thread blocks as needed.
    const int threads_per_block = 256;
    const int blocks = (ITER + threads_per_block - 1) / threads_per_block;
    auto gpu_start = Clock::now();
    vector_add_gpu <<<blocks, threads_per_block>>> (gpu_a, gpu_b, gpu_c, ITER);
    cudaDeviceSynchronize();
    auto gpu_end = Clock::now();
    std::cout << "vector_add_gpu: "
    << std::chrono::duration_cast<std::chrono::nanoseconds>(gpu_end - gpu_start).count()
    << " nanoseconds.\n";

    // Free the GPU-function based memory allocations
    cudaFree(gpu_a);
    cudaFree(gpu_b);
    cudaFree(gpu_c);

    // Free the CPU-function based memory allocations
    free(a);
    free(b);
    free(c);

    return 0;
}

Makefile contents below:

INC=-I/usr/local/cuda/include
NVCC=/usr/local/cuda/bin/nvcc
# -x cu tells nvcc to treat the .cpp file as CUDA source
NVCC_OPT=-std=c++11 -x cu

all:
    $(NVCC) $(NVCC_OPT) gpu-example.cpp -o gpu-example

clean:
    -rm -f gpu-example

To run the example, compile it:

make

Then run the program:

./gpu-example

On my machine, the CPU version (vector_add_cpu) ran considerably slower than the GPU version (vector_add_gpu).

If you see the opposite, you may need to adjust the ITER define in gpu-example.cpp to a higher number. This is due to the GPU setup time being longer than some smaller CPU-intensive loops. I found 65535 to work well on my machine, but your mileage may vary. However, once you clear this threshold, the GPU is dramatically faster than the CPU.

Conclusion

I hope you’ve learned a lot from our introduction into GPU programming with C++. The example above doesn’t accomplish a great deal, but the concepts demonstrated provide a framework that you can use to incorporate your ideas to unleash the power of your GPU.

]]>
GPU Programming with Python https://linuxhint.com/gpu-programming-python/ Sat, 02 Dec 2017 11:13:17 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=20457 In this article, we’ll dive into GPU programming with Python. Using the ease of Python, you can unlock the incredible computing power of your video card’s GPU (graphics processing unit). In this example, we’ll work with NVIDIA’s CUDA library.

Requirements

For this exercise, you’ll need either a physical machine with Linux and an NVIDIA-based GPU or a GPU-based instance on Amazon Web Services. Either should work fine, but if you choose to use a physical machine, you’ll need to make sure you have the NVIDIA proprietary drivers installed. See the instructions here: https://linuxhint.com/install-nvidia-drivers-linux

You’ll also need the CUDA Toolkit installed. This example uses Ubuntu 16.04 LTS specifically, but there are downloads available for most major Linux distributions at the following URL: https://developer.nvidia.com/cuda-downloads

I prefer the .deb based download, and these examples will assume you chose that route. The file you download is a .deb package but doesn’t have a .deb extension, so renaming it to have a .deb at the end is helpful. Then you install it with:

sudo dpkg -i package-name.deb

If you are prompted about installing a GPG key, please follow the instructions given to do so.

Now you’ll need to install the cuda package itself. To do so, run:

sudo apt-get update
sudo apt-get install cuda -y

This part can take a while, so you might want to grab a cup of coffee. Once it’s done, I recommend rebooting to ensure all modules are properly reloaded.

Next, you’ll need the Anaconda Python distribution. You can download that here:  https://www.anaconda.com/download/#linux

Grab the 64-bit version and install it like this:

sh Anaconda*.sh

(the star in the above command will ensure that the command is run regardless of the minor version)

The default install location should be fine, and in this tutorial, we’ll use it. By default, it installs to ~/anaconda3

At the end of the install, you’ll be prompted to decide if you wish to add Anaconda to your path. Answer yes here to make running the necessary commands easier. To ensure this change takes place, after the installer finishes completely, log out then log back in to your account.

More info on Installing Anaconda: https://linuxhint.com/install-anaconda-python-on-ubuntu/

Finally we’ll need to install Numba. Numba uses the LLVM compiler to compile Python to machine code. This not only enhances performance of regular Python code but also provides the glue necessary to send instructions to the GPU in binary form. To do this, run:

conda install numba

Limitations and Benefits of GPU Programming

It’s tempting to think that we can convert any Python program into a GPU-based program, dramatically accelerating its performance. However, the GPU on a video card works considerably differently than a standard CPU in a computer.

CPUs handle a lot of different inputs and outputs and have a wide assortment of instructions for dealing with these situations. They also are responsible for accessing memory, dealing with the system bus, handling protection rings, segmenting, and input/output functionality. They are extreme multitaskers with no specific focus.

GPUs, on the other hand, are built to process simple functions at blindingly fast speed. To accomplish this, they expect a more uniform state of input and output, and they specialize in scalar functions. A scalar function takes one or more inputs but returns only a single output. These values must be types pre-defined by numpy.
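
To make the idea of a scalar function concrete, here is a minimal CPU-only sketch in plain Python. The names add_scalar and apply_elementwise are made up for illustration and are not part of Numba or CUDA; a GPU would evaluate many of these element pairs at the same time rather than looping over them.

```python
# Illustrative sketch only: plain Python, no GPU involved.
# add_scalar and apply_elementwise are hypothetical names.


def add_scalar(a, b):
    """A scalar function: takes two values and returns exactly one."""
    return a + b


def apply_elementwise(func, xs, ys):
    """Apply a scalar function to each pair of elements in turn.

    A GPU would process many pairs in parallel; this loop handles
    them one after another.
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [func(x, y) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    print(apply_elementwise(add_scalar, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The point of the split is that the scalar function knows nothing about arrays. That is what lets a library decide how to spread the work across the hardware.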

Example Code

In this example, we’ll create a simple function that takes two arrays of values, adds them together element by element, and returns the resulting array. To demonstrate the power of the GPU, we’ll run one version of this function on the CPU and one on the GPU and display the times. The documented code is below:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# This should be a substantially high value. On my test machine, this took
# 33 seconds to run via the CPU and just over 3 seconds on the GPU.
NUM_ELEMENTS = 100000000

# This is the CPU version.
def vector_add_cpu(a, b):
  c = np.zeros(NUM_ELEMENTS, dtype=np.float32)
  for i in range(NUM_ELEMENTS):
    c[i] = a[i] + b[i]
  return c

# This is the GPU version. Note the @vectorize decorator. This tells
# numba to turn this into a GPU vectorized function.
@vectorize(["float32(float32, float32)"], target='cuda')
def vector_add_gpu(a, b):
  return a + b

def main():
  a_source = np.ones(NUM_ELEMENTS, dtype=np.float32)
  b_source = np.ones(NUM_ELEMENTS, dtype=np.float32)

  # Time the CPU function
  start = timer()
  vector_add_cpu(a_source, b_source)
  vector_add_cpu_time = timer() - start

  # Time the GPU function
  start = timer()
  vector_add_gpu(a_source, b_source)
  vector_add_gpu_time = timer() - start

  # Report times
  print("CPU function took %f seconds." % vector_add_cpu_time)
  print("GPU function took %f seconds." % vector_add_gpu_time)

  return 0

if __name__ == "__main__":
  main()

To run the example, type:

python gpu-example.py

NOTE: If you run into issues when running your program, try using “conda install accelerate”.

As you can see, the CPU version runs considerably slower.

If not, then your iterations are too small. Adjust the NUM_ELEMENTS to a larger value (on mine, the breakeven mark seemed to be around 100 million). This is because the setup of the GPU takes a small but noticeable amount of time, so to make the operation worth it, a higher workload is needed. Once you raise it above the threshold for your machine, you’ll notice substantial performance improvements of the GPU version over the CPU version.
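
The break-even idea can be sketched with a toy cost model: the GPU pays a fixed setup cost plus a small per-element cost, while the CPU has no setup cost but a larger per-element cost. Every number and function name below is an illustrative assumption, not a measurement from the test machine described above.

```python
# Toy cost model; all timings are made-up assumptions for illustration.


def cpu_time(n, per_element):
    """Estimated CPU run time for n elements (no setup cost)."""
    return n * per_element


def gpu_time(n, setup, per_element):
    """Estimated GPU run time: fixed setup plus per-element work."""
    return setup + n * per_element


def break_even_elements(setup, cpu_per_element, gpu_per_element):
    """Workload size at which both estimates are equal.

    setup + n * gpu = n * cpu   =>   n = setup / (cpu - gpu)
    """
    if cpu_per_element <= gpu_per_element:
        raise ValueError("the GPU never catches up under this model")
    return setup / (cpu_per_element - gpu_per_element)


if __name__ == "__main__":
    # Hypothetical figures: 0.5 s of setup, 300 ns vs 50 ns per element.
    n = break_even_elements(0.5, 3e-7, 5e-8)
    print("break-even at roughly %d elements" % round(n))
```

Below the break-even size the setup cost dominates and the CPU wins. Above it, the GPU's lower per-element cost takes over. Real measurements on your own machine are the only way to find the actual threshold.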

Conclusion

I hope you’ve enjoyed our basic introduction to GPU programming with Python. Though the example above is trivial, it provides the framework you need to take your ideas further using the power of your GPU.

Debian Static IP Configuration https://linuxhint.com/debian-static-ip-configuration/ Sat, 11 Nov 2017 11:19:04 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=19932

How to Set Up a Static IP Address on Debian 9

In this guide, I’ll show you how to set a static IP in your Debian desktop or server installation. By default, the Debian installer will try to fetch an IP via DHCP. In most cases this is desirable because it’s simple and it works with no configuration, especially in a home setting.

However, if you intend for your computer to be a server or want to predictably address it via a fixed IP address, assigning it a static IP is your best choice. Before we get started, you’ll need to make sure that the IP you want to give your machine is unique and not already in use on your network.

Determining What IP to Use

If you are setting up a Debian server in a data center environment, your data center will give you the information to use. If you’re allocating addresses yourself, check your router and other computers to see what network configuration values they are using.

For example, if your router is addressable via the IP address 192.168.1.254 and uses a netmask of 255.255.255.0, then valid IPs would likely be 192.168.1.1 to 192.168.1.253. That said, you’ll want to check the other IP addresses already allocated to ensure the one you want to use is free.

The quickest and easiest way to obtain netmask and gateway settings would be to look at other machines on the network. If they are working correctly, you can generally trust those settings, especially if they use DHCP to automatically connect to the network. On Windows machines, the ipconfig command on the command line will show you the details of that machine’s network settings. For macOS and Linux machines, the ifconfig or ip addr show command will do the same.
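
As a rough sanity check on the values you collect, Python’s standard ipaddress module can work out the usable host range for a network and confirm that an address and a gateway sit in the same network. This is only a sketch: the helper names usable_range and same_network are made up, and the addresses are the example values used in this guide.

```python
import ipaddress


def usable_range(network_cidr):
    """Return the first and last usable host addresses of an IPv4 network.

    Assumes an ordinary network with a prefix of /30 or shorter, where the
    network and broadcast addresses are not assignable to hosts.
    """
    net = ipaddress.ip_network(network_cidr, strict=False)
    return net.network_address + 1, net.broadcast_address - 1


def same_network(address, gateway, netmask):
    """Check that the gateway lies inside the network implied by address/netmask."""
    iface = ipaddress.ip_interface("%s/%s" % (address, netmask))
    return ipaddress.ip_address(gateway) in iface.network


if __name__ == "__main__":
    first, last = usable_range("192.168.1.0/24")
    print("usable hosts: %s to %s" % (first, last))
    print(same_network("192.168.1.200", "192.168.1.254", "255.255.255.0"))
```

The range reported here still includes the router’s own address, so it has to be excluded by hand. The check also cannot tell you whether another machine is already using the address you picked.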

Console Method After Install

This is the preferred (and probably only) method to use for a Debian machine intended to be used as a server, especially if it is in a data center.

Via SSH or a local terminal, we need to become root. Either log in as root or become root with the su command. You may be used to using sudo to run root commands, but sudo isn’t configured by default on a fresh Debian install, so for this tutorial we’ll use su.

Once you are root, run:

ip link show

This will show a list of all of your network devices. Once you know the name of your network card, run:

nano /etc/network/interfaces

If you prefer to use vim, emacs, or another editor, replace nano with the name of the editor of your choice.

Once you’re in the file, add the following configuration lines to set your static IP. Please note that you must change eth0 to the name of the network device we discovered earlier, and change the IP address, netmask, and gateway to the values determined previously.

auto eth0
  iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    gateway 192.168.1.254

Save the file (in nano, press CTRL+X, press Y when asked to save, then press Enter to confirm the file name). You can then either run (as root):

systemctl restart networking.service

Or simply reboot your machine to activate the new IP.

If you want to add multiple IP addresses to the same interface, or perhaps add an IP alias to the same interface, use eth0:0, eth0:1, etc. (replacing eth0 with your device name), incrementing the value after the colon, for each additional IP address you want to add.
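
As a sketch of what those alias entries might look like, the small Python helper below renders one stanza per extra address, following the layout of the eth0 example above. The function name alias_stanzas and the extra addresses are hypothetical. It also assumes the default gateway is defined once on the primary interface, so no gateway line is written for the aliases.

```python
# Hypothetical helper: prints alias stanzas you could paste into
# /etc/network/interfaces. The addresses used here are examples only.


def alias_stanzas(device, addresses, netmask):
    """Render one stanza per extra address, named device:0, device:1, ..."""
    blocks = []
    for index, address in enumerate(addresses):
        name = "%s:%d" % (device, index)
        blocks.append(
            "auto %s\n"
            "iface %s inet static\n"
            "    address %s\n"
            "    netmask %s\n" % (name, name, address, netmask)
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(alias_stanzas("eth0", ["192.168.1.201", "192.168.1.202"],
                        "255.255.255.0"))
```

Review the generated text before adding it to the file, and remember to replace eth0 with your own device name.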

Graphical Method After Install

If your system is already installed and you don’t have a graphical desktop configured, you can use the console method as previously described. However, if you have a graphical desktop enabled, you can use the NetworkManager configuration screens. The screenshots and instructions are for the MATE desktop, but with all desktop environments the instructions will be very similar.

In MATE, click System -> Preferences -> Internet and Network -> Network Connections:

Then select the appropriate connection (most likely Wired Connection 1) and click Edit:

On the next screen, click IPv4 Settings then click Add, then enter the IP, netmask, and default gateway. In the example below I have added our example values as previously used, so be sure to change the values to match your requirements.

When you’re done, click Save, and the static IP will be added.

You may also use this method to add multiple IP addresses to the same interface, or to define IPv6 or any other required settings.

Graphical Method During Install

This method is best for home or small office installations where the Debian desktop interface is used. During the network detection phase of the installer, you can click cancel, which will take you to a screen like this:

Clicking continue will lead to the next screen where you’ll have an opportunity to configure the network manually. Select that option then click Continue again.

After this, you’ll be prompted for network information (IP address, netmask, and so on).

If you miss the opportunity to click cancel during network detection, you can click “Go Back” and select “Configure the network” to achieve the same result.

Text Method During Install

The text method during install is identical to the graphical method shown above except the screen will have text-driven menus. The prompts and steps are otherwise the same.

OProfile Tutorial https://linuxhint.com/oprofile-tutorial/ Fri, 03 Nov 2017 21:56:02 +0000 https://linuxhint-com.zk153f8d-liquidwebsites.com/?p=19860

OProfile is a performance profiler for Linux. In this article, we’ll explore what it does, how to install and configure it, and how to put the data it assembles to use.

You might wonder why you would need a tool like this as there are plenty of good performance analysis tools available by default on most Linux distributions. Every install includes tools like top and vmstat, and tracing utilities like strace are usually just an apt-get away. Where does OProfile fit in?

The tools previously mentioned are excellent at obtaining a snapshot of a Linux system in real time. Tools like top or htop show all running processes, their current memory consumption, and processor usage. But a snapshot makes it hard to determine which processes and system calls consume the most resources over time.

That’s where OProfile comes in. This utility suite not only performs its analysis at a deeper level, but also saves data and allows you to produce performance reports that offer a wealth of information that can help you debug even the most elusive performance issue.

OProfile is not just for developers. In a desktop environment, OProfile can help you track down CPU-intensive background tasks or I/O calls that are slowing you down and aren’t immediately evident. On a busy system with shifting process priorities, this data can be hard to collect, let alone interpret. The multi-process nature of a server environment makes this task even more difficult with traditional tools.

That said, developers will no doubt get the most use out of OProfile. The information I’ll present will cover the basics of both use cases so you can dig into the performance metrics of any Linux program.

Installation

There is a very important note that must be made before diving deeply into OProfile: you may not be able to use it in a virtualized environment. If you are running Linux inside a VirtualBox, VMware, or similar VM environment, OProfile may not be able to access the necessary performance counters to collect data. Furthermore, even if you are able to use it in a virtual environment, precise timing may be somewhat distorted based on host system load, so please keep this in mind if you aren’t running on native hardware.

Several Linux distributions have OProfile in their package management systems, making installation easy:

  • Debian / Ubuntu / Linux Mint – sudo apt-get install oprofile
  • Fedora / CentOS – sudo yum install oprofile
  • Arch – sudo pacman -S oprofile

A Simple Example

Once the program is installed, let’s get our feet wet with a trivial yet useful example. The program “ls” is a command you probably use all the time. It simply displays a list of files and folders in the current directory. Let’s profile it:

sudo operf ls

oprofile ls screenshot

You’ll see something similar to the above screenshot. Once the profiler is finished, it will announce “Profiling done.” It has saved its data in a folder called oprofile_data, which can be used to generate a report.

Running the command opreport (without sudo in this case) produces a report similar to this:

oprofile screen shot 2

In this example, the default report shows the number of samples when the CPU was not in a HALT state (in other words, was actively doing something). Kallsyms provides symbol lookup used by the profiler, and the ld.so and libc.so are part of the glibc package, a common library linked into nearly all Linux executables that provides basic functionality developers can use to keep from reinventing the wheel and provide a generic level of compatibility between various systems. You can see that the actual program ls had far less non-HALT time – the bulk of the heavy lifting was done by the standard libraries.

Once we’re done with the report, it’s a good idea to either remove the data folder or save it for future analysis. In this example, we’ll just remove it since we’re running sample exercises. Since we ran the command with sudo, we must remove the folder with sudo. Be careful!

sudo rm -Rf oprofile_data

A More Complex Example

In this next example, we’ll run a program that actually does something more complex than just list files in the current folder. Let’s download WordPress with wget.

sudo operf wget http://wordpress.org/latest.tar.gz

After this example, we can generate a report with the “opreport” command:

oprofile screen shot 3

You’ll see a lot more activity after this one. The wget command had to do a lot of work behind the scenes to obtain the latest copy of WordPress. Though it’s not necessary to examine each item, the main points of interest are:

  • ath9k and ath9k_hw – These modules are responsible for the WiFi connection on this laptop.
  • mac80211 and cfg80211 – These kernel modules implement the wireless networking stack and were instrumental in performing the network connection required by wget.
  • libnss_dns and libresolv – These libraries were used in resolving the wordpress.org domain into an IP address so wget could make an HTTP connection.
  • libcrypto and libssl – These libraries are part of OpenSSL. They performed the work to decrypt the data received from the https:// URL. Note that even though we specified a URL with http://, the WordPress server redirected us to https:// and wget followed this redirect.
  • libpthread – This library performs threading operations which allow programs to do multiple things at once. In this case, wget started a thread to download the file and also provide an ASCII-based download progress indicator on the screen.

This kind of data can provide a wealth of information for a developer. But how is this important to a system administrator of a server or a power user on a desktop? By knowing which parts of a program are taking the most CPU time, we can find out what needs optimization or where the slowdown is occurring, allowing us to make better decisions about how to optimize our system.

In this example, the most CPU time was taken by the crypto/SSL routines. This is understandable because cryptography is a time consuming task. Had the wordpress.org website not redirected us to https:// this library would not have been used, saving us CPU time. The network layer would still have been used, but using a wired connection instead of a wireless connection would likely have been less taxing. Disabling the progress indicator on the wget program (via the -nv switch) would have saved CPU time in displaying download progress.
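
A sampling profiler reports, in essence, how many samples landed in each binary or library, and each entry’s share is its sample count divided by the total. The sketch below shows that arithmetic in Python. The function name sample_shares and the counts are made up for illustration. They are not taken from the report above and do not mirror opreport’s actual output format.

```python
# Illustrative arithmetic only; the sample counts are invented.


def sample_shares(samples_by_image):
    """Return (name, samples, percent) rows, largest share first."""
    total = sum(samples_by_image.values())
    if total == 0:
        return []
    rows = [
        (name, count, 100.0 * count / total)
        for name, count in samples_by_image.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    hypothetical = {"crypto library": 500, "kernel": 300, "wget": 200}
    for name, count, percent in sample_shares(hypothetical):
        print("%-16s %6d %6.2f%%" % (name, count, percent))
```

Reading a report this way makes it easier to decide where optimization effort is likely to pay off: the entries near the top are the ones worth investigating first.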

Digging Into Symbols

Even though the default report provides valuable and useful information, we can dig further. By running this:

opreport --demangle=smart --symbols

We can find out exactly how much CPU time functions in the libraries consumed:

oprofile screen shot 4

In this example, I used the wget command above but used an http:// URL (one that doesn’t redirect to https://) and you can see the absence of OpenSSL libraries in the trace. However, instead of just the library name, we now have a full listing of the functions involved. As you can see, the network layer consumed most of the CPU non-HALT time.

Taking it to the Next Level

In the previous examples we’ve used OProfile to take a look at one program at a time. You can examine your entire system at once using the --system-wide switch:

sudo operf --system-wide

Using this technique, OProfile will gather statistics in the same manner and stop when you hit CTRL+C. Afterwards, you can run the opreport command. Since the profiler will likely generate much more data (especially on a desktop or busy server), it helps to redirect the report to a file:

opreport > report.txt

The report is now viewable in a file called report.txt.
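
If you would rather profile for a fixed amount of time than press CTRL+C by hand, the two steps could be scripted along these lines. This is a sketch under stated assumptions: the script itself is run as root (so operf is started without sudo and can be signalled), operf and opreport are on the PATH, and the helper names are made up. It sends the same interrupt signal that CTRL+C would.

```python
import signal
import subprocess
import time


def operf_command():
    """Command line for system-wide profiling, as used above (minus sudo)."""
    return ["operf", "--system-wide"]


def opreport_command():
    """Command line for the default report."""
    return ["opreport"]


def profile_system(seconds, report_path="report.txt"):
    """Profile the whole system for a while, then write the report to a file.

    Assumes this script is already running as root.
    """
    profiler = subprocess.Popen(operf_command())
    try:
        time.sleep(seconds)
    finally:
        # Equivalent to pressing CTRL+C in the terminal running operf.
        profiler.send_signal(signal.SIGINT)
        profiler.wait()
    with open(report_path, "w") as report:
        subprocess.run(opreport_command(), stdout=report, check=True)
```

Calling profile_system(30), for example, would collect roughly thirty seconds of data before writing report.txt in the current directory.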

Low Overhead

It is important to note that while OProfile shouldn’t interfere with the operation of your programs, it will create a bit of overhead and thus slow down execution. In our simple examples above it didn’t create a problem, but on a program with long execution and extensive function calls you will likely notice a difference. Because of this, I wouldn’t recommend using this program in a production server environment unless faced with a critical performance problem that must be solved with live usage. Even then, I would use it just long enough to find the issue.

Conclusion

OProfile is a powerful performance profiling tool. It taps into the lowest level available in Linux to obtain performance counters and metrics that give you valuable information about your programs.

Gone are the days of guesswork in performance debugging – you now have the power to know precisely what your system is doing and how to improve it. By studying the reports generated by OProfile, you can make informed, data-driven decisions on optimizing your system.
