
Tuesday, February 14, 2017

What is Paging, Swapping, Page-In & Page-Out

A computer has a certain amount of RAM, its physical "memory". It can simulate more RAM by allowing extra data to be saved to the hard disk; this is known as virtual memory.

To do this, it breaks your memory space up into "pages". Applications that need access to data in memory refer to that data by page. If an application references a page that is already in RAM, the access is satisfied immediately. If it references a page that is currently stored on the hard disk and has to be read back into RAM, a "page-in" occurs; when the system writes a page from RAM out to the disk to free up space, a "page-out" occurs.

A page-in slows the operation of the system down because the data has to be read from the hard disk into RAM first, rather than straight from RAM. A hard disk takes roughly 300 times as long to transfer a page of data, which adds up to slow performance.

If page-outs exceed page-ins, you definitely don't have enough RAM. Ideally, page-outs should be less than 20% of the number of page-ins (the fewer page-outs, the faster your machine is performing).

Adding more RAM or reducing the number of open applications is the only way to reduce page-outs. Freeing up memory by working with fewer and smaller files and apps may help, but more RAM is the only reasonable long-term solution.


Paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory.

To get the page size in Linux:

# getconf PAGESIZE
4096

What is Swapping?

This scheme involves every page in the system having an age which changes as the page is accessed. The more that a page is accessed, the younger it is; the less that it is accessed the older it becomes. Old pages are good candidates for swapping.

Swapping is the process of moving all the segments belonging to a process between main memory and a secondary storage device. Swapping occurs under heavier workloads: the operating system kernel moves all the memory segments belonging to a process into an area called the swap area. When selecting a process to swap out, the operating system picks one that will not become active for a while. When main memory again has enough space to hold the process, it is transferred back from the swap space so that its execution can continue.


page-out === The system's free memory has dropped below a threshold ("lotsfree"), and the vhand daemon uses an LFU (least frequently used) algorithm to move some unused or least-used pages to the swap area.

page-in === A running process requested a page that is not currently in memory (a page fault), and the vhand daemon is bringing its pages into memory.

swap-out === The system is thrashing, and the swapper daemon has deactivated a process; its memory pages are moved into the swap area.

swap-in === A deactivated process is back to work, and its pages are being brought into memory.
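On Linux you can watch paging and swapping activity live with vmstat; the si and so columns report swap-ins and swap-outs in KB per second (the exact column layout can vary between procps versions):

```shell
# Sample memory/paging activity once per second, 3 samples.
# si = KB/s swapped in from disk, so = KB/s swapped out to disk;
# sustained non-zero "so" values suggest memory pressure.
vmstat 1 3
```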

What is Swappiness


The swappiness parameter controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Because disks are much slower than RAM, this can lead to slower response times for the system and applications if processes are too aggressively moved out of memory.

swappiness can have a value between 0 and 100:

swappiness=0 tells the kernel to avoid swapping processes out of physical memory for as long as possible

swappiness=100 tells the kernel to aggressively swap processes out of physical memory and move them to swap cache

The default setting in Ubuntu is swappiness=60. Reducing the default value will probably improve overall performance for a typical Ubuntu desktop installation. A value of swappiness=10 is recommended, but feel free to experiment. Note: Ubuntu server installations have different performance requirements from desktop systems, and there the default value of 60 is likely more suitable.

To check the swappiness value:

cat /proc/sys/vm/swappiness
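To change the value at runtime and make it persist across reboots, something like the following should work (root required; the value 10 here is just the desktop recommendation mentioned above):

```shell
# Set swappiness for the running kernel (takes effect immediately)
sudo sysctl vm.swappiness=10

# Persist the setting across reboots
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```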


What is the difference between Paging and Swapping?

In paging, blocks of equal size (called pages) are transferred between main memory and a secondary storage device. Since paging can move individual pages (which may be only part of a process's address space), it is more flexible than swapping. And because paging moves only pages (unlike swapping, which moves a whole process), it allows more processes to reside in main memory at the same time than a swapping system does.

In swapping, all the segments belonging to a process are moved back and forth between main memory and a secondary storage device. Swapping is more suitable under heavier workloads.

What is a Page Fault

When the page (data) requested by a program is not available in memory, it is called a page fault. Despite the name, this is usually not an error; the operating system simply loads the missing page.

Though the term "page fault" sounds like an error, page faults are common and are part of the normal way computers handle virtual memory. In programming terms, a page fault generates an "exception," which notifies the operating system that it must retrieve the memory blocks or "pages" from virtual memory in order for the program to continue. Once the data is moved into physical memory, the program continues as normal. This process takes place in the background and usually goes unnoticed by the user.

Most page faults are handled without any problems. However, an invalid page fault may cause a program to hang or crash. This type of page fault may occur when a program tries to access a memory address that does not exist. Some programs can handle these types of errors by finding a new memory address or relocating the data. However, if the program cannot handle the invalid page fault, it will get passed to the operating system, which may terminate the process. This can cause the program to unexpectedly quit.

While page faults are common when working with virtual memory, each page fault requires transferring data from secondary memory to primary memory. This process may only take a few milliseconds, but that can still be several thousand times slower than accessing data directly from memory. Therefore, installing more system memory can increase your computer's performance, since it will need to access virtual memory less often.
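On Linux you can see how many faults a given process has taken so far; procps ps exposes minor faults (serviced from memory) and major faults (requiring disk I/O) as output fields:

```shell
# Show the current shell's accumulated page faults:
# min_flt = minor faults (no disk access), maj_flt = major faults (page read from disk)
ps -o pid,min_flt,maj_flt,comm -p $$
```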

When your program reads or writes a memory location, it uses a virtual address to refer to that location in memory. The system then translates that virtual address into a physical address, and the data is written to (or read from) that physical address. But your program has no idea that this is happening; the virtual-to-physical address translation all happens "under the hood". The point is, your program doesn't need to know about it. The system just maintains a happy illusion for the program to live in.

There are a number of reasons why you'd want such a system; nearly all memory systems today use virtual memory, except in certain specialized situations. One major advantage of virtual memory is that you can use it to "pretend" that your machine has more RAM than it actually has installed; this usually happens through a mechanism called "paging" (sometimes called "swapping").

The idea is that the physical RAM only holds a certain number of “pages” of memory. Typically a “page” is about 4096 bytes, although it varies from system to system. In order to “pretend” that there is more RAM than physically possible, some of the pages are actually in the RAM, and some of them are stored on the hard drive.

The virtual memory system uses something called a "page table" to map virtual addresses to physical addresses. Since our machine may have less RAM than our program thinks it has, it's possible to have more virtual addresses than physical addresses. That means not every virtual address in a page table will have a valid corresponding physical address (i.e. not every virtual address will have a valid entry in the page table). If a virtual address has no valid entry in the page table, then any attempt by your program to access that virtual address will cause a page fault.

So what happens when we have a page fault? Well, when that happens, your OS invokes something called a page fault handler. As you might imagine, it's a piece of code that handles page faults. Usually the page fault handler will do the following:

1) Figure out which page the virtual address is supposed to map to, and where that page is located on the hard drive.
2) Choose an existing page in physical RAM that we (probably) aren't currently using. Write that page back to the hard drive, and evict it from RAM (i.e. kick it out) to make room for the new page.
3) Load the new page into RAM from the hard drive.
4) Update the page table, so that the virtual address that caused the page fault now has a valid entry. Likewise, clear the entry for the page that was just evicted, so that virtual addresses corresponding to the evicted page are no longer valid.
5) Now that the correct page is loaded in RAM and the page table is up to date, return control to the program that was running before all these shenanigans occurred, and retry the memory access instruction that initially caused the page fault. If all goes well, this second time around the faulting instruction will work correctly, now that the correct page has been loaded into physical RAM.

Saturday, February 11, 2017

Linux Process States

List of Topics

1) List of Linux process states
2) Special symbol process states
3) Sample PS Command output


1) List of Linux process states

In Linux we have the below process states, which we can see as part of the top command output. Below are the main process states and the symbols used to display them in top (and in the STAT column of ps):

D - Process in uninterruptible sleep
R - Running process
S - Interruptible sleep (waiting for an event to complete)
T - Stopped, either by a job control signal or because it is being traced.
W - paging (not valid since the 2.6.xx kernel)
X - dead (should never be seen)
Z - Defunct ("zombie") process, terminated but not reaped by its parent.

2) Special symbol process states

Apart from the regular process states, we can also see the below special characters in the STAT field. Below are the explanations for these modifier flags:

< - high-priority (not nice to other users)
N - low-priority (nice to other users)
L - has pages locked into memory (for real-time and custom IO)
s - is a session leader
l - is multi-threaded 
+ - is in the foreground process group 


3) Sample PS Command output

The state of a process can be identified from the STAT field of the ps command output. Below is sample output of the ps command; in the STAT field we can see the characters explained in the previous sections.

[test@XXXXX ~]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10372   572 ?        Ss    2016   0:10 init [3]
root         2  0.0  0.0      0     0 ?        S<    2016   0:11 [migration/0]
root         3  0.0  0.0      0     0 ?        SN    2016   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<    2016   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<    2016   0:10 [migration/1]
root         6  0.0  0.0      0     0 ?        SN    2016   0:01 [ksoftirqd/1]
root         7  0.0  0.0      0     0 ?        S<    2016   0:00 [watchdog/1]
root         8  0.0  0.0      0     0 ?        S<    2016   0:01 [migration/2]
root         9  0.0  0.0      0     0 ?        SN    2016   0:01 [ksoftirqd/2]
root        10  0.0  0.0      0     0 ?        S<    2016   0:00 [watchdog/2]
root        11  0.0  0.0      0     0 ?        S<    2016   0:01 [migration/3]
root        12  0.0  0.0      0     0 ?        SN    2016   0:01 [ksoftirqd/3]
root        13  0.0  0.0      0     0 ?        S<    2016   0:00 [watchdog/3]
root        14  0.0  0.0      0     0 ?        S<    2016   0:01 [migration/4]
root        15  0.0  0.0      0     0 ?        SN    2016   0:01 [ksoftirqd/4]

NI is the nice value, which is a user-space concept. PR is the process's actual priority, as viewed by the Linux kernel. top, by default, lists both columns. I am curious as to what the difference is. I checked the man pages and cannot figure it out:

Priority:

PR -- Priority
The priority of the task.

Nice value:

NI -- Nice value
The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task's dispatchability.

I understand that Nice value is related to the Kernel's CPU scheduler queue; then what does Priority indicate? Something regarding I/O perhaps?


The difference is that PR is the real priority of the process at the moment, inside the kernel, while NI is just a hint to the kernel about what priority the process should have.

In most cases the PR value can be computed as: PR = 20 + NI. Thus a process with niceness 3 has priority 23 (20 + 3), and a process with niceness -7 has priority 13 (20 - 7). You can check the first by running the command nice -n 3 top: it will show that the top process has NI 3 and PR 23. But to run nice -n -7 top on most Linux systems you need root privileges, because a lower PR value actually means a higher priority: a process with PR 13 has higher priority than processes with the standard PR 20. That's why you need to be root. The minimum niceness allowed for non-root processes can be configured in /etc/security/limits.conf.
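You can also see the PR = 20 + NI relationship without opening top, assuming a procps-style ps (its "priority" output field shows the same number that top displays as PR):

```shell
# Start a background process with niceness 3, then read its NI and priority back.
nice -n 3 sleep 30 &
ps -o pid,ni,priority,comm -p $!   # NI should read 3; with PR = 20 + NI, priority should read 23
kill $!                            # clean up the helper process
```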


Theoretically the kernel can change the PR value (but not NI) by itself. For example, it may reduce the priority of a process that consumes too much CPU, or increase the priority of a process that has had no chance to run for a long time because of other, higher-priority processes. In these cases the PR value is changed by the kernel while NI remains the same, so the formula PR = 20 + NI no longer holds. The NI value can therefore be interpreted as a hint about what priority the process should have, while the kernel can choose the real priority (PR value) on its own depending on the situation. Usually, though, the formula PR = 20 + NI is correct.

The nice value is a "global" mechanism, whereas the priority (PR) is what is relevant to the task switcher right now.


Process

In computing, a process is an instance of a computer program that is being executed. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.

Parent Process

In the operating system Unix, every process except process 0 (the swapper) is created when another process executes the fork() system call. The process that invoked fork is the parent process and the newly-created process is the child process. Every process (except process 0) has one parent process, but can have many child processes.
The operating system kernel identifies each process by its process identifier. Process 0 is a special process that is created when the system boots; after forking a child process (process 1), process 0 becomes the swapper process (sometimes also known as the “idle task”). Process 1, known as init, is the ancestor of every other process in the system.

Child process

A child process in computing is a process created by another process (the parent process).
A child process inherits most of its attributes, such as open files, from its parent. In UNIX, a child process is in fact created (using fork) as a copy of the parent. The child process can then overlay itself with a different program (using exec) as required.

Each process may create many child processes but will have at most one parent process; if a process does not have a parent this usually indicates that it was created directly by the kernel. In some systems, including UNIX based systems such as Linux, the very first process (called init) is started by the kernel at booting time and never terminates (see Linux startup process); other parentless processes may be launched to carry out various daemon tasks in userspace. Another way for a process to end up without a parent is if its parent dies, leaving an orphan process; but in this case it will shortly be adopted by init.

System call fork() is used to create processes. The purpose of fork() is to create a new process, which becomes the child process of the caller.
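A quick way to see the parent/child relationship from a shell (bash-specific: $BASHPID gives a subshell's own PID, while $$ keeps the original shell's PID):

```shell
# The current shell and its parent
echo "shell pid: $$  parent pid: $PPID"

# A subshell is created with fork(); inside it, $$ still shows the parent
# shell's PID, but $BASHPID (bash-specific) shows the forked child's own PID.
( echo "child pid: $BASHPID  its parent: $$" )
```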

Orphan Process

An orphan process is a computer process whose parent process has finished or terminated, though it remains running itself.

In a Unix-like operating system any orphaned process will be immediately adopted by the special init system process. This operation is called re-parenting and occurs automatically.
Even though technically the process has the init process as its parent, it is still called an orphan process since the process that originally created it no longer exists.
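Re-parenting can be observed from a shell: start a child from a short-lived parent, then check the child's PPID after the parent exits. (On modern systemd systems the orphan may be adopted by a user-session "subreaper" process instead of PID 1, so the new PPID is not always 1.)

```shell
# The inner bash starts a sleep in the background, prints its PID, and exits;
# the sleep is now an orphan and gets re-parented automatically.
cpid=$(bash -c 'sleep 10 & echo $!')
sleep 1
ps -o pid,ppid,comm -p "$cpid"   # PPID is now 1 (or a subreaper), not the dead bash
kill "$cpid"
```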

Daemon

A daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. Typically daemon names end with the letter d: for example, syslogd is the daemon that implements the system logging facility and sshd is a daemon that services incoming SSH connections.

A daemon is usually created by a process forking a child process and then immediately exiting, thus causing init to adopt the child process. A daemon is a process that has been orphaned intentionally.

What is Zombie Process

Zombie process is a defunct process that has completed its execution but still has an entry in the process table. This entry is still needed to allow the parent process to read its child’s exit status. Also, unlike normal processes, the kill command has no effect on a zombie process.

When a process ends, all of the memory and resources associated with it are deallocated so they can be used by other processes. However, the process’s entry in the process table remains. The parent can read the child’s exit status by executing the wait system call, whereupon the zombie is removed. The wait call may be executed in sequential code, but it is commonly executed in a handler for the SIGCHLD signal, which the parent receives whenever a child has died.
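You can create a short-lived zombie from a shell to see this: children survive exec(), so if a process backgrounds a child and then execs into a program that never calls wait, the exited child lingers as a zombie until its parent dies:

```shell
# The inner bash forks 'sleep 0.1' and then replaces itself with 'sleep 3' via
# exec; 'sleep 3' never calls wait(), so its exited child becomes a zombie.
bash -c 'sleep 0.1 & exec sleep 3' &
sleep 1
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # zombies show STAT "Z" (<defunct>)
wait
```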

Difference Between Zombie and Orphan


A zombie process is not the same as an orphan process. An orphan process is a process that is still executing, but whose parent has died. They do not become zombie processes; instead, they are adopted by init (process ID 1), which waits on its children.

About Real time, User time and SYS time

Real, User and Sys process CPU time statistics in an operating system

One of these things is not like the other. Real refers to actual elapsed time; User and Sys refer to CPU time used only by the process.

1) What is Real Time

Real time is wall clock time - it's the time elapsed from the start to the finish of the entire execution.

2) What is User Time

It is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only the actual CPU time used in executing the process; other processes, and time the process spends blocked, do not count toward this figure. If you want to know more about CPU time, there is a separate post where CPU time is explained in detail.

3) What is System Time

System time is the amount of CPU time spent in the kernel within the process. This means CPU time spent executing system calls within the kernel, as opposed to library code, which still runs in user space. Like user time, this is only CPU time used by the process.

User+Sys will tell you how much actual CPU time your process used. Note that this is summed across all CPUs, so if the process has multiple threads (and is running on a computer with more than one processor) it can exceed the wall clock time reported by Real.

The rule of thumb is:

real < user: The process is CPU bound and takes advantage of parallel execution on multiple cores/CPUs.

real ≈ user: The process is CPU bound and takes no advantage of parallel execution.

real > user: The process is I/O bound. Execution on multiple cores would be of little advantage.

http://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time
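The standard time command shows all three numbers. A process like sleep spends its life blocked, so real is about a second while user and sys stay near zero (real > user, the I/O-bound pattern from the rules above):

```shell
# A process that only waits: elapsed (real) time passes, but almost no CPU is used.
time sleep 1
# typical output (exact format varies by shell):
#   real    0m1.0xs   (wall clock)
#   user    0m0.0xs   (CPU in user mode)
#   sys     0m0.0xs   (CPU in kernel mode)
```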


A program is CPU bound if it would go faster if the CPU were faster, i.e. it spends the majority of its time simply using the CPU (doing calculations). A program that computes new digits of π will typically be CPU-bound, it's just crunching numbers.

A program is I/O bound if it would go faster if the I/O subsystem was faster. Which exact I/O system is meant can vary; I typically associate it with disk. A program that looks through a huge file for some data will often be I/O bound, since the bottleneck is then the reading of the data from disk.

CPU Bound means the rate at which process progresses is limited by the speed of the CPU. A task that performs calculations on a small set of numbers, for example multiplying small matrices, is likely to be CPU bound.

I/O Bound means the rate at which a process progresses is limited by the speed of the I/O subsystem. A task that processes data from disk, for example, counting the number of lines in a file is likely to be I/O bound.

Memory bound means the rate at which a process progresses is limited by the amount of memory available and the speed of that memory access. A task that processes large amounts of in-memory data, for example multiplying large matrices, is likely to be memory bound.

Cache bound means the rate at which a process progresses is limited by the amount and speed of the cache available. A task that simply processes more data than fits in the cache will be cache bound.

I/O bound is slower than memory bound, which is slower than cache bound, which is slower than CPU bound.

Wednesday, February 1, 2017

How to Calculate CPU Usage and What are the States of CPU

List of Topics

1) How to Calculate CPU Usage
2) What is CPU Time
3) States of CPU

1) How to Calculate CPU Usage

Normally, in an operating system, CPU usage is calculated based on CPU time.

2) What is CPU time

CPU time is allocated in discrete time slices (ticks). For a certain number of time slices the CPU is busy; at other times it is not (which is represented by the idle process). For example, if the CPU is busy for 5 of 10 CPU slices, then 5/10 = 0.50 = 50% of the time is busy (and therefore 50% is idle).
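This tick accounting is visible in /proc/stat on Linux: the cpu line lists cumulative ticks spent in user, nice, system, idle and further states. A rough overall busy percentage can be sketched by sampling it twice (this simplified version ignores the iowait/irq/steal columns):

```shell
# Read the user/nice/system/idle tick counters, wait a second, read them again,
# and report the share of elapsed ticks that were not idle.
read -r _ u1 n1 s1 i1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat
busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
total=$(( busy + (i2 - i1) ))
[ "$total" -gt 0 ] && echo "cpu busy: $(( 100 * busy / total ))%"
```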



I will explain briefly with the help of a sample top command output:

Top Output  : Cpu(s):  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

In the above line, 0.7% is the CPU time spent in user space. CPU time is calculated based on ticks; for example, if 1 second = 100 ticks, then out of 100 ticks 0.7 are used for user programs. Similarly, 0.7% is for system processes and 98.7% is idle.

If a process shows usage of more than 100%, we need to check the number of cores and interpret the value accordingly, since the total across all cores can reach 100% times the number of cores.

CPUs operate in the GHz range (billions of cycles a second). The operating system slices that time into smaller units called ticks. They are not really 1/10 of a second: the tick rate in Windows is 10 million ticks per second, and in Linux it is sysconf(_SC_CLK_TCK) (usually 100 ticks per second).
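You can query the Linux tick rate directly; getconf exposes the same value as sysconf(_SC_CLK_TCK):

```shell
# Clock ticks per second used for process accounting (commonly 100)
getconf CLK_TCK
```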

In something like top, the busy CPU cycles are then further broken down into percentages of things like user time and system time. In top on Linux and perfmon on Windows, you will often get a display that goes over 100%; that is because the total is 100% * the_number_of_cpu_cores.

In an operating system, it is the scheduler's job to allocate these precious slices to processes, so the scheduler is what reports this.

3) States of CPU

CPU time is occupied by one of the below states of the CPU. Below are the states of the CPU with some explanation:

a) id - Idle, which means the CPU has nothing to do
b) us - Running a user space program, like a command shell or an email server
c) sy - Running the kernel, servicing interrupts or managing resources.
d) ni - shows the amount of CPU spent running user space processes that have been niced. When no processes have been niced then the number will be 0.
e) wa - Input and output operations, like reading or writing to a disk, are slow compared to the speed of a CPU. Although these operations happen very fast compared to everyday human activities, they are still slow compared to the performance of a CPU. There are times when the processor has initiated a read or write operation and then has to wait for the result, with nothing else to do; in other words, it is idle while waiting for an I/O operation to complete. The time the CPU spends in this state is shown by the wa statistic.
f) hi & si - These two statistics show how much time the processor has spent servicing interrupts. hi is for hardware interrupts, and si is for software interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. Software interrupts come from processes running on the system. A hardware interrupt will actually cause the CPU to stop what it is doing and go handle the interrupt. A software interrupt doesn't occur at the CPU level, but rather at the kernel level.
g) st - This last number only applies to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU running on a different virtual machine. Since in the real-world these virtual processors are sharing the same physical processor(s) then there will be times when the virtual machine wanted to run but the hypervisor scheduled another virtual machine instead.