
Friday, January 20, 2017

AWK Syntax and Examples

1) What is AWK

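awk is a command for processing text files line by line. It is most often used to print a particular field of a file or of a command's output. The general syntax is:

Syntax:  awk 'BEGIN {awk-commands} pattern {action} END {awk-commands}' file

The BEGIN block runs once before any input is read, the action runs for every input line that matches the optional pattern, and the END block runs once after the last line. All three parts are optional. The examples that follow show how this works in practice.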
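The three parts of the syntax can be seen in a small sketch. The names and scores are made-up sample data, fed in with printf instead of a file:

```shell
# Hypothetical sample data: a name and a score on each line.
result=$(printf 'alice 10\nbob 20\ncarol 30\n' | awk '
BEGIN { print "name score" }             # runs once, before any input is read
      { total += $2; print $1, $2 }      # runs for every input line
END   { print "total", total }           # runs once, after the last line
')
echo "$result"
```

The last line of the output should be "total 60", the sum of the second field over all three input lines.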

2) Print function

As mentioned above, awk can print a particular field of a file or of a command's output.

eg 2: Print the first field of a file, using a space as the field separator.

# cat /etc/services | awk -F" " '{print $1}' 

This prints the first field of every line of the file /etc/services. Here -F" " sets the field separator to a space. Other separators can be given the same way, for example -F"," (comma as the field separator).

Note : If you don't specify a field separator, awk splits fields on whitespace, treating any run of spaces and tabs as a single separator.
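A short sketch of the difference between the default separator and an explicit one. The input strings are invented for illustration:

```shell
# Default separator: a run of spaces and tabs counts as one separator.
default_out=$(printf 'one   two\tthree\n' | awk '{ print $2 }')

# Explicit separator: -F, splits on commas only, so spaces stay inside the field.
comma_out=$(printf 'a b,c d,e f\n' | awk -F, '{ print $2 }')

echo "$default_out"
echo "$comma_out"
```

The first command prints "two" and the second prints "c d".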

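3) Built-in Variables

awk has several built-in variables. Some of the most common ones are:

FILENAME - Name of the current input file.
FS - Input field separator (whitespace by default).
NR - Number of records (lines) read so far, across all input files.
FNR - Number of records read so far from the current input file.
OFS - Output field separator (a single space by default).
NF - Number of fields in the current record.

When awk reads multiple input files, NR keeps counting across all of them, while FNR restarts at 1 for each new file.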
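The variables can be observed directly with two small files. The file names first.txt and second.txt and their contents are assumptions made for this sketch, and the files are created in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmpdir/first.txt"
printf 'f\n' > "$tmpdir/second.txt"

# Print file name, overall record number, per-file record number and field count.
vars_out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' first.txt second.txt)
echo "$vars_out"

# FS controls how input is split, OFS what print puts between its arguments.
ofs_out=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')
echo "$ofs_out"

rm -rf "$tmpdir"
```

For the single line of second.txt, NR is 3 while FNR is back to 1. The last command prints "root-0".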

3) sub and gsub functions [ for search and replace ]

awk provides two functions, sub and gsub, that search a string for a regular expression and replace the matching text. Both are explained below with examples.

sub replaces only the first (leftmost) match. gsub stands for global substitution: it replaces every match of the regex with the replacement string. In both functions the third parameter is optional. If it is omitted, then $0 is used.

3.a) sub(regexp, replacement [, target])

The `sub' function alters the value of TARGET. It searches this value, which should be a string, for the leftmost substring matched by the regular expression REGEXP, extending this match as far as possible. Then the entire string is changed by replacing the matched text with REPLACEMENT. The modified string becomes the new value of TARGET. This function is peculiar because TARGET is not simply used to compute a value, and not just any expression will do: it must be a variable, field or array reference, so that `sub' can store a modified value there. If this argument is omitted, then the default is to use and alter `$0'.

eg 3.a):

#echo "water, water, everywhere" | awk '{sub(/at/,"ith")}1'
output = "wither, water, everywhere"

sub replaces only the leftmost match of the regex with the replacement (in this case `at' with `ith'). The trailing 1 is an always-true pattern with no action, so awk prints the modified line. The `sub' function returns the number of substitutions made (either one or zero).
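The return value can be captured in a variable and printed next to the modified line. This is a small sketch; the second input and the pattern xyz are made up to show the no-match case:

```shell
# One match: sub returns 1 and changes only the first occurrence.
sub_out=$(echo "water, water, everywhere" | awk '{ n = sub(/at/, "ith"); print n ": " $0 }')

# No match: sub returns 0 and leaves the line untouched.
miss_out=$(echo "water" | awk '{ n = sub(/xyz/, "ith"); print n ": " $0 }')

echo "$sub_out"
echo "$miss_out"
```

The first command prints "1: wither, water, everywhere" and the second prints "0: water".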
   
3.b) The special character `&': if `&' appears in REPLACEMENT, it stands for the precise substring that was matched by REGEXP. Below is an example.


eg 3.b):

# echo "tommy,tom,water,tomboy" | awk '{ sub(/tom/, "& and his wife"); print }'
tom and his wifemy,tom,water,tomboy

With `&' in the replacement, awk keeps the matched text and adds the new text after it instead of replacing the match. As before, only the first occurrence is changed.

Here is another example:

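          awk 'BEGIN {
                  str = "daabaaa"
                  sub(/a+/, "c&c", str)
                  print str
          }'

prints `dcaacbaaa'.  This shows how `&' can represent a non-constant string, and also illustrates the leftmost, longest rule. (The regex must be /a+/ here: /a*/ would match the empty string at the very start of str and give `ccdaabaaa' instead.)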

3.c) Turning off special characters

A special character can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write "\\&" in the string to include a literal `&' in the replacement.

eg 3.c): Here is how to replace the first `|' on each line with an `&':

          awk '{ sub(/\|/, "\\&"); print }'
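The command above can be tried on a one-line input. The string "a|b|c" is an assumed sample:

```shell
# "\\&" reaches sub as \& and is inserted as a literal ampersand.
amp_out=$(echo "a|b|c" | awk '{ sub(/\|/, "\\&"); print }')
echo "$amp_out"
```

Only the first `|' is replaced, so the output is "a&b|c".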

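Note : as mentioned above, the third argument to `sub' must be an lvalue (something that can be assigned to, such as a variable, a field or an array element).  Some versions of `awk' allow the third argument to be an expression which is not an lvalue.  In such a case, `sub' would still search for the pattern and return 0 or 1, but the result of the substitution (if any) would be thrown away because there is no place to store it.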
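Two valid targets, a field and a plain variable, are sketched below. The inputs are invented for illustration:

```shell
# Target is a field: only $2 is searched, and $0 is rebuilt from the fields.
field_out=$(echo "water water" | awk '{ sub(/at/, "ith", $2); print }')

# Target is a variable: the modified string is stored back into s.
var_out=$(awk 'BEGIN { s = "water"; sub(/at/, "ith", s); print s }')

echo "$field_out"
echo "$var_out"
```

The first command prints "water wither" because the first field is never searched. The second prints "wither".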

4) Awk gsub function

The examples below show gsub with different replacement strings. One sub call is included for comparison.

[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"&ith");print}'
watither, watither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"ith");print}'
wither, wither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"bd\\&ith");print}'
wbd&ither, wbd&ither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{sub(/at/,"ith");print}'
wither, water, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"bd&ith");print}'
wbdatither, wbdatither, everywhere

4.a) To print the count of matches, print the return value of gsub

eg 4.a):  Search a line and print how many times the regex matched.


[user@test ~]$ echo "water, wattter, everywhere" | awk  -F, '{print gsub(/at/,"")}'
2

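In the above example "at" occurs twice, so the count is printed as 2. Note that gsub still performs the replacement (here it deletes each match); only the count is printed.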
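If the line itself should stay unchanged while counting, one option is to replace each match with itself by using `&' as the replacement. A minimal sketch on the same input:

```shell
# Replacing every match with & leaves the text as it was; n holds the match count.
count_out=$(echo "water, wattter, everywhere" | awk '{ n = gsub(/at/, "&"); print n ": " $0 }')
echo "$count_out"
```

This prints "2: water, wattter, everywhere".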

eg 4.a.2):

[user@test awkregex]$ grep -i NFS passwd_M
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin

[user@test awkregex]$ grep -i NFS passwd_M | awk -F:  '{print NR "\t" gsub(/rpc/,"")}'
1       1
2       0

In the above example NR is the line number and \t prints a tab. The lowercase string rpc occurs once in the first line and not at all in the second (the regex is case-sensitive, so RPC does not match), and those counts are printed.

4.b) To search and print the count for a particular field

[user@test awkrgex]$ grep -i NFS passwd_M
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin

To search in field 1 only, we pass a variable named col with -v and use $col as the third argument of gsub. Field 2 is searched the same way with col=2.

[user@test awkrgex]$ grep -i NFS passwd_M |  awk -F: -v col=1 '{print NR "\t" gsub(/rpc/,"",$col)}'
1       1
2       0

[user@test awkrgex]$ grep -i NFS passwd_M |  awk -F: -v col=2 '{print NR "\t" gsub(/rpc/,"",$col)}'
1       0
2       0
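Instead of running the command once per column, a loop over all fields can report every field that contains a match. This sketch feeds in the first passwd_M line shown above with printf and searches for the lowercase string nfs:

```shell
per_field=$(printf 'rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin\n' |
  awk -F: '{
    for (i = 1; i <= NF; i++) {
      n = gsub(/nfs/, "&", $i)              # count matches in field i only
      if (n > 0) print "field " i ": " n
    }
  }')
echo "$per_field"
```

Only the sixth field (/var/lib/nfs) contains nfs, so the output is "field 6: 1".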

5) Search and print the lines whose fields have the required length

Like sed, awk can search a file and print the matching lines. Here fldcount is the sample file. We will print the lines whose 2nd field is 15 or 16 characters long, whose 1st field is 12 characters long, and whose 3rd field is 15 characters long.

[user@test awkregex]$ cat fldcount
123710337783,351898014413150,123028040249634
123710337785,352934028758390,123028040109275
000123710337785,352934028758390,123028040109275
3710337785,352934028758390,123028040109275

[user@test awkregex]$ cat fldcount | awk -F, '{ if (((length($2) == 15 ) || length($2) == 16) && (length($1) == 12 && length($3) == 15)) print }'
123710337783,351898014413150,123028040249634
123710337785,352934028758390,123028040109275
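The same check can be written as a bare pattern, because awk prints a line by default when a pattern is true and no action is given. In this sketch three of the sample lines are passed in with printf rather than read from fldcount:

```shell
filtered=$(printf '%s\n' \
  '123710337783,351898014413150,123028040249634' \
  '000123710337785,352934028758390,123028040109275' \
  '3710337785,352934028758390,123028040109275' |
  awk -F, 'length($1) == 12 && (length($2) == 15 || length($2) == 16) && length($3) == 15')
echo "$filtered"
```

Only the first line passes, since the other two have a first field of 15 and 10 characters.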

6) awk printf function example



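awk's printf works like the printf function in C: it takes a format string followed by the values to format. The sample file is passwd: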

#cat passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

# awk -F":" '{printf("username=%s,userid= %d\n", $1, $3)}' passwd | head -n 5

username=root,userid= 0
username=bin,userid= 1
username=daemon,userid= 2
username=adm,userid= 3
username=lp,userid= 4
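printf also accepts field widths, which helps to line up columns. In this sketch the widths 8 and 4 and the `|' markers are arbitrary choices, and two shortened passwd-style lines are supplied inline:

```shell
# %-8s pads the name to 8 characters, left-aligned; %4d right-aligns the id in 4.
fmt_out=$(printf 'root:x:0:0\ndaemon:x:2:2\n' |
  awk -F: '{ printf("%-8s|%4d|\n", $1, $3) }')
echo "$fmt_out"
```

The `|' characters make the padding visible: the first line comes out as "root    |   0|".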

7) next and getline statements

The next statement forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule’s action isn’t executed.


Contrast this with the effect of the getline function. That also causes awk to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record).
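The difference can be sketched with two one-liners. The inputs (comment lines starting with #, and header lines followed by values) are assumptions made for illustration:

```shell
# next: skip comment lines, so the second rule never sees them.
next_out=$(printf '# comment\ndata1\n# another\ndata2\n' | awk '/^#/ { next } { print }')

# getline: on a header line, read the following line and keep running the same action.
getline_out=$(printf 'header\nvalue1\nheader\nvalue2\n' | awk '/^header/ { getline; print }')

echo "$next_out"
echo "$getline_out"
```

The first command prints data1 and data2. The second prints value1 and value2, because print runs after getline has replaced $0 with the next record.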


8) Print fields only when a field contains no spaces

Suppose a comma-separated file contains lines like:
field1,field2,field number 3,field4,field5
field1,field2,field3,field4,field5

(Some fields have spaces, some fields do not.)

The goal is to print $1 and $3, but only for lines that do not have a space in field 3. The command below splits field 3 on whitespace and prints the two fields only when the split yields exactly one piece:

awk -F, '{n=split($3,a," ");if(n==1){print $1,$3}}' filename
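The same filter can also be written as a regex test on field 3, which avoids the temporary array. A sketch using the two sample lines, passed in with printf instead of a file:

```shell
# Print $1 and $3 only when field 3 does not contain a space.
nospace=$(printf 'field1,field2,field number 3,field4,field5\nfield1,field2,field3,field4,field5\n' |
  awk -F, '$3 !~ / / { print $1, $3 }')
echo "$nospace"
```

Only the second line qualifies, so the output is "field1 field3".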
