1) What is AWK
awk is a command used for processing text files. With the help of awk we can print a particular field of a file or of a command's output. The syntax of the awk command is:
Syntax: awk 'BEGIN {awk-commands} {Action} END {awk-commands}'
To understand more about awk, we will look at some examples.
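As a minimal sketch of the three parts of the syntax, the snippet below runs a BEGIN block, a per-line action and an END block over a small inline input. The names and numbers in the input are made up for illustration.

```shell
# Hypothetical three-line input; the field contents are made up for illustration.
out=$(printf 'alice 10\nbob 20\ncarol 30\n' | awk '
BEGIN { print "name" }          # runs once, before any input is read
      { print $1; sum += $2 }   # runs once for every input line
END   { print "total=" sum }    # runs once, after the last line
')
printf '%s\n' "$out"
```

The BEGIN and END blocks are optional; most of the one-liners in this article use only the middle action.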
2) Print function
As mentioned above, with the help of awk we can print a particular field of a file or of a command's output.
eg 2. To print the first field of a file (considering space as the field separator):
# cat /etc/services | awk -F" " '{print $1}'
This will print the first field of the file named services. Here -F" " means the field separator is a space. We can also give other values as the field separator, such as -F"," (comma as the field separator).
Note : If you don't mention any field separator, awk splits fields on whitespace (runs of spaces and tabs) by default.
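A small sketch of both cases, using made-up input lines: the first command relies on the default whitespace splitting, the second sets a comma separator with -F.

```shell
# Default splitting: a run of spaces or a tab counts as one separator.
default_out=$(printf 'one   two\tthree\n' | awk '{print $2}')
# Explicit comma separator (hypothetical CSV-like line).
comma_out=$(printf 'one,two,three\n' | awk -F"," '{print $2}')
printf '%s\n%s\n' "$default_out" "$comma_out"
```

Both commands should print the second field, two.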
3) Inbuilt Variables
AWK has some inbuilt variables. Here are some of them.
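FILENAME - The name of the current input file.
FS - Input field separator (whitespace by default).
NR - Number of records (lines) read so far, across all input files.
FNR - Record number within the current input file.
OFS - Output field separator (a single space by default).
NF - Number of fields in the current record.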
When awk reads from multiple input files, NR gives the total number of records read so far across all the input files, while FNR gives the record number within the current input file and restarts at 1 for each new file.
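The difference between NR and FNR is easiest to see with two files. The sketch below creates two throwaway files (the names first.txt and second.txt are made up) and prints FILENAME, NR and FNR for every line.

```shell
# Two small throwaway files; names and contents are made up for illustration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\nd\n' > "$dir/second.txt"
# NR keeps counting across files, FNR restarts at 1 for second.txt.
out=$(cd "$dir" && awk '{print FILENAME, NR, FNR}' first.txt second.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```

The last two lines of the output should show NR at 3 and 4 while FNR is back at 1 and 2.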
3) sub and gsub function [ for Search and replace ]
In awk, the sub and gsub functions search a string for a pattern and replace the matched text. Below is a detailed explanation of sub and gsub with examples.
gsub stands for global substitution. It replaces every occurrence of the regex with the replacement string, whereas sub replaces only the first one. The third parameter is optional. If it is omitted, then $0 is used.
3.a) sub(regexp, replacement [, target])
The 'sub' function alters the value of TARGET. It searches this value,
which should be a string, for the leftmost substring matched by the regular
expression, REGEXP, extending this match as far as possible. Then the
entire string is changed by replacing the matched text with
REPLACEMENT. The modified string becomes the new value of TARGET. This function
is peculiar because TARGET is not simply used to compute a value, and not just
any expression will do: it must be a variable, field or
array reference, so that `sub' can store a modified value there. If this
argument is omitted, then the default is to use and alter `$0'.
eg 3.a):
# echo "water, water, everywhere" | awk '{sub(/at/,"ith")}1'
output = "wither, water, everywhere"
sub replaces only the leftmost occurrence of the regex with the replacement (in this case `at' with `ith'). The trailing 1 is a pattern that is always true, so awk prints the modified line. The 'sub' function returns the number of substitutions made (either one or zero).
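The return value and the optional third argument can be tried out together. This is a sketch on made-up input: the first command restricts the substitution to field 2 and prints the return value, the second shows the return value when nothing matches.

```shell
out=$(echo "water, water, everywhere" | awk '{
    n = sub(/at/, "ith", $2)    # edit field 2 only; n is the number of replacements
    print n ": " $0
}')
miss=$(echo "fire" | awk '{ print sub(/at/, "ith") }')   # no match, so sub returns 0
printf '%s\n%s\n' "$out" "$miss"
```

Because the target is $2, the first water is left alone and only the second one is changed.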
3.b) Now let's see what happens if the special character `&' appears in REPLACEMENT: it stands for the precise substring that was matched by REGEXP. Below is an example using `&'.
eg 3.b):
# echo "tommy,tom,water,tomboy" | awk '{ sub(/tom/, "& and his wife"); print }'
tom and his wifemy,tom,water,tomboy
With `&' in the replacement, awk keeps the matched text and adds the rest of the replacement around it instead of discarding the match. As before, sub changes only the first occurrence.
Here is another example:
awk 'BEGIN {
str = "daabaaa"
sub(/a+/, "c&c", str)
print str
}'
prints `dcaacbaaa'. This shows how `&' can represent a non-constant string, and also illustrates the leftmost, longest rule.
3.c) Turning off special characters
Special characters can be turned off by putting a backslash before them in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write `\\&' in the string to include a literal `&' in the replacement.
eg 3.c): Here is how to replace the first `|' on each line with an `&':
awk '{ sub(/\|/, "\\&"); print }'
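To see the effect of the backslashes, the sketch below runs the same substitution twice on a made-up input line, once with the escaped `\\&' and once with a plain `&'.

```shell
# Escaped: the replacement is a literal ampersand.
escaped=$(echo 'a|b|c' | awk '{ sub(/\|/, "\\&"); print }')
# Unescaped: & stands for the matched text, so the line comes out unchanged.
plain=$(echo 'a|b|c' | awk '{ sub(/\|/, "&"); print }')
printf '%s\n%s\n' "$escaped" "$plain"
```

Only the escaped form turns the first `|' into `&'.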
Note : as mentioned above, the third argument to `sub' must be an lvalue (a variable, field or array element). Some versions of `awk' allow the third argument to be an expression which is not an lvalue. In such a case, `sub' would still search for the pattern and return 0 or 1, but the modified string is discarded because there is nowhere to store it.
4) Awk gsub function
Some examples of gsub (one sub call is included for comparison):
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"&ith");print}'
watither, watither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"ith");print}'
wither, wither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"bd\\&ith");print}'
wbd&ither, wbd&ither, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{sub(/at/,"ith");print}'
wither, water, everywhere
[user@test ~]$ echo "water, water, everywhere" | awk '{gsub(/at/,"bd&ith");print}'
wbdatither, wbdatither, everywhere
4.a) To print the count of regex matches, use print with the gsub function
eg 4.a): To print the number of matches in a line.
[user@test ~]$ echo "water, wattter, everywhere" | awk -F, '{print gsub(/at/,"")}'
2
In the above example "at" occurs twice, so the count is printed as 2.
eg 4.a.2): To search in a file and print the count with the line number.
[user@test awkregex]$ grep -i NFS passwd_M
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
[user@test awkregex]$ grep -i NFS passwd_M | awk -F: '{print NR "\t" gsub(/rpc/,"")}'
1	1
2	0
In the above example NR is the line number and \t prints a tab. The match is case-sensitive, so rpc occurs once in the 1st line and not at all in the 2nd line, and those counts are printed.
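gsub changes its target while it counts, and it is case-sensitive. If only the count is wanted, one option is to run gsub on a lowercased copy of the line. This is a sketch, not the method used above; the variable name line is made up and the two input lines are copied from the passwd_M listing.

```shell
out=$(printf '%s\n' \
  'rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin' \
  'nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin' |
  awk -F: '{
      line = tolower($0)                      # work on a copy so $0 stays intact
      print NR "\t" gsub(/rpc/, "", line)     # counts rpc and RPC alike
  }')
printf '%s\n' "$out"
```

With the lowercased copy, the first line should report two matches (rpcuser and RPC) instead of one.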
4.b) To search and print the count on a particular field
[user@test awkregex]$ grep -i NFS passwd_M
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
To search in field 1 only, we create a variable named col with -v and pass $col as the third argument to gsub. Column 2 is searched in the same way.
[user@test awkregex]$ grep -i NFS passwd_M | awk -F: -v col=1 '{print NR "\t" gsub(/rpc/,"",$col)}'
1	1
2	0
[user@test awkregex]$ grep -i NFS passwd_M | awk -F: -v col=2 '{print NR "\t" gsub(/rpc/,"",$col)}'
1	0
2	0
5) Search and print the lines which match the required field lengths
Like sed, awk can also search for and print matching lines. Here fldcount is the sample file name, and we will print the lines in which the 2nd field is 15 or 16 digits long, the 1st field is 12 digits long and the 3rd field is 15 digits long.
[user@test awkregex]$ cat fldcount
123710337783,351898014413150,123028040249634
123710337785,352934028758390,123028040109275
000123710337785,352934028758390,123028040109275
3710337785,352934028758390,123028040109275
[user@test awkregex]$ cat fldcount | awk -F, '{ if (((length($2) == 15 ) || length($2) == 16) && (length($1) == 12 && length($3) == 15)) print }'
123710337783,351898014413150,123028040249634
123710337785,352934028758390,123028040109275
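The same test can also be written as an awk pattern with no action, in which case awk prints every line for which the pattern is true. The sketch below feeds three of the sample lines inline instead of reading the fldcount file.

```shell
# Three of the sample lines, fed inline instead of from the fldcount file.
out=$(printf '%s\n' \
  '123710337783,351898014413150,123028040249634' \
  '000123710337785,352934028758390,123028040109275' \
  '3710337785,352934028758390,123028040109275' |
  awk -F, 'length($1) == 12 && (length($2) == 15 || length($2) == 16) && length($3) == 15')
printf '%s\n' "$out"
```

Only the first line has a 12-digit first field, so it is the only one printed.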
6) awk printf function example
printf in awk is similar to the printf function in C: it takes a format string followed by the values to print.
# cat passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
# awk -F":" '{printf("username=%s,userdid= %d\n", $1, $3)}' passwd | head -n 5
username=root,userdid= 0
username=bin,userdid= 1
username=daemon,userdid= 2
username=adm,userdid= 3
username=lp,userdid= 4
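Format specifiers can also carry a width, which helps to line up columns. A small sketch on two made-up passwd-style lines: %-8s left-justifies the name in 8 characters and %5d right-justifies the number in 5.

```shell
# Two shortened, hypothetical passwd-style lines.
out=$(printf 'root:x:0:0\nbin:x:1:1\n' | awk -F: '{ printf("%-8s|%5d\n", $1, $3) }')
printf '%s\n' "$out"
```

Unlike print, printf adds no newline on its own, which is why the format string ends in \n.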
7) next and getline statement
The next statement forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule’s action isn’t executed.
Contrast this with the effect of the getline function. That also causes awk to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record).
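The contrast can be sketched on a made-up four-line input. With next, the second rule never sees the skipped records. With getline, the same action carries on and $0 now holds the following record.

```shell
# Hypothetical four-line input.
input=$(printf 'one\ntwo\nthree\nfour')

# next: odd-numbered records are dropped before the second rule is reached.
next_out=$(printf '%s\n' "$input" | awk 'NR % 2 == 1 { next } { print "kept " $0 }')

# getline: the action keeps running, but $0 now holds the following record.
getline_out=$(printf '%s\n' "$input" | awk '{ first = $0; getline; print first "+" $0 }')

printf '%s\n%s\n' "$next_out" "$getline_out"
```

The getline version pairs the lines up, because each run of the action consumes two records.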
Example: a file has lines like:
field1,field2,field number 3,field4,field5
field1,field2,field3,field4,field5
(Some fields have spaces, some fields do not.)
We want to print $1 and $3 using awk, but only for lines that do not have a space in field 3.
The command below splits field 3 on spaces and prints only when the split yields a single piece, i.e. when field 3 has no space:
awk -F, '{n=split($3,a," ");if(n==1){print $1,$3}}' filename
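An alternative sketch for the same filter uses a regex test on the field instead of split: the pattern $3 !~ / / is true when field 3 contains no space. The two input lines are the ones from the example, fed inline rather than from a file.

```shell
out=$(printf '%s\n' \
  'field1,field2,field number 3,field4,field5' \
  'field1,field2,field3,field4,field5' |
  awk -F, '$3 !~ / / { print $1, $3 }')   # print only when field 3 has no space
printf '%s\n' "$out"
```

Only the second line passes the test, so the output should be field1 field3.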