Thursday, December 27, 2018

AWK : In Built Variables and Functions

NEXT and GETLINE 


The next statement forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule’s action isn’t executed.

Contrast this with the effect of the getline function. That also causes awk to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record).
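The difference is easier to see side by side. The following is a minimal sketch on a made-up three-line input (the records a, b, c and the "seen:" labels are illustrative only, not from any real data set):

```shell
# Sample input: three records, one per line (illustrative data).
input=$(printf 'a\nb\nc\n')

# next: skip record "b" entirely; no later rule runs for it.
next_out=$(printf '%s\n' "$input" |
  awk '$0 == "b" { next } { print "seen:", $0 }')

# getline: on record "a", read the following record into $0 and keep
# executing the same action, so print already sees "b".
getline_out=$(printf '%s\n' "$input" |
  awk '$0 == "a" { getline; print "after getline:", $0 } { print "seen:", $0 }')

printf '%s\n' "$next_out"
echo "---"
printf '%s\n' "$getline_out"
```

With next, record b never reaches the second rule. With getline, record a is replaced by b in the middle of the action, and the remaining rules then run against b.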

NR and FNR

NR - stores the total number of input records read so far, regardless of how many files have been read. NR is 0 inside a BEGIN block, becomes 1 when the first record is read, and keeps increasing until the program terminates.


FNR - stores the number of records read from the file currently being processed. FNR is 1 for the first record of a file, increases until the end of that file is reached, and is reset to 1 as soon as the first line of the next file is read, and so on.
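A quick way to watch both counters is to print them for every record of two small files. This is only a sketch; the file names one.txt and two.txt and their contents are assumptions made up for the demonstration:

```shell
# Create two throwaway two-line files (hypothetical names and contents).
tmpdir=$(mktemp -d)
printf 'x\ny\n' > "$tmpdir/one.txt"
printf 'z\nw\n' > "$tmpdir/two.txt"

# Print NR, FNR and the record itself for every input line.
nr_fnr_out=$(awk '{ print NR, FNR, $0 }' "$tmpdir/one.txt" "$tmpdir/two.txt")
printf '%s\n' "$nr_fnr_out"

rm -rf "$tmpdir"
```

The first column (NR) counts 1 to 4 across both files, while the second column (FNR) goes 1, 2 and then starts over at 1 for the second file.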

AWK Two File Processing

When processing multiple files, awk reads each file sequentially, one after another, in the order they are specified on the command line.

$ awk 'NR == FNR { # some actions; next} # other condition {# other actions}' file1.txt file2.txt


How it Works

FNR restarts at 1 for every file while NR keeps counting, so the condition NR == FNR is true only while awk is reading the first file (provided the first file is not empty). Thus, in the program above, the actions indicated by # some actions are executed when awk is reading the first file; the actions indicated by # other actions are executed when awk is reading the second file, if the condition in # other condition is met.

The next at the end of the first action block is needed to prevent the condition in # other condition from being evaluated, and the actions in # other actions from being executed, while awk is reading the first file.

Some examples should make this clearer. Many problems that involve two files can be solved using this technique. Let's look at this:

# prints lines that are both in file1.txt and file2.txt (intersection)

$ awk 'NR == FNR{a[$0];next} $0 in a' file1.txt file2.txt

Here we see another typical idiom: a[$0] on its own has the sole purpose of creating the array element indexed by $0, even though we don't assign any value to it. During the pass over the first file, all the lines seen are remembered as indexes of the array a. The pass over the second file just needs to check whether each line being read exists as an index in the array a (that's what the condition $0 in a does). If the condition is true, the line being read from file2.txt is printed, because a pattern with no action prints the current record by default. In a very similar way, we can easily write the code to print the lines that appear in only one of the two files:

# prints lines that are only in file1.txt and not in file2.txt
$ awk 'NR == FNR{a[$0];next} !($0 in a)' file2.txt file1.txt
Note the order of the arguments. file2.txt is given first. To print lines that are only in file2.txt and not in file1.txt, just reverse the order of the arguments.
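To check the three variants end to end, here is a small sketch that builds two sample files and runs each one-liner. The fruit names used as file contents are assumptions chosen only for illustration:

```shell
# Build two small sample files (contents are illustrative).
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/file2.txt"

# Lines present in both files (intersection).
common=$(awk 'NR == FNR { a[$0]; next } $0 in a' \
  "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Lines only in file1.txt: load file2.txt first, then filter file1.txt.
only1=$(awk 'NR == FNR { a[$0]; next } !($0 in a)' \
  "$tmpdir/file2.txt" "$tmpdir/file1.txt")

# Lines only in file2.txt: same program, arguments reversed.
only2=$(awk 'NR == FNR { a[$0]; next } !($0 in a)' \
  "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf 'common:\n%s\nonly in file1: %s\nonly in file2: %s\n' \
  "$common" "$only1" "$only2"

rm -rf "$tmpdir"
```

The file named first on the command line is always the one loaded into the array; the second file is the one whose lines get printed or suppressed.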

Thursday, December 20, 2018

regex Patterns

Lookbehind Regex 

Normally, grep searches for a word and prints the entire line that contains the match.

Using the -o option, we can print only the matching word.

If we want to search for a word and print the word next to it, we need to use lookbehind in the regex.

The syntax for lookbehind is an opening parenthesis followed by ? and <, then = for a positive lookbehind or ! for a negative lookbehind, then the match word and a closing parenthesis. A negative lookbehind inverts the condition, much as the -v option inverts matching in grep. In GNU grep, lookbehind and lookahead require the -P (Perl-compatible regex) option.

(?<=matchword)\w*
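As a minimal sketch of the positive and negative forms, assuming GNU grep built with PCRE support (the -P option) and a made-up input line:

```shell
# Illustrative input line (not from any real log).
line='id=7 code=200'

# Positive lookbehind: digits that directly follow "code=".
pos=$(printf '%s\n' "$line" | grep -oP '(?<=code=)\d+')

# Negative lookbehind: whole numbers that do NOT directly follow "code=".
# \b keeps the match from starting in the middle of "200".
neg=$(printf '%s\n' "$line" | grep -oP '(?<!code=)\b\d+')

echo "positive: $pos"
echo "negative: $neg"
```

The text inside the lookbehind is checked but never becomes part of the match, which is why only the numbers are printed.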
       
Examples 
1) Lookbehind

[test@localhost ]# echo "Input : File Java.lanag.xyz..File copied completed : one.txt" | grep -oP '(?<=File copied completed : )\w*.*'
one.txt
2) Lookahead
[test@localhost ~]# echo "Input : File :Java.lanag.xyz..File copied completed : one.txt" | grep -oP '\w*(?= : \w*\.txt)'
completed
3) Both
Input Text: applebananabananaapple
(i) banana(?=banana) - finds the 1st banana ("banana" which has "banana" after it)
(ii) banana(?!banana) - finds the 2nd banana ("banana" which does not have "banana" after it)
(iii) (?<=apple)banana - finds the 1st banana ("banana" which has "apple" before it)
(iv) (?<!apple)banana - finds the 2nd banana ("banana" which does not have "apple" before it)
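The four cases can be checked with grep by asking it to report where each match starts. This is a sketch that assumes GNU grep with PCRE support; -b combined with -o prints the 0-based byte offset of each match, so offset 5 is the 1st banana and offset 11 is the 2nd:

```shell
text='applebananabananaapple'

# Run one pattern against the sample text and print "offset:match".
# find_match is a helper defined here for the demo, not a standard command.
find_match() {
  printf '%s\n' "$text" | grep -obP "$1"
}

la_pos=$(find_match 'banana(?=banana)')   # lookahead, positive
la_neg=$(find_match 'banana(?!banana)')   # lookahead, negative
lb_pos=$(find_match '(?<=apple)banana')   # lookbehind, positive
lb_neg=$(find_match '(?<!apple)banana')   # lookbehind, negative

printf '%s\n' "(i) $la_pos" "(ii) $la_neg" "(iii) $lb_pos" "(iv) $lb_neg"
```

Each pattern matches exactly one of the two occurrences, and the offsets show which one.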