Wednesday, July 12, 2023

AWK File processing

How to use AWK to group values by the first field

Sample Entry

File contents

benefiicary-1 16

benefiicary-1 72

benefiicary-2 6

benefiicary-2 71

benefiicary-2 84

benefiicary-3 64

benefiicary-3 76

benefiicary-3 91

benefiicary-4 60

benefiicary-1 11

benefiicary-2 23

Required output

benefiicary-1 16,72,11

benefiicary-2 6,71,84,23

benefiicary-3 64,76,91

benefiicary-4 60

Script used

{
  if ($1 in users) users[$1] = users[$1] "," $2
  else users[$1] = $2
}

END {
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for (user in users) print user, users[user]
}
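To try the script, one option is to save it to a file and pass it to awk with -f. The sketch below assumes the file names group.awk and entries.txt and a scratch directory, none of which come from the original post, and it uses a shortened version of the sample data. The sorted traversal requested through PROCINFO is a gawk feature, so with other awk implementations the output lines may appear in a different order.

```shell
# Scratch directory and file names are illustrative assumptions.
workdir=$(mktemp -d)

cat > "$workdir/group.awk" <<'EOF'
{
  if ($1 in users) users[$1] = users[$1] "," $2
  else users[$1] = $2
}

END {PROCINFO["sorted_in"] = "@ind_str_asc"; for (user in users) { print user, users[user] } }
EOF

# A shortened copy of the sample entries.
printf '%s\n' 'benefiicary-1 16' 'benefiicary-1 72' \
              'benefiicary-2 6'  'benefiicary-1 11' > "$workdir/entries.txt"

# Prefer gawk when it is installed, since the sorted traversal is gawk-specific.
if command -v gawk >/dev/null 2>&1; then awk_cmd=gawk; else awk_cmd=awk; fi

"$awk_cmd" -f "$workdir/group.awk" "$workdir/entries.txt"
```

With gawk this should print the benefiicary-1 line (16,72,11) followed by the benefiicary-2 line (6).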

How it works

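AWK processes its input line by line: the main block of the script runs once for every line of the file, and the END block runs once after the last line has been read.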

Line 1 Processing

Here is what happens while AWK processes the first line.

if ($1 in users): $1 is the first field of line 1, which is benefiicary-1. AWK checks whether benefiicary-1 is present as an index in the array called users. The array is still empty at this point, so the condition fails and the else block executes.

The else block adds an entry to the AWK associative array users, with index benefiicary-1 and value 16.
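The membership test is worth a closer look. In AWK, `key in array` only checks whether the index exists, whereas merely referencing array[key] creates an empty element. The following is a small sketch of that difference; the function name show_in_vs_reference and the array name seen are chosen only for illustration.

```shell
show_in_vs_reference() {
  awk 'BEGIN {
    key = "benefiicary-1"

    # The "in" test does not create the element.
    if (key in seen) print "before: present"; else print "before: absent"

    # Referencing seen[key], even just to compare it, creates an empty element.
    if (seen[key] == "") print "value is empty"

    if (key in seen) print "after: present"; else print "after: absent"
  }'
}

show_in_vs_reference
# before: absent
# value is empty
# after: present
```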

Line 2 Processing

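Now AWK checks whether benefiicary-1 is present in the users array. This time the condition is true, so it executes the statement under the if:

    users[$1] = users[$1] "," $2

With the values from line 2 substituted, this becomes

    users["benefiicary-1"] = users["benefiicary-1"] "," 72

    users["benefiicary-1"] = 16,72

AWK concatenates strings simply by placing them next to each other, so the old value, a comma, and the new field are joined into one string. The users array now holds 16,72 at index benefiicary-1, and processing continues in the same way until the last line.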
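The line-by-line behaviour can be observed by printing the updated entry after each record. The sketch below is a tracing variant of the main block rather than the original script; the function name trace_grouping is hypothetical and the input is a shortened version of the sample data.

```shell
trace_grouping() {
  awk '
    {
      if ($1 in users) users[$1] = users[$1] "," $2
      else users[$1] = $2
      # NR is the current line number; show the entry that was just updated.
      printf "line %d: %s -> %s\n", NR, $1, users[$1]
    }
  ' "$@"
}

printf '%s\n' 'benefiicary-1 16' 'benefiicary-1 72' 'benefiicary-2 6' | trace_grouping
# line 1: benefiicary-1 -> 16
# line 2: benefiicary-1 -> 16,72
# line 3: benefiicary-2 -> 6
```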

END BLOCK

After the last line has been read, the users array holds:

    Index            Value
    benefiicary-1    16,72,11
    benefiicary-2    6,71,84,23
    benefiicary-3    64,76,91
    benefiicary-4    60


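The END block then runs once. Setting PROCINFO["sorted_in"] to "@ind_str_asc" is a gawk feature that makes the for loop visit the indices in ascending string order; without it, the order in which an AWK array is traversed is unspecified.

    for (user in users)
        print user, users[user]

On the first iteration user is benefiicary-1, so the statement is

    print "benefiicary-1", users["benefiicary-1"]

which prints

    benefiicary-1 16,72,11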
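Because PROCINFO["sorted_in"] is specific to gawk, a variant that does not depend on it may be useful with other awk implementations. The sketch below is one possible approach, not the script from this post: it records each key the first time it appears in a second array (here called order, a name introduced for illustration) and prints in that order. Piping the result through sort gives alphabetical order when that is wanted.

```shell
group_portable() {
  awk '
    {
      if ($1 in users) users[$1] = users[$1] "," $2
      else { users[$1] = $2; order[++n] = $1 }   # remember first appearance
    }
    END {
      # Walk the keys by numeric position instead of relying on for-in order.
      for (i = 1; i <= n; i++) print order[i], users[order[i]]
    }
  ' "$@"
}

# Keys come out in order of first appearance.
printf '%s\n' 'benefiicary-2 6' 'benefiicary-1 16' 'benefiicary-2 71' | group_portable
# benefiicary-2 6,71
# benefiicary-1 16

# Pipe through sort when alphabetical order is wanted.
printf '%s\n' 'benefiicary-2 6' 'benefiicary-1 16' 'benefiicary-2 71' | group_portable | LC_ALL=C sort
# benefiicary-1 16
# benefiicary-2 6,71
```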