The usual way to remove duplicate lines from a file in Unix is to run sort and then uniq. The sort is needed because uniq only collapses adjacent duplicate lines, so without it the result will not contain correct unique values.
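To make the difference concrete, here is a small sketch that runs uniq with and without sort. The input is fed through printf rather than read from a file, and the variable names are only illustrative; LC_ALL=C is set as an assumption so that the sort order is predictable.

```shell
# Sample input with non-adjacent duplicates, supplied inline for illustration.
input=$(printf '1\n2\n3\n1\n1\n3\n3a\n3a\n5\n')

# uniq alone only collapses adjacent repeats, so the later 1 and 3 survive.
uniq_only=$(printf '%s\n' "$input" | uniq)

# Sorting first makes every duplicate adjacent, so uniq can remove them all.
sorted_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'uniq only:\n%s\n\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_uniq"
```

The first result still contains repeated values (1 2 3 1 3 3a 5), while the second contains each value once.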
Instead, we can use the awk command to remove the duplicates from the file. For this we use an awk associative array.
Short notes about AWK associative array
Unlike regular arrays, the indexes of AWK associative arrays need not be a continuous set of numbers; you can use either a string or a number as an array index. Also, there is no need to declare the size of an array in advance; arrays can expand and shrink at runtime.
Its syntax is: array_name[index] = value
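A minimal sketch of these properties follows. The array name count and its keys are made up for illustration; everything runs inside a BEGIN block, so no input file is needed.

```shell
result=$(awk 'BEGIN {
    # String and numeric indexes can be mixed; no size is declared up front.
    count["apple"] = 3
    count[42] = "answer"
    count["apple"]++          # update an existing element
    delete count[42]          # arrays can shrink at runtime
    n = 0
    for (k in count) n++      # count the elements that remain
    print count["apple"], n
}')
printf '%s\n' "$result"
```

This should print 4 1: the value stored under "apple" after the increment, and the single element left after the delete.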
Example
File contents
cat test
1
2
3
1
1
3
3a
3a
5
Command to use: cat test | awk '!seen[$0]++'
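As a quick check, the sketch below feeds the same nine sample values to the command. The input is recreated with printf so the snippet runs without a file on disk, and the variable name deduped is only illustrative.

```shell
# Recreate the sample values on stdin and keep the first occurrence of each line.
deduped=$(printf '1\n2\n3\n1\n1\n3\n3a\n3a\n5\n' | awk '!seen[$0]++')
printf '%s\n' "$deduped"
```

The output is 1, 2, 3, 3a and 5, one per line. Unlike sort followed by uniq, the lines come out in the order in which they first appeared, not in sorted order.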
How it Works
seen[$0] - Uses the current line $0 as the key into the array seen and takes the value stored there. In our case the array indexes become 1, 2, 3, 3a and 5, because an array cannot hold the same index twice.
Note: seen is an arbitrary name; any valid variable name, such as a or b, works just as well.
If seen[1] has never been referenced before, awk creates the element uninitialized, so seen[1] evaluates to the empty string, which counts as zero, that is, false. Negating it gives a true result; if the value is non-zero (true), negating it gives a false result.
!seen[$0]
The ! negates the value from before. If it was empty or zero (false), we now have a true result. If it was non-zero (true), we have a false result. If the whole expression evaluates to true, meaning that seen[$0] was not set to begin with, the whole line is printed as the default action.
!seen[$0]++
Also, regardless of the old value, the post-increment operator adds one to seen[$0], so the next time the same line is encountered, the stored value will be positive and the whole condition will fail.
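The evaluation described above can be traced line by line. In this sketch the helper variables before and keep, and the shortened input, are additions for illustration; the test itself is the same expression the one-liner uses.

```shell
trace=$(printf '1\n2\n1\n3a\n3a\n' | awk '{
    before = seen[$0] + 0     # value stored before the increment (0 if unseen)
    keep = !seen[$0]++        # same test as the one-liner: 1 means print
    printf "%s before=%d keep=%d\n", $0, before, keep
}')
printf '%s\n' "$trace"
```

A line seen for the first time shows before=0 keep=1, so it would be printed. A repeated line shows a positive before value and keep=0, so it would be skipped.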
Below is how awk actually handles an expression that has no action
awk 'expression' file is actually shorthand for: awk 'expression {print $0}' file
Whenever a test with no associated action is true, the default action is triggered. The default action is the equivalent of { print } or { print $0 }, which prints the current record, which for all intents and purposes in this example is the current unmodified line of input.
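The equivalence can be checked directly by running both forms on the same input and comparing the results. The sample data and variable names here are assumptions made for the sake of the example.

```shell
data=$(printf '1\n2\n1\n')

# Pattern only: awk falls back to the default action.
implicit=$(printf '%s\n' "$data" | awk '!seen[$0]++')

# Same pattern with the default action written out explicitly.
explicit=$(printf '%s\n' "$data" | awk '!seen[$0]++ { print $0 }')

if [ "$implicit" = "$explicit" ]; then
    echo "both forms print the same lines"
fi
```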