Original text file:
$ cat test
jason
jason
jason
fffff
jason
Method 1: sort -u
After removing duplicates:
$ sort -u test
fffff
jason
Note that the original order is lost: the output is sorted.
Method 2: sort test | uniq
After removing duplicates:
$ sort test | uniq
fffff
jason
Note that the order is lost here as well; the principle is the same as in Method 1.
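The sort step matters because uniq only collapses duplicate lines that are adjacent. A minimal sketch on the same sample data, rebuilt in a temporary file (the variable names are illustrative, and mktemp is assumed to be available):

```shell
# Recreate the sample file from above in a temporary location.
tmp=$(mktemp)
printf '%s\n' jason jason jason fffff jason > "$tmp"

# uniq alone only collapses adjacent duplicates, so the final "jason" survives.
uniq_only=$(uniq "$tmp")

# Sorting first makes all duplicates adjacent; both forms then agree.
sorted_uniq=$(sort "$tmp" | uniq)
sort_u=$(sort -u "$tmp")

printf 'uniq only:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_uniq"
printf 'sort -u:\n%s\n' "$sort_u"
rm -f "$tmp"
```

Without the sort, three lines remain (jason, fffff, jason); with it, both methods print fffff and jason.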
Method 3: awk '!a[$0]++'
After removing duplicates:
$ awk '!a[$0]++' test
jason
fffff
The order remains the same. Example of deduplicating a file in place:
awk '!a[$0]++' test.txt >test.txt.tmp && mv -f test.txt.tmp test.txt
Here the shell redirects awk's output to a temporary file, and mv overwrites the original with it only if awk succeeds.
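The same idea can be wrapped in a small function. This is only a sketch: the name dedupe_in_place is hypothetical, and it assumes mktemp is available to pick a temporary name that cannot collide with an existing file.

```shell
# Hypothetical helper: remove duplicate lines from a file, keeping first occurrences.
dedupe_in_place() {
    file=$1
    # Create the temporary file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '!a[$0]++' "$file" > "$tmp"; then
        mv -f "$tmp" "$file"      # replace the original only on success
    else
        rm -f "$tmp"              # leave the original untouched on failure
        return 1
    fi
}
```

Usage would be dedupe_in_place test.txt. Note that the replaced file takes on the temporary file's permissions, which may differ from the original's.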
How it works:
An awk program consists of patterns and actions, in the form Pattern {Action}. If the action is omitted, print $0 is executed by default.
Pattern:
The expression !a[$0]++ serves as the pattern that removes duplicates.
In awk, an uninitialized array element evaluates to 0 in a numeric context, so a[$0] is 0 the first time a given line is seen. The postfix ++ operator returns the current value first and then adds 1, so for a new line the pattern is equivalent to
!0
Since 0 is false and ! negates it, the whole pattern evaluates to 1, which is equivalent to if(1). The pattern matches and the current record is printed. Every line seen for the first time is handled this way.
When the second "jason" (line 2) is read, a[$0] is already 1, so the negation yields 0. The pattern does not match and the record is not printed. Every later duplicate is handled the same way, so the duplicate lines are removed from the output.
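To make the steps above concrete, the one-liner can be written out with an explicit if statement. This is a sketch for illustration; the variable names short, long and count are not from the original, and mktemp is assumed to be available.

```shell
# Build the sample file from the article in a temporary location.
tmp=$(mktemp)
printf '%s\n' jason jason jason fffff jason > "$tmp"

# Short form used in the article.
short=$(awk '!a[$0]++' "$tmp")

# Expanded form: the same test written as an explicit action.
long=$(awk '{
    count = a[$0]      # value before the increment; 0 for an unseen line
    a[$0]++            # record that this line has now been seen
    if (count == 0)    # !0 is true, so only first occurrences are printed
        print $0
}' "$tmp")

printf '%s\n' "$long"
rm -f "$tmp"
```

Both forms print jason followed by fffff, in the order the lines first appeared.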