Original text file:
$ cat test
jason
jason
jason
fffff
jason
Method 1: sort -u
After removing duplicates:
$ sort -u test
fffff
jason
Note that the original line order is lost.
Method 2: sort test | uniq
After removing duplicates:
$ sort test | uniq
fffff
jason
Note that the order is lost here as well; the principle is the same as in Method 1.
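Both methods sort first because uniq only collapses duplicate lines that are adjacent. The following is a minimal sketch of that behavior; it recreates the sample data in a temporary file (the variable names are chosen here for illustration) so the original test file is not touched:

```shell
# Recreate the sample data in a scratch file (demo is an illustrative name)
demo=$(mktemp)
printf 'jason\njason\njason\nfffff\njason\n' > "$demo"

# uniq alone merges only adjacent duplicates, so the last "jason" survives
uniq_only=$(uniq "$demo")

# Sorting first makes all duplicates adjacent, so both forms agree
sorted_uniq=$(sort "$demo" | uniq)
sort_u=$(sort -u "$demo")

printf 'uniq only:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_uniq"
printf 'sort -u:\n%s\n' "$sort_u"

rm -f "$demo"
```

With this data, uniq alone still prints "jason" twice, while the two sorted forms print each line once.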
Method 3: awk '!a[$0]++'
After removing duplicates:
$ awk '!a[$0]++' test
jason
fffff
The original order is preserved. Example of deduplicating a file in place:
awk '!a[$0]++' test.txt >test.txt.tmp && mv -f test.txt.tmp test.txt
Here the awk output is written to a temporary file, which then overwrites the original file.
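The temporary file is needed because redirecting straight back to the input (awk '!a[$0]++' test.txt > test.txt) lets the shell truncate the file before awk reads it. As a hedged sketch, the same idea can be wrapped in a small function; dedup_in_place is a hypothetical helper name, not a standard command, and the use of mktemp is an assumption made here rather than part of the original example:

```shell
# dedup_in_place is a hypothetical helper, not a standard command
dedup_in_place() {
    file=$1
    # Create the temporary file next to the target so mv stays on one filesystem
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Replace the original only if awk succeeded
    if awk '!a[$0]++' "$file" > "$tmp"; then
        mv -f "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a scratch copy of the sample data
sample=$(mktemp)
printf 'jason\njason\njason\nfffff\njason\n' > "$sample"
dedup_in_place "$sample"
result=$(cat "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

One design note: mktemp creates its file readable and writable by the owner only, so the replaced file ends up with those permissions. If the original mode matters, it would need to be restored afterwards.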
The principle is as follows:
An awk program consists of patterns and actions, in the form Pattern {Action}. If the action is omitted, print $0 is executed by default.
Pattern:
The pattern used here to remove duplicates is !a[$0]++
In awk, an uninitialized array element is treated as 0 in a numeric context, so a[$0] is 0 the first time a line is seen. The postfix ++ operator returns the current value first and then adds 1, so for a new line the pattern is equivalent to
!0
0 is false and ! negates it, so the whole pattern evaluates to 1, which is equivalent to if(1). The pattern matches and the current record is printed. Every line that appears for the first time is handled this way.
When the second "jason" on line 2 is read, a[$0] is already 1, and negating it gives 0. The pattern is 0, the match fails, and the record is not printed. Later duplicates are handled the same way, so all duplicate lines in the file are removed.
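To make the evaluation order explicit, the one-liner can be written out in long form. This is only an illustrative expansion; the variable names seen, expanded, and short are chosen here and do not come from the original:

```shell
# Long form of awk '!a[$0]++', fed with the sample data on standard input
expanded=$(printf 'jason\njason\njason\nfffff\njason\n' | awk '{
    seen = a[$0]   # value before the increment; an unset element counts as 0
    a[$0]++        # record this occurrence
    if (!seen)     # true only the first time the line appears
        print $0
}')

# The short form, for comparison
short=$(printf 'jason\njason\njason\nfffff\njason\n' | awk '!a[$0]++')

printf '%s\n' "$expanded"
```

Both forms print jason followed by fffff, in their order of first appearance.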