The cause of the hash conflict
Hashing is a solution to improve efficiency by recompressing data. However, because the hash value generated by the hash function is limited and the data may be more, there are still different data corresponding to the same value after processing by the hash function. This is where you have a hash conflict.
The influencing factors of hash conflict
Load factor (load factor = total number of data/hash table length), hash function, method of handling conflicts
Four ways to resolve hash conflicts
1. Open the address method
(1) Linear detection
When determining values sequentially, if a value for a piece of data already exists, one unit is added to the original value until no hash collisions occur.
(2) Ressquare detection
In order to determine the values, if the value of some data already exists, the original value is first added to the square of 1 units, if still exists, then subtracted 1 square units. It’s going to be 2 squared, 3 squared and so on. Until there is no hash conflict.
(3) Pseudo-random detection
When determining values in order, if some data already exists, a random number is generated by a random function, and the random number is added on the basis of the original value until there is no hash conflict.
2. Chain address method (HashMap hash resolution method)
For the same values, join using a linked list. Use an array to store each list.
Advantages:
(1) The zipper method is simple to deal with conflicts and has no accumulation phenomenon, that is, non-synonyms will never conflict, so the average search length is short;
(2) Because the node space of each linked list in the zipper method is dynamically applied, it is more suitable for the situation that the length of the table cannot be determined before making the table;
(3) In order to reduce conflicts, the open addressing method requires a small loading factor α, so a lot of space will be wasted when the node size is large. In the zipper method, α≥1 is desirable, and when the node is large, the added pointer domain in the zipper method can be ignored, so space is saved.
(4) In the hash table constructed by zipper method, the operation of deleting nodes is easy to realize. Simply delete the corresponding node on the linked list.
faults:
Pointers occupying a large space will cause a waste of space. If the space is used to increase the size of the hash table, the efficiency of the open address method will be improved.
3. Establish a public overflow area
Set up a common overflow area to store all hash conflicting data.
4. Re-hashing
The hash value of the conflict is hashed again until there is no hash conflict.
Hashing is a solution to improve efficiency by recompressing data. However, because the hash value generated by the hash function is limited and the data may be more, there are still different data corresponding to the same value after processing by the hash function. This is where you have a hash conflict.
The influencing factors of hash conflict
Load factor (load factor = total number of data/hash table length), hash function, method of handling conflicts
Four ways to resolve hash conflicts
1. Open the address method
(1) Linear detection
When determining values sequentially, if a value for a piece of data already exists, one unit is added to the original value until no hash collisions occur.
(2) Ressquare detection
In order to determine the values, if the value of some data already exists, the original value is first added to the square of 1 units, if still exists, then subtracted 1 square units. It’s going to be 2 squared, 3 squared and so on. Until there is no hash conflict.
(3) Pseudo-random detection
When determining values in order, if some data already exists, a random number is generated by a random function, and the random number is added on the basis of the original value until there is no hash conflict.
2. Chain address method (HashMap hash resolution method)
For the same values, join using a linked list. Use an array to store each list.
Advantages:
(1) The zipper method is simple to deal with conflicts and has no accumulation phenomenon, that is, non-synonyms will never conflict, so the average search length is short;
(2) Because the node space of each linked list in the zipper method is dynamically applied, it is more suitable for the situation that the length of the table cannot be determined before making the table;
(3) In order to reduce conflicts, the open addressing method requires a small loading factor α, so a lot of space will be wasted when the node size is large. In the zipper method, α≥1 is desirable, and when the node is large, the added pointer domain in the zipper method can be ignored, so space is saved.
(4) In the hash table constructed by zipper method, the operation of deleting nodes is easy to realize. Simply delete the corresponding node on the linked list.
faults:
Pointers occupying a large space will cause a waste of space. If the space is used to increase the size of the hash table, the efficiency of the open address method will be improved.
3. Establish a public overflow area
Set up a common overflow area to store all hash conflicting data.
4. Re-hashing
The hash value of the conflict is hashed again until there is no hash conflict.