如何确保hashCode（）与equals（）一致？-HowtoensurehashCode()isconsistentwithequals()?

作者：幸福欧旭旭_320 | 来源：互联网 | 2023-07-06 14:34

Whenoverridingtheequals()functionofjava.lang.Object,thejavadocssuggestthat,当重写java.lang.

When overriding the equals() function of java.lang.Object, the javadocs suggest that,

当重写java.lang.Object的equals()函数时,javadocs建议,

it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)

hashCode()方法必须为每个对象返回一个唯一的整数(这在基于内存位置比较对象时很容易,只需返回对象的唯一整数地址)

How should a hashCode() method be overriden so that it returns a unique integer for each object based only on that object's properities?

应该如何覆盖hashCode()方法,以便它仅基于该对象的特性为每个对象返回一个唯一的整数?


public class People{
   public String name;
   public int age;

   public int hashCode(){
      // How to get a unique integer based on name and age?
   }
}
/*******************************/
public class App{
   public static void main( String args[] ){
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hasCode() );  // output?
       System.out.println( melissa.hashCode(); // output?
   }
}

8 个解决方案

#1

It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.

它没有说对象的哈希码必须是完全唯一的,只是两个相等对象的哈希码返回相同的哈希码。让两个不相等的对象返回相同的哈希码是完全合法的。但是,哈希码分布在一组对象上越独特,您将从HashMaps和使用hashCode的其他操作中获得更好的性能。

IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).

诸如IntelliJ Idea之类的IDE具有用于equals和hashCode的内置生成器,它们通常可以为大多数对象提供“足够好”的代码(并且可能比一些手工制作的过于聪明的哈希函数更好)。

For example, here's a hashCode function that Idea generates for your People class:

例如,这是Idea为您的People类生成的hashCode函数:

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}

#2

I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like

我不会详细介绍hashCode的唯一性,因为Marc已经解决了这个问题。对于您的People类,您首先需要确定一个人的平等意味着什么。也许平等完全取决于他们的名字,也许是基于姓名和年龄。它将是特定领域的。让我们说平等是基于名称和年龄。你被覆盖的等于看起来像

public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!(getClass().equals(obj.getClass())) return false;
    Person other = (Person)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}

Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:

无论何时重写equals,都必须覆盖hashCode。此外,hashCode在其计算中不能使用比equals更多的字段。大多数情况下,您必须添加或排除各个字段的哈希码(hashCode应该快速计算)。所以一个有效的hashCode方法可能如下所示:

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}

Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.

请注意,以下内容无效,因为它使用的字段等于不(高度)。在这种情况下,两个“等于”对象可以具有不同的哈希码。

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

Also, it's perfectly valid for two non-equals objects to have the same hash code:

此外,两个非等于对象具有相同的哈希码完全有效:

public int hashCode() {    
    return age;    
}

In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.

在这种情况下,Jane年龄30不等于Bob年龄30,但他们的哈希码都是30.虽然有效,但这对于基于散列的集合中的性能是不合需要的。

#3

Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.

另一个问题是,是否存在所有程序员都应该知道的基本低级事物,我认为哈希查找就是其中之一。所以这里。

A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.

哈希表(注意我没有使用实际的类名)基本上是一个链表的数组。要在表中查找内容,首先要计算该内容的哈希码,然后根据表的大小对其进行修改。这是数组的索引,您将获得该索引的链接列表。然后,您遍历列表,直到找到您的对象。

Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.

由于数组检索是O(1),并且链表遍历是O(n),因此您需要一个尽可能随机创建分布的哈希函数,以便将对象散列到不同的列表。每个对象都可以返回值0作为其哈希码,并且哈希表仍然可以工作,但它本质上是数组元素0处的长链表。

You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.

您通常也希望数组很大,这会增加对象在长度为1的列表中的可能性。例如,当地图中的条目数大于75时,Java HashMap会增加数组的大小。数组大小的百分比。这里有一个权衡:你可以拥有一个包含极少数条目和浪费内存的巨大数组,或者一个较小的数组,其中数组中的每个元素都是一个包含> 1个条目的列表,并且浪费时间遍历。完美的哈希会将每个对象分配给数组中的唯一位置,而不会浪费空间。

The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.

术语“完美散列”是一个实际术语,在某些情况下,您可以创建一个散列函数,为每个对象提供唯一的数字。只有在知道所有可能值的集合时才可以执行此操作。在一般情况下,您无法实现此目的,并且会有一些值返回相同的哈希码。这是简单的数学:如果您的字符串长度超过4个字节,则无法创建唯一的4字节哈希码。

One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.

一个有趣的花絮:哈希数组通常根据素数来确定大小,以便在修改结果时为随机分配提供最佳机会,无论哈希码实际上是多么随机。

Edit based on comments:

根据评论进行编辑:

1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).

1)链表不是表示具有相同哈希码的对象的唯一方法,尽管这是JDK 1.5 HashMap使用的方法。虽然比简单数组的内存效率更低,但在重新散列时可能会产生更少的流失(因为条目可以从一个存储桶取消链接并重新链接到另一个存储桶)。

2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.

2)从JDK 1.4开始,HashMap类使用一个大小为2的幂的数组;在此之前,它使用2 ^ N + 1,我认为这是N <= 32的素数。这不会加速数组索引本身,但允许使用按位AND而不是除法来计算数组索引,如Neil Coffey所述。就个人而言,我认为这是过早的优化,但鉴于HashMap的作者列表,我会假设有一些真正的好处。

#4

In general the hash code cannot be unique, as there are more values than possible hash codes (integers). A good hash code distributes the values well over the integers. A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.

通常,哈希码不能是唯一的,因为存在比可能的哈希码(整数)更多的值。良好的哈希代码可以很好地分配整数值。一个坏的可能总是给出相同的值并且仍然在逻辑上是正确的,它只会导致无法接受的低效哈希表。

Equal values must have the same hash value for hash tables to work correctly. Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it. Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.

等值必须具有相同的哈希值才能使哈希表正常工作。否则,您可以将一个键添加到哈希表中,然后尝试通过具有不同哈希码的相等值查找它,而不是找到它。或者,您可以使用不同的哈希代码放置相等的值,并在哈希表中的不同位置具有两个相等的值。

In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.

实际上,您通常会在hashCode()和equals()方法中选择要考虑的字段子集。

#5

I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).

我觉得你误会了。哈希码不必对每个对象都是唯一的(毕竟,它是一个哈希码),尽管你显然不希望它对所有对象都是相同的。但是,你需要它与所有相同的对象相同,否则像标准集合这样的东西将不起作用(例如,你在哈希集中查找某些内容但却找不到它)。

For straightforward attributes, some IDEs have hashcode function builders.

对于简单的属性,某些IDE具有哈希码功能构建器。

If you don't use IDEs, consider using Apahce Commons and the class HashCodeBuilder

如果您不使用IDE,请考虑使用Apahce Commons和HashCodeBuilder类

#6

The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.

hashCode唯一的合同义务是使其保持一致。用于创建hashCode值的字段必须与equals方法中使用的字段相同或是其子集。这意味着返回0表示所有值都有效,但效率不高。

One can check if hashCode is consistent via a unit test. I written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing if the hashCode is efficient.

可以通过单元测试检查hashCode是否一致。我写了一个名为EqualityTestCase的抽象类,它执行一些hashCode检查。一个人只需要扩展测试用例并实现两个或三个工厂方法。如果hashCode有效,测试会做一个非常粗略的测试工作。

#7

This is what documentation tells us as for hash code method

这就是文档告诉我们的哈希码方法

@ javadoc

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

每当在Java应用程序的执行期间多次在同一对象上调用它时,hashCode方法必须始终返回相同的整数,前提是不修改对象的equals比较中使用的信息。从应用程序的一次执行到同一应用程序的另一次执行,该整数不需要保持一致。

#8

There is a notion of business key, which determines uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. vehicle in a fleet system) should have a business key, which is represented by one or more class fields. Methods equals() and hasCode() should both be implemented using the fields, which make up a business key. This ensures that both methods consistent with each other.

有一个业务键的概念,它决定了相同类型的单独实例的唯一性。为目标域(例如车队系统中的车辆)建模单独实体的每种特定类型(类)应具有业务密钥,该业务密钥由一个或多个类字段表示。方法equals()和hasCode()都应该使用组成业务键的字段来实现。这确保了两种方法彼此一致。