hashCode: thoughts prompted by an experiment

Posted by 小明 in 文化 on 2017-08-03

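An interesting experiment

Explanation:

After swapping the order of the two output statements, the printed result did not change, which is puzzling.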
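The listing for the experiment is not reproduced in this text. As a minimal sketch of the kind of program being described, assuming two int arrays named arr1 and arr2 (the names used in the discussion further down; the class name and the array length are made up for illustration):

```java
public class HashCodeExperiment {
    public static void main(String[] args) {
        // Hypothetical reconstruction: the array length is arbitrary.
        int[] arr1 = new int[3];
        int[] arr2 = new int[3];

        // Try swapping the next two lines and compare the output of both versions.
        System.out.println(arr1);
        System.out.println(arr2);
    }
}
```

Each line printed has the form [I@ followed by a hexadecimal number; the question is why the first and second printed lines stay the same after the swap.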

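Thoughts:

System.out.println(o) passes o to String.valueOf(o), which calls o.toString() when o is not null, so what actually gets printed is the result of o.toString().

Now look at the source of Object.toString(). Clearly, the value after the @ in the printed output is the hexadecimal representation of hashCode.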

public String toString() {
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
}
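To check this reading of toString(), here is a small sketch that rebuilds the same text by hand and prints it next to the normal output. The class and method names are made up for illustration; System.identityHashCode is used because arrays do not override hashCode().

```java
public class ToStringCheck {
    // Rebuilds the default Object.toString() text by hand.
    public static String manual(Object o) {
        return o.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        int[] arr = new int[3];
        System.out.println(arr);         // printed via String.valueOf(arr), i.e. arr.toString()
        System.out.println(manual(arr)); // the same text, assembled manually
    }
}
```

Both lines should be identical, which confirms that the part after @ is just the hash code in hexadecimal.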

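So the puzzle comes down to hashCode(). Many people say it is the object's memory address, but the output is the same on every run of this code, which makes that hard to believe.

What now? Keep digging until the real culprit is found.

What exactly is hashCode?

Here, hashCode means the value returned by the default java.lang.Object.hashCode() or by java.lang.System.identityHashCode(obj). It serves as an identity marker for an object. The official name is identity hash code.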

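What are its properties?

1. An object's identity hash code never changes during the object's lifetime;

2. If a == b, then their System.identityHashCode() values must be equal;

3. If their System.identityHashCode() values differ, they cannot be the same object (a statement and its contrapositive are always both true or both false);

4. If their System.identityHashCode() values are equal, that does not guarantee a == b (it is only a hash value, so collisions are allowed).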
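A short sketch of these properties in code. The class name is made up for illustration, and the String lines are an added contrast showing that an overridden hashCode() follows equals() rather than identity.

```java
public class IdentityHashProperties {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = a;                 // second reference to the same object
        String s1 = new String("hi");
        String s2 = new String("hi"); // equal content, different object

        // Stable for the lifetime of the object.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(a)); // true

        // a == b implies equal identity hash codes.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b)); // true

        // String overrides hashCode(), so equal content gives equal hashCode()
        // even though s1 and s2 are different objects.
        System.out.println(s1.hashCode() == s2.hashCode()); // true
        System.out.println(s1 == s2);                       // false
    }
}
```

The identity hash codes of s1 and s2 are deliberately not compared: they will usually differ, but nothing guarantees it.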

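What is it used for?

Speeding up duplicate detection: from property 2 (through its contrapositive) we know that two objects with different hash codes cannot be the same object. A single == comparison is already cheap, but comparing a new object against every element of a large collection is not. Hash-based containers use the hash code to jump straight to a small group of candidates and only compare within that group, which is why it is used for duplicate checks most of the time.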
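As an illustration of identity-based duplicate removal, the following sketch uses java.util.IdentityHashMap, which hashes with System.identityHashCode and compares keys with ==. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentityDedup {
    // Keeps the first occurrence of each distinct object, compared by identity (==).
    public static <T> List<T> dedupByIdentity(List<T> items) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (seen.add(item)) { // add() returns false if this exact object was seen before
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String a = new String("x");
        String b = new String("x"); // a.equals(b), but a != b
        List<String> in = new ArrayList<>();
        in.add(a);
        in.add(b);
        in.add(a);
        System.out.println(dedupByIdentity(in).size()); // 2
    }
}
```

An ordinary HashSet would treat a and b as one element because String overrides equals() and hashCode(); the identity-based set keeps both.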

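How is it computed?

Source of hashCode():

public native int hashCode();

Unfortunately it is a native method, so the actual implementation lives inside the JVM rather than in Java source.

A search online shows that the JDK source is written in four languages: C++, Java, C and assembly. The main body of the JVM is written in C++, the JNI part in C, the utility classes in Java, and there is some assembly mixed into the JVM.

The JDK source also includes the implementation of the native methods, which can be found in src/share/vm/prims/jvm.h and src/share/vm/prims/jvm.cpp. The entry point there, JVM_IHashCode, delegates to ObjectSynchronizer::FastHashCode in src/share/vm/runtime/synchronizer.cpp, which is where the function below is defined.

The function to focus on is this one: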

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:
//

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask ;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value ;
}

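The function offers six ways of generating the value. Which one is used depends on the value of a variable named hashCode.

0 - Use a Park-Miller pseudo-random number generator (unrelated to the address)

1 - XOR the address bits with a random number (the address is part of the input)

2 - Always return the constant 1 as the identity hash code of every object (unrelated to the address)

3 - Use a global increasing sequence (unrelated to the address)

4 - Use the object's current address as its identity hash code (it simply is the current address)

5 - Use Marsaglia's xor-shift random number generation with thread-local state (unrelated to the address)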
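To make mode 5 more concrete, here is a rough Java port of the xor-shift branch. It is a sketch, not the VM's code: the class name, the seed values and the 31-bit mask standing in for markOopDesc::hash_mask are assumptions, and >>> replaces the C++ unsigned right shift.

```java
public class XorShiftSketch {
    // Stand-ins for the per-thread fields _hashStateX.._hashStateW.
    private int x, y, z, w;

    public XorShiftSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    // One step of the generator; note that the object's address is never an input.
    public int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & 0x7FFFFFFF;         // assumed mask width
        return value == 0 ? 0xBAD : value;  // same zero fallback as the C++ code
    }

    public static void main(String[] args) {
        // Arbitrary seeds, chosen only for the demonstration.
        XorShiftSketch a = new XorShiftSketch(1, 2, 3, 4);
        XorShiftSketch b = new XorShiftSketch(1, 2, 3, 4);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(a.next()) + " " + Integer.toHexString(b.next()));
        }
    }
}
```

Two generators started from the same state produce the same sequence, which is the behaviour that matters for the experiment: the values depend on the generator state and on how many values have been taken so far, not on which object asks.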

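Which method does the VM actually use?

Default in JDK 8 and JDK 9:

product(intx, hashCode, 5, "(Unstable) select hashCode generation algorithm");

Default before JDK 8:

product(intx, hashCode, 0, "(Unstable) select hashCode generation algorithm");

Different JDK versions generate the value in different ways.

Note:

Although the methods differ, the two defaults (0 and 5) have one thing in common: the hashCode that Java generates has nothing to do with the object's memory address.

A little unexpected, isn't it?

Changing the generation method?

HotSpot provides a VM option that lets the user choose how the identity hash code is generated:

-XX:hashCode=N (where N is one of the values 0 to 5 listed above)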
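When experimenting with this option it helps to confirm from inside the program which value was actually passed. A minimal sketch, assuming the flag is written as -XX:hashCode=N on the command line; the class and method names are made up, and only the standard RuntimeMXBean API is used.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HashCodeFlagCheck {
    // Returns the text after "-XX:hashCode=" from the JVM arguments, or null if absent.
    public static String hashCodeFlag(List<String> jvmArgs) {
        String prefix = "-XX:hashCode=";
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                found = arg.substring(prefix.length()); // keep the last occurrence
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String mode = hashCodeFlag(jvmArgs);
        System.out.println(mode == null
                ? "hashCode flag not set, VM default in use"
                : "hashCode mode " + mode);
        System.out.println(new Object()); // the part after @ is generated under that mode
    }
}
```

Running it as java -XX:hashCode=2 HashCodeFlagCheck should, going by the list above, print an object whose hash is 1. Some newer JDKs may require additional unlock options before they accept this flag; that is worth checking against the JDK in use.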

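When is it computed?

Inside the VM, a Java object's identity hash code is computed the first time it is actually needed (for example through Object.hashCode() or System.identityHashCode()), by calling the VM function shown above. The value is then stored in the object, and every later query for the same object returns the value recorded that first time.

So it is not computed when the object is created.

This implementation has not changed in the HotSpot VM since the early development builds of JDK 6; only the default value of the hashCode option has changed.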

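Explaining the experiment's result

When the program above reaches the hashCode() call inside println, the VM sees that the object has no identity hash code yet, and only then computes and records one.

So printing arr1 first means the array referenced by arr1 gets its identity hash code first, which in the VM amounts to taking some item from a pseudo-random sequence. The following println(arr2) then computes and records the hash code of the array referenced by arr2, which is the next item of that sequence. The reverse order works the same way.

Whether println(arr1) or println(arr2) comes first, what gets printed is the same pair of adjacent items from the pseudo-random sequence the VM uses for identity hash codes, so swapping the two lines naturally gives the same output.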
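The reasoning can be modelled in a few lines of Java. This is a toy model, not VM code: a plain counter starting at 100 stands in for the pseudo-random sequence, an IdentityHashMap stands in for the value stored in the object, and all names are made up.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class LazyHashModel {
    private final Map<Object, Integer> recorded = new IdentityHashMap<>();
    private int next = 100; // stand-in for the VM's pseudo-random sequence

    // The first request for an object takes the next value; later requests return the recorded one.
    public int identityHash(Object o) {
        Integer h = recorded.get(o);
        if (h == null) {
            h = next++;
            recorded.put(o, h);
        }
        return h;
    }

    public static void main(String[] args) {
        Object arr1 = new Object();
        Object arr2 = new Object();

        LazyHashModel run1 = new LazyHashModel(); // one program run
        System.out.println(run1.identityHash(arr1) + " " + run1.identityHash(arr2)); // 100 101

        LazyHashModel run2 = new LazyHashModel(); // a second run with the prints swapped
        System.out.println(run2.identityHash(arr2) + " " + run2.identityHash(arr1)); // 100 101

        LazyHashModel run3 = new LazyHashModel(); // hashes triggered first, then prints swapped
        run3.identityHash(arr1);
        run3.identityHash(arr2);
        System.out.println(run3.identityHash(arr2) + " " + run3.identityHash(arr1)); // 101 100
    }
}
```

The first two runs print the same line even though the objects are queried in opposite order. The third run fixes each object's value before printing, so swapping the prints now swaps the output.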

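Verifying the conclusion

int[] arr0 = new int[3]; // the array length is arbitrary here
int[] arr1 = new int[3];
arr0.hashCode(); // triggers computation of arr0's identity hash code
arr1.hashCode(); // triggers computation of arr1's identity hash code
// try swapping the next two lines
System.out.println(arr0);
System.out.println(arr1);

With the hash codes fixed in advance, swapping the two println lines now swaps the output as well.

Culprit found ✌️.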
