一種好用的樹結構：Trie樹

作者：由致Great 發表于體育時間：2022-05-02

Trie樹簡介

在計算機科學中，trie，又稱字首樹或字典樹，是一種有序樹，用於儲存關聯陣列，其中的鍵通常是字串。與二叉查詢樹不同，鍵不是直接儲存在節點中，而是由節點在樹中的位置決定。一個節點的所有子孫都有相同的字首，也就是這個節點對應的字串，而根節點對應空字串。一般情況下，不是所有的節點都有對應的值，只有葉子節點和部分內部節點所對應的鍵才有相關的值。

Trie這個術語來自於retrieval。根據詞源學，trie的發明者Edward Fredkin把它讀作/ˈtriː/ “tree”。但是，其他作者把它讀作/ˈtraɪ/ “try”。

在圖示中，鍵標註在節點中，值標註在節點之下。每一個完整的英文單詞對應一個特定的整數。Trie可以看作是一個確定有限狀態自動機，儘管邊上的符號一般是隱含在分支的順序中的。 Eg。一個儲存了8個單詞的字典樹的結構如下圖所示，8個單詞分別是：“A”，“to”，“tea”，“ted”，“ten”，“i” ，“in”，“inn”。

另外，單詞查詢樹，Trie樹，是一種樹形結構，是一種雜湊樹的變種。典型應用是用於統計，排序和儲存大量的字串（但不僅限於字串），所以經常被搜尋引擎系統用於文字詞頻統計。它的優點是：利用字串的公共字首來減少查詢時間，最大限度地減少無謂的字串比較，查詢效率比雜湊樹高。

Trie樹性質

它有3個基本性質：

根節點不包含字元，除根節點外每一個節點都只包含一個字元；

從根節點到某一節點，路徑上經過的字元連線起來，為該節點對應的字串；

每個節點的所有子節點包含的字元都不相同。

基本操作

其基本操作有：查詢、插入和刪除，當然刪除操作比較少見。

實現方法

搜尋字典專案的方法為：

（1）從根結點開始一次搜尋；

（2）取得要查詢關鍵詞的第一個字母，並根據該字母選擇對應的子樹並轉到該子樹繼續進行檢索；

（3）在相應的子樹上，取得要查詢關鍵詞的第二個字母，並進一步選擇對應的子樹進行檢索。

（4）迭代過程……

（5）在某個結點處，關鍵詞的所有字母已被取出，則讀取附在該結點上的資訊，即完成查詢。其他操作類似處理

實現 Trie (字首樹)

關於Trie樹實現，可以移步看下LeetCode

208. 實現 Trie (字首樹)

輸入

［“Trie”， “insert”， “search”， “search”， “startsWith”， “insert”， “search”］

［［］，［“apple”］，［“apple”］，［“app”］，［“app”］，［“app”］，［“app”］］

輸出

［null， null， true， false， true， null， true］

解釋

Trie trie = new Trie（）；

trie。insert（“apple”）；

trie。search（“apple”）； // 返回 True

trie。search（“app”）； // 返回 False

trie。startsWith（“app”）； // 返回 True

trie。insert（“app”）；

trie。search（“app”）； // 返回 True

連結：https：//leetcode-cn。com/problems/implement-trie-prefix-tree

具體實現如下：

class TrieNode（object）：

def __init__（self）：

“”“

Initialize your data structure here。

”“”

self。data = {}

self。is_word = False

class Trie：

def __init__（self）：

“”“

Initialize your data structure here。

”“”

self。root = TrieNode（）

def insert（self， word）：

“”“

Inserts a word into the trie。

：type word： str

：rtype： void

”“”

node = self。root

for chars in word：

child = node。data。get（chars）

if not child：

node。data［chars］ = TrieNode（）

node = node。data［chars］

node。is_word = True

def search（self， word）：

“”“

Returns if the word is in the trie。

：type word： str

：rtype： bool

”“”

node = self。root

for chars in word：

node = node。data。get（chars）

if not node：

return False

return node。is_word # 判斷單詞是否是完整的存在在trie樹中

def startsWith（self， prefix）：

“”“

Returns if there is any word in the trie that starts with the given prefix。

：type prefix： str

：rtype： bool

”“”

node = self。root

for chars in prefix：

node = node。data。get（chars）

if not node：

return False

return True

def get_start（self， prefix）：

“”“

Returns words started with prefix

返回以prefix開頭的所有words

如果prefix是一個word，那麼直接返回該prefix

：param prefix：

：return： words （list）

”“”

def get_key（pre， pre_node）：

word_list = ［］

if pre_node。is_word：

word_list。append（pre）

for x in pre_node。data。keys（）：

word_list。extend（get_key（pre + str（x）， pre_node。data。get（x）））

return word_list

words = ［］

if not self。startsWith（prefix）：

return words

if self。search（prefix）：

words。append（prefix）

return words

node = self。root

for chars in prefix：

node = node。data。get（chars）

return get_key（prefix， node）

if __name__ == ‘__main__’：

trie = Trie（）

print（‘trie。insert（“apple”）：’， trie。insert（“apple”））

print（‘trie。insert（“appal”）：’， trie。insert（“appal”））

print（‘trie。insert（“appear”）：’， trie。insert（“appear”））

print（‘trie。insert（“apply”）：’， trie。insert（“apply”））

print（‘trie。insert（“appulse”）：’， trie。insert（“appulse”））

print（‘trie。search（“apple”）：’， trie。search（“apple”）） # 返回 True

print（‘trie。search（“app”）：’， trie。search（“app”）） # 返回 False

print（‘trie。startsWith（“app”）：’， trie。startsWith（“app”）） # 返回 True

print（‘trie。insert（“app”）：’， trie。insert（“app”））

print（‘trie。search（“app”）：’， trie。search（“app”））

print（‘trie。search（“app”）：’， trie。get_start（“app”））

print（‘trie。search（“ap”）：’， trie。get_start（‘ap’））

結果輸出如下：

F：\ProgramData\Anaconda3\python。exe F：/Projects/nlp-trie/main。py

trie。insert（“apple”）： None

trie。insert（“appal”）： None

trie。insert（“appear”）： None

trie。insert（“apply”）： None

trie。insert（“appulse”）： None

trie。search（“apple”）： True

trie。search（“app”）： False

trie。startsWith（“app”）： True

trie。insert（“app”）： None

trie。search（“app”）： True

trie。search（“app”）：［‘app’］

trie。search（“ap”）：［‘app’， ‘apple’， ‘apply’， ‘appal’， ‘appear’， ‘appulse’］

Process finished with exit code 0

應用

輸入框提示/自動補全：trie 常用於搜尋提示。如當輸入一個網址，可以自動搜尋出可能的選擇。當沒有完全匹配的搜尋結果，可以返回字首最相似的可能。

字串檢索、模糊匹配

文字預測、自動完成，see also，拼寫檢查

在NLP中的應用，主要有基於字典樹的文字分詞、短語提取、實體提取等

優缺點

優點：

可以最大限度地減少無謂的字串比較，故可以用於詞頻統計和大量字串排序。跟雜湊表比較：

最壞情況時間複雜度比hash表好

沒有衝突，除非一個key對應多個值（除key外的其他資訊）

自帶排序功能（類似Radix Sort），中序遍歷trie可以得到排序。

缺點：

雖然不同單詞共享字首，但其實trie是一個以空間換時間的演算法。其每一個字元都可能包含至多字符集大小數目的指標。

如果資料儲存在外部儲存器等較慢位置，Trie會較hash速度慢（hash訪問O（1）次外存，Trie訪問O（樹高））。

長的浮點數等會讓鏈變得很長。可用bitwise trie改進。

時間複雜度

時間複雜度：建立時間複雜度為O（L），查詢時間複雜度是O（logL），查詢時間複雜度最壞情況下是O（L），L是字串的長度。

參考資料

字典樹百度百科

Trie wikipedia

Trie Python實現

Trie樹學習及python實現

python 實現 trie(字典) 樹

Trie樹（字典樹）詳細知識點及其應用

標簽： trie APP insert node Search

上一篇:寵滴出行怎麼入駐

下一篇：雞蛋炒飯怎麼炒好吃？有沒有最好的方法

一種好用的樹結構：Trie樹

猜你喜歡

前端技術選型參考

elementel-tree 樹圖顏色配置

DLA：一種深度網路特徵融合方法

gparallel：一個基於DAG的任務排程框架

Stata: 外部命令的搜尋、安裝與使用