How to find a species' position in the NCBI Taxonomy from its Latin name
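Problem description:
I want to know which order, family, and genus a species is assigned to in the NCBI taxonomy. For a single species I can search the NCBI website by hand, but how can this be done when there are many species?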
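I remembered seeing a similar feature while reading the ete3 documentation: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html. I may need it soon, so I am recording the code I used here. (First, install ete3: my Windows 10 machine has Anaconda3 installed, so running
pip install ete3
in a command prompt window is all it takes.)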
Single species
Taking pomegranate (Punica granatum) as an example:
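from ete3 import NCBITaxa

# Instantiate the class; the first run downloads the NCBI taxonomy dump
# and builds a local database, which takes a while.
ncbi = NCBITaxa()

# Map the species name to its taxid(s): {name: [taxid, ...]}
name2taxid = ncbi.get_name_translator(["Punica granatum"])
for name, taxids in name2taxid.items():
    # Taxids from the root down to the species
    lineage = ncbi.get_lineage(taxids[0])
    names = ncbi.get_taxid_translator(lineage)
    for taxid in lineage:
        print(names[taxid])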
Output:
root
cellular organisms
Eukaryota
Viridiplantae
Streptophyta
Streptophytina
Embryophyta
Tracheophyta
Euphyllophyta
Spermatophyta
Magnoliophyta
Mesangiospermae
eudicotyledons
Gunneridae
Pentapetalae
rosids
malvids
Myrtales
Lythraceae
Punica
Punica granatum
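The listing above gives names only, without saying which entry is the order, the family, or the genus. Here is a minimal sketch of one way to pull those out. ete3's NCBITaxa.get_rank() returns a {taxid: rank} dictionary, which can be combined with the name dictionary. The helper names pick_ranks and lookup_ranks are my own. The taxids in the demo dictionaries are made-up placeholders so the helper can be tried without the database.

```python
def pick_ranks(lineage, names, ranks, wanted=("order", "family", "genus")):
    """Return {rank: name} for the wanted ranks found in a lineage.

    lineage: list of taxids, root first (as from get_lineage)
    names:   {taxid: scientific name} (as from get_taxid_translator)
    ranks:   {taxid: rank} (as from get_rank)
    Ranks that do not occur in the lineage map to None.
    """
    found = dict.fromkeys(wanted)
    for taxid in lineage:
        rank = ranks.get(taxid)
        if rank in found:
            found[rank] = names[taxid]
    return found


def lookup_ranks(species_name, wanted=("order", "family", "genus")):
    """Query the local ete3 database for one species (needs ete3 installed)."""
    from ete3 import NCBITaxa  # imported here so pick_ranks works without ete3

    ncbi = NCBITaxa()
    name2taxid = ncbi.get_name_translator([species_name])
    if not name2taxid:
        return None  # name not in the local database
    taxid = next(iter(name2taxid.values()))[0]
    lineage = ncbi.get_lineage(taxid)
    return pick_ranks(
        lineage,
        ncbi.get_taxid_translator(lineage),
        ncbi.get_rank(lineage),
        wanted,
    )


# Demo with placeholder taxids (1-5 are NOT real NCBI taxids)
demo_lineage = [1, 2, 3, 4, 5]
demo_names = {1: "root", 2: "Myrtales", 3: "Lythraceae", 4: "Punica", 5: "Punica granatum"}
demo_ranks = {1: "no rank", 2: "order", 3: "family", 4: "genus", 5: "species"}
print(pick_ranks(demo_lineage, demo_names, demo_ranks))
# {'order': 'Myrtales', 'family': 'Lythraceae', 'genus': 'Punica'}
```

With ete3 installed and the database built, lookup_ranks("Punica granatum") should return the same kind of dictionary straight from the local database.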
Multiple species
Put the Latin names in a text file, one per line:
Lumnitzera littorea
Punica granatum
Heimia myrtifolia
Sonneratia alba
Epilobium ulleungensis
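Hand-edited name lists often contain blank lines, stray whitespace, or duplicates. Below is a small sketch for tidying the list before the lookup. The function name read_species_names is my own, and treating lines that start with # as comments is an assumption, not something the lookup requires.

```python
import io


def read_species_names(lines):
    """Return cleaned species names from an iterable of text lines.

    Blank lines and lines starting with '#' are skipped, runs of
    whitespace are collapsed to one space, and duplicates are dropped
    while keeping the first-seen order.
    """
    seen = set()
    cleaned = []
    for line in lines:
        name = " ".join(line.split())
        if not name or name.startswith("#"):
            continue
        if name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


# Demo on an in-memory file; with a real file, pass the open file object
sample = io.StringIO("Lumnitzera littorea\n\n  Punica   granatum \nPunica granatum\n")
print(read_species_names(sample))
# ['Lumnitzera littorea', 'Punica granatum']
```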
Code
import sys
from ete3 import NCBITaxa

input_file = sys.argv[1]
output_file = sys.argv[2]

ncbi = NCBITaxa()

with open(input_file, "r") as fr, open(output_file, "w") as fw:
    for line in fr:
        species_name = line.strip()
        if not species_name:
            continue  # skip blank lines
        name2taxid = ncbi.get_name_translator([species_name])
        if not name2taxid:
            # Name not in the local NCBI taxonomy database
            print(species_name + ":", "not found")
            continue
        for name, taxids in name2taxid.items():
            lineage = ncbi.get_lineage(taxids[0])
            names = ncbi.get_taxid_translator(lineage)
            # One comma-separated line per species, root first
            fw.write(",".join(names[taxid] for taxid in lineage) + "\n")
        print(species_name + ":", "OK")
# Usage
python .\get_species_placement_in_NCBI.py .\Organism_name.txt placement.txt
# Output
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Combretaceae,Lumnitzera,Lumnitzera littorea
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Punica,Punica granatum
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Heimia,Heimia myrtifolia
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Sonneratia,Sonneratia alba
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Onagraceae,Onagroideae,Epilobieae,Epilobium,Epilobium ulleungensis
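Because lineages differ in depth (the Epilobium line above has two more fields than the others), the family or genus does not sit in a fixed column of this output. One possible alternative, sketched here, is to write a fixed set of columns chosen by rank. It assumes each species has already been reduced to a {rank: name} dictionary, for example by combining ete3's get_rank() with get_taxid_translator(). The function write_rank_table and the column choice are hypothetical.

```python
import csv
import io

COLUMNS = ("species", "order", "family", "genus")


def write_rank_table(records, handle, columns=COLUMNS):
    """Write one CSV row per species with the same columns in every row.

    records: iterable of (species_name, {rank: name}) pairs
    handle:  a writable text file object
    Ranks missing from a record are written as empty cells.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(columns)
    for species, rank2name in records:
        row = [species] + [rank2name.get(rank) or "" for rank in columns[1:]]
        writer.writerow(row)


# Demo rows taken from the output above
records = [
    ("Punica granatum", {"order": "Myrtales", "family": "Lythraceae", "genus": "Punica"}),
    ("Epilobium ulleungensis", {"order": "Myrtales", "family": "Onagraceae", "genus": "Epilobium"}),
]
buffer = io.StringIO()
write_rank_table(records, buffer)
print(buffer.getvalue(), end="")
# species,order,family,genus
# Punica granatum,Myrtales,Lythraceae,Punica
# Epilobium ulleungensis,Myrtales,Onagraceae,Epilobium
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the handle.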
Feel free to follow my WeChat official account: 小明的資料分析筆記本