tf_idf python

山上樵夫 | 2020-12-09 15:47:40

https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/363018/
Copied from the link above; keeping it here as a note.




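import math
from collections import Counter

# Term frequency: share of the document's tokens that are `word`.
def tf(word, count):
    return count[word] / sum(count.values())

# Number of documents (Counters) that contain `word`.
def n_containing(word, count_list):
    return sum(1 for count in count_list if word in count)

# Inverse document frequency; the 1+ avoids division by zero for unseen words.
def idf(word, count_list):
    return math.log(len(count_list) / (1 + n_containing(word, count_list)))

# TF-IDF score of `word` in one document, relative to the whole collection.
def tfidf(word, count, count_list):
    return tf(word, count) * idf(word, count_list)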
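
Written out as formulas, the functions above appear to compute the following. The symbols are my own labels, not from the original code: f_{t,d} is the count of term t in document d, N is the number of documents, and df_t is the number of documents containing t.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{1 + \mathrm{df}_t}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Because of the 1+ in the denominator, a term that occurs in every document gets a slightly negative idf, and a term that occurs in all but one document gets exactly 0.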


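count = Counter(elements)  # elements: a list full of items, e.g. one document's tokens (placeholder name)
count_list = [Counter(elements) for elements in documents]  # a list full of Counters, one per document (documents is a placeholder name)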
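
To see how the pieces fit together, here is a minimal sketch that builds `count_list` from a tiny corpus and scores the words of the first document. The corpus `docs` and the variable names `first`, `scores`, and `ranked` are made up for illustration; the four functions are repeated so the snippet runs on its own.

```python
import math
from collections import Counter

def tf(word, count):
    return count[word] / sum(count.values())

def n_containing(word, count_list):
    return sum(1 for count in count_list if word in count)

def idf(word, count_list):
    return math.log(len(count_list) / (1 + n_containing(word, count_list)))

def tfidf(word, count, count_list):
    return tf(word, count) * idf(word, count_list)

# Hypothetical toy corpus: each document is already split into tokens.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "durian", "durian"],
    ["egg", "fig"],
]

# One Counter per document.
count_list = [Counter(doc) for doc in docs]

# Score every distinct word of the first document, highest score first.
first = count_list[0]
scores = {word: tfidf(word, first, count_list) for word in first}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for word, score in ranked:
    print(f"{word}: {score:.4f}")
```

In this toy corpus "apple" ranks above "banana": it is more frequent inside the first document and appears in fewer documents overall.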



For reference:
https://docs.python.org/zh-tw/3/library/collections.html#counter-objects
type(count)
-----------> collections.Counter
Counter is a subclass of dict that works as a multiset (bag): it maps each element to its count.
Available methods:
clear
copy
elements
fromkeys (inherited from dict; not implemented for Counter, calling it raises NotImplementedError)
get
items
keys
most_common
pop
popitem
setdefault
subtract
update
values
No need to memorize these; just use tab completion.
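
A short sketch of a few of the methods listed above, in case it helps to see them in action. The token list and the variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical token list, used only for illustration.
count = Counter(["a", "b", "a", "c", "a", "b"])

# Counter is a dict subclass, so dict-style access works.
is_dict = isinstance(count, dict)

# most_common(n) returns the n most frequent elements with their counts.
top_two = count.most_common(2)

# A missing key yields 0 instead of raising KeyError, and is not inserted.
missing = count["zzz"]

# elements() repeats each element as many times as its count.
expanded = sorted(count.elements())

# update() adds counts, subtract() removes them.
count.update(["c", "c"])
count.subtract(["a"])

print(is_dict, top_two, missing, expanded)
print(count)
```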

https://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html

Reference for tf_idf


