1. Decode Issues
1. Error Messages: UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xd7 in position 1
2. Error Codes:
Insert # -*- coding: utf8 -*-
def write_review(review, stop_words_address, stop_address, no_stop_address):
"""Write out the processed data"""
no_stop_review = remove_stop_words(review, stop_words_address)
with open(no_stop_address, 'a') as f:
for row in no_stop_review:
f.write(row[-1])
f.write("\n")
f.close()
if __name__ == '__main__':
segment_text = cut_words('audito_whole.csv')
review_text = remove_punctuation(segment_text)
write_review(review_text, 'stop_words.txt', 'no_stop.txt', 'stop.txt')
# Model Training Master Program
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences_1 = word2vec.LineSentence('no_stop.txt')
model_1 = word2vec.Word2Vec(sentences_1)
# model.wv.save_word2vec_format('test_01.model.txt', 'test_01.vocab.txt', binary=False) # 保存模型,后面可直接调用
# model = word2vec.Word2Vec.load("test_01.model") # Calling the model
# Calculate the list of related words for a word
a_1 = model_1.wv.most_similar(u"Kongjian", topn=20)
print(a_1)
# Calculate the correlation of two words
b_1 = model_1.wv.similarity(u"Kongjian", u"Houzuo")
print(b_1)
3. Error parsing: this error means that the character encoded as 0xd7 cannot be parsed during decoding. This is because the encoding method is not specified when saving the text, so the GBK encoding may be used when saving the text
4. Solution: when writing out the data, point out that the coding format is UTF-8, as shown in the following figure
def write_review(review, stop_words_address, stop_address, no_stop_address):
"""Write out the processed data"""
with open(stop_address, 'a', encoding='utf-8') as f:
for row in review:
f.write(row[-1])
f.write("\n")
f.close()
no_stop_review = remove_stop_words(review, stop_words_address)
with open(no_stop_address, 'a', encoding='utf-8') as f:
for row in no_stop_review:
f.write(row[-1])
f.write("\n")
f.close()
5. Expansion – why did this mistake happen
first of all, we click the source code file of the error report, as shown in the following figure:
open utils, and we find the following code
it can be seen that the encoding format set by word2vec is UTF-8. If it is not UTF-8 during decoding, errors = ‘strict’ will be triggered, and then UnicodeDecodeError will be reported
2. Attribute error
1 Error reporting: attributeerror: ‘word2vec’ object has no attribute ‘most_similar’
2. Wrong source code:
# Model Training Master Program
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences_1 = word2vec.LineSentence('no_stop.txt')
model_1 = word2vec.Word2Vec(sentences_1)
# model.wv.save_word2vec_format('test_01.model.txt', 'test_01.vocab.txt', binary=False) # 保存模型,后面可直接调用
# model = word2vec.Word2Vec.load("test_01.model") # Calling the model
# Calculate the list of related words for a word
a_1 = model_1.most_similar(u"Kongjian", topn=20)
print(a_1)
**3. Analysis of error reports: * * error reports refer to attribute errors because the source code structure has been updated in the new word2vec. See the official website for instructions as follows:
**4. Solution: * * replace the called object as follows:
# Model Training Master Program
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences_1 = word2vec.LineSentence('no_stop.txt')
model_1 = word2vec.Word2Vec(sentences_1)
# model.wv.save_word2vec_format('test_01.model.txt', 'test_01.vocab.txt', binary=False) # 保存模型,后面可直接调用
# model = word2vec.Word2Vec.load("test_01.model") # Calling the model
# Calculate the list of related words for a word
a_1 = model_1.wv.most_similar(u"Kongjian", topn=20)
print(a_1)
5. Why is model_1.wv.most_similar(), I didn’t find the statement about WV definition in word2vec. Does anyone know?
Read More:
- How to Solve Python AttributeError: ‘module’ object has no attribute ‘xxx’
- How to Solve Pytorch DataLoader Loading Error: UnicodeDecodeError: ‘utf-8‘ codec can‘t decode byte 0xe5 in position 1023
- Python: How to parses HTML, extracts data, and generates word documents
- How to Solve attributeerror: ‘list’ object has no attribute ‘shape‘
- How to Solve Python AttributeError: ‘dict’ object has no attribute ‘item’
- How to Solve PyInstaller Package Error: ModuleNotFoundError: No module named ‘xxxx‘
- How to Solve pyqt5 imports module Error
- Python: How to Solve multiprocessing module Error in Windows
- How to Solve no module fcntl Error in Windows
- [Mac M1] How to Solve import wordcloud Error: ModuleNotFoundError: No module named ‘wordcloud‘
- How to Solve wikiextractor Extract Wikipedia Corpus Error
- How to Solve Error: Error no module “Tkinter”
- [Python] Right-click Selenium to Save the picture error: attributeerror: solution to module ‘pyscreen’ has no attribute ‘locationonwindow’
- [Solved] AttributeError: module ‘logging‘ has no attribute ‘Handler‘
- [Solved] UnicodeDecodeError: ‘utf-8‘ codec can‘t decode byte 0xbf in position 7: invalid start byte
- [Solved] AttributeError: module ‘PIL.Image‘ has no attribute ‘open‘
- [Perfectly Solved] attributeerror: module ‘SciPy. Misc’ has no attribute ‘toimage’ error
- from keras.preprocessing.text import Tokenizer error: AttributeError: module ‘tensorflow.compat.v2‘ has..
- Python AttributeError: module ‘tensorflow‘ has no attribute ‘InteractiveSession‘
- [Solved] AttributeError: module ‘tensorflow._api.v2.train‘ has no attribute ‘AdampOptimizer‘