The root cause lies in the incorrect use of fit, transform and fit_transform.
First, make it clear that the incoming parameters can be possible causes of series
errors:

When using TfidfVectorizer(), transform after fit_transform

``````def tfidf(X_train):
tfidf_vec = TfidfVectorizer()
X_train_tfidf = tfidf_vec.fit_transform(X_train)
X_train_tfidf = tfidf_vec.transform(X_train_tfidf)
return X_train_tfidf``````

Here the result of fit_transform is passed to transform again, and the transform is repeated

The correct way to use it should be:

``````def tfidf(X_train,X_test):
tfidf_vec = TfidfVectorizer()
X_train_tfidf = tfidf_vec.fit_transform(X_train)
X_test_tfidf = tfidf_vec.transform(X_test)
return X_train_tfidf,X_test_tfidf``````

fit_transform into the training set, transform into the test set

Or you can fit first and then transform

``````def tfidf(data):
tfidf_vec = TfidfVectorizer()
tfidf_model = tfidf_vec.fit(data)
print(tfidf_model.dtype)
X_tfidf = tfidf_model.transform(data)
return X_tfidf``````

You can also use CountVectorizer() and TfidfTransformer()

``````def tfidf(data):
vectorizer = CountVectorizer() # This class will transform the words in the text into a word frequency matrix, and the matrix element a[i][j] represents the word frequency of word j under class i text
transformer = TfidfTransformer() # This class will count the tf-idf weights of each word
tfidf_before = vectorizer.fit_transform(data)
tfidf = transformer.fit_transform(tfidf_before)
return tfidf``````

Note that the first type is divided into training set and test set. It needs to be divided into training set and test set before
inputting parameters. The latter two need to extract tfidf features and then divide training set and test set,
otherwise it may cause training set and test set. The feature dimensions of the set are different, and they are not transformed under the same standard