
Description of classifier evaluation metrics in Weka
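
All of the statistics described below appear in Weka's evaluation output. As a minimal sketch of how that output is produced with the standard weka.classifiers.Evaluation API (the file name iris.arff, the J48 classifier, and the class name EvalDemo are only placeholders for this example):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EvalDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the last attribute is taken as the class.
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Evaluate a J48 decision tree with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        // Accuracy, Kappa, MAE, RMSE, relative errors, ...
        System.out.println(eval.toSummaryString());
        // Per-class TP Rate, FP Rate, Precision, Recall, F-Measure, ROC Area
        System.out.println(eval.toClassDetailsString());
        // Confusion matrix
        System.out.println(eval.toMatrixString());
    }
}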

Correlation coefficient (CC):
The correlation coefficient is the statistical correlation between the actual values A and the predicted values P. It is a real number in [-1, 1]: 1 means perfectly correlated, 0 means completely uncorrelated, and -1 means perfectly inversely correlated. For a numeric prediction model, the closer the correlation coefficient is to 1, the better the predictive ability; for the various error measures, smaller values (closer to 0) are better. The mean squared error is the most common basic measure; Weka does not report it directly, but it does report the root mean squared error.
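
As a rough, self-contained illustration (not Weka's internal code), a sketch of the correlation coefficient between actual values a and predicted values p, written as a plain Java helper method:

// Pearson correlation between actual values a and predicted values p.
static double correlationCoefficient(double[] a, double[] p) {
    int n = a.length;
    double meanA = 0, meanP = 0;
    for (int i = 0; i < n; i++) { meanA += a[i] / n; meanP += p[i] / n; }
    double cov = 0, varA = 0, varP = 0;
    for (int i = 0; i < n; i++) {
        cov  += (a[i] - meanA) * (p[i] - meanP);
        varA += (a[i] - meanA) * (a[i] - meanA);
        varP += (p[i] - meanP) * (p[i] - meanP);
    }
    return cov / Math.sqrt(varA * varP);   // value in [-1, 1]
}
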
Mean Absolute error and Root Mean squared error: the mean absolute error and the root mean squared error measure the difference between the classifier's predicted values and the actual values; the smaller they are, the better.
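
A minimal sketch of both error measures as plain Java helpers (a holds the actual values, p the predicted values):

// Mean absolute error: the average of |p[i] - a[i]|.
static double meanAbsoluteError(double[] a, double[] p) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) sum += Math.abs(p[i] - a[i]);
    return sum / a.length;
}

// Root mean squared error: the square root of the average squared error.
static double rootMeanSquaredError(double[] a, double[] p) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) sum += (p[i] - a[i]) * (p[i] - a[i]);
    return Math.sqrt(sum / a.length);
}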

Relative Absolute error and Root Relative Squared error: For example, when the actual value is 500 and the predicted value is 450, the absolute error is 50; when the actual value is 2 and the predicted value is 1.8, the absolute error is 0.2. The two numbers 50 and 0.2 differ greatly, yet both correspond to an error rate of 10%. The absolute error therefore does not always reflect the true magnitude of the error, whereas the relative error expresses the error as a proportion of a reference value and is more informative.
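
For reference, Weka reports these two values relative to a trivial predictor that always outputs the mean of the actual values, expressed as percentages. A sketch under that assumption:

// Relative absolute error: total absolute error of the predictions divided by
// the total absolute error of always predicting the mean of the actual values.
static double relativeAbsoluteError(double[] a, double[] p) {
    double mean = 0;
    for (double v : a) mean += v / a.length;
    double err = 0, base = 0;
    for (int i = 0; i < a.length; i++) {
        err  += Math.abs(p[i] - a[i]);
        base += Math.abs(mean - a[i]);
    }
    return 100.0 * err / base;              // percentage
}

// Root relative squared error: the same idea with squared errors and a square root.
static double rootRelativeSquaredError(double[] a, double[] p) {
    double mean = 0;
    for (double v : a) mean += v / a.length;
    double err = 0, base = 0;
    for (int i = 0; i < a.length; i++) {
        err  += (p[i] - a[i]) * (p[i] - a[i]);
        base += (mean - a[i]) * (mean - a[i]);
    }
    return 100.0 * Math.sqrt(err / base);   // percentage
}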


TP Rate, FP Rate:
The TP Rate (true positive rate) is the recognition rate: the probability that an instance of a class is correctly identified as belonging to that class. A high recognition rate is especially important in a medical system; if a patient is ill but is not recognized as ill, the consequences are very serious. The FP Rate (false positive rate) is the false alarm rate: the probability that an instance of some other class is mistakenly classified as belonging to this class.
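
In terms of counts for the class of interest (TP = true positives, FN = false negatives, FP = false positives, TN = true negatives), a sketch of the two rates:

// TP Rate (recognition rate): fraction of actual positives that are found.
static double tpRate(int tp, int fn) {
    return (double) tp / (tp + fn);
}

// FP Rate (false alarm rate): fraction of actual negatives wrongly labelled positive.
static double fpRate(int fp, int tn) {
    return (double) fp / (fp + tn);
}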

Precision: the proportion of the instances assigned to a class that actually belong to that class, i.e. TP / (TP + FP) (see the combined sketch after the F-Measure entry below).

Recall: the proportion of the instances of a class that are correctly identified as that class, i.e. TP / (TP + FN). Since there are no unrecognized instances in this example, Recall equals the TP Rate (also shown in the sketch after the F-Measure entry).

F-Measure:
This value combines precision and recall. In real use there is often a trade-off between precision and recall, so the F value is introduced: F = 2 * Precision * Recall / (Precision + Recall). The larger the F value, the higher both precision and recall are.
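
Putting the three measures together, a sketch using the same TP/FP/FN counts as above (the F value here is the usual F1, the harmonic mean of precision and recall, which is what Weka reports as F-Measure):

static double precision(int tp, int fp) { return (double) tp / (tp + fp); }

static double recall(int tp, int fn) { return (double) tp / (tp + fn); }

// F-Measure: harmonic mean of precision and recall.
static double fMeasure(int tp, int fp, int fn) {
    double p = precision(tp, fp), r = recall(tp, fn);
    return 2 * p * r / (p + r);
}
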
Accuracy (ACC):
Accuracy is the proportion of all instances that are classified correctly, i.e. (TP + TN) / (TP + TN + FP + FN).

ROC Area: the area under the ROC curve (AUC) is generally greater than 0.5, and the closer it is to 1, the better the model's classification performance. Roughly, 0.5-0.7 indicates low accuracy, 0.7-0.9 indicates moderate accuracy, and above 0.9 indicates high accuracy. A value of 0.5 means the classification method is completely ineffective and has no value.
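
The ROC area can also be read as the probability that a randomly chosen positive instance receives a higher predicted score than a randomly chosen negative one. A sketch that computes it by counting such pairs (posScores and negScores would be the classifier's predicted probabilities for the positive class; ties count as half):

// ROC area (AUC) by pairwise comparison of positive and negative scores.
static double rocArea(double[] posScores, double[] negScores) {
    double wins = 0;
    for (double pos : posScores) {
        for (double neg : negScores) {
            if (pos > neg) wins += 1.0;        // positive ranked above negative
            else if (pos == neg) wins += 0.5;  // tie counts as half
        }
    }
    return wins / ((double) posScores.length * negScores.length);
}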

Confusion Matrix: in the example matrix, the "7" in the first row means that seven instances of class A are correctly classified as A, and the "2" in the first row means that two instances of class A are incorrectly classified as B. The "3" in the second row means that three instances of class B are misclassified as A, and the "2" in the second row means that two instances of class B are correctly classified.
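
That example corresponds to the following 2x2 matrix; a small self-contained sketch that derives the overall accuracy from it (9 of the 14 instances lie on the diagonal):

public class ConfusionMatrixDemo {
    public static void main(String[] args) {
        // Rows are the actual class (A, B); columns are the predicted class (A, B).
        int[][] m = { {7, 2},    // 7 A's classified as A, 2 A's classified as B
                      {3, 2} };  // 3 B's classified as A, 2 B's classified as B
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];   // diagonal = correctly classified
            }
        }
        System.out.println("Accuracy = " + (double) correct / total);   // 9/14 ~ 0.643
    }
}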

Note that the correlation coefficient applies only to numeric (continuous) class attributes, while accuracy applies only to nominal (discrete) classes.
Kappa statistic:
The Kappa statistic measures how much the classifier's results differ from purely random classification. K=1 indicates perfect agreement, i.e. the classifier is as far from random classification as possible; K=0 indicates the classifier performs no better than random classification (the classifier has no effect); and K=-1 indicates the classification is worse than random. In general, the Kappa statistic is positively correlated with the classifier's AUC and accuracy, so the closer the value is to 1, the better.
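
A sketch of the Kappa statistic computed directly from a confusion matrix m (observed agreement on the diagonal versus the agreement expected by chance from the row and column totals):

// Kappa = (observed agreement - chance agreement) / (1 - chance agreement).
static double kappa(int[][] m) {
    double total = 0;
    for (int[] row : m)
        for (int v : row) total += v;
    double observed = 0, expected = 0;
    for (int i = 0; i < m.length; i++) {
        observed += m[i][i] / total;
        double rowSum = 0, colSum = 0;
        for (int j = 0; j < m.length; j++) { rowSum += m[i][j]; colSum += m[j][i]; }
        expected += (rowSum / total) * (colSum / total);
    }
    return (observed - expected) / (1 - expected);
}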

Mean Absolute error:
This indicator evaluates the average difference between the predicted values and the actual values. (In measurement terms, how closely repeated measured values agree with each other is called precision; precision is expressed by the deviation, the difference between a measured value and the mean value, and the smaller the deviation, the higher the precision.)
Root mean squared Error (RMSE):
The square root of the mean of the squared residuals, used as a numerical measure of accuracy under given conditions. The mean square error is a numerical measure of observation accuracy, also known as the "standard deviation" or "root-mean-square deviation": it is the square root of the mean of the squared true errors of a set of observations made under the same conditions. Since true errors are difficult to obtain, the corrections produced by a least-squares adjustment are usually used in place of the true errors; the value is then the square root of the sum of the squared deviations between the observed values and the true values divided by the number of observations n. The mean square error is not itself a true error; it is only a representative value for a set of true errors, reflecting the accuracy of that set of observations, which is why it is usually called the mean square error of the observations.
Coverage of cases:
The coverage of cases is the proportion of all instances that are covered by the rules produced by the classifier; the higher the percentage, the more effective the rule set.