The RandomForestClassifier in sklearn has a parameter:
oob_score : bool (default=False)
Whether to use out-of-bag samples to estimate the generalization accuracy.
The Chinese name translates literally to "out-of-bag error". This parameter means: use the OOB samples to estimate the test error.
For a more comprehensive explanation of OOB, see this answer on Stack Overflow: oob explanation
Here is my understanding:
RF samples from the original training set and then splits on a random subset of features to grow each tree. The training sample of each tree is drawn from the original training set by bootstrapping. Because bootstrapping samples with replacement, the training set varies from tree to tree and covers only part of the original training set. For the t-th tree, the samples in the original training set that did not end up in its bootstrap sample can be used to test that tree. Now consider a sample (xi, yi): collect all the trees whose bootstrap samples do not contain (xi, yi), and let them vote. The result of that vote is the OOB prediction for (xi, yi), and averaging the resulting errors over all samples gives the OOB error.
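The bootstrap mechanics above can be sketched in plain Python. This is a minimal illustration (not sklearn's implementation): each tree draws n indices with replacement, and for every sample we record which trees never drew it, i.e. which trees may vote on it out-of-bag.

```python
import random

random.seed(0)
n = 10        # size of the original training set
n_trees = 5   # number of trees in the forest

# One bootstrap sample (n draws with replacement) per tree.
bootstrap = [[random.randrange(n) for _ in range(n)] for _ in range(n_trees)]

# For each sample i, the trees that never drew i: those are the
# trees allowed to vote on i's out-of-bag prediction.
oob_trees = {i: [t for t in range(n_trees) if i not in bootstrap[t]]
             for i in range(n)}

for i, trees in oob_trees.items():
    print(f"sample {i}: out-of-bag for trees {trees}")
```

On average each sample is missing from a fraction (1 - 1/n)^n ≈ e^(-1) ≈ 36.8% of the bootstrap samples, so with enough trees every sample has some trees available to vote on it.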
This allows you to validate while training, and experience shows that:
the out-of-bag estimate is as accurate as using a test set of the same size as the training set.
The OOB error is an unbiased estimate of the test error.
To sum up, let Zi = (xi, yi).
The out-of-bag (OOB) error is the average error for each Zi calculated using predictions from the trees that do not contain Zi in their respective bootstrap sample. This allows the RandomForestClassifier to be fit and validated whilst being trained.
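In practice this is a one-flag change in sklearn. A minimal sketch on a synthetic dataset (the dataset and hyperparameters here are arbitrary choices for illustration): pass `oob_score=True` when constructing the forest, and after fitting read the OOB accuracy estimate from the `oob_score_` attribute.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# oob_score=True makes the forest compute an OOB accuracy estimate
# during fit, with no separate test set needed.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)

print("OOB accuracy estimate:", clf.oob_score_)
```

Note that `oob_score_` is only reliable with enough trees; with very few trees, some samples may never be out-of-bag, and sklearn will warn about it.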
OOB explanation on stackoverflow
sklearn OOB explanation on stackoverflow