The RandomForestClassifier in scikit-learn has the following parameter:
oob_score : bool (default=False)
Whether to use out-of-bag samples to estimate the generalization accuracy.
The quantity being estimated is usually called the out-of-bag (OOB) error. Setting this parameter to True means the OOB samples are used to estimate the test error.
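As a minimal sketch of how the parameter is used, the example below fits a forest with oob_score=True and reads the fitted oob_score_ attribute. The synthetic dataset and the hyperparameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True only works with bootstrap=True, which is the default.
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

# oob_score_ is the accuracy measured on the out-of-bag samples.
oob_accuracy = clf.oob_score_
oob_error = 1.0 - oob_accuracy
print(f"OOB accuracy: {oob_accuracy:.3f}, OOB error: {oob_error:.3f}")
```

Note that oob_score_ is an accuracy, so the OOB error is one minus that value.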
For a more comprehensive explanation of OOB, see the Stack Overflow answers listed in the references.
Here is my understanding:
A random forest trains each tree on a bootstrap sample of the original training set, and considers a random subset of the features at each split. Because bootstrapping draws samples with replacement, the training set differs from tree to tree and covers only part of the original training set. When N samples are drawn with replacement from a training set of size N, about 63.2% of the distinct samples are included on average, so roughly 36.8% are left out of the bag for that tree. For the t-th tree, the samples in the original training set that were not used to train it can therefore be used to test it. Now take a single sample (xi, yi): collect all the trees whose bootstrap sample does not contain (xi, yi), let only those trees vote, and take the result of the vote as the test prediction for (xi, yi).
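The procedure above can be sketched by hand with plain decision trees. This is an illustrative reconstruction and not scikit-learn's internal code: the dataset, the number of trees, and variable names such as votes and oob_mask are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
n_samples, n_trees = X.shape[0], 100
n_classes = len(np.unique(y))

# votes[i, c] counts the trees that did NOT train on sample i and predicted class c.
votes = np.zeros((n_samples, n_classes), dtype=int)
oob_fractions = []

for t in range(n_trees):
    # Bootstrap: draw n_samples indices with replacement.
    boot_idx = rng.integers(0, n_samples, size=n_samples)
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot_idx] = False          # samples never drawn are out-of-bag
    oob_fractions.append(oob_mask.mean())

    # Random feature subset at each split, as in a random forest.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[boot_idx], y[boot_idx])

    # Only the out-of-bag samples receive a vote from this tree.
    oob_idx = np.flatnonzero(oob_mask)
    pred = tree.predict(X[oob_idx])
    votes[oob_idx, pred] += 1

has_vote = votes.sum(axis=1) > 0        # samples that were OOB at least once
oob_pred = votes.argmax(axis=1)         # majority vote among OOB trees
oob_error = np.mean(oob_pred[has_vote] != y[has_vote])
mean_oob_fraction = float(np.mean(oob_fractions))

print(f"mean OOB fraction per tree: {mean_oob_fraction:.3f}")
print(f"OOB error: {oob_error:.3f}")
```

The mean OOB fraction should come out close to 1/e, which is where the 36.8% figure comes from.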
This allows you to test while training, and experience shows that:
The out-of-bag estimate is about as accurate as using a test set of the same size as the training set.
The OOB error is a nearly unbiased estimate of the test error.
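One way to check these claims on your own data is to compare the OOB accuracy with the accuracy on a held-out test set of the same size as the training set. The sketch below does this on a synthetic dataset. The data and settings are assumptions, and the size of the gap will vary from problem to problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)
# Hold out half the data so the test set is as large as the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

clf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
clf.fit(X_train, y_train)

oob_acc = clf.oob_score_                # estimated from training data only
test_acc = clf.score(X_test, y_test)    # measured on unseen data
gap = abs(oob_acc - test_acc)
print(f"OOB accuracy:  {oob_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
print(f"Absolute gap:  {gap:.3f}")
```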
To sum up: suppose Zi=(xi,yi).
The out-of-bag (OOB) error is the average error for each Zi calculated using predictions from the trees that do not contain Zi in their respective bootstrap sample. This allows the RandomForestClassifier to be fit and validated whilst being trained.
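Written as a formula, the summary looks roughly like this. The symbols B_t (bootstrap sample of tree t), h_t (prediction of tree t), and T_i (set of trees for which Zi is out-of-bag) are notation introduced here for illustration, and the formula assumes every Zi is out-of-bag for at least one tree.

```latex
\mathcal{T}_i = \{\, t : Z_i \notin B_t \,\}, \qquad
\hat{y}_i^{\text{oob}} = \arg\max_{c} \sum_{t \in \mathcal{T}_i} \mathbf{1}\!\left[ h_t(x_i) = c \right], \qquad
\text{Err}_{\text{oob}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[ \hat{y}_i^{\text{oob}} \neq y_i \right]
```

This uses hard majority voting, as in the description above. As far as I know, scikit-learn averages the trees' predicted class probabilities instead of counting hard votes, so its oob_score_ can differ slightly from this formula.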
References
- OOB explanation on Stack Overflow
- scikit-learn OOB explanation on Stack Overflow