Field studies and sample selection in medical statistics and epidemiology rely heavily on one idea: random sampling. Random sampling is an important way to keep comparison groups balanced. So the first function introduced today is sample, the R function for sampling:
> x=1:10
> sample(x=x)
[1] 3 5 9 6 10 7 2 1 8 4
The first line assigns the integers 1 to 10 to the vector x, and the second line draws a random sample from x. The output lists the elements in the order they were drawn. By default the sampling is without replacement (drawn elements are not put back), so each element appears once and at most n elements can be drawn, where n is the number of elements in x.
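A minimal sketch of the same call, written as a script rather than at the console. The call to set.seed is an addition that is not in the original; it only makes the random result repeatable. The checks confirm that sample(x) with no other arguments returns a reordering of x in which nothing is lost or repeated.

```r
set.seed(42)            # assumption: fixed seed so the run can be repeated
x <- 1:10
s <- sample(x)          # no size given: all elements of x in random order
print(s)

# Sorting the sample gives back x, so every element was drawn exactly once
print(identical(sort(s), x))
print(anyDuplicated(s) == 0)
```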
If you want to specify how many elements to draw from the vector, add the argument size:
> x=1:1000
> sample(x=x,size=20)
[1] 66 891 606 924 871 374 879 573 284 305 914 792 398 497 721 897 324 437
[19] 901 33
Here the sample is drawn from the positive integers 1 to 1000. The argument size specifies how many elements are drawn, 20 in this case, and the result is shown above.
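Because elements are not put back by default, size cannot be larger than the number of elements in x. The sketch below illustrates this limit; the names ok and res are chosen only for illustration, and tryCatch is used so that the script keeps running and the error message can be printed.

```r
x <- 1:10

# Allowed: size equals the number of elements in x
ok <- sample(x, size = 10)
print(ok)

# Not allowed: 11 elements cannot be drawn from 10 without putting any back.
# tryCatch turns the error into a character string instead of stopping.
res <- tryCatch(
  sample(x, size = 11),
  error = function(e) conditionMessage(e)
)
print(res)
```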
Both samples so far were drawn without replacement: once an element is selected, it is removed from the population and cannot be selected again. To sample with replacement, add the argument replace=T:
> x=1:10
> sample(x=x,size=5,replace=T)
[1] 4 7 2 4 8
“Replace” means that each drawn element is put back into the population before the next draw, so the same element can be drawn more than once. This is called sampling with replacement. In the result above, element 4 was selected twice in the 5 draws.
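As a small illustration, the sketch below draws 15 values from a vector of 10 with replacement. The seed and the use of table are additions for illustration only. Since there are more draws than distinct values, at least one value has to appear more than once.

```r
set.seed(1)                                  # assumption: fixed seed for repeatability
x <- 1:10
s <- sample(x, size = 15, replace = TRUE)    # more draws than elements in x
print(s)

print(table(s))                 # how many times each value was drawn
print(anyDuplicated(s) > 0)     # 15 draws from 10 values must contain a repeat
```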
R code has a feature called positional matching: if the values we enter are in the same order as the parameters in the function definition, we can leave out the parameter names, for example:
> x=1:10
> sample(x,20,T)
[1] 1 2 2 1 5 5 5 9 9 5 2 9 8 3 4 8 8 8 1 1
In the code above we omitted the parameter names x, size and replace, yet the call still runs and draws 20 elements from x with replacement. I still try to write the parameter names every time, because I think it is a good habit and it reads clearly. In addition, if you are not familiar with the order of a function's arguments, a value passed in the wrong position gives a wrong result, and many functions have too many arguments to remember their order. When the parameters are named, the call works even if the positions do not correspond:
> x=1:10
> sample(size=20,replace=T,x=x)
[1] 4 9 2 6 4 5 4 7 10 5 2 2 3 4 2 4 6 8 7 8
The advantage is obvious: the code is clear, and it does not depend on the order of the arguments. We can also see that with replacement size can be as large as you like, whereas without replacement size cannot exceed the size of the population.
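One way to check that the positional and the named forms are the same call is to reset the random seed before each of them and compare the results. This is only a sketch; the seed value and the names by_position and by_name are chosen for illustration.

```r
x <- 1:10

set.seed(123)
by_position <- sample(x, 20, TRUE)                    # matched by position

set.seed(123)
by_name <- sample(size = 20, replace = TRUE, x = x)   # matched by name, any order

# Same seed and same arguments, so both calls give the same draws
print(identical(by_position, by_name))

# TRUE is spelled out here because T is an ordinary variable
# that other code could overwrite.
```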
Rolling a die or flipping a coin (probably the obligatory introduction to sampling) is sampling with replacement.
Note that for the sample function, x can hold numbers or characters. In fact, x can be any vector:
> a=c("A","B")
> sample(x=a,size=10,replace=T)
[1] "B" "A" "A" "A" "B" "A" "A" "B" "A" "A"
The code above can be read as flipping a coin 10 times, where A stands for heads and B for tails.
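A short sketch of the coin and the die as samples with replacement. The labels "H" and "T", the seed and the variable names are assumptions made for this illustration. Wrapping the flips in factor keeps both faces in the count table even if one of them never comes up.

```r
set.seed(2024)                          # assumption: fixed seed for repeatability

coin <- c("H", "T")                     # hypothetical labels: heads and tails
flips <- sample(coin, size = 10, replace = TRUE)
counts <- table(factor(flips, levels = coin))
print(counts)                           # number of heads and tails in 10 flips

die <- sample(1:6, size = 10, replace = TRUE)   # ten rolls of a six-sided die
print(die)
```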
In the sampling described so far, each element is drawn with equal probability; this is called simple random sampling.
Sometimes the elements should not be drawn with equal probability (for example, in common binomial probability problems). In this case we add the argument prob, short for “probability”. Suppose a doctor performs an operation that has an 80% chance of success: how many of the 20 patients he operates on today will have a successful operation? The code is as follows:
> x=c("S","F")
> sample(x,size=20,replace=T,prob=c(0.8,0.2))
[1] "F" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "F" "S" "S" "F" "S" "S"
[19] "F" "S"
Here “S” stands for success and “F” for failure.
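To answer the "how many" question, the successes in the simulated outcomes can be counted. The sketch below does this and, for comparison, draws the number of successes directly with rbinom, which samples from a binomial distribution. The seed and the variable names are assumptions for illustration, and each run is one random outcome, not a fixed answer.

```r
set.seed(7)                             # assumption: fixed seed for repeatability

outcome <- sample(c("S", "F"), size = 20, replace = TRUE, prob = c(0.8, 0.2))
n_success <- sum(outcome == "S")        # count the successful operations
print(n_success)

# The same kind of quantity drawn in one step: one value from Binomial(20, 0.8)
n_binom <- rbinom(1, size = 20, prob = 0.8)
print(n_binom)
```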
> x=c(1,3,5,7)
> sample(x,size=20,replace=T,prob=c(0.1,0.2,0.3,0.9))
[1] 3 5 7 3 7 3 7 5 3 7 7 7 1 5 7 5 7 7 3 7
The second example shows that each element can be given its own weight. The values in prob do not have to add up to 1: R treats them as relative weights and rescales them so that they sum to 1. With prob=c(0.1,0.2,0.3,0.9) the weights sum to 1.5, so the value 7 is drawn with probability 0.9/1.5 = 0.6 on each draw, not 0.9.
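A rough way to see how the values in prob behave is to divide them by their sum and compare the result with the observed frequencies in a large sample. This is only a sketch; the seed, the sample size of 100000 and the variable names are assumptions for illustration.

```r
set.seed(99)                            # assumption: fixed seed for repeatability

x <- c(1, 3, 5, 7)
w <- c(0.1, 0.2, 0.3, 0.9)              # weights as passed to prob, sum is 1.5

p <- w / sum(w)                         # weights rescaled to sum to 1
print(round(p, 3))

draws <- sample(x, size = 100000, replace = TRUE, prob = w)
freq <- as.numeric(table(factor(draws, levels = x))) / length(draws)
print(round(freq, 3))                   # observed shares, close to p
```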
For the sample function, x can be any vector (such as the character vector above). A closely related function is sample.int, where “int” is short for “integer”. Its argument n must be a single positive integer:
> x=-10.5:7.5
> sample(x=x,size=3);sample.int(n=x,size=3)
[1] -5.5 -7.5 0.5
Error in sample.int(x, size = 3) : invalid first argument
The first line of code generates the sequence -10.5, -9.5, ..., 7.5 in steps of 1. The first line of output is the result of sample. The second is the result of sample.int, which fails with “invalid first argument” because x is a vector of non-integer values, not a single positive integer. Otherwise sample.int is used in the same way as sample.
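For contrast, here is a sketch of calls to sample.int that do work, where n is a single positive integer and the draws come from 1 to n. The last part shows one possible way to sample a non-integer vector with sample.int, by drawing positions and then indexing. The seed and the variable names are assumptions for illustration.

```r
set.seed(5)                             # assumption: fixed seed for repeatability

a <- sample.int(n = 10, size = 3)       # three distinct integers from 1 to 10
print(a)

b <- sample.int(n = 6, size = 10, replace = TRUE)   # ten draws from 1 to 6
print(b)

# Sampling a non-integer vector by drawing positions instead of values
x <- seq(-10.5, 7.5, by = 1)
idx <- sample.int(length(x), size = 3)
print(x[idx])
```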
Source: http://www.wtoutiao.com/p/186VWin.html