What is the sample in which every member has an equal chance of being selected?

Since it is generally impossible to study an entire population (every individual in a country, all college students, every geographic area, etc.), researchers typically rely on sampling to acquire a section of the population to perform an experiment or observational study. It is important that the group selected be representative of the population, and not biased in a systematic manner. For example, a group made up of the wealthiest individuals in a given area probably would not accurately reflect the opinions of the entire population in that area. For this reason, randomization is typically employed to achieve an unbiased sample. The most common sampling designs are simple random sampling, stratified random sampling, and multistage random sampling.

Simple Random Sampling

Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample. Every possible sample of a given size has the same chance of selection.
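The "every possible sample has the same chance" property can be checked with a quick simulation. The sketch below assumes Python's standard-library random.sample as the sampling mechanism and a hypothetical five-member population; it draws many samples of size 2 and tallies how often each of the 10 possible pairs appears. Each relative frequency should come out close to 1/10.

```python
import itertools
import random
from collections import Counter


def empirical_sample_frequencies(population, n, trials, seed=0):
    """Draw `trials` simple random samples of size n and return the
    relative frequency of every possible sample (as a sorted tuple)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(
        tuple(sorted(rng.sample(population, n))) for _ in range(trials)
    )
    # Report every possible sample, including any that never appeared.
    return {
        combo: counts[combo] / trials
        for combo in itertools.combinations(sorted(population), n)
    }


population = ["A", "B", "C", "D", "E"]  # hypothetical five-member population
freqs = empirical_sample_frequencies(population, n=2, trials=100_000)
for sample, freq in freqs.items():
    print(sample, round(freq, 3))
```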
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)

Stratified Random Sampling

There may often be factors which divide up the population into sub-populations (groups / strata) and we may expect the measurement of interest to vary among the different sub-populations. This has to be accounted for when we select a sample from the population in order that we obtain a sample that is representative of the population. This is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or sub-group of a population.

When we sample a population with several strata, we generally require that the proportion of each stratum in the sample should be the same as in the population.
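One common way to meet this proportionality requirement is proportional allocation. The following is a minimal Python sketch, assuming the strata are given as a dict of hypothetical member lists; each stratum's share of the sample is rounded, so with awkward stratum sizes the total can differ slightly from n.

```python
import random


def proportional_stratified_sample(strata, n, seed=0):
    """Draw a stratified sample of total size n from `strata`
    (a dict mapping stratum name -> list of members), allocating
    the sample to each stratum in proportion to its population share."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / N)  # proportional allocation (rounded)
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample


# Hypothetical population of 100 split into three strata (60 / 30 / 10).
strata = {
    "urban": [f"u{i}" for i in range(60)],
    "suburban": [f"s{i}" for i in range(30)],
    "rural": [f"r{i}" for i in range(10)],
}
sample = proportional_stratified_sample(strata, n=20)
print({name: len(members) for name, members in sample.items()})
# → {'urban': 12, 'suburban': 6, 'rural': 2}
```

Pooling the three lists then gives the overall stratified sample, with each stratum holding the same share of the sample as of the population.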

Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, where certain homogeneous, or similar, sub-populations can be isolated (strata). Simple random sampling is most appropriate when the entire population from which the sample is taken is homogeneous. Some reasons for using stratified sampling over simple random sampling are:

a) the cost per observation in the survey may be reduced;
b) estimates of the population parameters may be wanted for each sub-population;
c) increased accuracy at given cost.

Example

Suppose a farmer wishes to work out the average milk yield of each cow type in his herd, which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could divide up his herd into the four sub-groups and take samples from these.
(Definition and example taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
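The farmer's example can be sketched the same way: a simple random sample is drawn within each breed, and its sample mean estimates that breed's average yield. The yields below are made-up illustrative numbers, not real data, and the function name is hypothetical.

```python
import random
from statistics import mean


def per_stratum_means(herd, k, seed=0):
    """Take a simple random sample of k cows from each breed (stratum)
    and return the sample mean yield for each breed."""
    rng = random.Random(seed)
    return {breed: mean(rng.sample(yields, k)) for breed, yields in herd.items()}


# Hypothetical daily milk yields in litres, one entry per cow (illustrative only).
herd = {
    "Ayrshire": [22.0, 24.5, 23.1, 21.8, 25.0, 22.7],
    "Friesian": [30.2, 28.9, 31.5, 29.7, 32.0, 30.8],
    "Galloway": [12.4, 13.1, 11.9, 12.8, 13.5, 12.2],
    "Jersey": [18.6, 19.3, 17.9, 20.1, 18.8, 19.5],
}
estimates = per_stratum_means(herd, k=3)
for breed, estimate in estimates.items():
    print(f"{breed}: {estimate:.1f} L/day")
```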

Multistage Random Sampling

A multistage random sample is constructed by taking a series of simple random samples in stages. This type of sampling is often more practical than simple random sampling for studies requiring "on location" analysis, such as door-to-door surveys. In a multistage random sample, a large area, such as a country, is first divided into smaller regions (such as states), and a random sample of these regions is collected. In the second stage, a random sample of smaller areas (such as counties) is taken from within each of the regions chosen in the first stage. Then, in the third stage, a random sample of even smaller areas (such as neighborhoods) is taken from within each of the areas chosen in the second stage. If these areas are sufficiently small for the purposes of the study, then the researcher might stop at the third stage. If not, he or she may continue to sample from the areas chosen in the third stage, etc., until appropriately small areas have been chosen.
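The staged procedure can be sketched as a recursion over a nested hierarchy. This is an illustrative Python version, assuming the areas are stored as nested dicts (states, then counties) with lists of neighborhoods at the bottom; the place names and per-stage sample sizes are hypothetical.

```python
import random


def multistage_sample(hierarchy, sizes, seed=0):
    """Multistage random sampling over a nested dict.

    hierarchy: nested dicts whose leaves are lists of final units,
               e.g. {state: {county: [neighborhood, ...]}}.
    sizes:     how many units to draw at each stage, e.g. (2, 1, 2).
    Returns the list of final-stage units selected.
    """
    rng = random.Random(seed)

    def draw(node, stage):
        k = sizes[stage]
        if isinstance(node, dict):
            # Simple random sample of sub-areas at this stage, then recurse into each.
            chosen = rng.sample(sorted(node), k)
            return [unit for key in chosen for unit in draw(node[key], stage + 1)]
        # Final stage: simple random sample of the smallest areas.
        return rng.sample(node, k)

    return draw(hierarchy, 0)


# Hypothetical country: 3 states, 2 counties per state, 3 neighborhoods per county.
hierarchy = {
    f"State{s}": {
        f"State{s}-County{c}": [f"State{s}-County{c}-Hood{h}" for h in range(1, 4)]
        for c in range(1, 3)
    }
    for s in range(1, 4)
}
# Stage 1: 2 states; stage 2: 1 county in each; stage 3: 2 neighborhoods in each.
print(multistage_sample(hierarchy, sizes=(2, 1, 2)))
```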

The secret to minimizing biased data!


Be sure to subscribe to never miss another article on data science guides, tricks and tips, life lessons, and more!

Introduction

“Why should I care about random sampling?”

Here’s why you should know about random sampling.

If you’re a data scientist and want to develop models, you need data.

And if you need data, SOMEONE needs to collect data.

And if someone is collecting data, they need to make sure it isn’t biased; otherwise, it will be extremely costly in the long run.

Therefore, if you want to collect unbiased data, then you need to know about random sampling!

What exactly is random sampling?

Simply put, random sampling is any sampling method in which every element in a population has an equal chance of being chosen for the sample.

Sounds simple, right? Unfortunately, it’s a lot easier said than done, because there are many logistics to consider in order to minimize bias.


Random Sampling Techniques

There are four common random sampling techniques:

1. Simple Random Sampling

Simple random sampling uses randomly generated numbers to choose a sample. More specifically, it first requires a sampling frame: a list or database of all members of a population. You then assign a random number to each element (using Excel, for example), sort the frame by those numbers, and take the first n elements as your sample.


To give an example, imagine your sampling frame is a small table listing each member of the population. Using software like Excel, you can generate a random number for each element in the sampling frame. If those numbers are a random ranking of the elements from 1 to N and you need a sample size of 3, you would take the elements ranked 1 to 3.
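The spreadsheet approach (in Excel, for instance, a column of =RAND() values followed by a sort) translates directly into code. A minimal Python sketch, assuming a hypothetical six-name sampling frame:

```python
import random


def srs_by_random_keys(frame, n, seed=0):
    """Simple random sample mimicking the spreadsheet approach: give every
    element of the sampling frame a random number, sort by it, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), element) for element in frame]  # like a =RAND() column
    keyed.sort(key=lambda pair: pair[0])  # sort the frame by its random numbers
    return [element for _, element in keyed[:n]]


frame = ["Ann", "Ben", "Cara", "Dev", "Eli", "Fay"]  # hypothetical sampling frame
print(srs_by_random_keys(frame, n=3))
```

Sorting by independent random keys produces a uniformly random ordering, so taking the first n rows is equivalent to drawing n elements without replacement.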

2. Stratified Random Sampling

Stratified random sampling starts off by dividing a population into groups with similar attributes. Then a random sample is taken from each group.


This method is used to ensure that different segments in a population are equally represented. To give an example, imagine a survey is conducted at a school to determine overall satisfaction. It might make sense here to use stratified random sampling to equally represent the opinions of students in each department.

3. Cluster Random Sampling

Cluster sampling starts by dividing a population into groups, or clusters. What makes this different from stratified sampling is that each cluster should be representative of the population. You then randomly select entire clusters to sample.


For example, if an elementary school had five different grade-eight classes, cluster random sampling might be used to choose just one entire class as the sample.
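A minimal Python sketch of cluster random sampling, assuming each class is stored as a list of hypothetical student IDs: whole clusters are chosen at random, and every member of a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, num_clusters, seed=0):
    """Cluster random sampling: randomly choose whole clusters and
    include every member of each chosen cluster in the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical school with five grade-eight classes of four students each.
classes = {
    f"8{letter}": [f"8{letter}-student{i}" for i in range(1, 5)]
    for letter in "ABCDE"
}
sample = cluster_sample(classes, num_clusters=1)
print(sample)
```

The contrast with the stratified sketch is that randomness applies to which groups are included, not to which members within a group are included.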

4. Systematic Random Sampling

Systematic random sampling is a very common technique in which you sample every k’th element. For example, if you were conducting surveys at a mall, you might survey every 100th person who walks in.

If you have a sampling frame, divide the size of the frame, N, by the desired sample size, n, to get the sampling interval, k = N / n. You would then pick a random starting point among the first k elements and choose every k’th element from there to create your sample.

Returning to the small sampling frame from the simple random sampling example: if it had six rows and we wanted a sample size of 2 this time, then k = 6 / 2 = 3, so we would take every 3rd row in the sampling frame.
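A minimal Python sketch of systematic sampling with a random start. It assumes k is computed with integer division (k = N // n) and that any extra element picked when N is not a multiple of n is dropped; that is one simple convention, not the only one.

```python
import random


def systematic_sample(frame, n, seed=0):
    """Systematic random sampling: interval k = N // n, a random start
    within the first k elements, then every k-th element after that."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n  # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return frame[start::k][:n]  # every k-th element, trimmed to n


frame = [f"row{i}" for i in range(1, 7)]  # hypothetical six-row sampling frame
print(systematic_sample(frame, n=2))  # k = 3, so rows 1&4, 2&5, or 3&6
```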

Thanks for Reading!


If you made it to the end, you should now understand what random sampling is and several techniques commonly used to conduct it. This is extremely important for minimizing bias and, thus, creating better models.


Terence Shin


Which sampling method gives every unit an equal chance of selection?

Simple random sampling. In simple random sampling (SRS), each sampling unit of a population has an equal chance of being included in the sample. Consequently, each possible sample also has an equal chance of being selected.

What is a good example of simple random sampling?

An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.

Are stratified samples equal?

In stratified sampling, a random sample is drawn from each stratum, and these subsets are then pooled to form the overall sample. Stratified sampling is used to highlight differences among groups in a population, as opposed to simple random sampling, which treats all members of a population as equal, with an equal likelihood of being sampled.

What is random and stratified sampling?

A simple random sample is used to represent the entire data population and randomly selects individuals from the population without any other consideration. A stratified random sample, on the other hand, first divides the population into smaller groups, or strata, based on shared characteristics.