Statistical Design

Sample Design

The sample designs of EMIF Norte and EMIF Sur aim to estimate the size of migration flows and the characteristics of the individuals who make up those flows. A flow is measured as the total number of people moving through a geographical space over a given period of time.

The period of time considered for this survey is a quarter of a year. After 20 years of using multistage sampling to obtain these estimates, it was decided to restructure the sample design and change it to a stratified sample design. This change resulted from a joint evaluation by a working group of experts from the Center for Research in Mathematics (CIMAT) and the participants of the survey projects.

Among the arguments the working group gave for changing the sample design, the most notable is that the previous design had some sampling stages in which the probability of selection was one, while other stages used proportional allocation. It was therefore proposed that these levels be incorporated into a stratification, which significantly simplified the sample design. A brief description of the new stratified sample design is given below.

In constructing the sampling frame, two axes are defined: time and space. The time axis is the number of calendar days in the quarter, which can be 90, 91, or 92 (fewer when it is known that migration does not occur on certain days). Each day is then divided into 1, 2, or 3 shifts, depending on the size of the migration flow and its distribution over the 24 hours of the day. The second axis is a list of survey points located in border cities through which migrants transit. This list is the product of exhaustive fieldwork in which the entire border region is visited, entry points are identified, and the total migration flow is measured 24 hours a day over a seven-day period.

The combination of the time and space axes defines the sampling frame of the study (see Table 1). Each cell in the table represents one combination of time and space, which we call a jornada (referred to below as a tour). In terms of the sampling frame, each jornada is a sampling unit.

Table 1. Sampling Frame

                  Time axis
Spatial axis      DAY 1         DAY 2         ...           DAY 91
                  S1  S2  S3    S1  S2  S3    S1  S2  S3    S1  S2  S3
Survey point 1
Survey point 2
Survey point 3
Survey point 4
Survey point 5
Survey point 6
...
Survey point n-1
Survey point n

Note: S=shift
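
The construction of the frame in Table 1 can be sketched in a few lines of Python. This is only an illustration: the function name `build_frame`, the survey point labels, and the assumption that every survey point has the same number of shifts on every day are hypothetical simplifications, not part of the survey documentation.

```python
from itertools import product

def build_frame(survey_points, n_days, shifts_per_day):
    """Return every (survey point, day, shift) combination, i.e. every jornada.

    Assumes, for simplicity, the same number of shifts at every point and day.
    """
    days = range(1, n_days + 1)
    shifts = range(1, shifts_per_day + 1)
    return [
        {"point": p, "day": d, "shift": s}
        for p, d, s in product(survey_points, days, shifts)
    ]

# Hypothetical inputs: 3 survey points, a 91-day quarter, 3 shifts per day.
frame = build_frame(["point 1", "point 2", "point 3"], n_days=91, shifts_per_day=3)
print(len(frame))  # 3 points * 91 days * 3 shifts = 819 jornadas
```

In practice the number of shifts differs between survey points and days without migration are dropped, so a real frame would be built from a list of valid combinations rather than a full cross product.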

Each jornada has a known, non-zero probability of being selected. This makes it possible to draw a random sample of jornadas, collect observational data in them, and estimate the total migration flow with a measurable level of confidence and precision. The variables associated with migrant characteristics are estimated in the same way. To estimate these variables, individuals are selected at random within a jornada (tour). That is, a second stage of sampling is carried out, in which the unit of selection is the individual migrant.

Sample Size and Distribution

First stage of sampling

In the first stage of sampling, each analyzed flow has an independent sample size, which is defined by statistical and budgetary criteria (see Table 2).

The selection of tours in each migration flow is based on a stratified sample design. This design uses past data and analysis on the subject to divide the sampling frame into subgroups whose elements are more homogeneous with one another than the frame as a whole.

Each subgroup is defined by a combination of location and shift (time of day), without considering the day itself. For example, one subgroup includes all the tours defined by the 8 a.m. to 4 p.m. interval at a bus station’s arrival gate. Thus, each analyzed flow has as many subgroups as there are combinations of location and shift, and each subgroup has as many tours as there are days in the quarter (except for days on which it is known that there will be no flow, which occurs mainly in the flows of repatriated migrants).

Table 2. Total of subgroups, tours, and sample size by flow, Northern Border Survey and Southern Border Survey

Flow of migrants                          Number of   Total number   Number of tours   Number of individuals   Average completed
                                          subgroups   of tours (1)   in the sample     contacted (2)           surveys by quarter (3)
Coming from the North                     98          8,820          368               18,228                  1,961
Coming from the South                     86          7,740          377               16,606                  2,142
Returned by U.S. Immigration Officials    20          1,748          153               1,046                   1,046
Coming from the United States by plane    13          1,170          300               21,844                  2,604
Coming from Guatemala                     18          1,620          145               13,875                  2,689
Coming from Mexico or the United States   14          1,260          188               20,700                  3,899
Returned by Mexican Authorities           4           312            142               2,122                   2,122
Returned by U.S. Authorities              3           169            78                1,606                   1,606

(1) The number of tours per subgroup varies from quarter to quarter according to the number of days in the quarter, which can be 90, 91, or 92. For flows of returned migrants, the number of tours varies according to the days of the week on which the migration authorities carry out repatriations.
(2) This is the number of individuals asked the screening questions, some of whom then go on to take the entire survey if they belong to the target population. This value varies by quarter and year.
(3) This is the total number of completed surveys, that is, surveys of individuals who are identified as part of the population under study and agree to complete the survey. This value varies by quarter and year.
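
As a quick check of the relationship between subgroups and tours described in note (1), the following sketch divides the total number of tours in Table 2 by the number of subgroups. The figures are copied from Table 2; the helper name `tours_per_subgroup` is introduced here only for illustration.

```python
# Figures copied from Table 2: flow -> (number of subgroups, total number of tours).
table2 = {
    "Coming from the North": (98, 8820),
    "Coming from the South": (86, 7740),
    "Returned by U.S. Immigration Officials": (20, 1748),
    "Coming from the United States by plane": (13, 1170),
    "Coming from Guatemala": (18, 1620),
    "Coming from Mexico or the United States": (14, 1260),
    "Returned by Mexican Authorities": (4, 312),
    "Returned by U.S. Authorities": (3, 169),
}

def tours_per_subgroup(subgroups, total_tours):
    """Average number of tours (days with flow) in each subgroup."""
    return total_tours / subgroups

for flow, (subgroups, tours) in table2.items():
    print(f"{flow}: {tours_per_subgroup(subgroups, tours):.1f} tours per subgroup")
```

The five flows that do not involve repatriation come out at exactly 90 tours per subgroup, consistent with a 90-day quarter. The three flows of returned migrants average fewer (about 87.4, 78.0, and 56.3), consistent with repatriations taking place only on certain days.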

To select tours, the sample size is first distributed among the subgroups: two tours are assigned to each subgroup, and the rest are distributed in proportion to the distribution of the migrant flow among the subgroups. Then, within each subgroup, the days of the quarter on which fieldwork will be performed are chosen at random with equal probability. The number of days selected equals the sample size assigned to the subgroup.
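
The allocation and selection steps just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the example subgroups and flow volumes, and the largest-remainder rule used to round the proportional shares to whole tours are all hypothetical, since the source does not say how fractional allocations are rounded.

```python
import random

def allocate(sample_size, flow_share, minimum=2):
    """Give each subgroup `minimum` tours, then split the rest in proportion to flow."""
    remaining = sample_size - minimum * len(flow_share)
    if remaining < 0:
        raise ValueError("sample size is too small to give every subgroup the minimum")
    total = sum(flow_share.values())
    exact = {h: remaining * share / total for h, share in flow_share.items()}
    alloc = {h: minimum + int(exact[h]) for h in flow_share}
    # Assumption: largest-remainder rounding hands out any tours lost to truncation.
    leftover = sample_size - sum(alloc.values())
    by_fraction = sorted(flow_share, key=lambda h: exact[h] - int(exact[h]), reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc

def select_days(alloc, days_in_quarter, seed=None):
    """Within each subgroup, draw the allocated number of days with equal probability."""
    rng = random.Random(seed)
    return {
        h: sorted(rng.sample(range(1, days_in_quarter + 1), n))
        for h, n in alloc.items()
    }

# Hypothetical subgroups (location, shift) with illustrative flow volumes.
flow_share = {("bus station", 1): 50, ("bus station", 2): 30, ("airport", 1): 20}
alloc = allocate(sample_size=16, flow_share=flow_share)
days = select_days(alloc, days_in_quarter=91, seed=1)
print(alloc)
```

With these inputs, 6 of the 16 tours cover the minimum of two per subgroup and the remaining 10 are split 5, 3, and 2. Sampling days without replacement gives every day in a subgroup the same chance of selection; the sketch does not handle a subgroup whose allocation exceeds the number of available days.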

Second stage of sampling

The second stage of sampling occurs within a tour, at the level of the individual. The selection is random and is determined by the moment at which the individual arrives at the survey point. To better understand this process, it helps to know that the fieldwork is designed so that two researchers work together during each selected tour. Each of them carries out one of two tasks: (A) counting, or (B) screening individuals and applying the questionnaire.

Task A: To count. It is very important for the Mexican Migration Survey methodology to record the total number of people who make up a migration flow during a given tour, since this count is an essential input for later estimating the total number of migrants. A continuous count is therefore carried out: a researcher acting as enumerator takes up a position with a clear view of the flow of people, imagines a straight line across the ground, and counts the number of people who cross that line.

Task B: To interview. The other essential piece of fieldwork is the application of the survey, which is generally divided into two sections. The first section is the screening process, a series of eight or nine brief questions that the interviewer asks a randomly selected individual to determine whether that person is part of the target population. If the person is identified as a migrant, the interviewer continues to the second section, a more extensive questionnaire whose questions vary depending on the migration flow being analyzed (see Diagram 1).

The implementation of these tasks may vary depending on the circumstances of each migration flow. For example, in the flow of Mexican migrants being repatriated by U.S. authorities, or Central Americans being repatriated by Mexican authorities, the screening questions are not given since all the individuals are known to be study subjects. At other locations and crossing points, conditions may vary so that modifications of tasks A and B become necessary.
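
To illustrate how the counts from Task A can feed an estimate of the total flow, the sketch below applies the standard expansion estimator for stratified simple random sampling: within each subgroup, the sampled counts are scaled by the ratio of tours in the frame to tours in the sample. The source does not state the exact estimator used by the survey, so this formula, the function name, and the example counts are assumptions made for illustration.

```python
def estimate_total_flow(counts_by_subgroup, tours_in_subgroup):
    """Stratified expansion estimate of the total flow over the quarter.

    counts_by_subgroup: subgroup -> list of Task A counts, one per sampled tour.
    tours_in_subgroup:  subgroup -> number of tours in the frame for that subgroup.
    """
    total = 0.0
    for h, counts in counts_by_subgroup.items():
        n_h = len(counts)            # tours sampled in subgroup h
        N_h = tours_in_subgroup[h]   # tours in the frame for subgroup h
        total += N_h / n_h * sum(counts)
    return total

# Hypothetical counts for two subgroups in a 90-day quarter.
counts = {"A": [120, 80, 100], "B": [10, 30]}
frame_sizes = {"A": 90, "B": 90}
print(estimate_total_flow(counts, frame_sizes))  # 30 * 300 + 45 * 40 = 10800.0
```

Because days are drawn with equal probability within each subgroup, each sampled tour in subgroup h stands in for N_h / n_h tours of the frame, which is what the scaling factor expresses.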

Diagram 1