The CHARLS national baseline survey was conducted in 28 provinces, 150 counties/districts, and 450 villages/urban communities across the country. The CHARLS sample is representative of people aged 45 and over living in households; institutionalized elderly are not sampled, but Wave 1 respondents who later enter an institution will be followed. All samples were drawn in four stages.
At the first stage, all county-level units with the exception of Tibet were sorted (stratified) by region, within region by urban district or rural county, and by GDP per capita. Region was a categorical variable based on the NBS division of province area. After this sorting (stratification), the population of each county was listed, along with the cumulative population (the population of each county plus that of all the counties higher on the list). If N is the total population of all the county-level units and 150 is the number of counties to be sampled, then define an interval n = N/150. The first county was selected by choosing a random number r from 0 to 1 and selecting the first county on the list with cumulative population greater than rn. Then the interval n was added to this starting point, and the second county was the first county on the list with cumulative population greater than rn + n. The third county was chosen by once again adding the interval n and picking the first county on the list with cumulative population greater than rn + 2n, and so on until 150 counties had been selected.
Our sample used administrative villages (cun) in rural areas and neighborhoods (shequ) in urban areas, which comprise one or more former resident committees (juweihui), as primary sampling units (PSUs). We selected 3 PSUs within each county-level unit, using PPS (probabilities proportional to size) sampling. Note that rural counties contain both rural villages and urban neighborhoods, and it is also possible for urban districts to contain rural administrative villages. For each county-level unit, the list of all PSUs was randomly sorted. Then the population of each PSU was listed, along with the cumulative population (the population of each PSU plus that of all the PSUs higher on the list). If N is the total population of the county-level unit and 3 is the number of PSUs to be sampled, then define an interval n = N/3. The first PSU was selected by choosing a random number r from 0 to 1 and selecting the first PSU on the list with cumulative population greater than rn. Then the interval n was added to this starting point, and the second PSU was the first PSU on the list with cumulative population greater than rn + n. The third PSU was chosen by once again adding the interval n and picking the first PSU on the list with cumulative population greater than rn + 2n. This procedure was implemented using the Stata command samplepps.
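The cumulative-population procedure described above, used for both counties and PSUs, can be illustrated with a short Python sketch. This is not the samplepps implementation; the function name systematic_pps and the example populations are hypothetical, and the list is assumed to be already sorted in the intended order.

```python
import random
from itertools import accumulate


def systematic_pps(populations, k, r=None):
    """Pick k units by systematic PPS sampling (illustrative sketch).

    populations: unit populations, already in list order (assumed input).
    k: number of units to select (150 counties, or 3 PSUs per county).
    r: random start in [0, 1); drawn at random when omitted.
    Returns the 0-based list positions of the selected units.
    """
    if r is None:
        r = random.random()
    interval = sum(populations) / k          # n = N / k
    cumulative = list(accumulate(populations))
    selected = []
    idx = 0
    for j in range(k):
        threshold = r * interval + j * interval   # rn, rn + n, rn + 2n, ...
        # Advance to the first unit whose cumulative population exceeds it.
        while cumulative[idx] <= threshold:
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical PSU populations within one county-level unit.
psu_populations = [100, 300, 200, 400, 500]
chosen = systematic_pps(psu_populations, k=3, r=0.5)
```

In this sketch a unit whose population exceeds the interval n can be selected more than once; the source does not say how such cases were handled, so no special treatment is attempted here.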
In neighborhoods with very large populations (over 2000 households), given the high costs of preparing map-based sampling frames, supervisors were permitted to select a geographic subset of the neighborhood as the PSU, for example one or more former neighborhood committees (juweihui) in the community (shequ). Enough sub-neighborhoods were to be sampled to ensure a sufficient number of eligible sample respondents. Sub-neighborhoods were then selected based on the estimated population of each sub-neighborhood. There were 30 communities that had to be split this way.
Due to mistakes in the original sampling frame, we had to replace 6 of the 450 communities originally chosen, for the following reasons: two villages had disappeared due to resettlement, one urban community had been expanded to become a county-level urban district, and two communities consisted almost entirely of collective dwellings (one of university dormitories and the other a prison), which are not supposed to be part of our sample. The choice of replacement communities followed the exact procedure outlined above. In 6 counties, the administrative boundaries changed so that the chosen communities fell within two counties. We did not replace these communities. As a result, the final number of counties became 156.
In each PSU, we selected a sample of dwellings from our frame, which was constructed from maps prepared by mappers/listers with the support of local informants. To obtain an accurate sampling frame of households in each village or community, a mapping/listing software package named CHARLS-GIS was developed. For each PSU, a mapper was first sent to the community with a GPS unit to record the boundary; the CHARLS office then used the boundary information to capture Google Earth map images, which were used as the basis for the mapping and listing. Then all buildings in each PSU were enumerated with photos and GPS readings, and the dwellings within each building were listed. Collective living dwellings, such as military bases, schools, dormitories, or nursing homes, were excluded.
Each PSU sampling frame was then checked by CHARLS headquarters to ensure that all buildings within the community boundary had been enumerated. After verification, the supervisors used the CHARLS-GIS software to randomly sample 80 households, which were marked on the map and sent back to mappers/listers in the field to collect information on these households, including the age of the oldest person, the name of the household head, the telephone number, and whether the dwelling unit was empty. The number of households sampled was greater than the targeted sample size of 24 households per PSU, in anticipation of sampled households having no members aged 45 or older, empty dwellings, and household non-response. Based on this information, the supervisor randomly sampled a specific number of households for each community/village using the CHARLS-GIS software. The initial sampling was a random sample from the 80 households. From these households we computed the fraction of households that were age-eligible and the number of empty dwellings. From this we derived neighborhood/village-specific sampling proportions and then chose our sample from the entire sampling frame.
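The source does not give the formula used to turn the listing results into PSU-specific sampling proportions. The Python sketch below shows one plausible way to do it, under stated assumptions: the expected yield per drawn household is taken to be the age-eligible share among the listed households (empty dwellings count as not eligible) times an assumed response rate. The function name, its parameters, and the example figures are all hypothetical.

```python
import math


def households_to_draw(target, n_listed, n_eligible, frame_size,
                       assumed_response_rate=1.0):
    """Estimate how many households to draw to reach `target` interviews.

    target: desired completed households per PSU (24 in the text).
    n_listed: households in the initial listing sample (80 in the text).
    n_eligible: listed households that are occupied and age-eligible.
    frame_size: total households in the PSU frame (caps the result).
    assumed_response_rate: hypothetical share of eligible households
        that agree to participate; not a figure from the source.
    """
    if n_eligible <= 0 or n_listed <= 0:
        raise ValueError("need at least one eligible listed household")
    if not 0 < assumed_response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    # Expected completed interviews per household drawn.
    yield_rate = (n_eligible / n_listed) * assumed_response_rate
    needed = math.ceil(target / yield_rate)
    # Cannot draw more households than the frame contains.
    return min(needed, frame_size)


# Hypothetical PSU: 56 of 80 listed households age-eligible, 300 in frame.
n_draw = households_to_draw(24, 80, 56, 300, assumed_response_rate=0.8)
```

The implied sampling proportion for the PSU would then be n_draw divided by frame_size, which varies across PSUs with the eligible share and the number of empty dwellings.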
After the final sampling work in the PSU was completed, the information on the sampled households was sent back to the mappers/listers, who loaded it into the CHARLS-GIS software on their computers. The mappers/listers then sent ‘A letter to the respondent’. Simultaneously, the IT staff in the CHARLS project office transferred the sampled household lists and addresses for a given PSU to the interviewer’s CAPI system.
We interviewed all age-eligible sample households in each PSU who were found and willing to participate in the survey. Some dwellings had multiple households living in them. In these cases we randomly chose one household that had an age-eligible member. Thus, variation in the share of sampled households that could be found, had an age-eligible member, or were willing to participate in the survey led to different numbers of completed household surveys in each PSU. This is corrected for in the sampling weights.
In each sampled household, a short screening form was used to identify whether the household had a member meeting our age eligibility requirements. If a household had persons older than 40 who met our residence criterion, we randomly selected one of them. If the chosen person was 45 or older, he/she became a main respondent, and his or her spouse was also interviewed. If the chosen person was between ages 40 and 44, he/she was reserved as part of a refresher sample for future rounds of the survey. If an age-eligible person was too frail to answer questions, we identified a proxy respondent to help him/her answer, usually a spouse or a knowledgeable adult child, if there was one in the house. Households without members 45 years or older were not interviewed.
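The screening rule can be summarized in a small Python sketch. It is an illustration only: the function name and return labels are hypothetical, every listed member is assumed to meet the residence criterion, and age 40 is treated as eligible for selection so that the rule agrees with the 40 to 44 refresher band.

```python
import random


def screen_household(member_ages, rng=None):
    """Apply the age screening rule to one household (illustrative sketch).

    member_ages: ages of members assumed to meet the residence criterion.
    rng: optional random.Random instance, for reproducible draws.
    Returns (chosen_age, status); chosen_age is None if nobody qualifies.
    """
    rng = rng or random.Random()
    # Assumption: candidates are members aged 40 or above.
    candidates = [age for age in member_ages if age >= 40]
    if not candidates:
        return None, "not eligible"
    chosen = rng.choice(candidates)      # one candidate picked at random
    if chosen >= 45:
        # Main respondent; the spouse is interviewed as well.
        return chosen, "main respondent"
    # Ages 40 to 44 are held back for future waves.
    return chosen, "refresher"


# Hypothetical household with one member aged 42 and two younger members.
result = screen_household([42, 17, 12])
```

Proxy respondents and spouse interviews are outside the scope of this sketch.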
Questions concerning household roster in section A, household organization and financial transfers in section C were answered by the “Family Respondent”, who could be either the main respondent or the spouse of the main respondent; whenever possible the person chosen was the individual most able to answer the questions in these sections accurately.
Similarly, a “Financial Respondent” was chosen to answer questions on family income, expenditure, and assets. In this case, any household member aged 18 or above could be selected as the financial respondent (including the main respondent and spouse), with the main criterion again being which person was most knowledgeable about these matters.
The following is the gender, age, and area distribution of all respondents.