
"Progress in demographic analysis": more support for Key Demographics in Retirement Risk Management?

Note: My fellow demographers form a key segment of the audience(s) that "Key Demographics..." was created to address. This page is designed to highlight what I hope are useful technical aspects of the work behind the book. As well, in the future I will bring to your attention technical developments in the work of others where applications to the study of risk management behaviour are evident.

The Population Association of America was a pioneer in leading the way toward taking demography beyond its traditional "strong suit" (mortality, fertility, migration and morbidity). Now demographers in the "American tradition" deal with a wide range of what can be called "population behaviours" (group-level patterns formed by individual behaviours). Risk management behaviour is a very old area of concern, as we realize when thinking about health-related behaviours.

"Key Demographics ..." argues that people need to become as sophisticated about this matter as corporations are -- and look at networks of linked risks and strategizing about optimal allocation of resources across them (CRM -- comprehensive risk management). How and why key population segments differ in their approach to this ideal is readily seen to be a major topic for demographic research in the 21st century.

The first article follows. Also relevant is the discussion on a "demographic approach" to goodness of fit testing. Go to the chapter support page to find it, please.

Table of Contents

Bootstrapping Confidence Interval Estimates of Small Populations Defined by Complex Combinations of Attributes, by Leroy O. Stone, Département de démographie, Université de Montréal (Article 1, below)

Comprehensive Personal Risk Management “On The Ground” -- How to Do It, Part One, by Leroy O. Stone, Département de démographie, Université de Montréal

Thinking that demographic analysis deals with populations and never with persons, the reader of this article may feel that it is misplaced under "Advances in demographic analysis". However, once the reader sees the close parallel between what is described here and the computation (and interpretation) of disability-free life expectancy, he or she will see why this article should be referenced here. On the surface it deals with the life course of a virtual person; but its applicability to a cohort would be no "big deal".



Article 1

Bootstrapping Confidence Interval Estimates of Small Populations Defined by Complex Combinations of Attributes, by Leroy O. Stone, Département de démographie, Université de Montréal

The number of attributes needed to define key demographics may be so large that routine statistical cross tabulation is unusable, due to small sample sizes. This route is blocked because there is a need to compute a verifiable conditional probability (using a defined process) of the behavioural outcome of interest. An alternate route involves using the parameters of models that predict probabilities. This is certainly not new in the discipline of statistics, and is probably buried “deep in the trenches” of many microsimulation models. Demography benefits when it is “brought to the surface” and made available to the dozens of demographers around the world who need to provide robust confidence interval estimates of small complex populations for a variety of business and government purposes, as well as for social science. Chapter 6 provides a detailed illustration of the development of the procedure and cites the constraints that should be observed when using it.

This text (an extension of that chapter’s technical annex) deals with the task of estimating the size of the population in a given key demographic when the sub-sample represented by a particular combination of attributes is too small to permit direct estimation (using the weights attached to respondents’ records on the relevant microdata file) and a routine cross-tabulation. The technical annex (Section 6.11) of Chapter 6 of “Key Demographics …” sets out the steps to be followed to execute a “bootstrap procedure” that estimates both an expected size and the 95% (or any other chosen level) confidence interval around that size.

This will allow a government program or business market developer or analyst to be sure that a key demographic of interest is “almost certainly at least X persons in size and could be as large as X+N”, with 95% confidence. However, some key details were not stated or illustrated in Chapter 6. The purpose of this text is to provide some more of those missing details as well as illustrative results.

We stated correctly in Chapter 6 that this procedure will “always work”, at least in its mechanical/programming aspects. The quality of the results will depend on having a good estimate of the total population in the “superset” of demographics (population segments) you will use for the calculation, and on how well you have gauged the approximate relative sizes of the segments that comprise the superset (when they are aggregated).

Before starting into the details, one key elaboration of text in Section 6.11 is helpful. I wrote “Repeat this allocation process 1,000 times.” You need to do that in a series of “computing cycles”, and within each cycle the calculation is redone at least several hundred times. Each cycle has its own expected distribution of population sizes among the population segments. We get the confidence intervals by ‘integrating’ information across N cycles. For example, in the results shown below there were 300 “Monte Carlo runs” within each cycle, and I generated 100 cycles. This took less than 5 minutes—up to the production of the output distributions in a comma-delimited text file, which was then loaded into Excel for the final calculations. Thus all kinds of refined small complex-population-segment estimates in many government and business application fields are feasible with this procedure. Here are some key details.
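The nesting of "runs" inside "cycles" described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the program used for the results reported here (which was written in LibertyBASIC following Section 6.11). The function name `simulate_cycle_shares`, the parameter `draws_per_run` and the five segment sizes in the usage lines are hypothetical; the actual allocation rule within a run may differ.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def simulate_cycle_shares(segment_sizes, target, n_cycles=100,
                          runs_per_cycle=300, draws_per_run=50, seed=12345):
    """Return one simulated share of the target segment per cycle.

    segment_sizes : estimated sizes of the segments in the superset
    target        : index of the segment of interest
    draws_per_run : hypothetical number of assignments made in each run
    """
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    total = sum(segment_sizes)
    # Cumulative probabilities that "drive" the Monte Carlo assignment.
    cumulative = list(accumulate(size / total for size in segment_sizes))
    cumulative[-1] = 1.0  # guard against floating-point shortfall

    shares = []
    for _ in range(n_cycles):
        hits = 0
        for _ in range(runs_per_cycle):
            for _ in range(draws_per_run):
                # A uniform draw falls in exactly one segment's interval.
                if bisect_right(cumulative, rng.random()) == target:
                    hits += 1
        shares.append(hits / (runs_per_cycle * draws_per_run))
    return shares


# Hypothetical superset of five segments; segment 0 holds 12% of the total.
example_shares = simulate_cycle_shares([120, 300, 80, 250, 250], target=0,
                                       n_cycles=20, runs_per_cycle=50,
                                       draws_per_run=20)
```

In this sketch the 300 runs and 100 cycles mentioned above correspond to `runs_per_cycle` and `n_cycles`; each cycle yields one share, and the confidence interval is later read off the distribution of those cycle-level shares.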

Table A6.1


Table A6.1 presents the estimated population sizes for all of the identified high income Canadian demographics who performed poorly on the composite indicator of retirement related risk management practices. Recall that “key demographics” are the large ones. None of these should be considered large but let us use this table for illustration, and we will target the largest of them— the one with population size shown as 2095.

Who are these people? The code numbers that you see in the same line with 2095 are used to identify them. Here is the identification:

  • 3R+ — residents of Ontario province
  • teX+ — they are ten or more years from their expected retirement dates
  • 1E+ — they have less than a high school diploma, or have a “missing” code for education (“don’t know” or refused to respond)
  • 1H+ — their self-reported health status is less than very good
  • 1M+ — they have English mother tongue
  • 1B+ — they were born in Canada
  • 1O+ — they are in the residual industry-by-occupation category (based on my own highly aggregative industry and occupational classifications)
  • 1s — they are men

    This population segment (demographic) would have very poor access (compared to most other groups) to informational materials that would bring them knowledge about risk management issues and prompt them to be mindful about them. Moreover, even if they had good access, their low level of education would create a barrier to their understanding of the materials. In effect, giving them a great deal of money (remember these are high income people) gets them nowhere.

    Our bootstrapping procedure assumes that the set of percentages of the 17 segments in the total of 17,415 provides a good proxy for the correct probability distribution to “drive” the Monte Carlo assignment process -- extensively used in microsimulation work. As mentioned above, there were 100 “cycles” of assignment “work” and within each one the calculation was repeated 300 times. Table A6.2 shows the details used to get the confidence interval estimate. The first three and the last three numbers cover about 5% of their distribution, and thus they set the boundary for the 95% confidence interval. The mean of all the shares is 0.116. Using these results, we have an expected population size of 2027 (instead of the 2095 shown in Table A6.1), with 95% confidence (based on the bootstrap procedure) that the true number lies between 1567 and 2786. This suggests the generalization that there were at least 1500 people in that key demographic, and the actual number could be as high as about 2800, with 95% confidence. Here’s the table. It covers three pages.
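The step from the sorted cycle-level shares to an expected size and a confidence interval can be sketched as follows. This is an illustration only, assuming the simplest percentile rule (trim an equal number of values from each tail); the function name `bootstrap_interval` and the demonstration shares are hypothetical and are not the values in Table A6.2.

```python
def bootstrap_interval(cycle_shares, superset_total, level=0.95):
    """Return (expected size, lower bound, upper bound) for one segment.

    cycle_shares   : one simulated share of the segment per cycle
    superset_total : estimated population of the whole superset
    """
    ordered = sorted(cycle_shares)
    n = len(ordered)
    # Number of values lying strictly outside the interval in each tail.
    # With 100 cycles and level=0.95 this is 2, so the third-smallest and
    # third-largest shares mark the boundaries.
    k = int(n * (1.0 - level) / 2.0)
    lower_share = ordered[k]
    upper_share = ordered[n - 1 - k]
    mean_share = sum(ordered) / n
    return (mean_share * superset_total,
            lower_share * superset_total,
            upper_share * superset_total)


# Hypothetical demonstration: 100 evenly spaced shares from 0.100 to 0.199.
demo_shares = [i / 1000 for i in range(100, 200)]
expected, lower, upper = bootstrap_interval(demo_shares, superset_total=1000)
```

As a check on the figures quoted above: 0.116 × 17,415 is about 2,020, slightly below the reported 2027, presumably because 0.116 is a rounded mean share. The reported bounds of 1567 and 2786 correspond to shares of roughly 0.090 and 0.160 of the 17,415 total.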

    Table A6.2

    Very important for the quality of these results is the behaviour of the random number generator. Its outputs are fractions between zero and 1, and here is their distribution when that range is split into ten equal parts:

  • Number of random numbers used = 30100
  • Counts in the ten parts: 151001 154388 144552 147722 148023 153423 152625 153820 153630 160866

    The distribution is reasonably, but by no means perfectly, flat. A key issue is how long the numbers “run” before they start to recycle. We used what is reputed to be the best procedure in the public domain for addressing this problem (the Mersenne Twister), implemented in LibertyBASIC. The author of that implementation seems to claim (the discussion at this point is not very clear to me) that there will be at least 624 numbers before a cycle restarts (that is, before a sequence of numbers previously produced is repeated). In the standard Mersenne Twister (MT19937), 624 is the number of words in the generator's internal state, which is regenerated after every 624 outputs; the period of the sequence itself is far longer (2^19937 − 1). The course of the cycle depends on the initial “random number seed”. In addition to the automatic re-seeding at 624 numbers, I have forced the procedure to generate a new seed every 1000 numbers up to the first 6000, and then every 10,000 numbers thereafter. Thus there was frequent regeneration of a new “random number seed” behind the data shown in Table A6.2.
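One way to put a number on "flatness" is Pearson's chi-square statistic against a uniform distribution over the ten parts. The sketch below applies it to the ten counts listed above and, for comparison, to a fresh sample from Python's built-in generator (which is also a Mersenne Twister). The helper names `bin_counts` and `chi_square_uniform`, the sample size and the seed are my own choices for illustration, not part of the original program.

```python
import random


def bin_counts(values, n_bins=10):
    """Count how many values in [0, 1) fall into each of n_bins equal parts."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# The ten counts reported above (they sum to 1,520,050).
reported = [151001, 154388, 144552, 147722, 148023,
            153423, 152625, 153820, 153630, 160866]
reported_stat = chi_square_uniform(reported)

# Standard 5% critical value of chi-square with 9 degrees of freedom.
CRITICAL_9DF_5PCT = 16.919
exceeds_chance = reported_stat > CRITICAL_9DF_5PCT

# Comparison sample from Python's generator, seeded once (hypothetical seed).
rng = random.Random(2012)
fresh_counts = bin_counts(rng.random() for _ in range(100_000))
fresh_stat = chi_square_uniform(fresh_counts)
```

On the reported counts the statistic comes out far above the critical value, which suggests that the departures from flatness are larger than chance alone would produce. Whether departures of this size matter for the allocation results is a separate question that this check does not answer.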

    With today's computers, you can generate a new seed as often as you wish, and create hundreds of thousands of random numbers in a very short space of time; but there would still be some “cycling” within particular series of numbers across such a vast number. It appears that some commercial random number services make use of data derived from “atmospheric noise” rather than numbers produced by a computerized algorithm (the latter being widely known to be only pseudo-random). In any event, our relatively flat distribution shown above seems reasonable.

    In short, the procedure demonstrated here will always work to “bootstrap” a confidence interval estimate around population segment sizes associated with the key demographics identification process. The more confident we are that the cross tabulation-based estimate of the population size for the superset of demographics (e.g., the 17,415 shown in our illustration) is good, and that we have a reasonable proxy of the correct probability distribution to guide the Monte Carlo assignment process, the more highly we should regard the quality of the final numbers. In this illustration we drew the probability distribution directly from the weighted sample data; but in theory we are not required to do that. For example, we could supply a probability distribution based on data from an independent source such as a census. The census could not be used to define the segments based on fine details as shown in table A6.1 above; but a smaller combination of census variables could be used to provide an alternate to reliance upon data from the sample.

    One thing that I believe is notable about this discussion is that it illustrates an available route to decent estimates of population size for segments defined by complex combinations of variables in a very wide field of applications of demography. And take note that the attributes used to define the key demographic go far beyond the traditional population variables to include self-reporting of particular aspects of personal status or attitudes. This makes the procedure applicable in market segmentation work where there is heavy focus on the latter type of variable -- “soft” qualitative data about aspects of status and attitudes. (I will gladly share the program code and illustrative data with anyone who sends me a request.)

    To benefit well from these materials, you need Chapter 6 of Key Demographics ..., entitled "Distinctive Population Segments in Multi-Mode Risk Management". To get these benefits purchase the book at Springer (or your favorite bookstore), or purchase chapter 6 only as an electronic download (look for the "eBook" link at the Springer page).



    (c) 2012 Leroy O. Stone. All rights reserved.