Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2019-11-23 20:48:33
Size: 5519
Editor: khellman
Comment: link to submission requirements
Revision 4 as of 2019-11-24 20:40:42
Size: 7380
Editor: khellman
Comment:

In the Fall of 2019, I heard the following radio story: "Eliminating Single Use Plastic...". In it we learn that a single-use aluminum container is recycled $70\%$ of the time, to which the program host says (surprisingly):[1]

  Host: "There's probably a Diet Rite that I was drinking in the early 90s whose aluminum is still in the system, is what you're saying."

  Guest: "That's correct."

Really? :\

Conceptual Model

We want to know the fraction of a single-use recyclable aluminum container that remains after a certain number of "generations". The probability that a single-use item in generation $g$ will be recycled and "live on" to generation $g+1$ is $p$. When an item is recycled, we envision it being divided into $r$ equal parts, each part going into one of $r$ new (distinct) items in generation $g+1$. This cycle continues: each item in generation $g+1$ has probability $p$ of continuing on to generation $g+2$ as $r$ new distinct items. The initial generation is $g=0$, so any item in the use-recycle-use chain at generation $g = 1,2,3,4,\ldots$ contains $\frac{1}{r^g}$ of the original ($g=0$) item.
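One consequence of this model is worth checking by hand (a short derivation added here for intuition; it is not part of the assignment). Let $N_g$ be the number of items in generation $g$, with $N_0=1$. Each of the $N_g$ items is recycled independently with probability $p$ and becomes $r$ new items, so

\[ E[N_{g+1} \mid N_g] = r\,p\,N_g \quad\Rightarrow\quad E[N_g] = (rp)^g \]

and the remaining fraction $f_g = N_g / r^g$ of the original item has expectation

\[ E[f_g] = \frac{(rp)^g}{r^g} = p^g \]

So for $p=0.70$ the expected fraction left after ten generations is $0.70^{10} \approx 0.028$, whatever the value of $r$; $r$ changes how much $f_g$ varies around that mean, not the mean itself.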

Specification Model

The parameters $p$ and $r$ will be provided to your SIM as command line arguments. The confidence interval will be calculated to a half width of $0.1$, i.e., $\bar{x}\pm0.1$.
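To get a feel for how the half width drives the number of replications: assuming the interval takes the form $\bar{x} \pm t^{*}_\infty\, s/\sqrt{n-1}$, with $s$ the standard deviation of the $n$ replication results (an assumption here; check the text's exact form), reaching a half width of $w=0.1$ requires

\[ \frac{t^{*}_\infty\, s}{\sqrt{n-1}} \le w \quad\Longleftrightarrow\quad n \ge 1 + \left(\frac{t^{*}_\infty\, s}{w}\right)^2 \]

For example, with $t^{*}_\infty = 1.96$ and a hypothetical $s = 1$ generation, $n \ge 1 + 19.6^2 = 385.16$, so $n = 386$ replications.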

Project Requirements

Your SIM will calculate a confidence interval for the average number of generations it takes for an item's remaining fraction $f$ to fall below some threshold $F$. Your SIM will use Algorithm 8.1.2 for the calculation.
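A minimal sketch of a sequential interval-estimation loop of the kind Algorithm 8.1.2 describes: keep calling a replication function, update the running mean and sum of squared deviations with Welford's one-pass equations, and stop once the half width $t^{*}_\infty s/\sqrt{n-1}$ is no larger than the target $w$. The function name `interval_estimate` and the minimum sample size `min_n=40` are assumptions made for illustration; compare against the algorithm as printed in the text before relying on it.

```python
import math
import random
from typing import Callable, Tuple


def interval_estimate(generate: Callable[[], float], t_star: float, w: float,
                      min_n: int = 40) -> Tuple[int, float, float]:
    """Sample generate() until the half width drops to w; return (n, lower, upper).

    min_n is an assumed minimum number of replications (hypothetical parameter).
    """
    x = generate()
    n = 1
    v = 0.0          # running sum of squared deviations from the mean (Welford)
    xbar = float(x)  # running sample mean
    # Keep sampling while n is small or the half width t* * s / sqrt(n - 1),
    # with s = sqrt(v / n), still exceeds the target w.
    while n < min_n or t_star * math.sqrt(v / n) > w * math.sqrt(n - 1):
        x = generate()
        n += 1
        d = x - xbar
        v += d * d * (n - 1) / n
        xbar += d / n
    return n, xbar - w, xbar + w


if __name__ == "__main__":
    # Stand-in replication: Uniform(0,1) draws at 95% confidence (t* = 1.96).
    rng = random.Random(12345)
    print(interval_estimate(rng.random, 1.96, 0.01))
```

In a SIM, `generate` would be a zero-argument function that runs one replication and returns its integer $g$, `t_star` would come from the $t$ command line argument, and `w` would be $0.1$ per the specification model.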

Input

The command line parameters provided to your SIM will be:

|| Argument || Value ||
|| GENS || The number of generations to simulate ||
|| $t$ || The idfNormal() value to use for confidence interval construction, called $t^{*}_\infty$ in the text ||
|| $p$ || The probability that an item in generation $g$ is recycled and reused in generation $g+1$ ||
|| $r$ || The number of equally sized parts an item is broken into during recycling; each part goes into a new, distinct item ||
|| RANDOM.DAT || As usual, a source of random values $Uniform(0,1)$ ||
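A minimal sketch of reading these inputs, under two loudly stated assumptions: that the five arguments arrive positionally in the order of the table above, and that RANDOM.DAT is plain text holding whitespace-separated decimal values. Neither is confirmed here, so check the course submission requirements. `SimArgs`, `parse_args`, and `uniforms` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple, Sequence


class SimArgs(NamedTuple):
    gens: int        # GENS
    t_star: float    # t, the idfNormal() value (t*_inf in the text)
    p: float         # probability an item is recycled into the next generation
    r: int           # number of equal parts per recycled item
    random_dat: str  # path to RANDOM.DAT


def parse_args(argv: Sequence[str]) -> SimArgs:
    """Parse GENS t p r RANDOM.DAT, assumed to arrive positionally in table order."""
    if len(argv) != 5:
        raise SystemExit("usage: SIM GENS t p r RANDOM.DAT")
    gens, t_star, p, r, random_dat = argv
    return SimArgs(int(gens), float(t_star), float(p), int(r), random_dat)


def uniforms(lines: Iterable[str]) -> Iterator[float]:
    """Yield Uniform(0,1) values, assuming whitespace-separated decimal text."""
    for line in lines:
        for token in line.split():
            yield float(token)
```

Typical use would be `args = parse_args(sys.argv[1:])`, then `stream = uniforms(open(args.random_dat))`, with `stream.__next__` serving as a zero-argument source of $Uniform(0,1)$ values.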

Output

You will OUTPUT the following values from SIM (in this order, and according to the course submission requirements):

  1. the number of simulations run in order to achieve the required confidence interval
  2. the lower bound of the confidence interval, using eight-digit precision and scientific notation
  3. the upper bound of the confidence interval, using eight-digit precision and scientific notation
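A minimal sketch of the output formatting, assuming "eight digit precision" means eight digits after the decimal point in scientific notation (Python's `.8e` format). If the requirements mean eight significant digits instead, `.7e` would be the match. How the three values are separated (one per line or otherwise) is left to the course submission requirements; `format_output` is a hypothetical name.

```python
from typing import Tuple


def format_output(n: int, lower: float, upper: float) -> Tuple[str, str, str]:
    """Return the three SIM outputs as strings, in the required order.

    Assumes "eight digit precision" means eight digits after the decimal
    point in scientific notation (Python's ".8e" format).
    """
    return str(n), f"{lower:.8e}", f"{upper:.8e}"
```

For example, `format_output(123, 2.5, 2.7)` gives `("123", "2.50000000e+00", "2.70000000e+00")`.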

Hints and SIM Testing

  1. While a particular call to $Generate()$ in your SIM will return an integer value for $g$, the average and confidence interval bounds will, of course, be $\Re$-valued numbers.

  2. Beware of tracking each individual item at each generation with an object: with $p=0.70$, $r=10$ you will have about $75\times10^6$ individual items in a data structure by the ninth generation. Yikes :o

    It is better to maintain the following two pieces of information as a "generation" loop plods forward, breaking when $f$ becomes less than $F$: the number of items in the current generation and, of course, the current generation $g$. In this case, the remaining fraction of the original item is

    \[ f=\frac{\mbox{items}}{r^g} \]

    Even with this more efficient approach, we must beware of numerical issues! Depending on the random numbers (of course), you can still overflow a 32-bit integer tracking the number of items currently in the recycling chain. Use a 64-bit integer, or a language with arbitrary precision integers (Python, for instance) for your items variable.

    Ideally, we would use a $Binomial(\mbox{items},p)$ random variate to determine how many items progress to the next generation. But this would require a numerical inversion of the incomplete beta function (Appendix D), which is beyond the scope of this course. Your SIM can simply "flip" a $Bernoulli(p)$ coin for each item.
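The hints above can be sketched as a single replication that tracks only the item count and the generation number, flipping one $Bernoulli(p)$ coin per item. This is a minimal illustration, not the reference solution: `generate` and its signature are hypothetical, and the threshold $F$ is taken as a parameter because this section does not spell out how $F$ is obtained (for instance, from GENS). `rng` is any zero-argument callable returning $Uniform(0,1)$ values, such as values read from RANDOM.DAT.

```python
import random
from typing import Callable


def generate(p: float, r: int, F: float, rng: Callable[[], float]) -> int:
    """One replication: the generation g at which f = items / r**g first drops below F.

    Assumes 0 <= p < 1; with p == 1 and F <= 1, f never falls and the loop never ends.
    """
    items = 1  # Python ints are arbitrary precision, so this cannot overflow
    g = 0
    f = 1.0    # at g = 0 the whole original item remains
    while f >= F:
        # Flip a Bernoulli(p) coin for each item in the current generation.
        recycled = sum(1 for _ in range(items) if rng() < p)
        items = recycled * r   # each recycled item yields r new, distinct items
        g += 1
        f = items / r ** g     # remaining fraction of the original item
    return g


if __name__ == "__main__":
    rng = random.Random(2019)
    print([generate(0.70, 8, 0.5, rng.random) for _ in range(5)])
```

Because $f$ never increases from one generation to the next (each generation keeps at most `items` survivors, spread over $r$ times as many parts), the loop stops as soon as the threshold is crossed. Python's `int / int` true division also handles very large `items` values without overflow.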

grader.sh

I am providing students with the same tarball the grader will use to test your SIM. Here is how to use it:

First, download this tarball to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory.

$ ls alrecycling-student.*
alrecycling-student.tar.bz2
$ mkdir tmp
$ cd tmp
$ tar xjf ../alrecycling-student.tar.bz2

Second, set the SIMGRADING environment variable with:

$ source ~khellman/SIMGRADING/setup.sh ~khellman/SIMGRADING

Now go to the directory holding your AlRecycling SIM and execute the grader.sh script from the alrecycling-grader.tar.bz2 resource.

$ cd ~/sim/alrsim
$ ls SIM
SIM
$ ~/tmp/AlRecycling/grader.sh
:
:
:

You will need to read any messages from the script carefully and hit ENTER several times as it runs. This script checks for:

  1. missing tracefiles
  2. truncated tracefiles
  3. differences between SIM results and expected results

The latter test generates 3 PDF files for your inspection (one $n$ curve comparison and two confidence interval comparisons). For plots showing CDF comparison curves: the red line is the result of your SIM with $N=200$ data points; the blue dots are the expected results from an $N=500$ trial, and the blue lines are the results of three separate trials with $N=100$ for comparison. Your SIM's red line should be a better approximation of the blue dots than the blue lines.

For plots showing overlapped confidence intervals: the red lines are the $200$ intervals generated by your SIM; the green lines are $500$ intervals from a good implementation, and the blue lines are the results of three separate trials with $N=100$ intervals. Your SIM's red intervals should not vary from the green intervals any more than the blue lines do.

Here are examples of the plots generated by grader.sh:

(Figures: $n$ values for 2 generations, $p=0.70$, $r=8$ parts; confidence intervals for 2 generations, $p=0.70$, $r=8$ parts.)

Submit Your Work

Rubric

This work is worth ?? points.

Requirements

Points

Notes

Meets simulation course project requirements

10

ABCDEFG

1000

  1. At about the 3:00 mark of the interview.

Assignments/AlRecycling (last edited 2023-12-27 12:09:45 by khellman)