7328
Comment: intermediate save
|
10081
student grading tarball ready to go
|
Deletions are marked like this. | Additions are marked like this. |
Line 2: | Line 2: |
#acl _Students:read | #acl Students:read |
Line 7: | Line 7: |
Students may post their results (for comparison purposes) on this [[/Results|Results]] page. | 1. It's been death by a thousand cuts, but [[#simgrading|grader tarball]] ready to go. 1. Here is [[#|the change log|&action=info]] for this assignment write-up. I will try to be descriptive in my log messages. 1. You can also [[#|subscribe|&action=subscribe]] to this page and receive Emails when changes are made. |
Line 26: | Line 29: |
Your `SIM` will calculate a confidence interval for the average number of generations it takes for an item's remaining fraction $f$ to fall below some threshold $F$. Your `SIM` will use '''Algorithm 8.1.2''' for the calculation. |
Your `SIM` will calculate a confidence interval for the average remaining fraction $f$ of an item after a certain number of generations. Your `SIM` will use '''Algorithm 8.1.2''' for the calculation. |
Line 33: | Line 36: |
|| $F$ || The threshold fraction, one replication is determining the smallest $g$ where the remaining fraction $f<F$ || | || `GENS` || The number of generations to simulate || |
Line 37: | Line 40: |
|| `RANDOM.DAT` || As usual, a source of random values $Uniform(0,1)$.|| | |
Line 42: | Line 45: |
1. the number of simulations run in order to achieve the confidence interval required 1. the lower bound of the confidence interval 1. the upper bound of the confidence interval |
1. the number of simulations run in order to achieve the confidence interval required ($n$) 1. the lower bound of the confidence interval, '''use eight digit precision and scientific notation output''' 1. the upper bound of the confidence interval, '''use eight digit precision and scientific notation output''' |
Line 46: | Line 49: |
= Documentation = Of course, your documentation should be handed off according to [[Assignments/Requirements|the usual submission requirements]]. 1. Document the location of your Algorithm 8.1.2 implementation (source and lines) as well as your "Welford object" (if you are using one). 2. Provide a paragraph or two of well written and thoughtful analysis, address the following topics: a. Does the simulation, as written address the fundamental question? If so, what assumptions are being made? If not, what do you believe is ''not'' being modeled correctly and how would you implement remedies? a. Estimate what fraction of the pop can used by the radio show host in the 90s is still with us today in the use-recycle-use chain. This can be done in one of two ways: i. Using some math and the results of small `GENS` and $r$ `SIM` runs, i. Making some much longer `SIM` runs with appropriate parameters (beware of overflow and you are going to need '''a lot''' of random numbers and an efficiency bump!) '''Either approach yields an equal amount of credit''', but you must '''explain your methods clearly'''. You are not required to use the ''graded'' `SIM` implementation to support your findings. You may augment your `SIM` with optional invocation magic (just so long as it still works with `grader.sh`) or hack up another analysis. Choosing a language or library with an `idfBinomial()` variate would be very wise! |
|
Line 48: | Line 63: |
1. While a particular call to $Generate()$ in your `SIM` will return an integer value for $g$, the average and confidence interval bounds will of course be $\Re$ valued numbers. | |
Line 51: | Line 65: |
It is better to maintain the following '''two pieces of information''' as a "generation" loop plods forward, breaking when $f$ becomes less than $F$, these the ''number of items'' in the current generation, and of course the ''current generation'' $g$. In this case, the remaining fraction of the item is | It is better to maintain the following '''two pieces of information''' as a "generation" loop plods forward, breaking when the $f$ falls to zero or the required `GENS` have been simulated: the ''number of items'' in the current generation, and of course the ''current generation'' $g$. In this case, the remaining fraction of the original item is |
Line 54: | Line 68: |
Ideally, we would use a $Binomial(\mbox{items},p)$ random variate to determine how many progress to the next generation. But this would require a numerical inversion of the incomplete beta function (Appendix D), which is beyond the scope of this course. | {{{#!wiki warning Even with this more efficient approach, we must beware of numerical issues! Depending on the random numbers (of course), you can still overflow a 32-bit integer tracking the number of items currently in the recycling chain. '''Use a 64-bit integer, or a language with arbitrary precision integers (Python, for instance) for your `items` variable.''' }}} |
Line 56: | Line 72: |
To determine how many items progress to the next generation, your `SIM` can use one of two approaches: a. "flip" a $Bernoulli(p)$ coin for each item, a. or accumulate $Geometric(p)$ values until all the coins in a generation have been accounted for. {{{#!wiki tip Recall that $Geometric(p)$ is a random variate that returns the number of $p$-bias successes before the first failure. So if $x=Geometric(p)$, you've accounted for $x+1$ items in the generation; the $x+1$^th^ one is a failure and is not recycled! }}} |
Ideally, we would use a $Binomial(\mbox{items},p)$ random variate to determine how many progress to the next generation. But this would require a numerical inversion of the incomplete beta function (Appendix D), which is beyond the scope of this course. Your `SIM` can simply "flip" a $Bernoulli(p)$ coin for each item. |
Line 68: | Line 78: |
First, download <<DjHR(XXX-student.tar.bz2,this tarball)>> to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory. | First, download <<DjHR(alrecycling-student.tar.bz2,this tarball)>> to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory. |
Line 71: | Line 81: |
XXX-student.tar.bz2 | alrecycling-student.tar.bz2 |
Line 74: | Line 84: |
$ tar xjf ../XXX-student.tar.bz2 | $ tar xjf ../alrecycling-student.tar.bz2 |
Line 82: | Line 92: |
Now go to the directory holding your !XxXx `SIM` and execute the `grader.sh` script from the `XXX-grader.tar.bz2` resource. | Now go to the directory holding your !AlRecycling `SIM` and execute the `grader.sh` script from the `alrecycling-grader.tar.bz2` resource. |
Line 84: | Line 94: |
$ cd ~/sim/cwalksim | $ cd ~/sim/alrsim |
Line 87: | Line 97: |
$ ~/tmp/Xyz/grader.sh | $ ~/tmp/AlRecycling/grader.sh |
Line 98: | Line 108: |
The latter test generates NNN PDF files for your inspection ($OutputVar1$, $OutputVar2$ curves, and $OutputVar3$ confidence intervals). For plots generating CDF comparison curves: the <<SpanText(red line,color=red)>> is the result of '''your `SIM`''' with $N=GRADER_RUNS$ data points; the <<SpanText(blue dots,color=blue)>> are the expected results from an $N=GOLD_RUNS$ trial, and the <<SpanText(blue lines,color=blue)>> are the results of three separate trials with $N=EXAMPLE_RUNS$ for comparison. Your `SIM`'s red line should be a better approximation of the blue dots than the blue lines. |
The latter test generates 3 PDF files for your inspection (one $n$ curve comparison, and two confidence interval comparisons). For plots generating CDF comparison curves: the <<SpanText(red line,color=red)>> is the result of '''your `SIM`''' with $N=200$ data points; the <<SpanText(blue dots,color=blue)>> are the expected results from an $N=500$ trial, and the <<SpanText(blue lines,color=blue)>> are the results of three separate trials with $N=100$ for comparison. Your `SIM`'s red line should be a better approximation of the blue dots than the blue lines. |
Line 101: | Line 111: |
For plots generating overlaped confidence intervals: the <<SpanText(red lines,color=red)>> are the $GRADER_RUNS$ intervals generated by '''your `SIM`'''; the <<SpanText(green lines,color=green)>> are $GOLD_RUNS$ intervals from a good implementation, and the <<SpanText(blue lines,color=blue)>> are the results of three separate trials with $N=EXAMPLE_RUNS$ intervals. Your `SIM`'s <SpanText(red intervals,color=red) should not be any more varied from the <SpanText(green intervals)> than the <SpanText(blue lines,color=blue) are. | For plots generating overlapped confidence intervals: the <<SpanText(red lines,color=red)>> are the $200$ intervals generated by '''your `SIM`'''; the <<SpanText(green lines,color=green)>> are $500$ intervals from a good implementation, and the <<SpanText(blue lines,color=blue)>> are the results of three separate trials with $N=100$ intervals. Your `SIM`'s <<SpanText(red intervals,color=red)>> should not be any more varied from the <<SpanText(green intervals,color=green)>> than the <<SpanText(blue lines,color=blue)>> are. |
Line 105: | Line 115: |
{{attachment:_s_metric1-xxx1.svg|xxx1-alternate-title|width="60%"}} {{attachment:_s_metric2-xxx2.svg|xxx2-alternate-title|width="60%"}} |
{{attachment:_s_n-2,70,8.svg|ns for 2 gens, 0.70%, 8 parts|width="60%"}} {{attachment:_s_cis-2,70,8.svg|CIs for 2 gens, 0.70%, 8 parts|width="60%"}} |
Line 115: | Line 125: |
= Rubric = This work is worth ?? points. |
{{{#!wiki important To ease the logistics of grading, there are two Rubrics and two Submission Slots in the course interface. '''Submit the same tarball or zip file for both!''' The course grader will handle the computational assessment, and your instructor will grade the analysis portion of your submission. }}} = Computational Rubric (part 1) = This work is worth 50 points. |
Line 119: | Line 136: |
|| Meets [[Assignments/Requirements|simulation course project requirements]] || 10 || || || ABCDEFG || 1000 /* blah */ || || |
|| Meets [[Assignments/Requirements|simulation course project requirements]] || 10 || || || Algorithm 8.1.2 and Welford documentation || 5 || || || Output meets requirements (eight significant digits) || 5 || Some libraries will print in "fixed" notation when the sci notation exponent is 0. || || Calculation of $n$ (`GENS=2`, $r=70\%$, $parts=8$) || 10 || || || Confidence Intervals (`GENS=2`, $r=70\%$, $parts=8$) || 10 || || || Confidence Intervals (`GENS=6`, $r=70\%$, $parts=8$) || 10 || || = Analysis Rubric (part 2) = This work is worth 15 points. ||<tableclass="rubric"> Requirements || Points || Notes || || Meets [[Assignments/Requirements|simulation course project requirements]] || 5 || || || Clarity and prose || 10 || || || Analysis || 10 || || |
It's been death by a thousand cuts, but the grader tarball is ready to go.
Here is the change log for this assignment write-up. I will try to be descriptive in my log messages.
You can also subscribe to this page and receive Emails when changes are made.
In the Fall of 2019, I heard the following radio story: Eliminating Single Use Plastic.... In it we learn that a single-use aluminum container is recycled $70\%$ of the time, to which the program host says (surprisingly) (1):
Host:
"There's probably a diet-rite that I was drinking in the early 90s whose aluminum is still in the system, is what you're saying."
Guest:
"That's correct."
Really?
Conceptual Model
We want to know the fraction remaining of a single-use recyclable aluminum container after a certain number of "generations". The probability that a single-use item in generation $g$ will be recycled and "live on" to generation $g+1$ is $p$. When an item is recycled, we envision it being divided into $r$ equal parts, each contributing to the production of one of $r$ new (distinct) items in generation $g+1$. This cycle continues: each item in generation $g+1$ has probability $p$ of continuing on to generation $g+2$ as $r$ new distinct items. The initial generation will be $g=0$, so any item in the use-recycle-use chain at generation $g = 1,2,3,4,\ldots$ will have $\frac{1}{r^g}$ of the original ($g=0$) item within it.
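As a sanity check on the model, the expected remaining fraction can be worked out by hand. This is only a sketch: it assumes items are recycled independently of one another, and the symbol $N_g$ (the number of items in generation $g$ that contain part of the original) is introduced here for illustration. With $N_0=1$ and $N_{g+1}=r\cdot Binomial(N_g,p)$:
\[ E[N_{g+1}\mid N_g] = r\,p\,N_g \quad\Longrightarrow\quad E[N_g] = (r\,p)^g \quad\Longrightarrow\quad E\left[\frac{N_g}{r^g}\right] = p^g \]
For example, with $p=0.70$ and two generations the expected remaining fraction would be $0.70^2=0.49$, regardless of $r$.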
Specification Model
The parameters $p$ and $r$ will be provided to your SIM via command line parameters. The confidence intervals will be calculated to the half width $\pm0.1$.
Project Requirements
Your SIM will calculate a confidence interval for the average remaining fraction $f$ of an item after a certain number of generations. Your SIM will use Algorithm 8.1.2 for the calculation.
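The following is a minimal sketch of a sequential interval estimate built on a one-pass Welford accumulator, in the spirit of Algorithm 8.1.2. The text's statement of the algorithm is authoritative; the names `Welford`, `interval_estimate`, `get_sample`, and the minimum sample size `n_min=40` are assumptions made for this illustration.

```python
import math


class Welford:
    """One-pass running mean and sum of squared deviations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.v = 0.0  # sum of squared deviations about the running mean

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.v += d * d * (self.n - 1) / self.n
        self.mean += d / self.n

    def half_width(self, t):
        # t * s / sqrt(n - 1), where s = sqrt(v / n); requires n >= 2
        return t * math.sqrt(self.v / self.n) / math.sqrt(self.n - 1)


def interval_estimate(get_sample, t, w, n_min=40):
    """Sample until the interval half width is at most w.

    get_sample: callable returning one replication's result (assumed name)
    t:          critical value supplied on the command line
    w:          target half width (0.1 in this assignment)
    n_min:      assumed minimum number of replications, must be >= 2
    """
    acc = Welford()
    acc.add(get_sample())
    while acc.n < n_min or acc.half_width(t) > w:
        acc.add(get_sample())
    h = acc.half_width(t)
    return acc.n, acc.mean - h, acc.mean + h
```

Keeping the accumulator separate from the stopping rule makes it easy to point at both pieces when documenting where each lives in your source.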
Input
The command line parameters provided to your SIM will be:
|| Argument || Value ||
|| GENS || The number of generations to simulate ||
|| $t$ || idfNormal() value to use for confidence interval construction, called $t^{*}_\infty$ in the text ||
|| $p$ || The probability that an item in generation $g$ is recycled and reused in generation $g+1$ ||
|| $r$ || The number of (equally sized) parts an item is broken into during the recycling process; each part goes into a new distinct item. ||
|| RANDOM.DAT || As usual, a source of random values $Uniform(0,1)$. ||
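A hedged sketch of reading these arguments in Python follows. It assumes the arguments arrive in the order of the table above and that the random-number file holds whitespace-separated decimal values; neither detail is stated on this page, so check the course submission requirements. The function names `parse_args` and `uniform_stream` are hypothetical.

```python
def parse_args(argv):
    """Parse [GENS, t, p, r, RANDOM.DAT path] (assumed order)."""
    if len(argv) != 5:
        raise SystemExit("usage: SIM GENS t p r RANDOM.DAT")
    gens = int(argv[0])
    t = float(argv[1])
    p = float(argv[2])
    r = int(argv[3])
    path = argv[4]
    if gens < 0 or not 0.0 <= p <= 1.0 or r < 1:
        raise SystemExit("GENS must be >= 0, p in [0, 1], r >= 1")
    return gens, t, p, r, path


def uniform_stream(path):
    """Yield Uniform(0,1) values from a text file (assumed format)."""
    with open(path) as fh:
        for line in fh:
            for token in line.split():
                yield float(token)


# Typical use from a script entry point:
#   import sys
#   gens, t, p, r, path = parse_args(sys.argv[1:])
#   stream = uniform_stream(path)
#   u = next(stream)
```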
Output
You will OUTPUT the following values from SIM (in this order, and according to the course submission requirements):
- the number of simulations run in order to achieve the confidence interval required ($n$)
- the lower bound of the confidence interval, use eight digit precision and scientific notation output
- the upper bound of the confidence interval, use eight digit precision and scientific notation output
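One way to get scientific notation in Python is the `e` format specifier, which always prints an exponent, even when the exponent is 0. The sketch below uses `.7e`, which gives eight significant digits (one before the decimal point and seven after); if "eight digit precision" is meant as eight digits after the decimal point, `.8e` would be the specifier instead. The function name `format_results` is hypothetical, and any line prefix required by the course submission requirements is not shown.

```python
def format_results(n, lower, upper):
    """Return the three output values as strings, in the required order."""
    return [str(n), f"{lower:.7e}", f"{upper:.7e}"]


for value in format_results(40, 0.49, 1.0):
    print(value)
```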
Documentation
Of course, your documentation should be handed off according to the usual submission requirements.
- Document the location of your Algorithm 8.1.2 implementation (source and lines) as well as your "Welford object" (if you are using one).
- Provide a paragraph or two of well-written and thoughtful analysis addressing the following topics:
  - Does the simulation, as written, address the fundamental question? If so, what assumptions are being made? If not, what do you believe is not being modeled correctly and how would you implement remedies?
  - Estimate what fraction of the pop can used by the radio show host in the 90s is still with us today in the use-recycle-use chain. This can be done in one of two ways:
    - Using some math and the results of small GENS and $r$ SIM runs,
    - Making some much longer SIM runs with appropriate parameters (beware of overflow and you are going to need a lot of random numbers and an efficiency bump!)
Either approach yields an equal amount of credit, but you must explain your methods clearly. You are not required to use the graded SIM implementation to support your findings. You may augment your SIM with optional invocation magic (just so long as it still works with grader.sh) or hack up another analysis. Choosing a language or library with an idfBinomial() variate would be very wise!
Hints and SIM Testing
Beware of tracking each individual item at each generation with an object: with $p=0.70$, $r=10$ you will have about $75\times10^6$ individual items in a data structure by the ninth generation. Yikes!
It is better to maintain the following two pieces of information as a "generation" loop plods forward, breaking when $f$ falls to zero or the required GENS have been simulated: the number of items in the current generation, and of course the current generation $g$. In this case, the remaining fraction of the original item is
\[ f=\frac{\mbox{items}}{r^g} \]
Even with this more efficient approach, we must beware of numerical issues! Depending on the random numbers (of course), you can still overflow a 32-bit integer tracking the number of items currently in the recycling chain. Use a 64-bit integer, or a language with arbitrary precision integers (Python, for instance) for your items variable.
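To get a feel for when overflow becomes possible, consider the worst case in which every item is recycled, so the item count at generation $g$ is $r^g$. The short sketch below (the function name `first_overflow_generation` is hypothetical) finds the first generation at which that worst case no longer fits in a signed integer of a given width. Python's own integers do not overflow, which is why they can be used to do the check.

```python
def first_overflow_generation(r, bits):
    """Smallest g for which r**g exceeds the largest signed `bits`-bit integer."""
    if r < 2:
        raise ValueError("r must be at least 2 for the count to grow")
    limit = 2 ** (bits - 1) - 1
    g = 0
    while r ** g <= limit:
        g += 1
    return g


print(first_overflow_generation(10, 32))  # 10
print(first_overflow_generation(10, 64))  # 19
```

Typical runs stay well below this worst case, but the bound shows how few generations a 32-bit counter can safely cover.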
Ideally, we would use a $Binomial(\mbox{items},p)$ random variate to determine how many progress to the next generation. But this would require a numerical inversion of the incomplete beta function (Appendix D), which is beyond the scope of this course. Your SIM can simply "flip" a $Bernoulli(p)$ coin for each item.
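Putting these hints together, one replication might look like the sketch below. It tracks only the item count and the generation, flips one $Bernoulli(p)$ coin per item, and stops early if nothing survives. The function name `remaining_fraction` and the `uniform` callable (anything that returns the next $Uniform(0,1)$ value) are assumptions for illustration, not part of the assignment.

```python
def remaining_fraction(gens, p, r, uniform):
    """Simulate one replication and return f = items / r**g.

    gens:    number of generations to simulate
    p:       probability an item is recycled into the next generation
    r:       number of new items each recycled item contributes to
    uniform: callable returning the next Uniform(0,1) value (assumed interface)
    """
    items = 1  # Python integers are arbitrary precision, so no overflow
    g = 0
    while g < gens and items > 0:
        # One Bernoulli(p) flip per item in the current generation.
        survivors = sum(1 for _ in range(items) if uniform() < p)
        items = survivors * r
        g += 1
    return items / r ** g
```

If the chain dies out before GENS generations have passed, `items` is zero and the function returns 0.0, which matches the "breaking when $f$ falls to zero" rule.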
grader.sh
I am providing to students the same tarball the grader will use for testing your SIM. Here is how to use it:
First, download this tarball to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory.
    $ ls alrecycling-student.*
    alrecycling-student.tar.bz2
    $ mkdir tmp
    $ cd tmp
    $ tar xjf ../alrecycling-student.tar.bz2
Second, set the SIMGRADING environmental variable with:
$ source ~khellman/SIMGRADING/setup.sh ~khellman/SIMGRADING
Now go to the directory holding your AlRecycling SIM and execute the grader.sh script from the alrecycling-grader.tar.bz2 resource.
    $ cd ~/sim/alrsim
    $ ls SIM
    SIM
    $ ~/tmp/AlRecycling/grader.sh
    :
    :
    :
You will need to read any messages from the script carefully, and hit ENTER several times throughout its course. This script checks for:
- missing tracefiles
- truncated tracefiles
- the difference between SIM results and expected results
The latter test generates 3 PDF files for your inspection (one $n$ curve comparison, and two confidence interval comparisons). For plots generating CDF comparison curves: the red line is the result of your SIM with $N=200$ data points; the blue dots are the expected results from an $N=500$ trial, and the blue lines are the results of three separate trials with $N=100$ for comparison. Your SIM's red line should be a better approximation of the blue dots than the blue lines.
For plots generating overlapped confidence intervals: the red lines are the $200$ intervals generated by your SIM; the green lines are $500$ intervals from a good implementation, and the blue lines are the results of three separate trials with $N=100$ intervals. Your SIM's red intervals should not be any more varied from the green intervals than the blue lines are.
Here is an example of one of the plots generated by grader.sh:
Submit Your Work
To ease the logistics of grading, there are two Rubrics and two Submission Slots in the course interface. Submit the same tarball or zip file for both!
The course grader will handle the computational assessment, and your instructor will grade the analysis portion of your submission.
Computational Rubric (part 1)
This work is worth 50 points.
|| Requirements || Points || Notes ||
|| Meets simulation course project requirements || 10 || ||
|| Algorithm 8.1.2 and Welford documentation || 5 || ||
|| Output meets requirements (eight significant digits) || 5 || Some libraries will print in "fixed" notation when the sci notation exponent is 0. ||
|| Calculation of $n$ (GENS=2, $p=70\%$, $r=8$) || 10 || ||
|| Confidence Intervals (GENS=2, $p=70\%$, $r=8$) || 10 || ||
|| Confidence Intervals (GENS=6, $p=70\%$, $r=8$) || 10 || ||
Analysis Rubric (part 2)
This work is worth 15 points.
|| Requirements || Points || Notes ||
|| Meets simulation course project requirements || 5 || ||
|| Clarity and prose || 10 || ||
|| Analysis || 10 || ||
(1) At about 3:00 minutes into the interview.