Accompanying data for my book Advanced Statistics with Application in R

Advanced Statistics with Application in R

This repository contains accompanying data for my book Advanced Statistics with Application in R. Please visit my website www.eugened.org for more information about the book. Below is the Table of Contents for the accompanying data provided with the book. You can download each file individually, but I recommend downloading everything at once using the zip option in GitHub.
Code/Data Chapter Section Page Description
mvsd.r 1 1.5.5 19 Computation of mean, variance and SD
vecomp.r 1 1.5.5 20 Vectorized computation of the double integral
my1sr.r 1 1.5.6 23 An example for graphics in R
frlJLET.r 1 1.5.6 24 Frequency of English letters in a Jack London novel
Jack_London_Call_of_the_Wild_The_f1.char 1 1.5.6 24 Char by char "Call of the Wild"
birthdaysim.r 1 1.6 30 Simulations for the birthday problem
sampP.r 1 1.8.1 38 Illustration of the sample command
sudoku.r 1 1.8.2 40 Random sudoku problem
webhits.r 2 2.1.2 45 Cdf of website hits
comwebhits.dat 2 2.1.2 45 Data for website hits
webhitsQ.r 2 2.2.1 55 Quartiles of web hits cdf
tr.binom.r 2 2.2.2 57 Confidence range for binomial distribution
longpiece.r 2 2.3 60 Simulations for the broken stick
simCookie.r 2 2.3 61 Simulations for raisin in the cookie
truck.turn.r 2 2.4 65 Simulations for safe turn
gampois.r 2 2.6.1 78 Simulation for approximation of the Poisson cdf by the gamma cdf
pti.gamma.r 2 2.6.3 80 Newton's algorithm for the gamma tight confidence range
pti.gamma.rome.r 2 2.6.3 81 Modified pti.gamma for Rome example
varX2.r 2 2.7 87 Simulations for var(X^2)
mile.r 2 2.9.1 97 Simulations for distance to work (LLN)
LLNintegral.r 2 2.9.2 102 Integral approximation via Monte Carlo
LLNIntegral2.r 2 2.9.2 103 Optimal lambda for Monte Carlo approximation
LLNintegral3.r 2 2.10 106 Normal approximation of the binomial cdf
clt.binom.r 2 2.10 106 Dynamic convergence of the binomial distribution
jb.r 2 2.10 109 James Bond chase problem
cltP.r 2 2.10 111 Violated CLT
lognDS.r 2 2.11 117 Simulations for mean and variance of the lognormal distribution
logLnr.r 2 2.11.1 119 Tight confidence range for the lognormal distribution
expr.r 2 2.13 127 Simulations for logistic distribution
rugf2.r 2 2.13 127 Newton's algorithm for Gaussian mixture
genmixN.r 2 2.13 128 Random number generation for Gaussian mixture
discr.gen.r 2 2.13 129 Random number generation from a discrete distribution
meanC.r 2 2.13.1 130 Simulations for Cauchy distribution
pidistr.r 2 2.15 136 Pmf of pi digits
pidistr1010.r 2 2.15 137 The 10x10 probability matrix for pi digits
benfordEXP.r 2 2.16 142 Benford's law for Pareto distribution
sales.RData 2 2.16 142 Sales transactions data (R object)
benfordFT.r 2 2.16 142 Benford's law analysis of fraudulent transactions
benfordN.r 2 2.16.1 143 Almost-Benford's law distributions
fracksim.r 2 2.16 144 Benford's law for the t-distribution
sbmultD.r 3 3.1 152 Density surface viewed at different angle
CondDensity.mkv 3 3.3 169 Conditional density movie file
mixedDens.r 3 3.3.2 180 Gaussian mixture distribution of heights
randS.r 3 3.3.3 184 Random sums
cancgr.r 3 3.3.4 185 Simulations for cancer growth
heightweight.r 3 3.5 202 Scatterplot weight versus height
dn2.r 3 3.5 204 Contours of the bivariate normal density
oilspill.r 3 3.5 206 Simulations for oil spill in the ocean
dn3.r 3 3.5.3 209 Generation of random numbers from the bivariate normal distribution
dint.r 3 3.5.3 213 Double integral approximation using simulations
copRN.r 3 3.5.4 216 Contours of the copula density
ell2.r 3 3.5.4 216 Contours of the bivariate normal density
filelXY.r 3 3.6 221 Simulations for Fieller density
metpr2.r 3 3.7.1 225 Simulations for meeting problem 2
buffon.r 3 3.7.2 225 Buffon's needle problem
stickSQ.r 3 3.7.2 226 Simulations for random segment example
sbE.r 3 3.7.2 227 Simulations for "Who wins the game?" example
s47.r 3 3.7.2 228 Simulations for random squares
prmb.r 3 3.8.4 235 Probability bullet
cdfMED.r 3 3.9 238 Cdf of the median
pmaxmin.r 3 3.9 238 Distribution of the range
cov01.r 3 3.9 239 Simulations for random segments
normQQ.r 3 3.10.2 247 Testing multivariate normal distribution
intrD3.r 3 3.10.3 250 Simulations for the root of the random cubic equation
multB.r 3 3.10.4 252 Simulations for the multinomial distribution
mn3.r 4 4.1.1 260 Generation and viewing of 3D points
foold.csv 4 4.1.2 263 Data for temperature in March
multCLT.r 4 4.1.3 269 Bivariate CLT
chidf1.r 4 4.2.1 277 The cdf of the chi-square distribution
salary.r 5 5.1 294 CDF of salary in Connecticut and Vermont
mortgageROC.csv 5 5.1.1 301 Data on 375 mortgage applicants
mortgageROC.r 5 5.1.1 302 ROC curve for mortgage applicants
vomit.r 5 5.1.1 304 Time to vomiting ROC curve
emesis.txt 5 5.1.1 304 Data for the vomit code
survROC.r 5 5.1.1 307 ROC curve for cancer patients
survcanc.csv 5 5.1.1 307 Data for the survROC code
SCcancer.r 5 5.1.2 306 Survival analysis for cancer patients
DeathYears.csv 5 5.1.1 306 Survival data for the SCcancer code
usbflash.csv 5 5.1 311 Data for Problem 29
creditpr.r 5 5.1 312 1996 credit card applicants' analysis
creditpr.csv 5 5.1 312 Data for the creditpr code
webhits.hist.r 5 5.2 313 Histogram plot for website hits
histN.r 5 5.2 315 Histogram and density for normal distribution
SCcancerQQ.r 5 5.3 318 Q-q plot for cancer patients
Qqmurder.r 5 5.3 318 Q-q plot to test the uniformity of murders
wikmurdr.txt 5 5.3 318 Murder rates in 51 states
qqband.r 5 5.3.1 320 Q-q plot for uniform distribution
qqLOGband.r 5 5.3.1 321 Q-q plot for lognormal distribution
mortgageQQ.r 5 5.3.1 322 Q-q plot for mortgage data
qqnill.r 5 5.3 323 Problem 5
toears.txt 5 5.3 323 Toenail arsenic data
Goldman.csv 5 5.3 323 Anatomical data
salaryBAR.r 5 5.4 324 Barplot for Vermont and Connecticut salaries
kernavN.r 5 5.5 326 Gaussian kernel densities
kernM.r 5 5.5 327 Gaussian kernel densities with two bw
n.density.my.r 5 5.5 327 In-house Gaussian kernel density
eppendorf.r 5 5.5 329 Rat brain oxygen distribution
eppendorf.txt 5 5.5 329 Data for rat brain oxygen
toears.r 5 5.5 330 Distribution of toenail arsenic in NH
toears.txt 5 5.5 330 Data for toenail distribution
asviol.r 5 5.5 330 Asymmetric violin for salary in VT and CT
kern.movie.r 5 5.5.1 332 Density movie
alc3d.r 5 5.5.2 333 3D alcohol consumption
alcoholUSA.csv 5 5.5.2 333 Data for 3D plots
autocrash.csv 5 5.5 335 Automobile accident data
bvn.density.my.r 5 5.6 337 Bivariate normal density
bvex.r 5 5.6 337 2D and 3D bivariate kernel densities
Eyx.r 5 5.6 337 E(Y|X=x)
matimage.r 5 5.6.1 339 Matrix image
R.smooth.r 5 5.6.1 340 Smoothed images of R
R.pgm 5 5.6.1 340 The pgm image data for R
salmark.r 5 5.6.2 342 Scatterplot of Forbes data
Forbes2000.csv 5 5.6.2 342 Forbes data
nhcancer.r 5 5.6.3 342 NH lung cancer spatial distribution
NHtowns.csv 5 5.6.3 342 NH town names
xyNHcancer.csv 5 5.6.3 343 Coordinates of 10439 cancer cases
xyNHpopulation.csv 5 5.6.3 343 Geographic location of random NH residents
Lena.pgm 5 5.6 346 Lena image
IBM_daily.csv 5 5.6 346 IBM stock prices
robpol.r 6 6.2 353 Police and bank robber
gMMgamma.r 6 6.2.1 356 Newton's algorithm for the MM estimation of gamma parameters
pois2est.r 6 6.4.2 371 RMSE for two estimators of lambda
luR.r 6 6.4.3 373 Simulations for lower and upper bounds of uniform distribution
arMSE.r 6 6.6.1 389 Simulations for estimation of area of the circle
robias.r 6 6.6.2 393 Simulations for the bias of c.c.
cimcorSP.r 6 6.6.2 395 17x17 stock correlation heatmap
stocks.zip 6 6.6.2 394 Data on 17 stocks (must be unzipped)
Rcolor.pdf 6 6.6.2 396 Colors in R (by name)
olsim.r 6 6.7.2 405 Simulations for simple regression
lm.trendSP.r 6 6.7.3 408 Prediction of the Google stock price
truckR.r 6 6.7.4 411 Coefficient of determination for truck drivers
truckR.data.csv 6 6.7.4 411 Data for truck drivers' problem
betaMM.apply.r 6 6.8.2 430 Simulations for the MM estimator for alpha and beta
betaMM.r 6 6.8 433 MM estimation of gamma distribution
gammaInf.r 6 6.9.2 449 ML estimation of alpha and beta of the gamma distribution
bufprob.r 6 6.10.1 456 Estimation of L/D in the Buffon problem
regrD.r 6 6.10.1 464 Linear regression with random X
father.son.csv 6 6.10.1 466 Galton data for father and son heights
piest.r 6 6.10.1 471 Comparison of four estimators of pi
gotobed.r 6 6.10.2 474 When students go to bed
autocrash.r 6 6.10.2 475 Parzen density for autocrash circular data
autocrash.csv 6 6.10.2 475 Autocrash circular data
cubMLE.r 6 6.10.5 494 ML estimation of the quadratic model
mleUNE.r 6 6.10.5 497 MLE for unemployment rate
cauchy.theta.r 6 6.10.6 502 MLE for the Cauchy distribution
cauchy.google.r 6 6.10.6 503 MLE for the Google stock price
mle.gamma.OPT.r 6 6.10.6 506 Comparison of three algorithms for ML estimation
mle.gamma.CT.r 6 6.10.6 506 Estimation of people in poverty in CT
bufprobSA.r 6 6.10.6 508 Simulation-based ML for the Buffon problem
ranlRest.r 6 6.10.6 508 Estimation of the radius of the disk for the random lines problem
heightweight.r 6 6.10 509 Scatterplot height versus weight of Korean young people
HeightWeight.csv 6 6.10 509 Height and weight data
rws.r 6 6.10 510 Random walk on the lattice square
meanmed.r 6 6.11 516 Simulations for mean and median in Laplace distribution
robloc.r 6 6.11.1 519 Estimation for noisy data via Gaussian mixture
gng.r 6 6.11.1 521 Estimation of Google stock price using Gaussian mixture
AMZN.csv 6 6.11 522 Amazon stock prices
pvsim.r 7 7.1.1 526 Illustration of p-value using simulations
boengtr.csv 7 7.1.2 530 Data for Boeing insider trading
NHBirths2003_2009.csv 7 7.2 533 Data on 33666 NH newborns
houseprice.txt 7 7.2 534 House prices
familyincome.r 7 7.2 535 Poverty test
pvalcost.r 7 7.4 550 Simulations for the p-value living cost example
powsim.r 7 7.4.2 553 Confirmation of the power of the t-test via simulations
stucost.r 7 7.4.2 554 Living cost for freshmen and sophomores
stucost.csv 7 7.4.2 554 Data for living cost
ttest2pow.r 7 7.4.2 557 Simulations for Welch and t-test
sampt2.r 7 7.4.3 557 One- versus two-sided t-test
salaryMW.r 7 7.4.4 559 Testing the salary for men and women
salaryMW_paired.csv 7 7.4.4 559 Salary data
nonpN.r 7 7.4.4 560 Comparison of power functions for parametric and nonparametric test
varq.r 7 7.5.1 563 Two-sided variance test
vartest.r 7 7.5.1 563 Equal-tail probabilities and unbiased test for variance
smallF.r 7 7.6.2 572 Newton's algorithm for unbiased test for variances
vartestSP.r 7 7.6.2 573 Testing the volatility of two stocks
binprop1.r 7 7.6.3 575 Simulations for the power function of the binomial test
binprop2.r 7 7.6.3 577 Simulations for the test of two binomial proportions
poistest.r 7 7.6.4 578 Power function for the Poisson distribution
poissamn.r 7 7.6.4 579 Sample size for the Poisson distribution
vartestSP2.r 7 7.6 580 Variance test for GOOGLE and AMAZON
GOOG.csv 7 7.6 580 Data for GOOGLE
AMZN.csv 7 7.6 580 Data for AMZN
corn0.r 7 7.7 581 Power functions for correlation coefficient
cor0.r 7 7.7 582 Simulations for testing correlation coefficient
ciumax.r 7 7.8 585 CI for uniform distribution
Cimovie.r 7 7.8 586 CI animation
houprCI.r 7 7.8 588 CI and confidence range for the house price
civar.r 7 7.8.3 592 Shortest CI for variance and SD
ci.binpr.r 7 7.8.4 593 Shortest CI for the binomial probability
cfmus.r 7 7.8.5 596 Simulations for confidence region for (mu,sigma)
thastest.r 7 7.9 600 Three asymptotic tests for binomial probability
powlinmod.r 7 7.9 602 Simulations for testing the regression coefficient by three tests
thlinmod.r 7 7.9 602 Simulation-derived cdfs for four tests
powlinmodC.r 7 7.9 604 Type I adjustments for three tests
chismult.r 7 7.9.1 607 Simulations for chi-square and Wald tests
frlJL.r 7 7.9.1 607 Wald test for English letter analysis
Mark_Twain_The_Adventures_of_Tom_Sawyer_f1.txt.char 7 7.9.1 607 Mark Twain novel char data
mnist_train.csv.zip 7 7.9.2 608 Handwritten digit train set
mnist_test.csv 7 7.9.2 608 Handwritten digit test set
dig.mnist.r 7 7.9.2 609 Plotting and classification of handwritten digits
wtext.r 7 7.9 611 Analysis of English letters in the novel by Jane Austen
Jane_Austen_Pride_and_Prejudice.char 7 7.9 611 Char by char "Pride and Prejudice"
saldisc.r 7 7.10 623 Drug or not to drug example
saldisc.csv 7 7.10 623 Data for the saldisc code
dvalREG.r 7 7.10 626 d-value for linear regression
simLM.r 8 8.3.1 647 Simulations for statistical properties of the quadratic regression
roblinreg2.r 8 8.3.1 648 Simulations to study violation of the normal assumption
roblinreg.r 8 8.3.1 649 Simulations with the apply function
CDpf.r 8 8.4.1 653 Estimation of the Cobb-Douglas production function
CDpf.csv 8 8.4.1 653 Data for pf
qtrpow.r 8 8.4.3 660 Sample size determination for the regression coefficient
linpower.r 8 8.4.3 660 Simulations for the F-test in quadratic regression
simCB.r 8 8.4.3 664 Simultaneous confidence band for quadratic regression
olsnormT.r 8 8.4.5 668 Simulations for linear regression with random predictor
pfx123.csv 8 8.4 670 Data for pf from Problem 8
dvalPMED.r 8 8.5.2 675 D-value for personalized medicine
dvalPMED.csv 8 8.5.2 675 Data for dvalPMED
kidsdrink.r 8 8.6.1 677 Kids drinking alcohol example
kidsdrink.csv 8 8.6.1 677 Data for kids drinking
leftright.r 8 8.6.2 680 False discovery example
leftright.csv 8 8.6.2 680 Data for false discovery example
hfn.r 8 8.6.3 682 Height, foot, and nose example
HeightFootNose.csv 8 8.6.3 682 Data for height, foot, and nose example
amzn.r 8 8.6.5 692 Autoregression for AMZN stock price
AMZN_weekly.csv 8 8.6.5 691 Data for AMZN autoregression example
salMW.r 8 8.7.1 697 Gender difference in salary
Salary.csv 8 8.7.1 696 Salary data for men and women
nile.r 8 8.7.1 700 Nile river example
NileFlow.csv 8 8.7.1 700 Data for Nile flow
housepr.r 8 8.7.1 701 Regressions for house price in two areas
houseprice.csv 8 8.7.1 701 Data for house prices in two areas
obesegene.r 8 8.7.1 702 BMI-gene interaction example
obesegene.csv 8 8.7.1 702 Data for the BMI-gene regression
simpson.r 8 8.7.1 704 Simpson paradox
simpson.csv 8 8.7.2 704 Data for Simpson paradox example
movrat.r 8 8.7.2 706 Movie rating example
movrat.csv 8 8.7.2 706 Data for movie rating
BPlong.r 8 8.7.3 708 Blood pressure treatment
BPdata.csv 8 8.7.3 708 Data for blood pressure
QoL.csv 8 8.7.3 710 Quality of life data
qolS.r 8 8.7.3 710 Quality of life example
flu.r 8 8.7.4 714 Flu incidence ANOVA example
linrep.r 8 8.7.4 716 Regression on averages and ANOVA
consIR.r 8 8.7.5 721 Internet radio example
consIR.csv 8 8.7.5 721 Data for internet radio example
CollegeSalary.csv 8 8.7 722 Salary data on 36 college employees
walD.r 8 8.8 724 Black Friday shopping example
blackfriday.csv 8 8.8 724 Data for Black Friday shoppers
lungsm.r 8 8.8.2 733 Two-by-two table via logistic regression
geodrink.r 8 8.8.2 735 Binge drinking among kids
kidsdrinkDAT.csv 8 8.8.2 735 Data for binge drinking
poisR.r 8 8.8.3 737 Poisson regression for traffic tickets
Traffic.Viol.csv 8 8.8.3 737 Data for traffic violations
cloglogV.r 8 8.8.3 738 Poisson and log-log regression
amazshop.csv 8 8.8 740 Data for Problem 10
marathon.r 9 9.5 770 Marathon nonlinear regression example
marathonWR2.txt 9 9.5 770 Data for marathon records
dnaRAD.csv 9 9.5 773 Cell survival data
dnaRAD.r 9 9.5 774 Change-point nonlinear regression for cell survival
twocph.r 9 9.5 775 Two-compartment pharmacokinetics model
twocph.csv 9 9.5 775 Data for the two-compartment example
twocphCI.r 9 9.5 778 Three CIs for the two-compartment model
ces.r 9 9.5 780 CES and Cobb-Douglas production functions
CES.csv 9 9.5 780 Data for the ces R code
FallingHat.csv 9 9.5 781 Data for the falling example
freefall.r 9 9.5 781 The R code for the falling hat example
gammaNLS.r 9 9.5 784 Simulations for the gamma distribution
SCcancerQQ2.r 9 9.5 784 Mixture exponential distribution
nen.r 9 9.6.1 787 Comparison of three distribution approximations
power.nls.r 9 9.6.2 789 Comparison of three power functions
expgr.r 9 9.6.3 791 Confidence region for the two-parameter regression
ci.nls.r 9 9.6.4 792 Three CIs
q2pr.r 9 9.7.2 798 Probability of two local minima
michm.r 9 9.9.1 806 Michaelis-Menten nonlinear regression
berndet.r 10 10.2 818 Bernoulli matrix
block.inv.r 10 10.2 818 Block inverse
nLINEXP.r 10 10.6.3 836 Newton's algorithm for a nonlinear equation
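
Once the zip archive has been downloaded and extracted, a short R session can help you see what is there before running any script from the table. The following is only a minimal sketch, not part of the repository: the helper `list_by_ext` and the folder name `advancedstatistics-master` are assumptions made for illustration, so adjust the path to wherever you extracted the files.

```r
# Hypothetical helper (not part of the repository): group the files in a
# folder by lower-cased extension, so scripts (.r) and data sets
# (.csv, .txt, ...) are easy to tell apart.
list_by_ext <- function(dir) {
  files <- list.files(dir)
  ext <- tolower(tools::file_ext(files))
  split(files, ext)
}

# Assumption: the GitHub zip was extracted into this folder; adjust the path.
data_dir <- "advancedstatistics-master"

if (dir.exists(data_dir)) {
  contents <- list_by_ext(data_dir)
  print(lengths(contents))  # number of files per extension

  # Read one of the data files listed in the table, if it is present.
  csv_path <- file.path(data_dir, "HeightWeight.csv")
  if (file.exists(csv_path)) {
    hw <- read.csv(csv_path)
    str(hw)  # inspect the columns before running the matching script
  }
}
```

The guards on `dir.exists` and `file.exists` keep the snippet harmless if the folder or file name differs on your machine.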