---
title: "Worksheet 7"
output:
  pdf_document: default
  html_notebook: default
---

## Author: Enter name of the author here

## Discussants: Enter the names of other people you discussed this problem set with

$\\$

The purpose of this worksheet is to get practice calculating probabilities using tree diagrams, examining the binomial and normal distributions, and using dplyr to analyze the retro-data data set. Some functions you might find useful are: factorial(), choose(), dbinom(), hist(), mean(), sd(), filter(), dim().

```{r message=FALSE, warning=FALSE, tidy=TRUE, echo = FALSE}

library(knitr)

# makes sure the code is wrapped to fit when it creates a pdf
opts_chunk$set(tidy.opts=list(width.cutoff=60))

set.seed(1)  # set the random number generator to always give the same sequence of random numbers

require('dplyr')   # use the dplyr package to manipulate data frames
require('Lahman')  # use the Lahman package for the baseball data used in Exercise 3

```

$\\$

**Exercise 1:** For the first question, please draw a tree diagram to calculate the probability that Matt Holliday will hit a single against Chad Billingsley, using the Strat-o-matic cards at [http://helios.hampshire.edu/~emmCS/classes/baseball-stats/strat.jpg](http://helios.hampshire.edu/~emmCS/classes/baseball-stats/strat.jpg). Please turn in the drawing of your tree diagram in class on Tuesday April 4th, and report the probability below.

![Strat-o-matic Matt vs Chad](http://helios.hampshire.edu/~emmCS/classes/past_classes/baseball_stats_2016_worksheets/strat.jpg)

```{r message=FALSE, warning=FALSE, tidy=TRUE}



```

**Answers:**

$\\$

**Exercise 2:** As discussed in class, we can use the binomial distribution to describe the probability of getting k "successful" outcomes out of n trials, where the probability of success on any trial is given by $\pi$, i.e., we have the distribution $Pr(X = k; n, \pi)$. In R, the function dbinom(k, n, p) will return this probability. Suppose David Ortiz hits a home run at a rate of 5 per 100 plate appearances (i.e., at a rate of .05).
If David Ortiz had 600 plate appearances in a season, what is the probability that he would hit exactly 30 home runs? Also, what is the probability he would hit 30 to 32 home runs in a season? (Hint: identify $\pi$, n, and k, and then use the dbinom(k, n, p) function, where p is your value of $\pi$. In R the built-in name pi is the constant 3.14159..., so store the probability under a different name.)

```{r message=FALSE, warning=FALSE, tidy=TRUE}

# the probability of hitting 30 home runs


# the probability of hitting 30, 31 or 32 home runs is:


```

**Answers:**

$\\$

**Exercise 3:** As discussed earlier in the semester, we can get a sense of the shape of a distribution of data by creating a histogram. Below is some code to get the weight of each baseball player. Using this data, create a histogram of player weights and describe approximately what shape the distribution is (e.g., is it Bernoulli, Binomial, etc.).

```{r message=FALSE, warning=FALSE, tidy=TRUE}

# get the weights of all baseball players
# (if Master is not found, recent versions of the Lahman package call this table People)
player.weights <- Master$weight


```

**Answers:**

$\\$

**Exercise 4:** The retro-data play-by-play data set has information about every single plate appearance from every single player in a season. I have put this data set on the R server so that you can access it for your final project. The code below shows how to load the data for the 2014 season (calling load() will create a variable called retro.data that holds the data in a data frame). The column (i.e., variable) named EVENT_CD in the retro.data data frame lists the outcome of each plate appearance. When the variable EVENT_CD has a value of 23 (i.e., EVENT_CD == 23) it means a home run was hit. Use dplyr's filter() function to get only those plate appearances (rows) in which a home run was hit, and then calculate and report the total number of home runs hit in the 2014 season. Note that the retro data could be a useful data set for your final project!

```{r message=FALSE, warning=FALSE, tidy=TRUE}

# Load the retro data from the 2014 season
load('/home/shared/all_retrosheet_data_Rda/retro.data.2014.Rda')


```

**Answers:**
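
$\\$

**Hint for Exercises 2 and 4:** The sketch below tries dbinom() and filter() on small made-up inputs, so you can see what each call returns before applying it to the real problem. Everything in it is an assumption made for illustration: the values k = 2, n = 10, p = 0.3 are not the Exercise 2 values, and toy.data is a hypothetical five-row data frame, not the retro data (its BAT_ID labels and the EVENT_CD values other than 23 are arbitrary placeholders).

```r
library(dplyr)  # provides filter()

# Part 1: binomial probabilities on toy numbers (NOT the Exercise 2 values)
k <- 2    # number of successes
n <- 10   # number of trials
p <- 0.3  # probability of success on each trial

# the binomial formula written out by hand, and the same value from dbinom()
by.formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by.dbinom <- dbinom(k, n, p)
all.equal(by.formula, by.dbinom)  # TRUE

# the probability of a range of outcomes (here 2, 3 or 4 successes)
# is the sum of the individual probabilities
prob.2.to.4 <- sum(dbinom(2:4, n, p))

# Part 2: filter() on a hypothetical toy data frame (NOT the retro data)
toy.data <- data.frame(BAT_ID = c("a", "b", "a", "c", "b"),
                       EVENT_CD = c(23, 2, 20, 23, 3))

# keep only the rows where EVENT_CD equals 23
toy.home.runs <- filter(toy.data, EVENT_CD == 23)

# dim() returns the number of rows and columns that remain
dim(toy.home.runs)  # 2 2
```

The first number returned by dim() is the count of rows that passed the filter, which is the quantity Exercise 4 asks you to report for the real data.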