# Data Analytics Video

Step 1. Watch the portion of this video (starting at time 16:30) on running regression analysis in Excel:

Step 2. Open the pizza dataset from the drive folder.

Step 3. Use regression modeling to examine the relationship between:

“phy_fov” (stands for Pizza Hut frequency of visits over the last 6 months) as the dependent Y variable, and

taste, coupon, discount, favorite, flavor, leave, specials, serve (which measure how important people say each of those topics is when choosing a pizza place) as the independent X variables.

Step 3b. Run the regression analysis again, but this time use only taste as the single independent X variable (drop the other ones). (Note: “phy_fov” remains the Y variable for all of the steps.)

Step 3c. Run the regression analysis again, but this time use only coupon as the single independent X variable (drop the other ones).

Step 3d. Run the regression analysis again, but this time use only taste and coupon as the two independent X variables (drop the other ones).

Step 3e. Run the regression analysis again, but this time use only favorite as the single independent X variable (drop the other ones).

Step 3f. Run the regression analysis again, but this time use only taste and favorite as the two independent X variables (drop the other ones).

Step 3g. Notice how the coefficients for taste in 3b and coupon in 3c don’t change much when the two variables are combined in 3d. However, the coefficients for taste in 3b and favorite in 3e do change when the two are combined in 3f.
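
If you want to see the pattern from steps 3b through 3g outside of Excel, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the numbers are made-up toy answers (not the pizza dataset), `visits` stands in for the Y variable, and `ols` is a small hypothetical helper that solves the least-squares equations by hand. The toy data is built so that taste and coupon are uncorrelated while favorite tracks taste closely.

```python
# Toy survey answers, made up for illustration (NOT the pizza dataset).
visits   = [1, 3, 6, 7, 2, 4, 5, 6]   # stands in for the Y variable
taste    = [1, 1, 5, 5, 2, 2, 4, 4]
coupon   = [1, 5, 1, 5, 2, 4, 2, 4]   # built to be uncorrelated with taste
favorite = [1, 2, 5, 4, 2, 3, 4, 3]   # built to track taste closely


def ols(y, *xs):
    """Least-squares fit of y on the predictors; returns [intercept, b1, b2, ...]."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in x] for x in xs]
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols]
         + [sum(ci[r] * y[r] for r in range(n))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-9:
            raise ValueError("predictors are perfectly collinear")
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


b_3b = ols(visits, taste)            # taste only
b_3c = ols(visits, coupon)           # coupon only
b_3d = ols(visits, taste, coupon)    # taste + coupon
b_3e = ols(visits, favorite)         # favorite only
b_3f = ols(visits, taste, favorite)  # taste + favorite

for label, b in [("3b", b_3b), ("3c", b_3c), ("3d", b_3d),
                 ("3e", b_3e), ("3f", b_3f)]:
    print(label, [round(v, 3) for v in b])
```

In the printed lists, the first number is the intercept and the rest are the coefficients in the order the predictors were passed. With this toy data, the taste and coupon coefficients in 3d match 3b and 3c, while the taste and favorite coefficients in 3f both move away from their single-variable values. Your Excel output on the real data will show different numbers.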

Step 3h. Run a correlation analysis (located in the Analysis ToolPak), highlighting the data for taste, coupon, discount, favorite, flavor, leave, specials, serve. Examine the correlations and notice that the correlations among taste, favorite, flavor, and serve are all really high (close to 1). In fact, one of them is 1. This means that if you enter them into a regression model together, you will have multicollinearity. Highly correlated variables show the same answer patterns, so adding them brings no additional variance into the model, and that completely messes up the regression beta calculations. The same holds for coupon, discount, leave, specials.

When correlations are in about the mid .8 range or higher (e.g., .85), we have to decide which of the highly correlated variables to enter and which to leave out. You can only put one variable from each highly correlated group into the model so it doesn’t mess up. So you could run taste and coupon, or favorite and coupon, etc., or you could take an average of taste, favorite, flavor, and serve and use that instead, plus coupon.

The point is that you want to avoid multicollinearity in regression. It’s a topic that is often mentioned in passing in stats classes, without ever showing what it looks like or does, and it is easy to overlook.
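
The correlation check and the "take an average" option can also be sketched in Python. This is a rough illustration under stated assumptions: the data is made up (not the pizza dataset), only four of the eight variables are included to keep it short, `pearson` is a hypothetical helper written out by hand, and the .85 cutoff is the example value mentioned in this step, not a fixed rule.

```python
from itertools import combinations
from math import sqrt

# Toy answers, made up for illustration (NOT the pizza dataset).
data = {
    "taste":    [1, 1, 5, 5, 2, 2, 4, 4],
    "favorite": [1, 2, 5, 4, 2, 3, 4, 3],
    "flavor":   [1, 1, 5, 5, 2, 2, 4, 4],  # identical to taste, so r = 1
    "coupon":   [1, 5, 1, 5, 2, 4, 2, 4],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


THRESHOLD = 0.85  # example cutoff from this step; a judgment call

# Flag every pair of variables whose correlation is at or above the cutoff.
flagged = [(a, b) for a, b in combinations(data, 2)
           if abs(pearson(data[a], data[b])) >= THRESHOLD]
print("highly correlated pairs:", flagged)

# Option from this step: replace a correlated group with its row-wise average.
group = ["taste", "favorite", "flavor"]
composite = [sum(vals) / len(group) for vals in zip(*(data[g] for g in group))]
print("composite vs coupon r:", round(pearson(composite, data["coupon"]), 3))
```

Any pair that shows up in `flagged` should not go into the same regression model. The `composite` column is one way to keep the information from the whole group while entering only a single variable alongside coupon.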

Step 4. Copy your output from step 3d as a screenshot into a Microsoft Word document and type answers to the following: