
How to Read a Latin Squares Main Effects Plot

Latin Squares, Graeco-Latin Squares, & Factorial experiments

Evaluate the NYC SAT scores data and deal with its missing values, then evaluate Latin Square, Graeco-Latin Square, and factorial experiments.


Latin Squares Video


NYC SAT Scores EDA

Math is a subject the U.S. is consistently behind the rest of the world on, so our experiments will focus on Math score. While the original dataset is an open dataset downloaded from Kaggle, throughout this chapter I will add a few variables that will allow you to pretend you are an education researcher conducting experiments ideally aimed at raising students' scores, hopefully increasing the likelihood they will be admitted to colleges.

Before diving into analyzing the experiments, we should do some EDA to make sure we fully understand the nyc_scores data. In this lesson, we'll do experiments where we block by Borough and Teacher_Education_Level, so let's examine math scores by those variables. The nyc_scores dataset has been loaded for you.


Exercise

  • Find the mean, variance, and median of Average_Score_SAT_Math by Borough using dplyr methods for EDA as we have used them throughout the course.
      # Mean, var, and median of Math score
      nyc_scores %>%
        group_by(Borough) %>%
        summarize(mean = mean(Average_Score_SAT_Math, na.rm = TRUE),
                  var = var(Average_Score_SAT_Math, na.rm = TRUE),
                  median = median(Average_Score_SAT_Math, na.rm = TRUE))

      # A tibble: 5 x 4
        Borough        mean   var median
        <fct>         <dbl> <dbl>  <dbl>
      1 Bronx          404. 2727.   396.
      2 Brooklyn       416. 3658.   395
      3 Manhattan      456. 7026.   433
      4 Queens         462. 5168.   448
      5 Staten Island  486. 6911.   466.
  • Find the mean, variance, and median of Average_Score_SAT_Math by Teacher_Education_Level using dplyr EDA methods.
      # Mean, var, and median of Math score by Teacher Education Level
      nyc_scores %>%
        group_by(Teacher_Education_Level) %>%
        summarize(mean = mean(Average_Score_SAT_Math, na.rm = TRUE),
                  var = var(Average_Score_SAT_Math, na.rm = TRUE),
                  median = median(Average_Score_SAT_Math, na.rm = TRUE))

      # A tibble: 5 x 4
        Teacher_Education_Level  mean   var median
        <fct>                   <dbl> <dbl>  <dbl>
      1 BA                       427. 4114.   415
      2 College Student          423. 4043.   410.
      3 Grad Student             430. 4694.   416.
      4 MA                       449. 6520.   418.
      5 PhD                      432. 6654.   408
  • Find the mean, variance, and median of Average_Score_SAT_Math by both Borough and Teacher_Education_Level using dplyr EDA methods.
      # Mean, var, and median of Math score by both
      nyc_scores %>%
        group_by(Borough, Teacher_Education_Level) %>%
        summarize(mean = mean(Average_Score_SAT_Math, na.rm = TRUE),
                  var = var(Average_Score_SAT_Math, na.rm = TRUE),
                  median = median(Average_Score_SAT_Math, na.rm = TRUE))

      # A tibble: 25 x 5
      # Groups:   Borough [5]
         Borough  Teacher_Education_Level  mean   var median
         <fct>    <fct>                   <dbl> <dbl>  <dbl>
       1 Bronx    BA                       394. 1458.   383
       2 Bronx    College Student          401. 1291.   390
       3 Bronx    Grad Student             403. 1274.   402.
       4 Bronx    MA                       418. 6047.   400.
       5 Bronx    PhD                      390.  621.   393
       6 Brooklyn BA                       416. 2335.   394
       7 Brooklyn College Student          432. 5811.   420
       8 Brooklyn Grad Student             401. 1988.   394
       9 Brooklyn MA                       434. 4955.   408
      10 Brooklyn PhD                      388. 2033.   380
      # … with 15 more rows

Now that we've examined the data, we can move on to cleaning it, the next important step before analysis.


Dealing with Missing Exam Scores

If we want to use SAT scores as our outcome, we should examine missingness. Examine the pattern of missingness across all the variables in nyc_scores using miss_var_summary() from the naniar package written by Tierney et al. (2019). naniar integrates with Tidyverse code styling, including the pipe operator (%>%).
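To make the idea of a per-variable missingness summary concrete, here is a minimal base-R sketch of the kind of table miss_var_summary() reports (a count and a percentage of missing values per column). The data frame toy and its values are invented for illustration; this is not naniar's implementation.

```r
# Toy data frame with hypothetical values (not the real nyc_scores)
toy <- data.frame(
  School_ID = 1:5,
  Math      = c(400, NA, 450, NA, 480),
  Reading   = c(410, 420, NA, 430, 440)
)

# Count and percentage of missing values per column
n_miss   <- colSums(is.na(toy))
pct_miss <- 100 * n_miss / nrow(toy)

# Assemble a summary table, most-missing variable first
miss_tbl <- data.frame(variable = names(n_miss),
                       n_miss   = n_miss,
                       pct_miss = pct_miss,
                       row.names = NULL)
miss_tbl <- miss_tbl[order(-miss_tbl$n_miss), ]
print(miss_tbl)
```

On real data the same two quantities are what you would scan first: which variables are missing most often, and whether the outcome you care about is among them.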

There are 60 missing scores in each subject. Though there are many R packages which help with more advanced forms of imputation, such as MICE, Amelia, and mi, we will continue to use simputation and impute_median().

Create a new dataset, nyc_scores_2, by imputing Math score by Borough, but note that impute_median() returns the imputed variable as type "impute". You'll convert the variable to numeric in a separate step.

simputation and dplyr are loaded.
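As a rough illustration of what imputing the median by group means, the following base-R sketch fills each missing score with the median of the observed scores in the same borough. The data and the helper name impute_group_median are hypothetical; simputation's impute_median() is the function actually used in the exercise, and this sketch only mirrors the idea.

```r
# Toy data with hypothetical values (not the real nyc_scores)
toy <- data.frame(
  Borough = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"),
  Math    = c(390, 410, NA, 450, NA, 470)
)

# Replace each NA with the median of the observed values in its group
impute_group_median <- function(x, g) {
  ave(x, g, FUN = function(v) {
    v[is.na(v)] <- median(v, na.rm = TRUE)
    v
  })
}

toy$Math_imputed <- impute_group_median(toy$Math, toy$Borough)
print(toy)

# Group medians before and after imputation
tapply(toy$Math, toy$Borough, median, na.rm = TRUE)
tapply(toy$Math_imputed, toy$Borough, median)
```

Because each filled-in value equals its group's median, the group medians are unchanged in this toy case, while the group means can shift slightly.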


Exercise

  • Load the naniar package.
      # Load naniar
      library(naniar)

      Attaching package: 'naniar'

      The following object is masked from 'package:simputation':

          impute_median
  • Examine the missingness of variables in nyc_scores by piping it to miss_var_summary().
      # Examine missingness with miss_var_summary()
      nyc_scores %>%
        miss_var_summary()

      # A tibble: 23 x 3
         variable                  n_miss pct_miss
         <chr>                      <int>    <dbl>
       1 Average_Score_SAT_Math        60    13.8
       2 Average_Score_SAT_Reading     60    13.8
       3 Average_Score_SAT_Writing     60    13.8
       4 Percent_Tested                49    11.3
       5 Student_Enrollment             7     1.61
       6 Percent_White                  7     1.61
       7 Percent_Black                  7     1.61
       8 Percent_Hispanic               7     1.61
       9 Percent_Asian                  7     1.61
      10 School_ID                      0     0
      # … with 13 more rows
  • Create nyc_scores_2 by imputing the Math score by Borough (we're only using Math in our experiments).
      # Examine missingness with md.pattern() from the mice package
      mice::md.pattern(nyc_scores)

          School_ID School_Name Borough Building_Code Street_Address City State
      375         1           1       1             1              1    1     1
      11          1           1       1             1              1    1     1
      42          1           1       1             1              1    1     1
      7           1           1       1             1              1    1     1
                  0           0       0             0              0    0     0
          Zip_Code Latitude Longitude Phone_Number Start_Time End_Time
      375        1        1         1            1          1        1
      11         1        1         1            1          1        1
      42         1        1         1            1          1        1
      7          1        1         1            1          1        1
                 0        0         0            0          0        0
          Teacher_Education_Level Student_Enrollment Percent_White Percent_Black
      375                       1                  1             1             1
      11                        1                  1             1             1
      42                        1                  1             1             1
      7                         1                  0             0             0
                                0                  7             7             7
          Percent_Hispanic Percent_Asian Percent_Tested Average_Score_SAT_Math
      375                1             1              1                      1
      11                 1             1              1                      0
      42                 1             1              0                      0
      7                  0             0              0                      0
                         7             7             49                     60
          Average_Score_SAT_Reading Average_Score_SAT_Writing
      375                         1                         1   0
      11                          0                         0   3
      42                          0                         0   4
      7                           0                         0   9
                                 60                        60 264

      library(simputation)

      # Impute the Math score by Borough
      # Note: loading naniar masks impute_median() from simputation,
      # so the package must be specified explicitly
      nyc_scores_2 <- simputation::impute_median(nyc_scores, Average_Score_SAT_Math ~ Borough)
  • Convert nyc_scores_2$Average_Score_SAT_Math to numeric.
      # Convert Math score to numeric
      nyc_scores_2$Average_Score_SAT_Math <- as.numeric(nyc_scores_2$Average_Score_SAT_Math)
  • Use dplyr to examine the median and mean of math score before and after imputation.
      # Examine scores by Borough in both datasets, before and after imputation
      nyc_scores %>%
        group_by(Borough) %>%
        summarize(median = median(Average_Score_SAT_Math, na.rm = TRUE),
                  mean = mean(Average_Score_SAT_Math, na.rm = TRUE))

      # A tibble: 5 x 3
        Borough       median  mean
        <fct>          <dbl> <dbl>
      1 Bronx           396.  404.
      2 Brooklyn        395   416.
      3 Manhattan       433   456.
      4 Queens          448   462.
      5 Staten Island   466.  486.

      nyc_scores_2 %>%
        group_by(Borough) %>%
        summarize(median = median(Average_Score_SAT_Math, na.rm = TRUE),
                  mean = mean(Average_Score_SAT_Math, na.rm = TRUE))

      # A tibble: 5 x 3
        Borough       median  mean
        <fct>          <dbl> <dbl>
      1 Bronx           396.  403.
      2 Brooklyn        395   414.
      3 Manhattan       433   452.
      4 Queens          448   460.
      5 Staten Island   466.  486.

Did the median scores change before and after imputation? (Hint: they shouldn't have changed by much, but rounding may have offset them by an integer or two.)


Drawing Latin Squares with agricolae

We return, once again, to the agricolae package to examine what a Latin Square design can look like. Here's an example:

           [,1] [,2] [,3] [,4]
      [1,] "A"  "C"  "D"  "B"
      [2,] "D"  "B"  "C"  "A"
      [3,] "B"  "D"  "A"  "C"
      [4,] "C"  "A"  "B"  "D"

Since a Latin Square experiment has two blocking factors, you can see that in this design, each treatment appears exactly once in each row (blocking factor 1) and exactly once in each column (blocking factor 2).
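The defining property can be checked mechanically. The following is a minimal base-R sketch, assuming the design is stored as a square character matrix; the helper name is_latin_square is made up for illustration and is not part of agricolae.

```r
# The 4 x 4 example design from the text, entered row by row
sketch <- matrix(
  c("A", "C", "D", "B",
    "D", "B", "C", "A",
    "B", "D", "A", "C",
    "C", "A", "B", "D"),
  nrow = 4, byrow = TRUE
)

# TRUE if every treatment appears exactly once in each row and each column
is_latin_square <- function(m) {
  trts <- sort(unique(as.vector(m)))
  complete <- function(v) identical(sort(v), trts)
  nrow(m) == ncol(m) &&
    length(trts) == nrow(m) &&
    all(apply(m, 1, complete)) &&
    all(apply(m, 2, complete))
}

is_latin_square(sketch)  # TRUE for the design above
```

Swapping any single cell for a different letter makes the check fail, which is a quick way to convince yourself that the row and column constraints are both being enforced.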

Look at the help page for design.lsd() by typing ?design.lsd in the console for any help you need designing your Latin Square experiment.


  • Load the agricolae package.
      # Load agricolae
      library(agricolae)
  • Create and view the sketch of a Latin Square design, my_design_lsd, using treatments A, B, C, D, & E, and a seed of 42.
      # Design a LS with 5 treatments A:E then look at the sketch
      my_design_lsd <- design.lsd(LETTERS[1:5], seed = 42)
      my_design_lsd$sketch

           [,1] [,2] [,3] [,4] [,5]
      [1,] "B"  "E"  "D"  "A"  "C"
      [2,] "A"  "D"  "C"  "E"  "B"
      [3,] "E"  "C"  "B"  "D"  "A"
      [4,] "C"  "A"  "E"  "B"  "D"
      [5,] "D"  "B"  "A"  "C"  "E"

Maybe you're thinking to yourself: this looks a lot like an RCBD. It does, but as we know from the video, there are now two blocking factors in a LS design.


Latin Square with NYC SAT Scores

To execute a Latin Square design on this data, suppose we want to know the effect of our tutoring program, which includes one-on-one tutoring, two small groups, and an in- and after-school SAT prep class. A new dataset, nyc_scores_ls, is available that represents this experiment. Feel free to explore the dataset in the console.

We'll block by Borough and Teacher_Education_Level to reduce their known variance on the score outcome. Borough is a good blocking factor because schools in America are funded partly based on taxes paid in each city, so it will likely make a difference in the quality of education.


Exercise

  • Use lm() to test the changes in Average_Score_SAT_Math using nyc_scores_ls.
      # Build nyc_scores_ls_lm
      nyc_scores_ls_lm <- lm(Average_Score_SAT_Math ~ Tutoring_Program + Borough + Teacher_Education_Level,
                             data = nyc_scores_ls)
  • Tidy nyc_scores_ls_lm with the appropriate broom function.
      # Tidy the results with broom
      broom::tidy(nyc_scores_ls_lm)

      # A tibble: 13 x 5
         term                               estimate std.error statistic  p.value
         <chr>                                 <dbl>     <dbl>     <dbl>    <dbl>
       1 (Intercept)                         413.         64.0   6.45     3.14e-5
       2 Tutoring_ProgramSAT Prep Class (a…  -54.4        61.0  -0.892    3.90e-1
       3 Tutoring_ProgramSAT Prep Class (s…  -30.9        64.4  -0.480    6.40e-1
       4 Tutoring_ProgramSmall Groups (2-3)    0.417      60.4   0.00690  9.95e-1
       5 Tutoring_ProgramSmall Groups (4-6)  -36.0        58.1  -0.619    5.47e-1
       6 BoroughBrooklyn                      55.0        53.6   1.03     3.24e-1
       7 BoroughManhattan                     21.9        45.4   0.482    6.39e-1
       8 BoroughQueens                        46.0        53.6   0.858    4.08e-1
       9 BoroughStaten Island                352.         89.9   3.91     2.05e-3
      10 Teacher_Education_LevelCollege St…    7.30       52.1   0.140    8.91e-1
      11 Teacher_Education_LevelGrad Stude…   72.4        46.5   1.56     1.45e-1
      12 Teacher_Education_LevelMA            36.6        47.6   0.768    4.57e-1
      13 Teacher_Education_LevelPhD          113.         59.3   1.91     8.04e-2
  • Examine nyc_scores_ls_lm with anova().
      # Examine the results with anova
      anova(nyc_scores_ls_lm)

      Analysis of Variance Table

      Response: Average_Score_SAT_Math
                              Df Sum Sq Mean Sq F value  Pr(>F)
      Tutoring_Program         4  19238  4809.4  0.9087 0.48959
      Borough                  4  78483 19620.7  3.7071 0.03457 *
      Teacher_Education_Level  4  26884  6721.0  1.2698 0.33496
      Residuals               12  63514  5292.8
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question

At the 0.05 significance level, do we have evidence to believe the tutoring program has an effect on math SAT scores, when blocked by Borough and Teacher_Education_Level?

  • Nope! Given the p-value (0.4896), we have no reason to reject the null hypothesis.

  • Yes! Given the p-value (0.4896), we have reason to believe that the tutoring program had an effect on the Math score.

It seems that when we block by Borough of the school and Teacher_Education_Level, our Tutoring_Program isn't having a statistically significant effect on the Math SAT score.
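To see where the Tutoring_Program row of the ANOVA table above comes from, the F statistic and its p-value can be recomputed by hand from the printed sums of squares and degrees of freedom. This is only an arithmetic check using the rounded values shown in the table, so the last digits may differ slightly from R's own output.

```r
# Values copied from the Latin Square ANOVA table (rounded as printed)
ss_tutoring <- 19238
df_tutoring <- 4
ss_resid    <- 63514
df_resid    <- 12

# Mean squares are sums of squares divided by their degrees of freedom
ms_tutoring <- ss_tutoring / df_tutoring
ms_resid    <- ss_resid / df_resid

# F statistic and upper-tail p-value from the F distribution
f_value <- ms_tutoring / ms_resid
p_value <- pf(f_value, df_tutoring, df_resid, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 4)

# Decision at the 0.05 level: TRUE would mean rejecting the null hypothesis
p_value < 0.05
```

The residual degrees of freedom also follow from the design: 25 observations, minus 1 for the intercept, minus 4 for each of the three five-level factors, leaves 12.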


Graeco-Latin Squares Video


NYC SAT Scores Data Viz

In the last lesson, when discussing Latin Squares, we did numerical EDA in the form of looking at means, variances, and medians of the math SAT scores. Another crucial part of the EDA is data visualization, as it often helps in spotting outliers and gives you a visual representation of the distribution of your variables.

ggplot2 has been loaded for you and the nyc_scores dataset is available. Create and examine the requested boxplot. How do the medians differ by Borough? How many outliers are present, and where are they mostly present?


Exercise

  • Create a boxplot of Math SAT scores by Borough.
      library(ggplot2)
      ggplot(data = nyc_scores, aes(x = Borough, y = Average_Score_SAT_Math)) +
        geom_boxplot()

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

  • Run the code to include a title: "Average SAT Math Scores by Borough, NYC".
      ggplot(data = nyc_scores, aes(x = Borough, y = Average_Score_SAT_Math)) +
        geom_boxplot() +
        ggtitle("Average SAT Math Scores by Borough, NYC")

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

  • Change the x- and y-axis labels to read "Borough (NYC)" and "Average SAT Math Scores (2014-15)", respectively, using the correct arguments to labs().
      ggplot(data = nyc_scores, aes(x = Borough, y = Average_Score_SAT_Math)) +
        geom_boxplot() +
        ggtitle("Average SAT Math Scores by Borough, NYC") +
        labs(x = "Borough (NYC)", y = "Average SAT Math Scores (2014-15)")

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

      ## or
      ggplot(data = nyc_scores, aes(x = Borough, y = Average_Score_SAT_Math)) +
        geom_boxplot() +
        labs(title = "Average SAT Math Scores by Borough, NYC",
             x = "Borough (NYC)",
             y = "Average SAT Math Scores (2014-15)") +
        theme_bw()

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

It's interesting to see the different distributions of scores by Borough and to see that every borough has scores that are outliers, though some more than others.


Drawing Graeco-Latin Squares with agricolae

As we've seen, agricolae gives us the ability to draw all of the experimental designs we've used so far, and it can also draw Graeco-Latin squares. One difference in the input to design.graeco() that we haven't seen before is that we'll need to input two vectors, trt1 and trt2, which must be of equal length. You can think of trt1 as your actual treatment and trt2 as one of your blocking factors. agricolae has been loaded for you.


Exercise

  • Create vectors trt1 with LETTERS A through E and trt2 with numbers 1 through 5.
      # Create trt1 and trt2
      trt1 <- LETTERS[1:5]
      trt2 <- 1:5
  • Make my_graeco_design with design.graeco(), using trt1, trt2, and seed = 42.
      # Create my_graeco_design
      my_graeco_design <- design.graeco(trt1, trt2, seed = 42)
  • Examine the parameters and sketch of my_graeco_design.
      # Examine the parameters and sketch
      my_graeco_design$parameters
      my_graeco_design$sketch

      $design
      [1] "graeco"

      $trt1
      [1] "A" "B" "C" "D" "E"

      $trt2
      [1] 1 2 3 4 5

      $r
      [1] 5

      $serie
      [1] 2

      $seed
      [1] 42

      $kinds
      [1] "Super-Duper"

      [[8]]
      [1] TRUE

           [,1]  [,2]  [,3]  [,4]  [,5]
      [1,] "D 2" "E 3" "A 1" "C 5" "B 4"
      [2,] "E 1" "A 5" "C 4" "B 2" "D 3"
      [3,] "A 4" "C 2" "B 3" "D 1" "E 5"
      [4,] "C 3" "B 1" "D 5" "E 4" "A 2"
      [5,] "B 5" "D 4" "E 2" "A 3" "C 1"

You can see that this time the sketch object includes your treatment (the capital letter) and a blocking factor (the number).
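What makes such a sketch a Graeco-Latin square is that the letters form a Latin square, the numbers form a Latin square, and every letter-number pair occurs exactly once. The base-R sketch below checks those three conditions on the 5 x 5 sketch printed above; the helper each_once is a hypothetical name used only for this illustration.

```r
# The Graeco-Latin sketch from the text, entered row by row
sketch <- matrix(
  c("D 2", "E 3", "A 1", "C 5", "B 4",
    "E 1", "A 5", "C 4", "B 2", "D 3",
    "A 4", "C 2", "B 3", "D 1", "E 5",
    "C 3", "B 1", "D 5", "E 4", "A 2",
    "B 5", "D 4", "E 2", "A 3", "C 1"),
  nrow = 5, byrow = TRUE
)

# Split each cell into its letter (treatment) and its number (second factor)
letters_part <- matrix(sub(" .*", "", sketch), nrow = nrow(sketch))
numbers_part <- matrix(sub(".* ", "", sketch), nrow = nrow(sketch))

# TRUE if no value repeats within any row or any column
each_once <- function(m) {
  no_dups <- function(v) anyDuplicated(v) == 0
  all(apply(m, 1, no_dups)) && all(apply(m, 2, no_dups))
}

each_once(letters_part)                  # letters form a Latin square
each_once(numbers_part)                  # numbers form a Latin square
anyDuplicated(as.vector(sketch)) == 0    # every letter-number pair is unique
```

The third condition is what lets the second factor be separated from the treatment in the analysis: each number meets each letter exactly once.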


Graeco-Latin Square with NYC SAT Scores

Recall that our Latin Square exercise in this chapter tested the effect of our tutoring program, blocked by Borough and Teacher_Education_Level.

For our Graeco-Latin Square, say we also want to block out the known effect of Homework_Type, which indicates what kind of homework the student was given: individual only, small or large group homework, or some combination. We can add this as another blocking factor to create a Graeco-Latin Square experiment.


Exercise

  • Use lm() to test the changes in Average_Score_SAT_Math using the nyc_scores_gls data.
      # Build nyc_scores_gls_lm
      nyc_scores_gls_lm <- lm(Average_Score_SAT_Math ~ Tutoring_Program + Borough + Teacher_Education_Level + Homework_Type,
                              data = nyc_scores_gls)
  • Tidy nyc_scores_gls_lm with the appropriate broom function.
      # Tidy the results with broom
      broom::tidy(nyc_scores_gls_lm)

      # A tibble: 17 x 5
         term                                estimate std.error statistic p.value
         <chr>                                  <dbl>     <dbl>     <dbl>   <dbl>
       1 (Intercept)                           405.        80.4    5.03   0.00102
       2 Tutoring_ProgramSAT Prep Class (af…   -47.2       69.0   -0.684  0.513
       3 Tutoring_ProgramSAT Prep Class (sc…    33.2       80.3    0.414  0.690
       4 Tutoring_ProgramSmall Groups (2-3)      4.02      75.6    0.0531 0.959
       5 Tutoring_ProgramSmall Groups (4-6)    -29.7       64.5   -0.461  0.657
       6 BoroughBrooklyn                        33.2       75.3    0.441  0.671
       7 BoroughManhattan                       35.6       51.0    0.697  0.505
       8 BoroughQueens                          93.5       66.9    1.40   0.200
       9 BoroughStaten Island                  300.       107.     2.81   0.0229
      10 Teacher_Education_LevelCollege Stu…    17.0       67.2    0.252  0.807
      11 Teacher_Education_LevelGrad Student    54.0       63.6    0.849  0.421
      12 Teacher_Education_LevelMA              15.4       55.7    0.276  0.789
      13 Teacher_Education_LevelPhD            133.        72.3    1.84   0.104
      14 Homework_TypeLarge Group               53.4       70.5    0.757  0.471
      15 Homework_TypeMix of Large Group/In…   -49.8       69.4   -0.717  0.494
      16 Homework_TypeMix of Small Group/In…   -38.7       58.8   -0.659  0.529
      17 Homework_TypeSmall Group              -47.5       62.9   -0.756  0.472
  • Examine nyc_scores_gls_lm with anova().
      # Examine the results with anova
      anova(nyc_scores_gls_lm)

      Analysis of Variance Table

      Response: Average_Score_SAT_Math
                              Df Sum Sq Mean Sq F value  Pr(>F)
      Tutoring_Program         4  19238  4809.4  0.8277 0.54329
      Borough                  4  78483 19620.7  3.3765 0.06725 .
      Teacher_Education_Level  4  26884  6721.0  1.1566 0.39734
      Homework_Type            4  17026  4256.5  0.7325 0.59474
      Residuals                8  46487  5810.9
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question

At the 0.05 significance level, do we have evidence to believe the tutoring program has an effect on math SAT scores, when blocked by Borough, Teacher_Education_Level, and Homework_Type?

  • Nope! Given the p-value, we have no reason to reject the null hypothesis.

  • Yes! Given the p-value, we have reason to believe that the tutoring program had an effect on the Math score.

It seems that here, when blocked by all the other factors, our tutoring program has no statistically significant effect on the Math score.


Factorial Experiments Video


NYC SAT Scores Factorial EDA

Let's do some more EDA before we dive into the analysis of our factorial experiment.

Let's test the effect of Percent_Black_HL, Percent_Tested_HL, and Tutoring_Program on the outcome, Average_Score_SAT_Math. The HL stands for high-low, where a 1 indicates, respectively, that less than 50% of a school's students are Black or that less than 50% of all students in the school were tested, and a 2 indicates that the percentage is 50% or greater.

Build a boxplot of each factor vs. the outcome to get an idea of which have a difference in median by factor level (ultimately, mean difference is what's tested). The nyc_scores dataset has been loaded for you.


Exercise

  • Load ggplot2. Create a boxplot of the outcome versus Tutoring_Program.
      # More made-up data: alternate "Yes" and "No" across the 435 schools
      nyc_scores$Tutoring_Program <- rep(c("Yes", "No"), length.out = 435)

      # Load ggplot2
      library(ggplot2)

      # Build the boxplot for the tutoring program vs. Math SAT score
      ggplot(data = nyc_scores, aes(Tutoring_Program, Average_Score_SAT_Math)) +
        geom_boxplot() +
        theme_bw()

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

  • Using ggplot2, create a boxplot of the outcome versus Percent_Black_HL.
      # Feature engineering
      Percent_Black_HL <- factor(ifelse(nyc_scores$Percent_Black < 0.5, 1, 2))
      Percent_Tested_HL <- factor(ifelse(nyc_scores$Percent_Tested < 0.5, 1, 2))
      nyc_scores$Percent_Black_HL <- Percent_Black_HL
      nyc_scores$Percent_Tested_HL <- Percent_Tested_HL
      rm("Percent_Tested_HL", "Percent_Black_HL")

      # Build the boxplot for percent Black vs. Math SAT score
      ggplot(data = nyc_scores, aes(x = Percent_Black_HL, y = Average_Score_SAT_Math)) +
        geom_boxplot()

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).

  • Using ggplot2, create a boxplot of the outcome versus Percent_Tested_HL.
      # Build the boxplot for percent tested vs. Math SAT score
      ggplot(data = nyc_scores, aes(x = Percent_Tested_HL, y = Average_Score_SAT_Math)) +
        geom_boxplot()

      Warning: Removed 60 rows containing non-finite values (stat_boxplot).


Factorial Experiment with NYC SAT Scores

Now we want to examine the effect of tutoring programs on the NYC schools' SAT Math score. As noted in the last exercise, the variable Tutoring_Program is just yes or no, depending on whether a school had a tutoring program implemented. For Percent_Black_HL and Percent_Tested_HL, HL stands for high/low: a 1 indicates that less than 50% of students are Black (or were tested), and a 2 indicates 50% or more.
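As a minimal sketch of that coding rule, the snippet below applies the same ifelse() threshold to a made-up vector of proportions. The vector pct is hypothetical and not part of nyc_scores, and the sketch assumes the percentages are stored as proportions between 0 and 1, as the 0.5 cutoff in the earlier exercise suggests.

```r
# Hypothetical proportions, chosen to include the 0.5 boundary
pct <- c(0.10, 0.49, 0.50, 0.85)

# Same rule as the feature engineering step:
# below 0.5 becomes level 1, everything else becomes level 2
pct_hl <- factor(ifelse(pct < 0.5, 1, 2))

# Show each value next to its high/low level
print(data.frame(pct, pct_hl))
```

Under this rule a value of exactly 0.5 lands in level 2.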

Remember that because we intend to test all of the possible combinations of factor levels, we need to write the formula like:

    outcome ~ factor1 * factor2 * factor3
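To see which terms the * operator generates, one option is to inspect the formula with terms() from base R. The sketch below reuses the placeholder names from the formula; nothing is fitted and no data are needed.

```r
# Placeholder formula with three fully crossed factors
f <- outcome ~ factor1 * factor2 * factor3

# terms() expands `*` into main effects plus all interactions
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Count the model terms (excluding the intercept and residuals)
print(length(term_labels))
```

The labels cover the three main effects, the three two-way interactions, and the single three-way interaction. The same model could be written out term by term using + and : instead.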

Exercise

  • Use aov() to create a model to test how Percent_Tested_HL, Percent_Black_HL, and Tutoring_Program affect the outcome Average_Score_SAT_Math.

  • Save the result as a model object, nyc_scores_factorial, and examine it with tidy().

    # Create nyc_scores_factorial and examine the results
    nyc_scores_factorial <- aov(Average_Score_SAT_Math ~ Percent_Tested_HL * Percent_Black_HL * Tutoring_Program,
                                data = nyc_scores)
    broom::tidy(nyc_scores_factorial)

    # A tibble: 8 x 6
      term                               df   sumsq  meansq statistic   p.value
      <chr>                           <dbl>   <dbl>   <dbl>     <dbl>     <dbl>
    1 Percent_Tested_HL                   1  1.90e5  1.90e5  43.7      1.35e-10
    2 Percent_Black_HL                    1  1.09e5  1.09e5  25.0      9.07e- 7
    3 Tutoring_Program                    1  6.03e3  6.03e3   1.39     2.40e- 1
    4 Percent_Tested_HL:Percent_Blac…     1  3.29e4  3.29e4   7.57     6.23e- 3
    5 Percent_Tested_HL:Tutoring_Pro…     1  2.19e1  2.19e1   0.00504  9.43e- 1
    6 Percent_Black_HL:Tutoring_Prog…     1  2.30e1  2.30e1   0.00528  9.42e- 1
    7 Percent_Tested_HL:Percent_Blac…     1  1.58e3  1.58e3   0.362    5.48e- 1
    8 Residuals                         367  1.60e6  4.35e3  NA       NA

Check the wording on DataCamp here as it makes little to no sense…


Evaluating the NYC SAT Scores Factorial Model

We've built our model, so we know what's next: model checking! We need to examine whether both our outcome and our model residuals are normally distributed. We'll check the normality assumption using shapiro.test(). A low p-value means we can reject the null hypothesis that the sample came from a normally distributed population.
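The same test can be applied to model residuals as well as to the raw outcome. The sketch below illustrates this on a simulated 2 by 2 factorial rather than on nyc_scores: the data frame toy, its columns, and the effect sizes are all made up, and the noise is deliberately skewed so that the residuals should fail the normality check.

```r
# Simulated 2 x 2 factorial with 25 replicates per cell (all values are made up)
set.seed(42)
toy <- expand.grid(A = factor(1:2), B = factor(1:2), rep = 1:25)

# Outcome = baseline + two main effects + skewed (exponential) noise
toy$y <- 400 + 30 * (toy$A == 2) + 20 * (toy$B == 2) +
  rexp(nrow(toy), rate = 1 / 50)

# Fit the full factorial model, as with aov() on the real data
toy_model <- aov(y ~ A * B, data = toy)

# Shapiro-Wilk on the model residuals rather than on the outcome
resid_check <- shapiro.test(residuals(toy_model))
print(resid_check$p.value)
```

With noise this skewed, the residual p-value comes out small, which is the pattern that signals a violated normality assumption.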

Let's carry out the requisite model checks for our \(2^k\) factorial model, nyc_scores_factorial, which has been loaded for you.


Exercise

  • Test the outcome Average_Score_SAT_Math from nyc_scores for normality using shapiro.test().

    # Use shapiro.test() to test the outcome
    shapiro.test(nyc_scores$Average_Score_SAT_Math)

        Shapiro-Wilk normality test

    data:  nyc_scores$Average_Score_SAT_Math
    W = 0.84672, p-value < 2.2e-16

  • Set up a 2 by 2 grid for plots and plot the nyc_scores_factorial model object to create the residual plots.

    # Plot nyc_scores_factorial to examine residuals
    par(mfrow = c(2, 2))
    plot(nyc_scores_factorial)

There are problems with this model! The Shapiro-Wilk test rejects normality of the outcome (p-value < 2.2e-16).


What's Next in Experimental Design Video


References

Source: https://stat-ata-asu.github.io/ExperimentalDesignInR/latin-squares-graeco-latin-squares-factorial-experiments.html
