class: title-slide <br> <br> .right-panel[ # Subsetting Data ## Dr. Mine Dogucu ] --- ```r glimpse(lapd) ``` ``` ## Rows: 68,564 ## Columns: 35 ## $ `Row ID` <chr> "3-1000027830ctFu", "3-1000155488ctFu",… ## $ Year <dbl> 2013, 2013, 2013, 2013, 2013, 2013, 201… ## $ `Department Title` <chr> "Police (LAPD)", "Police (LAPD)", "Poli… ## $ `Payroll Department` <dbl> 4301, 4302, 4301, 4301, 4302, 4302, 430… ## $ `Record Number` <dbl> 1000027830, 1000155488, 1000194958, 100… ## $ `Job Class Title` <chr> "Police Detective II", "Clerk Typist", … ## $ `Employment Type` <chr> "Full Time", "Full Time", "Full Time", … ## $ `Hourly or Event Rate` <dbl> 53.16, 23.77, 60.80, 60.98, 45.06, 34.4… ## $ `Projected Annual Salary` <dbl> 110998.08, 49623.67, 126950.40, 127326.… ## $ `Q1 Payments` <dbl> 24931.20, 11343.96, 24184.00, 29391.20,… ## $ `Q2 Payments` <dbl> 29181.61, 13212.37, 28327.20, 36591.20,… ## $ `Q3 Payments` <dbl> 26545.80, 11508.36, 28744.20, 32904.81,… ## $ `Q4 Payments` <dbl> 29605.30, 13442.53, 33224.88, 37234.03,… ## $ `Payments Over Base Pay` <dbl> 4499.12, 1844.82, 13192.43, 18034.53, 1… ## $ `% Over Base Pay` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … ## $ `Total Payments` <dbl> 110263.91, 49507.22, 114480.28, 136121.… ## $ `Base Pay` <dbl> 105764.79, 47662.40, 101287.85, 118086.… ## $ `Permanent Bonus Pay` <dbl> 3174.12, 0.00, 7363.95, 7086.67, 0.00, … ## $ `Longevity Bonus Pay` <dbl> 0.00, 1310.82, 0.00, 0.00, 1251.19, 172… ## $ `Temporary Bonus Pay` <dbl> 1325.00, 0.00, 1205.00, 1325.00, 125.00… ## $ `Lump Sum Pay` <dbl> 0.00, 0.00, 2133.18, 0.00, 2068.80, 0.0… ## $ `Overtime Pay` <dbl> 0.00, 0.00, 4424.32, 9839.33, 0.00, 0.0… ## $ `Other Pay & Adjustments` <dbl> 0.00, 534.00, -1934.02, -216.47, -2068.… ## $ `Other Pay (Payroll Explorer)` <dbl> 4499.12, 1844.82, 8768.11, 8195.20, 137… ## $ MOU <chr> "24", "3", "24", "24", "12", "3", "24",… ## $ `MOU Title` <chr> "POLICE OFFICERS UNIT", "CLERICAL UNIT"… ## $ `FMS Department` <dbl> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70,… ## $ `Job Class` <chr> "2223", "1358", "2227", "2232", "1839",… ## $ `Pay Grade` <chr> "2", "0", "1", "1", "0", "2", "3", "1",… ## $ `Average Health Cost` <dbl> 11651.40, 10710.24, 11651.40, 11651.40,… ## $ `Average Dental Cost` <dbl> 898.08, 405.24, 898.08, 898.08, 405.24,… ## $ `Average Basic Life` <dbl> 191.04, 11.40, 191.04, 191.04, 11.40, 1… ## $ `Average Benefit Cost` <dbl> 12740.52, 11126.88, 12740.52, 12740.52,… ## $ `Benefits Plan` <chr> "Police", "City", "Police", "Police", "… ## $ `Job Class Link` <chr> "http://per.lacity.org/perspecs/2223.pd… ``` --- ```r lapd <- clean_names(lapd) glimpse(lapd) ``` ``` ## Rows: 68,564 ## Columns: 35 ## $ row_id <chr> "3-1000027830ctFu", "3-1000155488ctFu", "3-… ## $ year <dbl> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2… ## $ department_title <chr> "Police (LAPD)", "Police (LAPD)", "Police (… ## $ payroll_department <dbl> 4301, 4302, 4301, 4301, 4302, 4302, 4301, 4… ## $ record_number <dbl> 1000027830, 1000155488, 1000194958, 1000232… ## $ job_class_title <chr> "Police Detective II", "Clerk Typist", "Pol… ## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Ful… ## $ hourly_or_event_rate <dbl> 53.16, 23.77, 60.80, 60.98, 45.06, 34.42, 4… ## $ projected_annual_salary <dbl> 110998.08, 49623.67, 126950.40, 127326.24, … ## $ q1_payments <dbl> 24931.20, 11343.96, 24184.00, 29391.20, 208… ## $ q2_payments <dbl> 29181.61, 13212.37, 28327.20, 36591.20, 241… ## $ q3_payments <dbl> 26545.80, 11508.36, 28744.20, 32904.81, 215… ## $ q4_payments <dbl> 29605.30, 13442.53, 33224.88, 37234.03, 252… ## $ payments_over_base_pay <dbl> 4499.12, 1844.82, 13192.43, 18034.53, 1376.… ## $ percent_over_base_pay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ total_payments <dbl> 110263.91, 49507.22, 114480.28, 136121.24, … ## $ base_pay <dbl> 105764.79, 47662.40, 101287.85, 118086.71, … ## $ permanent_bonus_pay <dbl> 3174.12, 0.00, 7363.95, 7086.67, 0.00, 0.00… ## $ longevity_bonus_pay <dbl> 0.00, 1310.82, 0.00, 0.00, 1251.19, 1726.16… ## $ temporary_bonus_pay <dbl> 1325.00, 0.00, 1205.00, 1325.00, 125.00, 68… ## $ lump_sum_pay <dbl> 0.00, 0.00, 2133.18, 0.00, 2068.80, 0.00, 0… ## $ overtime_pay <dbl> 0.00, 0.00, 4424.32, 9839.33, 0.00, 0.00, 4… ## $ other_pay_adjustments <dbl> 0.00, 534.00, -1934.02, -216.47, -2068.80, … ## $ other_pay_payroll_explorer <dbl> 4499.12, 1844.82, 8768.11, 8195.20, 1376.19… ## $ mou <chr> "24", "3", "24", "24", "12", "3", "24", "24… ## $ mou_title <chr> "POLICE OFFICERS UNIT", "CLERICAL UNIT", "P… ## $ fms_department <dbl> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70,… ## $ job_class <chr> "2223", "1358", "2227", "2232", "1839", "22… ## $ pay_grade <chr> "2", "0", "1", "1", "0", "2", "3", "1", "B"… ## $ average_health_cost <dbl> 11651.40, 10710.24, 11651.40, 11651.40, 107… ## $ average_dental_cost <dbl> 898.08, 405.24, 898.08, 898.08, 405.24, 405… ## $ average_basic_life <dbl> 191.04, 11.40, 191.04, 191.04, 11.40, 11.40… ## $ average_benefit_cost <dbl> 12740.52, 11126.88, 12740.52, 12740.52, 111… ## $ benefits_plan <chr> "Police", "City", "Police", "Police", "City… ## $ job_class_link <chr> "http://per.lacity.org/perspecs/2223.pdf", … ``` --- class: middle ## subsetting variables/columns <img src="img/data-wrangle.001.jpeg" width="80%" /> -- `select()` --- class: middle ## subsetting observations/rows <img src="img/data-wrangle.002.jpeg" width="80%" /> `slice()` and `filter()` --- `select` is used to select certain variables in the data frame. .left-panel[ ```r select(lapd, year, base_pay) ``` ``` ## # A tibble: 68,564 × 2 ## year base_pay ## <dbl> <dbl> ## 1 2013 105765. ## 2 2013 47662. ## 3 2013 101288. ## 4 2013 118087. ## 5 2013 90322. ## 6 2013 62770. ## 7 2013 93718. ## 8 2013 0 ## 9 2013 51246. ## 10 2013 74227. ## # … with 68,554 more rows ``` ] -- .right-panel[ ```r lapd %>% select(year, base_pay) ``` ``` ## # A tibble: 68,564 × 2 ## year base_pay ## <dbl> <dbl> ## 1 2013 105765. ## 2 2013 47662. ## 3 2013 101288. ## 4 2013 118087. ## 5 2013 90322. ## 6 2013 62770. ## 7 2013 93718. ## 8 2013 0 ## 9 2013 51246. ## 10 2013 74227. ## # … with 68,554 more rows ``` ] --- `select` can also be used to drop certain variables if used with a negative sign. ```r select(lapd, -row_id, -department_title) ``` ``` ## # A tibble: 68,564 × 33 ## year payroll_department record_number job_class_title employment_type ## <dbl> <dbl> <dbl> <chr> <chr> ## 1 2013 4301 1000027830 Police Detective II Full Time ## 2 2013 4302 1000155488 Clerk Typist Full Time ## 3 2013 4301 1000194958 Police Sergeant I Full Time ## 4 2013 4301 1000232317 Police Lieutenant I Full Time ## 5 2013 4302 1000329284 Principal Storekeeper Full Time ## 6 2013 4302 1001124320 Police Service Repres… Full Time ## 7 2013 4301 1001221822 Police Officer III Full Time ## 8 2013 4301 1001243583 Police Sergeant I Full Time ## 9 2013 4301 1001317832 Police Officer II Full Time ## 10 2013 4301 100162910 Police Officer II Full Time ## # … with 68,554 more rows, and 28 more variables: hourly_or_event_rate <dbl>, ## # projected_annual_salary <dbl>, q1_payments <dbl>, q2_payments <dbl>, ## # q3_payments <dbl>, q4_payments <dbl>, payments_over_base_pay <dbl>, ## # percent_over_base_pay <dbl>, total_payments <dbl>, base_pay <dbl>, ## # permanent_bonus_pay <dbl>, longevity_bonus_pay <dbl>, ## # temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, overtime_pay <dbl>, ## # other_pay_adjustments <dbl>, other_pay_payroll_explorer <dbl>, mou <chr>, … ``` --- class: middle ## Selection helpers `starts_with()` `ends_with()` `contains()` --- ```r select(lapd, starts_with("q")) ``` ``` ## # A tibble: 68,564 × 4 ## q1_payments q2_payments q3_payments q4_payments ## <dbl> <dbl> <dbl> <dbl> ## 1 24931. 29182. 26546. 29605. ## 2 11344. 13212. 11508. 13443. ## 3 24184 28327. 28744. 33225. ## 4 29391. 36591. 32905. 37234. ## 5 20813 24136 21518. 25231. ## 6 16057. 17927. 14150. 17052. ## 7 22162. 25664. 23404. 24586. ## 8 0 0 331. 0 ## 9 11941. 14330. 13404. 14537. ## 10 17046. 20457. 18777. 21371. ## # … with 68,554 more rows ``` --- ```r select(lapd, ends_with("pay")) ``` ``` ## # A tibble: 68,564 × 8 ## payments_over_ba… percent_over_ba… base_pay permanent_bonus… longevity_bonus… ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 4499. 0 105765. 3174. 0 ## 2 1845. 0 47662. 0 1311. ## 3 13192. 0 101288. 7364. 0 ## 4 18035. 0 118087. 7087. 0 ## 5 1376. 0 90322. 0 1251. ## 6 2415. 0 62770. 0 1726. ## 7 2099. 0 93718. 866. 0 ## 8 331. 0 0 0 0 ## 9 2967. 0 51246. 1540. 0 ## 10 3424. 0 74227. 2233. 0 ## # … with 68,554 more rows, and 3 more variables: temporary_bonus_pay <dbl>, ## # lump_sum_pay <dbl>, overtime_pay <dbl> ``` --- ```r select(lapd, contains("pay")) ``` ``` ## # A tibble: 68,564 × 17 ## payroll_department q1_payments q2_payments q3_payments q4_payments ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 4301 24931. 29182. 26546. 29605. ## 2 4302 11344. 13212. 11508. 13443. ## 3 4301 24184 28327. 28744. 33225. ## 4 4301 29391. 36591. 32905. 37234. ## 5 4302 20813 24136 21518. 25231. ## 6 4302 16057. 17927. 14150. 17052. ## 7 4301 22162. 25664. 23404. 24586. ## 8 4301 0 0 331. 0 ## 9 4301 11941. 14330. 13404. 14537. ## 10 4301 17046. 20457. 18777. 21371. ## # … with 68,554 more rows, and 12 more variables: payments_over_base_pay <dbl>, ## # percent_over_base_pay <dbl>, total_payments <dbl>, base_pay <dbl>, ## # permanent_bonus_pay <dbl>, longevity_bonus_pay <dbl>, ## # temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, overtime_pay <dbl>, ## # other_pay_adjustments <dbl>, other_pay_payroll_explorer <dbl>, ## # pay_grade <chr> ``` --- class: middle ## subsetting variables/columns <img src="img/data-wrangle.001.jpeg" width="80%" /> -- `select()` --- ## subsetting observations/rows <img src="img/data-wrangle.002.jpeg" width="80%" /> `slice()` and `filter()` --- .pull-left[ `slice()` subsetting rows based on a row number. The data below include all the rows from third to seventh. Including third and seventh. ```r slice(lapd, 3:7) ``` ``` ## # A tibble: 5 × 35 ## row_id year department_title payroll_departm… record_number job_class_title ## <chr> <dbl> <chr> <dbl> <dbl> <chr> ## 1 3-10001… 2013 Police (LAPD) 4301 1000194958 Police Sergean… ## 2 3-10002… 2013 Police (LAPD) 4301 1000232317 Police Lieuten… ## 3 3-10003… 2013 Police (LAPD) 4302 1000329284 Principal Stor… ## 4 3-10011… 2013 Police (LAPD) 4302 1001124320 Police Service… ## 5 3-10012… 2013 Police (LAPD) 4301 1001221822 Police Officer… ## # … with 29 more variables: employment_type <chr>, hourly_or_event_rate <dbl>, ## # projected_annual_salary <dbl>, q1_payments <dbl>, q2_payments <dbl>, ## # q3_payments <dbl>, q4_payments <dbl>, payments_over_base_pay <dbl>, ## # percent_over_base_pay <dbl>, total_payments <dbl>, base_pay <dbl>, ## # permanent_bonus_pay <dbl>, longevity_bonus_pay <dbl>, ## # temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, overtime_pay <dbl>, ## # other_pay_adjustments <dbl>, other_pay_payroll_explorer <dbl>, mou <chr>, … ``` ] -- .pull-right[ `filter()` subsetting rows based on a condition. The data below includes rows when the recorded year is 2018. ```r filter(lapd, year == 2018) ``` ``` ## # A tibble: 14,824 × 35 ## row_id year department_title payroll_departm… record_number job_class_title ## <chr> <dbl> <chr> <dbl> <dbl> <chr> ## 1 8-1000… 2018 Police (LAPD) 4301 1000027830 Police Detecti… ## 2 8-1000… 2018 Police (LAPD) 4301 1000194958 Police Sergean… ## 3 8-1000… 2018 Police (LAPD) 4301 1000232317 Police Lieuten… ## 4 8-1001… 2018 Police (LAPD) 4302 1001124320 Police Service… ## 5 8-1001… 2018 Police (LAPD) 4301 1001221822 Police Officer… ## 6 8-1001… 2018 Police (LAPD) 4301 1001317832 Police Officer… ## 7 8-1001… 2018 Police (LAPD) 4301 100162910 Police Officer… ## 8 8-1001… 2018 Police (LAPD) 4301 1001675957 Police Officer… ## 9 8-1001… 2018 Police (LAPD) 4302 1001884819 Equipment Mech… ## 10 8-1001… 2018 Police (LAPD) 4302 1001893163 Detention Offi… ## # … with 14,814 more rows, and 29 more variables: employment_type <chr>, ## # hourly_or_event_rate <dbl>, projected_annual_salary <dbl>, ## # q1_payments <dbl>, q2_payments <dbl>, q3_payments <dbl>, q4_payments <dbl>, ## # payments_over_base_pay <dbl>, percent_over_base_pay <dbl>, ## # total_payments <dbl>, base_pay <dbl>, permanent_bonus_pay <dbl>, ## # longevity_bonus_pay <dbl>, temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, ## # overtime_pay <dbl>, other_pay_adjustments <dbl>, … ``` ] --- class: middle .pull-left[ ### Relational Operators in R | Operator | Description | |----------|--------------------------| | < | Less than | | > | Greater than | | <= | Less than or equal to | | >= | Greater than or equal to | | == | Equal to | | != | Not equal to | ] .pull-right[ ### Logical Operators in R | Operator | Description | |----------|-------------| | & | and | | | | or | ] --- class: middle Q. According to [datausa.io](https://datausa.io/profile/geo/los-angeles-ca#:~:text=In%202018%2C%20Los%20Angeles%2C%20CA,%2462%2C474%2C%20a%203.78%25%20increase.) Los Angeles had a median household income of $62474 in 2018. How many LAPD staff members had a base pay higher than $62474 in year 2018 according to this data? .footnote[Median household income is **not** the same thing as median employee income. Our aim is data wrangling and not necessarily statistical analysis for now.] --- ```r lapd %>% filter(year == 2018 & base_pay > 62474) ``` ``` ## # A tibble: 11,690 × 35 ## row_id year department_title payroll_departm… record_number job_class_title ## <chr> <dbl> <chr> <dbl> <dbl> <chr> ## 1 8-1000… 2018 Police (LAPD) 4301 1000027830 Police Detecti… ## 2 8-1000… 2018 Police (LAPD) 4301 1000194958 Police Sergean… ## 3 8-1000… 2018 Police (LAPD) 4301 1000232317 Police Lieuten… ## 4 8-1001… 2018 Police (LAPD) 4302 1001124320 Police Service… ## 5 8-1001… 2018 Police (LAPD) 4301 1001221822 Police Officer… ## 6 8-1001… 2018 Police (LAPD) 4301 1001317832 Police Officer… ## 7 8-1001… 2018 Police (LAPD) 4301 100162910 Police Officer… ## 8 8-1001… 2018 Police (LAPD) 4301 1001675957 Police Officer… ## 9 8-1001… 2018 Police (LAPD) 4302 1001884819 Equipment Mech… ## 10 8-1001… 2018 Police (LAPD) 4302 1001893163 Detention Offi… ## # … with 11,680 more rows, and 29 more variables: employment_type <chr>, ## # hourly_or_event_rate <dbl>, projected_annual_salary <dbl>, ## # q1_payments <dbl>, q2_payments <dbl>, q3_payments <dbl>, q4_payments <dbl>, ## # payments_over_base_pay <dbl>, percent_over_base_pay <dbl>, ## # total_payments <dbl>, base_pay <dbl>, permanent_bonus_pay <dbl>, ## # longevity_bonus_pay <dbl>, temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, ## # overtime_pay <dbl>, other_pay_adjustments <dbl>, … ``` --- class: middle ```r lapd %>% filter(year == 2018 & base_pay > 62474) %>% nrow() ``` ``` ## [1] 11690 ``` --- class: middle Q. How many observations are available between 2013 and 2015 including 2013 and 2015? -- ```r lapd %>% filter(year >= 2013 & year <= 2015) ``` ``` ## # A tibble: 40,227 × 35 ## row_id year department_title payroll_departm… record_number job_class_title ## <chr> <dbl> <chr> <dbl> <dbl> <chr> ## 1 3-1000… 2013 Police (LAPD) 4301 1000027830 Police Detecti… ## 2 3-1000… 2013 Police (LAPD) 4302 1000155488 Clerk Typist ## 3 3-1000… 2013 Police (LAPD) 4301 1000194958 Police Sergean… ## 4 3-1000… 2013 Police (LAPD) 4301 1000232317 Police Lieuten… ## 5 3-1000… 2013 Police (LAPD) 4302 1000329284 Principal Stor… ## 6 3-1001… 2013 Police (LAPD) 4302 1001124320 Police Service… ## 7 3-1001… 2013 Police (LAPD) 4301 1001221822 Police Officer… ## 8 3-1001… 2013 Police (LAPD) 4301 1001243583 Police Sergean… ## 9 3-1001… 2013 Police (LAPD) 4301 1001317832 Police Officer… ## 10 3-1001… 2013 Police (LAPD) 4301 100162910 Police Officer… ## # … with 40,217 more rows, and 29 more variables: employment_type <chr>, ## # hourly_or_event_rate <dbl>, projected_annual_salary <dbl>, ## # q1_payments <dbl>, q2_payments <dbl>, q3_payments <dbl>, q4_payments <dbl>, ## # payments_over_base_pay <dbl>, percent_over_base_pay <dbl>, ## # total_payments <dbl>, base_pay <dbl>, permanent_bonus_pay <dbl>, ## # longevity_bonus_pay <dbl>, temporary_bonus_pay <dbl>, lump_sum_pay <dbl>, ## # overtime_pay <dbl>, other_pay_adjustments <dbl>, … ``` --- class: middle Q. How many observations are available between 2013 and 2015 including 2013 and 2015? ```r lapd %>% filter(year >= 2013 & year <= 2015) %>% nrow() ``` ``` ## [1] 40227 ``` --- class: middle Q. How many LAPD staff were employed full time in 2018? ```r lapd %>% filter(employment_type == "Full Time" & year == 2018) %>% nrow() ``` ``` ## [1] 14664 ```