California School Dashboards Part 2: Visualizing the Data
This is part two of a three part series where I’ll be working with California School Dashboard data by cleaning, visualizaing, and exploring through modeling. You can read the first part of this series, which shows one way to clean and prepare the data, at this link.
Now that our dashboard data is cleaned up so you have one school/subgroup combination per row and one variable per column, we’re ready to start visualizing the data. Cleaning the data in this way is helpful because the we’ll be using the
ggplot2 package, which makes it easy for us to map variables in our cleaned dataset to graphical features of our chart. To illustrate that, we’re going to work with three anonymized district English Language Arts datasets. You can view the script I used to anonymize the data on my GitHub page.
Our goal will be to use data visualization to compare the number of subgroups who fell in each of the six categories of distance from three change. You can visit the CDE website for a more complete description of the distance from three score and the categories of change.
clean_data %>% bind_rows() %>% ggplot(data = .) + geom_bar(aes(x = changelevel), fill = "violetred3") + labs(title = "English Language Arts Change Levels: Distance From Three", x = "", y = "Number of Subgroups") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + facet_wrap(~districtname)
This kind of visualization is useful for comparing the distribution of subgroups across categories. Each district has a different total enrolllment, so the actual count of subgroups isn’t as meaningful as the shape of each district’s data. Looking at these differences can help teams identify which groups they want to visit and observe the underlying processes that generate the data. The important question here is: What’s happening in real life that generates these numbers?
district_1 has a flatter profile compared to
district_3. I would be interested in visiting
district_1 to see what their processes are for determining which students take the English Language Arts test. Some curiosities are:
- Do these processes look similar or different from the other two districts?
- If we listen to each district’s staff describe all the events that lead up to an English Language Arts subgroup score being produced, how would those descriptions differ from each other?
- Do these different descriptions plausibly explain the differing shapes of this visualization?
- If we changed some of these processes, would we expect a different shape to this data?
You can view the full code for this post on my GitHub site. Next up, modeling the California School Dashboard data!