Code Along With Me (Episode 1)

An assessment of the livestock numbers in the six counties declared to be ‘disturbed and dangerous’ in Kenya

RStudio
R
Tutorial
Blog
Author

William Okech

Published

November 28, 2023

Pastoral livestock running through a dry savanna followed by a traditional herder on horseback

Image created using “Bing Image Creator” with the prompt keywords “cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, –ar 5:4 –style raw”

Introduction

In February 2023, the government of Kenya described six counties as “disturbed” and “dangerous.” This is because in the preceding six months, over 100 civilians and 16 police officers have lost their lives, as criminals engaged in banditry have terrorized the rural villages. One of the main causes of the conflict is livestock theft, therefore, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.

References

  1. Nation Media News Brief
  2. Citizen TV Summary

Section 1: Load all the required libraries

Code
library(tidyverse) # a collection of packages used to model, transform, and visualize data
library(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results
library(patchwork) # combine separate ggplots into the same graphic
library(janitor) # initial data exploration and cleaning for a new data set
library(ggrepel)# repel overlapping text labels
library(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'
library(scales) # tools to override the default breaks, labels, transformations and palettes
# install.packages("treemapify")
library(treemapify) # allows the creation of treemaps in ggplot2

library(sf) # simple features, a method to encode spatial vector data
#install.packages("devtools")
library(devtools) # helps to install packages not on CRAN
#devtools::install_github("yutannihilation/ggsflabel")
library(ggsflabel) # place labels on maps

library(knitr) # a tool for dynamic report generation in R
#install.packages("kableExtra")
library(kableExtra) # build common complex tables and manipulate table styles

Note: If you have package loading issues check the timeout with getOption(‘timeout’) and use options(timeout = ) to increase package loading time.

Section 2: Create a map of the “dangerous and disturbed” counties

The rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. The required shapefile for this analysis is KenyaCounties_SHP

a) Sample plot of the map of Kenya

Code
# Load the shapefile
kenya_counties_sf <- st_as_sf(KenyaCounties_SHP)

# Plot the map of Kenya
p0 <- ggplot(kenya_counties_sf) + 
  geom_sf(fill = "bisque2", linewidth = 0.6, color = "black") + 
  theme_void()
p0

b) Highlight the dangerous and disturbed counties in Kenya

Code
# First, remove the "/" from the county names

kenya_counties_sf$County <- gsub("/", 
                                 " ", 
                                 kenya_counties_sf$County)

# Select the six counties to highlight
highlight_counties <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")

# Filter the counties dataset to only include the highlighted counties
highlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)

# Plot the highlighted counties in the map
p1 <- ggplot() + 
  geom_sf(data = kenya_counties_sf, fill = "bisque2", linewidth = 0.6, color = "black") + 
  geom_sf(data  = highlighted, fill = "chocolate4", linewidth = 0.8, color = "black") +
  theme_void()
p1

c) Plot only the required counties

Code
p2 <- ggplot(data = highlighted) +
  geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +
  geom_label_repel(aes(label = County, geometry = geometry), size = 3,
                      stat = "sf_coordinates",
                      force=10, # force of repulsion between overlapping text labels
                      seed = 1,
                      segment.size = 0.75,
                      min.segment.length=0) +
  scale_fill_brewer(palette = "OrRd") +
  labs(title = "",
       caption = "") +
  theme(plot.title = element_text(family = "Helvetica",size = 10, hjust = 0.5),
        legend.title = element_blank(),
        legend.position = "none",
        plot.caption = element_text(family = "Helvetica",size = 12)) +
  theme_void() 
p2

Code
# Notes: geom_label_repel() with geometry and stat defined can be used as an 
# alternative to geom_sf_label_repel()

d) Combine the plots using patchwork to clearly highlight the counties of interest

Code
p1 + 
  p2 + 
  plot_annotation(title = "",
                  subtitle = "",
                  caption = "",
                  theme = theme(plot.title = element_text(family="Helvetica", face="bold", size = 25),
                                plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
                                plot.caption = element_text(family = "Helvetica",size = 12, face = "bold")))

Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.

a) View the data available in the data catalogue

Code
data("DataCatalogue")

b) Load the livestock data

Here, pastoral livestock are defined as sheep, goats, and indigenous cattle.

Code
# Select the livestock data from the census report
df_livestock <- V4_T2.24
livestock <- df_livestock[2:393, ]
livestock <- livestock %>%
  clean_names()

# Remove the "/" from the county names in the dataset
livestock$county <- gsub("/", " ", livestock$county)
livestock$sub_county <- gsub("/", " ", livestock$sub_county)

# Select the variables of interest from the dataset
# These include the county, subcounty, land area, number of farming households, 
# sheep, goats, and indigenous cattle.

# New variables listed below include:
# pasto_livestock is the total number of sheep, goats, and indigenous cattle
# ind_cattle_household is the number of indigenous cattle per household
# goats_household is the number of goats per household
# sheep_household is the number of sheep per household
# pasto_livestock_household is the number of pastoral livestock per household

livestock_select <- livestock %>%
  select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%
  mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% 
  mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%
  mutate(goats_household = round(goats/farming)) %>%
  mutate(sheep_household = round(sheep/farming)) %>%
  mutate(pasto_livestock_household = round(pasto_livestock/farming))

c) Filter data for the selected “disturbed and dangerous” counties

Code
# Select the data for the "dangerous and disturbed" counties
dan_dist <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")

livestock_select_county <- livestock_select %>%
  filter(admin_area == "County") %>%
  filter(county %in% dan_dist)

# Select subcounty data for the "disturbed and dangerous" counties
livestock_select_subcounty <- livestock_select %>%
  filter(admin_area == "SubCounty") %>%
  filter(county %in% dan_dist)

# Create an area dataset for the "dangerous and disturbed" counties
df_land_area <- V1_T2.7
land_area <- df_land_area[2:396,]
land_area <- land_area %>%
  clean_names()
Code
# Create a function to remove the "/", " County" from label, and change the label to UPPERCASE

clean_county_names <- function(dataframe, column_name) {
  dataframe[[column_name]] <- toupper(gsub("/", " ", gsub(" County", "", dataframe[[column_name]])))
  return(dataframe)
}

land_area <- clean_county_names(land_area, 'county')
land_area <- clean_county_names(land_area, 'sub_county')

# The code above does the processes listed below:

# land_area$county <- gsub("/", " ", land_area$county)
# land_area$county <- gsub(" County", "", land_area$county)
# land_area$county <- toupper(land_area$county)
# land_area$sub_county <- toupper(land_area$sub_county)
Code
# Obtain the area data for "disturbed and dangerous" counties
land_area_county <- land_area %>%
  filter(admin_area == "County") %>%
  select(county, land_area_in_sq_km) %>%
  filter(county %in% dan_dist)

# Get the subcounty area data for "disturbed and dangerous" counties
land_area_subcounty <- land_area %>%
  filter(admin_area == "SubCounty") %>%
  select(county, sub_county, land_area_in_sq_km) %>%
  filter(county %in% dan_dist) %>%
  select(-county)

d) Create the final datasets to be used for analysis. Use inner_join() and creating new variables.

Code
# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions

livestock_area_county <- inner_join(livestock_select_county, land_area_county, by = "county")

# New variables listed below include:
# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
# goats_area is the number of goats per household per area_in_sq_km
# sheep_area is the number of sheep per area_in_sq_km
# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km

livestock_area_county <- livestock_area_county %>%
  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
         sheep_area = round(sheep/land_area_in_sq_km),
         goats_area = round(goats/land_area_in_sq_km),
         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))

# Create a subcounty dataset with area and livestock numbers
# for the disturbed and dangerous regions

livestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = "sub_county")

# New variables listed below include:
# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
# goats_area is the number of goats per household per area_in_sq_km
# sheep_area is the number of sheep per area_in_sq_km
# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km

livestock_area_subcounty <- livestock_area_subcounty %>%
  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
         sheep_area = round(sheep/land_area_in_sq_km),
         goats_area = round(goats/land_area_in_sq_km),
         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))

Section 4: Create a table with land area (sq. km) for the six counties

These six counties cover approximately one-fifth (1/5) of Kenya

Code
livestock_area_county %>%
  select(county, land_area_in_sq_km) %>%
  mutate(county = str_to_title(county)) %>%
  arrange(desc(land_area_in_sq_km)) %>%
  adorn_totals("row") %>%
  rename("County" = "county",
         "Land Area (sq. km)" = "land_area_in_sq_km") %>%
  kbl(align = "c") %>%
  kable_classic() %>% 
  row_spec(row = 0, font_size = 21, color = "white", background = "#000000") %>%
  row_spec(row = c(1:7), font_size = 15) %>%
  row_spec(row = 6, extra_css = "border-bottom: 1px solid;") %>%
  row_spec(row = 7, bold = T)
County Land Area (sq. km)
Turkana 68232.9
Samburu 21065.1
Baringo 10976.4
Laikipia 9532.2
West Pokot 9123.2
Elgeyo Marakwet 3032.0
Total 121961.8

Section 5: Perform an exploratory data analysis to gain key insights about the data

a) Number of Farming Households at the County Level

Code
lac_1 <- livestock_area_county %>%
  ggplot() + 
  geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +
  geom_text(aes(x = county, y = 0, label = county),
            hjust = 0, nudge_y = 0.25) +
  geom_text(aes(x = county, y = farming, label = farming),
            hjust = 1, nudge_y = 15000) +
  scale_fill_brewer(palette = "OrRd") +
  coord_flip() + 
  labs(x = "County",
       y = "Number of Farming Households",
       title = "",
       subtitle = "",
       caption = "") +
  theme_minimal() +
  scale_y_continuous(labels = comma) +
  theme(axis.title.x =element_blank(), 
        axis.title.y =element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(family="Helvetica", face="bold", size = 20),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_blank(),
        legend.position = "none",
        plot.margin = unit(c(1, 1, 1, 1), "cm"))

# Create a patchwork plot with the map and the bar graph

lac_1 + p2

b) Number of Farming Households at the Subcounty Level

Code
livestock_area_subcounty %>%
  ggplot() + 
  geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) +   geom_text(aes(x = sub_county, y = farming, label = farming),
            hjust = 1, nudge_y = 2500, size = 4) +
  scale_fill_brewer(palette = "OrRd") +
  coord_flip() +
  guides(fill = guide_legend(nrow = 1)) +
  labs(x = "County",
       y = "Number of Farming Households",
       fill = "County",
       title = "",
       subtitle = "",
       caption = "") +
  theme_minimal() +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
  theme(axis.title.x =element_blank(),
        axis.title.y =element_blank(),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        plot.title = element_text(family="Helvetica", face="bold", size = 20),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_text(size = 10),
        legend.text=element_text(size=8),
        legend.position = "top")

c) Number of pastoral livestock per county

Treemap

Code
ggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)
                              )) +
  geom_treemap() +
  geom_treemap_text(colour = "black",
                    place = "centre",
                    size = 24) +
  scale_fill_brewer(palette = "OrRd") +
  labs(x = "",
       y = "",
       title = "",
       caption = "") +
  theme_minimal() +
  theme(axis.title.x =element_text(size = 20),
        axis.title.y =element_text(size = 20),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        plot.title = element_text(family="Helvetica", face="bold", size = 28),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.text=element_text(size=12),
        legend.position = "bottom") 

d) Number of pastoral livestock per subcounty

Code
livestock_area_subcounty %>%
  ggplot() + 
  geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + 
  geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),
            hjust = 1, nudge_y = 125000) +
  scale_fill_brewer(palette = "OrRd") +
  coord_flip() + 
  guides(fill = guide_legend(nrow = 1)) +
  labs(x = "Subcounty",
       y = "Number of Pastoral Livestock",
       fill = "County",
       title = "",
       caption = "") +
  theme_minimal() +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
  theme(axis.title.x =element_blank(),
        axis.title.y =element_blank(),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        plot.title = element_text(family="Helvetica", face="bold", size = 20),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_text(size = 10),
        legend.text=element_text(size=8),
        legend.position = "top")

e) Number of pastoral livestock per household at the county level

Code
lac_2 <- livestock_area_county %>%
  ggplot() + 
  geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + 
geom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household), hjust = 1, nudge_y = 5) +
  scale_fill_brewer(palette = "OrRd") +
  coord_flip() + 
  labs(x = "County",
       y = "Number of Pastoral Livestock per household",
       title = "",
       caption = "") +
  theme_minimal() +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
  theme(axis.title.x =element_blank(), 
        axis.title.y =element_blank(),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        plot.title = element_text(family="Helvetica", face="bold", size = 20),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_blank(),
        legend.position = "none",
        plot.margin = unit(c(1, 1, 1, 1), "cm"))

# Create a patchwork plot with the map and the bar graph

lac_2 + p2

f) Number of pastoral livestock per household at the subcounty level

Code
livestock_area_subcounty %>%
  ggplot() + 
  geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + 
  scale_fill_brewer(palette = "OrRd") +
  geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),
            hjust = 1, nudge_y = 5) +
  coord_flip() + 
  guides(fill = guide_legend(nrow = 1)) +
  labs(x = "Subcounty",
       y = "Number of Pastoral Livestock per household",
       fill = "County",
       title = "",
       caption = "") +
  theme_minimal() +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
  theme(axis.title.x =element_blank(),
        axis.title.y =element_blank(),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        plot.title = element_text(family="Helvetica", face="bold", size = 20),
        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_text(size = 10),
        legend.text=element_text(size=8),
        legend.position = "top")

Section 5: Conclusion

In this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as “disturbed and dangerous.” A major contributor to this classification is the banditry, livestock theft, and limited amounts of pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.

Key findings from the study were that: 1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest numbers of households engaged in farming. This meant that both these counties had the highest average livestock to farming household ratios in the region (Turkana: 55 per household and Samburu: 36 per household). 2. The top three (3) subcounties with the highest average ratios of pastoral livestock to farming households were in Turkana. They included Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, counties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming household ratios despite having relatively high numbers of farming households. This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply, a preference for crop, rather than livestock farming.

In the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world.