The Global Water Access Gap
Introduction
This blog analyzes the impact of differential access to drinking water around the world. As water is critical to our survival, water access affects nearly all parts of human life – including life expectancy, socioeconomic status, health and nutrition, and much more. With endless issues to study surrounding water access, we focus our blog on a few key interests of ours, including how water access levels relate to educational outcomes and economic development levels, as well as how social media is used as a tool for clean water advocacy.
We begin this blog with an overview of how drinking water levels and deaths due to unsafe drinking water vary at the global scale. We then analyze the differences in water access across various levels of economic development, and across urban and rural regions. Next, we look at water access in schools, and its effect on primary and secondary school enrollment rates. Finally, in keeping with our shared passion for social justice and equity, we present an analysis of the most common sentiments and words used in tweets regarding clean water advocacy.
Data
Dataset 1: Drinking Water Access Worldwide by Households
Description: This dataset is from the World Health Organization (WHO), accessible through the JMP Global Database. The dataset contains household-level data on the water coverage and service levels throughout the world. The JMP Database allows users to filter by residence type (urban, rural, and total) and year (2000 to 2017), as well as look at the data by country or by SDG (Sustainable Development Goal) regions. We were able to download this data in the format of a .csv file.
The water service levels within this dataset are divided into five possible categories. In order of poorest to greatest levels of access, these levels include Surface Water, Unimproved Service, Limited Service, Basic Service, and Safely Managed Service.
Dataset 2: Deaths Caused by Unsafe Drinking Water
Description: This dataset is from the Global Health Data Exchange (GHDx), accessed through the Institute for Health Metrics and Evaluation at the University of Washington. The data includes the percentage and number of deaths in a country that are caused by unsafe drinking water from 1990 through 2019. We were able to directly download a .csv file for data on all countries for the year 2017 (selected to match the most recent year of available data in Dataset 1).
Dataset 3: Drinking Water Access Worldwide in Schools
Description: This dataset is from the same source as Dataset 1 (WHO) and was also accessed through the JMP Global Database. However, rather than household-level data, this dataset focuses on how water service levels vary in schools. The dataset allows users to view country-level data, as well as data by other relevant groupings such as the SDG (Sustainable Development Goal) regions. We chose to focus on the SDG regions, given the relevance of our topic to the UN’s Sustainable Development Goals.
A key difference to note between this dataset and Dataset 1 is that while the household-level dataset breaks water access down into 5 levels, this dataset only includes 3, likely due to less frequent data collection at the school level. The 3 groupings (in order of poorest to greatest levels of water service in schools) are No Service, Limited Service, and Basic Service.
Dataset 4: Primary and Secondary School Enrollment Rates
Link: https://ourworldindata.org/primary-and-secondary-education#enrolment-in-primary-school
Description: This website presents various statistics on school attendance, completion, and enrollment around the world, using data from the World Bank. We utilized 2 datasets from this site on school enrollment rates by country, one with primary school data and one with secondary school data. The data were measured at a variety of years for each country, and we kept the most recent year of data collection for each country in our analysis.
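The "most recent year per country" step described above can be sketched in base R as follows. The data frame, its column names, and its values are hypothetical stand-ins, not the actual World Bank data.

```r
# Hypothetical enrollment records: one row per country-year (values are made up).
enrollment <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  year = c(2015, 2018, 2012, 2016, 2014),
  gross_enrollment = c(95.2, 97.1, 88.4, 90.3, 89.0),
  stringsAsFactors = FALSE
)

# Sort by country, then by year in descending order, so that the most recent
# record of each country comes first within its group.
ordered <- enrollment[order(enrollment$country, -enrollment$year), ]

# Keep only the first (most recent) row for each country.
most_recent <- ordered[!duplicated(ordered$country), ]
most_recent
```

One consequence of this choice is that the retained rows can refer to different years for different countries, which matters when comparing them.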
It is important to note that these data are reported as gross enrollment ratios, meaning the proportion of individuals enrolled in primary or secondary school over the total age-eligible population for each level of schooling. Therefore, it is common for some countries to have a gross enrollment ratio above 100%, as over-aged or under-aged students at each schooling level will not be accounted for in the age-eligible population.
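As a small worked example of why a gross enrollment ratio can exceed 100%, consider the sketch below. The function name and the numbers are made up for illustration.

```r
# Gross enrollment ratio (%): all enrolled students, regardless of age,
# divided by the population of official school age for that level.
gross_enrollment_ratio <- function(enrolled_total, age_eligible_population) {
  100 * enrolled_total / age_eligible_population
}

# Hypothetical numbers: 1,000 children of official primary age, of whom 950
# are enrolled, plus 120 enrolled students who are over-aged or under-aged.
ger <- gross_enrollment_ratio(
  enrolled_total = 950 + 120,
  age_eligible_population = 1000
)
ger
```

The over-aged and under-aged students count in the numerator but not in the denominator, which is what pushes the ratio past 100%.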
Dataset 5: Water Services and Economic Development
Description: We used the World Bank’s DataBank tool to create a dataset that contains the GDP per capita, water service level, water service type, and Gini coefficient of countries across the world from 2000 to 2017. Since the distribution of GDP per capita is right-skewed, we created a loggdp variable to better visualize the data.
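A minimal sketch of such a transformation is shown below. It assumes the natural logarithm (the base is not stated here) and uses made-up GDP per capita values rather than values from Dataset 5.

```r
# Hypothetical GDP per capita values in US dollars (not taken from Dataset 5).
gdp_per_capita <- c(500, 1200, 3500, 9000, 45000, 80000)

# Natural log transform; the choice of base is an assumption.
loggdp <- log(gdp_per_capita)

# On the raw scale the largest value is 160 times the smallest, while on the
# log scale the values are spread over a much narrower range.
range(gdp_per_capita)
range(loggdp)
```

Because the logarithm compresses large values more than small ones, a few very rich countries no longer stretch the axis of a plot.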
Other Datasets:
For the text analysis of tweets, Masahiro created a unique dataset using the search_tweets() function through his Twitter developer account. The dataset covers tweets posted between April 27th and May 7th of this year. Tweets were selected if they included one of the following phrases: “water access,” “Water access,” “Water Access,” “access to clean water,” or “access to drinking water.”
In addition to our main informational datasets, we also used the maps package for spatial visualizations and the United Nations’ SDG regional grouping dataset (https://unstats.un.org/sdgs/indicators/regional-groups) to complement the data in Dataset 5.
Worldwide Water Service Access and Deaths Caused by Unsafe Drinking Water (Alastair)
Background
How important is it to have access to safe drinking water?
In this map, we examine what percentage of each country’s total deaths in 2017 were caused by unsafe drinking water. The map provides a broad visual overview of which countries have the highest proportion of deaths due to unsafe drinking water. From there, we are able to examine a particular country’s access to different water service levels, as well as note its total number of deaths caused by unsafe water.
Percentage of Deaths and Water Service Levels by Country (Map)
Limitations
Countries without data on water service access level and deaths caused by unsafe drinking water include: Taiwan, Argentina, Dominica, Palestine, Eritrea, Central African Republic, and Saint Kitts and Nevis.
The two countries not mapped are: Tokelau and Tuvalu. They are not included in leaflet’s world map, although there is data about their access to different water service levels as well as data on deaths related to unsafe drinking water for these two countries.
The year 2017 was the most recent year included in the water access dataset, so we are working under the assumption that those conditions are similar enough to the conditions in 2021 to draw meaningful analysis about the current global water access gap.
Conclusions
Countries in Central/Sub-Saharan Africa and South/Southeast Asia appear to have the highest percentage of deaths caused by unsafe drinking water in 2017.
Although countries like Chad, Nigeria, and Madagascar have some of the highest percentages of deaths, India has the highest number of deaths of any country, with over 500,000 deaths caused by unsafe drinking water in 2017.
Interestingly, even countries with 100% safely managed service may still have some deaths; New Zealand, for example, had 14 deaths caused by unsafe drinking water in 2017.
There is a high correlation between countries that have few deaths and countries that have a high percentage of their population relying on at least basic service for drinking water. This means that most countries experiencing deaths due to unsafe drinking water have a significant portion of their population relying on either surface water, or an unimproved/limited service.
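How this association was measured is not stated here. One possible check, sketched below with made-up values, is a rank correlation between the share of the population with at least basic service and the share of deaths caused by unsafe water. The column names are hypothetical.

```r
# Hypothetical country-level values (not from Datasets 1 or 2).
countries <- data.frame(
  pct_at_least_basic = c(55, 62, 78, 90, 97, 100),
  pct_deaths_unsafe_water = c(6.1, 4.8, 2.5, 0.9, 0.2, 0.01)
)

# Spearman's rank correlation does not assume a linear relationship,
# only a monotonic one.
rho <- cor(
  countries$pct_at_least_basic,
  countries$pct_deaths_unsafe_water,
  method = "spearman"
)
rho
```

A strongly negative value would be consistent with the pattern described above: the higher the coverage of at least basic service, the lower the share of deaths. A correlation of this kind does not by itself establish causation.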
Water Service Access, Economic Development, and Inequality (Siyi)
Water service access and economic development
How do countries of different economic development levels differ in their access to water services and how has that changed over time?
Click here to view the interactive Shiny app.
Use of Data
This Shiny app focuses on two variables as indicators of economic development: GDP per capita ($) and the Gini coefficient. Despite its limitations, GDP per capita, which is the economic output per person in a country, is generally regarded as an effective indicator of economic development levels. The Gini coefficient measures the income inequality of a country, with a value of 0 representing perfect equality and 1 representing maximal inequality. Although these two variables do not provide a comprehensive assessment of economic development, data on them are widely available compared with other indicators. Through this Shiny app, we explore possible associations between water service coverage, GDP per capita, and economic inequality.
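To make the 0-to-1 scale of the Gini coefficient concrete, here is a small sketch that computes it from a vector of incomes. The function and the income values are illustrative only; the analysis itself uses the coefficients published by the World Bank rather than computing them.

```r
# Gini coefficient of a vector of non-negative incomes, using the standard
# formula for sorted values:
#   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
gini <- function(income) {
  x <- sort(income)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Everyone earns the same: perfect equality.
equal_incomes <- gini(c(10, 10, 10, 10))

# One person earns everything: the most unequal case for four people.
one_earner <- gini(c(0, 0, 0, 100))

equal_incomes
one_earner
```

With a finite sample of n people, the largest possible value is (n - 1) / n, which approaches 1 as the population grows.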
Trends
- Countries with higher GDP per capita generally tend to have higher coverage for at least basic and safely managed water service in terms of both drinking water and sanitation from 2000 to 2017.
- Countries with high economic inequality (large Gini coefficient) tend to have medium-level water service coverage, while countries with low income disparity tend either to have very high water service coverage or to lag behind in water service coverage.
- Countries in Europe and Northern America generally have higher GDP per capita, smaller Gini coefficients, and better water service coverage than other countries.
- Countries in Central and Southern Asia and Sub-Saharan Africa tend to have lower GDP per capita and lag behind other countries in water service coverage.
- There is a general growth in water service coverage around the world from 2000 to 2017.
Water service access and urban-rural inequality
How does access to water services differ across urban and rural regions around the world and how has that changed over time?
Click here to view the interactive Shiny app.
Trends - World
- In terms of both drinking water and sanitation, there is an obvious difference in water service coverage across urban and rural areas; urban areas tend to have higher coverage of safely managed and basic services than rural areas.
- Coverage of at least basic water service grew in both urban and rural areas from 2000 to 2017, and the gap between them is gradually decreasing.
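The urban-rural gap discussed in these trends can be expressed as a simple difference in coverage for each year. The sketch below uses made-up percentages and hypothetical column names, not values from the JMP data.

```r
# Hypothetical coverage of at least basic service, in percent of population.
coverage <- data.frame(
  year = c(2000, 2000, 2017, 2017),
  area = c("urban", "rural", "urban", "rural"),
  pct_at_least_basic = c(95, 70, 97, 81),
  stringsAsFactors = FALSE
)

# Reshape to one row per year with separate urban and rural columns.
wide <- reshape(coverage, idvar = "year", timevar = "area", direction = "wide")

# Gap in percentage points: urban coverage minus rural coverage.
wide$gap <- wide$pct_at_least_basic.urban - wide$pct_at_least_basic.rural
wide[, c("year", "gap")]
```

A shrinking value of gap over time corresponds to the narrowing urban-rural disparity described above.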
Trends - SDG Regions
- The gap between urban and rural areas in different SDG regions, generally, is decreasing from 2000 to 2017.
- Across different regions and time, urban areas tend to have a higher water service coverage.
- There are significant inequalities across regions. For instance, in 2000, safely managed drinking water service coverage in rural Europe and Northern America was more than twice that of urban Sub-Saharan Africa.
- The scale of urban-rural disparity in water service coverage differs across service types, service levels, time, and regions. For instance, in 2000, the urban-rural difference in safely managed drinking water service coverage was larger in some regions, such as Central and Southern America and Sub-Saharan Africa, than in others, such as Europe and Northern America. However, in the same year, Sub-Saharan Africa’s urban-rural difference in safely managed sanitation service was much smaller than that of Europe and Northern America.
Limitations
- There is a lack of data on water service coverage, GDP per capita, and Gini coefficient by country. The visualization only includes countries that have data on all of these variables in a given year. It is therefore hard to develop a comprehensive analysis, because in some years only a few countries in an entire region have sufficient data to be plotted.
- There is also a lack of data for regional water service coverage by residence type in general. Many regions do not have data for either urban or rural service coverage or both, and it is sometimes hard to tell what specific temporal changes look like.
- There is additionally an imbalance in the lack of data. Some regions, such as Europe and Northern America, have significantly more data available than other regions from the global South. Both overrepresentation and underrepresentation could lead to misleading trends.
Conclusions
- There is a global growth in water service coverage from 2000 to 2017.
- Countries of higher economic development levels tend to have better water service coverage, and countries with very large income disparity tend to be those that rank in the middle in water service coverage.
- There is an urban-rural gap in water service coverage worldwide, although this disparity decreased from 2000 to 2017. The scale of this inequality differs across regions, water service types, and time.
Water Access in Schools (Jamie)
Background
This section of our blog explores the intersection between water access and education. Much like textbooks and classrooms, water is an essential resource students need to succeed in school. A 2013 study on primary schools in Brazil even showed that access to piped water improved student test scores by 11% (Mejía, 2014)! While water fountains are found at nearly every corner of most US schools, water access in schools varies greatly at the global scale. For some background on water access in schools across the world, click here to view an interactive Shiny app, and read below for more details on the app.
The first tab in this Shiny app explores the differences in water service levels in schools among the 8 SDG (Sustainable Development Goal) regions in 2018. These differences can be viewed among all schools in a region, or by only primary or secondary schools. Basic service, the highest ranking of service level, refers to having a safe water supply available on school premises. Limited service refers to having a safe water supply nearby (up to a 30-minute walk round trip), but not directly on school premises. Lastly, no service refers to having only an unsafe water source (e.g. surface water or an unprotected spring).
The second tab in this app addresses the question as to how specifically basic water services in schools have changed over the last decade in SDG regions. This will be important context to keep in mind, as the following section will analyze a more specific question regarding the impact of basic water services in schools on school enrollment rates.
Impact on Enrollment
Now that we’ve analyzed how water service levels in schools vary across the world, this section takes that analysis a step further and incorporates school enrollment data. Specifically, I was interested in the following research question:
Does differential access to basic drinking water in schools (being the best service level a school can provide) appear to have an impact on primary and secondary school enrollment rates?
I answer this question in my second interactive Shiny app. As a reminder, basic water service refers to schools that have a safe water supply available on school premises.
Conclusions
Based on the Shiny app in the background section, secondary schools seem to provide slightly better water access than primary schools across all SDG regions, with the exception of Latin America/the Caribbean. In looking at the proportion of all schools in each region providing a basic water supply over the last decade, Northern/Western Africa, Sub-Saharan Africa, and Latin America/the Caribbean are the only regions that experience any notable fluctuations throughout this time period, although still rather minimal. Looking at the same data for only primary and secondary schools, it can be concluded that the fluctuations in these regions mainly occur through changes at the secondary school level, as primary school basic water coverage stayed relatively constant in these regions from 2011 to 2019.
Based on my second Shiny app, there appears to be no relationship between schools providing a basic water supply and a country’s school enrollment rates at the primary level, whereas there appears to be a positive relationship at the secondary level. This may be suggestive of a relationship between secondary schools providing a basic water service and students enrolling in school, which I hypothesize would have a particularly strong impact in developing countries, as water access on school premises could be an important incentive in encouraging students to continue their schooling.
Limitations
The main limitation in my analysis was missing data. Since data on water access in schools is collected much less frequently than household-level data, there were some holes in my data analysis that prevent me from drawing conclusions in certain regions. For example, in my first Shiny app, there was insufficient data for Eastern/South-Eastern Asia in terms of water service levels in all schools.
Moreover, the World Bank does not collect primary and secondary school enrollment rates for the same years in all countries. As a result, countries had a variety of different years as their most recent year of data collection, which may weaken the validity of my analysis of the relationship between basic water access in schools and school enrollment rates.
Advocacies about Water Access around the World (Masahiro)
Introduction
In this tab, we look at tweets advocating for greater access to water around the world in order to discover trends in this advocacy. To gather the tweets, the search_tweets() function was run on May 4th and May 7th, and the tweets posted roughly from April 27th to May 7th were recorded in the same dataset. Every included tweet contains at least one of the following phrases: “water access,” “Water access,” “Water Access,” “access to clean water,” or “access to drinking water.” For more details, take a look at the “Wrangling - Masahiro” file in the same repo. By exploring the following three questions with data, we aim to learn what kind of rhetoric people employ when calling for more access to water around the world, and to extend our knowledge of people’s actions for greater social justice in the realm of water access.
- What are the common words used in the tweets requesting more access to clean water around the world?
- What are the common sentiments of the words observed in those tweets?
- What do those common words and sentiments imply about the rhetoric people use when arguing for clean water in some of the regions lacking water access?
In addition to the removal of so-called stop words from the dataset, we also omitted the word “access” because it is obvious that all the tweets should include that word from the way we collected data. Doing so helps us produce more meaningful word clouds and sentiment analyses.
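The filtering step described above can be sketched in base R as follows. The example tweets and the very short stop-word list are made up for illustration; a real stop-word list is much longer, and the actual wrangling lives in the “Wrangling - Masahiro” file.

```r
# Hypothetical example tweets.
tweets <- c(
  "Access to clean water is a human right",
  "Water access improves health and sanitation"
)

# Tiny illustrative stop-word list (an assumption, not the list we used).
stop_words_small <- c("to", "is", "a", "and", "the", "of")

# Lowercase the text and split it into words on any non-letter character.
words <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
words <- words[words != ""]

# Drop the stop words and the search term "access".
kept <- words[!words %in% c(stop_words_small, "access")]

# Word frequencies, most common first.
word_counts <- sort(table(kept), decreasing = TRUE)
word_counts
```

Without the extra filter, "access" would appear in every tweet by construction and would dominate any frequency-based visualization.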
Word Cloud
First, we examine a word cloud of all the words except those removed during data wrangling, in order to get a sense of the most common words used in the focal tweets.
In the above word cloud, “https” stands out in its size, which implies that a lot of tweets related to water access advocacy refer to or cite other web resources. Also, “clean” is displayed prominently in the visualization, which should be partly because “access to clean water” is one of the phrases we actively searched for when scraping tweets. However, we also looked for “access to drinking water” when gathering the raw text, and the word “drinking” is not displayed as large as “clean.” It therefore seems that the word “clean” carries particular importance in arguments for greater water access across the earth. Paying attention to other words displayed in smaller sizes, it can be seen that the cloud includes a lot of words related to potential uses of water or implications of access to water: “sanitation,” “healthcare,” “health,” “food,” and “hygiene.” Another interesting word in the cloud is “india,” whose presence may be attributable to the socioeconomic standing of India as a country or to the nation’s especially large population. Finally, we also found it intriguing that “covid” appears in the above visualization because it suggests that tweets about water access were often associated with the pandemic, although there do not seem to be many explicit or obvious connections between the infectious disease and water access.
Sentiment Analysis
Next, we dive into the sentiments reflected in the English used by those advocating for water access on Twitter. We use the NRC lexicon to attach sentiment labels to the words observed in the tweets, and visualize the common sentiments in the tweets with the following graph.
As can be seen, positive, trust, and joy are the most popular sentiments among the words included in the tweets. Negative follows those top three sentiments, and then, the least popular sentiments such as anger, anticipation, fear, and sadness occupy the subsequent places. With this bar chart, we verify that a lot of words employed in the analyzed tweets have some positive connotations, which not only refers to “positive” as a sentiment but also “trust” and “joy.” In order to learn more about the use of words detected as implying these sentiments, we have decided to utilize the comparison cloud (see the next tab).
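The counting behind a bar chart like this can be sketched as a join between the tweet words and a sentiment lexicon. The mini-lexicon below is a made-up stand-in laid out as word/sentiment pairs; it is not the real NRC lexicon, and its labels are illustrative only.

```r
# Hypothetical mini-lexicon: one row per word-sentiment pair.
lexicon <- data.frame(
  word = c("clean", "clean", "clean", "safe", "safe", "disease", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust",
                "negative", "fear"),
  stringsAsFactors = FALSE
)

# Hypothetical words extracted from tweets, one row per occurrence.
tweet_words <- data.frame(
  word = c("clean", "water", "safe", "clean", "disease"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon ("water") drop out, and a word
# with several sentiments contributes one row per sentiment.
tagged <- merge(tweet_words, lexicon, by = "word")

# Number of word occurrences per sentiment, most common first.
sentiment_counts <- sort(table(tagged$sentiment), decreasing = TRUE)
sentiment_counts
```

Because one word can carry several sentiments at once, the sentiment counts can add up to more than the number of words, and a single frequent word can raise several bars together.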
Comparison Cloud
The comparison cloud below displays which words with implications of “positive,” “trust,” or “joy” are commonly used in the text scraped from Twitter. Before diving into detailed observations about the visualization itself, we lay out how the code and display below work. A comparison cloud enables users to accomplish two goals simultaneously: comparing the relative frequency of certain words and classifying the most commonly used words into several categories based upon certain criteria. In order to craft a comparison cloud, however, it is necessary to transform the data into a matrix whose columns correspond to the categories (in this case, the sentiments), whose rows correspond to the words, and whose values give the frequency of each word in each category. Building this matrix required a fair amount of wrangling to create a dataset with one row per word and one column per sentiment. If interested, see the commented code below.
# packages used below
library(dplyr)
library(tidyr)
library(wordcloud)

# preliminary wrangling below
# first extract words with the connotations of interest
# tweets_sentiment = dataset used for sentiment analysis
pure_words <- tweets_sentiment %>%
  filter(sentiment == "positive" | sentiment == "trust" |
           sentiment == "joy") %>%
  # then collapse the rows so that each word only occupies a single row
  group_by(word) %>%
  summarize()

# now prepare the dataset to be joined with the dataset about the count of
# each word with the three focal sentiments
pure_words_copied <- pure_words %>%
  # let each word occupy three rows at the same time
  slice(rep(1:n(), each = 3)) %>%
  mutate(number = row_number()) %>%
  # list all the sentiments of interest
  mutate(sentiment = case_when(number %% 3 == 1 ~ "positive",
                               number %% 3 == 2 ~ "trust",
                               number %% 3 == 0 ~ "joy")) %>%
  select(word, sentiment)

# the below dataset is about the count of each word with the three connotations
# of interest
comparison_words_prep <- tweets_sentiment %>%
  # extract those with the three sentiments of interest
  filter(sentiment == "positive" | sentiment == "trust" |
           sentiment == "joy") %>%
  # and count the frequency
  group_by(word, sentiment) %>%
  summarize(N = n())

comparison_words_prep_2 <- pure_words_copied %>%
  # join the dataset with the data about the count (used for the bar)
  left_join(comparison_words_prep, by = c("word", "sentiment")) %>%
  # if a word does not carry a certain sentiment, the join leaves an NA
  # value, so turn it into 0
  mutate(count = case_when(is.na(N) ~ 0,
                           TRUE ~ as.numeric(N))) %>%
  select(word, sentiment, count)

# one last step to make each column refer to each sentiment
comparison_words_prep_3 <- comparison_words_prep_2 %>%
  spread(key = sentiment, value = count)

# the below code translates the data frame into a matrix, and each row name of
# the matrix should correspond to the word
comparison_words <- comparison_words_prep_3 %>%
  select(-word) %>%
  as.matrix()
rownames(comparison_words) <- comparison_words_prep_3$word

# create the comparison cloud
colors1 <- c("#48F11F", "#1226D2", "#CB0A3E")
colors2 <- c("#CCFF99", "#7F88EF", "#EF7FCA")
comparison.cloud(comparison_words, max.words = 100,
                 random.order = FALSE,
                 colors = colors1,
                 title.colors = colors1,
                 title.bg.colors = colors2)
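As a side note, a word-by-sentiment count matrix of the same shape can also be built more compactly in base R with table(). This is only a sketch of an alternative, not the code used for the figure, and the small data frame below is a made-up stand-in for tweets_sentiment.

```r
# Hypothetical stand-in for tweets_sentiment: one row per word-sentiment match.
tweets_sentiment_demo <- data.frame(
  word = c("clean", "clean", "clean", "safe", "safe", "food", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust",
                "joy", "negative"),
  stringsAsFactors = FALSE
)

# Keep only the three sentiments of interest.
focal <- subset(tweets_sentiment_demo,
                sentiment %in% c("positive", "trust", "joy"))

# Cross-tabulate words against sentiments; combinations that never occur
# are filled with 0 automatically.
counts <- table(focal$word, focal$sentiment)

# Drop the table class to get a plain matrix with words as row names and
# sentiments as column names.
comparison_matrix <- unclass(counts)
comparison_matrix
```

A matrix in this layout, with one row per word and one column per sentiment, is the form of input that comparison.cloud() expects.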
As in the first word cloud, “clean” stands out in this comparison cloud for its frequency of use, as demonstrated by its large size. However, as a category, words classified as positive have a larger presence in the analyzed tweets, as shown by the bar chart in the previous tab, which means that “clean” is not so frequent that it dominates the text analysis conducted here. Taking a closer look at the visualization above, we noticed that it includes a lot of words related to potential outcomes of greater access to water around the world: “food,” “healthy,” “save,” “green,” “income,” “medical,” “safe,” “luxury,” and “survive.” This finding resonates with the insights gained from the original word cloud because both visualizations exhibit a lot of words associated with the promising possibilities that greater access to water can bring to the world. The comparison cloud also shows that the tweets of interest contain a number of words related to the process of ensuring water access for underprivileged people: “advocate,” “guarantee,” “partnership,” “supporting,” “improving,” “conservation,” and “providing.” This suggests that describing the steps necessary to secure water access around the world has led the tweets advocating for water access to include a lot of words with positive connotations, such as positiveness, trust, or joy.
Discussion
Through the exploration of a general word cloud, a bar chart, and a comparison cloud, this research has revealed that the tweets requesting greater access to water across the world incorporate a lot of words that connote positiveness, joy, and trust, and that they specifically include a lot of words related to the potential outcomes of access to water, such as “sanitation” or “food.” We believe that this may plausibly be attributable to the fact that a lot of the tweets of interest describe and discuss how securing water access can improve the lives of people in developing countries or what such water access enables. This explanation sounds convincing to some extent given that the comparison cloud has shown many words that can be associated with the process of improving water access, such as “partnership” or “donation.”
In other words, this study has revealed that the tweets arguing for water access around the world do not engage with negative words, such as death or disease, as much as they do with words carrying the sentiments “positive,” “joy,” and “trust.” This implies that tweets advocating for water access may talk more about how greater water access can resolve problems in the world by, for example, improving sanitation, food access, and safety in some areas, rather than about how lack of water causes diseases, deaths, conflicts, or other suffering. We find this speculation fairly plausible given all the results above, and we also find it intriguing that, in discussing water access on Twitter, people describe more of the positive aspects of securing clean water around the world and less of the negative consequences caused by lack of water.
However, we also acknowledge that these findings generated with word clouds have limitations. The word cloud, bar chart, and comparison cloud here are all generated after cutting the tweets into individual words. In other words, we are not really analyzing sentences, which means that we cannot distinguish between the following two phrases: “today’s effort for greater water access can improve sanitation around the world,” and “today’s effort for greater water access does not improve sanitation around the world.” The two phrases include an almost identical set of words. Moreover, the negative connotation of the latter is almost entirely due to the word “not,” which would have been removed as a stop word at the beginning of the data wrangling, so our data analysis cannot distinguish the sentiments of the two phrases. Our findings do identify common words among the tweets of interest, point to positive, joy, and trust as common sentiments, and suggest explanations of people’s rhetoric that are consistent with the visualizations here, but we believe that more techniques of text analysis are needed to generate more accurate and meaningful findings. Thus, future research may analyze these tweets not only as a set of words but also as a collection of bigrams or larger units of English words in order to build upon and expand the discoveries here.
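To illustrate what a bigram-based follow-up could capture, here is a minimal base R sketch. The helper function and the shortened example phrases are hypothetical; they are not part of our analysis.

```r
# Split a piece of text into lowercase words and pair each word with the
# word that follows it.
make_bigrams <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[words != ""]
  if (length(words) < 2) {
    return(character(0))
  }
  paste(head(words, -1), tail(words, -1))
}

bigrams_can <- make_bigrams("water access can improve sanitation")
bigrams_not <- make_bigrams("water access does not improve sanitation")

# Bigrams that appear only in the negated phrase.
setdiff(bigrams_not, bigrams_can)
```

Unlike single words, the bigram "not improve" keeps the negation attached to the verb, so the two phrases no longer look the same, provided that "not" is not removed as a stop word before the bigrams are formed.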
Conclusion
Countries in Central/Sub-Saharan Africa and South/Southeast Asia appear to have the highest percentage of deaths caused by unsafe drinking water in 2017.
There is a high correlation between countries that have few deaths due to unsafe drinking water and countries that have a high percentage of their population relying on at least basic drinking water services. By contrast, the countries experiencing the most deaths due to unsafe drinking water have a significant proportion of their population relying on either surface water, or an unimproved/limited service.
There has been global growth in water service coverage from 2000 to 2017.
Countries of higher economic development levels tend to have better water service coverage, and countries with large income disparities tend to be ranked in the middle in terms of water service coverage.
There is an urban-rural gap in water service coverage worldwide, although this gap has decreased from 2000 to 2017. The scale of this inequality differs across regions, water service types, and time.
Sub-Saharan Africa and Central/Southern Asia, which had the most deaths due to unsafe drinking water, unsurprisingly have the poorest water coverage in their school systems as well. Oceania also has very limited water coverage in schools, whereas Australia/New Zealand and Europe/Northern America have the highest water coverage across all school types. Water access also appears to be slightly better in secondary schools than in primary schools across all regions, with the exception of Latin America/the Caribbean.
When isolating the school data to just basic water coverage, being the best coverage a school can provide, there has been very little fluctuation in terms of the proportion of schools providing a basic drinking water supply in SDG regions over the last decade.
There also appears to be a positive relationship between the proportion of schools providing a basic drinking water supply and school enrollment rates for secondary schools. However, this trend does not hold true for primary schools, likely due to their slightly lower water coverage flattening out the trend line.
Finally, in terms of water advocacy through social media (specifically Twitter), there is a trend to talk about what humans can enjoy with greater water access and how humans can realize that situation using positive language, rather than focus on the detrimental impacts of poor water access. As a result, positive, joy, and trust are the most common sentiments amongst the tweets advocating for better water access.
Bibliography
Arango, L. (2020, March). Action Against Hunger, Philippines [Photograph]. Action Against Hunger. https://actionagainsthunger.ca/world-water-day-access-to-clean-water-is-more-important-than-ever/
DataBank. The World Bank Group. 2021. https://databank.worldbank.org/home.aspx
Global Health Data Exchange. Institute for Health Metrics and Evaluation at the University of Washington. 2019. http://ghdx.healthdata.org/gbd-results-tool
Mejía, Francisco. “How important is clean water for education?” Impacto, 20 Feb. 2014, blogs.iadb.org/efectividad-desarrollo/en/how-important-is-clean-water-for-education/.
Roser, M. and Ortiz-Ospina, E. (2013) “Primary and Secondary Education”. Our World In Data. https://ourworldindata.org/primary-and-secondary-education
SDG Indicators. United Nations. 2021. https://unstats.un.org/sdgs/indicators/regional-groups
Water Supply, Sanitation and Hygiene (WASH) Household Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2017. https://washdata.org/data/household#!/
Water Supply, Sanitation and Hygiene (WASH) School Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2019. https://washdata.org/data/school#!/