[PYTHON] Recreating John Snow's cholera map

Introduction

Maps of the new coronavirus are now published on many companies' websites, and this got me interested in the geographical spread of infectious diseases and in epidemiological maps. Meanwhile, I found an interesting DataCamp project on exactly this topic, so I decided to try it and post it here as a memo. The project text was translated into Japanese with DeepL.

Overview

In 1854, Dr. John Snow carried out an early spatial analysis by mapping the pattern of a cholera outbreak in Soho, London. He plotted the deaths in the neighborhood, concluded that most of them clustered around one particular public water pump, and found that the victims had been drawing their water from it. This was not only one of the earliest uses of data visualization; by solving this problem he also helped lay the foundations of spatial analysis and modern epidemiology.

This Python project re-analyzes the data and recreates John Snow's famous map. It is designed to test the pandas and Bokeh knowledge taught in the DataCamp courses "pandas Foundations" and "Interactive Data Visualization with Bokeh".

1. Dr. John Snow

Dr. John Snow (1813-1858) was a British physician, widely regarded as a legend in the history of public health and as a pioneer in the development of anesthesia. Some even call him one of the greatest physicians of all time.

A leading advocate of both anesthesia and medical hygiene, he not only experimented with ether and chloroform but also designed masks and methods for administering them. He personally administered chloroform to Queen Victoria during the births of her eighth and ninth children in 1853 and 1857, which led to the general acceptance of anesthesia in childbirth.

But, as we'll see later, not everything in his life was a success. John Snow is recognized as one of the founders of modern epidemiology (some also consider him a founding father of data visualization, spatial analysis, data science in general, and many related disciplines) for the scientific and fairly modern, data-driven approach with which he identified the source of the 1854 cholera outbreak in Soho, London. But it wasn't always like this. In fact, for a long time he was simply ignored by the scientific community, and today he is very often mythologized.

This post not only retells his "data story" but also re-analyzes the data he collected in 1854 and recreates his famous map (also known as the ghost map).

1.py


# Loading in the pandas module
import pandas as pd

# Reading in the data
deaths = pd.read_csv("datasets/deaths.csv")

# Print out the shape of the dataset
print(deaths.shape)

# Printing out the first 5 rows
print(deaths.head(5))

The result is as follows

(489, 3)
   Death  X coordinate  Y coordinate
0      1     51.513418     -0.137930
1      1     51.513418     -0.137930
2      1     51.513418     -0.137930
3      1     51.513361     -0.137883
4      1     51.513361     -0.137883

2. Cholera invades!

Before John Snow's discovery, cholera was a regular visitor to London's overcrowded and unsanitary streets. During the third cholera outbreak it was one of the most studied topics (more than 700 studies and essays were published in London alone between 1839 and 1856), and almost all authors believed that the outbreaks were caused by miasma, or "bad air".

John Snow's pioneering work with anesthesia and gases led him to doubt the miasma model of the disease. He first formulated and published his theory that cholera is spread through water and food in his 1849 essay "On the Mode of Communication of Cholera". The essay received negative reviews in The Lancet and the London Medical Gazette.

We now know he was right, but Dr. Snow's dilemma was how to prove it. His first step was to look at the data. Our dataset has 489 rows and 3 columns, but to make it easier to work with, we will first make a few changes.

2.py


# Summarizing the content of deaths
deaths.info()

# Define the new names of your columns
newcols = {
    'Death': 'death_count',
    'X coordinate': 'x_latitude', 
    'Y coordinate': 'y_longitude' 
    }

# Rename your columns (columns= targets the column labels; inplace=True updates deaths itself)
deaths.rename(columns=newcols, inplace=True)

# Describe the dataset 
deaths.describe()

The result is as follows

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 3 columns):
Death           489 non-null int64
X coordinate    489 non-null float64
Y coordinate    489 non-null float64
dtypes: float64(2), int64(1)
memory usage: 11.5 KB
       Death  X coordinate  Y coordinate
count  489.0    489.000000    489.000000
mean     1.0     51.513398     -0.136403
std      0.0      0.000705      0.001503
min      1.0     51.511856     -0.140074
25%      1.0     51.512964     -0.137562
50%      1.0     51.513359     -0.136226
75%      1.0     51.513875     -0.135344
max      1.0     51.515834     -0.132933
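A common pitfall with `DataFrame.rename` is passing the mapping positionally, as in `deaths.rename(newcols)`. In that form pandas applies the dict to the row index labels rather than to the column names, and it returns a new DataFrame instead of changing the original. The minimal sketch below uses a tiny made-up DataFrame (not the cholera data) with the same column names to show the difference.

```python
import pandas as pd

# Toy frame with the same column names as deaths.csv (values are made up)
df = pd.DataFrame({"Death": [1, 1],
                   "X coordinate": [51.5, 51.6],
                   "Y coordinate": [-0.13, -0.14]})
newcols = {"Death": "death_count", "X coordinate": "x_latitude", "Y coordinate": "y_longitude"}

# A positional mapping targets the index labels (0, 1); nothing matches, so the columns are unchanged
unchanged = df.rename(newcols)
print(list(unchanged.columns))  # ['Death', 'X coordinate', 'Y coordinate']

# columns= targets the column labels; assigning the result keeps the change
df = df.rename(columns=newcols)
print(list(df.columns))  # ['death_count', 'x_latitude', 'y_longitude']
```

Either `inplace=True` or reassigning the result works; reassignment is often preferred because it keeps each step's output explicit.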

3. You know nothing, John Snow!

His work was largely ignored, because it seemed unthinkable that one man could refute the miasma theory and prove everyone else wrong. His fellow medical scientists effectively told him, "You know nothing, John Snow!"

As mentioned above, John Snow's first attempt ended in negative reviews from proponents of the miasma theory. However, one critic made a useful suggestion about what kind of evidence would be convincing. The 1854 cholera outbreak in Soho, London, gave Snow not only a chance to save lives but also an opportunity to further test and refine his theory. But what was the final proof that he was right?

Before we see how John Snow did it, let's get the data into shape.

3.py


# Create `locations` by subsetting only Latitude and Longitude from the dataset 
locations = deaths[["x_latitude","y_longitude"]]

# Create `deaths_list` by transforming the DataFrame to list of lists 
deaths_list = locations.values.tolist()

# Check the length of the list
len(deaths_list)

The result is as follows

489

4. Ghost map

Unfortunately, his original map is not available (it may never have existed). However, we can look at the famous map he drew in 1855, about a year later. This map is also called the ghost map because it visualizes deaths. We also have the data John Snow used to draw it, so let's recreate his map with modern tools.

4.py


# Plot the data on map (map location is provided) using folium and for loop for plotting all the points
import folium

# Stamen tiles moved to Stadia Maps in 2023 and now require an API key, so a CartoDB basemap is used instead of 'Stamen Toner'
map = folium.Map(location=[51.5132119, -0.13666], tiles='CartoDB positron', zoom_start=17)
for point in deaths_list:
    folium.CircleMarker(point, radius=8, color='red', fill=True, fill_color='red', opacity=0.4).add_to(map)
map

The deaths are plotted on the map as shown below. folium is a Python library that lets you use the Leaflet JavaScript mapping library from Python.
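Leaflet, and therefore folium, expects each point as a `[latitude, longitude]` pair. In this dataset the column called `X coordinate` actually holds latitudes (around 51.5) and `Y coordinate` holds longitudes (around -0.13), the reverse of the usual x = longitude convention. A small sanity check like the hypothetical helper below (`check_lat_lon_pairs` is not part of folium, and its default bounding box around London is an assumption) can catch swapped columns before plotting.

```python
def check_lat_lon_pairs(points, lat_range=(51.0, 52.0), lon_range=(-1.0, 1.0)):
    """Return the indices of points outside the expected [lat, lon] box.

    Hypothetical helper: the default ranges are an assumed box around London.
    """
    bad = []
    for i, (lat, lon) in enumerate(points):
        lat_ok = lat_range[0] <= lat <= lat_range[1]
        lon_ok = lon_range[0] <= lon <= lon_range[1]
        if not (lat_ok and lon_ok):
            bad.append(i)
    return bad

# The first point is in [lat, lon] order; the second has the values swapped
sample = [[51.513418, -0.137930], [-0.137930, 51.513418]]
print(check_lat_lon_pairs(sample))  # [1]
```

Given the ranges reported by `describe()` above, running it on `deaths_list` should return an empty list.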

5. It's a pump!

What John Snow saw after marking the deaths on the map was not a random pattern (you can see this in our recreated ghost map as well). Most of the deaths were concentrated around the corner of Broad Street (now Broadwick Street) and Cambridge Street (now Lexington Street). This cluster of deaths around the intersection was the epicenter of the outbreak, but what was there? Yes, a water pump. John Snow already believed that cholera spreads through water, so to test this he marked the locations of nearby water pumps on his map. And that revealed the big picture: by combining the locations of cholera deaths with the locations of water pumps, Snow was able to show that most of the deaths were concentrated around one particular public water pump on Broad Street in Soho. Finally, he had the evidence he needed. Let's do the same and add the pump locations to our recreated ghost map.
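Snow's key step was relating each death to the nearest pump. A minimal sketch of that idea, assuming deaths and pumps are lists of `[latitude, longitude]` pairs like `deaths_list` (and the `pumps_list` built in the next step), assigns each death to its closest pump by great-circle (haversine) distance and counts deaths per pump. The helper names are hypothetical and the toy coordinates in the example are made up. Straight-line distance is only an approximation of how far people actually walked to fetch water.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(p, q):
    """Great-circle distance in metres between two [lat, lon] points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def deaths_per_nearest_pump(deaths, pumps):
    """Count deaths keyed by the index of the closest pump."""
    counts = Counter()
    for d in deaths:
        nearest = min(range(len(pumps)), key=lambda i: haversine_m(d, pumps[i]))
        counts[nearest] += 1
    return counts

# Made-up example: two pumps and three deaths, two of them next to pump 0
toy_pumps = [[51.5134, -0.1366], [51.5100, -0.1400]]
toy_deaths = [[51.5133, -0.1367], [51.5135, -0.1365], [51.5101, -0.1399]]
print(deaths_per_nearest_pump(toy_deaths, toy_pumps))  # Counter({0: 2, 1: 1})
```

With the real data, `deaths_per_nearest_pump(deaths_list, pumps_list)` would give the death count per pump index, which can then be matched to `pumps['Pump Name']`.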

5.py


# Import the data
pumps = pd.read_csv("datasets/pumps.csv")
# Subset the DataFrame and select just ['X coordinate', 'Y coordinate'] columns
locations_pumps = pumps[["X coordinate","Y coordinate"]]

# Transform the DataFrame to list of lists in form of ['X coordinate', 'Y coordinate'] pairs
pumps_list = locations_pumps.values.tolist()

# Create a for loop and plot the data using folium (use previous map + add another layer)
# map1 is the same object as map, so the pump markers are drawn on top of the death markers
map1 = map
for location, name in zip(pumps_list, pumps['Pump Name']):
    folium.Marker(location, popup=name).add_to(map1)
map1


6. You know nothing, John Snow! (again)

So John Snow finally had evidence of a link between the cholera deaths and one public water pump that was probably contaminated. But he didn't stop there; he investigated further. He looked for anomalies (what we would now call "outliers in the data") and found two places with no deaths at all. The first was a brewery right next to Broad Street, where he found out that the workers mostly drank beer (in other words, not water from the local pump, which supported his theory that the pump was the source). The second building without deaths was a workhouse near Poland Street, where he learned that the water did not come from the Broad Street pump (which again confirmed his theory). The locations of both buildings are also shown on the map on the left. The local officials did not really believe him or his theory, but they removed the pump handle the next day, September 8, 1854. In his famous book, John Snow later collected and published chronological data on deaths before and after the peak of the epidemic; here we will analyze and compare the effect of removing the handle.
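Snow's search for anomalies can be framed as a simple outlier check: among the buildings close to the suspected pump, which ones recorded no deaths at all? Our `deaths.csv` only records death locations, so the building records below (names, distances, counts) are invented purely for illustration, and the 150 m threshold and the `zero_death_buildings_near_pump` helper are assumptions, not part of the original analysis.

```python
from typing import NamedTuple

class Building(NamedTuple):
    name: str
    distance_to_pump_m: float  # distance to the suspected pump, in metres
    deaths: int

def zero_death_buildings_near_pump(buildings, max_distance_m=150.0):
    """Return names of buildings within max_distance_m of the pump that recorded no deaths."""
    return [b.name for b in buildings
            if b.distance_to_pump_m <= max_distance_m and b.deaths == 0]

# Invented records, for illustration only
toy_buildings = [
    Building("house A", 40.0, 3),
    Building("building B", 60.0, 0),
    Building("house C", 400.0, 0),
]
print(zero_death_buildings_near_pump(toy_buildings))  # ['building B']
```

Each flagged building is a question to investigate, as Snow did with the two buildings he examined, rather than an answer in itself.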


6.py


# Import the data the right way
dates = pd.read_csv("datasets/dates.csv",parse_dates=['date'])
print(dates.head())
# Set the Date when handle was removed (8th of September 1854)
handle_removed = pd.to_datetime('1854/9/8')

# Create new column `day_name` in `dates` DataFrame with names of the day
# (dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it)
dates['day_name'] = dates.date.dt.day_name()

# Create new column `handle` in `dates` DataFrame based on a Date the handle was removed 
dates['handle'] = dates.date > handle_removed

# Check the dataset and datatypes
dates.info()

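# Create a comparison of how many cholera deaths and attacks there were before and after the handle was removed
# (numeric_only=True skips the date and day_name columns, which recent pandas versions would otherwise try to sum)
dates.groupby(['handle']).sum(numeric_only=True)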

The result is as follows

   order       date  attacks  deaths
0      1 1854-08-19        1       1
1      2 1854-08-20        1       0
2      3 1854-08-21        1       2
3      4 1854-08-22        0       0
4      5 1854-08-23        1       0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43 entries, 0 to 42
Data columns (total 6 columns):
order       43 non-null int64
date        43 non-null datetime64[ns]
attacks     43 non-null int64
deaths      43 non-null int64
day_name    43 non-null object
handle      43 non-null bool
dtypes: bool(1), datetime64[ns](1),int64(3),object(1)
memory usage: 1.8+ KB
        order  attacks  deaths
handle
False     231      528     500
True      715       43     116
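The two groups above do not cover the same number of days, so raw totals are hard to compare. Because `order` numbers the days 1 to 43 and `handle` is `False` up to and including September 8 (the comparison is a strict `>`), the `False` group covers days 1 to 21 (1 + 2 + ... + 21 = 231) and the `True` group covers days 22 to 43 (sum 715). A quick check in plain Python turns the totals from the table into averages per day:

```python
# Totals copied from the groupby output above
days_before = 21        # days 1-21 (19 Aug to 8 Sep 1854), handle == False
days_after = 43 - 21    # days 22-43, handle == True
deaths_before, deaths_after = 500, 116
attacks_before, attacks_after = 528, 43

# Sanity check: the summed `order` column matches days 1-21 and 22-43
assert sum(range(1, days_before + 1)) == 231
assert sum(range(days_before + 1, 44)) == 715

print(f"deaths/day before: {deaths_before / days_before:.1f}, after: {deaths_after / days_after:.1f}")
print(f"attacks/day before: {attacks_before / days_before:.1f}, after: {attacks_after / days_after:.1f}")
# deaths/day before: 23.8, after: 5.3
# attacks/day before: 25.1, after: 2.0
```

Per-day averages still say nothing about when the decline began; that requires looking at the daily series itself.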

7. A picture worth a thousand words

With the handle removed, no more contaminated water could be drawn from the pump. It was later discovered that the well water beneath the pump had been contaminated with sewage. This episode was later regarded as an early example of applying epidemiology, public health medicine, and science (germ theory) to a real-life crisis. In 1992, a replica of the pump, without a handle, was erected near its original location, next to the wall behind what is now the John Snow pub, together with an explanatory commemorative plaque. The original site is subtly marked by a pink granite curbstone in front of a small wall plaque.

There is a lot to learn from John Snow's data. We could look only at the absolute numbers, but that can lead to wrong conclusions, so let's use Bokeh to look at the data from another angle.

Thanks to John Snow, we have the data in chronological order, so we can plot it as a time series.

7.py
from bokeh.plotting import output_notebook, figure, show
from bokeh.resources import INLINE
output_notebook(INLINE)

# Set up figure (width/height replace plot_width/plot_height, which were removed in Bokeh 3.0)
p = figure(width=900, height=450, x_axis_type='datetime', tools='lasso_select, box_zoom, save, reset, wheel_zoom',
           toolbar_location='above', x_axis_label='Date', y_axis_label='Number of Deaths/Attacks',
           title='Number of Cholera Deaths/Attacks before and after 8th of September 1854 (removing the pump handle)')

# Plot on figure (legend_label replaces the deprecated legend= argument; scatter draws circle markers by default)
p.line(dates['date'], dates['deaths'], color='red', alpha=1, line_width=3, legend_label='Cholera Deaths')
p.scatter(dates['date'], dates['deaths'], color='black', nonselection_fill_alpha=0.2, nonselection_fill_color='grey')
p.line(dates['date'], dates['attacks'], color='black', alpha=1, line_width=2, legend_label='Cholera Attacks')

show(p)

The result is as follows.

8. John Snow Myth & Did We Learn Something?

The interactive visualization above clearly shows that the cholera outbreak peaked before the handle was removed and was already on a downward trajectory before September 8, 1854.

So if we compared only the absolute numbers, we could draw the false conclusion that removing the handle of the Broad Street pump stopped the cholera outbreak. That's simply not true: it certainly helped, but it didn't stop the outbreak.
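The argument rests on the shape of the curve, not on the totals. A minimal sketch of that check, assuming the daily series is available as `(date, deaths)` pairs, finds the peak day and compares it with the handle-removal date. The `peak_before_cutoff` helper is hypothetical, and the daily numbers in the example are made up, not Snow's data.

```python
from datetime import date

def peak_before_cutoff(series, cutoff):
    """Return (peak_date, peak_value, peaked_before_cutoff) for a list of (date, value) pairs."""
    peak_date, peak_value = max(series, key=lambda pair: pair[1])
    return peak_date, peak_value, peak_date < cutoff

cutoff = date(1854, 9, 8)
# Made-up daily death counts, for illustration only
toy_series = [
    (date(1854, 8, 31), 10),
    (date(1854, 9, 1), 60),
    (date(1854, 9, 2), 40),
    (date(1854, 9, 8), 15),
    (date(1854, 9, 9), 8),
]
print(peak_before_cutoff(toy_series, cutoff))  # (datetime.date(1854, 9, 1), 60, True)
```

To use it on the real data, pass `list(zip(dates['date'], dates['deaths']))` together with `handle_removed` from 6.py as the cutoff, so that both sides of the comparison are pandas Timestamps.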

But people love stories about heroes and other myths (definitely more than they love science or data science). According to the John Snow myth, he was a superhero who, in two days and in defiance of his peers, hypothesized that cholera was a water-borne disease. Although no one listened to him, he bravely kept mapping the deaths and, with his findings, persuaded the local authorities to remove the handle of the infected water pump, which stopped the outbreak. John Snow saved the lives of many Londoners.

A closer look behind this story reveals a different John Snow: a man trying to fight the disease with limited tools and to prove that he was right and did "know something" about cholera. He simply did what he could in the limited time he had, and he always boiled his water before drinking it.

In conclusion

In Python, I recreated the cholera map of John Snow, a British physician and a pioneer of anesthesia and medical hygiene. Cholera is now known to be water-borne, but this was not known at the time. That John Snow formed a hypothesis and demonstrated it under those circumstances is amazing.
