[PYTHON] Plot the environmental concentration of organofluorine compounds on a map using open data

Let's visualize the concentration of organic fluorine compounds by utilizing the data recorded in ChemTHEATER

What is an organic fluorine compound?

It has both hydrophilic and lipophilic properties, and is used in a wide range of applications such as water repellents, digestive agents, oil agents, and etching agents.

For an easy-to-understand explanation of organofluorine compounds, see NHK's Close-up Gendai HP. https://www.nhk.or.jp/gendai/articles/4280/index.html

Due to its harmfulness, the manufacture, use, import and export of perfluorooctanoic acid (PFOA) among organofluorine compounds will be banned internationally from 2019, and perfluorooctanesulfonate (PFOS) will be banned from 2009. Manufacturing, use and import / export are restricted.

Standard values for organic fluorine compounds, etc.

In recent years, both substances have become problems such as being detected in well water and flowing out into rivers. In response to this, the Ministry of Health, Labor and Welfare added PFOS and PFOA to the water quality management target setting items (enforced on April 1, 2nd year of Reiwa). The target value of the tap water quality standard is 50 ng / L, which is the total value of PFOS and PFOA. https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/topics/bukyoku/kenkou/suido/kijun/index.html

In addition, the Ministry of the Environment conducted a nationwide abundance survey of PFOS and PFOA in the water environment, and conducted "Enforcement of Environmental Standards for the Protection of Human Health Related to Water Pollution (Notice)" (May 28, 2nd year of Ordinance). (Appendix), we set 50 ng / L as the total value of PFOS and PFOA as a provisional target value for the water environment. Results of PFOA and PFOA national abundance survey in the first year of Reiwa https://www.env.go.jp/press/108091.html

By the way, in the survey results for water purification plants released by the Ministry of Health, Labor and Welfare in June 2nd year of Reiwa, there was no place where the provisional target value of tap water, PFOS + PFOA 50 ng / L, was exceeded. https://www.mhlw.go.jp/content/10900000/000638290.pdf

Obtain environmental concentration data of organofluorine compounds from ChemTHEATER

ChemTHEATER publishes monitoring data of chemical substance concentrations in the environment, and also includes data on organofluorine compounds. Let's actually see at what concentration PFOS and PFOA were detected, and how high or low the total value is compared to the reference value.

PFCs_Search.png

  1. Select "Sample Search" from the menu bar of ChemTHEATER.
  2. Select "Abiotic --Water" from "Sample Type".
  3. Select "Perfluoroalkyl and polyfluoroalkyl substances" from the "Chemicals Group".
  4. Select "Asia" from "Collection area" and set the collection country to "Japan".
  5. Click the "Search" button to output a list of samples that meet the conditions.
  6. "Export samples" outputs the sample information, and "Export measured data" outputs the measured values of the target chemical substance as a tab-delimited text file.

Save the exported file in any directory and use it for analysis.

Format the exported data

Load the required library.

import pandas as pd

Read information on measured values of chemical substances.

data_file = "measureddata_20200521044415.tsv"
data = pd.read_csv(data_file, delimiter="\t")
data
MeasuredID ProjectID SampleID ScientificName ChemicalID ChemicalName ExperimentID MeasuredValue AlternativeData Unit Remarks RegisterDate UpdateDate
0 81245 PRA000095 SAA005816 Water CH0000362 PFBS EXA000001 0.00100 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
1 81246 PRA000095 SAA005817 Water CH0000362 PFBS EXA000001 0.00100 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
2 81247 PRA000095 SAA005818 Water CH0000362 PFBS EXA000001 0.00100 NaN μg/L NaN 2019/7/26 2019/7/26
3 81248 PRA000095 SAA005819 Water CH0000362 PFBS EXA000001 0.00100 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
4 81249 PRA000095 SAA005820 Water CH0000362 PFBS EXA000001 0.00100 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3087 48619 PRA000060 SAA003645 Water CH0000793 THPFOS EXA000001 0.00030 NaN μg/L NaN 2018/2/9 2018/6/8
3088 48620 PRA000060 SAA003646 Water CH0000793 THPFOS EXA000001 0.00008 NaN μg/L NaN 2018/2/9 2018/6/8
3089 48621 PRA000060 SAA003647 Water CH0000793 THPFOS EXA000001 0.00159 NaN μg/L NaN 2018/2/9 2018/6/8
3090 48622 PRA000060 SAA003648 Water CH0000793 THPFOS EXA000001 0.00188 NaN μg/L NaN 2018/2/9 2018/6/8
3091 48623 PRA000060 SAA003649 Water CH0000793 THPFOS EXA000001 0.00070 NaN μg/L NaN 2018/2/9 2018/6/8

3092 rows × 13 columns

Then, the sample information is read.

sample_file = "samples_20200521044410.tsv"
sample = pd.read_csv(sample_file, delimiter="\t")
sample
ProjectID SampleID SampleType TaxonomyID UniqCodeType UniqCode SampleName ScientificName CommonName CollectionYear ... FlowRate MeanPM10 MeanTotalSuspendedParticles HumidityStartEnd WindDirectionStartEnd WindSpeedMSStartEnd AmountOfCollectedAirStartEnd Remarks RegisterDate UpdateDate
0 PRA000048 SAA002867 ST014 NaN NaN NaN SW-St.1 Water Surface water 2004 ... NaN NaN NaN NaN NaN NaN NaN NaN 2017/10/25 2019/7/18
1 PRA000048 SAA002868 ST014 NaN NaN NaN SW-St.3 Water Surface water 2004 ... NaN NaN NaN NaN NaN NaN NaN NaN 2017/10/25 2019/7/18
2 PRA000048 SAA002869 ST014 NaN NaN NaN SW-St.4 Water Surface water 2004 ... NaN NaN NaN NaN NaN NaN NaN NaN 2017/10/25 2019/7/18
3 PRA000048 SAA002870 ST014 NaN NaN NaN SW-St.5 Water Surface water 2004 ... NaN NaN NaN NaN NaN NaN NaN NaN 2017/10/25 2019/7/18
4 PRA000048 SAA002871 ST014 NaN NaN NaN SW-St.7 Water Surface water 2004 ... NaN NaN NaN NaN NaN NaN NaN NaN 2017/10/25 2019/7/18
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
271 PRA000095 SAA005846 ST015 NaN NaN NaN W_1xyz24_20100821 Water River water 2010 ... NaN NaN NaN NaN NaN NaN NaN Around Kushiro Airport 2019/7/26 2019/7/26
272 PRA000095 SAA005847 ST015 NaN NaN NaN W_1xyz25_20100821 Water River water 2010 ... NaN NaN NaN NaN NaN NaN NaN Around Kushiro Airport 2019/7/26 2019/7/26
273 PRA000095 SAA005848 ST015 NaN NaN NaN W_1xyz26_20100821 Water River water 2010 ... NaN NaN NaN NaN NaN NaN NaN Downstream from the inflow of wastewater from ... 2019/7/26 2019/7/26
274 PRA000095 SAA005849 ST015 NaN ZETTAICODE_FY2011_W 113680.0 W_113680_20110702 Water River water 2011 ... NaN NaN NaN NaN NaN NaN NaN Upstream from the inflow of wastewater from Ch... 2019/7/26 2019/7/26
275 PRA000095 SAA005850 ST015 NaN ZETTAICODE_FY2011_W 118873.0 W_118873_20110702 Water River water 2011 ... NaN NaN NaN NaN NaN NaN NaN Downstream from the inflow of wastewater from ... 2019/7/26 2019/7/26

276 rows × 66 columns

pfos = data[data["ChemicalName"] == "PFOS"] #Extract only data whose Chemical Name is PFOS
pfoa = data[data["ChemicalName"] == "PFOA"] #Extract only data with Chemical Name PFOA

If you check the contents of each, it looks like the following.

pfos
MeasuredID ProjectID SampleID ScientificName ChemicalID ChemicalName ExperimentID MeasuredValue AlternativeData Unit Remarks RegisterDate UpdateDate
269 35646 PRA000048 SAA002867 Water CH0000365 PFOS EXA000001 0.0073 NaN μg/L NaN 2017/10/25 2018/6/8
270 35647 PRA000048 SAA002868 Water CH0000365 PFOS EXA000001 0.0030 NaN μg/L NaN 2017/10/25 2018/6/8
271 35648 PRA000048 SAA002869 Water CH0000365 PFOS EXA000001 0.0034 NaN μg/L NaN 2017/10/25 2018/6/8
272 35649 PRA000048 SAA002870 Water CH0000365 PFOS EXA000001 0.0038 NaN μg/L NaN 2017/10/25 2018/6/8
273 35650 PRA000048 SAA002871 Water CH0000365 PFOS EXA000001 0.0020 NaN μg/L NaN 2017/10/25 2018/6/8
... ... ... ... ... ... ... ... ... ... ... ... ... ...
540 81380 PRA000095 SAA005846 Water CH0000365 PFOS EXA000001 0.0010 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
541 81381 PRA000095 SAA005847 Water CH0000365 PFOS EXA000001 0.0010 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
542 81382 PRA000095 SAA005848 Water CH0000365 PFOS EXA000001 0.0076 NaN μg/L NaN 2019/7/26 2019/7/26
543 81383 PRA000095 SAA005849 Water CH0000365 PFOS EXA000001 0.0028 NaN μg/L NaN 2019/7/26 2019/7/26
544 81384 PRA000095 SAA005850 Water CH0000365 PFOS EXA000001 0.0120 NaN μg/L NaN 2019/7/26 2019/7/26

276 rows × 13 columns

pfoa
MeasuredID ProjectID SampleID ScientificName ChemicalID ChemicalName ExperimentID MeasuredValue AlternativeData Unit Remarks RegisterDate UpdateDate
896 46410 PRA000060 SAA003568 Water CH0000372 PFOA EXA000001 0.00436 NaN μg/L NaN 2018/2/9 2018/6/8
897 46411 PRA000060 SAA003569 Water CH0000372 PFOA EXA000001 0.01166 NaN μg/L NaN 2018/2/9 2018/6/8
898 46412 PRA000060 SAA003570 Water CH0000372 PFOA EXA000001 0.01180 NaN μg/L NaN 2018/2/9 2018/6/8
899 46413 PRA000060 SAA003571 Water CH0000372 PFOA EXA000001 0.00430 NaN μg/L NaN 2018/2/9 2018/6/8
900 46414 PRA000060 SAA003572 Water CH0000372 PFOA EXA000001 0.00439 NaN μg/L NaN 2018/2/9 2018/6/8
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1143 81135 PRA000095 SAA005846 Water CH0000372 PFOA EXA000001 0.00100 <1.00E-3 μg/L NaN 2019/7/26 2019/7/26
1144 81136 PRA000095 SAA005847 Water CH0000372 PFOA EXA000001 0.00200 NaN μg/L NaN 2019/7/26 2019/7/26
1145 81137 PRA000095 SAA005848 Water CH0000372 PFOA EXA000001 0.00140 tr(1.40E-3) μg/L NaN 2019/7/26 2019/7/26
1146 81138 PRA000095 SAA005849 Water CH0000372 PFOA EXA000001 0.03900 NaN μg/L NaN 2019/7/26 2019/7/26
1147 81139 PRA000095 SAA005850 Water CH0000372 PFOA EXA000001 0.02400 NaN μg/L NaN 2019/7/26 2019/7/26

252 rows × 13 columns

From the extracted data of PFOS and PFOA, only the columns of SampleID and BeautifulValue are extracted, and the column names of the measured values are both MeasuredValue, so change them to PFOS and PFOA, respectively.

pfos = pfos[["SampleID","MeasuredValue"]].rename(columns={'MeasuredValue': 'PFOS'})
pfoa = pfoa[["SampleID","MeasuredValue"]].rename(columns={'MeasuredValue': 'PFOA'})

Merge PFOS and PFOA data frames with SampleID.

df = pd.merge(pfos, pfoa, on="SampleID").astype({"PFOS": float}, {"PFOA": float})
df
SampleID PFOS PFOA
0 SAA003568 0.00551 0.00436
1 SAA003569 0.01877 0.01166
2 SAA003570 0.01546 0.01180
3 SAA003571 0.00356 0.00430
4 SAA003572 0.00682 0.00439
... ... ... ...
247 SAA005846 0.00100 0.00100
248 SAA005847 0.00100 0.00200
249 SAA005848 0.00760 0.00140
250 SAA005849 0.00280 0.03900
251 SAA005850 0.01200 0.02400

252 rows × 3 columns

What I want to know this time is the total value of the concentrations of PFOS and PFOA, so create a column called TOTAL and enter the total value there.

df['TOTAL'] = df.sum(axis=1, numeric_only=True)
df
SampleID PFOS PFOA TOTAL
0 SAA003568 0.00551 0.00436 0.00987
1 SAA003569 0.01877 0.01166 0.03043
2 SAA003570 0.01546 0.01180 0.02726
3 SAA003571 0.00356 0.00430 0.00786
4 SAA003572 0.00682 0.00439 0.01121
... ... ... ... ...
247 SAA005846 0.00100 0.00100 0.00200
248 SAA005847 0.00100 0.00200 0.00300
249 SAA005848 0.00760 0.00140 0.00900
250 SAA005849 0.00280 0.03900 0.04180
251 SAA005850 0.01200 0.02400 0.03600

252 rows × 4 columns

Extract only the Sample ID and latitude / longitude data from the sample table.

sample = sample[["SampleID", "CollectionLongitudeFrom", "CollectionLatitudeFrom"]]
sample
SampleID CollectionLongitudeFrom CollectionLatitudeFrom
0 SAA002867 139.850000 35.599333
1 SAA002868 140.000000 35.583000
2 SAA002869 139.834500 35.515833
3 SAA002870 139.900333 35.532500
4 SAA002871 139.833667 35.433000
... ... ... ...
271 SAA005846 144.192783 43.062302
272 SAA005847 144.232365 43.041624
273 SAA005848 144.155650 42.997641
274 SAA005849 141.719167 42.765833
275 SAA005850 141.719167 42.782500

276 rows × 3 columns

Merge this with the previously created concentration table and SampleID.

df = pd.merge(df, sample, on="SampleID")
df
SampleID PFOS PFOA TOTAL CollectionLongitudeFrom CollectionLatitudeFrom
0 SAA003568 0.00551 0.00436 0.00987 139.607158 35.453746
1 SAA003569 0.01877 0.01166 0.03043 139.677734 35.501549
2 SAA003570 0.01546 0.01180 0.02726 139.617230 35.528481
3 SAA003571 0.00356 0.00430 0.00786 139.498684 35.578287
4 SAA003572 0.00682 0.00439 0.01121 139.480358 35.536396
... ... ... ... ... ... ...
247 SAA005846 0.00100 0.00100 0.00200 144.192783 43.062302
248 SAA005847 0.00100 0.00200 0.00300 144.232365 43.041624
249 SAA005848 0.00760 0.00140 0.00900 144.155650 42.997641
250 SAA005849 0.00280 0.03900 0.04180 141.719167 42.765833
251 SAA005850 0.01200 0.02400 0.03600 141.719167 42.782500

252 rows × 6 columns

Save the completed file in csv format.

df.to_csv("sum_pfcs.csv")

Load this into QGIS.

Display concentration data on a map using QGIS

Download QGIS from below. https://www.qgis.org/ja/site/ Please refer to the various sites that explain how to use QGIS.

Map information (GML shape file) is downloaded from the GIS homepage of the Ministry of Land, Infrastructure, Transport and Tourism. https://nlftp.mlit.go.jp/index.html

For the time being, this time, we will use the administrative area of the 2. policy area from the download of national land numerical information. Select and download the "nationwide" data from the page below. Please choose the year according to your purpose. スクリーンショット 2020-06-19 18.32.01.png

Launch QGIS, click the Vector tab from Data Source Manager, select the national administrative area shapefile downloaded above and click "Add". スクリーンショット 2020-06-19 18.48.16.png

The map of Japan will be loaded as shown below, so change it to your favorite color. スクリーンショット 2020-06-19 18.49.46.png

Again, click the Delimited Text tab from the Data Source Manager and select the CSV file (sum_pfcs.csv) created in the previous section. After that, select longitude (CollectionLongitudeFrom) in the X field of the geometry definition and latitude (CollectionLatitudeFrom) in the Y field, and click "Add". スクリーンショット 2020-06-19 18.50.37.png

Then, the sampling points are plotted on the map. Next, color-code by density. Double-click "sum_pfcs" from the layer to open the layer properties. Select the Symbolism tab. Select "Divided into stages" for the symbol, and select "TOTAL" for the column because you want to color-code by the concentration of PFOS + PFOA. Change the symbol to your liking. If you change the number of classifications while keeping the mode at equal intervals, the classifications at equal intervals are automatically output. Here, the number of classifications is 2, and the value of a group of small numbers is 0-0.05. (In ChemTHEATER, the concentration of chemical substances in water is unified at µg / L, so if you want to color-code whether or not it exceeds the standard value of 50 ng / L, you need to set it to 0.05.) By default, Since the high density plot will be displayed in the lower layer, select "Symbol level" from "Advanced settings", check the checkbox of Enable symbol level, and set the layer of the high density group to 1. To do. スクリーンショット 2020-06-19 19.26.07.png

As a result, the points where PFOS + PFOA exceeds 50 ng / L and the points where it does not exceed are displayed in different colors. If you want to output this figure, select "Import / Export" from "Project", select "Export Map to Image" or "Export Map to PDF", and copy it to the clipboard or file it in any directory. Save as. スクリーンショット 2020-06-19 19.26.56.png

How about? Did you do it? In this way, we would appreciate it if you could utilize the data recorded in ChemTHEATER by expressing it on a map. Return to "Learning Python with ChemTHEATER"

Recommended Posts

Plot the environmental concentration of organofluorine compounds on a map using open data
[Python] Plot data by prefecture on a map (number of cars owned nationwide)
Drawing on Jupyter using the plot function of pandas
[Python] I wrote the route of the typhoon on the map using folium
Folium: Visualize data on a map with Python
Calculate the probability of outliers on a boxplot
Check the status of your data using pandas_profiling
Create a GUI on the terminal using curses
Scraping the winning data of Numbers using Docker
I want to take a screenshot of the site on Docker using any font
Cut a part of the string using a Python slice
I tried using the API of the salmon data project
Let's use the open data of "Mamebus" in Python
A Study on Visualization of the Scope of Prediction Models
Create a shape on the trajectory of an object
How to plot the distribution of bacterial composition from Qiime2 analysis data in a box plot
How to easily draw the structure of a neural network on Google Colaboratory using "convnet-drawer"
Avoiding the pitfalls of using a Mac (for Linux users?)
A note on the default behavior of collate_fn in PyTorch
A network diagram was created with the data of COVID-19.
Reuse the behavior of the @property method by using a descriptor [16/100]
Analyzing data on the number of corona patients in Japan
Data analysis based on the election results of the Tokyo Governor's election (2020)
[Treasure Data] [Python] Execute a query on Treasure Data using TD Client
Display the address entered using Rails gem'geocoder' on Google Map
I made a VGG16 model using TensorFlow (on the way)
[Python] I tried collecting data using the API of wikipedia
The story of creating a database using the Google Analytics API
Easy on Mac! Plot of unit step response using Python
Let's make a map of the new corona infection site [FastAPI / PostGIS / deck.gl (React)] (Data processing)
Fixed-point observation of specific data on the Web by automatically executing a Web browser on the server (Ubuntu16.04) (2) -Web scraping-
I tried using PDF data of online medical care based on the spread of the new coronavirus infection