Let's visualize the concentration of organic fluorine compounds by utilizing the data recorded in ChemTHEATER

What is an organic fluorine compound?

It has both hydrophilic and lipophilic properties, and is used in a wide range of applications such as water repellents, digestive agents, oil agents, and etching agents.

For an easy-to-understand explanation of organofluorine compounds, see NHK's Close-up Gendai HP. https://www.nhk.or.jp/gendai/articles/4280/index.html

Due to its harmfulness, the manufacture, use, import and export of perfluorooctanoic acid (PFOA) among organofluorine compounds will be banned internationally from 2019, and perfluorooctanesulfonate (PFOS) will be banned from 2009. Manufacturing, use and import / export are restricted.

Standard values for organic fluorine compounds, etc.

In recent years, both substances have become problems such as being detected in well water and flowing out into rivers. In response to this, the Ministry of Health, Labor and Welfare added PFOS and PFOA to the water quality management target setting items (enforced on April 1, 2nd year of Reiwa). The target value of the tap water quality standard is 50 ng / L, which is the total value of PFOS and PFOA. https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/topics/bukyoku/kenkou/suido/kijun/index.html

In addition, the Ministry of the Environment conducted a nationwide abundance survey of PFOS and PFOA in the water environment, and conducted "Enforcement of Environmental Standards for the Protection of Human Health Related to Water Pollution (Notice)" (May 28, 2nd year of Ordinance). (Appendix), we set 50 ng / L as the total value of PFOS and PFOA as a provisional target value for the water environment. Results of PFOA and PFOA national abundance survey in the first year of Reiwa https://www.env.go.jp/press/108091.html

By the way, in the survey results for water purification plants released by the Ministry of Health, Labor and Welfare in June 2nd year of Reiwa, there was no place where the provisional target value of tap water, PFOS + PFOA 50 ng / L, was exceeded. https://www.mhlw.go.jp/content/10900000/000638290.pdf

Obtain environmental concentration data of organofluorine compounds from ChemTHEATER

ChemTHEATER publishes monitoring data of chemical substance concentrations in the environment, and also includes data on organofluorine compounds. Let's actually see at what concentration PFOS and PFOA were detected, and how high or low the total value is compared to the reference value.

Select "Sample Search" from the menu bar of ChemTHEATER.
Select "Abiotic --Water" from "Sample Type".
Select "Perfluoroalkyl and polyfluoroalkyl substances" from the "Chemicals Group".
Select "Asia" from "Collection area" and set the collection country to "Japan".
Click the "Search" button to output a list of samples that meet the conditions.
"Export samples" outputs the sample information, and "Export measured data" outputs the measured values of the target chemical substance as a tab-delimited text file.

Save the exported file in any directory and use it for analysis.

Format the exported data

Load the required library.

import pandas as pd

Read information on measured values of chemical substances.

data_file = "measureddata_20200521044415.tsv"
data = pd.read_csv(data_file, delimiter="\t")
data

	MeasuredID	ProjectID	SampleID	ScientificName	ChemicalID	ChemicalName	ExperimentID	MeasuredValue	AlternativeData	Unit	Remarks	RegisterDate	UpdateDate
0	81245	PRA000095	SAA005816	Water	CH0000362	PFBS	EXA000001	0.00100	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
1	81246	PRA000095	SAA005817	Water	CH0000362	PFBS	EXA000001	0.00100	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
2	81247	PRA000095	SAA005818	Water	CH0000362	PFBS	EXA000001	0.00100	NaN	μg/L	NaN	2019/7/26	2019/7/26
3	81248	PRA000095	SAA005819	Water	CH0000362	PFBS	EXA000001	0.00100	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
4	81249	PRA000095	SAA005820	Water	CH0000362	PFBS	EXA000001	0.00100	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
...	...	...	...	...	...	...	...	...	...	...	...	...	...
3087	48619	PRA000060	SAA003645	Water	CH0000793	THPFOS	EXA000001	0.00030	NaN	μg/L	NaN	2018/2/9	2018/6/8
3088	48620	PRA000060	SAA003646	Water	CH0000793	THPFOS	EXA000001	0.00008	NaN	μg/L	NaN	2018/2/9	2018/6/8
3089	48621	PRA000060	SAA003647	Water	CH0000793	THPFOS	EXA000001	0.00159	NaN	μg/L	NaN	2018/2/9	2018/6/8
3090	48622	PRA000060	SAA003648	Water	CH0000793	THPFOS	EXA000001	0.00188	NaN	μg/L	NaN	2018/2/9	2018/6/8
3091	48623	PRA000060	SAA003649	Water	CH0000793	THPFOS	EXA000001	0.00070	NaN	μg/L	NaN	2018/2/9	2018/6/8

3092 rows × 13 columns

Then, the sample information is read.

sample_file = "samples_20200521044410.tsv"
sample = pd.read_csv(sample_file, delimiter="\t")
sample

	ProjectID	SampleID	SampleType	TaxonomyID	UniqCodeType	UniqCode	SampleName	ScientificName	CommonName	CollectionYear	...	FlowRate	MeanPM10	MeanTotalSuspendedParticles	HumidityStartEnd	WindDirectionStartEnd	WindSpeedMSStartEnd	AmountOfCollectedAirStartEnd	Remarks	RegisterDate	UpdateDate
0	PRA000048	SAA002867	ST014	NaN	NaN	NaN	SW-St.1	Water	Surface water	2004	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017/10/25	2019/7/18
1	PRA000048	SAA002868	ST014	NaN	NaN	NaN	SW-St.3	Water	Surface water	2004	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017/10/25	2019/7/18
2	PRA000048	SAA002869	ST014	NaN	NaN	NaN	SW-St.4	Water	Surface water	2004	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017/10/25	2019/7/18
3	PRA000048	SAA002870	ST014	NaN	NaN	NaN	SW-St.5	Water	Surface water	2004	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017/10/25	2019/7/18
4	PRA000048	SAA002871	ST014	NaN	NaN	NaN	SW-St.7	Water	Surface water	2004	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017/10/25	2019/7/18
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
271	PRA000095	SAA005846	ST015	NaN	NaN	NaN	W_1xyz24_20100821	Water	River water	2010	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Around Kushiro Airport	2019/7/26	2019/7/26
272	PRA000095	SAA005847	ST015	NaN	NaN	NaN	W_1xyz25_20100821	Water	River water	2010	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Around Kushiro Airport	2019/7/26	2019/7/26
273	PRA000095	SAA005848	ST015	NaN	NaN	NaN	W_1xyz26_20100821	Water	River water	2010	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Downstream from the inflow of wastewater from ...	2019/7/26	2019/7/26
274	PRA000095	SAA005849	ST015	NaN	ZETTAICODE_FY2011_W	113680.0	W_113680_20110702	Water	River water	2011	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Upstream from the inflow of wastewater from Ch...	2019/7/26	2019/7/26
275	PRA000095	SAA005850	ST015	NaN	ZETTAICODE_FY2011_W	118873.0	W_118873_20110702	Water	River water	2011	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Downstream from the inflow of wastewater from ...	2019/7/26	2019/7/26

276 rows × 66 columns

pfos = data[data["ChemicalName"] == "PFOS"] #Extract only data whose Chemical Name is PFOS
pfoa = data[data["ChemicalName"] == "PFOA"] #Extract only data with Chemical Name PFOA

If you check the contents of each, it looks like the following.

pfos

	MeasuredID	ProjectID	SampleID	ScientificName	ChemicalID	ChemicalName	ExperimentID	MeasuredValue	AlternativeData	Unit	Remarks	RegisterDate	UpdateDate
269	35646	PRA000048	SAA002867	Water	CH0000365	PFOS	EXA000001	0.0073	NaN	μg/L	NaN	2017/10/25	2018/6/8
270	35647	PRA000048	SAA002868	Water	CH0000365	PFOS	EXA000001	0.0030	NaN	μg/L	NaN	2017/10/25	2018/6/8
271	35648	PRA000048	SAA002869	Water	CH0000365	PFOS	EXA000001	0.0034	NaN	μg/L	NaN	2017/10/25	2018/6/8
272	35649	PRA000048	SAA002870	Water	CH0000365	PFOS	EXA000001	0.0038	NaN	μg/L	NaN	2017/10/25	2018/6/8
273	35650	PRA000048	SAA002871	Water	CH0000365	PFOS	EXA000001	0.0020	NaN	μg/L	NaN	2017/10/25	2018/6/8
...	...	...	...	...	...	...	...	...	...	...	...	...	...
540	81380	PRA000095	SAA005846	Water	CH0000365	PFOS	EXA000001	0.0010	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
541	81381	PRA000095	SAA005847	Water	CH0000365	PFOS	EXA000001	0.0010	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
542	81382	PRA000095	SAA005848	Water	CH0000365	PFOS	EXA000001	0.0076	NaN	μg/L	NaN	2019/7/26	2019/7/26
543	81383	PRA000095	SAA005849	Water	CH0000365	PFOS	EXA000001	0.0028	NaN	μg/L	NaN	2019/7/26	2019/7/26
544	81384	PRA000095	SAA005850	Water	CH0000365	PFOS	EXA000001	0.0120	NaN	μg/L	NaN	2019/7/26	2019/7/26

276 rows × 13 columns

pfoa

	MeasuredID	ProjectID	SampleID	ScientificName	ChemicalID	ChemicalName	ExperimentID	MeasuredValue	AlternativeData	Unit	Remarks	RegisterDate	UpdateDate
896	46410	PRA000060	SAA003568	Water	CH0000372	PFOA	EXA000001	0.00436	NaN	μg/L	NaN	2018/2/9	2018/6/8
897	46411	PRA000060	SAA003569	Water	CH0000372	PFOA	EXA000001	0.01166	NaN	μg/L	NaN	2018/2/9	2018/6/8
898	46412	PRA000060	SAA003570	Water	CH0000372	PFOA	EXA000001	0.01180	NaN	μg/L	NaN	2018/2/9	2018/6/8
899	46413	PRA000060	SAA003571	Water	CH0000372	PFOA	EXA000001	0.00430	NaN	μg/L	NaN	2018/2/9	2018/6/8
900	46414	PRA000060	SAA003572	Water	CH0000372	PFOA	EXA000001	0.00439	NaN	μg/L	NaN	2018/2/9	2018/6/8
...	...	...	...	...	...	...	...	...	...	...	...	...	...
1143	81135	PRA000095	SAA005846	Water	CH0000372	PFOA	EXA000001	0.00100	<1.00E-3	μg/L	NaN	2019/7/26	2019/7/26
1144	81136	PRA000095	SAA005847	Water	CH0000372	PFOA	EXA000001	0.00200	NaN	μg/L	NaN	2019/7/26	2019/7/26
1145	81137	PRA000095	SAA005848	Water	CH0000372	PFOA	EXA000001	0.00140	tr(1.40E-3)	μg/L	NaN	2019/7/26	2019/7/26
1146	81138	PRA000095	SAA005849	Water	CH0000372	PFOA	EXA000001	0.03900	NaN	μg/L	NaN	2019/7/26	2019/7/26
1147	81139	PRA000095	SAA005850	Water	CH0000372	PFOA	EXA000001	0.02400	NaN	μg/L	NaN	2019/7/26	2019/7/26

252 rows × 13 columns

From the extracted data of PFOS and PFOA, only the columns of SampleID and BeautifulValue are extracted, and the column names of the measured values are both MeasuredValue, so change them to PFOS and PFOA, respectively.

pfos = pfos[["SampleID","MeasuredValue"]].rename(columns={'MeasuredValue': 'PFOS'})
pfoa = pfoa[["SampleID","MeasuredValue"]].rename(columns={'MeasuredValue': 'PFOA'})

Merge PFOS and PFOA data frames with SampleID.

df = pd.merge(pfos, pfoa, on="SampleID").astype({"PFOS": float}, {"PFOA": float})
df

	SampleID	PFOS	PFOA
0	SAA003568	0.00551	0.00436
1	SAA003569	0.01877	0.01166
2	SAA003570	0.01546	0.01180
3	SAA003571	0.00356	0.00430
4	SAA003572	0.00682	0.00439
...	...	...	...
247	SAA005846	0.00100	0.00100
248	SAA005847	0.00100	0.00200
249	SAA005848	0.00760	0.00140
250	SAA005849	0.00280	0.03900
251	SAA005850	0.01200	0.02400

252 rows × 3 columns

What I want to know this time is the total value of the concentrations of PFOS and PFOA, so create a column called TOTAL and enter the total value there.

df['TOTAL'] = df.sum(axis=1, numeric_only=True)
df

	SampleID	PFOS	PFOA	TOTAL
0	SAA003568	0.00551	0.00436	0.00987
1	SAA003569	0.01877	0.01166	0.03043
2	SAA003570	0.01546	0.01180	0.02726
3	SAA003571	0.00356	0.00430	0.00786
4	SAA003572	0.00682	0.00439	0.01121
...	...	...	...	...
247	SAA005846	0.00100	0.00100	0.00200
248	SAA005847	0.00100	0.00200	0.00300
249	SAA005848	0.00760	0.00140	0.00900
250	SAA005849	0.00280	0.03900	0.04180
251	SAA005850	0.01200	0.02400	0.03600

252 rows × 4 columns

Extract only the Sample ID and latitude / longitude data from the sample table.

sample = sample[["SampleID", "CollectionLongitudeFrom", "CollectionLatitudeFrom"]]
sample

	SampleID	CollectionLongitudeFrom	CollectionLatitudeFrom
0	SAA002867	139.850000	35.599333
1	SAA002868	140.000000	35.583000
2	SAA002869	139.834500	35.515833
3	SAA002870	139.900333	35.532500
4	SAA002871	139.833667	35.433000
...	...	...	...
271	SAA005846	144.192783	43.062302
272	SAA005847	144.232365	43.041624
273	SAA005848	144.155650	42.997641
274	SAA005849	141.719167	42.765833
275	SAA005850	141.719167	42.782500

276 rows × 3 columns

Merge this with the previously created concentration table and SampleID.

df = pd.merge(df, sample, on="SampleID")
df

	SampleID	PFOS	PFOA	TOTAL	CollectionLongitudeFrom	CollectionLatitudeFrom
0	SAA003568	0.00551	0.00436	0.00987	139.607158	35.453746
1	SAA003569	0.01877	0.01166	0.03043	139.677734	35.501549
2	SAA003570	0.01546	0.01180	0.02726	139.617230	35.528481
3	SAA003571	0.00356	0.00430	0.00786	139.498684	35.578287
4	SAA003572	0.00682	0.00439	0.01121	139.480358	35.536396
...	...	...	...	...	...	...
247	SAA005846	0.00100	0.00100	0.00200	144.192783	43.062302
248	SAA005847	0.00100	0.00200	0.00300	144.232365	43.041624
249	SAA005848	0.00760	0.00140	0.00900	144.155650	42.997641
250	SAA005849	0.00280	0.03900	0.04180	141.719167	42.765833
251	SAA005850	0.01200	0.02400	0.03600	141.719167	42.782500

252 rows × 6 columns

Save the completed file in csv format.

df.to_csv("sum_pfcs.csv")

Load this into QGIS.

Display concentration data on a map using QGIS

Download QGIS from below. https://www.qgis.org/ja/site/ Please refer to the various sites that explain how to use QGIS.

Map information (GML shape file) is downloaded from the GIS homepage of the Ministry of Land, Infrastructure, Transport and Tourism. https://nlftp.mlit.go.jp/index.html

For the time being, this time, we will use the administrative area of the 2. policy area from the download of national land numerical information. Select and download the "nationwide" data from the page below. Please choose the year according to your purpose. スクリーンショット 2020-06-19 18.32.01.png

Launch QGIS, click the Vector tab from Data Source Manager, select the national administrative area shapefile downloaded above and click "Add". スクリーンショット 2020-06-19 18.48.16.png

The map of Japan will be loaded as shown below, so change it to your favorite color. スクリーンショット 2020-06-19 18.49.46.png

Again, click the Delimited Text tab from the Data Source Manager and select the CSV file (sum_pfcs.csv) created in the previous section. After that, select longitude (CollectionLongitudeFrom) in the X field of the geometry definition and latitude (CollectionLatitudeFrom) in the Y field, and click "Add". スクリーンショット 2020-06-19 18.50.37.png

Then, the sampling points are plotted on the map. Next, color-code by density. Double-click "sum_pfcs" from the layer to open the layer properties. Select the Symbolism tab. Select "Divided into stages" for the symbol, and select "TOTAL" for the column because you want to color-code by the concentration of PFOS + PFOA. Change the symbol to your liking. If you change the number of classifications while keeping the mode at equal intervals, the classifications at equal intervals are automatically output. Here, the number of classifications is 2, and the value of a group of small numbers is 0-0.05. (In ChemTHEATER, the concentration of chemical substances in water is unified at µg / L, so if you want to color-code whether or not it exceeds the standard value of 50 ng / L, you need to set it to 0.05.) By default, Since the high density plot will be displayed in the lower layer, select "Symbol level" from "Advanced settings", check the checkbox of Enable symbol level, and set the layer of the high density group to 1. To do. スクリーンショット 2020-06-19 19.26.07.png

As a result, the points where PFOS + PFOA exceeds 50 ng / L and the points where it does not exceed are displayed in different colors. If you want to output this figure, select "Import / Export" from "Project", select "Export Map to Image" or "Export Map to PDF", and copy it to the clipboard or file it in any directory. Save as. スクリーンショット 2020-06-19 19.26.56.png

How about? Did you do it? In this way, we would appreciate it if you could utilize the data recorded in ChemTHEATER by expressing it on a map. Return to "Learning Python with ChemTHEATER"