[Introduction] Artificial satellite data analysis using Python (Google Colab environment)

Introduction: What is satellite data?

- Satellite data refers to data acquired by **remote sensing** from artificial satellites.
- Because satellite data requires specialized tools and large-capacity data-processing infrastructure, the organizations able to use it were long limited to universities and a few specialized institutions. With the recent spread of open-source libraries, open data, and cloud processing platforms, however, an environment has emerged in which ordinary organizations can easily work with satellite data.
- Satellite data can be expected to serve as big data covering **places, times, and target states** that could not be captured before.
- This article therefore introduces how to handle such data **without specialized satellite-analysis tools**, using **Python** (one of the most familiar tools) and **for free**, so that anyone can try it easily.
- In addition, the dataset chosen this time is one that is easy to apply to business and social implementation. I hope you will read on while imagining the scenes to which it could be applied.

Target data

- This article introduces how to use a satellite dataset, taking **night-time light data** as an example.

Image and data processing by NOAA's National Geophysical Data Center. DMSP data collected by the US Air Force Weather Agency.

What is night-time light data?

- Night-time light data is one of the kinds of data observed by artificial satellites.
- Simply put, it is data obtained by continuously sensing the amount of light emitted by cities at night.
- **Night-time light data has been reported to correlate with economic activity (GDP and energy consumption)**, and empirical research is under way that uses it as a proxy variable for economic activity.
- In particular, its application to countries where it is difficult to measure economic activity city by city, or to compile accurate economic indicators, is being considered.
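To make the proxy-variable idea above concrete, the toy example below correlates a set of regional light totals with GDP figures. All numbers here are synthetic illustrations, not real observations; the variable names are my own.

```python
import numpy as np

# Hypothetical illustration: if the total night-time light of a set of
# regions tracks their GDP, the two series show a strong positive
# correlation. These numbers are synthetic, not real observations.
sum_of_lights = np.array([120.0, 340.0, 560.0, 910.0, 1500.0])  # total DN per region
gdp = np.array([1.1, 3.0, 5.2, 8.8, 14.9])                      # GDP, arbitrary units

# Pearson correlation coefficient between the two series
r = np.corrcoef(sum_of_lights, gdp)[0, 1]
print(round(r, 3))
```

With real data, empirical studies regress GDP (or energy consumption) on such light totals rather than just reporting a correlation coefficient.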

Data summary

- Night-time light data is measured by several satellites; this time we will use **DMSP-OLS**, a well-established dataset in night-time light research.

Usage image

- As downloaded, satellite data is huge, covering the whole globe.
- It is therefore common to cut out and use only the area needed for the analysis.
- Two inputs are required: the downloaded original data (raster data) and data specifying the cutout extent (vector data). Both can be downloaded on the web.

Basic operations in Python

- The following libraries are used (this article introduces only the minimum required functionality):
  - Data loading: rasterio
  - Cutting out the target area: geopandas
  - Numerical computation: numpy
  - Visualization: matplotlib

Analysis environment

Workflow

1. Data acquisition
2. Read data
3. Extraction of required area
4. Data visualization / analysis (visualize and compare night-time light data for each country)

1. Data acquisition

- The data can be downloaded from the DMSP-OLS data download page.
- Clicking an active link starts the download.
- As noted in the annotation there, however, each compressed file is about **300 MB** at download time but expands to about **3 GB** per file, which is somewhat tough to handle on a typical laptop.
- The whole sequence from download to decompression is therefore performed on Google Colab.
- (If you have a high-spec PC, you can simply click to download and unzip.)

- First, install the required libraries on Google Colab.

# Install the required libraries
!pip install sh
!pip install rasterio
!pip install geopandas

- Next, create a folder (directory) in which to save the downloaded dataset.

import os
from sh import wget, gunzip, mv
import tarfile

# Create a directory named 'data' for storing the dataset
if not os.path.exists('data'):
    os.mkdir('data')

- Download the compressed data with the wget command.
- Decompress the downloaded file.
- The file is doubly compressed (a .tar archive containing .gz files), which is confusing, but eventually a tif file is extracted.

# Build the URL to download
target_data = 'F101992'
url = f'https://ngdc.noaa.gov/eog/data/web_data/v4composites/{target_data}.v4.tar'

# Download the data
wget(url)

# Decompress the archive (extract only the relevant file)
with tarfile.open(f'/content/{target_data}.v4.tar') as tar:
    # Get the member whose name ends with stable_lights.avg_vis.tif.gz
    file = [tarinfo for tarinfo in tar.getmembers() if tarinfo.name.endswith("web.stable_lights.avg_vis.tif.gz")]
    # Extract the target file (.gz)
    tar.extractall(path='/content/', members=[file[0]])
    # Decompress the extracted .gz file
    gunzip(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif.gz')
    # Move the tif file into the data directory
    mv(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif', '/content/data/')

2. Read data

- Read the decompressed tif file.
- It is easy to read with rasterio.open('path to file').
- Calling read() on the opened object returns the data as a numpy array.
- Once in numpy format, the data can be analyzed and visualized freely.

import rasterio
import numpy as np
import matplotlib.pyplot as plt

with rasterio.open('/content/data/F101992.v4b_web.stable_lights.avg_vis.tif') as src:
    data = src.read()  # read as a numpy array

# Check the size of the data
data.shape

# Visualize the data
plt.imshow(data[0])
plt.colorbar()

**Global night-time light data for 1992**

- However, since the data is still in its downloaded state (global coverage) here, it must be cut out in the units you want to analyze.

3. Extraction of required area

- To extract the required area, you need data that specifies the cutout extent.
- Such data comes in various formats; here we use geojson data (shapefiles are another well-known format).
- Border data for each country can be downloaded from the linked site.
- It can also be downloaded by clicking, but we use wget to fetch it on Google Colab.

# Download the vector file
wget('https://datahub.io/core/geo-countries/r/countries.geojson')

- Next, load the obtained geojson file.
- With geopandas, you can read geojson files and handle them like a pandas.DataFrame.
- This gives you the boundary data for any area (country).

import geopandas as gpd

# Read the geojson file
countries = gpd.read_file('/content/countries.geojson')

# Check the contents
countries.head()

# Visualize the border data
countries.plot()

# Extract Japan's border (an ordinary pandas.DataFrame operation)
countries.query('ADMIN == "Japan"')

- Next, apply the acquired border data for Japan to the global dataset and extract only the data for the Japan area.
- Use rasterio.mask.mask to cut out the required part.
- rasterio.mask.mask(opened tif file object, geometry column of the border data, crop=True)
- This returns two outputs, out_image and out_transform; the clipped data in numpy format is stored in out_image. (out_transform holds the coordinate-transform information of the clipped data, but it is a bit involved, so it is omitted here.)

import rasterio.mask

with rasterio.open('/content/data/F101992.v4b_web.stable_lights.avg_vis.tif') as src:
    out_image, out_transform = rasterio.mask.mask(src, countries.query('ADMIN == "Japan"').geometry, crop=True)

- Visualize the out_image clipped along Japan's border.

**Japan's night-time light data in 1992** (the large amount of light in the Greater Tokyo area stands out)

4. Data visualization / analysis

- So far, we have seen how to obtain night-time light data for any country.
- Using the methods introduced above, let us visualize and analyze the night-time light data of several countries.

# Wrap the series of steps into a function
def load_ntl(target_data, area):
    # Download only when the data does not already exist
    if not os.path.exists(f'/content/data/{target_data}.v4b_web.stable_lights.avg_vis.tif'):
        url = f'https://ngdc.noaa.gov/eog/data/web_data/v4composites/{target_data}.v4.tar'
        # Download the data
        wget(url)
        # Decompress the archive (extract only the relevant file)
        with tarfile.open(f'/content/{target_data}.v4.tar') as tar:
            # Get the member whose name ends with stable_lights.avg_vis.tif.gz
            file = [tarinfo for tarinfo in tar.getmembers() if tarinfo.name.endswith("web.stable_lights.avg_vis.tif.gz")]
            # Extract the target file (.gz)
            tar.extractall(path='/content/', members=[file[0]])
            # Decompress the extracted .gz file
            gunzip(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif.gz')
            # Move the tif file into the data directory
            mv(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif', '/content/data/')
    # Clip the data for the specified area out of the tif file
    with rasterio.open(f'/content/data/{target_data}.v4b_web.stable_lights.avg_vis.tif') as src:
        out_image, out_transform = rasterio.mask.mask(src, countries.query(f'ADMIN == "{area}"').geometry, crop=True)
    return out_image

# Function for visualization
def show(data):
    plt.figure(figsize=(15, 5))
    plt.subplot(121)
    plt.imshow(data[0])
    plt.subplot(122)
    plt.hist(data.reshape(-1), bins=np.arange(1, 63, 1))

# Usage example (get data for Japan in 1992)
japan_1992 = load_ntl(target_data='F101992', area='Japan')

- Let us visualize the night-time lights of several countries. (You can check the data for any country and year by changing the target_data and area arguments.)

# Get data for Japan, China, Thailand, and Cambodia
japan_1992 = load_ntl(target_data='F101992', area='Japan')
china_1992 = load_ntl(target_data='F101992', area='China')
thailand_1992 = load_ntl(target_data='F101992', area='Thailand')
cambodia_1992 = load_ntl(target_data='F101992', area='Cambodia')

# Visualization
show(japan_1992)
show(china_1992)
show(thailand_1992)
show(cambodia_1992)

**Japan's night-time light data in 1992**

**China's night-time light data in 1992**

**Thailand's night-time light data in 1992**

**Cambodia's night-time light data in 1992**

- From these results we can see tendencies specific to each country as of 1992: Japan is bright almost everywhere; China, with its large land area, shows large differences between cities; Thailand is bright along highways and in regional cities centered on Bangkok; and Cambodia is dark outside the capital.
- Only the patterns above are analyzed here, but comparing other years and countries would also be interesting.
- In this research area, the Sum of NTL (total night-time light intensity for each area) is used as an index and compared against GDP and energy-consumption indicators.
- You can also analyze units smaller than countries (prefectures and municipalities) by downloading other geojson files (or shapefiles).
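As a minimal sketch of the Sum of NTL index mentioned above, the function below totals the positive DN values of a clipped array such as out_image. The function name and the nodata handling (treating the 0-padding that rasterio.mask.mask adds outside the border as "no light") are my assumptions, not part of the original article.

```python
import numpy as np

def sum_of_ntl(arr):
    """Total night-time light intensity of a clipped array.

    rasterio.mask.mask pads the area outside the border with 0, so
    summing only the positive DN values gives the Sum of NTL for the
    clipped region (assumption: 0 means unlit or outside the border).
    """
    arr = np.asarray(arr, dtype=np.float64)
    return float(arr[arr > 0].sum())

# Synthetic 2-D example standing in for a clipped out_image band
toy = np.array([[0, 10, 63],
                [5,  0,  7]])
print(sum_of_ntl(toy))  # 85.0
```

In practice one would call something like `sum_of_ntl(japan_1992)` for each country and year, and compare the resulting series against GDP figures.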

Supplement

- We have introduced night-time light data, but several problems with it have also been reported.
- For example, because the sensor differs for each satellite (the F10, F12, etc. at the head of the data name), slightly different sensor biases occur. When analyzing light intensity as a long-term trend, preprocessing is therefore required to remove these biases (many research papers have been published on this process, called calibration).
- In addition, this data expresses night-time light intensity as an integer from 0 to 63. While this makes it easy to handle, saturation occurs at points where the light intensity is very large, so the true intensity cannot be measured correctly there.
- Also, the DMSP-OLS satellite mission introduced here ended in 2013; its successor, NPP VIIRS, a satellite with a higher-precision sensor, is now in operation.
- Satellite datasets are not all-purpose, so it is necessary to consult previous research and understand these problems before using them for analysis.
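The saturation issue above can be quantified with a quick diagnostic: since DMSP-OLS stable-lights DN values are integers in 0 to 63, pixels sitting at 63 may be saturated. The sketch below estimates what fraction of the lit pixels hit that ceiling; the helper name and the choice to measure against lit pixels only are my assumptions.

```python
import numpy as np

def saturated_fraction(arr, ceiling=63):
    """Fraction of lit pixels at the DN ceiling (possible saturation)."""
    arr = np.asarray(arr)
    lit = arr > 0                      # consider only lit pixels
    if lit.sum() == 0:
        return 0.0
    return float((arr == ceiling).sum() / lit.sum())

# Synthetic example: 3 of the 5 lit pixels sit at the ceiling
toy = np.array([[0, 63, 63],
                [30, 12, 63]])
print(saturated_fraction(toy))  # 0.6
```

A high fraction for a bright metropolitan area would suggest that the Sum of NTL there understates the true light intensity, which is one motivation for the calibration literature mentioned above.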

Finally

- This article introduced how to handle satellite datasets in the Google Colab environment.
- We covered a night-time light dataset this time, but various satellite datasets are available as open data, and they can basically be handled in the same way.
- New possibilities arise from combining and analyzing the **micro-level big data** collected and accumulated by individual companies with **macro-level big data** such as satellite datasets.
- I hope that the various satellite datasets released as open data will be used more and more for the benefit of society.
