[PYTHON] Software training for those who start researching space

Introduction

This article briefly introduces the software basics that people who aspire to space research can practice at home. It assumes a student who has done a little programming in class. (First edition for the time being, 2020.4.9.)

For how to use python with google Colab for online learning, please refer to:

-Online learning using google Colab for those who start space research

For how to use tex, please refer to:

-How to use tex + overleaf + beamer for those who start space research

Basic knowledge

What is software? Why is it important?

Software is, simply put, the language of the computer world. A language is a tool for telling your thoughts to another person and for understanding theirs; in other words, it is a communication tool. What does software connect? It connects hardware and the CPU, and hardware and human beings, and it has become essential for connecting "things" and "people". In space research, software is the indispensable tool for "conversing" with beautiful images of the universe and with data that nobody in the world has seen before.

Software is now recognized as an "experimental device" or "experimental technique" in space research. NASA, for example, has teams of software experts, and laboratories that develop experimental equipment have software experts too. A programming language is a tool for communicating skillfully with state-of-the-art observation equipment and for holding a dialogue with huge amounts of data. Treat it as a foreign language as important as English, and practice its vocabulary, grammar, and writing while you are young.

Software language

If I had to name the bare minimum, without fear of criticism, it is python. However, just as there are many human languages besides English and French, there are many computer languages, each with strengths and weaknesses, so knowing only one language does not cover everything. Let's first take a look at the types of computer languages.

CUI command

CUI stands for Character User Interface. On windows it corresponds to working at the command prompt; on Mac and linux, to working in the terminal. If you are new to either, please read the following articles first.

-I can't hear you anymore! How to use the command prompt [for beginners]
-I can't hear you anymore! How to use the terminal [for beginners]
-16 selections of "OS X" terminal commands that Mac users should know

You might wonder why young people today should have to type low-level commands. In the software world, low-level operations (CUI) are not inferior to high-level operations (GUI). In particular, when working with space observation equipment and data, it is important to be able to handle layered software skillfully, from the low levels to the high ones. Automation and maintenance must be easy and safe, and command-line tools underlie both, so not learning them early can cost you a lot of time.

First, get to know three pattern-processing tools: "awk", "sed", and "grep". awk splits, selects, and computes on text fields. sed substitutes text. grep searches text. These are unix commands, so they are available by default on linux and mac. On windows they are not, so there is no need to force them into the command prompt; it may be easier to put a linux environment on windows. Windows now offers the Windows Subsystem for Linux (WSL), which makes it easy to build a linux environment. Please refer to Basic memo of WSL (Windows Subsystem for Linux). For the three commands, see the following articles:

--General story
-UNIX Tool (Pattern processing language grep, sed, awk)
--For Mac and linux
-[Sed / awk / grep] String replacement / extraction / search and regular expression | Linux Cheat Sheet
--For windows
--gawk How to use awk introductory commands and how to write scripts
--findstr grep with standard Windows command
--sed How to write in such a case with sed?

These articles are worth reading. Even so, how useful the commands are may not be obvious yet. As a concrete example, suppose you have a text file (gal.txt) containing a catalog of 10,000 celestial bodies. "grep NGC gal.txt" lists the entries from the NGC catalog, and "grep -c NGC gal.txt" counts them. To shorten "NGC" to "N", use "sed 's/NGC/N/g' gal.txt". To print only the lines after the 10th, use "awk 'NR > 10 {print $0}' gal.txt". These commands are convenient for quick pre-processing of data.
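As a rough python counterpart to the three commands above, the sketch below applies the same three steps to a list of lines. The sample lines are invented stand-ins for gal.txt, the function names are hypothetical, and the search here is a plain substring match rather than the regular expressions that grep and sed support.

```python
# Python counterparts of grep, sed, and awk on catalog lines.
# The sample lines below are invented stand-ins for gal.txt.

def grep(lines, pattern):
    """Like `grep pattern`: keep only the lines that contain pattern."""
    return [line for line in lines if pattern in line]


def sed_replace(lines, old, new):
    """Like `sed 's/old/new/g'`: replace every occurrence in every line."""
    return [line.replace(old, new) for line in lines]


def awk_after(lines, n):
    """Like `awk 'NR > n {print $0}'`: keep the lines after the n-th."""
    return lines[n:]


sample = ["NGC 224 galaxy", "IC 342 galaxy", "NGC 1068 galaxy", "M 87 galaxy"]

ngc_lines = grep(sample, "NGC")
print(len(ngc_lines), "NGC entries")
print(sed_replace(sample, "NGC", "N"))
print(awk_after(sample, 2))
```

For a real catalog you would read the lines from the file first, for example with open("gal.txt").read().splitlines().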

Another useful command is rsync, which syncs and copies files. When copying data that keeps growing from moment to moment, you want to copy only the changes rather than data that has already been copied. rsync computes those differences and transfers only them, with optional compression, and it is still widely used for transferring huge data sets and in experiments in space research. If you are new to rsync, see the following articles:

-6 rules you should know if you are using rsync for the first time (1/2)
-How to use rsync

These are good starting points.

Low-level scripting language

CUI commands are useful on their own, and you can combine them to do more. For example, suppose you want to run a series of steps such as "copy a file once an hour, process the data in it, and write the result to another file." Such a flow is called batch processing. An easy way to implement it is a "shell script". There are many explanations on the web, so if you are new to shell scripts, see the following articles:

-Introduction to Shell Script Bash
-Introduction to shell script Summary of how to write

These are worth reading. It is no exaggeration to say that being able to write shell scripts alone will make a big difference in your research life.

Cron makes it easy to "run some process once an hour". Please refer to:

-How to use Cron and where it gets stuck
-Summary of crontab commands

These days I also often use python's schedule library. Please refer to:

-Run Jobs in Python Schedule Library
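The batch flow described above (copy a file, process its data, write the result to another file) can also be sketched in python using only the standard library. The function name, file names, and the "sum the numbers" processing step are assumptions made for illustration, not something taken from the articles above.

```python
import shutil
import tempfile
from pathlib import Path


def run_batch_step(source: Path, work_dir: Path) -> Path:
    """Copy source into work_dir, sum the numbers in it, write the result."""
    work_dir.mkdir(parents=True, exist_ok=True)
    copied = work_dir / source.name
    shutil.copy(source, copied)                  # step 1: copy the file
    values = [float(v) for v in copied.read_text().split()]  # step 2: process
    result = work_dir / "result.txt"
    result.write_text(f"{sum(values)}\n")        # step 3: write to another file
    return result


# Demo with a temporary file; a real job would point at actual data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.txt"
    src.write_text("1.0 2.5 3.5\n")
    out = run_batch_step(src, Path(tmp) / "work")
    print(out.read_text().strip())
```

To run such a script once an hour, a crontab entry of the form "0 * * * * python3 /path/to/batch_step.py" would do, where the script path is hypothetical.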

Advanced scripting language

Incidentally, a scripting language here means an interpreted language: roughly speaking, the program is translated and executed line by line as it runs. "Advanced" here means "object-oriented" (https://ja.wikipedia.org/wiki/object-oriented). In space astronomy, python and ruby are the mainstream choices, and here we treat python as the king.

How to use python is described later, so here I cover only the general picture. Because an interpreted language is interpreted as it runs, its execution speed can be orders of magnitude slower than the compiled languages introduced below. However, for regular operations such as the vector and matrix calculations common in space astronomy, just that part can be delegated to a program written in C or Fortran. numpy, python's numerical calculation library, achieves fast processing from python by reusing such well-established high-speed routines. It's a bit technical, but this works because python can talk to C / C++ code through a technique called binding.

The idea behind python is that code comes out much the same no matter who writes it. Languages such as perl and C were a free world in which the same process could be written in completely different ways, but that freedom made code hard to read and team development difficult. Python's rules may feel strict, but they suppress individual differences by that much, making it a language that is easy for anyone to read.

For example, in C, code that calculates the difference sin(i + 1) - sin(i) of sin(x) looks like this.

c


#include <stdio.h>
#include <math.h>
#define f(x) sin(x)

int main(void) {
    double x1, x2, dy;
    for (int i = 0; i < 100; i++) {
        x1 = i;
        x2 = i + 1;
        dy = f(x2) - f(x1);  /* forward difference of sin(x) */
        printf("x1, dy = %f %f\n", x1, dy);
    }
    return 0;
}

If you write this in python,

python


#!/usr/bin/env python 
import numpy as np
x = np.arange(101)
y = np.sin(x)
dy = y[1:] - y[:-1]
print("x1, dy =", x[:-1], dy)

You can write it without a for loop, like this. In python, the calculation is done with the square brackets. This is called a "slice", a mechanism for easily accessing ranges of array elements. For details, please refer to [Python] Summary of slice operations.

Space astronomy involves many operations on vectors and matrices, and python makes them far simpler than C / C++. C / C++ relies heavily on for loops, whereas in python it is easier to compute on and access arrays without looping; code "without for loops" is often said to be more python-like. Many recent machine learning tools are also based on python. For young people starting now, beginning with python should not be a big mistake.
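To make the slice idea concrete, here is a small sketch on a hand-made array (the values are arbitrary). It shows what y[1:] and y[:-1] select, and checks that their difference matches numpy's np.diff.

```python
import numpy as np

y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

print(y[1:])    # every element except the first
print(y[:-1])   # every element except the last

# Subtracting the two shifted views gives the neighbor differences.
dy = y[1:] - y[:-1]
print(dy)

# np.diff computes the same quantity.
same = np.array_equal(dy, np.diff(y))
print(same)
```

Both slices have one element fewer than y, which is why the result lines up element by element without a loop.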

Compiler language

A compiled language is one where the whole program is compiled in advance: it is translated into machine language and optimized, an executable file is generated, and that file is run. C / C++ is a compiled language. I still use C / C++ wherever I need speed. For example, client and server programs for FPGAs and CPUs are usually written in C++. People in the high energy field use root, so it is worth studying C++ as well. C++ is also a must for anyone who uses geant4, a representative Monte Carlo simulation toolkit.

Basic knowledge before working with software

Environment variable

Environment variables are settings that the OS stores and that the user or a running program can set and refer to. For example:

--"PATH": the search path for commands
--"HOME": the path of the current user's home directory
--"LANG": the language (Japanese, English, etc.) used by the user
--"PWD": the current directory
--"SHELL": the path of the default shell

And so on. Once you join a laboratory, you will often hear the question "Is it in your PATH?" For what that means, please refer to:

-Attempts to explain the meaning of "passing through the PATH" as easily as possible
-Setting procedure and mechanism to pass PATH on Windows 10
-PATH on Mac

The shell looks for a program in the directories listed in PATH, in the order they are written.
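The "in the order written in the PATH" rule can be sketched in a few lines of python. This is a simplified illustration: find_in_path is a hypothetical helper, "mytool" is a made-up program name, and a real shell also checks that the file is executable. The standard library already offers shutil.which for real use.

```python
import os
import tempfile
from pathlib import Path


def find_in_path(program, path_value):
    """Return the first match for program among the PATH-style directories."""
    for directory in path_value.split(os.pathsep):
        candidate = Path(directory) / program
        if candidate.is_file():
            return str(candidate)
    return None


# Demo: two temporary directories both contain a file named "mytool";
# the directory listed first in the PATH-like string wins.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for d in (first, second):
        (Path(d) / "mytool").write_text("echo hello\n")
    fake_path = os.pathsep.join([first, second])
    found = find_in_path("mytool", fake_path)
    print(found == str(Path(first) / "mytool"))

# The first few entries of this process's real search path:
print(os.environ.get("PATH", "").split(os.pathsep)[:3])
```

This is why two installed copies of the same program can behave differently depending on which directory comes first in PATH.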

What are CPU, GPU, FPGA, and embedded OS?

Software and hardware are inseparable in space astronomy. As a basis, let's learn about the foundations on which software runs.

--CPU: A CPU is a device that executes instructions sequentially. It works without an OS as long as it has an instruction sequence, and embedded CPUs are often tested without installing an OS. The personal computer you usually use has an OS (windows, mac, linux, etc.). The OS is the software that interfaces with the hardware; it takes care of basic functions such as handling keyboard input, playing music, and exchanging files via USB.
--GPU: GPUs support the recent boom in machine learning. Unlike a CPU, a GPU is good at running routine calculations in parallel. While a CPU has several cores, a GPU can have on the order of 10,000 CUDA cores, orders of magnitude more, and this high degree of parallelism has dramatically advanced machine learning. However, a GPU is not suited to calculations that cannot be parallelized.
--FPGA: FPGA stands for Field Programmable Gate Array, a device in which a large number of rewritable logic circuits are arranged. In the past, ASICs were used, but their drawback was that they were expensive and could not be rewritten. FPGAs are rewritable and available at low cost, which makes them an indispensable part of modern space astronomy instruments. They are programmed in hardware description languages such as VHDL and Verilog, and recently methods that allow design in C have become widespread. You can also build a CPU inside an FPGA; the CPU called microblaze is one famous example.
--Embedded OS: An embedded OS runs an embedded system, a computer system built into digital devices and home appliances to provide specific functions. In space astronomy, it is used when strict real-time performance and specialized functions are required. For example, it would be possible to put mac or windows on an artificial satellite, but they are too general-purpose and wasteful.

Recently, [Practical application of Xilinx Zynq UltraScale + MPSoC in space](https://forums.xilinx.com/t5/Adaptable-Advantage-Blog/Partner-RTEMS-is-being-used-in-NASA-and-ESA-missions-and-now-it/ba-p/1019266) is underway.

In this way, the software handled in space astronomy is diverse: it may run on a CPU or a GPU, be a language that describes an FPGA, or be software for a CPU running an embedded OS. Hardware and software are sometimes designed separately, but recently co-design, a method of designing while considering the functions of both hardware and software together, has become important, and young people should know about it.

About the editor

An editor for writing software is like paper and pen, so it is quite important.

-emacs: A long-established, classic editor that is widely used on linux and in experiments. You should know it as part of the culture.
-sublimetext: An easy-to-write editor for developing python.
--Emacs users should also refer to What Emacs users should know when writing python code in Sublime Text.

There are many others, but find one that's easy for you to use.

Setting up and using python

How to install python

For space astronomy, it is recommended to install python with anaconda, which comes with astropy, matplotlib, ipython, and more. Python 2 officially reached end of life at the start of 2020, so use python 3. However, those who may still need the old python2 series should install anaconda python through pyenv. pyenv can switch the python version per directory, so it suits people who work with modules that need different versions.

Plot using matplotlib

matplotlib, a MATLAB-like plotting library for python, has grown explosively in the last few years, and I recommend it. It is installed automatically if you install anaconda.

For example, an FFT is easy to do in python. Please refer to:

-How to confirm the Parseval theorem using the Fourier transform (FFT) of matplotlib and scipy

The speed is not much different from C++ etc. because the FFT part is executed by a compiled library (historically a fortran binary).
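As a minimal sketch of how short an FFT is in python, the example below checks Parseval's theorem on a synthetic signal. It uses numpy's np.fft rather than the scipy and matplotlib combination of the article above, and the signal (two sine waves with 50 and 120 cycles per record) is an arbitrary choice for illustration.

```python
import numpy as np

n = 1024
t = np.arange(n)
signal = np.sin(2 * np.pi * 50 * t / n) + 0.5 * np.sin(2 * np.pi * 120 * t / n)

spectrum = np.fft.fft(signal)

# Parseval: sum |x|^2 == (1/N) * sum |X|^2 for numpy's unnormalized FFT.
power_time = np.sum(np.abs(signal) ** 2)
power_freq = np.sum(np.abs(spectrum) ** 2) / n
print(np.isclose(power_time, power_freq))

# The strongest bin in the first half of the spectrum is the 50-cycle component.
peak_bin = int(np.argmax(np.abs(spectrum[: n // 2])))
print(peak_bin)
```

The factor 1/N comes from numpy's default normalization, in which the forward transform is unscaled.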

Nowadays many people in experiment, observation, and cosmology draw their figures with matplotlib, so it is a good idea to master it.

-Basic knowledge of matplotlib that I wanted to know early, or the story of an artist who can adjust the appearance
-Match summary of matplotlib

It is kind to beginners that there are many explanatory articles in Japanese.

How to search while looking at a beautiful picture of the universe

Beautiful pictures are one of the attractions of the universe. It is interesting to find out what observation data has been taken in the past while looking at them. At the moment, I think the following two tools are well done:

-ESAsky: A tool of the European Space Agency (ESA).
-DARTS / JUDO2: A tool of the Japanese space agency (JAXA / ISAS).

Technology is advancing rapidly in this area, so new and better tools will likely keep appearing. Here is a picture of the two for those who find it troublesome to follow links.

qiita_esasky_judo2.png

So how do you use this for your research? As an example, I explain it using the XMM-Newton satellite.

-I tried to explain the data analysis of XMM-Newton satellite in easy Japanese

Searching for celestial bodies and data may look easy, but each observation is unique and must be examined carefully. There are other tools as well, so make good use of the useful ones.

Space catalog research using python

You can now use astroquery to get unified access to various space catalogs from python. Here are some examples and ways to do it.

-Searching the space astronomical catalog using python's astroquery and simple plotting method using galaxies
-How to search using python's astroquery and get fits images with skyview
-How to plot visible light data of galaxy using OpenNGC database in python
-How to plot multiple fits images side by side using python

Astronomical analysis

NASA related software

-ds9: Software dedicated to image analysis. It is the classic tool for image analysis regardless of wavelength. These days you can use it just by downloading it.

-heasoft: Download the required software from NASA's heasoft page. (For space astronomy)

There are two ways to install heasoft: download and compile the source code, or use the pre-built binaries. Try the binaries first; they should work once the path is set. If they do not, try compiling from source.

"Setting the path" means writing lines like the following in a configuration file such as .bashrc. Here is an example from my environment.


export HEADAS=/usr/local/astroh/software/HEASOFT/Hitomi_Release_03_rc1/x86_64-debian-linux-gnu-libc2.19-18+
source $HEADAS/headas-init.sh
## For suzaku analysis 
export CALDB=/usr/local/xray/caldb
export CALDBCONFIG=$CALDB/software/tools/caldb.config
export CALDBALIAS=$CALDB/software/tools/alias_config.fits

In short, set $HEADAS and source the headas-init.sh found there.

Satellite-related tools need the satellite calibration data called CALDB, so download CALDB and set its path as well.

Data format

When you start studying space, you will see FITS (read as "fits") many times. It is the standard file format in astronomy.

The header holds the basic information as plain text, and the data part is stored as binary. For information on binary files, see What are text and binary files? (https://c.keicode.com/lang/text-vs-binary.php).

Outside astronomy, HDF5 is a format that works well with python and is widely used. The fits format has specifications hardened for space astronomy and is rarely used for anything else. To look at a fits file, use heasoft's tool fv or astropy.
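To show what "the header is plain text" means, here is a simplified sketch that builds and parses a tiny FITS-style header using only the python standard library. It relies on the facts that a FITS header is made of 80-character ASCII cards, padded to 2880-byte blocks and terminated by an END card. The helper names are hypothetical and the value formatting is deliberately minimal; for real files use astropy (astropy.io.fits) instead.

```python
def make_card(keyword, value):
    """Build one 80-character FITS header card (simplified formatting)."""
    return f"{keyword:<8}= {value:>20}".ljust(80)


def build_header(cards):
    """Join cards, add END, and pad with spaces to a 2880-byte block."""
    text = "".join(cards) + "END".ljust(80)
    padded_len = -(-len(text) // 2880) * 2880   # round up to a full block
    return text.ljust(padded_len).encode("ascii")


def parse_header(raw):
    """Read keyword/value pairs from the header bytes until the END card."""
    header = {}
    for start in range(0, len(raw), 80):
        card = raw[start:start + 80].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop an optional "/ comment" part (too naive for string values).
            header[keyword] = card[10:].split("/")[0].strip()
    return header


raw = build_header([
    make_card("SIMPLE", "T"),
    make_card("BITPIX", 16),
    make_card("NAXIS", 0),
])
print(len(raw))
print(parse_header(raw))
```

Because the header is ASCII, you can also peek at the first cards of a real fits file with any text tool, while the data that follows needs a FITS-aware reader.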

statistics

You can find various materials in the Repository of Teruyo Enokido's Public Materials by Mr. Enokido (RIKEN). Among them, I recommend reading:

-Statistical analysis of experimental data

The need to take notes electronically

When doing space astronomy analysis, you need to keep a large amount of logs: the many file names involved, and information such as the arguments and messages each time a program is run. These are important experimental data and need to be recorded. It also helps your future self; there will always be times when you need to check what you did, or when you discover that a program that seemed to run was actually throwing an error. This is more information than can be recorded by hand, so log it electronically. There are various tools these days, such as evernote and bear, so customize them to your liking. (Note: this does not mean that you don't need a paper lab notebook.)
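Recording arguments and messages can be automated from inside a program. The sketch below uses python's standard logging module to write time-stamped records to a file; the logger name, the log file location, and the input file name obs_0001.fits are assumptions for illustration.

```python
import logging
import sys
import tempfile
from pathlib import Path


def setup_run_logger(log_path):
    """Return a logger that appends time-stamped records to log_path."""
    logger = logging.getLogger(f"runlog.{log_path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False          # keep records out of the root logger
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# A temporary directory is used here; a real analysis would log next to its data.
log_path = Path(tempfile.mkdtemp()) / "analysis_run.log"
logger = setup_run_logger(log_path)
logger.info("arguments: %s", sys.argv)
logger.info("input file: %s", "obs_0001.fits")   # hypothetical file name
logger.error("example error message")

for handler in logger.handlers:
    handler.flush()
print(log_path.read_text())
```

Each record carries a time stamp and a level, so an error that scrolled past unnoticed on the screen can still be found in the file later.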

Writing a dissertation, searching literature, learning English

Search method

Get in the habit of searching in English once you enter the world of research. If Japanese sources give you enough information, that's fine, but once a topic becomes even a little specialized, far more information is available in English.

Paper writing

Learn tex early. tex is also a language. These days homebrew installs tex in one shot, so that is the easiest route for anyone who can use it.

--tex install (for Mac users)
-[Tex] How to install Tex in Mac by Homebrew
--For windows

Recently, overleaf is used more and more. It is best for collaborative editing online.

There is also beamer, which lets you make slides and notes with tex. If you are wondering what beamer is, please refer to:

-How to use Beamer: Easy way to switch between slides and sentences

Literature search method

ads is the royal road. Almost no paper on the universe is unsearchable here. Once you find one paper, you can pull in related literature one after another, like digging up a string of potatoes, by following the papers it cites and the papers that cite it. There are now more papers than you could read in a lifetime, so train yourself from a young age to read selectively, varying how deeply you read each one.

Another famous one is arxiv, where papers from all over the world appear fastest. Note, however, that papers may be posted before they are accepted, so some will later be rejected. Even accepted papers are not 100% correct. Science requires much trial and verification, and even an article in a good journal often turns out later to be mistaken.

We live in a flood of information: information is easy to get, but it is hard to determine what we really need and whether it is true. Put the other way around, people who can quickly collect the information that is truly needed and judge its truth are needed not only in research but by society. Studying space physics and astronomy is an excellent stage on which to develop that ability.

Studying science English

We recommend grammarly. It tells you why something is wrong, so repeated use will reduce your errors. The number of grammatical mistakes your teacher has to point out should drop to almost zero.

How to write a scientific paper

There are various schools, but if you are new to this, please refer to Professor Makishima's materials.

-How to write Japanese: To everyone in M2
-[How to write in English: Check Manual for Science English (6th Edition)](http://energetic-universe.phys.su-tokyo.ac.jp/wordpress/wp-content/uploads/2018/05/Makishima_ENG.pdf)

Writing style and conventions generally differ by field, so follow the style of the journal to which you are submitting. I think the most important thing in a paper is to convey one new thing as its core. "New" does not have to mean something that surprises people all over the world. Any step, however small, that no one has yet taken is a valuable scientific activity. Your goal becomes clearer when you are conscious of turning your work into a paper, so make a habit of always keeping that in mind.

PC operation, settings

How to copy the display

If you want your PC screen and the projector to show the same thing, turn on mirroring. If you want them to differ, turn off mirroring. http://tokyo.secret.jp/macs/macbook-mirroring-option.html

There are two ways to send a PC's screen to a display: analog and digital. Recently, HDMI cables (digital) are the mainstream. Note that an HDMI cable carries both video and audio, so if your computer's audio output is set to HDMI, you may find that no sound comes out where you expected it.

Tips for not breaking your laptop

--Stop clicking by pressing the trackpad deeply

Clicking by pressing the trackpad deeply causes mechanical wear. Mac users can go to System Preferences ==> Trackpad and check "Tap to click" (tap with one finger). Now you can click with a soft touch. (If you care even more, you can use an external keyboard, mouse, and trackpad, but then it need not be a laptop.)

--Do not leave the power on for a long time.

From the viewpoint of the power supply's lifetime, there are various theories about whether it is better to switch the power on and off frequently or as rarely as possible. However, when the machine is obviously slow, it is often wasting power simply because hung apps are not freeing memory, so I personally turn it off about once a week and recommend doing the same.

About installation tools (for mac users)

There are three package managers for mac: fink, macport, and homebrew. If you are setting up a PC from now on, I definitely recommend homebrew; I find the other two less well supported now.

There is nothing wrong with homebrew, but be careful to use only one of them. If they are mixed, running one may rewrite configuration files used by another, a trap that people often fall into.

Password search method (for mac users)

Mac users have a tool called keychain, an OS feature that manages passwords so that you need neither remember them nor keep them in a text file. Type keychain in spotlight (the magnifying glass mark at the upper right). http://www.danshihack.com/2014/01/05/junp/mactips_keychain.html (Applications -> Utilities -> Keychain Access also works, but spotlight is faster.)

Screenshot (for mac users)

--Press "Command + shift + 4" all at the same time. https://support.apple.com/ja-jp/HT201361

--Follow the selection screen with "Command + shift + 5"

The behavior depends on the version of macOS, so please refer to the description for your OS in the official explanation.

Tools for advanced users

-xraylib: An excellent library that easily outputs X-ray photoelectric absorption cross sections and fluorescence yields.
-uproot: A handy tool that makes it easy to use root from python.

Shortcut settings

Let's practice using some typical shortcuts.

--"Command + q": Exit application
--"Command + tab": Switch applications
-Shortcut to switch windows with the same Mac application

How to use zsh (bonus)

Here's a quick introduction to what's good about zsh. It is newer than bash and tcsh, so it has many nice features. If you are wondering "what is a shell?", please read shell.

--File search is instant

For example, if you can't remember where you saved a file but know that its extension is cc, type

zsh


> ls **/*.cc

and zsh will look for *.cc files under the current directory.

--History search for each command is possible, and options can be remembered immediately.

For example, when you can't remember what options you put after rsync, type

zsh


> rsync

and press ctl + p. zsh will step through your history of commands that began with rsync.

In addition, please refer to http://blog.blueblack.net/item_204.

For an easy initial setup, put the following in .zshrc.

zsh


HISTFILE=$HOME/.zsh-history # used for storing commands
HISTSIZE=100000
SAVEHIST=100000
setopt extended_history # add a time stamp to each history entry
function history-all { history -E 1 } # print the entire history with history-all

autoload -U compinit
compinit

autoload colors
colors
PROMPT="%{${fg[cyan]}%}[%n] %(!.#.$) %{${reset_color}%}"
#PROMPT="%{${fg[cyan]}%}[%n@%m] %(!.#.$) %{${reset_color}%}"
PROMPT2="%{${fg[cyan]}%}%_> %{${reset_color}%}"
SPROMPT="%{${fg[red]}%}correct: %R -> %r [nyae]? %{${reset_color}%}"
RPROMPT="%{${fg[cyan]}%}[%~]%{${reset_color}%}"

#setopt list_types
setopt auto_list
setopt autopushd
setopt pushd_ignore_dups
setopt auto_menu

# search history with ctl+p (copied from Yuasa)
autoload history-search-end
zle -N history-beginning-search-backward-end history-search-end
zle -N history-beginning-search-forward-end history-search-end
bindkey "^P" history-beginning-search-backward-end
bindkey "^N" history-beginning-search-forward-end
setopt list_packed
zstyle ':completion:*:default' menu select=1

With these settings in place, you can use the features above and more.
