[PYTHON] I want to know the population of each country in the world.

Target

Screen Shot 2020-05-09 at 0.11.02.png

This is the story of what to do when you want a table with the year, region, sub-region, country, country code, and population.

LocID,Location,Time,PopTotal,SubRegName,GeoRegName
392,Japan,2019,126860299,Eastern Asia,Asia

Conclusion

You can download the required data from https://population.un.org/wpp/.

The contents of WPP2019_TotalPopulationBySex.csv look like this.

LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
...
392,Japan,2,Medium,1950,1950.5,40602.499,42199.585,82802.084,227.132
392,Japan,2,Medium,1951,1951.5,41380.556,42935.709,84316.265,231.285

This LocID and Location items include region names such as Asia and East Asia in addition to country names, so they cannot be used as they are. Refer to WPP2019_F01_LOCATIONS.XLSX to align only to countries except region names. In this file

There are three sheets, of which the DB is just machine readable. For example, the item of Japan is

Index, Location, Notes, LocID, ISO3_Code, LocType, LocTypeName, ParentID, WorldID, SubRegID, SubRegName, SDGSubRegID, SDGSubRegName, SDGRegID, SDGRegName, GeoRegID, GeoRegName
133, Japan, 392, JPN, 4, Country/Area, 906, 900, 906, Eastern Asia, 1832, Eastern and South-Eastern Asia, 935, Asia

is. So you can see the following.

Create data with Jupyter

(Notebook link: https://colab.research.google.com/drive/160xZ5tAGKb1enC0LU2JYEOA6m3l3w1cn?usp=sharing)

After investigating up to this point, the work is finally started. First, load WPP2019_TotalPopulationBySex.csv.

import pandas as pd

population_src = pd.read_csv("WPP2019_TotalPopulationBySex.csv")
population_src.head()
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity
0 4 Afghanistan 2 Medium 1950 1950.5 4099.243 3652.874 7752.117 11.874
1 4 Afghanistan 2 Medium 1951 1951.5 4134.756 3705.395 7840.151 12.009
2 4 Afghanistan 2 Medium 1952 1952.5 4174.450 3761.546 7935.996 12.156
3 4 Afghanistan 2 Medium 1953 1953.5 4218.336 3821.348 8039.684 12.315
4 4 Afghanistan 2 Medium 1954 1954.5 4266.484 3884.832 8151.316 12.486

Extract only the necessary information.

population = population_src[population_src.Variant == "Medium"][["LocID", "Location", "Time", "PopTotal"]]
population["PopTotal"] = (population["PopTotal"] * 1000).astype(int)
population
LocID Location Time PopTotal
0 4 Afghanistan 1950 7752116
1 4 Afghanistan 1951 7840151
2 4 Afghanistan 1952 7935996
3 4 Afghanistan 1953 8039684
4 4 Afghanistan 1954 8151316

Loading WPP2019_F01_LOCATIONS.XLSX.

locations_src = pd.read_excel('WPP2019_F01_LOCATIONS.XLSX', sheet_name="DB")
locations_src.head()
Index Location Notes LocID ISO3_Code LocType LocTypeName ParentID WorldID SubRegID SubRegName SDGSubRegID SDGSubRegName SDGRegID SDGRegName GeoRegID GeoRegName MoreDev LessDev LeastDev oLessDev LessDev_ExcludingChina LLDC SIDS WB_HIC WB_MIC WB_UMIC WB_LMIC WB_LIC WB_NoIncomeGroup MaxHIV_Male MaxHIV_Female MaxHIV_BothSexes YearMaxHIV_BothSexes HIVAIDSMortalityImpact_AgePattern HIVAIDSMortalityImpact_e0 TotPop2019LessThan90k
0 1 WORLD NaN 900 NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2 UN development groups a 1803 NaN 25.0 Label/Separator 900 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 3 More developed regions b 901 NaN 5.0 Development group 1803 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 4 Less developed regions c 902 NaN 5.0 Development group 1803 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5 Least developed countries d 941 NaN 5.0 Development group 902 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Extract only the necessary information.

location = locations_src[locations_src.LocType == 4][["LocID", "SubRegName", "GeoRegName"]]
location.head()
LocID SubRegName GeoRegName
26 108 Eastern Africa Africa
27 174 Eastern Africa Africa
28 262 Eastern Africa Africa
29 232 Eastern Africa Africa
30 231 Eastern Africa Africa

Combine population and country data.

population_by_countries = population.merge(location)
population_by_countries.head()
LocID Location Time PopTotal SubRegName GeoRegName
0 4 Afghanistan 1950 7752116 Southern Asia Asia
1 4 Afghanistan 1951 7840151 Southern Asia Asia
2 4 Afghanistan 1952 7935996 Southern Asia Asia
3 4 Afghanistan 1953 8039684 Southern Asia Asia
4 4 Afghanistan 1954 8151316 Southern Asia Asia

Let's look for Japanese data.

population_by_countries[(population_by_countries.Location == "Japan") & (population_by_countries.Time == 2019)]
LocID Location Time PopTotal SubRegName GeoRegName
16377 392 Japan 2019 126860299 Eastern Asia Asia

It looks good so I will save it.

population_by_countries.to_csv("population_by_countries.csv", index=False)
!head population_by_countries.csv
LocID,Location,Time,PopTotal,SubRegName,GeoRegName
4,Afghanistan,1950,7752116,Southern Asia,Asia
4,Afghanistan,1951,7840151,Southern Asia,Asia
4,Afghanistan,1952,7935996,Southern Asia,Asia
4,Afghanistan,1953,8039684,Southern Asia,Asia
4,Afghanistan,1954,8151316,Southern Asia,Asia
4,Afghanistan,1955,8270992,Southern Asia,Asia
4,Afghanistan,1956,8398873,Southern Asia,Asia
4,Afghanistan,1957,8535157,Southern Asia,Asia
4,Afghanistan,1958,8680097,Southern Asia,Asia

Bonus hardship story

So, I'll record here that I've spent a lot of time just to get the mundane data of the world population.

Recommended Posts

I want to know the population of each country in the world.
I want to know the legend of the IT technology world
I want to know the features of Python and pip
[Linux] I want to know the date when the user logged in
I want to customize the appearance of zabbix
I want to display the progress in Python!
I want to batch convert the result of "string" .split () in Python
I want to explain the abstract class (ABCmeta) of Python in detail.
I want to leave an arbitrary command in the command history of Shell
[Python] I want to know the variables in the function when an error occurs!
I want to use Python in the environment of pyenv + pipenv on Windows 10
I want to fully understand the basics of Bokeh
I want to set a life cycle in the task definition of ECS
I want to see a list of WebDAV files in the Requests module
I want to write in Python! (3) Utilize the mock
I want to store the result of% time, %% time, etc. in an object (variable)
I want to use the R dataset in python
I want to increase the security of ssh connections
I just want to find the 95% confidence interval for the difference in population ratios in Python
I used Python to find out about the role choices of the 51 "Yachts" in the world.
I want to use only the normalization process of SudachiPy
I want to get the operation information of yahoo route
I want to judge the authenticity of the elements of numpy array
I want to make the Dictionary type in the List unique
I want to align the significant figures in the Numpy array
I didn't want to write the AWS key in the program
Programming to fight in the world ~ 5-1
Programming to fight in the world 5-3
Programming to fight in the world ~ 5-2
[For beginners] I want to explain the number of learning times in an easy-to-understand manner.
I was in charge of maintaining the Fabric script, but I don't know.> <To those who
[Natural language processing] I tried to visualize the remarks of each member in the Slack community
I want to get the name of the function / method being executed
I want to manually assign the training parameters of the [Pytorch] model
How to know the internal structure of an object in Python
I want to read the html version of "OpenCV-Python Tutorials" OpenCV 3.1 version
I want to output the beginning of the next month with Python
I wrote the code to write the code of Brainf * ck in python
LINEbot development, I want to check the operation in the local environment
I want to make the second line the column name in pandas
I want to pass the G test in one month Day 1
I want to check the position of my face with OpenCV!
I used the worldcup command to check the outcome of the World Cup.
I want to pin Spyder to the taskbar
I want to output to the console coolly
I want to print in a comprehension
I want to handle the rhyme part1
I want to know how LINUX works!
I want to handle the rhyme part3
I didn't know the basics of Python
I want to display the progress bar
I want to embed Matplotlib in PySimpleGUI
I want to handle the rhyme part2
I want to handle the rhyme part5
I want to handle the rhyme part4
I want to change the color by clicking the scatter point in matplotlib
[Note] I want to completely preprocess the data of the Titanic issue-Age version-
I don't want to admit it ... The dynamical representation of Neural Networks
[C language] I want to generate random numbers in the specified range
Python Note: When you want to know the attributes of an object
(Python Selenium) I want to check the settings of the download destination of WebDriver