Dies ist die Geschichte, was zu tun ist, wenn Sie eine Tabelle mit Jahr, Region, Unterregion, Land, Ländercode und Bevölkerung wünschen.
LocID,Location,Time,PopTotal,SubRegName,GeoRegName
392,Japan,2019,126860299,Eastern Asia,Asia
Sie können die erforderlichen Daten von https://population.un.org/wpp/ herunterladen.
Der Inhalt von WPP2019_TotalPopulationBySex.csv sieht folgendermaßen aus.
LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
...
392,Japan,2,Medium,1950,1950.5,40602.499,42199.585,82802.084,227.132
392,Japan,2,Medium,1951,1951.5,41380.556,42935.709,84316.265,231.285
Zusätzlich zum Ländernamen enthalten die Elemente LocID und Location auch Regionsnamen wie Asien und Ostasien, sodass sie nicht so verwendet werden können, wie sie sind. Informationen zum Ausrichten nur für Länder mit Ausnahme von Regionsnamen finden Sie unter WPP2019_F01_LOCATIONS.XLSX. In dieser Datei
Es gibt drei Blätter, von denen die Datenbank nur maschinenlesbar ist. Zum Beispiel der Artikel von Japan
Index, Location, Notes, LocID, ISO3_Code, LocType, LocTypeName, ParentID, WorldID, SubRegID, SubRegName, SDGSubRegID, SDGSubRegName, SDGRegID, SDGRegName, GeoRegID, GeoRegName
133, Japan, 392, JPN, 4, Country/Area, 906, 900, 906, Eastern Asia, 1832, Eastern and South-Eastern Asia, 935, Asia
ist. Sie können also Folgendes sehen.
(Notizbuch-Link: https://colab.research.google.com/drive/160xZ5tAGKb1enC0LU2JYEOA6m3l3w1cn?usp=sharing)
Nach einer Untersuchung bis zu diesem Punkt wird endlich mit der Arbeit begonnen. Laden Sie zunächst WPP2019_TotalPopulationBySex.csv.
import pandas as pd
population_src = pd.read_csv("WPP2019_TotalPopulationBySex.csv")
population_src.head()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | Afghanistan | 2 | Medium | 1950 | 1950.5 | 4099.243 | 3652.874 | 7752.117 | 11.874 |
1 | 4 | Afghanistan | 2 | Medium | 1951 | 1951.5 | 4134.756 | 3705.395 | 7840.151 | 12.009 |
2 | 4 | Afghanistan | 2 | Medium | 1952 | 1952.5 | 4174.450 | 3761.546 | 7935.996 | 12.156 |
3 | 4 | Afghanistan | 2 | Medium | 1953 | 1953.5 | 4218.336 | 3821.348 | 8039.684 | 12.315 |
4 | 4 | Afghanistan | 2 | Medium | 1954 | 1954.5 | 4266.484 | 3884.832 | 8151.316 | 12.486 |
Extrahieren Sie nur die erforderlichen Informationen.
population = population_src[population_src.Variant == "Medium"][["LocID", "Location", "Time", "PopTotal"]]
population["PopTotal"] = (population["PopTotal"] * 1000).astype(int)
population
LocID | Location | Time | PopTotal | |
---|---|---|---|---|
0 | 4 | Afghanistan | 1950 | 7752116 |
1 | 4 | Afghanistan | 1951 | 7840151 |
2 | 4 | Afghanistan | 1952 | 7935996 |
3 | 4 | Afghanistan | 1953 | 8039684 |
4 | 4 | Afghanistan | 1954 | 8151316 |
Laden von WPP2019_F01_LOCATIONS.XLSX.
locations_src = pd.read_excel('WPP2019_F01_LOCATIONS.XLSX', sheet_name="DB")
locations_src.head()
Index | Location | Notes | LocID | ISO3_Code | LocType | LocTypeName | ParentID | WorldID | SubRegID | SubRegName | SDGSubRegID | SDGSubRegName | SDGRegID | SDGRegName | GeoRegID | GeoRegName | MoreDev | LessDev | LeastDev | oLessDev | LessDev_ExcludingChina | LLDC | SIDS | WB_HIC | WB_MIC | WB_UMIC | WB_LMIC | WB_LIC | WB_NoIncomeGroup | MaxHIV_Male | MaxHIV_Female | MaxHIV_BothSexes | YearMaxHIV_BothSexes | HIVAIDSMortalityImpact_AgePattern | HIVAIDSMortalityImpact_e0 | TotPop2019LessThan90k | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | WORLD | NaN | 900 | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 2 | UN development groups | a | 1803 | NaN | 25.0 | Label/Separator | 900 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 3 | More developed regions | b | 901 | NaN | 5.0 | Development group | 1803 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Less developed regions | c | 902 | NaN | 5.0 | Development group | 1803 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Least developed countries | d | 941 | NaN | 5.0 | Development group | 902 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Extrahieren Sie nur die erforderlichen Informationen.
location = locations_src[locations_src.LocType == 4][["LocID", "SubRegName", "GeoRegName"]]
location.head()
LocID | SubRegName | GeoRegName | |
---|---|---|---|
26 | 108 | Eastern Africa | Africa |
27 | 174 | Eastern Africa | Africa |
28 | 262 | Eastern Africa | Africa |
29 | 232 | Eastern Africa | Africa |
30 | 231 | Eastern Africa | Africa |
Kombinieren Sie Bevölkerungs- und Länderdaten.
population_by_countries = population.merge(location)
population_by_countries.head()
LocID | Location | Time | PopTotal | SubRegName | GeoRegName | |
---|---|---|---|---|---|---|
0 | 4 | Afghanistan | 1950 | 7752116 | Southern Asia | Asia |
1 | 4 | Afghanistan | 1951 | 7840151 | Southern Asia | Asia |
2 | 4 | Afghanistan | 1952 | 7935996 | Southern Asia | Asia |
3 | 4 | Afghanistan | 1953 | 8039684 | Southern Asia | Asia |
4 | 4 | Afghanistan | 1954 | 8151316 | Southern Asia | Asia |
Lassen Sie uns nach japanischen Daten suchen.
population_by_countries[(population_by_countries.Location == "Japan") & (population_by_countries.Time == 2019)]
LocID | Location | Time | PopTotal | SubRegName | GeoRegName | |
---|---|---|---|---|---|---|
16377 | 392 | Japan | 2019 | 126860299 | Eastern Asia | Asia |
Es sieht gut aus, also werde ich es speichern.
population_by_countries.to_csv("population_by_countries.csv", index=False)
!head population_by_countries.csv
LocID,Location,Time,PopTotal,SubRegName,GeoRegName
4,Afghanistan,1950,7752116,Southern Asia,Asia
4,Afghanistan,1951,7840151,Southern Asia,Asia
4,Afghanistan,1952,7935996,Southern Asia,Asia
4,Afghanistan,1953,8039684,Southern Asia,Asia
4,Afghanistan,1954,8151316,Southern Asia,Asia
4,Afghanistan,1955,8270992,Southern Asia,Asia
4,Afghanistan,1956,8398873,Southern Asia,Asia
4,Afghanistan,1957,8535157,Southern Asia,Asia
4,Afghanistan,1958,8680097,Southern Asia,Asia
Ich werde hier aufzeichnen, dass ich viel Zeit damit verbracht habe, die weltlichen Daten der Weltbevölkerung zu erhalten.
Recommended Posts