[PYTHON] Extract Pokemon GO Pokemon data and skill data

This is a personal story, but I started playing Pokemon GO again around January of last year. I have kept playing for a year since then, and in that time the game received a big addition: the long-awaited online battles. I was not interested at first, but they turned out to be unexpectedly fun when I tried them, and I wanted to work out simulations and efficient battle strategies myself. That raises the question of how to collect the Pokemon data and skill (move) data, which is what this article is about.

Overview

1. Examination of data collection method

The first method I came up with was collecting the data from websites. Fan sites, both overseas and in Japan, publish all of the stats and move data in Pokemon GO. However, those pages are not intended for secondary use, so the data would have to be scraped from the web. That left me with two proposals.

Proposal 1. Collect from websites
This means scraping the pages published on strategy guide sites.
Proposal 2. Extract from in-game data
This is what people usually call "data mining". I thought that if the decoded game data could be obtained, it could be used directly.

2. Adopt "extract from in-game data"

Putting my search skills (that is, Google) to work, I came across a repository that publishes the decoded data: [pokemongo-dev-contrib/pokemongo-game-master](https://github.com/pokemongo-dev-contrib/pokemongo-game-master). I thought that settled the matter, but **the data contained far too many extra things**. In addition to Pokemon and move data, it also includes player costumes and various setting values.

3. Find the part you need from the in-game data

The file is named **GAME_MASTER.json**. As mentioned above, it holds a large amount of Pokemon GO configuration data: costume data, PvP battle settings, Pokemon data, move energy, and so on. As a result it is a large file (3.4 MB), and most of it is not needed here. The first step is to extract only the necessary data from it.

Because the file mixes many kinds of data, start by looking through it to find the part that holds the Pokemon data.
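One way to get a first overview is to count which keys appear next to "templateId" across all entries. The following is only a minimal sketch under that assumption: the helper name count_setting_keys is made up for illustration, and the tiny sample stands in for the real file, which would be loaded with json.load().

```python
import json
from collections import Counter


def count_setting_keys(game_master: dict) -> Counter:
    """Count which keys (other than templateId) appear across all templates."""
    counts = Counter()
    for template in game_master.get("itemTemplates", []):
        for key in template:
            if key != "templateId":
                counts[key] += 1
    return counts


# Tiny stand-in for GAME_MASTER.json (hypothetical, heavily trimmed entries).
sample = json.loads("""
{
  "itemTemplates": [
    {"templateId": "V0001_POKEMON_BULBASAUR", "pokemonSettings": {"pokemonId": "BULBASAUR"}},
    {"templateId": "V0808_POKEMON_MELTAN", "pokemonSettings": {"pokemonId": "MELTAN"}},
    {"templateId": "COMBAT_V0250_MOVE_VOLT_SWITCH_FAST", "combatMove": {"uniqueId": "VOLT_SWITCH_FAST"}}
  ]
}
""")

# Print the keys from most to least frequent.
for key, count in count_setting_keys(sample).most_common():
    print(key, count)
```

On the real file, a listing like this should make it easier to see which kinds of settings exist before deciding what to extract.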

Partial, reformatted excerpt of Pokemon data from GAME_MASTER.json


{
  "itemTemplates": [{
    "templateId": "V0001_POKEMON_BULBASAUR",
    "pokemonSettings": {
      "pokemonId": "BULBASAUR",
      "type": "POKEMON_TYPE_GRASS",
      "type2": "POKEMON_TYPE_POISON",
      "stats": {
        "baseStamina": 128,
        "baseAttack": 118,
        "baseDefense": 111
      },
      "quickMoves": ["VINE_WHIP_FAST", "TACKLE_FAST"],
      "cinematicMoves": ["SLUDGE_BOMB", "SEED_BOMB", "POWER_WHIP"]
    }
  }]
}

There it is. The excerpt shows only the parts I considered important this time; each entry contains various other data as well. I will not go through all of it, but the key names and values suggest the following meanings.

| Key | Data content |
| --- | --- |
| templateId | Pokemon data identifier |
| pokemonId | Pokemon name |
| stats | Base stats |
| type | Type |
| quickMoves | Fast moves (move 1) |
| cinematicMoves | Charged moves (move 2) |

Note that this data does not seem to include limited-time moves.

4. Programmatic data extraction

At this point you probably know 100 times more about this than I do, but I will post my program anyway. The fields are separated by commas so that the output can be used as CSV. Writing out the move lists as they are would nest them inside a field, so I forcibly appended "::" to each move as a delimiter.

GAME_MASTER_pokemon_parser.py


# -*- coding: utf-8 -*-

import json
import re

# Regular expression that matches the templateId of Pokemon entries
# (raw string, so that \d is not treated as a string escape)
pattern = re.compile(r'^V0\d+_POKEMON_.+')

# Open the file in a with block so that it is closed after loading
with open('GAME_MASTER.json', 'r', encoding='utf-8') as f:
    json_dict = json.load(f)

# Everything sits under the top-level key "itemTemplates", so iterate over that list first.
for template in json_dict["itemTemplates"]:
    template_id = template["templateId"]

    # Skip every entry that is not Pokemon data
    if not pattern.match(template_id):
        continue

    settings = template["pokemonSettings"]
    stats = settings["stats"]

    # Some Pokemon have no move data (Smeargle had none), so read the lists with get().
    # Each move is followed by "::" so that a whole move list fits in one CSV column.
    quick_moves_str = "".join(move + "::" for move in settings.get("quickMoves") or [])
    cinematic_moves_str = "".join(move + "::" for move in settings.get("cinematicMoves") or [])

    out_pokemon = ",".join([
        template_id,
        str(settings.get("type")),
        str(settings.get("type2")),
        str(stats["baseStamina"]),
        str(stats["baseAttack"]),
        str(stats["baseDefense"]),
        quick_moves_str,
        cinematic_moves_str,
    ])
    print(out_pokemon)

When executed, the script writes lines in the following format to standard output: **Pokemon data identifier, type 1, type 2, base stamina, base attack, base defense, fast moves, charged moves**

Execution result.csv (partial excerpt)


V0808_POKEMON_MELTAN,POKEMON_TYPE_STEEL,None,130,118,99,THUNDER_SHOCK_FAST::,FLASH_CANNON::THUNDERBOLT::
V0809_POKEMON_MELMETAL,POKEMON_TYPE_STEEL,None,264,226,190,THUNDER_SHOCK_FAST::,FLASH_CANNON::THUNDERBOLT::HYPER_BEAM::ROCK_SLIDE::SUPER_POWER::
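To use these lines in a simulation, they have to be read back and the "::" fields split into lists again. The following is a minimal sketch of that step, assuming the column order shown above; the function name parse_rows and the column names in COLUMNS are made up for illustration.

```python
import csv
import io

# Hypothetical column names, in the order the script prints the fields.
COLUMNS = ["template_id", "type1", "type2", "stamina", "attack", "defense",
           "fast_moves", "charged_moves"]


def parse_rows(text: str) -> list:
    """Turn the CSV output back into a list of dictionaries."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        record = dict(zip(COLUMNS, row))
        # A missing type2 was printed as the string "None" by str().
        if record["type2"] == "None":
            record["type2"] = None
        for stat in ("stamina", "attack", "defense"):
            record[stat] = int(record[stat])
        # Every move is followed by "::", so drop the empty piece after the last one.
        for column in ("fast_moves", "charged_moves"):
            record[column] = [move for move in record[column].split("::") if move]
        records.append(record)
    return records


sample = (
    "V0808_POKEMON_MELTAN,POKEMON_TYPE_STEEL,None,130,118,99,"
    "THUNDER_SHOCK_FAST::,FLASH_CANNON::THUNDERBOLT::\n"
)
meltan = parse_rows(sample)[0]
print(meltan["fast_moves"], meltan["charged_moves"])
```

Because no move name contains a comma, the standard csv module can read the lines without any quoting.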

5. Remaining tasks

First of all, I was able to pull out the Pokemon data as raw data. However, this data still has the following problems:

(1) No Japanese names
(2) No Pokedex number
(3) Alolan, Shadow, Purified, and Normal (?) forms need sorting out
(4) Limited-time moves are not included

It seems that (2) to (4) can be processed mechanically. For (1), I wonder whether I could prepare data that maps Pokedex numbers to Japanese names. But then what should I do about the move data?
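For (2) and (3), one possible approach is to read the Pokedex number and the form straight out of templateId. The following is only a sketch: it assumes that the digits after "V" are the Pokedex number and that forms appear as a trailing suffix. The suffix list FORM_SUFFIXES, the function name split_template_id, and the example id "V0019_POKEMON_RATTATA_ALOLA" are assumptions to check against the actual file.

```python
import re

# Assumed form suffixes; verify them against the templateId values in the real file.
FORM_SUFFIXES = ("ALOLA", "SHADOW", "PURIFIED", "NORMAL")

TEMPLATE_ID = re.compile(r"^V(\d+)_POKEMON_(.+)$")


def split_template_id(template_id: str):
    """Return (pokedex_number, name, form), or None if the id is not a Pokemon entry."""
    match = TEMPLATE_ID.match(template_id)
    if match is None:
        return None
    number = int(match.group(1))
    name = match.group(2)
    form = None
    # Only strip a suffix from the known list, because some names contain underscores.
    for suffix in FORM_SUFFIXES:
        if name.endswith("_" + suffix):
            name = name[: -len(suffix) - 1]
            form = suffix
            break
    return number, name, form


print(split_template_id("V0808_POKEMON_MELTAN"))         # (808, 'MELTAN', None)
print(split_template_id("V0019_POKEMON_RATTATA_ALOLA"))  # (19, 'RATTATA', 'ALOLA')
```

With the number available as a key, a separate table of Japanese names could then be joined on it for (1).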

6. Postscript

Back when the API could be accessed directly, there were plenty of tech articles about Pokemon scanners, IV checkers, GO Plus clones, and so on. With the end of the boom and the countermeasures taken by the operating company, they died down all at once... I am simply grateful to the users who kept playing and analyzing the game.

Bonus

The move data used in PvP looks like the following. The most troublesome key is **"durationTurns"**. It represents the cooldown time [s] of a fast move during PvP, that is, how long you are locked into the move, and the relationship seems to be cooldown time [s] = 1 + durationTurns. When the cooldown time is 1 [s], the key is simply absent. **VOLT_SWITCH has a 4-second cooldown and DRAGON_BREATH has a 1-second cooldown.**

    "templateId": "COMBAT_V0250_MOVE_VOLT_SWITCH_FAST",
    "combatMove": {
      "uniqueId": "VOLT_SWITCH_FAST",
      "type": "POKEMON_TYPE_ELECTRIC",
      "power": 12.0,
      "vfxName": "volt_switch_fast",
      "durationTurns": 3,
      "energyDelta": 16
    }

    "templateId": "COMBAT_V0204_MOVE_DRAGON_BREATH_FAST",
    "combatMove": {
      "uniqueId": "DRAGON_BREATH_FAST",
      "type": "POKEMON_TYPE_DRAGON",
      "power": 4.0,
      "vfxName": "dragon_breath_fast",
      "energyDelta": 3
    }
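The rule described above, including the case where the key is missing, can be sketched as follows. This is only an illustration of the relationship estimated in this article (cooldown = 1 + durationTurns), not a confirmed specification; the function name fast_move_cooldown is made up.

```python
def fast_move_cooldown(combat_move: dict) -> int:
    """Apply the rule estimated above: cooldown = 1 + durationTurns.

    A missing "durationTurns" key is treated as 0.
    """
    return 1 + combat_move.get("durationTurns", 0)


# The two entries shown above, reduced to the keys that matter here.
volt_switch = {"uniqueId": "VOLT_SWITCH_FAST", "durationTurns": 3, "energyDelta": 16}
dragon_breath = {"uniqueId": "DRAGON_BREATH_FAST", "energyDelta": 3}

for move in (volt_switch, dragon_breath):
    print(move["uniqueId"], fast_move_cooldown(move))
```

Reading the key with get() and a default of 0 avoids a KeyError for moves such as DRAGON_BREATH_FAST that do not carry the key at all.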
