[PYTHON] Download XBRL file from EDINET (personal memo)

I read the book listed under Reference material and wrote this in Python. Initially I planned to extract the line items and figures of securities reports from the XBRL files, but I gave up for now because it seemed difficult without first reading and understanding the "Guidelines for creating taxonomies by submitters" PDF. I would like to continue once other priorities are cleared.

Reference material

The book above describes XBRL in detail, so please have a look at it. I think it is a very good book because it covers not only XBRL but also practical know-how and time-series analysis for effective data analysis using R.

Regarding acquisition of the XBRL files, the book gives the API URL as "http://resource.ufocatcher.com/atom/edinetx/query/", but the current API URL is "http://resource.ufocatch.com/atom/edinetx/query/". The service appears to be called "Yuho Catcher" (securities-report catcher), so it may have been ufocatcher before.
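The feed returned by this endpoint is standard Atom, so entries can be picked out with ElementTree and the Atom namespace, as the code below does. Here is a minimal offline sketch against a hand-made dummy feed (the ids, titles, and URLs are invented for illustration; Python 3 syntax):

```python
# coding: utf-8
import xml.etree.ElementTree as ET

# A tiny hand-made Atom feed standing in for the real API response.
# All id/title/href values below are invented for illustration only.
SAMPLE_FEED = '''<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>ED2014xxxxxx</id>
    <title>Securities report - Example Co., Ltd.</title>
    <link type="application/zip" href="http://example.com/ED2014xxxxxx.zip"/>
  </entry>
</feed>'''

# Atom elements are namespace-qualified, so every tag lookup needs this prefix.
NS = '{http://www.w3.org/2005/Atom}'

def parse_entries(feed_text):
    tree = ET.fromstring(feed_text)
    entries = {}
    for el in tree.findall('.//' + NS + 'entry'):
        _id = el.find(NS + 'id').text
        title = el.find(NS + 'title').text
        # select only the link that points at the zip archive
        link = el.find('./' + NS + 'link[@type="application/zip"]')
        entries[_id] = {'id': _id, 'title': title, 'url': link.attrib['href']}
    return entries

entries = parse_entries(SAMPLE_FEED)
print(entries['ED2014xxxxxx']['url'])
```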

Reference code

It's not particularly interesting because it just downloads the file, but I'll make a note of it.

python


# coding: utf-8

import requests
import xml.etree.ElementTree as ET
from collections import defaultdict
import json
import os
from zipfile import ZipFile
from StringIO import StringIO

def get_link_info_str(ticker_symbol, base_url):
    url = base_url+ticker_symbol
    response = requests.get(url)
    return response.text
    
def get_link(tree, namespace):
    #print ET.tostring(tree)
    yuho_dict = defaultdict(dict)
    for el in tree.findall('.//'+namespace+'entry'):
        title = el.find(namespace+'title').text
        if not is_yuho(title): continue
        print 'writing:',title[:30],'...'
        _id = el.find(namespace+'id').text
        link = el.find('./'+namespace+'link[@type="application/zip"]')
        url = link.attrib['href']
        yuho_dict[_id] = {'id':_id,'title':title,'url':url}
    return yuho_dict
    
def is_yuho(title):
    if u'Securities report' in unicode(title):
        return True
    else:
        return False
    
def write_download_info(ofname):
    # note: reads the module-level dat_download defined in __main__
    with open(ofname,'w') as of:
        json.dump(dat_download, of, indent=4)
    
def download_all_xbrl_files(download_info_dict,directory_path):    
    for ticker_symbol, info_dicts in download_info_dict.items():
        save_path = directory_path+ticker_symbol
        if not os.path.exists(save_path):
            os.mkdir(save_path)
            
        for _id, info_dict in info_dicts.items():
            _download_xbrl_file(info_dict['url'],_id,save_path)
    
def _download_xbrl_file(url,_id,save_path):
    r = requests.get(url)
    if r.ok:
        path = save_path+'/'+_id
        if not os.path.exists(path):
            os.mkdir(path)
        # the response body is a zip archive: unzip it in memory and save files to path
        z = ZipFile(StringIO(r.content))
        z.extractall(path)
    
if __name__=='__main__':
    base_url = 'http://resource.ufocatch.com/atom/edinetx/query/'
    namespace = '{http://www.w3.org/2005/Atom}'
    t_symbols = ('1301','2432',)
    
    for t_symbol in t_symbols:
        response_string = get_link_info_str(t_symbol, base_url)
        ET_tree = ET.fromstring( response_string )
        ET.register_namespace('',namespace[1:-1])
        
        dat_download = defaultdict(dict)
        # get download file info
        info_dict = get_link(ET_tree,namespace)
        dat_download[t_symbol] = info_dict
        
        ofname = os.getcwd()+'/downloaded_info/dat_download_'+t_symbol+'.json'
        if not os.path.exists(os.path.dirname(ofname)):
            os.mkdir(os.path.dirname(ofname))  # create the output dir on first run
        write_download_info(ofname)
        
        directory_path = os.getcwd()+'/xbrl_files/'
        if not os.path.exists(directory_path):
            os.mkdir(directory_path)  # create the save dir on first run
        download_all_xbrl_files(dat_download,directory_path)
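For reference, the JSON that write_download_info produces has a ticker → document-id → metadata shape. A small sketch of that structure and a round trip through json (the ids and URLs below are invented for illustration):

```python
# coding: utf-8
import json

# Invented example of the structure written to dat_download_<ticker>.json:
# ticker symbol -> document id -> {'id', 'title', 'url'}
dat_download = {
    '1301': {
        'ED2014xxxxxx': {
            'id': 'ED2014xxxxxx',
            'title': 'Securities report - Example Co., Ltd.',
            'url': 'http://example.com/ED2014xxxxxx.zip',
        }
    }
}

# serialize the same way the memo does (indent=4), then read it back
text = json.dumps(dat_download, indent=4)
restored = json.loads(text)
print(restored['1301']['ED2014xxxxxx']['url'])
```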

The part I got stuck on is that the XBRL files are provided as a zip archive.
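The trick is to wrap the downloaded bytes in an in-memory file object before handing them to ZipFile, since ZipFile wants a file-like object rather than a raw byte string. The same pattern can be exercised offline (Python 3 here, so io.BytesIO takes the place of the memo's StringIO; file names are illustrative):

```python
# coding: utf-8
import io
import os
import tempfile
from zipfile import ZipFile

# Build a zip archive in memory, standing in for r.content from requests.
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr('example.xbrl', '<xbrl>dummy</xbrl>')
raw_bytes = buf.getvalue()  # what requests would hand back as r.content

# Wrap the raw bytes in a file-like object and extract, as in the memo.
out_dir = tempfile.mkdtemp()
with ZipFile(io.BytesIO(raw_bytes)) as z:
    z.extractall(out_dir)

print(os.listdir(out_dir))
```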

It seems you may not need StringIO if you use urllib2 instead of requests.

Comment

I would like to get to extracting the contents of the reports as soon as possible. If you notice anything strange, comments would be greatly appreciated.
