[PYTHON] Create and run embulk config in Jupyter

I think there are occasional cases where you want to dynamically generate many embulk config files (hereafter "embulk config").

For background on embulk, please refer to the following pages: http://qiita.com/hiroysato/items/397f36c4838a0a93e352 http://qiita.com/hiroysato/items/da45e52fb79c39547f69

Being able to generate and run embulk config files from Jupyter makes it convenient to work through trial and error, and I think it also makes creating embulk configs more efficient.

Create embulk config

    # Replace [file name] with the path of the config file to write.
    f = open('[file name]', 'w')
    setting = '''in:\n\
  type: gcs\n\
  bucket: xxxx\n\
  path_prefix: aaa/bbb/ccc_\n\
  auth_method: private_key\n\
  service_account_email: {{ env.SERVICE_ACCOUNT_EMAIL }}\n\
  p12_keyfile: ../key/{{ env.P12_FILENAME }}\n\
  application_name: zzz\n\
  tasks: 1\n\
  parser:\n\
    charset: UTF-8\n\
    newline: LF\n\
    header_line: true\n\
    type: csv \n\
    delimiter: \',\' \n\
    quote: \'\"\' \n\
    columns: \n\
    - {name: name, type: string}\n\
    - {name: title, type: string}\n\
    - {name: words, type: string}\n\
\n
out: \n\
  type: file \n\
  path_prefix: tmp \n\
  file_ext: txt \n\
  formatter: \n\
    type: csv \n\
    charset: UTF-8 \n\
    delimiter: \'\\\' \n\
    header_line: false \n\
    newline: LF'''

    f.write(setting)
    f.close()

There is nothing special here: the code simply writes the contents of the embulk config to a file. I think the generated embulk config is easier to read if you end each line of the string with "\n".
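As an alternative to the explicit open/close calls above, the same write can be sketched with a context manager, which closes the file even if the write fails. The helper name `write_config`, the file name `example.yml.liquid`, and the shortened config text are assumptions made for illustration, not part of the original code, and the text is not a complete embulk config.

```python
import os
import tempfile


def write_config(path, setting):
    """Write the embulk config text to path and return the path (hypothetical helper)."""
    # The with block closes the file automatically, even if write() raises.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(setting)
    return path


# Shortened placeholder text, only to show the round trip.
setting = 'in:\n  type: file\nout:\n  type: stdout\n'

# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_config(os.path.join(tmp_dir, 'example.yml.liquid'), setting)
    # Read the file back to confirm what was written.
    with open(config_path, encoding='utf-8') as f:
        written = f.read()

print(written)
```

In a notebook, printing the file contents right after writing is a quick way to check the generated config before running it.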

embulk run

  import os

  # Replace [file name] with the config file written above.
  os.system('embulk run [file name]')

Run it, taking care that the path to the config file is correct.
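os.system only returns an exit status, so a failed run is easy to miss in a notebook. A minimal sketch with the standard library's subprocess module follows, assuming the embulk command is on the PATH. The helper names `build_embulk_command` and `run_embulk` and the file name `config.yml.liquid` are hypothetical.

```python
import subprocess


def build_embulk_command(config_path):
    """Return the argument list for `embulk run <config_path>`."""
    return ['embulk', 'run', config_path]


def run_embulk(config_path):
    """Run embulk on one config and return (exit code, combined output)."""
    result = subprocess.run(
        build_embulk_command(config_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,                 # decode output as str
    )
    return result.returncode, result.stdout


# Only the command is built here; call run_embulk() where embulk is installed.
print(build_embulk_command('config.yml.liquid'))
```

Calling `run_embulk('[file name]')` returns the exit code together with embulk's output, so a non-zero code can be checked in the next cell. If the embulk command cannot be found, subprocess.run raises FileNotFoundError.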

Use cases

This is convenient when you have many tables to migrate, or when you want a separate file for each type of data. Creating embulk configs by hand one at a time becomes tedious, so it helps to be able to generate many of them dynamically, for example with for statements.

Example

Generate and run one file for each combination of two categories, each numbered 1 to 5.

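import os

for a in [1, 2, 3, 4, 5]:
  for b in [1, 2, 3, 4, 5]:
    # Build a file name such as 1-2_xxx.yml.liquid (a and b are ints, so convert them).
    filename = str(a) + '-' + str(b) + '_xxx.yml.liquid'
    f = open(filename, 'w')
    setting = '''in:\n\
      [embulk setting]
    '''
    f.write(setting)
    f.close()
    os.system('embulk run ' + filename)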
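The loop above can also be sketched with itertools.product and string.Template. Template is used here because it only substitutes `$` placeholders and leaves Liquid's `{{ ... }}` untouched, whereas str.format would collapse the doubled braces. The template text, the `${a}` and `${b}` placeholders, and the helper name `generate_configs` are assumptions for illustration. Running embulk on each file is left out.

```python
import itertools
import os
import tempfile
from string import Template

# Hypothetical template: only path_prefix varies between configs.
CONFIG_TEMPLATE = Template(
    'in:\n'
    '  type: gcs\n'
    '  bucket: xxxx\n'
    '  path_prefix: aaa/${a}/ccc_${b}\n'
    '  service_account_email: {{ env.SERVICE_ACCOUNT_EMAIL }}\n'
)


def generate_configs(out_dir, categories):
    """Write one config per (a, b) pair and return the file paths."""
    paths = []
    # product(..., repeat=2) yields every (a, b) combination, like the nested loops.
    for a, b in itertools.product(categories, repeat=2):
        path = os.path.join(out_dir, '{}-{}_xxx.yml.liquid'.format(a, b))
        with open(path, 'w', encoding='utf-8') as f:
            f.write(CONFIG_TEMPLATE.substitute(a=a, b=b))
        paths.append(path)
    return paths


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = generate_configs(tmp_dir, [1, 2, 3, 4, 5])
    file_names = [os.path.basename(p) for p in paths]
    with open(paths[0], encoding='utf-8') as f:
        first_config = f.read()

print(len(file_names), file_names[0], file_names[-1])
```

Each returned path can then be passed to `embulk run` in the same way as in the loop above.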

Finally

It becomes even more convenient if you keep the embulk in, out, filter, and other sections as separate strings and generate the embulk config by combining them.
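One way this combination might look is sketched below. The section contents are shortened placeholders rather than a complete config, `[filter type]` stands in for a real filter plugin name, and the helper name `build_config` is hypothetical.

```python
# Each section is kept as its own string so it can be reused or swapped out.
IN_SECTION = (
    'in:\n'
    '  type: gcs\n'
    '  bucket: xxxx\n'
)

# Placeholder filter section; replace [filter type] with an actual plugin name.
FILTER_SECTION = (
    'filters:\n'
    '  - type: [filter type]\n'
)

OUT_SECTION = (
    'out:\n'
    '  type: file\n'
    '  path_prefix: tmp\n'
    '  file_ext: txt\n'
)


def build_config(*sections):
    """Join config sections into one config string, skipping empty ones."""
    return ''.join(section for section in sections if section)


config_with_filter = build_config(IN_SECTION, FILTER_SECTION, OUT_SECTION)
config_without_filter = build_config(IN_SECTION, None, OUT_SECTION)

print(config_with_filter)
```

Passing `None` for a section drops it, so the same call shape covers configs with and without filters.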

Jupyter notebook files can also be executed in batch using runipy, so it was easy to run the finished process periodically once the trial and error was done.

I think everything described here can also be done without Jupyter. However, I recently generated and ran embulk configs from Jupyter and found it easy to handle the other related processing in Jupyter as well, so I have written it up here.
