[PYTHON] Tips for using ElasticSearch in a good way

Personal notes

Install and setup

Orchestrated with fabric

fabfile.py


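from fabric.api import run, sudo


def install_elastic_search():
    # Download and install the 1.2.1 .deb package, then start the service
    sudo("wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb")
    sudo("dpkg -i elasticsearch-1.2.1.deb")
    sudo("service elasticsearch start")
    # Put the elasticsearch binaries on PATH for future shells
    run(r"echo export PATH=\$PATH:/usr/share/elasticsearch/bin/ >> ~/.bashrc")

def es_init():
    # changing permission
    # see document
    pass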

Integration with td-agent

The log server that runs Elasticsearch receives logs from various servers. I recommend the td-agent (fluentd) forest plugin for conditional branching and for extracting information from tags.

td-agent.conf



<source>
  type forward
  port 24224
</source>

<match **>
 type forest
 subtype elasticsearch
 <template>
   host localhost
   port 9200
   index_name ${tag_parts[2]}
   type_name ${tag_parts[1]}

   buffer_type memory
   flush_interval 3s
   retry_limit 17
   retry_wait 1.0
   num_threads 1
   flush_at_shutdown true
 </template>
</match>

A flush_interval of 3s is a bit questionable. In production it may be frequent enough to affect performance.

By the way, if you set a poorly chosen interval in td-agent, data that should have been PUT to Elasticsearch will not show up promptly. If documents seem slow to appear, this is the setting to tune.
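
To make the ${tag_parts[N]} placeholders in the template above concrete, here is a minimal Python sketch of the substitution: the tag is split on dots and the parts are indexed from zero. The function name `expand_template` and the example tag `td.info.fluentd` are hypothetical. This only mimics the substitution and is not the plugin's actual implementation.

```python
def expand_template(tag):
    """Mimic how ${tag_parts[N]} picks dot-separated parts of a fluentd tag."""
    tag_parts = tag.split(".")
    if len(tag_parts) < 3:
        raise ValueError("tag needs at least three dot-separated parts: %r" % tag)
    return {
        "index_name": tag_parts[2],  # ${tag_parts[2]} in td-agent.conf
        "type_name": tag_parts[1],   # ${tag_parts[1]} in td-agent.conf
    }


if __name__ == "__main__":
    # Hypothetical tag, chosen only for illustration.
    print(expand_template("td.info.fluentd"))
```

With a tag shaped like this, documents would land in the fluentd index with type info.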

Test

Existence test

fabfile_test.py



from fabric.api import run, settings


def all():
  availability_test("td-agent")
  availability_test("elasticsearch")

def availability_test(name):
  if name == "td-agent":
    version_checker = name + " --version"
  elif name == "elasticsearch":
    version_checker = "export PATH=$PATH:/usr/share/elasticsearch/bin/ && " + name + " -v"
  else:
    raise ValueError("unknown command: " + name)

  # warn_only keeps fabric from aborting when the command exits with an error
  with settings(warn_only=True):
    output = run(version_checker)

  if "command not found" in output:
    print(name + " hasn't been installed")
  else:
    print(name + " has been installed")

In hindsight, it would be better to use envassert, a testing tool for fabric, and I should refactor this now.

envassert is easy to set up and is not as heavyweight as serverspec. To be fair, being able to write tests in rspec-like notation is very nice (I like mocha in JS for the same reason), but serverspec needs so much preparation that I start wanting to test the environment used for infrastructure testing itself. I tried it, but it was daunting. So although I'm a Rubyist, I use fabric rather than chef.

Log system integration test

This is dirty. I plan to rewrite it in Python at some point.

bash



#! /bin/bash
###############################################
# function
###############################################

initializing () {
  if ! expr $before : '[0-9]*' 1> /dev/null 2> /dev/null ; then
    before=0
  fi

  if [ -z $num ]; then
    num=0
  fi

  if [ $num -le $before ]; then
    num=$(($before+1))
  fi
}

buffering () {
  waiting=$*
  for i in `seq 1 $waiting`
  do
    left=$(($waiting - $i))
    echo $left sec
    sleep 1
  done
}

log_into_td_es () {
  fab -u <your_name> -i <your_pem> -H <your_domain> all
  buffering 1
}

diff_check () {
  diff=$(($after - $before))
  if [ $diff -eq 1 ]; then
    echo diff: $diff
  else
    echo not changed
    exit 1
  fi
}

# delete_all () {
#   curl -XDELETE "http://$*:9200/*" 1> /dev/null 2> /dev/null
# }




###############################################
# main
###############################################
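# Usage: pass the Elasticsearch host as the first argument.
host=$1
# Document count before the PUT; the count is the first field of the JSON response.
before=`curl -XGET "http://$host:9200/fluentd/_count" 2> /dev/null | cut -d "," -f 1 | cut -d ":" -f 2`
echo before_count: $before
initializing
# Index one test document under a fresh id.
curl -XPUT "http://$host:9200/fluentd/info/$num" -d '{ "test" : "hoge" }' 1> /dev/null 2> /dev/null
after=`curl -XGET "http://$host:9200/fluentd/_count" 2> /dev/null | cut -d "," -f 1 | cut -d ":" -f 2`
echo after_count: $after
# Fail unless exactly one document was added.
diff_check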
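
As a starting point for a Python rewrite, the before/after count check could be sketched as follows. It assumes the _count response is JSON with a top-level count field (which is what the cut pipeline above relies on) and uses only the standard library. The helper names `parse_count`, `count_documents`, `put_test_document`, `check_diff` and `main` are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def parse_count(body):
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def count_documents(host, index="fluentd"):
    """GET /<index>/_count; fall back to 0 on any failure, like the bash version."""
    try:
        with urlopen("http://%s:9200/%s/_count" % (host, index)) as res:
            return parse_count(res.read().decode("utf-8"))
    except (OSError, ValueError, KeyError):
        return 0


def put_test_document(host, doc_id, index="fluentd", doc_type="info"):
    """PUT one test document, mirroring the curl -XPUT call."""
    url = "http://%s:9200/%s/%s/%d" % (host, index, doc_type, doc_id)
    req = Request(
        url,
        data=json.dumps({"test": "hoge"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(req) as res:
        res.read()


def check_diff(before, after):
    """Return True only when exactly one document was added."""
    return after - before == 1


def main(host):
    """Run the whole check; returns a shell-style exit code."""
    before = count_documents(host)
    print("before_count:", before)
    put_test_document(host, before + 1)
    after = count_documents(host)
    print("after_count:", after)
    if not check_diff(before, after):
        print("not changed")
        return 1
    print("diff:", after - before)
    return 0
```

Calling main with the Elasticsearch host runs the check. As with the bash version, the count taken right after the PUT may not yet include the new document, because Elasticsearch only makes documents visible to counts and searches after a refresh (one second by default), so a short wait before the second count may be needed.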

Using the logs

You should use `elasticsearch-py`. I'm a Rubyist, but Python is often used for infrastructure, so I go with what fits the team. I'm not particular about it.

curl -X GET 'http://hoge.com:9200/_index/_type/_search?pretty=true&size=1000&sort=desc'

You can also check the logs with a curl request like the one above.
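
The same request can be issued from Python. The sketch below uses only the standard library so that it stays self-contained; it is not elasticsearch-py. The helper names `build_search_url` and `search` are hypothetical, and the host, index and type are placeholders, as in the curl example. The optional sort argument assumes the URI search form field:direction.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(host, index, doc_type, size=1000, sort=None):
    """Build a URI search URL like the one in the curl example."""
    params = [("pretty", "true"), ("size", size)]
    if sort is not None:
        params.append(("sort", sort))  # e.g. "some_field:desc"
    return "http://%s:9200/%s/%s/_search?%s" % (
        host, index, doc_type, urlencode(params))


def search(host, index, doc_type, **kwargs):
    """Run the search and return the list of hits."""
    with urlopen(build_search_url(host, index, doc_type, **kwargs)) as res:
        body = json.loads(res.read().decode("utf-8"))
    return body["hits"]["hits"]
```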

Checking the state of ES

Eyeballing it (eye grep)

curl -X GET 'http://hoge.com:9200/_stats?pretty'

Thousands of lines of statistics come out. I usually look at the document count (and storage size). I haven't set up sharding or replication yet, so I'll do that later.
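
Rather than reading thousands of lines, the two numbers mentioned above can be pulled out of the response. This sketch assumes the _stats response carries the usual _all.primaries.docs.count and _all.primaries.store.size_in_bytes fields. `summarize_stats` is a hypothetical helper name, and the sample values are made up for illustration.

```python
import json


def summarize_stats(stats):
    """Pick the document count and store size out of a parsed _stats response."""
    primaries = stats["_all"]["primaries"]
    return {
        "docs": primaries["docs"]["count"],
        "store_bytes": primaries["store"]["size_in_bytes"],
    }


# Made-up, heavily trimmed sample of what /_stats returns.
sample = json.loads("""
{"_all": {"primaries": {"docs": {"count": 1200, "deleted": 0},
                        "store": {"size_in_bytes": 345678}}}}
""")
print(summarize_stats(sample))
```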

Liveness monitoring

A tool that fetches ES statistics at regular intervals and visualizes them.

Something like that.
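
The fetching half of such a tool might look like the sketch below, which only collects a time series and leaves visualization out. `collect_samples` is a hypothetical name, and `fetch` stands for any callable that returns one value per call (for example a document count read from ES). The sleep and clock arguments are there only so the loop can be exercised without waiting.

```python
import time


def collect_samples(fetch, interval_sec, samples, sleep=time.sleep, clock=time.time):
    """Call fetch() `samples` times, `interval_sec` apart; return (timestamp, value) pairs."""
    series = []
    for i in range(samples):
        try:
            value = fetch()
        except OSError:
            value = None  # ES did not answer: record this sample as "down"
        series.append((clock(), value))
        if i < samples - 1:
            sleep(interval_sec)
    return series
```

A None value in the series marks a moment when ES could not be reached, which is the part that matters for liveness.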
