[PYTHON] Launch a simple password-protected search service in 5 minutes

TL;DR (summary for experienced readers)

I made a docker-compose.yml that brings up the following containers together:

- Elasticsearch: search engine
- Kibana: simple search screen
- nginx: reverse proxy (lightweight user authentication by ID / password)
- Python3: for processing and indexing data

Background

Have you ever wanted to search through some text data you have on hand?

Elasticsearch can provide the search function itself, but requirements like the following often come up at the same time:

- I want a simple search screen so that non-programmers can use it too (Kibana)
- The data is confidential, so I want password authentication (basic authentication with nginx)
- I want to be able to add data easily

I built an environment that meets the above requirements. [^1]

[^1]: Authentication could probably be handled more neatly with Elasticsearch Security. Also, the method introduced below uses a plain HTTP connection, which is weak in terms of security, so please consider switching to HTTPS by referring to this article: https://qiita.com/hsano/items/3b4fb372c4dec0cabc08

What I made

You can find them all in this repository: https://github.com/chopstickexe/es-kibana-nginx-python

The main contents are as follows:

- docker-compose.yml: configuration file for launching and linking the four containers (Elasticsearch, Kibana, nginx, Python3).
- index.py: sample code for registering (indexing) data in Elasticsearch from the Python3 container.
- data/sample.csv: sample data registered by index.py (as you will see, it contains four personal reviews of Nolan's films).

Docker Compose Overview

This docker-compose.yml launches the following environment:

IMG_0028.jpg

- Four containers (Elasticsearch, Kibana, nginx, and Python) are launched, but nginx is the only one whose port is mapped to the host (that is, reachable over HTTP from outside the host). Therefore, Elasticsearch and Kibana can only be accessed by those who know the ID / password set in nginx.
- Elasticsearch index data is stored in a directory on the host (a local volume). Therefore, even if the container is stopped and restarted, the same index can be accessed again.
- Kibana can be accessed with a simple address such as http://<host address>.
- The source code and data are mounted in the Python container, so you can enter this container with docker exec and run Python code that processes and indexes the data. [^2]

[^2]: It is also possible to connect to the container with VSCode's Remote Containers extension and edit / run the code there.

How to run it

The explanation below is based on Ubuntu 18.04, but it should also work on CentOS, Mac, and Windows as long as docker-compose and the Apache HTTP Server utilities (more specifically, the htpasswd command) are available.

Preparation 1. Install docker-compose

First, prepare an environment where docker-compose is installed.

Preparation 2. Install the Apache HTTP Server utilities

Next, install apache2-utils (a package that contains Apache HTTP Server utilities, including the htpasswd command).

$ sudo apt -y install apache2-utils

1. Clone the Git repository

Clone the repository mentioned above, chopstickexe/es-kibana-nginx-python.

2. Prepare a login ID and password

Next, decide on the ID and password for logging in to Kibana (the search screen), which will be launched in a later step. (The example below uses ID = admin.)

Run the htpasswd command as follows to create the password file at (cloned directory)/htpasswd/localhost.

$ cd /path/to/repo
$ mkdir htpasswd && cd htpasswd
$ htpasswd -c localhost admin
# Enter your password when prompted

3. (Optional) Edit docker-compose.yml

If you want to access Kibana from a machine other than the host via a web browser

Open docker-compose.yml in the cloned directory and change the [VIRTUAL_HOST setting value of the Kibana container](https://github.com/chopstickexe/es-kibana-nginx-python/blob/master/docker-compose.yml#L25) from localhost to the IP address of the host or an FQDN like foo.bar.com.

If another service is already up on port 80 of the host

Change the port mapping of the nginx container from 80:80 to (free port on the host):80.

4. Start the Docker container

Start the container with the docker-compose command below.

$ cd /path/to/this/directory
$ docker-compose up

5. Make sure you can access Kibana from your web browser

If you haven't modified docker-compose.yml, open http://localhost in a browser on the host. If you changed it, open http://<host address> from a browser in your environment. Log in with the ID and password you set, and check that the Kibana screen appears.
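If you would rather check the login from a script than from a browser, something like the following minimal sketch could work. It is not part of the repository: the function names basic_auth_header and fetch_status are made up for illustration, and it assumes the default http://localhost setup and uses only the Python standard library.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_status(url: str, user: str, password: str, timeout: float = 10.0) -> int:
    """Request `url` through the nginx proxy and return the HTTP status code.

    Hypothetical helper: it only sends the credentials that were registered
    with htpasswd and reports what the proxy answers.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, fetch_status("http://localhost", "admin", "<your password>") sends the request through nginx. With a wrong password, nginx answers 401 and urllib raises urllib.error.HTTPError instead of returning a status.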

6. Register sample data in Elasticsearch

Go back to the host machine's terminal and enter the Python container with the following command:

$ docker exec -it python bash

After entering the Python container, create a virtual environment (venv) with the following command and pip install the required packages there:

# python -m venv .venv
# source .venv/bin/activate
(.venv) # pip install -r requirements.txt

After installing the packages, register the sample data in Elasticsearch's nolan index with the following command:

(.venv) # python index.py
Finished indexing

The Python script index.py executed here is included in the repository. In the excerpt below, it sets the data type of the RELEASE_DATE column to date and the format to yyyyMMdd.


    es.indices.create(
        index,
        body={
            "mappings": {
                "properties": {"RELEASE_DATE": {"type": "date", "format": "yyyyMMdd"}}
            }
        },
    )
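To give a rough idea of how CSV rows can become Elasticsearch documents that fit this mapping, here is a minimal sketch. It is not the actual index.py: the helper names and the TITLE and REVIEW columns are assumptions for illustration (only RELEASE_DATE comes from the mapping above), and the code only builds the action dictionaries without contacting Elasticsearch.

```python
import csv
import io
from datetime import datetime


def validate_release_date(value: str) -> str:
    """Return `value` if it is a valid yyyyMMdd date, else raise ValueError."""
    # strptime alone tolerates missing zero padding, so check the shape first.
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"RELEASE_DATE must be 8 digits (yyyyMMdd): {value!r}")
    datetime.strptime(value, "%Y%m%d")  # raises ValueError for impossible dates
    return value


def rows_to_actions(csv_text: str, index: str = "nolan"):
    """Yield one bulk-style action per CSV row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        validate_release_date(row["RELEASE_DATE"])
        yield {"_index": index, "_source": dict(row)}


# Made-up sample row; the TITLE and REVIEW columns are assumptions.
sample_csv = "TITLE,RELEASE_DATE,REVIEW\nExample Film,20100716,Loved it\n"
for action in rows_to_actions(sample_csv):
    print(action)
```

Dictionaries in this shape are what elasticsearch.helpers.bulk accepts as actions, so they could be passed to it together with an Elasticsearch client. Validating the date up front is a design choice: a value that does not match yyyyMMdd would otherwise be rejected by Elasticsearch at indexing time.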

7. Confirm that you can search on Kibana

Access Kibana again from your web browser and set the following:

Create Index Pattern

From the menu on the left side of the screen (if it is not displayed, click the three-line icon in the upper left), select Kibana > Index Patterns and enter nolan as the index pattern name. If the Python code above ran without any problem, the message "Your index pattern matches 1 source" will appear. Click Next step:

Screenshot from 2020-09-28 09-20-29.png

Select the RELEASE_DATE column as the Time field and click Create index pattern.

Screenshot from 2020-09-28 09-20-45.png

Set the time range to ~11 years ago -> now on the Discover screen

Select Discover from the menu on the left side of the screen, click the calendar icon in the middle of the screen, and set the time range to Relative > 11 years ago. (The data includes reviews with fairly old RELEASE_DATE values, and they will not show up in the search unless you do this.)

If set correctly, you will see 4 reviews as below:

Screenshot from 2020-09-28 09-18-20.png

You can also search for reviews that include "Tom Hardy" on this Discover screen:

Screenshot from 2020-09-28 09-18-48.png
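For reference, the Discover settings above correspond roughly to an Elasticsearch query like the one built below. This is only a sketch under assumptions: build_review_query is a hypothetical helper, and Kibana generates its own query internally, which may differ in detail.

```python
import json


def build_review_query(phrase: str, years_back: int = 11) -> dict:
    """Build a search body: phrase search plus a RELEASE_DATE time range.

    Hypothetical helper; `phrase` is assumed not to contain double quotes.
    """
    return {
        "query": {
            "bool": {
                # Quoting the text makes query_string treat it as a phrase.
                "must": [{"query_string": {"query": f'"{phrase}"'}}],
                # Date math: from `years_back` years ago until now.
                "filter": [
                    {
                        "range": {
                            "RELEASE_DATE": {
                                "gte": f"now-{years_back}y",
                                "lte": "now",
                            }
                        }
                    }
                ],
            }
        }
    }


print(json.dumps(build_review_query("Tom Hardy"), indent=2))
```

The resulting dictionary could be sent as the request body of a search against the nolan index, for example from the Python container.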

Reference material

For how to use Kibana, please also refer to this article: https://qiita.com/namutaka/items/b67290e75cbd74cd9a2f
