[PYTHON] Created an HTTP proxy that can record and play HTTP responses

As the title suggests, I created an HTTP proxy that can record and play back HTTP responses.

What I wanted

I'm developing a crawler at work, and I've long wanted a way to mock external services and return HTTP responses, so that I can write system-wide E2E tests that include the crawler.

I was looking for the following four features:

1. It can record HTTP responses and respond based on the recorded data
2. The recorded data can be easily edited to simulate any HTTP response
3. It works as an HTTP proxy, so it does not depend on any particular language or testing framework
4. It supports HTTPS

There are libraries that mock an HTTP client for use in tests, and various tools that build a mock by simulating API responses, but I could not find one that did exactly what I needed, so I decided to build it myself.

Incidentally, although they did not fit my purpose this time, there is an OSS tool called mountebank that can be used for testing microservices; it can also return arbitrary responses and looks extremely convenient. There is also Polly.JS, a library made by Netflix that is limited to JavaScript (browser and Node.js); it lets you stub an HTTP server and return arbitrary responses, and it can even adjust the response time, which looks very handy. I would like to use both someday when a suitable use case comes up.

How it works

After a lot of research, I found mitmproxy, an excellent HTTP proxy that fits this purpose. The "mitm" in mitmproxy comes from "man-in-the-middle attack": because the proxy terminates the SSL/TLS connection itself, it can manipulate the contents of HTTPS requests and responses as well. A great HTTP proxy! (Scary)

Although mitmproxy itself does not have a function to record and play HTTP responses, it has a very convenient function that you can insert an arbitrary Python script at various points such as when receiving a request or when sending a response, so that it can be extended. Therefore, I made a plug-in to record and play back HTTP responses using that mechanism.
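As a rough illustration of that extension mechanism, here is a minimal sketch of a mitmproxy addon that uses the `request` and `response` hooks. It is not the code of record-and-replay-proxy: the class name `RecordReplaySketch`, the `replay` flag, and the in-memory `store` are hypothetical (the actual tool saves responses to files), and `http.Response.make` is assumed to be available, which should be the case for mitmproxy 7 or later.

```python
class RecordReplaySketch:
    """Hypothetical addon: keeps responses in memory, keyed by (method, URL)."""

    def __init__(self, replay=False):
        self.replay = replay  # False: record responses, True: answer from the store
        self.store = {}

    def request(self, flow):
        # mitmproxy calls this hook once the client request has been read.
        if not self.replay:
            return
        key = (flow.request.method, flow.request.pretty_url)
        saved = self.store.get(key)
        if saved is None:
            return  # nothing recorded for this request: let it go upstream
        # Imported lazily so the class can be exercised without mitmproxy installed.
        from mitmproxy import http

        status, headers, body = saved
        # Setting flow.response here answers the client without contacting the server.
        flow.response = http.Response.make(status, body, headers)

    def response(self, flow):
        # mitmproxy calls this hook once the upstream response has been read.
        if self.replay:
            return
        key = (flow.request.method, flow.request.pretty_url)
        self.store[key] = (
            flow.response.status_code,
            dict(flow.response.headers),  # note: collapses repeated header names
            flow.response.content,
        )


# mitmproxy picks up addons from this module-level list.
addons = [RecordReplaySketch()]
```

A script like this would typically be loaded with something like `mitmdump -s sketch.py`, where the file name is only an example.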


The source code is available here. https://github.com/Chanmoro/record-and-replay-proxy

We have published a Docker image for easy use. https://hub.docker.com/r/chanmoro/record-and-replay-proxy

How to use

The Docker image is published on Docker Hub (https://hub.docker.com/r/chanmoro/record-and-replay-proxy) so it's easy to use.

1. Record HTTP response

Start the HTTP proxy in recording mode with the following command.

$ docker run -it --rm -p 8080:8080 -v ${PWD}/response_data:/app/response_data chanmoro/record-and-replay-proxy record

The recorded responses are saved in /app/response_data inside the container, so mount a local directory there to keep the data.

2. Send an HTTP request (record)

Here, as an example, save the response when accessing https://github.com/ using curl.

$ curl -k -x localhost:8080 https://github.com/

At this point, curl is presented with the self-signed certificate issued by mitmproxy for the SSL connection. With the default settings the request would fail certificate verification, so the -k option is used to disable that check.
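Since the proxy is independent of any language, the same request can be sent from a test written in Python. The following is a minimal sketch using only the standard library; the function name `make_proxy_opener` is hypothetical, and the proxy address assumes the `-p 8080:8080` mapping used above. Disabling certificate verification mirrors `curl -k` and is only appropriate for local testing.

```python
import ssl
import urllib.request

PROXY = "http://localhost:8080"  # assumed: the proxy container started above


def make_proxy_opener(proxy=PROXY):
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    # Equivalent of curl -k: accept the self-signed certificate from mitmproxy.
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=context),
    )


def fetch_via_proxy(url, proxy=PROXY):
    """Return (status, body) for a GET request sent through the proxy."""
    with make_proxy_opener(proxy).open(url, timeout=10) as resp:
        return resp.status, resp.read()
```

With the proxy running, `fetch_via_proxy("https://github.com/")` should behave like the curl command above. Note that urllib raises `urllib.error.HTTPError` for 4xx and 5xx responses instead of returning them.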

Recorded HTTP response

The recorded HTTP response is stored in response_data in the current directory mounted on the Docker container.

$ tree response_data
response_data
└── https%3A%2F%2Fgithub.com%2F #Request URL becomes the directory name
    └── GET                     #HTTP method
        ├── metadata            #HTTP method, request URL, HTTP response status
        ├── response_body       #Response body
        └── response_header     #Response header
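The directory name in the tree above looks like the request URL percent-encoded into a single path segment. As a sketch of how such a path can be computed (the helper name `record_dir` is hypothetical, and the exact encoding rules used by the tool are an assumption based on this one example):

```python
from pathlib import Path
from urllib.parse import quote


def record_dir(base_dir, url, method):
    """Return <base_dir>/<percent-encoded URL>/<METHOD>, following the tree above."""
    # safe="" makes quote() encode "/" and ":" as well, so the whole URL
    # becomes one directory name instead of several nested ones.
    return Path(base_dir) / quote(url, safe="") / method.upper()


print(record_dir("response_data", "https://github.com/", "GET"))
```

This is handy in a test when you need to locate the files for a specific URL before editing them.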

metadata stores the HTTP method, request URL, and HTTP response status of the recorded request.

metadata


{
    "url": "https://github.com/",
    "method": "GET",
    "status": 200
}

The response body is recorded in response_body.

response_body


<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
  <link rel="dns-prefetch" href="https://github.githubassets.com">
  <link rel="dns-prefetch" href="https://avatars0.githubusercontent.com">
  <link rel="dns-prefetch" href="https://avatars1.githubusercontent.com">
  <link rel="dns-prefetch" href="https://avatars2.githubusercontent.com">
  <link rel="dns-prefetch" href="https://avatars3.githubusercontent.com">
  <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">
...

The response header is recorded in response_header.

response_header


Date: Thu, 12 Dec 2019 10:20:49 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GitHub.com
Status: 200 OK
Vary: X-PJAX
...
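To make the three-file layout concrete, the sketch below reads one recorded response back into Python values. The function names are hypothetical, and the parsing rules (one `Name: value` pair per line in response_header, a JSON object in metadata) are inferred from the examples above rather than taken from the tool's source.

```python
import json
from pathlib import Path


def parse_header_lines(text):
    """Parse 'Name: value' lines into a list of (name, value) pairs."""
    headers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return headers


def load_recorded(method_dir):
    """Read metadata, response_header and response_body from one method directory."""
    method_dir = Path(method_dir)
    metadata = json.loads((method_dir / "metadata").read_text(encoding="utf-8"))
    header_text = (method_dir / "response_header").read_text(encoding="utf-8")
    body = (method_dir / "response_body").read_bytes()
    return metadata["status"], parse_header_lines(header_text), body
```

Returning the headers as a list of pairs rather than a dict keeps repeated header names such as Set-Cookie intact.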

3. Play back the recorded HTTP response

To play back the recorded HTTP responses, start the HTTP proxy in replay mode. For the directory mounted on /app/response_data, specify the directory that contains the recorded response data.

$ docker run -it --rm -p 8080:8080 -v ${PWD}/response_data:/app/response_data chanmoro/record-and-replay-proxy replay

4. Send an HTTP request (replay)

If you send an HTTP request with the HTTP proxy running in replay mode as you did during recording, the recorded response will be returned. In replay mode, the proxy does not access https://github.com/, so you can use it offline.

$ curl -k -x localhost:8080 https://github.com/

Let's edit the response data so that we can confirm that it has been replayed. Edit the data as follows.

metadata


{
    "url": "https://github.com/",
    "method": "GET",
    "status": 500
}

response_body


This is recorded response!

response_header


Date: Thu, 12 Dec 2019 10:20:49 GMT
Status: 500 Error
Hoge: This is Hoge header.

Let's check if the response is returned as edited.

$ curl -v -k -x localhost:8080 https://github.com/
...
> GET / HTTP/1.1
> Host: github.com
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 12 Dec 2019 10:20:49 GMT
< Status: 500 Error
< Hoge: This is Hoge header.
< content-length: 26
< 
* Connection #0 to host localhost left intact
This is recorded response!

You can see that the HTTP status, response headers, and response body are returned as edited. Perfect!
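In an E2E test, the same kind of edit can be scripted instead of done by hand. The following sketch writes the three files for one URL and method; the helper name `write_stub` is hypothetical, and the file formats are inferred from the examples in this article, so details such as trailing newlines are assumptions.

```python
import json
from pathlib import Path
from urllib.parse import quote


def write_stub(base_dir, url, method, status, headers, body):
    """Write metadata, response_header and response_body for one URL and method."""
    target = Path(base_dir) / quote(url, safe="") / method.upper()
    target.mkdir(parents=True, exist_ok=True)

    metadata = {"url": url, "method": method.upper(), "status": status}
    (target / "metadata").write_text(json.dumps(metadata, indent=4), encoding="utf-8")

    # One "Name: value" pair per line, as in the recorded response_header file.
    header_text = "".join(f"{name}: {value}\n" for name, value in headers)
    (target / "response_header").write_text(header_text, encoding="utf-8")

    (target / "response_body").write_bytes(body)
    return target
```

For example, `write_stub("response_data", "https://github.com/", "GET", 500, [("Hoge", "This is Hoge header.")], b"This is recorded response!")` would produce data equivalent to the manual edit above, ready to be mounted into the proxy in replay mode.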

Summary

This article introduced the features and usage of record-and-replay-proxy, an HTTP proxy I created that can record and play back HTTP responses.

This makes it easy to test features that access external services and APIs.

Please use it!
