[LINUX] I tried ranking the user names and passwords used in attacks on my server's phpMyAdmin

Overview

After reading "The story of a suspicious attack from all over the world as soon as I opened a blog on my own server", I checked the logs of my own server and confirmed that similar accesses, apparently also for cracking purposes, were arriving.

Many of them were attacks that tried to guess the phpMyAdmin user name and password by brute force, so I counted the user names and passwords that were tried and ranked them.

Analysis

I took the Apache HTTP Server log (/var/log/httpd/access_log) from my server (CentOS 6.10) rented on Sakura VPS. Looking at its contents, you can see login attempts against phpMyAdmin with guessed user names and passwords, as shown below.

xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:44 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz3edc5tgb&server=1 HTTP/1.1" 200 14538 "-" "Mozilla/5.0"
xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:46 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@123&server=1 HTTP/1.1" 200 14546 "-" "Mozilla/5.0"
xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:59 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@2wsx&server=1 HTTP/1.1" 200 14533 "-" "Mozilla/5.0"
xxx.xxx.xxx.xxx - - [04/Oct/2018:02:58:03 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@WSX&server=1 HTTP/1.1" 200 14543 "-" "Mozilla/5.0"
xxx.xxx.xxx.xxx - - [04/Oct/2018:02:58:06 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@WSX3edc&server=1 HTTP/1.1" 200 14538 "-" "Mozilla/5.0"
xxx.xxx.xxx.xxx - - [04/Oct/2018:02:58:09 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@wsx&server=1 HTTP/1.1" 200 14533 "-" "Mozilla/5.0"

I extracted this information with Python and built histograms from it. The Python code is listed in the appendix at the end.
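As a rough illustration of what gets pulled out of each line, here is a minimal sketch that parses one of the sample lines above with the standard library's urllib.parse. This is not the script used for the rankings (that one is in the appendix and works by plain string splitting); the variable names are my own. Note that parse_qs percent-decodes the values, which the appendix script does not, so counts from the two approaches could differ slightly.

```python
from urllib.parse import urlsplit, parse_qs

# One of the sample log lines shown above (IP address masked as in the article).
line = ('xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:44 +0900] '
        '"GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz3edc5tgb&server=1 HTTP/1.1" '
        '200 14538 "-" "Mozilla/5.0"')

# The request is the first double-quoted field: "METHOD URL PROTOCOL".
request = line.split('"')[1]
method, url, protocol = request.split(' ')

parts = urlsplit(url)
# keep_blank_values=True keeps empty passwords instead of dropping them.
query = parse_qs(parts.query, keep_blank_values=True)

print(method)                    # GET
print(parts.path)                # /phpMyAdmin/index.php
print(query['pma_username'][0])  # root
print(query['pma_password'][0])  # 1qaz3edc5tgb
```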

Basic information of the log to be analyzed

Acquisition period: 2018/09/26 ~ 2020/01/22
Total number of lines: 658858
Of these, lines that appear to be brute-force guesses at the phpMyAdmin password: 186499
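The two line counts can be obtained in a single pass over the log. The following is a minimal sketch, assuming that "a line that looks like an attack" simply means "a line containing pma_password"; the function name count_lines and the third sample entry are made up for illustration.

```python
def count_lines(lines, keyword='pma_password'):
    """Return (total number of lines, number of lines containing keyword)."""
    total = 0
    matched = 0
    for line in lines:
        total += 1
        if keyword in line:
            matched += 1
    return total, matched

# Two sample lines from this article plus one placeholder that is not a login attempt.
sample = [
    'xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:44 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz3edc5tgb&server=1 HTTP/1.1" 200 14538 "-" "Mozilla/5.0"',
    'xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:46 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@123&server=1 HTTP/1.1" 200 14546 "-" "Mozilla/5.0"',
    '(placeholder for any other request)',
]
total, matched = count_lines(sample)
print(total, matched)  # 3 2
```

Because the function only iterates, an open file object can be passed instead of a list (for example count_lines(open('/path/to/access_log'))), which avoids loading the whole log into memory.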

Number of attacks per user name (25 types in total)

Ranking User name Number of times
1 root 185504
2 wordpress 183
3 admin 138
4 wp 47
5 blog 45
6 pma 45
7 shop 43
8 money 42
9 popa3d 40
10 joomla 39
11 http 35
12 ueer 35
13 project 35
14 nginx 33
15 apache 33
16 sql 33
17 db 32
18 nas 32
19 shopdb 31
20 dbs 31
21 web 28
22 backupdb 6
23 wordspress 5
24 backup 2
25 backups 2

Number of attacks per password (top 30 of 953 types in total)

Ranking password Number of times
1 pass 408
2 password 372
3 (empty) 361
4 admin 359
5 123 347
6 root 344
7 123456 326
8 welcome 323
9 r00t 322
10 monkey 322
11 whatever 322
12 abc123 321
13 aa123456 321
14 123123 319
15 mysql 318
16 login 318
17 111111 318
18 password123 318
19 1234567890 317
20 access 317
21 666666 316
22 apache 315
23 oracle 315
24 654321 315
25 root123 315
26 123qwe 314
27 1234567 314
28 12345678 314
29 pass123 314
30 letmein 313

(By the way) Paths attacked and the number of times (2 types in total)

Ranking path Number of times
1 /phpmyadmin/index.php 173687
2 /phpMyAdmin/index.php 12812

Consideration

The complete rankings are posted here.

The overwhelming majority of user names are root; the rest are blog-related terms such as wordpress and blog, database-related terms such as admin and pma, and server-related terms such as apache. Compared with the article analyzing attack logs on an ssh server, where attacks on ssh target personal names and OS names, it is interesting that attacks on phpMyAdmin show subtly different trends, such as targeting blog-related terms.

The top three passwords were "pass", "password" and "" (empty). None stood out in particular; the commonly targeted passwords seem to have been tried about equally often. This seems to match the trend of "worst passwords" described here. The patterns were as follows.

In addition, I tabulated the paths that were targeted. Only the two paths listed in the path table above were attacked.

There is less variation here than in the paths mentioned in the articles "Target access list to prevent unauthorized access" and "About URLs that are easy to be targeted", probably because these results come only from lines that contain "pma_password=". When I loosened the extraction condition to lines that contain "phpMyAdmin" and looked at the log, I found that access to paths other than the two listed here had also been attempted.
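The looser condition can be sketched as follows. This is only an illustration under my own assumptions: the function name count_paths is made up, the match is case-insensitive so that both /phpMyAdmin/ and /phpmyadmin/ are caught, and the request path is taken from the same position in the line as in the appendix script.

```python
from collections import Counter

def count_paths(lines, keyword='phpmyadmin'):
    """Count request paths of lines that contain keyword (case-insensitive)."""
    counts = Counter()
    for line in lines:
        if keyword.lower() not in line.lower():
            continue
        fields = line.split(' ')
        if len(fields) < 7:
            continue  # not in the expected access-log layout; skip it
        # Element 6 is the requested URL; drop the query string if there is one.
        path = fields[6].split('?')[0]
        counts[path] += 1
    return counts

# A sample line from this article, a copy with the path lowercased
# (for illustration only), and a placeholder line that should be skipped.
line1 = ('xxx.xxx.xxx.xxx - - [04/Oct/2018:02:57:44 +0900] '
         '"GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz3edc5tgb&server=1 HTTP/1.1" '
         '200 14538 "-" "Mozilla/5.0"')
line2 = line1.replace('/phpMyAdmin/', '/phpmyadmin/')
line3 = '(placeholder for any other request)'

for path, count in count_paths([line1, line2, line3]).most_common():
    print(path, count, sep=',')
```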

In conclusion

I have little knowledge of security, so I don't know the right way to harden the server, but at the very least I decided to stop using the default path and the user names and passwords that are commonly used for databases.

That is all for the text. Thank you for reading. The following is an appendix.

Appendix

Here is the Python code that extracts the required information from the log. The lines analyzed look like the following.

xxx.xxx.xxx.xxx - - [04/Oct/2018:02:58:03 +0900] "GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@WSX&server=1 HTTP/1.1" 200 14543 "-" "Mozilla/5.0"

A line is analyzed if it contains "pma_password=". Information is extracted from it by splitting on spaces, specific characters, or keywords. The table below shows each piece of information and how to extract it. This may differ depending on the log format settings, so adjust it accordingly.

Information How to extract
IP address Element 0 when split by spaces
Time stamp String enclosed between "[" and "]"
Method (GET, POST, etc.) Element 5 (counting from 0) when split by spaces, with the leading double quote removed
phpMyAdmin path Element 6 (counting from 0) when split by spaces, up to just before "?"
User name String between "pma_username=" and "&"
Password String between "pma_password=" and "&" (the URL sometimes ends with this parameter; in that case there is no trailing "&", so take everything up to just before the next space)

Not every line can be parsed this way, but I ignore the exceptions because I think a few of them have little effect on the result; I was not strict about that part. After extracting the elements, count how often each item of interest appears (here the phpMyAdmin path, user name, and password). Finally, sort in descending order and print to standard output.
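To check the rules in the table on a single line, they can be wrapped in a small function and applied to the sample line above. This is a minimal sketch: the function name extract is made up, and the second input is the sample line with "&server=1" removed by hand to imitate the case where the password is the last query parameter.

```python
def extract(line):
    """Split one access-log line into the fields described in the table."""
    fields = line.split(' ')
    ip = fields[0]
    time_stamp = line.split('[')[1].split(']')[0]
    method = fields[5][1:]  # drop the leading double quote
    path = fields[6].split('?')[0]
    username = line.split('pma_username=')[1].split('&')[0]
    password = line.split('pma_password=')[1].split('&')[0]
    # If pma_password is the last query parameter there is no "&" after it,
    # so the value runs on into the rest of the line; cut it at the first space.
    password = password.split(' ')[0]
    return ip, time_stamp, method, path, username, password

line = ('xxx.xxx.xxx.xxx - - [04/Oct/2018:02:58:03 +0900] '
        '"GET /phpMyAdmin/index.php?pma_username=root&pma_password=1qaz@WSX&server=1 HTTP/1.1" '
        '200 14543 "-" "Mozilla/5.0"')
print(extract(line))
# ('xxx.xxx.xxx.xxx', '04/Oct/2018:02:58:03 +0900', 'GET', '/phpMyAdmin/index.php', 'root', '1qaz@WSX')

# Hand-made variant: the password is now the last query parameter.
variant = line.replace('&server=1', '')
print(extract(variant)[5])  # 1qaz@WSX
```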

The whole source code looks like the following.

from collections import defaultdict

with open('/path/to/access_log','r') as f:
    logs = f.readlines()

# Extract the lines that are attacks on phpMyAdmin
pma_attacks = [log for log in logs if 'pma_password' in log]

# Extract ip, time_stamp, method, path, username and password
# Rewrite as appropriate for your log format
extracted_pma_infos = []
for pma in pma_attacks:
    # The IP address is element 0 when split by spaces
    ip = pma.split(' ')[0]
    # The time stamp is the string between "[" and "]"
    time_stamp = pma.split('[')[1].split(']')[0]
    # The method (POST, GET, etc.) is element 5 when split by spaces, with the leading double quote removed
    method = pma.split(' ')[5][1:]
    # The path is element 6 when split by spaces, up to just before "?"
    path = pma.split(' ')[6].split('?')[0]
    # The user name is the string from "pma_username=" up to the next "&"
    username = pma.split('pma_username=')[1].split('&')[0]
    # The password is the string from "pma_password=" up to the next "&"
    password = pma.split('pma_password=')[1].split('&')[0]
    # Special case: the password is the last query parameter
    # Then there is no following "&" and the value runs to the end of the line, so cut it just before the first space
    if ' ' in password:
        password = password.split(' ')[0]
    extracted_pma_infos.append([ip,time_stamp,method,path,username,password])

#Histogram of attacked paths, usernames and passwords
pathlist = defaultdict(int)
unlist = defaultdict(int)
pslist = defaultdict(int)
for pma in extracted_pma_infos:
    path = pma[3]
    un = pma[4]
    ps = pma[5]
    pathlist[path]+=1
    unlist[un]+=1
    pslist[ps]+=1

# Take histogram data (dict_obj) and return a list of (key, count) tuples in descending order of count
def orderedlistFromDict(dict_obj):
    # sorted() is stable, so keys with equal counts keep their insertion order
    return sorted(dict_obj.items(), key=lambda item: item[1], reverse=True)

# Lists of attacked paths, user names and passwords, ordered by count
ordered_path = orderedlistFromDict(pathlist)
ordered_un = orderedlistFromDict(unlist)
ordered_ps = orderedlistFromDict(pslist)

# Display path ranking
for val in ordered_path:
    print(val[0],val[1],sep=',')

#Display user name ranking
for val in ordered_un:
    print(val[0],val[1],sep=',')

#Password ranking display
for val in ordered_ps:
    print(val[0],val[1],sep=',')
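
As a side note, the counting and sorting steps can also be written with collections.Counter from the standard library. The following is a minimal sketch of that alternative, not the script that produced the rankings above; the function name ranking is made up, and the input list reuses passwords from the sample log lines with one of them repeated for illustration.

```python
from collections import Counter

def ranking(values):
    """Return (value, count) pairs sorted by count, highest first."""
    return Counter(values).most_common()

# Passwords from the sample log lines, with one repeated for illustration.
passwords = ['1qaz@WSX', '1qaz@123', '1qaz@WSX']
for rank, (value, count) in enumerate(ranking(passwords), start=1):
    print(rank, value, count, sep=',')
# 1,1qaz@WSX,2
# 2,1qaz@123,1
```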
