Python, known as the "frozen tuna" scripting language, is also famous for its many beautifully named packages.
Reference: "7 Python terms you want to read aloud" http://doloopwhile.hatenablog.com/entry/20120120/1327062714
Fascinated by these beautifully named packages, **I decided to seriously investigate how many beautifully named packages exist.**
The reference is a little old (2012). If you search again now, you will surely find **even more beautiful names!!**
Every package you can install with Python's package manager, **pip**, is registered on **PyPI** (https://pypi.org/).
The total is **219,370** packages (as of February 2020), far too many to check by hand.
I want to exclude dormant packages that nobody uses, so I only target **packages that were installed at least once in the last year**. For example, **Pychinko**, which the reference site lists, no longer seems to exist anywhere and is excluded. **Pyzuri** unfortunately seems to have no downloads at all, so it is excluded too.
All of these package names and their download counts can be obtained with **pypinfo** and **BigQuery** (details later).
Because package names are alphanumeric, I run a Japanese conversion that **forcibly reads them as katakana**. (This is fairly difficult, because package names are not simple English words.)
Finally, I search the katakana names using a **"Beautiful Word List"** that I prepared in advance.
Through this steady effort, **I was able to find lots of packages with beautiful ~~funny~~ names!**
Before the code, let me introduce the results. I found many, but I picked 18 of them. If the reference article is the Heisei edition, this is the 18-selection Reiwa edition, abbreviated "**R18**".
Please enjoy the beautiful naming sense of **Paison**, with example sentences.
31,001 downloads in the last year. A tool that determines whether a name is female or male.
Let's tell new programmers about it loudly in April. **Example: If you don't understand something, ask [sex machine]!**
163 downloads in the last year. A form and utility widget library for Mantissa.
Let's shout it at work in April. **Example: I used to play with [methanal] all the time on my days off**
64,492 downloads in the last year. A tool that corrects errors in console commands. https://github.com/nvbn/thefuck
When a command fails, you just type "fuck" and it fixes the command for you. It seems to be popular because it responds to exactly what people blurt out in frustration.
Let's read it aloud in April. **Example: [the fuck] [the fuck] [the fuck]!!**
427 downloads in the last year. A Python package that makes it easy to send data to Microsoft Azure SQL DB. https://github.com/dacker-team/pyzure
The original Pyzuri has disappeared, but a new talent has been discovered.
Let's tell everyone in April. **Example: I'm glad I tried [pyzure] last night**
78 downloads in the last year. A CLI tool for **inserting** data into a remote AskOmics instance.
Let's kindly remind people in April. **Example: Before inserting, first [askocli]**
71 downloads in the last year. A Python 2/3-compatible socket wrapper for Windows and Linux that lets you send and receive complete messages.
Let's confide it secretly in April. **Example: Actually, I'm using [stockings]**
34 downloads in the last year. Details are unknown. There is no documentation, which makes it a little awkward.
Let's talk about it in April. **Example: I'm addicted to [osex] and it's causing me trouble**
488 and 109 downloads in the last year.
There are so many packages that read as "paipai" (names containing pypi or pypy) that I can't list them all. I'm sure many of them are easy to use.
Let's praise it in April. **Example: [mypypi] is the best!**
570 and 1,114 downloads in the last year.
Let's declare it loudly in April. **Example: I'm always [pypandas]**
535 and 40 downloads in the last year.
Let's introduce it to a colleague in April. **Example: Let me show you my [fancy pants]!**
512 downloads in the last year. Robot Framework is a general-purpose automation framework for acceptance testing and robotic process automation (RPA), and this package apparently gives it the shape of a blue raccoon dog?
Don't be afraid of forgetting your homework in April. **Example: If I'm in trouble, I'll ask [doraemon-robot framework]**
49 downloads in the last year. Apparently a web application framework built on the core of Pyramid?
Let's try it in April. **Example: I installed [baka] on my computer**
52 downloads in the last year.
Let's tweet it casually in April. **Example: [hn comments]. Fufufu**
52 and 25 downloads in the last year.
Let's talk about our hopes for the future in April. **Example: Let's start [sexy time] from now on!**
Converting from English to katakana was the hardest part, yet many of these names were already **power words** in English.
By all means, read them aloud at work or school in April. **I'm sure the people around you will feel that spring has arrived.**
The rest of this article is technical detail, so most readers can skip it. If you are interested, please read on.
This article **seriously introduces** beautifully named packages and **explains the code used to collect them**, so **I'm not at all worried about it being "censored" or deleted**.
However, **an adult with a dirty mind** might read it **in a way other than intended**.
**Please note that, due to various circumstances, this article may disappear without warning.** Be sure to try it out before it does.
PyPI, where pip packages are registered, publishes a dataset of its download statistics on **Google BigQuery**, and **pypinfo** is a tool that makes that data easy to query.
To use BigQuery, follow the steps at https://github.com/ofek/pypinfo to create a Google Cloud Platform (GCP) account and authentication credentials (a JSON file).
After creating the JSON file, open Colaboratory (https://colab.research.google.com/?hl=ja) in your browser and run the commands below.
Mount Google Drive.
```python
from google.colab import drive
drive.mount('/content/drive')
```
Create a working folder for this project.
```
!mkdir "drive/My Drive/PYPI"
# Upload the authentication JSON file created earlier to this folder.
```
Install pypinfo.
```
!pip install pypinfo
```
Point pypinfo at the authentication JSON file to register the credentials.
```
!pypinfo --auth "/content/drive/My Drive/PYPI/YourGCPProjectName-XXXXXXXXX.json"
```
Check that pypinfo can reach BigQuery (this is how you get the download count for "requests"):
```
!pypinfo requests
# Served from cache: False
# Data processed: 67.70 GiB
# Data billed: 67.70 GiB
# Estimated cost: $0.34
#
# | download_count |
# | -------------- |
# |     61,319,474 |
```
You can also get breakdowns by country, Python version, installer OS, and more, so try the examples in the official README.
As the "Estimated cost: $0.34" line above shows, BigQuery charges you according to the amount of data read every time you run a query, so be careful. That said, there is an Always Free allowance of 1 TB per month, and new GCP users get $300 of free credit valid for a year, so normal usage should be fine. Just avoid firing off heavy queries that fetch everything over and over.
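The cost figures in these logs can be reproduced with simple arithmetic. The sketch below is not pypinfo's implementation; it assumes the on-demand rate of $5 per TiB scanned that applied in early 2020, treats the 1 TB Always Free allowance as 1 TiB, and rounds up to the next cent. Check the current BigQuery price list before relying on these numbers.

```python
import math

# Assumption: BigQuery on-demand price of $5 per TiB scanned (early-2020 rate)
PRICE_PER_TIB_USD = 5.0
# Assumption: the 1 TB/month "Always Free" allowance, treated as 1 TiB
FREE_TIER_GIB_PER_MONTH = 1024.0


def estimate_cost_usd(gib_billed: float) -> float:
    """Estimated cost of one query, rounded up to the next cent."""
    raw = gib_billed / 1024 * PRICE_PER_TIB_USD
    return math.ceil(raw * 100) / 100


def queries_per_free_month(gib_per_query: float) -> int:
    """How many queries of this size fit into one month's free allowance."""
    return int(FREE_TIER_GIB_PER_MONTH // gib_per_query)


print(estimate_cost_usd(67.70))        # the "requests" query above -> 0.34
print(estimate_cost_usd(636.49))       # the one-year query below  -> 3.11
print(queries_per_free_month(636.49))  # -> 1
```

Under these assumptions, the one-year query alone uses more than half of a month's free allowance, which is why it pays to run it once and keep the saved result.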
Now let's finally run the query that collects the data for this article.
The following queries the last year and saves the result to a file.
```
!pypinfo --days 365 --limit 250000 "" project > "drive/My Drive/PYPI/PYPINFO_365_LIST.txt"
# Served from cache: False
# Data processed: 636.49 GiB
# Data billed: 636.49 GiB
# Estimated cost: $3.11
# | project  | download_count |
# | -------- | -------------- |
# | urllib3  |    950,108,414 |
# | six      |    788,263,157 |
# | botocore |    693,156,212 |
# | requests |    656,942,399 |
# ... (rest omitted)
```
For reference, there were about 37,498,000,000 downloads in total over the last year, spread across about 215,000 distinct packages.
Since there are about 220,000 packages in total, most registered packages are still "alive" on a one-year basis. Pychinko, which reportedly existed before, is gone, so packages may be cleaned out periodically. Over the last 30 days there were about 134,000 distinct packages, so perhaps fewer than 100,000 are in reasonably regular use?
The file of package names and download counts obtained above is easy for people to browse, but to handle it programmatically you need to parse it.
Be careful to remove the cost lines at the top of the output, the table's header and separator rows, the Total row, and so on. Process it as follows to turn it into a list.
Read the result file and parse it into a list
```python
pypinfo_list = []
with open('/content/drive/My Drive/PYPI/PYPINFO_365_LIST.txt') as f:
    for line in f:
        # Table rows contain exactly three '|' characters. The header,
        # separator and Total rows also match, so they are removed below.
        if line.count('|') != 3:
            continue
        # Remove the newline, half-width spaces and thousands separators
        parsed_line = line.replace('\n', '').replace(' ', '').replace(',', '')
        one_data = parsed_line.split('|')
        # ['', 'urllib3', '950108414', ''] -> keep the middle two items
        # Note: the download count is still a string at this point
        pypinfo_list.append(one_data[1:3])

# Remove the first two rows (header and separator) and the last row (Total)
pypinfo_list = pypinfo_list[2:-1]
```
pypinfo may already provide something like this, but I wrote the parser myself. Since each full query costs about $3, parsing the saved text output also spares you a second query just to get the data in another format. I think the download list and this parser are also useful for other Python-related data analysis.
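For that kind of analysis, it may help to wrap the same parsing logic in a reusable function that also converts the counts to integers, so totals like the ones quoted earlier can be computed with `sum()`. This is a sketch assuming the table layout shown in the log (header, separator, data rows, then a Total row). `parse_pypinfo_table` is a hypothetical helper name, and the sample input is illustrative.

```python
from typing import List, Tuple


def parse_pypinfo_table(text: str) -> List[Tuple[str, int]]:
    """Parse pypinfo's text table into (project, download_count) pairs."""
    rows = []
    for line in text.splitlines():
        # Only table rows have exactly three '|' characters
        if line.count('|') != 3:
            continue
        cells = line.replace(' ', '').replace(',', '').split('|')
        rows.append((cells[1], cells[2]))
    # Drop the header and separator rows and the trailing Total row,
    # then convert the download counts from strings to integers
    return [(name, int(count)) for name, count in rows[2:-1]]


# Illustrative input: the two counts come from the log above,
# and the Total row is simply their sum
sample = """Served from cache: False
Estimated cost: $3.11

| project | download_count |
| ------- | -------------- |
| urllib3 |    950,108,414 |
| six     |    788,263,157 |
| Total   |  1,738,371,571 |
"""

packages = parse_pypinfo_table(sample)
total_downloads = sum(count for _, count in packages)
print(len(packages), total_downloads)  # 2 1738371571
```

If a file lacks the Total row, the `[2:-1]` slice would silently drop a real package, so it is worth checking the last line of the file once.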
Now that we have a list of package names, how do we turn names such as urllib or python-dateutil into katakana?
I took a four-step approach.
For the first step, "English words to katakana", I used the conversion table from **alkana.py**: https://github.com/cod-sushi/alkana.py/blob/master/README_ja.md
For steps 2 to 4, I built a conversion table of about 330 rows, based mainly on romaji (Roman alphabet) rules. Adding it to the alkana.py data above gives the combined conversion table alkana_list.
The key point here is to sort alkana_list in descending order, using the length of the English string as the key.
```python
# x[0] of each entry holds the length of the English string, set in advance
alkana_list = sorted(alkana_list, key=lambda x: x[0], reverse=True)
# For high-priority entries such as py and python, registering an
# artificially large length, e.g. [30, 'py', 'pie'], raises their priority.
```
The conversions are now applied in order, starting from the longest string. The actual conversion is below. Because of the data volume, one run takes about 50 minutes. tqdm shows the progress along the way, and saving the result with pickle afterwards makes it easier to reuse.
Add katakana readings to all modules
```python
import pickle
from tqdm import tqdm

pypinfo_jp_list = []
for pypinfo in tqdm(pypinfo_list):
    # Japanese module name (still English at this point)
    jp_module_name = pypinfo[0]
    for data in alkana_list:
        # Apply the conversion table entries in order (longest first)
        jp_module_name = jp_module_name.replace(data[1], data[2])
    pypinfo_jp_list.append([pypinfo[0], jp_module_name, int(pypinfo[1])])

print(len(pypinfo_jp_list))
print(pypinfo_jp_list[0:10])

with open('/content/drive/My Drive/PYPI/pypinfo_jp_list.pickle', 'wb') as f:
    pickle.dump(pypinfo_jp_list, f)
```
This list might also be useful as a rather special natural language processing resource.
Finally, search for packages that contain specific keywords. Register **your favorite words** in Beautiful_tango_list in advance and simply loop over them. Keep in mind that including very common terms such as "pai" will make the results huge. This time, I borrowed the words from a certain "site that lists elegant words".
Colaboratory seems to cap print output at about 5,000 lines, so if you expect around 10,000 lines, it is better to write the output to a file as follows.
Search for each word in Beautiful_tango_list and write the results to a text file
```python
result_str = ""
for word in Beautiful_tango_list:
    result_str += "■" + " " + word + "\n"
    for data in pypinfo_jp_list:
        # data[1] is the katakana reading of the package name
        if word in data[1]:
            result_str += str(data) + "\n"
    result_str += "\n"

with open('/content/drive/My Drive/PYPI/Beautiful_Result.txt', 'w') as f:
    print(result_str, file=f)
```
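When a word matches many packages, it may help to sort the hits for each word by download count so the most-used packages come first. This is a sketch of that variation, not the code used for the results above. The three sample rows, their readings and download counts, and the search words are all made up for illustration, in the same [english, katakana, downloads] shape as pypinfo_jp_list.

```python
from collections import defaultdict

# Hypothetical sample rows in the [english_name, katakana_name, downloads] shape
pypinfo_jp_list = [
    ['pypandas', 'パイパンダス', 1114],
    ['mypypi', 'マイパイパイ', 109],
    ['pandas', 'パンダス', 5000],
]
Beautiful_tango_list = ['パンダ', 'マイ']  # hypothetical search words


def search_words(words, rows):
    """Map each word to the rows whose katakana name contains it,
    most-downloaded first."""
    hits = defaultdict(list)
    for word in words:
        for english, katakana, downloads in rows:
            if word in katakana:
                hits[word].append((downloads, english, katakana))
        # Sorting (downloads, ...) tuples in reverse puts the largest count first;
        # this also creates an empty entry for words with no hits
        hits[word].sort(reverse=True)
    return hits


for word, matches in search_words(Beautiful_tango_list, pypinfo_jp_list).items():
    print('■', word)
    for downloads, english, katakana in matches:
        print(f'  {english} ({katakana}): {downloads:,} DL')
```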
**Thanks for sticking with it!** Making full use of these techniques and this code, I was able to find many beautifully named packages, like the results above.
**Python's naming sense** runs deep. Just as your eye drifts to the next entry when you look up a word in a dictionary, it would be wonderful if a chance encounter with a name led you to a favorite package.
You could call it **a fateful encounter brought about by PyPI**. Isn't it nice if your interest in a technology starts with "I was curious about the name"?
This article introduces packages very seriously, so any resemblance to a meaning other than the intended one is unintentional. **If you are an adult with a dirty mind, please don't throw stones.** Best regards.
**Come on, everyone, did you enjoy the Pieeen and Pieeen?**
To the wise readers who have read this far: **the words hidden in the ○ are obvious**.
If anyone misreads this article or complains about something that isn't in it, it must be **someone who is always thinking about such things**.
That's all from the field.