[PYTHON] Summary of OSS tools and libraries created in 2016

Overview and examples of the following OSSs created in 2016

--CLI tools x 2 --Python library x 10

CLI tool

sqlitebiter A tool to convert CSV / ʻExcel / HTML/JSON / LTSV/Markdown / TSV/Google Sheets to SQLite` database files. You can also convert from the specified URL.

from-file


$ ls
sample_data.csv  sample_data.xlsx  sample_data_multi.json  sample_data_single.json
$ sqlitebiter file * -o sample.sqlite
[INFO] convert 'sample_data.csv' to 'sample_data' table
[INFO] convert 'sample_data.xlsx' to 'samplesheet1' table
[INFO] convert 'sample_data.xlsx' to 'samplesheet3' table
[INFO] convert 'sample_data_multi.json' to 'sample_data_multi_table_b' table
[INFO] convert 'sample_data_multi.json' to 'sample_data_multi_table_a' table
[INFO] convert 'sample_data_single.json' to 'sample_data_single_json3' table
$ ls sample.sqlite
sample.sqlite

from-URL


$ sqlitebiter -v url "https://en.wikipedia.org/wiki/Comparison_of_firewalls"
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html1 (Firewall TEXT, License TEXT, CostUsageLimits TEXT, OS TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html2 (Firewall TEXT, License TEXT, Cost TEXT, OS TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html3 (CanTarget TEXT, Changingdefaultpolicytoacceptrejectbyissuingasinglerule TEXT, IPdestinationaddresses TEXT, IPsourceaddresses TEXT, TCPUDPdestinationports TEXT, TCPUDPsourceports TEXT, EthernetMACdestinationaddress TEXT, EthernetMACsourceaddress TEXT, Inboundfirewallingress TEXT, Outboundfirewallegress TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html4 (Can TEXT, [workatOSILayer4statefulfirewall] TEXT, [workatOSILayer7applicationinspection] TEXT, ChangeTTLTransparenttotraceroute TEXT, ConfigureREJECTwithanswer TEXT, DMZdemilitarizedzoneallowsforsingleseveralhostsnottobefirewalled TEXT, Filteraccordingtotimeofday TEXT, RedirectTCPUDPportsportforwarding TEXT, RedirectIPaddressesforwarding TEXT, FilteraccordingtoUserAuthorization TEXT, TrafficratelimitQoS TEXT, Tarpit TEXT, Log TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html5 (Features TEXT, ConfigurationGUItextorbothmodes TEXT, [RemoteAccessWebHTTPTelnetSSHRDPSerialCOMRS232] TEXT, Changeruleswithoutrequiringrestart TEXT, Abilitytocentrallymanageallfirewallstogether TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html6 (Features TEXT, Modularitysupportsthirdpartymodulestoextendfunctionality TEXT, [IPS : Intrusion prevention system] TEXT, OpenSourceLicense TEXT, [supports IPv6 ?] TEXT, ClassHomeProfessional TEXT, OperatingSystemsonwhichitruns TEXT)' table
[INFO] convert 'https://en.wikipedia.org/wiki/Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html7 (Can TEXT, [NAT44staticdynamicwoportsPAT] TEXT, [NAT64NPTv6] TEXT, IDSIntrusionDetectionSystem TEXT, VPNVirtualPrivateNetwork TEXT, AVAntiVirus TEXT, Sniffer TEXT, Profileselection TEXT)' table

tcconfig A wrapper tool for the network bandwidth control command tc. I made it because I immediately forget the setting format of tc. The following three types of commands are included.

-- tcset: tc filter settings -- tcdel: tc filter removed -- tcshow: Show current tc settings

tcset can filter both ingress and outbound packets.

tcconfig-example


# tcset --device eth0 --delay 10 --delay-distro 2  --loss 0.01 --rate 0.25M --network 192.168.0.10 --port 8080
# tcset --device eth0 --delay 1 --loss 0.02 --rate 500K --direction incoming
# tcshow --device eth0
{
    "eth0": {
        "outgoing": {
            "network=192.168.0.10/32, port=8080": {
                "delay": "10.0",
                "loss": "0.01",
                "rate": "250K",
                "delay-distro": "2.0"
            },
            "network=0.0.0.0/0": {}
        },
        "incoming": {
            "network=0.0.0.0/0": {
                "delay": "1.0",
                "loss": "0.02",
                "rate": "500K"
            }
        }
    }
}

Python library

pytablewriter A library for writing tables. Supports multi-byte characters.

Supported formats: CSV / HTML / JavaScript / JSON / LTSV / Markdown / MediaWiki / ʻExcel / Pandas/Python / reStructuredText/TOML /TSV`.

-example


import io
import pytablewriter

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = u"Creational patterns".encode("utf_8")
writer.header_list = [u"Pattern name".encode("utf_8"), "Overview", "GoF", "Code Complete[1]"]
writer.value_matrix = [
    ["Abstract Factory", u"It provides a way to appropriately generate a set of related instances depending on the situation.".encode("utf_8"), "Yes", "Yes"],
    ["Builder", "Hide the process of creating a complex instance.", "Yes", "No"],
    ["Factory Method", "It provides an instance generation method that does not depend on the actually generated instance.", "Yes", "Yes"],
    ["Prototype", "Duplicate the prototype instance to create a similar instance.", "Yes", "No"],
    ["Singleton", "Guarantee a single instance for a class.", "Yes", "Yes"],
]

with io.open("multibyte_table_output.txt", "w", encoding="utf-8") as f:
    writer.stream = f
    writer.write_table()

multibyte_table_output.txt


#Creational patterns
Pattern name|Overview|GoF|Code Complete[1]
----------------|----------------------------------------------------------------------------|---|----------------
Abstract Factory|It provides a way to appropriately generate a set of related instances depending on the situation.|Yes|Yes             
Builder         |Hide the process of creating a complex instance.|Yes|No              
Factory Method  |It provides an instance generation method that does not depend on the actually generated instance.|Yes|Yes             
Prototype       |Duplicate the prototype instance to create a similar instance.|Yes|No              
Singleton       |Guarantee a single instance for a class.|Yes|Yes             

When drawn, it looks like ↓:

Pattern name Overview GoF Code Complete[1]
Abstract Factory It provides a way to appropriately generate a set of related instances depending on the situation. Yes Yes
Builder Hide the process of creating a complex instance. Yes No
Factory Method It provides an instance generation method that does not depend on the actually generated instance. Yes Yes
Prototype Duplicate the prototype instance to create a similar instance. Yes No
Singleton Guarantee a single instance for a class. Yes Yes

SimpleSQLite Wrapper library for the sqlite3 module in Python built-in. To simplify the creation of SQLite database. For details → Created Python library SimpleSQLite that simplifies SQLite table creation / data insertion

DateTimeRange A library for dealing with time ranges. Available during daylight saving time. For details → Created Python library DateTimeRange that handles time range

DataProperty A library for extracting attribute information such as type information and length as a character string. It can be a single data or a data array. It is also possible to extract the information of each column from the data matrix.

Example of extracting attribute information from a single data


>>> from dataproperty import DataProperty
>>> DataProperty(-1.1)
data=-1.1, typename=FLOAT, align=right, str_len=4, ascii_char_width=4, integer_digits=1, decimal_places=1, additional_format_len=1

Example of extracting the attribute information of each column from the data matrix


import sys
import datetime
from dataproperty import DataPropertyExtractor, Typecode

def display_col_dp(dp_list, attr_name):
    """show a value assocciated with an attribute for each
    DataProperty instance in the dp_list"""

    print()
    print("---------- {:s} ----------".format(attr_name))
    print([getattr(dp, attr_name) for dp in dp_list])

def main():
    # sample data definitions
    dt = datetime.datetime(2017, 1, 1, 0, 0, 0)
    inf = float("inf")
    nan = float("nan")
    data_matrix = [
        [1, 1.1,  "aa",   1,   1,     True,   inf,   nan,   dt],
        [2, 2.2,  "bbb",  2.2, 2.2,   False,  "inf", "nan", dt],
        [3, 3.33, "cccc", -3,  "ccc", "true", inf,
            "NAN", "2017-01-01T01:23:45+0900"],
    ]

    # extract property for each column from a matrix
    dp_extractor = DataPropertyExtractor()
    dp_extractor.header_list = [
        "int", "float", "str", "num", "mix", "bool", "inf", "nan", "time"]
    dp_extractor.data_matrix = data_matrix
    col_dp_list = dp_extractor.to_col_dataproperty_list()

    print("---------- typename ----------")
    print([Typecode.get_typename(dp.typecode) for dp in col_dp_list])

    display_col_dp(col_dp_list, "align")
    display_col_dp(col_dp_list, "ascii_char_width")
    display_col_dp(col_dp_list, "decimal_places")

if __name__ == "__main__":
    sys.exit(main())

output


---------- typename ----------
['INTEGER', 'FLOAT', 'STRING', 'FLOAT', 'STRING', 'BOOL', 'INFINITY', 'NAN', 'STRING']

---------- align ----------
[right, right, left, right, left, left, left, left, left]

---------- ascii_char_width ----------
[3, 5, 4, 4, 3, 5, 8, 3, 24]

---------- decimal_places ----------
[0, 2, nan, 1, 1, nan, nan, nan, nan]

pingparsing ping A library for parsing command execution results. You can also execute ping.

input


>ping google.com -n 10 > ping_win.txt

>type ping_win.txt

Pinging google.com [216.58.196.238] with 32 bytes of data:
Reply from 216.58.196.238: bytes=32 time=87ms TTL=51
Reply from 216.58.196.238: bytes=32 time=97ms TTL=51
Reply from 216.58.196.238: bytes=32 time=56ms TTL=51
Reply from 216.58.196.238: bytes=32 time=95ms TTL=51
Reply from 216.58.196.238: bytes=32 time=194ms TTL=51
Reply from 216.58.196.238: bytes=32 time=98ms TTL=51
Reply from 216.58.196.238: bytes=32 time=93ms TTL=51
Reply from 216.58.196.238: bytes=32 time=96ms TTL=51
Reply from 216.58.196.238: bytes=32 time=96ms TTL=51
Reply from 216.58.196.238: bytes=32 time=165ms TTL=51

Ping statistics for 216.58.196.238:
    Packets: Sent = 10, Received = 10, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 56ms, Maximum = 194ms, Average = 107ms

Parse the execution result of ping like ↑. The execution result of Sample using the library is ↓

output


parse_sample.py -f ping_win.txt
# properties ---
packet_transmit: 10
packet_receive: 10
packet_loss: 0.0
rtt_min: 56.0
rtt_avg: 107.0
rtt_max: 194.0
rtt_mdev: None

# asdict ---
{
    "packet_loss": 0.0,
    "packet_transmit": 10,
    "rtt_min": 56.0,
    "rtt_avg": 107.0,
    "packet_receive": 10,
    "rtt_max": 194.0,
    "rtt_mdev": null
}

pathvalidate A library that validates character strings such as file names and replaces invalid characters.

Windows/Example of replacing with a valid string as a file name on both Linux


import pathvalidate

filename = "_a*b:c<d>e%f/(g)h+i_0.txt"
print(pathvalidate.sanitize_filename(filename))
_abcde%f(g)h+i_0.txt

mbstrdecoder A library that makes strings Unicode. This has reduced the annoyance of Unicode.

from mbstrdecoder import MultiByteStrDecoder

encoded_multibyte_text = u"Multibyte character".encode("utf-8")

#Both encoded and Unicode strings can be used.
decoder = MultiByteStrDecoder(encoded_multibyte_text)

print(encoded_multibyte_text)
print(decoder.unicode_str)
print(decoder.codec)

output


b'\xe3\x83\x9e\xe3\x83\xab\xe3\x83\x81\xe3\x83\x90\xe3\x82\xa4\xe3\x83\x88\xe6\x96\x87\xe5\xad\x97'
Multibyte character
utf_8

pytablereader A library for extracting table data from the text / files of CSV / ʻExcel / HTML/JSON / LTSV/Markdown / TSV`.

sqliteschema A library to get the schema of SQLite database files.

sqliteschema-example


import sys
import simplesqlite
import sqliteschema

def make_database():
    db_path = "example.sqlite"
    con = simplesqlite.SimpleSQLite(db_path, "w")

    con.create_table_from_data_matrix(
        table_name="sampletable0",
        attr_name_list=["attr_a", "attr_b"],
        data_matrix=[[1, 2], [3, 4]])

    con.create_table_from_data_matrix(
        table_name="sampletable1",
        attr_name_list=["foo", "bar", "hoge"],
        data_matrix=[
            [1, 2.2, "aa"],
            [3, 4.4, "bb"],
        ],
        index_attr_list=("foo", "hoge"))

    con.create_table(
        "constraints",
        [
            "primarykey_id INTEGER PRIMARY KEY",
            "notnull_value REAL NOT NULL",
            "unique_value INTEGER UNIQUE",
        ]
    )

    return db_path

def main():
    db_path = make_database()
    extractor = sqliteschema.SqliteSchemaExtractor(db_path)
    print(extractor.dumps())

    return 0

if __name__ == "__main__":
    sys.exit(main())

output


.. table:: sampletable0

    +--------------+---------+-----------+--------+------+-----+
    |Attribute name|Data type|Primary key|Not NULL|Unique|Index|
    +==============+=========+===========+========+======+=====+
    |attr_a        |INTEGER  |           |        |      |     |
    +--------------+---------+-----------+--------+------+-----+
    |attr_b        |INTEGER  |           |        |      |     |
    +--------------+---------+-----------+--------+------+-----+

.. table:: sampletable1

    +--------------+---------+-----------+--------+------+-----+
    |Attribute name|Data type|Primary key|Not NULL|Unique|Index|
    +==============+=========+===========+========+======+=====+
    |foo           |INTEGER  |           |        |      |X    |
    +--------------+---------+-----------+--------+------+-----+
    |bar           |REAL     |           |        |      |     |
    +--------------+---------+-----------+--------+------+-----+
    |hoge          |TEXT     |           |        |      |X    |
    +--------------+---------+-----------+--------+------+-----+

.. table:: constraints

    +--------------+---------+-----------+--------+------+-----+
    |Attribute name|Data type|Primary key|Not NULL|Unique|Index|
    +==============+=========+===========+========+======+=====+
    |primarykey_id |INTEGER  |X          |        |      |     |
    +--------------+---------+-----------+--------+------+-----+
    |notnull_value |REAL     |           |X       |      |     |
    +--------------+---------+-----------+--------+------+-----+
    |unique_value  |INTEGER  |           |        |X     |     |
    +--------------+---------+-----------+--------+------+-----+

subprocrunner A wrapper library for the subprocess module.

subprocrunner-example


from subprocrunner import SubprocessRunner

runner = SubprocessRunner("echo test")
print("return code: {:d}".format(runner.run()))
print("stdout: {}".format(runner.stdout))

output


return code: 0
stdout: test

Recommended Posts

Summary of OSS tools and libraries created in 2016
Summary of modules and classes in Python-TensorFlow2-
Summary of tools used in Command Line vol.8
Summary of tools used in Command Line vol.5
Summary of tools needed to analyze data in Python
Summary of date processing in Python (datetime and dateutil)
Summary of various operations in Tensorflow
Summary of Python indexes and slices
[201911 ~ 201912] Recently created miscellaneous programs and tools, etc.
Summary of methods often used in pandas
Summary of frequently used commands in matplotlib
Summary of various for statements in Python
Summary of stumbling blocks in installing CaboCha
Screenshots of Megalodon in selenium and Chrome.
Separation of design and data in matplotlib
Summary of probability distributions that often appear in statistics and data analysis
Summary of built-in methods in Python list
Project Euler # 1 "Multiples of 3 and 5" in Python
Correspondence summary of array operation of ruby and python
Summary of the differences between PHP and Python
Summary of how to import files in Python 3
Summary of what was used in 100 Pandas knocks (# 1 ~ # 32)
A memo created in a package and registered in PyPI
Summary of how to use MNIST in Python
Installation of Python3 and Flask [Environment construction summary]
Header shifts in read_csv () and read_table () of Pandas
Summary of evaluation functions used in machine learning
Coexistence of Anaconda 2 and Anaconda 3 in Jupyter + Bonus (Julia)
I / O related summary of python and fortran
Explanation of edit distance and implementation in Python
Summary of pickle and unpickle processing of user-defined class
Summary of Excel operations using OpenPyXL in Python
OSS and animals
[Introduction to Python] Summary of functions and methods that frequently appear in Python [Problem format]
Full-width and half-width processing of CSV data in Python
About import error of numpy and scipy in anaconda
Calculation of standard deviation and correlation coefficient in Python
Summary of Linux (UNIX) commands that appeared in Progate
List of Python libraries for data scientists and data engineers
Python --Explanation and usage summary of the top 24 packages
[Python] Type Error: Summary of error causes and remedies for'None Type'
Difference between Ruby and Python in terms of variables
Summary of tools for operating Windows GUI with Python
A quick comparison of Python and node.js test libraries
[python] Calculation of months and years of difference in datetime
[Competition Pro] Summary of stock buying and selling problems
Overview of generalized linear models and implementation in Python
Sample of getting module name and class name in Python
Explanation of package tools and commands for Linux OS