What's new in Python 3.9 (Summary)

Introduction

I have been writing articles summarizing "What's New" for each release since Python 3.5.

From now on the release cycle will be shorter, so I expect to be posting more often. As a veteran engineer who has followed Python since 1.x, I want to keep it up.

Also, as I wrote in this article, Python 2 reached end-of-life on January 1 of this year. I'm looking forward to seeing what new features come in 3.9. It is the first release since Python 3 became the only maintained version of Python.

As last time, I will organize things the same way: small changes are covered in this article, and large changes get their own article, linked from here. Instead of covering every change, I will focus on the ones I personally care about.

First, the development roadmap (PEP 596). Starting with this release, Python moves to a one-year release cycle.

3.9 development started: 2019-06-04 (completed)
3.9.0 alpha 1: 2019-11-19 (completed)
3.9.0 alpha 2: 2019-12-16 -> 2019-12-18 (completed)
3.9.0 alpha 3: 2020-01-13 -> 2020-01-25 (completed)
3.9.0 alpha 4: 2020-02-17 -> 2020-02-26 (completed)
3.9.0 alpha 5: 2020-03-16 -> 2020-03-23 (completed)
3.9.0 alpha 6: 2020-04-22 -> 2020-04-28 (completed)
3.9.0 beta 1: 2020-05-18 -> 2020-05-19 (completed; no new features are added after this)
3.9.0 beta 2: 2020-06-08
3.9.0 beta 3: 2020-06-29
3.9.0 beta 4: 2020-07-20
3.9.0 candidate 1: 2020-08-10
3.9.0 candidate 2: 2020-09-14
3.9.0 final: 2020-10-05

Change log

2020.05.31

The first beta was released on 2020-05-19, almost on schedule. With it, the set of new features in 3.9 is now fixed.

The following has been added to "Newly added modules".

zoneinfo module

The following was added to "Module improvements".

datetime module (change to `isocalendar()`)
functools module (addition of `TopologicalSorter`)
hashlib module (build option to disable the built-in implementations)
ipaddress module (support for scoped IPv6 addresses)
math module (extension of `gcd`, addition of `lcm`)
os module (addition of `os.pidfd_open()` and `os.P_PIDFD`)
random module (addition of `Random.randbytes()`)

2020.05.04

The final alpha, a6, was released on 2020-04-28. It was still about a week behind schedule.

The following have been added to "New features of interest".

Methods to remove prefixes and suffixes from strings
Type hinting with standard collection types
A new parser

The following was added to "Module improvements".

typing module (addition of `typing.Annotated`, etc.)

2020.04.06

a4 was released on 2020-02-26 and a5 on 2020-03-23. Both were roughly on schedule, each about a week late. I had planned to write about new features after the beta came out, but quite a lot had accumulated, so I updated the article. I can't cover everything at once, so I'll add things little by little.

Added a release highlight ("Check the Deprecation Warning!")
Added "You can use union operators with dictionaries"

2020.01.27

a3 was released on 2020-01-25. As far as I can see, there have been no major updates since January 18. The notable item is a fix for a GC (garbage collection) bug. a3 came out a little late, but the schedule for the next release, a4, has not changed.

2020.01.18

First version. a2 was released on 2019-12-18, and this article is based on its What's New. a3 was scheduled for 2020-01-13, so I thought about waiting for it. It seems to be running late, though, so I'm publishing this for now.

Release highlights

Check the Deprecation Warning!

Python 2.7's support period has ended. The backward-compatibility features that were kept around until now have been removed or will be removed soon. They have been emitting DeprecationWarning for years, but if you don't check carefully, your code may stop working as soon as you upgrade. To check, you can run with `-W default` to display the warnings, or take the plunge and turn them into errors with `-W error`.
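You can also run the same check from inside Python with the standard `warnings` module, for example in a test suite. The sketch below is minimal. `old_api` is a hypothetical stand-in for whatever deprecated call your code makes. On the command line, `-W error::DeprecationWarning` limits the escalation to deprecation warnings only.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for illustration."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_strict(func, *args, **kwargs):
    """Call func with DeprecationWarning turned into an exception (like -W error)."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def collect_deprecations(func, *args, **kwargs):
    """Call func and return the messages of any DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes repeated warnings show up every time.
        warnings.simplefilter("always", DeprecationWarning)
        func(*args, **kwargs)
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


if __name__ == "__main__":
    print(collect_deprecations(old_api))  # ['old_api() is deprecated']
    try:
        call_strict(old_api)
    except DeprecationWarning as exc:
        print("error:", exc)  # error: old_api() is deprecated
```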

New features of interest

You can use union operators with dictionaries

→ Separate article: New features of Python 3.9 (1)-Union operators can be used in dictionary type
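The details are in the separate article, so here is only a brief sketch of the two operators. The dictionary contents are made up for illustration.

```python
defaults = {"color": "red", "size": 10}
overrides = {"size": 12, "debug": True}

# `|` builds a new dict; on duplicate keys the right-hand operand wins.
merged = defaults | overrides
print(merged)  # {'color': 'red', 'size': 12, 'debug': True}

# `|=` updates the left-hand dict in place, like dict.update().
config = dict(defaults)
config |= overrides
print(config == merged)  # True

# Before 3.9, the closest equivalent was unpacking into a new dict.
legacy = {**defaults, **overrides}
print(legacy == merged)  # True
```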

PEP 616: Methods to remove prefixes and suffixes from strings

The methods `removeprefix(prefix)` and `removesuffix(suffix)` have been added to str, bytes, bytearray, and collections.UserString. They cut a given substring off the beginning or end of a string (or byte string). For example:

>>> 'test_sample.txt'.removeprefix('test_')
'sample.txt'
>>> 'abc.doc'.removesuffix('.doc')
'abc'

You might think, "Huh? Wasn't there already a method like that?", but there wasn't. There are similar methods, lstrip and rstrip, which are normally used to strip whitespace like this:

>>> '   spacey_head'.lstrip()
'spacey_head'
>>> 'spacey_tail   '.rstrip()
'spacey_tail'

When given an argument, however, they have a strange (?) specification. The argument is treated as a set of characters, and characters are removed from that end of the string until one is found that is not in the set. This makes them easy to misuse. For example, trying to do the same as the earlier example:

>>> 'abc.doc'.rstrip('.doc')
'ab'
>>> 'test_sample.txt'.lstrip('test_')
'ample.txt'

More is removed than you expected, which can be a surprise. This behavior has confused many Python users, and it is one of the reasons for adding dedicated methods to remove prefixes and suffixes.
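If your code must also run on 3.8 or earlier, a small helper with the same semantics as the new methods is easy to write. This is a sketch, and the names `remove_prefix` and `remove_suffix` are hypothetical. On 3.9 and later, you would simply call the methods.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Return text without prefix if it starts with it, else text unchanged."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text: str, suffix: str) -> str:
    """Return text without suffix if it ends with it, else text unchanged."""
    # The `suffix` guard matters: text[:-0] would be the empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


if __name__ == "__main__":
    print(remove_prefix("test_sample.txt", "test_"))  # sample.txt
    print(remove_suffix("abc.doc", ".doc"))           # abc
    # Unlike lstrip/rstrip, the argument is a whole substring, not a set of characters.
    print(remove_suffix("abc.doc", "doc."))           # abc.doc
```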

PEP 585: Type hinting with standard collection types

Until now, to annotate with a collection type such as a list or tuple, you used List or Tuple from the typing module:

import typing
def greet_all(names: typing.List[str]) -> None:
    for name in names:
        print("hello", name)

Many people would like to use the type names list and tuple directly, just like `str` and `int`. That becomes possible in 3.9. In other words,

def greet_all(names: list[str]) -> None:
    for name in names:
        print("hello", name)

can now be written as is, with nothing imported from typing.
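Here is a slightly larger sketch that uses the built-in generics for both the parameter and the return type. It also shows that subscripting `list` creates an alias object at runtime. The function itself is just an illustration.

```python
def count_words(lines: list[str]) -> dict[str, int]:
    """Count word occurrences; the annotations use built-in generic types."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_words(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
    # At runtime, list[str] is a real object that records its parts.
    alias = list[str]
    print(alias.__origin__, alias.__args__)  # <class 'list'> (<class 'str'>,)
```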

PEP 617: A new parser

The part of a language implementation that reads and analyzes source code is called a parser. Until now, Python has used an LL(1) parser, which parses top-down with only one token of look-ahead. Parsers of this kind are common in programming languages, alongside bottom-up LR parsers. They parse efficiently, but the grammars they can handle are limited. Python already has syntax that LL(1) cannot handle, so some special logic is built in as a post-processing step after parsing. This time, a new PEG parser is introduced so that the parser alone can handle a wider range of syntax.

For example, to bind multiple context managers in a with statement, you might want to write:

with (
    open("a_really_long_foo") as foo,
    open("a_really_long_baz") as baz,
    open("a_really_long_bar") as bar
):
    ...

The current parser cannot handle this. The new parser is being introduced to handle this kind of thing.

Python 3.9 still includes the traditional LL(1) parser. The new parser is the default, but you can select the old one with the command-line option `-X oldparser` or the environment variable `PYTHONOLDPARSER=1`. This is a transitional measure. In Python 3.10 the old parser will be removed, leaving only the new one, and the language specification will be based on it.

Other language changes

Relative import errors now raise `ImportError` instead of `ValueError`

Within a package hierarchy, you can import a module from a parent package with a relative import such as

from .. import module_1

Doing this from a module at the top level of a package, for example, is an error. Until now this raised

ValueError: attempted relative import beyond top-level package

but from 3.9 it raises

ImportError: attempted relative import beyond top-level package

instead. Similarly, the error raised by `importlib.util.resolve_name()` changes from `ValueError` to `ImportError`. This is an incompatible change, so code that catches and handles this exception needs to be updated, but it is a legitimate fix.
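You can observe the change without building a real package by calling `importlib.util.resolve_name()` directly. It only manipulates names and does not import anything. The sketch below catches both exception types, so the same code behaves the same before and after 3.9. The package names `pkg` and `pkg.sub` and the helper `try_resolve` are hypothetical.

```python
import importlib.util


def try_resolve(name: str, package: str):
    """Resolve a relative module name, returning None if it is invalid.

    From 3.9 the failure is ImportError; older versions raised ValueError,
    so both are caught here to keep the code working across versions.
    """
    try:
        return importlib.util.resolve_name(name, package)
    except (ImportError, ValueError):
        return None


if __name__ == "__main__":
    print(try_resolve(".module_1", "pkg.sub"))   # pkg.sub.module_1
    print(try_resolve("..module_1", "pkg.sub"))  # pkg.module_1
    print(try_resolve("..module_1", "pkg"))      # None (beyond top-level package)
```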

The path of a directly executed script becomes absolute

When you run python script.py, the file name of the executed script is stored in the `__file__` attribute. Until now it was a relative path, exactly as written on the command line, but from 3.9 it is an absolute path. (Update: What's New originally said that the value of `sys.path[0]` also becomes an absolute path. That was already the case in earlier versions, so I thought it was a mistake in the document. I filed a bug report, and the document has been fixed.)

Looking back through past discussions, it seems there was also an attempt to change `sys.argv[0]` to an absolute path. However, someone objected that the impact would be too large, and the idea seems to have been dropped. Personally I'm glad, since that seemed like a bit of overkill.
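If your code needs an absolute script location on every Python version, it can normalize the path explicitly instead of relying on this change. The sketch below is minimal, and `script_dir` is a hypothetical helper. Note that `resolve()` also follows symbolic links, which may or may not be what you want. Typical usage is `script_dir(__file__)`, for example to locate data files shipped next to the script.

```python
from pathlib import Path


def script_dir(file_attr: str) -> Path:
    """Absolute directory of a script, given its __file__ value.

    Relative paths are resolved against the current working directory,
    so call this before the program changes directory.
    """
    return Path(file_attr).resolve().parent
```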

Changes to the behavior of `replace` for the empty string ("")

The behavior of the replace method on the empty string ("") changes. Previously, it gave odd results when you passed the optional count argument (the maximum number of replacements):

"".replace("", "p")     # -> "p"
"".replace("", "p", 1)  # -> ""
"".replace("", "p", 2)  # -> ""

In 3.9 this becomes consistent:

"".replace("", "p")     # -> "p"
"".replace("", "p", 1)  # -> "p"
"".replace("", "p", 2)  # -> "p"

To be honest, I think this is at the level of a bug fix. It is not being backported to earlier versions, though, because it could cause problems for anyone relying on the old behavior (unlikely as that seems).

Newly added modules

zoneinfo

A new zoneinfo module has been added to the standard library. It provides support for the IANA time zone database out of the box. Some datetime functions take time zone information as an argument, but the standard `timezone` class offers only minimal functionality (fixed UTC offsets). For anything more elaborate, you had to rely on an external library such as python-dateutil. The new zoneinfo module in 3.9 makes this much easier.

For example:

>>> from zoneinfo import ZoneInfo
>>> from datetime import datetime

>>> tzinfo = ZoneInfo("Europe/Brussels")
>>> dt = datetime(2020, 10, 24, 12, tzinfo=tzinfo)
>>> print(dt)
2020-10-24 12:00:00+02:00
>>> dt.tzname()
'CEST'

>>> dt = datetime(2020, 10, 25, 12, tzinfo=tzinfo)
>>> print(dt)
2020-10-25 12:00:00+01:00
>>> dt.tzname()
'CET'

Many countries in Western Europe use Central European Time, which observes daylight saving time from the last Sunday in March to the last Sunday in October. As the example shows, with zoneinfo you only specify the location. The offset and abbreviation switch correctly, without you having to know when the change happens.
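Converting between zones works the same way. The minimal sketch below assumes the system's time zone database is available. On platforms without one, such as Windows, you need to install the `tzdata` package from PyPI. `to_zone` is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_zone(dt: datetime, key: str) -> datetime:
    """Convert an aware datetime to the IANA time zone named by key."""
    return dt.astimezone(ZoneInfo(key))
```

For example, noon on 2020-10-24 in Europe/Brussels converts to 2020-10-24 19:00:00+09:00 in Asia/Tokyo. Noon on 2020-10-26 converts to 20:00, because Brussels has left daylight saving time by then.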

Module improvements

asyncio

There are several changes to `asyncio`.

What I found most interesting was the addition of `PidfdChildWatcher`. asyncio can start multiple child processes and wait for their results asynchronously, but "detecting when a child process has exited" is surprisingly difficult. About four approaches had been implemented so far:

- one creates a monitoring thread each time a child process is started,
- two use a signal (SIGCHLD), and
- one uses `os.waitpid()`.

Each has its pros and cons, and users can switch between them as needed (the default is the thread-based one). 3.9 adds a new approach based on pidfd.

I also learned about pidfd for the first time here. It is a new mechanism in Linux that lets you refer to a process through a file descriptor. In Unix, processes are normally identified by a PID (process ID). PIDs are shared across the whole system and come from a limited range, so on a system that keeps running while processes are repeatedly created and destroyed, they eventually wrap around. As a result, the PID of a process that has already exited can be reused by a new one, which is known to be a potential security hole. As a countermeasure, pidfd applies the file descriptor mechanism to processes. The descriptor keeps referring to one specific process even if its PID number is later reused, and you can use it to operate on the child process.

`PidfdChildWatcher`, added in 3.9, is a "just right" way to monitor child processes: it needs neither threads nor signals, and it does not interfere with other processes. The drawback is that it only works on relatively recent Linux kernels (5.3 and later). As those spread, I think there will be more opportunities to use it.

datetime

`datetime.isocalendar()` and `date.isocalendar()` used to return a plain tuple `(year, week, weekday)`:

>>> from datetime import datetime
>>> dt = datetime(2020,5,30)
>>> isocal = dt.isocalendar()
>>> isocal
(2020, 22, 6)
>>> isocal[0]
2020

Plain tuples are confusing because you have to access their contents by numeric index. From 3.9, a named tuple is returned instead.

>>> from datetime import datetime
>>> dt = datetime(2020,5,30)
>>> isocal = dt.isocalendar()
>>> isocal
datetime.IsoCalendarDate(year=2020, week=22, weekday=6)
>>> isocal.year
2020

This is much easier, because you can access the fields like object attributes.
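The result is still a tuple, so existing code that unpacks it keeps working, while new code can use the field names. Here is a small sketch that formats an ISO 8601 week date. `iso_week_label` is a hypothetical helper.

```python
from datetime import date


def iso_week_label(d: date) -> str:
    """Format a date as an ISO 8601 week date such as '2020-W22-6'."""
    cal = d.isocalendar()  # named tuple from 3.9: fields year, week, weekday
    return f"{cal.year}-W{cal.week:02d}-{cal.weekday}"


if __name__ == "__main__":
    print(iso_week_label(date(2020, 5, 30)))  # 2020-W22-6
    # The ISO year can differ from the calendar year around New Year.
    print(iso_week_label(date(2021, 1, 1)))   # 2020-W53-5
    # It is still a tuple, so existing unpacking code keeps working.
    year, week, weekday = date(2020, 5, 30).isocalendar()
    print(year, week, weekday)  # 2020 22 6
```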

functools

A class called `TopologicalSorter`, which sorts directed graphs topologically, has been added. It is covered in a separate article, What's New in Python 3.9 (2) - Sort Directed Acyclic Graphs in Python. (In the final 3.9.0 release, this class was moved to a new module, `graphlib`.)
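Here is a short sketch of the class. It imports from `graphlib`, where the class lives in the final 3.9.0 release; in the beta described here, it was in functools. The dependency data is made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of nodes it depends on (its predecessors).
build_deps = {
    "app": {"lib", "utils"},
    "lib": {"utils"},
    "utils": set(),
}

order = list(TopologicalSorter(build_deps).static_order())
print(order)  # ['utils', 'lib', 'app']

# A cycle cannot be ordered, so static_order() raises CycleError.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```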

hashlib

A build-time option now makes it possible to (selectively) disable Python's built-in implementations of the hash functions. It is intended for enforcing the use of the OpenSSL implementations.

ipaddress

Scoped IPv6 addresses, as defined in RFC 4007, can now be written in the form `<ipv6_address>%<scope_id>`. A `scope_id` attribute has also been added to the `IPv6Address` class, where you can check the scope of a scoped address.
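Here is a minimal sketch of parsing a scoped link-local address. The interface name `eth0` is hypothetical; any scope string is accepted.

```python
import ipaddress

# A link-local address scoped to a (hypothetical) interface named eth0.
addr = ipaddress.IPv6Address("fe80::1%eth0")
print(addr.scope_id)       # eth0
print(addr)                # fe80::1%eth0
print(addr.is_link_local)  # True

# Addresses without a scope have scope_id None.
print(ipaddress.IPv6Address("2001:db8::1").scope_id)  # None
```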

math

`gcd`, which computes the greatest common divisor, can now take three or more arguments; previously it was limited to two. `lcm`, which computes the least common multiple, has also been added, and it accepts three or more arguments from the start.

>>> import math
>>> math.gcd(120, 156, 180)
12
>>> math.lcm(120, 156, 180)
4680
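If your code still has to support older versions, you can reproduce the multi-argument forms by folding the two-argument `math.gcd()` with `functools.reduce`. This is a sketch: `gcd_n` and `lcm_n` are hypothetical names, and on 3.9 and later you should prefer the built-in functions.

```python
import math
from functools import reduce


def gcd_n(*nums: int) -> int:
    """GCD of any number of integers (gcd_n() == 0, like math.gcd())."""
    return reduce(math.gcd, nums, 0)


def lcm_n(*nums: int) -> int:
    """LCM of any number of integers (lcm_n() == 1; 0 if any argument is 0)."""
    def lcm2(a: int, b: int) -> int:
        return abs(a * b) // math.gcd(a, b) if a and b else 0
    return reduce(lcm2, nums, 1)


if __name__ == "__main__":
    print(gcd_n(120, 156, 180))  # 12
    print(lcm_n(120, 156, 180))  # 4680
```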

os

`os.pidfd_open()` and `os.P_PIDFD` have been added. They are part of the pidfd support discussed in the asyncio section above.
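You can see the idea `PidfdChildWatcher` builds on with the low-level API alone: open a pidfd for a child process, then wait until it becomes readable, which happens when the process exits. The minimal sketch below assumes Linux 5.3 or later. `run_and_wait_pidfd` is a hypothetical helper. It is not how asyncio implements the watcher; as far as I know, asyncio registers the descriptor with its event loop as a reader instead of calling `select()` itself.

```python
import os
import select
import subprocess
import sys


def run_and_wait_pidfd(args, timeout=None):
    """Start a child process and wait for it to exit via a pidfd (Linux 5.3+)."""
    proc = subprocess.Popen(args)
    pidfd = os.pidfd_open(proc.pid)
    try:
        # A pidfd becomes readable once the process it refers to terminates.
        ready, _, _ = select.select([pidfd], [], [], timeout)
        if not ready:
            proc.kill()
            proc.wait()
            raise TimeoutError(f"process {proc.pid} did not exit in time")
    finally:
        os.close(pidfd)
    # The child has exited, so wait() returns immediately and reaps it.
    return proc.wait()
```

For example, `run_and_wait_pidfd([sys.executable, "-c", "import sys; sys.exit(3)"])` returns 3.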

pathlib

`pathlib.Path.readlink()` has been added to read the target of a symbolic link. For example, given a symbolic link b -> a, you could already use `os.readlink()`:

>>> import os
>>> os.readlink('b')
'a'

However, it returns the target as a plain string. Since 3.9, `readlink()` is also available as a method of Path objects, and it returns a Path:

>>> from pathlib import Path
>>> p = Path('b')
>>> p.readlink()
PosixPath('a')

In this example it takes more steps than `os.readlink()`, but it makes other tasks easy to write. For example, here is a tool that uses Path objects to list the symbolic links under `/usr/local/bin`:

from pathlib import Path

p = Path('/usr/local/bin/')
for child in p.iterdir():
    if child.is_symlink():
        print(f'{child} -> {child.readlink()}')
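One detail is worth knowing. `readlink()` returns the link's target exactly as stored. For a relative link, that target is relative to the directory containing the link, not to the current directory. The minimal sketch below turns it into a usable path and runs in a temporary directory. `absolute_link_target` is a hypothetical helper, and creating symbolic links may require extra privileges on Windows.

```python
import tempfile
from pathlib import Path


def absolute_link_target(link: Path) -> Path:
    """Return the path a symlink points to, usable from any working directory."""
    target = link.readlink()  # raw target as stored in the link (new in 3.9)
    # Joining onto the link's directory makes a relative target usable;
    # if the target is already absolute, the join simply returns it.
    return link.parent / target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "a").write_text("hello")
        (base / "b").symlink_to("a")  # creates the relative link b -> a
        print((base / "b").readlink())  # a
        print(absolute_link_target(base / "b"))
```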

random

`Random.randbytes()` has been added. It returns random bytes of any requested length. The random module generates pseudo-random numbers, so these bytes are fine for modeling and simulation but are not recommended for security or cryptography. For those purposes, `secrets.token_bytes()` is already provided.
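The sketch below contrasts the two uses. A seeded `random.Random` gives reproducible bytes, for example for test fixtures, while `secrets` is for anything security-related. The helper names are hypothetical.

```python
import random
import secrets


def reproducible_payload(seed: int, size: int) -> bytes:
    """Deterministic pseudo-random bytes, e.g. for test fixtures (not for security)."""
    return random.Random(seed).randbytes(size)


def secure_token(size: int) -> bytes:
    """Cryptographically strong random bytes for keys, tokens and the like."""
    return secrets.token_bytes(size)


if __name__ == "__main__":
    a = reproducible_payload(42, 16)
    b = reproducible_payload(42, 16)
    print(a == b, len(a))         # True 16
    print(len(secure_token(16)))  # 16
```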

typing

`Annotated` has been added to the `typing` module, allowing you to attach metadata to type hints.

For example, suppose a member variable `value` of a class `IntInRange` is an integer (`int`) whose value must be in the range -10 to 5. Until now you could annotate it as an integer, but there was no way to express the range. From 3.9, you can put arbitrary data in the second and subsequent arguments of `Annotated`, as shown below; here I used a tuple to express the range.

To retrieve the metadata, pass True to the `include_extras` argument, which was added to `get_type_hints` in 3.9. A custom type checker built on this could check types, including the range of values.

from typing import Annotated, get_type_hints

class IntInRange:
    value: Annotated[int, (-10, 5)]

assert get_type_hints(IntInRange) == {'value': int}  # metadata is stripped by default
assert get_type_hints(IntInRange, include_extras=True) == {'value': Annotated[int, (-10, 5)]}
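To illustrate the "custom type checker" idea, here is a minimal runtime check built on `get_type_hints(..., include_extras=True)`, `get_origin()` and `get_args()`. Several details were invented for this sketch: treating a two-element tuple as an inclusive range is a convention of its own, and the name `check_ranges` and the `__init__` added to `IntInRange` are hypothetical.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class IntInRange:
    value: Annotated[int, (-10, 5)]

    def __init__(self, value: int) -> None:
        self.value = value


def check_ranges(obj: object) -> None:
    """Check attributes annotated as Annotated[int, (low, high)].

    The (low, high) tuple convention is specific to this sketch; Annotated
    itself attaches metadata without giving it any meaning.
    """
    hints = get_type_hints(type(obj), include_extras=True)
    for name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue  # plain hint, nothing extra to check
        base, *metadata = get_args(hint)
        value = getattr(obj, name)
        if not isinstance(value, base):
            raise TypeError(f"{name} must be {base.__name__}, got {value!r}")
        for meta in metadata:
            if isinstance(meta, tuple) and len(meta) == 2:
                low, high = meta
                if not low <= value <= high:
                    raise ValueError(f"{name}={value} is outside [{low}, {high}]")


if __name__ == "__main__":
    check_ranges(IntInRange(3))  # passes silently
    try:
        check_ranges(IntInRange(7))
    except ValueError as exc:
        print(exc)  # value=7 is outside [-10, 5]
```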

Optimizations

Deprecations

- Passing a float to `math.factorial()` now emits a deprecation warning, even if it has no fractional part (such as 5.0). It is expected to become a TypeError in the future.
- The random module's seed used to accept any hashable object. To guarantee deterministic results, only None, `int`, `float`, `str`, `bytes`, and `bytearray` will be accepted as seeds.
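For the `math.factorial()` change, code that currently passes floats can convert explicitly before the call, so it keeps working once the warning becomes an error. This is a sketch; `factorial_from_number` is a hypothetical helper.

```python
import math


def factorial_from_number(x) -> int:
    """Factorial that accepts integral floats explicitly, avoiding the deprecated path."""
    if isinstance(x, float):
        if not x.is_integer():
            raise ValueError(f"factorial() needs an integral value, got {x!r}")
        x = int(x)  # convert explicitly instead of passing the float through
    return math.factorial(x)
```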

Removed features

- The `tostring()` and `fromstring()` methods of `array.array` have been removed. They had been deprecated since 3.2 as aliases of `tobytes()` and `frombytes()`.
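Migrating is a straight rename. Here is a minimal sketch with made-up values:

```python
from array import array

values = array("i", [1, 2, 3])

# Before: values.tostring()  (removed in 3.9)
raw = values.tobytes()

# Before: restored.fromstring(raw)  (removed in 3.9)
restored = array("i")
restored.frombytes(raw)

print(restored == values)                         # True
print(len(raw) == len(values) * values.itemsize)  # True
```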

Summary

That's my summary of the changes in Python 3.9. Since 3.9 is now in beta, I think this article is nearly complete. Some parts, such as optimizations, have not been written yet, so I will add them as appropriate.