[Translation] Python static types, amazing mypy!

This article is a translation of an article written by Tim Abbott on Thursday, October 13, 2016.

Disclaimer

This article is an **unofficial** translation (we have confirmed with the author, Tim Abbott, that this translation may be published). Tim Abbott and Dropbox assume no responsibility for the content of this article.

If you find any mistranslations, please send me an edit request.

Acknowledgments

Thank you for improving my poor translation!

Python static types, awesome mypy!

October 13, 2016

— Tim Abbott

Over the last few years, static type checkers have become available for popular dynamically typed languages such as PHP (Hack) and JavaScript (Flow and [TypeScript](https://www.typescriptlang.org/)), and they are being adopted more and more widely. Two years ago, [a provisional syntax for static type annotations](https://www.python.org/dev/peps/pep-0484/) was added to Python 3. However, static types in Python have not yet been widely adopted, because mypy, the tool for checking type annotations, was not yet of production quality. That is changing!

There's been some exciting news over the last year. A team at Dropbox (including Guido van Rossum, the creator of Python!) has been working to make mypy a solid type checker that brings static type consistency to Python programs. There's even more exciting news for the many developers with large Python 2 codebases: mypy fully supports type-checking Python 2 programs, scales to large codebases, and greatly simplifies upgrading to Python 3.

Throughout 2016, the Zulip development community has seen these benefits of mypy firsthand. Zulip is a popular open source group chat application, with apps for all major platforms, a REST API, and many integrations. To give a sense of scale, Zulip is a roughly 50,000-line Python 2 codebase with hundreds of commits each month from dozens of developers. Over the course of 2016, we annotated 100% of the backend with static types using mypy (!), and thanks to mypy, we're about to switch to Python 3. Zulip is now the largest open source Python project with fully static types, though I doubt we'll hold that title for long :)

In this article, I'll explain how mypy works, describe the benefits and pain points we've experienced with it, and share a detailed guide to adopting mypy in a large production codebase (including how to find and fix the dozens of issues a large project runs into in its first few days of using mypy!).

A brief introduction to mypy

Here is a concise example of the annotation syntax for mypy / PEP-484 in Python 3.

```python
from typing import List

def sum_and_stringify(nums: List[int]) -> str:
    """Adds up the numbers in a list and returns the result as a string."""
    return str(sum(nums))
```

And here is the same code using the comment syntax, which works in both Python 2 and 3:

```python
from typing import List  # on Python 2, install the "typing" backport from PyPI

def sum_and_stringify(nums):
    # type: (List[int]) -> str
    """Adds up the numbers in a list and returns the result as a string."""
    return str(sum(nums))
```

With this comment syntax, mypy supports type-checking ordinary Python 2 programs, and programs annotated for mypy run normally in any Python runtime (a great property that mypy shares with Flow, the JavaScript checker). This is huge: it means you can adopt mypy in your project without changing how you run Python.
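The comment syntax covers more than plain functions. The sketch below, based on PEP-484's rules for type comments, shows a few common forms: self is left out of a method's type comment, every other argument is listed (including ones with defaults), and a variable can be annotated with a trailing # type: comment. The Account class and the index_by_owner function are hypothetical, invented for illustration. Since these are ordinary comments, the code runs unchanged under any Python interpreter.

```python
from typing import Dict, List, Optional  # on Python 2, needs the "typing" backport


class Account(object):
    """Hypothetical example class; the names are for illustration only."""

    def __init__(self, owner, balance=0):
        # type: (str, int) -> None
        # "self" is omitted from a method's type comment.
        self.owner = owner
        self.balance = balance
        self.history = []  # type: List[int]

    def deposit(self, amount):
        # type: (int) -> int
        self.history.append(amount)
        self.balance += amount
        return self.balance

    def last_deposit(self):
        # type: () -> Optional[int]
        # Optional[int] means "an int or None".
        return self.history[-1] if self.history else None


def index_by_owner(accounts):
    # type: (List[Account]) -> Dict[str, Account]
    return {a.owner: a for a in accounts}
```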

You run mypy like a linter, and it prints errors in a clean, compiler-style format. For example, if you mistakenly annotated sum_and_stringify as returning a float, mypy would print output similar to the following:

```
$ mypy /tmp/test.py
/tmp/test.py: note: In function "sum_and_stringify":
/tmp/test.py:6: error: Incompatible return value type: expected builtins.float, got builtins.str
```

If you want to learn how to write annotations, check out the mypy syntax cheat sheet (for simple uses) and [PEP-484](https://www.python.org/dev/peps/pep-0484/) (for complex uses); both are excellent documents. If you want to try mypy right now, you can install it with pip3 install mypy-lang.

When a module and the packages it depends on are fully annotated, mypy gives you a very powerful consistency check, similar to what a compiler provides in a statically typed language. mypy uses typeshed, a repository of type "stubs" (type definitions for a module, in the style of header files), to provide type information for the Python standard library and for dozens of popular libraries such as requests, six, and sqlalchemy. Importantly, mypy is designed for gradual typing: if type information for something you import isn't available, it is simply treated as consistent with any type.
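To make the idea of a stub concrete, here is a minimal sketch of what a stub file might contain for a small untyped library. The module name colorlib and its two functions are hypothetical; this is not a real package or a file from typeshed. A stub holds only signatures, with ... in place of each body, and mypy reads it instead of the library's source, much as a C compiler reads a header file.

```python
# colorlib.pyi: hypothetical stub for a hypothetical untyped module "colorlib".
# Only signatures appear; "..." stands in for each function body.
from typing import Tuple

RGB = Tuple[int, int, int]  # a type alias that other annotations can reuse

def parse_hex(code: str) -> RGB: ...
def to_hex(rgb: RGB) -> str: ...
```

With a stub like this on mypy's search path, calls such as to_hex(parse_hex(s)) in your own code would be checked against these signatures.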

Benefits of using mypy

Here are some of the benefits we've found with mypy, roughly in order of importance.

What was painful

To give a complete picture of what adopting mypy is like, I think it's also important to talk about the pain points of using mypy today.

What wasn't painful

In this section, I look back at things I worried might be problems before trying mypy, but that turned out not to be big problems after we adopted it.

Finding bugs with mypy in the first few days

This section details what you need to do to start benefiting from mypy in a large codebase. To give a sense of the scale of the work, it describes everything I did during a four-day hackathon in January (mypy was much less mature at the time, so I spent about half of that time writing good bug reports for the mypy bugs I found). If you're considering mypy but want more information before deciding, I recommend working through all the steps in this section. It's well worth the effort.

**Read the mypy cheat sheet.** The mypy cheat sheet gives a clear overview of the PEP-484 syntax, and you'll refer back to it often when you start writing type annotations.

**Standardize how mypy is run.** Create tools to [install](https://github.com/zulip/zulip/blob/master/tools/install-mypy) and [run](https://github.com/zulip/zulip/blob/master/tools/run-mypy) mypy on your codebase, so that everyone on the project runs the type checker the same way. Two features of how you run mypy are important.

**Get mypy running** on your codebase without errors. You'll usually need to add type annotations to global empty data structures. In January this took a couple of hours (including time spent writing bug reports with reproduction steps); it's probably much less work now. By default, mypy only checks the bodies of annotated functions, so for an unannotated codebase this first step is mostly about getting mypy to parse the whole codebase.
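For example, mypy generally can't infer an element type from an empty literal such as [] or {}, so module-level empty containers typically need an annotation. A minimal sketch using the comment syntax; the variable and function names are hypothetical:

```python
from typing import Any, Dict, List, Set  # on Python 2, needs the "typing" backport

# Without these comments, mypy would typically ask for a type annotation on
# each empty container, since it cannot tell what they will hold.
QUEUE_NAMES = []  # type: List[str]
USER_CACHE = {}  # type: Dict[int, Dict[str, Any]]
SEEN_IDS = set()  # type: Set[int]


def remember_user(user_id, info):
    # type: (int, Dict[str, Any]) -> None
    USER_CACHE[user_id] = info
    SEEN_IDS.add(user_id)
```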

**Check basic consistency.** Add --check-untyped-defs to mypy's arguments, and get your codebase passing mypy with it. This option makes mypy check every function in the codebase for internal consistency, which means mypy will catch many bugs and mistakes even where no type annotations have been written.
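As an illustration of the kind of bug this catches, consider the hypothetical unannotated function below. By default mypy skips its body; with --check-untyped-defs, mypy infers that total is an int, so it should flag the str + int concatenation, a bug that would otherwise only show up as a TypeError at runtime.

```python
def describe_lengths(words):
    # No annotations, so mypy skips this body by default. With
    # --check-untyped-defs it infers that "total" is an int and should
    # report the str + int concatenation on the return line.
    total = 0
    for word in words:
        total += len(word)
    return "Total length: " + total  # bug: raises TypeError when called


def describe_lengths_fixed(words):
    total = 0
    for word in words:
        total += len(word)
    return "Total length: " + str(total)
```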

In many cases you'll want to fix the bugs and bad code that mypy finds, but you can also postpone a problem with a # type: ignore annotation or by excluding a file. For example, we initially excluded all of Zulip's tests, since they weren't worth annotating yet and contained a lot of monkey-patching and hacky Python code. For Zulip, it took about two days of hard work to clear all the errors reported with --check-untyped-defs, which involved merging fixes for about 40 issues into Zulip's codebase.

I spent another day or two working out good reproductions for the mypy bugs I ran into and improving typeshed. mypy is past its early development now, and you'll rarely hit bugs in mypy itself anymore. However, on a large project you should still expect to run into bugs in typeshed and to fix them (just send a PR!).

**Run mypy in continuous integration.** Once your codebase passes mypy --check-untyped-defs, it's a good idea to run mypy in your CI environment to lock in your progress.
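The setup steps above (one standard way to run mypy, an exclusion list for files you're postponing, --check-untyped-defs, and a CI job that fails when mypy fails) can be combined into a small wrapper script. The sketch below is hypothetical and is not Zulip's actual tools/run-mypy; the EXCLUDE paths and the function names are illustrative. It assumes only that mypy is installed, so that python -m mypy works, and that mypy exits with a nonzero status when it reports errors.

```python
#!/usr/bin/env python3
"""Hypothetical run-mypy wrapper: one standard way for everyone (and CI) to run mypy."""
import subprocess
import sys
from pathlib import Path
from typing import List

# Files or directories whose checking we have postponed (hypothetical paths).
EXCLUDE = ["zerver/tests", "scripts/legacy_tool.py"]

FLAGS = ["--check-untyped-defs"]


def is_excluded(path: str, exclude: List[str]) -> bool:
    """True if path is an excluded file or lives under an excluded directory."""
    return any(path == e or path.startswith(e.rstrip("/") + "/") for e in exclude)


def build_mypy_command(files: List[str], exclude: List[str]) -> List[str]:
    """Build the mypy command line for all non-excluded files."""
    targets = sorted(f for f in files if not is_excluded(f, exclude))
    return [sys.executable, "-m", "mypy"] + FLAGS + targets


def main() -> int:
    files = [p.as_posix() for p in Path(".").rglob("*.py")]
    cmd = build_mypy_command(files, EXCLUDE)
    # mypy exits nonzero when it finds errors, so CI can use this status directly.
    return subprocess.call(cmd)


if __name__ == "__main__":
    sys.exit(main())
```

Running this script as a CI step makes the build fail whenever mypy reports an error. A real version would also skip directories such as virtualenvs when collecting files.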

mypy type annotations are optional. Once you've completed the setup steps described above, you can annotate your codebase at your own pace. Over time, you'll get the benefits of static types in the annotated parts of your codebase, without having to do anything special for the rest (the magic of gradual typing!). In the next section, we'll discuss strategies for getting your codebase fully annotated.

Fully annotating a large codebase

This section discusses the work required to go from having mypy set up to having your entire codebase annotated.

The great thing about mypy is that you can do all of this work incrementally. After the initial setup, we actually did nothing more for 2-3 months. That changed when we offered mypy annotation as one of our Google Summer of Code (GSoC) projects, through which we met a wonderful student named Eklavya Sharma. Eklavya did most of the hard work of annotating Zulip: he upgraded our tooling, annotated the core libraries, contributed bug reports and PRs upstream to mypy and typeshed, and fixed all the mistakes we had made early on. Amazingly, that same summer he also migrated Zulip to use virtualenv and ported Zulip to Python 3!

The work of annotating a large project can be divided into several phases.

**Phase 1: Annotate the core libraries.** Strategically, you'll want to start by annotating the core library code that is used throughout the other files. Type annotations on these functions constrain the types used everywhere else in the codebase, so if you annotate these files first, you'll spend less time fixing incorrect annotations and catch real bugs sooner. This phase is also a good opportunity to write documentation on how your project uses mypy (and to link to it from mypy failures in your CI system).
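To see why annotating core code first pays off, consider a hypothetical core helper called from many places. Once its signature is annotated, mypy checks every caller against that one signature, so annotations in the rest of the codebase start out constrained by it. The function names and data shapes below are invented for illustration.

```python
from typing import Dict, List


# Hypothetical core helper used throughout a codebase. Annotating it first
# means every caller is checked against this single signature.
def unread_counts(read_flags: Dict[int, List[bool]]) -> Dict[int, int]:
    """Map each user id to how many of their messages are unread (False)."""
    return {user_id: flags.count(False) for user_id, flags in read_flags.items()}


def render_badge(user_id: int, counts: Dict[int, int]) -> str:
    # Because unread_counts() promises Dict[int, int], mypy knows that
    # counts[user_id] is an int; writing counts[user_id].upper() here
    # would be reported as an error without running any code.
    return "%d unread" % counts.get(user_id, 0)
```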

**Phase 2: Annotate most of the codebase.** For many projects, this will probably take months, with developers gradually working through different parts of the codebase. That's a perfectly reasonable strategy.

A focused push to annotate the codebase also works well, so it may help to describe how Zulip did it. About halfway through Eklavya's summer of code, we went to the PyCon sprints with the goal of annotating as much of Zulip as possible. The sprints are my favorite part of PyCon: four days after the main conference during which hundreds of developers work together on open source projects. They're completely free to attend and a great opportunity to contribute to open source.

We got a table next to the mypy developers and, each day, recruited 5-10 developers to work on Zulip's mypy annotation project. Over the PyCon sprints, the fraction of Zulip that was annotated went from 17% to 85% (25-30 engineers contributed in total, most of whom had no prior experience with either Zulip or mypy). We used mypy's coverage support and coveralls.io to track progress, but a progress bar drawn on a big sheet of paper was more fun. This photo was taken at the start of the last day.

Zulip mypy coverage goal

I think our PyCon experience clearly shows that mypy is easy for new developers to work with. Apart from me, all of the contributors who added annotations were new to both Zulip and mypy. With a good 5-minute demo and good documentation, we found that new contributors were working productively within an hour of first touching mypy. I strongly recommend this mypy-sprint approach to other open source projects: it's a great way for contributors to make a significant impact, even on a project they're unfamiliar with.

**Phase 3: Get to 100%.** Annotating the last few files is harder than the earlier work, because this phase involves debugging all the mistakes made in Phase 2. Along the way, as files and directories reach 100%, it's important to prevent regressions by running mypy on them with the --disallow-untyped-defs option (which reports any function that lacks type annotations).
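As a small illustration of what --disallow-untyped-defs enforces, consider the hypothetical functions below. With that flag, mypy should accept the first and report the other two: one has no annotations at all, and the other annotates its argument but not its return type. (The exact wording of mypy's error messages varies between versions.)

```python
from typing import List


def average(nums: List[float]) -> float:
    # Fully annotated: accepted under --disallow-untyped-defs.
    return sum(nums) / len(nums)


def average_untyped(nums):
    # No annotations at all: reported as missing a type annotation.
    return sum(nums) / len(nums)


def average_partial(nums: List[float]):
    # Argument annotated but return type missing: also reported.
    return sum(nums) / len(nums)
```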

Eklavya took us from 85% to 96% before heading back to university, and a few weeks ago we put in 2-3 hours of work to reach 100%. All new Python code added to Zulip is now annotated for mypy (with the exception of a small and shrinking number of scripts, settings, and test files).

**Phase 4: Celebrate and write a blog post!** At least, that was the next step for Zulip :)

Overall, Zulip became fully annotated through a few bursts of intensive work: a week-long hackathon, a GSoC project, and a PyCon sprint. All things considered, that's a fairly modest effort.

I should note that although Zulip is 100% annotated, Zulip's mypy journey isn't over yet. Eventually, we'd like to add stubs to typeshed for the most important libraries Zulip uses (e.g. Django).

Recommendations for annotating code

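Here are some recommendations that can save you time when actually writing annotations. When you use # type: ignore to work around a bug in mypy or typeshed, put a link to the corresponding upstream issue on the same line: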

```python
bad_code  # type: ignore # https://github.com/python/typeshed/issues/372
```

This makes it easy to check later whether the issues you worked around with type: ignore have been fixed upstream. If a file would need a lot of type: ignore annotations, you can instead add it to the exclusion list (a feature of our run-mypy wrapper) and deal with it later.
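Because each workaround carries a link, auditing them later is easy to automate. The hypothetical helper below (not part of mypy or of Zulip's tooling) scans source text for lines containing # type: ignore and collects any issue URL that follows on the same line, so you can periodically check whether those upstream issues have been closed.

```python
import re
from typing import List, Optional, Tuple

# Matches "# type: ignore", optionally followed by "# <url>" on the same line.
IGNORE_RE = re.compile(r"#\s*type:\s*ignore\b(?:.*?#\s*(https?://\S+))?")


def find_type_ignores(source: str) -> List[Tuple[int, Optional[str]]]:
    """Return (line number, linked URL or None) for each "# type: ignore" line."""
    results: List[Tuple[int, Optional[str]]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = IGNORE_RE.search(line)
        if match:
            results.append((lineno, match.group(1)))
    return results
```

Running it over each file in a repository yields a list of workarounds to revisit; entries with None are ignores that have no linked issue yet.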

Conclusion

All in all, our experience with mypy (and the PEP-484 type system) has been great, and we feel that adopting mypy was a big step forward for the Zulip project. mypy improves readability, catches bugs without running the code, has very few false positives, and has no major drawbacks. Deploying mypy on a large codebase was a relatively small investment for our project, and annotating the codebase had the added benefit of making the transition to Python 3 easier.

If you have a large Python codebase and want to improve it, give yourself a week to get started with mypy!

Finally, if you're curious about what Python static types look like in a large codebase, check out the [Zulip server project on GitHub](https://github.com/zulip/zulip/). We welcome new contributors!

Special thanks to Guido van Rossum, Alya Abbott, Steve Howell, Jason Chen, Eklavya Sharma, Anurag Goel and Joshua Simmons for their feedback on this blog post.

Tim Abbott

Tim Abbott is the lead developer of the Zulip open source project. He was previously the CTO of Ksplice (acquired by Oracle) and of Zulip (acquired by Dropbox).

San Francisco https://zulip.org
