This article is a translation of an article written by Tim Abbott on Thursday, October 13, 2016.
This article is an **unofficial** translation (we have confirmed with the author, Tim Abbott, that the translation may be published). Tim Abbott and Dropbox assume no responsibility for the content of this article.
If you find any mistranslations, please send me an edit request.
Thank you for improving my poor translation!
October 13, 2016
— Tim Abbott
Over the last few years, static type checkers have become available for popular dynamically typed languages such as PHP (Hack) and JavaScript (Flow and [TypeScript](https://www.typescriptlang.org/)), and they are being widely adopted. Two years ago, a [provisional syntax for static type annotations](https://www.python.org/dev/peps/pep-0484/) was added to Python 3. However, static types in Python have not yet been widely adopted, because mypy, the tool that checks the type annotations, was not of production quality. But that was the story until now!
There has been some exciting news over the last year. A team at Dropbox (including Guido van Rossum, the creator of Python!) has been working on mypy to make it a capable type checker that brings static type consistency to Python programs. There is even better news for the many developers with a large Python 2 codebase: mypy also fully supports type checking Python 2 programs, it scales to large Python codebases, and using it greatly simplifies upgrading to Python 3.
Throughout 2016, the Zulip development community has seen these benefits of mypy first-hand. Zulip is a popular open source group chat application. It has apps for all major platforms, a REST API, and many integrations. To give you a sense of scale, Zulip is about 50,000 lines of Python 2, with hundreds of commits each month from dozens of developers. Over the course of 2016, we annotated 100% of the backend with static types using mypy (!). And, thanks to mypy, we are just about to switch to Python 3. Zulip is now the largest open source Python project with fully static types. However, I doubt we will hold that title for long :)
In this article, I'll explain how mypy works and describe the benefits and pain points we've experienced with it. I'll also share a detailed guide to adopting mypy for a large production codebase (including how to find and fix the dozens of issues that a large project hits in the first few days of using mypy!).
Here is a concise example of the annotation syntax for mypy / PEP-484 in Python 3.
from typing import List

def sum_and_stringify(nums: List[int]) -> str:
    """Adds up the numbers in a list and returns the result as a string."""
    return str(sum(nums))
Here is what the same code looks like with the comment syntax, which works in both Python 2 and 3:
from typing import List

def sum_and_stringify(nums):
    # type: (List[int]) -> str
    """Adds up the numbers in a list and returns the result as a string."""
    return str(sum(nums))
With this comment syntax, mypy can type check ordinary Python 2 programs, and programs annotated for mypy run normally in any Python runtime environment (a great property that mypy shares with Flow, the JavaScript checker). This means that you can adopt mypy for your project without changing the way your Python code runs.
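As a small illustration of that property, the sketch below defines the function in both styles and runs it on an ordinary interpreter. The `_py3` and `_py2` suffixes are hypothetical names, used only so that the two versions can coexist in one file.

```python
from typing import List


def sum_and_stringify_py3(nums: List[int]) -> str:
    """Python 3 annotation syntax."""
    return str(sum(nums))


def sum_and_stringify_py2(nums):
    # type: (List[int]) -> str
    """Comment syntax: the interpreter sees only an ordinary comment."""
    return str(sum(nums))


# Both versions behave identically at runtime.
print(sum_and_stringify_py3([1, 2, 3]))
print(sum_and_stringify_py2([1, 2, 3]))

# Inline annotations are stored as metadata on the function object;
# the type comment leaves no trace at runtime.
print(sorted(sum_and_stringify_py3.__annotations__))
print(sum_and_stringify_py2.__annotations__)
```

Neither form changes what the function does when called; only a checker such as mypy reads the types.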
mypy runs like a linter and prints errors in a compiler-style format. For example, if you mistakenly annotate `sum_and_stringify` as returning a float, mypy prints output similar to the following:
$ mypy /tmp/test.py
/tmp/test.py: note: In function "sum_and_stringify":
/tmp/test.py:6: error: Incompatible return value type: expected builtins.float, got builtins.str
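For reference, a file that would be expected to trigger an error of this kind might look like the sketch below. The `-> float` annotation is the deliberate mistake. The exact wording and line number of mypy's message depend on the mypy version and the file layout, so treat the output above as indicative only.

```python
from typing import List


def sum_and_stringify(nums: List[int]) -> float:  # deliberately wrong: should be -> str
    """Adds up the numbers in a list and returns the result as a string."""
    return str(sum(nums))  # a str is returned where a float was declared


# Python itself does not enforce annotations, so this still runs and returns a str.
result = sum_and_stringify([1, 2, 3])
print(type(result).__name__)
```

The mismatch is invisible at runtime, which is why a separate checker is needed to catch it.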
If you are interested in how to write annotations, check out the mypy syntax cheat sheet (for simple uses) and [PEP-484](https://www.python.org/dev/peps/pep-0484/) (for complex uses). Both are great documents. If you want to try mypy right now, you can install it with `pip3 install mypy-lang`.
When a module and the packages it depends on are fully annotated, mypy gives you a very powerful consistency check, similar to what a compiler provides in a statically typed language. mypy uses typeshed, a repository of type "stubs" (header-file-style type definitions for modules), to provide type information for the Python standard library and for dozens of popular libraries such as requests, six, and sqlalchemy. Importantly, mypy is designed for adding types gradually: if type information for something you import is not available, it is simply treated as consistent with any type.
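A minimal sketch of that gradual behaviour follows. The unannotated `load_numbers` is a hypothetical stand-in for code (or a library) that has no type information, and `total` is a hypothetical annotated caller.

```python
from typing import Any, List


def load_numbers(raw):
    """Unannotated helper: mypy treats its parameter and return value as Any."""
    return [int(part) for part in raw.split(",")]


def total(nums: List[int]) -> int:
    """Fully annotated: typed callers are checked against List[int]."""
    return sum(nums)


# The value coming out of the unannotated helper is Any, which is consistent
# with List[int], so the call below is accepted without any annotation work.
numbers = load_numbers("1,2,3")  # type: Any
print(total(numbers))
```

Once `load_numbers` gains annotations of its own, the same call starts being checked for real, with no change needed at the call site.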
Here are some of the benefits we have found with mypy, with the most important ones first.
... (e.g. `List` or `Tuple`) really stands out.

I think it's also important to talk about the pain points of using mypy today, to give a complete picture of the experience of adopting it.
Tools like `pyflakes` do not read the Python 2 type annotation style (the annotations are just comments, after all!). So when you import a module that is needed only for mypy, the linter reports it as unused. We turned those warnings off, because cleaning up unused imports is of little value. The issue goes away when you migrate to the Python 3 type syntax, because tools like pyflakes then see the imports being used in the annotations. We hope that the widely used Python linters will add an option to check imports used in type comments.
... (which will soon be the more explicit `if typing.TYPE_CHECKING`). But that doesn't work when migrating to the Python 3 syntax. I hope someone finds a good solution to the problem that forced us to create these circular imports. The solution may simply be a recommended way of organizing code to avoid the problem.

This section looks at things we worried might be problems before trying mypy, but which, looking back after adopting it, turned out not to be significant.
... `type: ignore` that ignores false positives to get error-free output. Fortunately, even the highly dynamic parts of our system (e.g. the framework for parsing REST API request variables) were successfully typed in the end.
... `type: ignore`, and type-annotate their code rather than report bugs. I feel like I've taken the time.

This section details what you need to do to benefit from mypy in a large codebase. To give you an idea of the scale of the work, it covers everything I did during a four-day hackathon in January (mypy wasn't mature at the time, so I spent half of that time writing proper bug reports for the bugs I found). If you are considering mypy but want more information before deciding, I recommend following all the steps discussed in this section. It's well worth the effort.
**Read the mypy cheat sheet.** The mypy cheat sheet gives a clear overview of the PEP-484 syntax, and you will refer to it often when you start writing type annotations.
**Standardize how mypy is run.** Create tools for your codebase that [install](https://github.com/zulip/zulip/blob/master/tools/install-mypy) and [run](https://github.com/zulip/zulip/blob/master/tools/run-mypy) mypy, so that all members of the project can run the type checker in the same way. Two features of how mypy is run are important.
... `mypy --py2 --silent-imports --fast-parser -i <paths>`. You should be able to do the same with a `mypy.ini` file.
**Make sure mypy runs on your codebase without errors.** You usually need to add type annotations to global empty data structures. This took a couple of hours back in January (including the time to report bugs with reproduction steps), and it probably takes much less time now. By default, mypy only checks inside annotated functions, so on an unannotated codebase this step mainly confirms that mypy can process the entire codebase.
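The "standardize how mypy is run" step above can be sketched as a tiny wrapper script. This is not Zulip's actual `run-mypy` tool: the names `DEFAULT_FLAGS`, `build_mypy_command`, and `run_mypy` are hypothetical, and the flags are simply the ones quoted in this article, which newer mypy releases may have renamed or removed.

```python
import subprocess
from typing import List, Sequence

# Flags as quoted in the article; check them against your mypy version.
DEFAULT_FLAGS = ["--py2", "--silent-imports", "--fast-parser", "-i"]


def build_mypy_command(paths, extra_flags=()):
    # type: (Sequence[str], Sequence[str]) -> List[str]
    """Return the argv list for a standardized mypy run."""
    return ["mypy"] + DEFAULT_FLAGS + list(extra_flags) + list(paths)


def run_mypy(paths, extra_flags=()):
    # type: (Sequence[str], Sequence[str]) -> int
    """Run mypy and return its exit code (requires mypy to be installed)."""
    return subprocess.call(build_mypy_command(paths, extra_flags))


# Show the command that every contributor (and CI) would end up running.
print(" ".join(build_mypy_command(["zerver/lib"])))
```

Keeping the flag list in one place means a change to the project's mypy configuration reaches every developer and the CI job at the same time.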
**Check basic consistency.** Add `--check-untyped-defs` to mypy's arguments, and make sure you don't get errors when you run mypy on the codebase. This option makes mypy check every `def` in the codebase for internal consistency, which means mypy will detect many bugs and mistakes even before any type annotations are written.
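As an illustration of the kind of mistake this step can surface, consider the hypothetical function below. It has no annotations at all, yet a checker that looks inside unannotated functions can still infer that `count` is an `int` and would be expected to flag the concatenation on the rarely used `verbose` path.

```python
def format_count(items, verbose=False):
    """Unannotated function with a latent bug on a rarely used path."""
    count = len(items)
    if verbose:
        # Bug: concatenating str and int raises TypeError at runtime.
        return "count: " + count
    return str(count)


# The common path works, so the bug can survive for a long time unnoticed.
print(format_count(["a", "b"]))
```

Tests that never pass `verbose=True` would not catch this, which is the gap a static check fills.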
In many cases you'll want to fix the bugs and bad code right away, but you can also use a `# type: ignore` annotation or exclude files to postpone a problem. For example, we initially excluded all of Zulip's tests, because they weren't worth typing yet and contained a lot of monkey patching and questionable Python. For Zulip, it took about two days of hard work to clear the errors from the `--check-untyped-defs` output, and we merged fixes for about 40 issues into Zulip's codebase.
I spent another day or two working out good reproductions for the mypy bugs I encountered and improving typeshed. mypy is no longer in early development, and it is now rare to hit bugs in mypy itself. However, in a large project you should expect to run into bugs in typeshed and to fix them (just send a PR!).
**Run mypy in continuous integration.** Once your codebase passes `mypy --check-untyped-defs`, it's a good idea to run the mypy type checker in your CI environment to lock in your progress.
mypy's type annotations are optional. Once you've completed the setup steps described above, you can annotate your codebase at your own pace. Over time, you will benefit from static types in the annotated parts of your codebase, and you don't have to do anything special for the rest (the wonder of gradual typing!). In the next section, we'll consider strategies for getting your codebase fully annotated.
This section considers the work required from the time mypy is set up to the time the entire codebase is annotated.
The great thing about mypy is that you can do all the work gradually. After the initial setup, we didn't actually do anything for 2-3 months. That changed when we proposed mypy type annotations as a Google Summer of Code (GSoC) project, through which we met a wonderful student named Eklavya Sharma. Eklavya did most of the hard work of annotating Zulip. He upgraded our tools, annotated the core libraries, contributed bug reports and PRs upstream to mypy and typeshed, and fixed all the mistakes we had made in the early days. Amazingly, he also migrated Zulip to use virtualenv that summer and upgraded Zulip to Python 3!
The task of annotating large projects can be divided into multiple phases.
**Phase 1: Annotate the core library.** Strategically, you will want to start by annotating the core library code that is used everywhere else. Type annotations on these functions constrain the types used in the rest of the codebase. As a result, if you work on these files first, you will spend less time fixing incorrect annotations and catch real bugs sooner. This phase is also a good opportunity to write documentation on how your project uses mypy (and to link to that documentation from mypy failures in your CI system).
**Phase 2: Annotate most of the codebase.** In many projects, developers will probably spend months slowly working through different parts of the codebase. That is a perfectly reasonable strategy.
It also works well to annotate your codebase in a focused burst, and it may help to describe how Zulip did it. About halfway through Eklavya's Summer of Code, we went to the PyCon sprints, aiming to annotate as much of Zulip as possible. The PyCon sprints are my favorite part of PyCon: a four-day gathering held after the main conference, where hundreds of developers work together on open source projects. Participation is completely free, and it's a great opportunity to contribute to an open source project.
We reserved a table next to the mypy developers and arranged to rotate 5-10 developers into Zulip's mypy annotation project each day. During the PyCon sprints, the annotated share of Zulip rose from 17% to 85% (25-30 engineers worked on Zulip each day, most of them new to both Zulip and mypy). We used mypy's coverage support and coveralls.io to track progress, but a progress bar drawn on a large sheet of paper was more fun. The photo of it was taken at the beginning of the last day.
I think our experience at PyCon clearly showed that mypy is easy for new developers to work with. Every contributor who added annotations, except me, was new to both Zulip and mypy. With a good 5-minute demo and good documentation, we found that new contributors were working efficiently within an hour of first touching mypy. I confidently recommend this mypy hackathon approach to other open source projects. It lets contributors have a significant impact even on a project they are unfamiliar with.
**Phase 3: 100%.** Annotating the last few files is harder than what came before, because in this phase you debug all the mistakes you made in Phase 2. As you go, it's important to lock in the files and directories that reach 100% by having mypy check them with the `--disallow-untyped-defs` option (which reports functions that lack type annotations), so that they don't regress.
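To make concrete what "functions that lack type annotations" means here, the sketch below uses the standard `ast` module (Python 3.8+) to list functions that carry neither inline annotations nor a type comment. It is only a rough approximation written for illustration: `find_unannotated_functions` is a hypothetical helper, not how mypy implements the flag, and mypy's own check is stricter (for example about partially annotated functions).

```python
import ast
from typing import List


def find_unannotated_functions(source: str) -> List[str]:
    """Return the names of functions that carry no type information at all."""
    tree = ast.parse(source, type_comments=True)
    missing = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        all_args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        has_inline = node.returns is not None or any(
            arg.annotation is not None for arg in all_args
        )
        has_comment = node.type_comment is not None
        if not (has_inline or has_comment):
            missing.append(node.name)
    return missing


SAMPLE = '''
def annotated(x: int) -> str:
    return str(x)

def commented(x):
    # type: (int) -> str
    return str(x)

def bare(x):
    return x
'''

print(find_unannotated_functions(SAMPLE))
```

In a real project you would rely on mypy's flag itself; the point of the sketch is only to show which definitions would count as regressions once a file is locked at 100%.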
Eklavya took it from 85% to 96% before his university term started. We then did 2-3 hours of work a few weeks ago to reach 100%. All new Python code added to Zulip is now annotated for mypy (with the exception of a shrinking number of scripts, settings, and test files).
**Phase 4: Celebrate and write a blog post!** At least, that was the next step for Zulip :)
Overall, it took intensive work during a week-long hackathon, a GSoC project, and the PyCon sprints to get Zulip fully annotated. Of course, this is a trivial effort.
I should add that although Zulip is 100% annotated, Zulip's mypy journey is not yet complete. Eventually we want to contribute stubs to typeshed for the most important libraries Zulip uses (e.g. Django).
Here are some recommendations that can save you time when annotating.
... `str` or `Text`. Errors that confuse `bytes` with `str`, or `str` with `unicode`, account for most of the problems that occur when migrating a codebase from Python 2 to Python 2 + 3. If you get this right while annotating your codebase, you'll save a lot of time when upgrading to Python 3. I've found that annotating most of the codebase makes upgrading to Python 3 very quick. Eventually we added [several helper functions](https://github.com/zulip/zulip/blob/master/zerver/lib/str_utils.py) to our codebase that cast correctly (and more readably!) between `str`, `Text`, and `bytes` in both Python 2 and 3.
When you use `type: ignore` to work around potential mypy and typeshed bugs, we recommend the following style, which records the underlying GitHub issue:
    bad_code # type: ignore # https://github.com/python/typeshed/issues/372
This makes it easy to check later whether the issues you worked around with `type: ignore` have been fixed upstream. If a file would need a lot of `type: ignore` annotations, you can add it to the exclusion list (a feature of our `run-mypy` wrapper) and postpone it.
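The `str` / `Text` / `bytes` recommendation above can be illustrated with a pair of small conversion helpers. This is a minimal sketch under stated assumptions, not the contents of Zulip's `str_utils.py`: the names `force_text` and `force_bytes` and the UTF-8 default are chosen here for illustration.

```python
from typing import Text, Union


def force_text(value, encoding="utf-8"):
    # type: (Union[Text, bytes], str) -> Text
    """Return `value` as text, decoding it if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def force_bytes(value, encoding="utf-8"):
    # type: (Union[Text, bytes], str) -> bytes
    """Return `value` as bytes, encoding it if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Round-trip a non-ASCII string in both directions.
print(force_text(b"caf\xc3\xa9") == u"caf\xe9")
print(force_bytes(u"caf\xe9") == b"caf\xc3\xa9")
```

Routing conversions through helpers like these gives the annotations one obvious place to state which string type each call site expects.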
All in all, our experience with mypy (and the PEP-484 type system) has been great, and we feel that adopting mypy was a big step forward for the Zulip project. mypy improves readability, catches bugs without running the code, produces very few false positives, and has no major drawbacks. Adopting mypy in a large codebase was a relatively small investment for our project. In addition, annotating the codebase has the added benefit of easing the transition to Python 3.
If you have a large Python codebase and want to improve it, you should set aside a week to start using mypy!
Finally, if you're curious about what Python static types look like in a large codebase, check out the Zulip server project on GitHub (https://github.com/zulip/zulip/). We welcome new contributors!
Special thanks to Guido van Rossum, Alya Abbott, Steve Howell, Jason Chen, Eklavya Sharma, Anurag Goel and Joshua Simmons for their feedback on this blog post.
Tim Abbott is the lead developer of the Zulip open source project. He was previously the CTO of Ksplice and later of Zulip (before it was acquired by Dropbox).
San Francisco https://zulip.org