[Translation] Getting Started with Rust for Python Programmers

This article is a translation of an article written by Armin Ronacher (@mitsuhiko) on Wednesday, May 27, 2015.

Since the translator does not know Rust well, there may be misunderstandings and mistranslations (especially of terminology). If you find such an error, an edit request would be appreciated.

Getting Started with Rust for Python Programmers

Now that Rust 1.0 is out and reasonably stable, I thought it would be interesting to write an introduction to Rust for Python programmers. This guide goes over the basics of the language and compares how corresponding constructs look and behave in each.

Rust is a completely different beast from Python. Not only is one compiled and the other interpreted; their major language features also differ considerably. But while the core of the two languages is quite different, they have a lot in common in how their APIs are designed, and as a Python programmer many of the concepts will feel familiar.

Syntax

The first difference a Python programmer will notice is the syntax. Unlike Python, Rust is a curly-brace language, and there are good reasons for that: Rust has anonymous functions, closures, and lots of chaining, which Python does not support well, and such features are much easier to write and read in a language that is not based on indentation. Let's look at the same example in both languages.

First, here is an example of Python that displays "Hello World" three times.

    def main():
        for count in range(3):
            print "{}. Hello World!".format(count)

The same example in Rust.

    fn main() {
        for count in 0..3 {
            println!("{}. Hello World!", count);
        }
    }

As you can see, they look quite similar. `def` becomes `fn` and the colon becomes curly braces. The other major syntactic difference is that Rust requires type information for function parameters, which you do not provide in Python. In Python 3, optional type annotations are available and look similar to what you would write in Rust.
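For example, a minimal sketch of a function with typed parameters (the function name and numbers here are just for illustration):

    // Parameter and return types are spelled out explicitly.
    fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    fn main() {
        println!("1 + 2 = {}", add(1, 2));
    }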

A concept that is new compared to Python is the function call with an exclamation mark at the end. Those are macros, and macros are expanded at compile time. In this example the macro handles string formatting and printing: the format string is checked by the compiler, so you cannot get the number or types of the arguments to the formatting call wrong.
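As a small sketch of what that buys you (nothing here beyond the standard `println!` and `format!` macros):

    fn main() {
        let count = 1;
        // The format string and its arguments are checked at compile time.
        println!("{}. Hello World!", count);
        // format! builds a String instead of printing it.
        let line = format!("{}. Hello World!", count);
        println!("{}", line);
        // println!("{} {}", count);   // would not compile: missing argument
    }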

Trait vs. protocol

Where the languages feel most familiar, yet differ in character, is in how objects behave. In Python, a class opts into particular behavior by implementing special methods; this is commonly referred to as the class "following a protocol". For example, to make an object iterable you implement the `__iter__` method, which returns an iterator. These methods must be implemented on the class itself and cannot really be changed afterwards (ignoring monkeypatching).

In Rust the concept is very similar, but it uses traits instead of special methods. Traits achieve the same purpose in a slightly different way: an implementation lives in its own block, so you can implement additional traits for types that come from another module. For example, if you want to give integers some special behavior, you can do that without changing anything about the integer type itself, as the sketch below shows.
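As a minimal sketch (the `Double` trait here is invented purely for illustration), giving integers new behavior from your own module looks like this:

    // A trait defined locally in our own module.
    trait Double {
        fn double(self) -> Self;
    }

    // Implement it for the built-in i32 type without touching that type.
    impl Double for i32 {
        fn double(self) -> i32 {
            self * 2
        }
    }

    fn main() {
        let x: i32 = 21;
        println!("{}", x.double());
    }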

To compare the two concepts, let's look at how to implement a type that can be added to itself. Python first:

    class MyType(object):
    
        def __init__(self, value):
            self.value = value
    
        def __add__(self, other):
            if not isinstance(other, MyType):
                return NotImplemented
            return self.__class__(self.value + other.value)    

The same example in Rust.

    use std::ops::Add;
    
    struct MyType {
        value: i32,
    }
    
    impl MyType {
        fn new(value: i32) -> MyType {
            MyType { value: value }
        }
    }
    
    impl Add for MyType {
        type Output = MyType;
    
        fn add(self, other: MyType) -> MyType {
            MyType { value: self.value + other.value }
        }
    }

The Rust code here is a bit longer, but it also gets the type handling for free, which the Python code has to do by hand. The first thing to notice is that in Python the methods live on the class, whereas in Rust the data and the operations on it are defined separately. `struct MyType` defines the data layout, `impl MyType` defines methods on the type itself, and `impl Add for MyType` implements the `Add` trait for that type. The `Add` implementation also has to declare the output type of the addition, but in exchange it removes the run-time type checking we have to do in Python.

Another difference is that constructors are explicit in Rust, whereas in Python they are fairly magical. When you instantiate an object in Python, `__init__` ends up being called to initialize it. In Rust you simply define a static method, by convention called `new`, that assembles and returns the object. A usage sketch follows below.
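As a small sketch of how that looks in use, assuming the `MyType` definitions from the Rust example above are in the same module:

    fn main() {
        // No magic here: `new` is just an ordinary associated function.
        let a = MyType::new(1);
        let b = MyType::new(2);
        let c = a + b;                  // uses the Add implementation above
        println!("{}", c.value);
    }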

Error handling

Error handling in Python and Rust is quite different. In Python errors are raised as exceptions; in Rust they are part of the return value. That may sound strange at first, but it is actually a really nice concept: looking at a function tells you exactly which errors it can return.

In practice, functions in Rust that can fail return a `Result`. `Result` is a parameterized type with two sides: success and failure. For example, `Result<i32, MyError>` means the function returns a 32-bit integer on success and a `MyError` on failure. What if you need to return more than one kind of error? This is where the philosophy differs.

In Python every function can fail with any error, and there is nothing you can do about it. Anyone who has used Python's "requests" library, caught all request exceptions, and then been frustrated that SSL errors still slipped through knows the essence of the problem. If a library does not document which errors it raises, there is little a user of that library can do.

The situation is very different in Rust: the function signature states which error it returns. If you need to return two different kinds of errors, the way to do it is to create a custom error type and convert the internal errors into it. For example, an HTTP library that can internally fail with Unicode errors, IO errors, SSL errors, and so on would convert all of them into one library-specific error type, and users of that library only have to deal with that one error. Rust also allows such errors to be chained, so you can walk back to the original error that caused the failure.

Alternatively you can use the `Box<Error>` type, which any error can be converted into, if you find creating your own custom error type too cumbersome.

Whereas error propagation in Python is implicit, it is explicit in Rust. That means you can immediately see which functions can fail, even when the code does not handle the error itself. This is achieved with the `try!` macro. Here is an example:

    use std::fs::File;
    use std::io::{self, Read};
    use std::path::Path;
    
    fn read_file(path: &Path) -> Result<String, io::Error> {
        let mut f = try!(File::open(path));
        let mut rv = String::new();
        try!(f.read_to_string(&mut rv));
        Ok(rv)
    }

Both `File::open` and `read_to_string` can fail with an IO error. The `try!` macro propagates the error upwards by returning from the function immediately on failure, and unpacks the value on success. When returning a result from a function, it has to be wrapped in either `Ok` to indicate success or `Err` to indicate failure.

The `try!` macro also invokes the `From` trait, which makes errors convertible. For example, if you change the return type from `io::Error` to `MyError` and implement the `From` conversion from `io::Error` to `MyError`, that conversion will be invoked automatically.
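A minimal sketch of that conversion, using a hypothetical `MyError` type and the same Rust-1.0-era `try!` style as the example above:

    use std::fs::File;
    use std::io::{self, Read};

    // A hypothetical library-specific error type.
    #[derive(Debug)]
    enum MyError {
        Io(io::Error),
    }

    // Teach Rust how to turn an io::Error into a MyError.
    impl From<io::Error> for MyError {
        fn from(err: io::Error) -> MyError {
            MyError::Io(err)
        }
    }

    // Because the return type is now MyError, try! inserts the From
    // conversion automatically when File::open or read_to_string fails.
    fn read_file(path: &str) -> Result<String, MyError> {
        let mut f = try!(File::open(path));
        let mut rv = String::new();
        try!(f.read_to_string(&mut rv));
        Ok(rv)
    }

    fn main() {
        match read_file("/etc/hostname") {
            Ok(contents) => println!("read {} bytes", contents.len()),
            Err(err) => println!("failed: {:?}", err),
        }
    }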

Alternatively, you can change the return type from `io::Error` to `Box<Error>` so that any error can be returned; in that case, however, the concrete error is only known at run time, not at compile time.

You can also `unwrap()` a result if you do not want to deal with the error at all: on success this gives you the value, and on failure it aborts the program.

Mutability and Ownership

Where Rust and Python diverge completely is in the concepts of mutability and ownership. Python is a garbage-collected language, and as a result pretty much anything can happen to an object at run time. You can freely pass objects around and things will "just work"; you can leak memory if you try, but most problems sort themselves out automatically at run time.

Rust does not have a garbage collector, yet memory management is still automatic. This is made possible by a concept known as ownership tracking: everything you create is owned by something else. To compare with Python, you can imagine that all Python objects are owned by the interpreter. In Rust, ownership is much more local. If a function holds a list of objects, the list owns the objects, and the function's scope owns the list.

More complex ownership scenarios can be expressed through lifetime annotations in function signatures. In the `Add` implementation example above, the receiver is named `self`, just like in Python. Unlike Python, however, the value is *moved* into the function, whereas in Python the method is invoked with a reference to the object. What this means is that in Python you can do the following:

    leaks = []
    
    class MyType(object):
        def __add__(self, other):
            leaks.append(self)
            return self
    
    a = MyType() + MyType()

This code leaks `self` into a global list whenever a `MyType` instance is added to another object. After running the code above there are two references to the first `MyType` instance: one held by `leaks` and one held by `a`. In Rust you cannot do this, because there can only be one owner. If you pushed `self` into `leaks`, the compiler would *move* the value there, and the function could no longer return it because it has been moved elsewhere. To return the value you would first have to move it back (for example, by removing it from the list again). A minimal sketch of such a move is shown below.
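Here is that sketch, using a plain vector as the "leaks" list purely for illustration:

    struct MyType {
        value: i32,
    }

    fn main() {
        let mut leaks: Vec<MyType> = Vec::new();
        let a = MyType { value: 42 };
        println!("before the move: {}", a.value);

        // Ownership of `a` is transferred into the vector here.
        leaks.push(a);

        // Using `a` again would be rejected by the compiler:
        // println!("{}", a.value);   // error: use of moved value `a`
    }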

So what do you do if you need two references to an object? You borrow the value. There is no limit on the number of immutable borrows, but there can only ever be one mutable borrow, and only while no immutable borrow exists.

Functions that work on an immutable borrow take `&self`, and functions that need a mutable borrow take `&mut self`. Only the owner can lend out references. If you want to move the value out (for example, return it from the function), you cannot have any outstanding borrows, and you also cannot lend the value out after ownership has been moved elsewhere. A small sketch follows below.
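A minimal sketch of the borrowing rules on a vector (the extra scopes are there because Rust 1.0 only ends a borrow at the end of its scope):

    fn main() {
        let mut numbers = vec![1, 2, 3];

        {
            // Any number of immutable borrows can coexist.
            let first = &numbers[0];
            let second = &numbers[1];
            println!("{} {}", first, second);
        }

        {
            // A mutable borrow must be the only borrow while it lives.
            let first = &mut numbers[0];
            *first += 10;
        }

        println!("{:?}", numbers);
    }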

This is a big change in the way you think about programs, but you'll soon get used to it.

Runtime Borrows and Multiple Owners [^1]

[^1]: The original heading reads "Mutible Owners", which I assume is a typo for "Multiple".

So far all of this ownership tracking has been verified at compile time. But what if ownership cannot be verified at compile time? There are several options. One example is a mutex: a mutex guarantees at run time that only one mutable borrow of an object exists at a time, while the mutex itself owns the object. This lets you write code in which many threads can access the same object, but only one thread at a time.

A consequence of this is that you cannot forget to acquire the mutex and cause a data race by accident: such code simply does not compile.

But what if you want to program the way you do in Python, where you do not track who owns a piece of memory? In that case you can put the object into a reference-counting wrapper and borrow the value out of it at run time. This gets you very close to Python's behavior, with one caveat: you can create reference cycles. Python breaks such cycles with its garbage collector, while Rust does not have one.
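Before the threaded example below, here is a minimal single-threaded sketch of such a reference-counting wrapper, using `Rc` for the shared ownership and `RefCell` for the run-time borrow checks:

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        // Two handles to the same vector; the reference count keeps it alive.
        let shared = Rc::new(RefCell::new(Vec::new()));
        let other_owner = shared.clone();

        // Mutable borrows are checked at run time instead of compile time.
        shared.borrow_mut().push(1);
        other_owner.borrow_mut().push(2);

        println!("{:?}", *shared.borrow());
    }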

To illustrate this better, let's look at a more complex Python example and its Rust equivalent.

    from threading import Lock, Thread
    
    def fib(num):
        if num < 2:
            return 1
        return fib(num - 2) + fib(num - 1)
    
    def thread_prog(mutex, results, i):
        rv = fib(i)
        with mutex:
            results[i] = rv
    
    def main():
        mutex = Lock()
        results = {}
    
        threads = []
        for i in xrange(35):
            thread = Thread(target=thread_prog, args=(mutex, results, i))
            threads.append(thread)
            thread.start()
    
        for thread in threads:
            thread.join()
    
        for i, rv in sorted(results.items()):
            print "fib({}) = {}".format(i, rv)

What we do here is spawn 35 threads and make them compute Fibonacci numbers in a terribly inefficient way. Then we join the threads and print the sorted results. One thing that stands out immediately is that there is no intrinsic relationship between the mutex (the lock) and the results dictionary it is supposed to protect.

Next is an example of Rust.

    use std::sync::{Arc, Mutex};
    use std::collections::BTreeMap;
    use std::thread;
    
    fn fib(num: u64) -> u64 {
        if num < 2 { 1 } else { fib(num - 2) + fib(num - 1) }
    }
    
    fn main() {
        let locked_results = Arc::new(Mutex::new(BTreeMap::new()));
        let threads : Vec<_> = (0..35).map(|i| {
            let locked_results = locked_results.clone();
            thread::spawn(move || {
                let rv = fib(i);
                locked_results.lock().unwrap().insert(i, rv);
            })
        }).collect();
        for thread in threads { thread.join().unwrap(); }
        for (i, rv) in locked_results.lock().unwrap().iter() {
            println!("fib({}) = {}", i, rv);
        }
    }

The big differences from the Python code are that we use a B-tree map instead of a hash table, and that we put the map into a mutex wrapped in an `Arc`. What does that mean? First, the B-tree map is used because it keeps its keys sorted, which is what we want here. Then we put it into a mutex so the map can be locked at run time; this is where the relationship between the lock and the data is established. Finally we put the mutex into an `Arc`, which reference-counts whatever it wraps, in this case the mutex. This guarantees that the mutex is only deallocated after the last thread is done with it. A neat mechanism.

Now let's walk through how this works. As in Python, we count up to 35 [^2] and run a local function for each number; unlike Python, we can use a closure here. For each iteration we first clone the `Arc` into the local scope, which means each thread gets its own handle to it (internally this just bumps the reference count, which is decremented again when the thread dies). Then we spawn the thread with a local closure; the `move` keyword tells Rust to move the closure's environment into the thread. Inside the thread we compute the Fibonacci number, lock the mutex, and insert the result. The `unwrap()` there can be ignored for now; it just deals with the explicit result of locking. The important point is that you can only get at the map by locking the mutex, so you can never forget to lock it by accident!

[^2]: The original text says "we count to 20 like in Python", but this is probably a typo for 35.

All spawned threads are collected into a vector. Finally we iterate over the threads, join them, and print the results.

There are two things worth noting here. First, very few types are visible: aside from the `Arc` and the Fibonacci function working with unsigned 64-bit integers, no types are spelled out. Second, we use a B-tree map instead of a hash table simply because Rust provides one.

Iteration works just like in Python. The only difference is that in this example we need to acquire the mutex in Rust, because the compiler cannot know that the threads have finished and the mutex is no longer needed. There is an API that avoids this, but it is not yet stable in Rust 1.0.

Performance-wise, the result is pretty much what you would expect. (This example intentionally uses terrible code in both languages to demonstrate how threads behave.)

Unicode

Unicode is a favorite topic of mine :) Here Rust and Python are quite different. Python 2 and 3 share a very similar Unicode model: Unicode data is mapped onto arrays of characters. Rust, on the other hand, always stores Unicode strings as UTF-8. I have written before about why this is a much better solution than what Python or C# do (see "UCS vs UTF-8 as Internal String Encoding", http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/). What is particularly interesting about Rust is how it deals with the ugly reality of encodings in the real world.

First of all, Rust fully acknowledges that operating system APIs (both the Unicode ones on Windows and the non-Unicode ones on Linux) are pretty terrible. Unlike Python it does not try to force Unicode onto them; instead it has different string types that can be converted into one another at (reasonably) low cost. This works very well in practice and makes string handling fast.

For the vast majority of a program, settling on UTF-8 means no encoding or decoding is necessary at all: only a cheap validation check is needed, and processing UTF-8 strings requires no re-encoding in between. When integration with the Windows Unicode APIs is needed, the WTF-8 encoding is used internally, which can cheaply convert to and from the UCS2-like UTF-16 that Windows uses.

You can convert between Unicode and bytes at will and work on the bytes when you need to, then run a validation step later to make sure everything is as intended. This makes protocol code both really fast and really convenient. Compare this with Python, which constantly has to encode and decode just to support O(1) string indexing.
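A minimal sketch of that boundary between bytes and strings (the validation step is the `from_utf8` call):

    fn main() {
        // A &str is always valid UTF-8, so viewing it as bytes is free.
        let text = "föo";
        let bytes = text.as_bytes();
        println!("{} bytes", bytes.len());

        // Going back is a cheap validation check, not a re-encoding pass.
        match std::str::from_utf8(bytes) {
            Ok(back) => println!("valid UTF-8: {}", back),
            Err(err) => println!("not UTF-8: {}", err),
        }
    }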

Besides the really good in-memory model for Unicode, there are also plenty of APIs for working with it, either as part of the language or on the crates.io index (https://crates.io/search?q=unicode). These cover case folding, categorization, Unicode regular expressions, Unicode normalization, well-specified URI/IRI/URL APIs, segmentation, and simple name mapping.

What are the drawbacks? Indexing like `"föo"[1]` will not give you back the 'ö' you might expect. But that was never a good idea anyway. A small sketch follows below.
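Here is that sketch, showing what you do instead when you really want the second character:

    fn main() {
        let s = "föo";
        // Characters are reached by walking the string, because positions
        // in the underlying UTF-8 bytes do not map 1:1 to characters.
        println!("{:?}", s.chars().nth(1));   // prints Some('ö')
        // A byte slice like &s[1..2] would panic: byte 1 lands in the
        // middle of the two-byte 'ö'.
    }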

As an example of how the interaction with the operating system works, here is a small application that reads the current directory and prints each file name together with its size.

    use std::env;
    use std::error::Error;
    use std::fs;
    
    fn example() -> Result<(), Box<Error>> {
        let here = try!(env::current_dir());
        println!("Contents in: {}", here.display());
        for entry in try!(fs::read_dir(&here)) {
            let path = try!(entry).path();
            let md = try!(fs::metadata(&path));
            println!("  {} ({} bytes)", path.display(), md.len());
        }
        Ok(())
    }
    
    fn main() {
        example().unwrap();
    }

All IO operations work with these path objects, which encapsulate the operating system's internal path representation, whatever that happens to be: bytes, Unicode, or something else the OS uses. To format a path you call `.display()`, which returns an object that knows how to format itself into a string. This is useful because it means you never inadvertently leak a badly encoded string, as can easily happen in Python 3, for example. A clean separation of concerns.

Distributions and libraries

Rust comes with "cargo", which is roughly virtualenv + pip + setuptools rolled into one. By default it only works against a single installed version of Rust, so it is not an exact virtualenv equivalent, but otherwise it behaves as you would expect. Dependencies can be declared against git repositories or against the crates.io index, pinned to specific library versions. If you download Rust from the website, the cargo command comes with it and everything works out of the box.

Will Rust replace Python?

I do not think Python and Rust are in direct competition. For instance, Python is well established in scientific computing, and I do not see Rust taking over there any time soon, simply because of the tremendous amount of work that would require. Likewise, it makes little sense to write in Rust what could be a quick Python shell script. That said, just as many Python programmers started to look at Go, I think people will start to look at Rust for some areas where they previously used Python.

Rust is a very powerful language with strong foundations, developed under a free license by a friendly community with a democratic attitude towards the language's evolution.

Because Rust needs almost no runtime support, it is very easy to use from Python via ctypes or CFFI. I can clearly imagine a future in which Python packages ship binary modules written in Rust, and calling into those Rust modules from Python requires no extra effort from developers.

© Copyright 2015 by Armin Ronacher. Content licensed under the Creative Commons attribution-noncommercial-sharealike License.
