Learn about Go slices

This article is a translation and addition of Arrays, slices (and strings): The mechanics of'append'.

Introduction

One of the most common features of procedural programming languages is the concept of arrays.

Arrays look simple, but there are still many questions about the mechanism for adding arrays to the language.

--Fixed length or variable length? --What is the element type? ――What about multidimensional arrays? --What does an empty array mean?

The answers to these questions even affect whether arrays are just a function of the language or a core part of its design.

In the early development of Go, it took about a year to come up with the right design with the answers to these questions.

The hardest part was the introduction of slices.

Slices are built on fixed-size arrays, providing a flexible and extensible data structure.

However, programmers new to Go often stumble when it comes to slicing behavior. This is probably due to experience in other languages.

In this post, I'll try to get rid of the confusion.

To do this, we'll walk through each of the components needed to explain how the built-in function append works and why it works that way.

Arrays

Arrays are an important component of Go, but an important component hidden behind slices.

Before we move on to the interesting, powerful, and prominent ideas that slices have, we need a brief explanation of arrays.

Arrays are fixed length, so you won't often see them in Go programs.

When declaring

var buffer [256]byte

This declaration declares a variable called buffer that holds 256 bytes.

The type of buffer is [256] byte and the type includes the size. If you want to keep 512 bytes, it will be [512] byte.

The data associated with an array is literally an array of elements. In general, buffer looks like this in memory:

buffer: byte byte byte ... 256 times ... byte byte byte

So this variable has nothing but 256 bytes of data.

Each element can be accessed using a familiar index. (buffer \ [0 ], buffer \ [1 ], ..., buffer \ [255 ])

The program will crash if you try to access the index out of the array.

Arrays, slices and some other data types have a built-in function called len that returns the number of elements.

For arrays, it's clear what len returns. In this example, len (buffer) returns a fixed value of 256.

Arrays are often used to properly represent transformation matrices, but the most common purpose in Go is to preserve storage for slices.

Slices: About slice headers

Slices are a commonly used data structure, but to use them well, you need to understand exactly what they are and what they do.

A slice is a data structure that describes a continuous section of an array inside, and the slice itself is not an array. Slices represent part of an array.

By slicing this array, given the variable buffer that represents the array in the previous section, you can create a slice that describes elements 100-150 (more precisely, 100-149).

var slice []byte = buffer[100:150]

In the example above, we used an explicit variable declaration.

The variable slice is of type [] byte, pronounced "byte type slice", and is initialized from the array buffer by slicing elements 100 to 150.

In more idiomatic syntax, you can omit the description of the type to be set.

var slice = buffer[100:150]

You can also make the following short declaration inside the function:

slice := buffer[100:150]

What exactly does this slice variable have?

I won't talk about everything here, but for now think of slices as small data structures with two elements: length and pointers to array elements.

So you can think of the slices as being built this way behind the scenes:

type sliceHeader struct {
    Length        int
    ZerothElement *byte
}

slice := sliceHeader{
    Length:        50,
    ZerothElement: &buffer[100],
}

Of course, this is just an example, not the actual structure.

In this code example, the sliceHeader structure is not visible to the programmer, and the type of the pointer to the array depends on the type of the elements in the array, which gives us a general idea of the mechanism.

So far, we've used the slice operation on arrays, but you can also slice slices like this:

slice2 := slice[5:10]

As before, this operation creates a new slice.

In this case, use elements 5-9 (\ [5,9 ]) of the original slice.

It represents elements 105-109 of the original array.

The sliceHeader structure on which the variableslice2 is based looks like this:

slice2 := sliceHeader{
    Length:        5,
    ZerothElement: &buffer[105],
}

Note that this header refers to the same buffer as the original slice as the reference to the array.

You can also slice it again. That is, cut the slice and save the result to the original slice structure as follows:

slice = slice[5:10]

The sliceHeader structure of the variableslice looks the same as for the variable slice2.

This is often the case when you want to truncate a portion of a slice. The following example truncates the first and last elements of the slice.

slice = slice[1:len(slice)-1]

We often hear experienced Go programmers talk about "slice headers".

Because the slice header is actually what is stored in the slice variable.

For example, if you call a function that takes a slice as an argument, such as bytes.IndexRune, the slice header will be passed to the function.

slashPos := bytes.IndexRune(slice, '/')

For example, in this function call, what is actually passed to the function is the slice header.

There is another data item in the slice header. We'll talk about this later, but let's first see what the presence of slice headers means when writing a program that uses slices.

When a slice is used as an argument

It is important to understand the fact that the slice itself is a value, even if the slice contains a pointer.

The structure of a slice is a ** structure value ** that holds a pointer and length. It is not a pointer to a structure.

This is important. When I called IndexRune in the previous example, a copy of the slice header was passed. Its behavior has important implications.

Consider this simple function.

func AddOneToEachElement(slice []byte) {
    for i := range slice {
        slice[i]++
    }
}

As the name implies, it is a function that loops through slices and increments the elements of the slice at each iteration.

Try the following program.

func main() {
    slice := buffer[10:20]
    for i := 0; i < len(slice); i++ {
        slice[i] = byte(i)
    }
    fmt.Println("before", slice)
    AddOneToEachElement(slice)
    fmt.Println("after", slice)
}
before [0 1 2 3 4 5 6 7 8 9]
after [1 2 3 4 5 6 7 8 9 10]

Program exited.

The slice header itself was passed as a value (that is, copied and passed), but because the header contains pointers to the elements of the array, the original slice header and a copy of the header passed to the function Both refer to the same array.

So when the function returns, the modified element can be seen through the original slice variable.

As this example shows, the function arguments are actually copies.

func SubtractOneFromLength(slice []byte) []byte {
    slice = slice[0 : len(slice)-1]
    return slice
}

func main() {
    fmt.Println("Before: len(slice) =", len(slice))
    newSlice := SubtractOneFromLength(slice)
    fmt.Println("After:  len(slice) =", len(slice))
    fmt.Println("After:  len(newSlice) =", len(newSlice))
}
Before: len(slice) = 50
After:  len(slice) = 50
After:  len(newSlice) = 49

Program exited.

In this example, you can see that the contents of the slice argument can be changed by the function, but the header itself cannot be changed.

The function is passed a copy of the slice header instead of the original header, so the length stored in the slice variable (the Length property of the original header) is not changed by the function call.

Therefore, if you create a function that modifies the header, you must return the modified slice as the return value, as you did in this example.

The original slice variable hasn't changed, but the slice returned in the return value has a new length and is stored in newSlice.

Pointer to slice as method receiver

Another way the function interferes with the slice header is to pass a pointer to the slice

func PtrSubtractOneFromLength(slicePtr *[]byte) {
    slice := *slicePtr
    *slicePtr = slice[0 : len(slice)-1]
}

func main() {
    fmt.Println("Before: len(slice) =", len(slice))
    PtrSubtractOneFromLength(&slice)
    fmt.Println("After:  len(slice) =", len(slice))
}
Before: len(slice) = 50
After:  len(slice) = 49

Program exited.

The above example doesn't look very smart.

There is one common case where a pointer to a slice is displayed.

For example, it is common practice to use pointers as receivers in methods that make slice changes.

Let's say you need a method to truncate slices at the last / from the file path.

For example, dir1 / dir2 / dir3 should be dir1 / dir2.

A method that satisfies this can be written as:

type path []byte

func (p *path) TruncateAtFinalSlash() {
    i := bytes.LastIndex(*p, []byte("/"))
    if i >= 0 {
        *p = (*p)[0:i]
    }
}

func main() {
    pathName := path("/usr/bin/tso") // Conversion from string to path.
    pathName.TruncateAtFinalSlash()
    fmt.Printf("%s\n", pathName)
}

If you run it, you'll see that it works. This time the slice has been modified in the calling method.

On the other hand, if you want to write a method that capitalizes ASCII characters in the path (ignoring anything other than English), the receiver of the method can be a value because the array reference remains the same and only its contents are changed.

type path []byte

func (p path) ToUpper() {
    for i, b := range p {
        if 'a' <= b && b <= 'z' {
            p[i] = b + 'A' - 'a'
        }
    }
}

func main() {
    pathName := path("/usr/bin/tso")
    pathName.ToUpper()
    fmt.Printf("%s\n", pathName)
}

Here the ToUpper method uses two variables in the for range syntax, with the index of the array and the element corresponding to that index.

By doing this, you can reduce the number of times you write p [i] one by one.

Capacity: Slice capacity

Let's look at the following function that extends the int type slice slice passed as an argument by adding element.

func Extend(slice []int, element int) []int {
    n := len(slice)
    slice = slice[0 : n+1]
    slice[n] = element
    return slice
}

Let's run

func main() {
    var iBuffer [10]int
    slice := iBuffer[0:0]
    for i := 0; i < 20; i++ {
        slice = Extend(slice, i)
        fmt.Println(slice)
    }
}
[0]
[0 1]
[0 1 2]
[0 1 2 3]
[0 1 2 3 4]
[0 1 2 3 4 5]
[0 1 2 3 4 5 6]
[0 1 2 3 4 5 6 7]
[0 1 2 3 4 5 6 7 8]
[0 1 2 3 4 5 6 7 8 9]
panic: runtime error: slice bounds out of range [:11] with capacity 10

goroutine 1 [running]:
main.Extend(...)
	/tmp/sandbox021597325/prog.go:16
main.main()
	/tmp/sandbox021597325/prog.go:25 +0x105

Program exited: status 2.

The slices are expanding more and more, but they stop in the middle.

Describes the third component of the slice header, capacity.

In addition to the array pointer and length, the slice header also stores its capacity (capacity).

type sliceHeader struct {
    Length        int
    Capacity      int
    ZerothElement *byte
}

The capacity field contains the amount of space the underlying array actually has.

This is the maximum value that Length can reach. If you try to grow a slice beyond its capacity, you will exceed the array limit, that is, you will have out-of-array access and you will panic.

The header of the slice created by slice: = iBuffer [0: 0]

slice := sliceHeader{
    Length:        0,
    Capacity:      10,
    ZerothElement: &iBuffer[0],
}

The capacity field is equal to the length of the underlying array minus the index (0 in this case) of the corresponding array for the first element of the slice.

If you want to check the capacity of a slice, use the built-in function cap.

if cap(slice) == len(slice) {
    fmt.Println("slice is full!")
}

Make

What if you want to grow a slice beyond its capacity?

can not! By definition, capacity is the limit of expansion.

However, you can achieve equivalent results by assigning a new array, copying the data, modifying the contents of the slices and referencing the new array.

Let's start with the assignment. You can use the new built-in function to allocate a larger array and slice the result, but it's easier to use the built-in function make instead.

Allocate a new array and create a slice header that references it in one shot.

The make function takes three arguments: the slice type, its initial length, and its capacity (the length of the array that make allocates to hold the slice data).

As you can see in the example below, this call creates a slice of length 10 and the back array has room for 5 sizes (15-10).

    slice := make([]int, 10, 15)
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
len: 10, cap: 15

Program exited.
    slice := make([]int, 10, 15)
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
    newSlice := make([]int, len(slice), 2*cap(slice))
    for i := range slice {
        newSlice[i] = slice[i]
    }
    slice = newSlice
    fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
len: 10, cap: 15
len: 10, cap: 30

Program exited.

After running this code, the slice has much more space to expand (5-> 20).

When creating slices, it is common for them to be the same length and capacity. The built-in function make has an abbreviation for this common case.

gophers := make([]Gopher, 10) // = make([]Gopher, 10, 10)

If you do not specify a capacity, its default value is the same as the length, so you can omit it as above and set both to the same value.

Copy

When I doubled the capacity of the slice in the previous section, I created a loop that copies the old data to the new slice.

Go has a built-in function copy to make this easier. Its arguments are two slices, copying the data from the right argument to the left argument.

The following is an example of rewriting to use copy.

    newSlice := make([]int, len(slice), 2*cap(slice))
    copy(newSlice, slice)

The function copy makes a smart copy. Copy as much as you can, paying attention to the length of both arguments.

That is, the number of elements to copy is the minimum length of the two slices.

This saves a bit of writing. Also, copy returns an integer value, which is the number of elements copied, but it is not always worth checking.

The function copy is also handled properly if the source and destination are duplicates.

That is, it can be used to move the contents in a single slice. Here's how to use a copy to insert a value in the center of a slice:

// Insert inserts the value into the slice at the specified index,
// which must be in range.
// The slice must have room for the new element.
func Insert(slice []int, index, value int) []int {
    // Grow the slice by one element.
    slice = slice[0 : len(slice)+1]
    // Use copy to move the upper part of the slice out of the way and open a hole.
    copy(slice[index+1:], slice[index:])
    // Store the new value.
    slice[index] = value
    // Return the result.
    return slice
}

One thing to keep in mind in the above example is that the length has changed, so you must always return the changed slice as a return value.

It also needs to be capacity> length as it adds an element.

Append

A few sections ago, I created a function Extend that extends a slice by only one element.

However, there was a bug because the function would crash if the slice size was too small. (The Insert example has the same problem.)

Now that we have the elements to fix this, let's create a solid implementation of Extend that extends integer slices.

func Extend(slice []int, element int) []int {
    n := len(slice)
    if n == cap(slice) {
        //The slice is full and needs to be expanded
        //New capacity is 2x+1(+1 is x=For when 0)
        newSlice := make([]int, len(slice), 2*len(slice)+1)
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[0 : n+1]
    slice[n] = element
    return slice
}

Again, the referenced array changes completely, so you need to return the slice as the return value.

Let's use the Extend function to extend and check the behavior when the slice is full.

    slice := make([]int, 0, 5)
    for i := 0; i < 10; i++ {
        slice = Extend(slice, i)
        fmt.Printf("len=%d cap=%d slice=%v\n", len(slice), cap(slice), slice)
        fmt.Println("address of 0th element:", &slice[0])
    }
len=1 cap=5 slice=[0]
address of 0th element: 0xc000078030
len=2 cap=5 slice=[0 1]
address of 0th element: 0xc000078030
len=3 cap=5 slice=[0 1 2]
address of 0th element: 0xc000078030
len=4 cap=5 slice=[0 1 2 3]
address of 0th element: 0xc000078030
len=5 cap=5 slice=[0 1 2 3 4]
address of 0th element: 0xc000078030
len=6 cap=11 slice=[0 1 2 3 4 5]
address of 0th element: 0xc00005e060
len=7 cap=11 slice=[0 1 2 3 4 5 6]
address of 0th element: 0xc00005e060
len=8 cap=11 slice=[0 1 2 3 4 5 6 7]
address of 0th element: 0xc00005e060
len=9 cap=11 slice=[0 1 2 3 4 5 6 7 8]
address of 0th element: 0xc00005e060
len=10 cap=11 slice=[0 1 2 3 4 5 6 7 8 9]
address of 0th element: 0xc00005e060

Program exited.

Note the reassignment when the size 5 initial array is full.

When a new array is assigned, both the capacity and the address of the 0th element change.

You can use the robust Extend function as a guide to create even better functions that allow you to extend a slice with multiple elements. To do this, take advantage of Go's ability to treat the list of function arguments as a conversion to slices when the function is called, namely Go's variadic function.

Let's call the function Append. In the first version, Extend can be called repeatedly, which clarifies the mechanism of variadic functions. The signature of the Append function is:

func Append(slice []int, items ...int) []int

This signature indicates that it takes one slice followed by zero or more int values as arguments.

// Append appends the items to the slice.
// First version: just loop calling Extend.
func Append(slice []int, items ...int) []int {
    for _, item := range items {
        slice = Extend(slice, item)
    }
    return slice
}

It should be noted that the for range loop handles the elements of the argument items, which are treated as [] int, in each loop. It is also important to note that the identifier _ discards the index of unnecessary slices in this case.

    slice := []int{0, 1, 2, 3, 4}
    fmt.Println(slice)
    slice = Append(slice, 5, 6, 7, 8)
    fmt.Println(slice)

A new technique that appears in this example is to initialize a slice by writing a composite literal consisting of the slice type followed by the elements in curly braces.

    slice := []int{0, 1, 2, 3, 4}

What's even more interesting about the Append function is that you can not only add elements, but also add the second slice itself as an argument by" decomposing "the slice as an argument using the...notation on the caller side. It is a point that can be done.

    slice1 := []int{0, 1, 2, 3, 4}
    slice2 := []int{55, 66, 77}
    fmt.Println(slice1)
    slice1 = Append(slice1, slice2...) // The '...' is essential!
    fmt.Println(slice1)

In the previous Append, it was twice as long as the original slice, so the number of elements to add could cause multiple array reassignments, but the Append in the example below `It is streamlined to live with only one allocation.

// Append appends the elements to the slice.
// Efficient version.
func Append(slice []int, elements ...int) []int {
    n := len(slice)
    total := len(slice) + len(elements)
    if total > cap(slice) {
        // Reallocate. Grow to 1.5 times the new size, so we can still grow.
        newSize := total*3/2 + 1
        newSlice := make([]int, total, newSize)
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[:total]
    copy(slice[n:], elements)
    return slice
}

Note that copy is called twice, once when copying the slice data to the newly allocated memory and when copying the item to be added to the end of the array.

Append as a built-in function

Finally, we have a piece to explain why we designed the built-in function append.

The built-in function append runs with equal efficiency, just like the Append example above, but works with any slice type.

The weakness of Go is that it needs to provide generic type operations at runtime.

It may change someday, but for now, Go has a built-in generic append function to make slicing easier.

It works the same as a slice of [] int, but it works with any slice type.

Note that the slice header is constantly updated with the call to append, so you need to save the slice returned after the call.

In fact, the compiler cannot call append without saving the result.

Below is an example of using append. It may be interesting to actually run it or edit it.

    // Create a couple of starter slices.
    slice := []int{1, 2, 3}
    slice2 := []int{55, 66, 77}
    fmt.Println("Start slice: ", slice)
    fmt.Println("Start slice2:", slice2)

    // Add an item to a slice.
    slice = append(slice, 4)
    fmt.Println("Add one item:", slice)

    // Add one slice to another.
    slice = append(slice, slice2...)
    fmt.Println("Add one slice:", slice)

    // Make a copy of a slice (of int).
    slice3 := append([]int(nil), slice...)
    fmt.Println("Copy a slice:", slice3)

    // Copy a slice to the end of itself.
    fmt.Println("Before append to self:", slice)
    slice = append(slice, slice...)
    fmt.Println("After append to self:", slice)
Start slice:  [1 2 3]
Start slice2: [55 66 77]
Add one item: [1 2 3 4]
Add one slice: [1 2 3 4 55 66 77]
Copy a slice: [1 2 3 4 55 66 77]
Before append to self: [1 2 3 4 55 66 77]
After append to self: [1 2 3 4 55 66 77 1 2 3 4 55 66 77]

Program exited.

In particular, the final append in the above example would be a good example of why slice design is so simple.

The "Slice Tricks" Wiki page, created by a community of volunteers, has examples of various functions and operations related to append,'copy', and other slices. Introduced.

Nil

As an aside, the new knowledge I learned this time will give me an idea of how the nil slice is actually represented.

Naturally, this is the zero value of the slice header.

sliceHeader{
    Length:        0,
    Capacity:      0,
    ZerothElement: nil,
}

Or

sliceHeader{}

The key is that the pointer to the element is nil.

array[0:0]

Arrays created this way are zero in length and capacity, but pointers are not treated as nil slices and are not treated as nil slices.

Obviously, empty slices can be large (assuming the capacity is non-zero)

However, the nil slice does not have an array to hold the values in and cannot be extended to hold the elements.

However, a nil slice is functionally equivalent to a zero-length slice, even if it doesn't point to anything, so you can use append to assign an array and add elements. Append.

Strings

Here's a brief explanation of Go strings from a slice perspective.

In reality, the string mechanism is very simple. The string is a read-only byte slice with some added syntax support from the language.

They are read-only, so they don't require space (that is, they can't be scaled), but apart from that, they can be treated like read-only byte slices in most cases.

To get started, you can index individual bytes to access them.

slash := "/usr/ken"[0] // yields the byte value '/'.

It is also possible to slice the string and extract the substring.

usr := "/usr/ken"[0:4] // yields the string "/usr"

It should be obvious to everyone who has read this far what is happening behind the scenes when slicing a string.

It is also possible to generate a character string by converting a byte-type slice as follows.

str := string(slice)

The reverse is also possible.

slice := []byte(usr)

The array of bytes on which the string is based does not appear in the table. That is, there is no way to access its contents except through a string.

When performing any of these conversions, you need to make a copy of the byte array.

Of course, Go handles this for you, so you don't have to do it yourself. After any of these conversions, changes to the underlying array of byte slices do not affect the corresponding string. (For example, changing slice afterstr: = string (slice)does not affect str)

The important result of designing a string like a slice is that the creation of substrings is very efficient.

All you need to do is create a two-word string header. The string is read-only, so the original string and the substrings resulting from the slice operation can safely share the same array.

Historical Note: In the earliest implementations of strings, new byte arrays were always assigned when creating substrings, but when slices were added to the language, they were efficient characters. Provided a model for column processing. As a result, some benchmarks have seen significant speedups.

Of course, there are many more strings, and you can read more about strings in Another Blog Post (https://blog.golang.org/strings).

Summary

To understand how slices work, it's helpful to understand how slices are implemented.

There is a small data structure called the slice header, which is the item associated with the slice variable, and that header references a portion of the individually allocated array.

If you pass a slice value, the header will be copied, but the array it points to will always be shared.

Understanding how they work makes slices not only easy to use, but also powerful and expressive data structures, especially with the help of the copy and append built-in functions.

Also, if you'd like to write other articles about Go, please!

-About Go Interface -Introduction to Go Assembly

reference

Recommended Posts

Learn about Go slices
About python slices
Learn about programming
About Go functions
About Go Interface
Note about pointers (Go)
About Go control syntax
About Go GET request
About Go Modules (vgo)
[Golang] About Go language channel
Learn algorithms with Go @ recursive call
About the basic type of Go
Learn about logging using Python's logging module ①
[Golang] About Go language Producer and Consumer
Learn more about AWS serverless web applications