Various Python built-in string operations

Preface

- This article is a partial excerpt from one chapter of the book "Getting Started with Python Programming" (https://github.com/simon-ritchie/python-novice-book-preview).
  - Because it is an introductory book, it is aimed at beginners.
  - It mainly covers the chapter on built-in string operations.
- Please note that some parts are not adapted for Qiita (words such as "chapter" and "page" are used, there are extra line breaks, links do not work, etc.). If you are interested, please use the e-book version linked above (you can download it on GitHub).
- If you give us feedback in the comments, we may reflect it in the book as well.

String operations

Up to the previous chapter, you learned basic Python operations, types, and built-in functions. In this chapter, we will learn additional string operations (we have covered some already, but there are many other important and convenient ones).

There are many of them, so you don't need to memorize everything; it is enough to think "Oh right, there was something like that." You can look each one up whenever you need it.

String index and slice

You learned about slicing (for example, extracting only a certain range of values from a list) in the section on lists, but you can actually slice strings in the same way. As with lists, you control it by specifying an index range with square brackets, numbers, and a colon, such as [1:3].

An index number is assigned to each character. As with lists, numbering starts from 0.

For example, for the string 'Apple', the indexes are assigned as follows: A is 0, p is 1, p is 2, l is 3, and e is 4.

You could say that a string is like a list of characters.

Let's write some code that actually refers to the values at specific indexes of a string.

First, try outputting the character at a specific index (such as [0] or [1]). You can confirm that only the corresponding character is output.

str_value = 'Apple'
print(str_value[0])

Output contents of code execution result:

A

str_value = 'Apple'
print(str_value[1])

Output contents of code execution result:

p

Next, try extracting a specific range of the string by slicing, such as "index XX or greater and less than index YY". As with lists, the number on the left side of the colon means "index XX or greater", and the number on the right side means "less than index YY". In other words, [1:4] means "1 or greater and less than 4" (indexes 1, 2, and 3), so for the string 'Apple' it extracts the part ppl.

str_value = 'Apple'
print(str_value[1:4])

Output contents of code execution result:

ppl

Of course, as with list slices, you can also specify only the number on the left side of the colon for the condition "XX or greater", or only the number on the right side for the condition "less than XX".
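A quick sketch of those omitted-bound forms, using the same 'Apple' string. The negative indexes at the end rely on the counting-from-the-right behavior mentioned for lists; the expected outputs are shown as comments.

```python
str_value = 'Apple'

# Only the left number: "index 2 or greater"
print(str_value[2:])   # → ple
# Only the right number: "less than index 3"
print(str_value[:3])   # → App
# Negative indexes count from the right end (-1 is the last character)
print(str_value[-1])   # → e
print(str_value[-3:])  # → ple
```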

Find out if a string starts with a particular string: startswith method

With the startswith method, you can get a boolean value indicating whether the target string starts with the string passed as the first argument.

The method name comes from the English phrase "starts with ..." (as in "starts with this string").

In programs, variable and constant names that belong to the same group are often given names that start with a common string. Such a leading string is called a prefix. "Pre" means "before", and a prefix is "something added to the beginning".

For example, constants can be named like ITEM_ID_XX using the prefix ITEM_ID_.

The startswith method is useful for checking whether a string has such a prefix. As mentioned in a later chapter, variable and constant names in a program can also be obtained as strings, so you can, for example, run some processing only for names with a specific prefix.

If you specify a prefix as the first argument, the boolean value of True or False will be returned.

Case where True is returned because the string starts with the specified prefix:

str_value = 'FRUIT_ID_APPLE'
print(str_value.startswith('FRUIT_ID_'))

Output contents of code execution result:

True

Case where False is returned because the string does not start with the specified prefix:

str_value = 'FRUIT_ID_APPLE'
print(str_value.startswith('ITEM_ID'))

Output contents of code execution result:

False
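As a supplementary sketch: startswith also accepts a tuple of prefixes and returns True if the string starts with any one of them. The value ITEM_ID_SWORD is a hypothetical name used only for illustration.

```python
str_value = 'ITEM_ID_SWORD'  # hypothetical constant name

# A tuple of prefixes: True if the string starts with any of them
print(str_value.startswith(('FRUIT_ID_', 'ITEM_ID_')))  # → True
print(str_value.startswith(('FRUIT_ID_', 'CAT_ID_')))   # → False
```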

Find out if a string ends with a particular string: endswith method

The endswith method is similar to the startswith method, but whereas startswith checks the beginning of the string, endswith checks the end.

The string at the end is called a suffix.

Like the startswith method, the endswith method is used by specifying the suffix as the first argument. The result is also returned as a boolean value.

Case where True is returned because the string ends with the specified suffix:

str_value = 'CAT_NAME'
print(str_value.endswith('_NAME'))

Output contents of code execution result:

True

Case where False is returned because the string does not end with the specified suffix:

str_value = 'CAT_AGE'
print(str_value.endswith('_NAME'))

Output contents of code execution result:

False
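A small sketch of a typical use, assuming a hypothetical list of constant names: keep only the names that end with the suffix _NAME.

```python
# Hypothetical constant names, used only for illustration
names = ['CAT_NAME', 'CAT_AGE', 'DOG_NAME', 'DOG_AGE']

# Keep only the names that end with the suffix '_NAME'
name_keys = []
for name in names:
    if name.endswith('_NAME'):
        name_keys.append(name)
print(name_keys)  # → ['CAT_NAME', 'DOG_NAME']
```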

The startswith and endswith checks can also be done with slices, but...

As mentioned in the previous section, strings can be sliced to extract a specific range of strings. For example, you can get the prefix part of any number of characters by doing the following.

str_value = 'FRUIT_ID_APPLE'
print(str_value[:6])

Output contents of code execution result:

FRUIT_

Also, as we will learn in a later chapter, in Python you can write two half-width equal signs with values on the left and right, as in left_value == right_value. The result is True if the values on both sides match and False if they do not.

Sample that is True because the values on both sides match:

int_value = 100
print(int_value == 100)

Output contents of code execution result:

True

Sample that is False because the values on both sides do not match:

int_value = 95
print(int_value == 100)

Output contents of code execution result:

False

By combining slices with the == operator, you can do the same thing as the startswith and endswith methods. For example, the slice [:9] below gets the first 9 characters, so if you put the expected prefix on the right side, you get True or False just as with the startswith method.

str_value = 'FRUIT_ID_APPLE'
print(str_value[:9] == 'FRUIT_ID_')

Output contents of code execution result:

True

However, this style does not behave as expected if you miscount the characters or accidentally write the wrong number in the slice. For example, the following code does not make the intended judgment: the comparison string FRUIT_ID has 8 characters, but the slice takes 9, so they can never match.

str_value = 'FRUIT_ID_APPLE'
print(str_value[:9] == 'FRUIT_ID')

Output contents of code execution result:

False

With this style, it is hard to see at a glance what went wrong, and the code itself is harder to read. Prefix (start of string) comparisons are already hard to read, and suffix (end of string) comparisons are even harder.

Because this style is so error-prone, PEP 8, Python's coding style guide, also recommends using startswith and endswith:

Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.

startswith() and endswith() are cleaner and less error prone. (PEP 8: Style Guide for Python Code)

Given a little time, other people reading the code can understand it thoroughly, but time is limited in daily work, so ideally code should be understandable at a glance as much as possible.

Unless you have a specific reason, follow PEP 8 and use startswith and endswith instead of slicing.

Find the position in a string that contains a particular string: find, rfind, index, rindex methods

In this section you will learn about four string methods: find, rfind, index, and rindex. I use the find method a lot, but the other three are probably used less often.

The find method is used to find out where a particular string is located within a string.

To use it, specify the string you want to search for as the first argument. The return value is the index of the first position where it is found. As with lists, index values start at 0 (the first character is index 0, the second character is index 1, and so on).

For example, the code below searches for the string cat. Since 7 is returned, you can tell that cat starts at index 7 (the 8th character).

str_value = 'I am a cat. There is no name yet.'
print(str_value.find('cat'))

Output contents of code execution result:

7

If you use the obtained index (7 this time) on the string, you can confirm that it points to the c at the start of cat.

str_value = 'I am a cat. There is no name yet.'
print(str_value[7])

Output contents of code execution result:

c

If you specify a longer string such as cat. instead of just cat, the index at which that string starts is returned. Therefore, in this sample, the returned index is 7, the same value as when cat is specified.

str_value = 'I am a cat. There is no name yet.'
print(str_value.find('cat.'))

Output contents of code execution result:

7

If the string specified in the search is not found, -1 is returned.

str_value = 'I am a cat. There is no name yet.'
print(str_value.find('dog'))

Output contents of code execution result:

-1

Using this, you can also check whether a string contains a specific string by checking whether -1 is returned.
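For example, a containment check based on find might look like this (a minimal sketch; the only addition is comparing the result against -1 with the != operator):

```python
str_value = 'I am a cat. There is no name yet.'

# find returns -1 only when the search string is absent
has_cat = str_value.find('cat') != -1
has_dog = str_value.find('dog') != -1
print(has_cat)  # → True
print(has_dog)  # → False
```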

The second argument is the start index of the range to search. For example, if 4 is specified, the search covers index 4 and after (if omitted, the search starts from the first index, 0).

In the samples below, 7 and 8 are specified as the second argument, and you can confirm that whether the string is found changes.

Sample case where the corresponding string is found:

str_value = 'I am a cat.'
print(str_value.find('cat', 7))

Output contents of code execution result:

7

Sample case where the corresponding string is not found because of the start index:

str_value = 'I am a cat.'
print(str_value.find('cat', 8))

Output contents of code execution result:

-1

The third argument is the end of the index range to search. If omitted, the search runs to the end of the string.

Note that the value specified here is not a "less than or equal to" condition but a "less than" condition. Therefore, if 9 is specified, index 9 is not included and the search covers indexes up to 8. The whole search string must fit inside this range to be found.

Sample case where nothing is found because 9 is specified as the third argument (cat occupies indexes 7 to 9, so its last character is outside the range):

str_value = 'I am a cat.'
print(str_value.find('cat', 0, 9))

Output contents of code execution result:

-1

Sample case where cat is found at index 7 because 10 is specified as the third argument (the range is less than 10, so indexes 7 to 9 are included):

str_value = 'I am a cat.'
print(str_value.find('cat', 0, 10))

Output contents of code execution result:

7
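Combining the return value with the second argument lets you collect every position of a string. This is a sketch, not the only way to do it: each search restarts one index after the previous hit until find returns -1.

```python
str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'

# Collect every index where 'cat' starts by restarting the search
# just after the previous hit (second argument = start index)
positions = []
index = str_value.find('cat')
while index != -1:
    positions.append(index)
    index = str_value.find('cat', index + 1)
print(positions)  # → [7, 28]
```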


Next is the rfind method.

Unlike the find method, the rfind method searches from the right side of the string. The r stands for "right".

For example, if you search for cat in the string "I am a cat. Although I am a cat, I sometimes think about it.", the search starts from the right and the first hit becomes the return value, so the position of the second cat is returned.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.rfind('cat'))

Output contents of code execution result:

28

It should be noted that although the search itself is performed from the right, the index number of the result is returned as a normal index counted from the left.

As mentioned for lists in the previous chapter, specifying -1, -2, -3, ... as an index accesses the rightmost character, the second character from the right, the third from the right, and so on, counting from the right end of the string. However, the rfind method returns a normal index counted from the left end, so you can get the corresponding character by using it as an index in the usual way.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
index = str_value.rfind('cat')
print(str_value[index])

Output contents of code execution result:

c

As with the find method, the second and third arguments are the start and end (less than) of the index range to search. Again, these are normal indexes counted from the left, not from the right. The search is performed from the right side within the specified range.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.rfind('cat', 28))

Output contents of code execution result:

28

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.rfind('cat', 29))

Output contents of code execution result:

-1
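A common practical use of rfind is getting the last extension of a file name. In the sketch below, archive.tar.gz is a hypothetical file name; find and rfind return different positions because the name contains two dots.

```python
# Hypothetical file name, used only for illustration
file_name = 'archive.tar.gz'

first_dot = file_name.find('.')   # searched from the left
last_dot = file_name.rfind('.')   # searched from the right
print(first_dot)  # → 7  (the dot before 'tar')
print(last_dot)   # → 11 (the dot before 'gz')
# Slice from just after the last dot to get the final extension
print(file_name[last_dot + 1:])  # → gz
```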


The index method behaves much like the find method: it also searches for a string and returns the index where it was found.

However, whereas find returns -1 when the string is not found, index raises an error instead.

A sample case that behaves like find because the string exists:

str_value = 'I am a cat.'
print(str_value.index('cat'))

Output contents of code execution result:

7

Sample case where an error occurs because the string cannot be found:

str_value = 'I am a cat.'
print(str_value.index('dog'))
ValueError: substring not found

A substring is a part of a string. In these methods, it refers to the string to search for, passed as the first argument. So the error message means something like "The string specified for the search was not found."
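If you want index-like behavior without the program stopping on an error, the ValueError can be caught with try / except (covered in a later chapter). The helper find_position below is a hypothetical name used only for illustration:

```python
def find_position(text, keyword):
    """Return the index of keyword in text, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    try:
        return text.index(keyword)
    except ValueError:
        # index raises ValueError instead of returning -1
        return None


print(find_position('I am a cat.', 'cat'))  # → 7
print(find_position('I am a cat.', 'dog'))  # → None
```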


As you can guess from the name, the last method, rindex, searches from the right like rfind and raises an error like index when the string specified for the search is not found.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.rindex('cat'))

Output contents of code execution result:

28

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.rindex('dog'))
ValueError: substring not found

Replace string with another string: replace, translate, maketrans methods

In this section you will learn about string replacement, such as replacing a specific part of a string with another string. We cover three methods: replace, translate, and maketrans. The replace method in particular is used frequently.


First, let's talk about the replace method. The replace method searches for a specific string and replaces that string with another.

Specify the string to search for as the first argument and the replacement string as the second. For example, to replace the "cat" parts of the string with "dog", write the following.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.replace('cat', 'dog'))

Output contents of code execution result:

I am a dog. Although I am a dog, I sometimes think about it.

The third argument is the maximum number of replacements. If omitted, every occurrence found is replaced. If 1 is specified, at most one replacement is made; if 2 is specified, at most two. In the code below, 1 is specified as the third argument, so only the first "cat" is replaced.

str_value = 'I am a cat. Although I am a cat, I sometimes think about it.'
print(str_value.replace('cat', 'dog', 1))

Output contents of code execution result:

I am a dog. Although I am a cat, I sometimes think about it.
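Note that replace does not modify the original string; it returns a new one. A minimal sketch confirming this:

```python
str_value = 'I am a cat.'
new_value = str_value.replace('cat', 'dog')

# replace returns a new string; the original is left unchanged
print(str_value)  # → I am a cat.
print(new_value)  # → I am a dog.
```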


Use the translate and maketrans methods together. By specifying pairs of characters before and after replacement, you can replace multiple characters at once.

Note that the targets are single characters: each character to be replaced must be exactly one character, so use the replace method for multi-character strings. When you need to replace many different characters, translate lets you replace them all at once with simple code.

The maketrans method creates the data describing the replacement pairs. It can be called on a string instance, but it is more common to call it directly on the str class (classes are covered in a later chapter), written as str.maketrans.

The method name comes from the English phrase "make translation table". "Translation" here means converting one thing into another, and "table" means a lookup table, so the name means "make a character-to-character conversion table".

There are two ways to specify the pairs in maketrans: using the keys and values of a dictionary, or passing two strings as the first and second arguments.

First, let's see how to use a dictionary. Pass the dictionary as the first argument of maketrans, with the character before replacement as the key and the character after replacement as the value. If there are multiple targets, specify multiple key/value pairs.

This time, let's try code that replaces the Japanese full-width punctuation marks 、 (comma) and 。 (period) with the half-width , and . respectively.

trans_table = str.maketrans(
    {
        '、': ',',
        '。': '.',
    }
)

Alternatively, you can pass as the first argument a string listing the characters before replacement in order (the string 、。 in this example), and as the second argument a string listing the characters after replacement in the same order (the string ,. in this example). Make sure both strings have the same length and the characters are in the same order. This style behaves the same as using a dictionary.

trans_table = str.maketrans('、。', ',.')

Let's call the translate method with the data created by maketrans, passing that data as the first argument.

str_value = 'Meow、meow、 I called, but no one came。'
print(str_value.translate(trans_table))

Output contents of code execution result:

Meow,meow, I called, but no one came.
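maketrans also accepts a third argument: a string of characters to delete (they are mapped to None). A sketch, using a hypothetical phone number string:

```python
# The first two arguments are empty: nothing is swapped.
# Every character in the third argument ('-') is deleted.
trans_table = str.maketrans('', '', '-')
phone_number = '090-1234-5678'  # hypothetical value
print(phone_number.translate(trans_table))  # → 09012345678
```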


In this section you learned about character-to-character and string-to-string replacement. There is also a (convenient) way to replace anything that matches a particular pattern, using what is called a regular expression.

We'll talk more about regular expressions later in the regular expressions chapter.

Split strings: split, rsplit, splitlines, partition, rpartition methods

In this section you will learn about string splitting. Five methods, split, rsplit, splitlines, partition, and rpartition, are targeted. Especially the most basic split method is used a lot.

The result of splitting a string is a list of strings. Splitting is used when each value has its own meaning and the values are separated by a specific delimiter, such as commas, spaces, or tabs.


Let's start with the split method. The word "split" means to divide.

If you specify any delimiter as the first argument, a list containing the strings divided at that delimiter is returned.

In the sample below, the string is divided at half-width commas to create a list of the individual strings. The split method does not include the delimiter (, in this sample) in the result list.

str_value = '100,200,300'
print(str_value.split(','))

Output contents of code execution result:

['100', '200', '300']

The second argument is the maximum number of splits. For example, if you specify 2, the string is split at most twice and the resulting list has 3 values. The part beyond that number of splits is not divided and remains as is in the last value of the list.

str_value = '100,200,300,400,500'
print(str_value.split(',', 2))

Output contents of code execution result:

['100', '200', '300,400,500']

If the second argument is omitted, the division will be performed with all delimiters.
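As a supplement: if split is called with no delimiter at all, any run of whitespace (spaces, tabs, line breaks) is treated as a single separator. The sketch below contrasts that with an explicit single space:

```python
str_value = '100  200\t300\n400'

# No delimiter: any run of whitespace counts as one separator
print(str_value.split())     # → ['100', '200', '300', '400']
# Explicit ' ': each space splits, so two spaces yield an empty string,
# and the tab and line break are not treated as separators
print(str_value.split(' '))  # → ['100', '', '200\t300\n400']
```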


The rsplit method, like other methods whose names begin with r, splits "from the right side". However, if the second argument is omitted, the string is split at every delimiter, so the result is the same whether splitting starts from the left or the right. In other words, it behaves the same as the split method.

Code example of rsplit that gives the same result as split:

str_value = '100,200,300'
print(str_value.rsplit(','))

Output contents of code execution result:

['100', '200', '300']

When the second argument (maximum number of divisions) is specified, the division is performed "from the right", so the undivided character string remains at the beginning (left end) of the list.

str_value = '100,200,300,400,500'
print(str_value.rsplit(',', 2))

Output contents of code execution result:

['100,200,300', '400', '500']
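Specifying 1 as the maximum number of splits is handy for separating off the last part of a string. A sketch with a hypothetical path string (the two results are unpacked into two variables):

```python
path = 'images/2020/cat.png'  # hypothetical value

# Split only once, from the right, at the last '/'
directory, file_name = path.rsplit('/', 1)
print(directory)  # → images/2020
print(file_name)  # → cat.png
```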


The splitlines method splits a string at line breaks. "Line" here means a line of text.

You might wonder why you shouldn't just specify a line break with the split method. However, line breaks can be represented differently depending on the environment, such as the OS, so handling every variant yourself makes the code complicated.

Depending on the environment, a line break in a string is represented by \n, \r\n, or \r. You can also put line breaks directly into a string by writing it with three consecutive quotation marks.

If you print strings containing \n and \r\n with the print function in Jupyter on Windows, both are displayed as line breaks.

print('a\nb')

Output contents of code execution result:

a
b

print('a\r\nb')

Output contents of code execution result:

a
b

There are multiple representations of line breaks like this, so what happens if you split with the split method? For example, suppose your code splits on \n, and data whose line breaks are \r\n comes in.

If you try it, the \r parts remain in the result as shown below, which is not what you want.

str_value = '100\r\n200\r\n300'
print(str_value.split('\n'))

Output contents of code execution result:

['100\r', '200\r', '300']

When handling text in a program, you usually want to simply "split at line breaks" without worrying about these differences. For such cases the splitlines method is provided; it lets you split at line breaks without writing complicated code.

You can see that the same result can be obtained by executing it on a character string containing various line breaks as shown below.

Split sample in the case where line breaks are represented by \n:

str_value = '100\n200\n300'
print(str_value.splitlines())

Output contents of code execution result:

['100', '200', '300']

Split sample in the case where line breaks are represented by \r\n:

str_value = '100\r\n200\r\n300'
print(str_value.splitlines())

Output contents of code execution result:

['100', '200', '300']

Split sample in the case where line breaks are represented by \r:

str_value = '100\r200\r300'
print(str_value.splitlines())

Output contents of code execution result:

['100', '200', '300']

Split sample in the case where line breaks are written directly in string representation with three consecutive quotes:

str_value = """100
200
300
"""
print(str_value.splitlines())

Output contents of code execution result:

['100', '200', '300']
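Two further differences are worth a quick sketch: split('\n') leaves an empty string after a trailing line break while splitlines does not, and keepends=True keeps each line's own line break characters.

```python
str_value = '100\n200\n300\n'

# split('\n') leaves an empty string after the final line break
print(str_value.split('\n'))   # → ['100', '200', '300', '']
# splitlines does not
print(str_value.splitlines())  # → ['100', '200', '300']
# keepends=True keeps each line's own line break characters
print('100\r\n200\n300'.splitlines(keepends=True))
# → ['100\r\n', '200\n', '300']
```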


The partition method, like the split method, executes a partition by specifying a delimiter. However, it differs from the split method in the following points.

- The split is performed only once.
- A tuple containing three values is returned: the string to the left of the split, the delimiter, and the string to the right.
  - Note that the return types differ: the split method returns a list, while the partition method returns a tuple, reflecting the fact that the result always has exactly three values.
- The split method does not include the delimiter in the result list, but the partition method includes the delimiter in the resulting tuple.

Let's actually write some code and try it. If you try it on a string that contains a colon, you can see that the result is split into the strings to the left and right of the colon.

str_value = '100:200'
print(str_value.partition(':'))

Output contents of code execution result:

('100', ':', '200')

Even if the string contains multiple delimiters, it is split only once. The remaining delimiters stay in the rightmost value of the tuple.

str_value = '100:200:300:400'
print(str_value.partition(':'))

Output contents of code execution result:

('100', ':', '200:300:400')

If the string does not contain the delimiter specified in the argument, the tuple's first value is the original string, and its second and third values are empty strings. The delimiter is not included, but the tuple still has three values.

str_value = '100:200'
print(str_value.partition(','))

Output contents of code execution result:

('100:200', '', '')


The final rpartition method is split "from the right", as you can guess from the method name. Other behavior is the same as partition.

str_value = '100:200:300:400:500'
print(str_value.rpartition(':'))

Output contents of code execution result:

('100:200:300:400', ':', '500')
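A sketch of a typical use of partition, parsing a hypothetical timeout=30 setting line, plus what rpartition returns when the delimiter is missing (the original string goes at the end rather than the beginning):

```python
# Parse a hypothetical 'key=value' setting line
key, separator, value = 'timeout=30'.partition('=')
print(key, value)  # → timeout 30

# Delimiter missing: partition puts the string first,
# rpartition puts it last
print('100:200'.partition(','))   # → ('100:200', '', '')
print('100:200'.rpartition(','))  # → ('', '', '100:200')
```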

Insert variables into strings or insert values in a specific format: % symbol, format, format_map methods, f-strings

When creating a character string with the value of a variable inserted, there is a method of concatenating the character strings with the + symbol. For example, write as follows.

name = 'Tama'
concatenated_str = 'The name of my cat is ' + name + '.'
print(concatenated_str)

Output contents of code execution result:

The name of my cat is Tama.

However, if the variable's value is not a string, such as an integer, this method results in an error. For example, if you try to concatenate an integer variable called age as shown below, an error occurs.

age = 5
concatenated_str = 'My cat is ' + age + ' years old.'
TypeError: can only concatenate str (not "int") to str

Since "concatenate" means "to join together", the error message means something like "only strings (not integers) can be concatenated to strings".

If you want to concatenate a variable of a non-string type, such as an integer, you need to convert it to a string with a process called casting (casting is covered in a later chapter).

Casting to a string can be done by passing the target variable to the str() function. Rewriting the earlier code so that the value is cast to a string and no error occurs gives the following.

age = 5
concatenated_str = 'My cat is ' + str(age) + ' years old.'
print(concatenated_str)

Output contents of code execution result:

My cat is 5 years old.

Even with this style, we achieved the goal of inserting a variable into a string. However, the code is somewhat cluttered (the + symbols, the cast, etc.), and a variable that was supposed to hold a string may at some point hold a value of another type; if you forget to cast it, an error occurs.

The introduction has been lengthened, but in this section we will learn how to insert variables into a string in a simpler and more readable form in such cases.


First, we will learn how to use the % symbol. As mentioned in the previous chapter, with integers the % symbol calculates the remainder.

6 % 4

Output contents of code execution result:

2

With strings, on the other hand, the % symbol is used to insert variables into a string. In the string, the % symbol is written together with a letter that represents a particular format. First, let's write sample code using %s, where s stands for "format as a string".

Separate the string and the variable to insert with the % symbol, putting the string on the left and the variable on the right, as follows.

age = 5
concatenated_str = 'My cat is %s years old.' % age
print(concatenated_str)

Output contents of code execution result:

My cat is 5 years old.

The part where the variable is inserted is just %s, and the multiple + symbols are gone, so the code is neater. Also, you no longer need to cast the variable to a string with str().

In addition to %s, there are various other specifiers such as %d, %.3f, and %x. I covered similar things in the format function section of the chapter on built-in functions; each has the meaning and behavior below (there are many other specifiers besides these, but since I covered them in the format function section, I won't repeat the details of number notation here).

- %s -> string. The value is treated as a string as is (similar to casting with the str() function). This is probably the one you will use most often when inserting a variable's value into a string.
- %d -> decimal integer. The value is inserted into the string as a base-10 integer, the everyday notation that uses the digits 0 to 9.
- %.3f -> float. The value is inserted as a floating point number. The number (3 here) can be any value and specifies how many decimal places to display; with 3, the value appears in a form such as 0.000.
- %x -> hex. The value is inserted in hexadecimal.

Let's actually write some code using specifiers other than %s and check the behavior. First is %d, which inserts the value into the string as a decimal integer. Even if you specify a value with a fractional part, such as 5.5, it is converted to an integer (the fractional part is dropped), so it is output as a value such as 5.

age = 5.5
concatenated_str = 'My cat is %d years old.' % age
print(concatenated_str)

Output contents of code execution result:

My cat is 5 years old.

When % d is used, the value of the specified variable etc. must be a" numerical value that can be converted to an integer ". Floating point numbers and integers can be specified, but if you specify other than that, for example, a character string, an error will occur (if you need to insert a character string, use % s etc.) ..

name = 'Tama'
concatenated_str = 'The name of my cat is %d.' % name
TypeError: %d format: a number is required, not str

The error message says something like "the %d format requires a number, not a string".

With %.3f, the value of the specified variable is inserted as a string with a fixed number of decimal places. If you use %.3f with a variable whose value is 5.5, the value 5.500 is inserted into the string. With %.2f, it is displayed to the second decimal place, as 5.50.

age = 5.5
concatenated_str = 'My cat is %.3f years old.' % age
print(concatenated_str)

Output contents of code execution result:

My cat is 5.500 years old.

To insert multiple variables into a string, pass them as a tuple in parentheses, for example (name, age).

name = 'Tama'
age = 5
concatenated_str = \
    'The name of my cat is %s. It is %s years old.' % (name, age)
print(concatenated_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

If the number of specifiers such as %s in the string does not match the number of values in the tuple, an error occurs (much like the error you get when a function is called with the wrong number of arguments).

In the code below, there are three % s specifications in the string, but there are only two variables on the tuple side, so an error occurs.

name = 'Tama'
age = 5
concatenated_str = \
    'The name of my %s is %s. It is %s years old.' % (name, age)
TypeError: not enough arguments for format string

The error message says roughly "there are not enough arguments (variables) for the string formatting (inserting the variables)".

An error also occurs if the tuple contains too many values.

name = 'Tama'
age = 5
concatenated_str = \
    'The name of my cat is %s. It is 3 years old.' % (name, age)
TypeError: not all arguments converted during string formatting

The error message says roughly "not all arguments (the variables in the tuple) could be converted (inserted) during string formatting", because there are not enough %s specifiers.

Also, because a tuple is how you pass multiple values, you cannot insert a tuple itself as-is. If you pass a tuple variable holding several values to a single %s, Python treats it as a count mismatch, as described above, and raises an error. In that case you need an extra step, such as casting the tuple to a string first.
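The problem with inserting a tuple can be sketched as follows. Besides casting with str(), wrapping the tuple in a one-element tuple, (point,), is another common workaround; the variable point is just an example.

```python
point = (3, 4)

# Passing the tuple directly: its two values are matched against one %s.
error_message = ''
try:
    'Point: %s' % point
except TypeError as error:
    error_message = str(error)
print(error_message)  # not all arguments converted during string formatting

# Workaround 1: cast the tuple to a string first.
text1 = 'Point: %s' % str(point)
# Workaround 2: wrap it in a one-element tuple so it fills a single %s.
text2 = 'Point: %s' % (point,)

print(text1)  # Point: (3, 4)
print(text2)  # Point: (3, 4)
```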

In addition, just as a function becomes hard to read as its number of arguments grows, a string with many %s specifiers makes it easy to get the order of the values wrong. You would want something like keyword arguments.

To address these problems with %-style formatting, the format method described next was added in a later version of Python.


After %-style formatting, we will learn about the format method. Like the % symbol, it inserts values such as variables into a string. It is newer than %-style formatting; the code gets a little longer, but it offers more features and fixes the problems described above.

To use it, first put a pair of braces {} at each place in the string where you want a variable's value to appear. Then call the format method on that string, passing the variables you want to insert as arguments. A simple example looks like this:

name = 'Tama'
formatted_str = 'The name of my cat is {}.'
formatted_str = formatted_str.format(name)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama.

To insert multiple variables, put multiple {} placeholders in the string.

name = 'Tama'
age = 5
formatted_str = 'The name of my cat is {}. It is {} years old.'
formatted_str = formatted_str.format(name, age)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

When multiple arguments are given, their values fill the {} placeholders in order. For example, with (name, age) as the arguments, the value of name goes into the first {} in the string and the value of age goes into the next {}.

To change this order, put an integer inside the braces, as in {0} or {1}. The integer is the position of the argument, counting from 0 (the first argument is 0, the next is 1, the next is 2, and so on).

Running the code below confirms that the third argument (name) appears in the string before the value of the second argument (age).

animal = 'cat'
name = 'Tama'
age = 5
formatted_str = 'The name of my {0} is {2}. It is {1} years old.'
formatted_str = formatted_str.format(animal, age, name)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

If you want literal braces to remain in a string that you also pass to the format method, double them, as in {{ and }}. The format method then treats them as ordinary brace characters ({ and }) instead of placeholders (this is sometimes called escaping).

In the sample below, you can see that the escaped braces remain in the output even after calling the format method.

name = 'Tama'
age = 5
formatted_str = 'The name of my {{cat}} is {}. It is {} years old.'
formatted_str = formatted_str.format(name, age)

print(formatted_str)

Output contents of code execution result:

The name of my {cat} is Tama. It is 5 years old.

You can also insert values like keyword arguments by writing a name inside the braces, such as {name} or {age}. This avoids mistakes in the order of arguments and keeps the code readable even when there are many arguments.

cat_name = 'Tama'
cat_age = 5
formatted_str = 'The name of my cat is {name}. It is {age} years old.'
formatted_str = formatted_str.format(
    name=cat_name,
    age=cat_age,
)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

This keyword-argument style is used frequently in practice. For simple insertion of a variable's value, the % symbol is common, but when there are many variables (three or more, for example), the format method with keyword arguments is often preferred for readability. The more arguments there are, the harder the code is to read and the easier it is to make mistakes, so make active use of keyword arguments.

The following is a slightly more advanced style that you will see less often: if the value referenced in the braces {} is a list or dictionary, you can refer to its elements by index or key.

For example, {0} refers to the first argument, but if the first argument is a dictionary with the key name, writing {0[name]} expands the value of the dictionary's name key into the string. This sample uses the first argument ({0}), but of course the same works with the second and later arguments.

dict_value = {
    'name': 'Tama',
    'age': 5,
}
formatted_str = \
    'The name of my cat is {0[name]}. It is {0[age]} years old.'
formatted_str = formatted_str.format(dict_value)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

Lists work the same way. For example, if the first argument is a list, {0[0]} expands the value at index 0 of that argument and {0[1]} expands the value at index 1.

list_value = [
    'Tama',
    5,
]
formatted_str = 'The name of my cat is {0[0]}. It is {0[1]} years old.'
formatted_str = formatted_str.format(list_value)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

If you use this style a lot, the code can become hard to read because brackets and index numbers run together. Even when working with a list or dictionary, you can get the same result by putting only a keyword name in the string (for example, {name}) and doing the index or key lookup where you pass the argument. If the code becomes hard to read, writing bare keyword names is recommended. Here is the previous example rewritten that way:

list_value = [
    'Tama',
    5,
]
formatted_str = 'The name of my cat is {name}. It is {age} years old.'
formatted_str = formatted_str.format(
    name=list_value[0],
    age=list_value[1],
)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

The samples so far use {0} for the first argument, but of course other forms, such as keyword arguments, work too. For example, you can write {name_dict[cat_name]}. (Because the string got longer in this sample, it is split across lines inside parentheses.)

name_dict = {'cat_name': 'Tama'}
age_list = [5]

formatted_str = (
    'The name of my cat is {name_dict[cat_name]}. '
    'It is {age_list[0]} years old.'
).format(
    name_dict=name_dict,
    age_list=age_list,
)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

When accessing a dictionary value by key inside a string this way, note that you cannot use a variable as the key.

With ordinary dictionary access, you can use a variable as the key, as in dict_value[name_key] below.

dict_value = {'cat_name': 'Tama'}
name_key = 'cat_name'
print(dict_value[name_key])

Output contents of code execution result:

Tama

Inside a format string, on the other hand, you do not write the key with quotation marks as in {dict_value['name']}; you write it directly without quotes, as in [name]. The text inside the brackets is then used as the key itself, not as a variable name. So if you want to look up a value using a key held in a variable, you cannot do it inside the string; do the lookup outside the string, where you pass the argument (as in name=dict_value[name_key] below).

name_key = 'name'
age_key = 'age'
dict_value = {
    name_key: 'Tama',
    age_key: 5,
}

formatted_str = 'The name of my cat is {name}. It is {age} years old.'
formatted_str = formatted_str.format(
    name=dict_value[name_key],
    age=dict_value[age_key],
)

print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.
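The rule that the text inside the brackets is used as the key itself can be checked with a short sketch like the one below. The extra "'name'" key (with quotes in it) is added only to make the behavior visible.

```python
dict_value = {'name': 'Tama', "'name'": 'a key with quotes'}
name_key = 'name'

# [name] inside the braces is the literal key 'name', not a variable.
plain = '{0[name]}'.format(dict_value)

# Quotes are not stripped: the key looked up is "'name'" (quotes included).
quoted = "{0['name']}".format(dict_value)

# A variable name inside the brackets is also just a literal key.
missing_key = None
try:
    '{0[name_key]}'.format(dict_value)
except KeyError as error:
    missing_key = error.args[0]

print(plain)        # Tama
print(quoted)       # a key with quotes
print(missing_key)  # name_key
```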

Next, let's look at using a colon inside the braces. Put a half-width colon inside the braces, write the target on its left (a positional reference such as {0} or {1}, or a keyword such as {name}), and write a format specification (such as .3f) on its right to insert the value in that format. For example: {0:.3f} or {name:.3f}.

The specification is what you would write after the % symbol in %-style formatting (for example, the d in %d or the .3f in %.3f), and it works in much the same way: .3f, for instance, displays the value as a string with three decimal places.

In the sample below, the argument age is inserted with {0:.3f}, a specification to display three decimal places. You can see that the output shows 5.500, with three decimal places, instead of the original value 5.5.

age = 5.5
formatted_str = 'My cat is {0:.3f} years old.'
formatted_str = formatted_str.format(
    age,
)

print(formatted_str)

Output contents of code execution result:

My cat is 5.500 years old.
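The claim that the specification after the colon mirrors the %-style specifiers can be checked with a sketch like this. It also shows one difference worth knowing: the d specification of the format method does not accept a float, whereas %d silently truncates it.

```python
value = 5.5

# The same .3f specification written in both styles
percent_style = '%.3f' % value            # '5.500'
format_style = '{0:.3f}'.format(value)    # '5.500'
print(percent_style == format_style)      # True

# A difference: %d truncates a float, but the d spec of format()
# accepts only integers and raises ValueError for a float.
print('%d' % value)                       # 5
d_spec_rejected_float = False
try:
    '{0:d}'.format(value)
except ValueError:
    d_spec_rejected_float = True
print(d_spec_rejected_float)              # True
print('{0:d}'.format(int(value)))         # 5
```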


The last method in this section is format_map.

The format_map method behaves much like the format method, except that it takes a single dictionary as its argument. Each key and value in the dictionary is treated as if it had been passed as a keyword argument to the format method, and the values are expanded into the string.

dict_value = {
    'name': 'Tama',
    'age': 5,
}
formatted_str = 'The name of my cat is {name}. It is {age} years old.'
formatted_str = formatted_str.format_map(dict_value)
print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

As covered in the functions chapter, if you pass a dictionary prefixed with two half-width asterisks when calling a function (or method), its keys and values are expanded into keyword arguments. Combining that with the format method gives the same behavior without format_map. For example, passing **dict_value to the format method, as below, produces the same result as format_map.

dict_value = {
    'name': 'Tama',
    'age': 5,
}
formatted_str = 'The name of my cat is {name}. It is {age} years old.'
formatted_str = formatted_str.format(**dict_value)
print(formatted_str)

Output contents of code execution result:

The name of my cat is Tama. It is 5 years old.

Why does format_map exist when the format method can produce the same result? The reasons are below. In practice, format_map is used less often; it is only occasionally needed when you want finer control.

-- The format_map method does not copy the argument dictionary. When you pass a dictionary with two asterisks to the format method, a copy of it is made, which costs a little extra memory and processing time. So format_map has a performance advantage with dictionaries holding large data. In most cases, though, the dictionaries passed as keyword arguments are small (a few numbers and strings), so the difference is negligible.
-- As a later chapter explains, you sometimes write code that customizes (overrides) part of a dictionary's behavior, for example by inheriting from it in a class. If you pass such an object with two asterisks, it is "copied as a plain dictionary" and the customized behavior is ignored. This is covered in a later chapter; for now, just remember that such customizations can be ignored when you use the format method with two asterisks, while format_map uses the dictionary as-is.
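The second point can be sketched with a dictionary subclass that defines __missing__, a hook that dict calls when a key is not found. The class name FallbackDict is hypothetical, and classes are covered in a later chapter, so treat this as a preview.

```python
class FallbackDict(dict):
    """Hypothetical dict subclass that returns a placeholder for missing keys."""

    def __missing__(self, key):
        # dict's key lookup calls this when the key is not found
        return '(unknown)'


values = FallbackDict(name='Tama')  # note: no 'age' key
template = 'The name of my cat is {name}. It is {age} years old.'

# format_map uses the object as-is, so __missing__ fills in 'age'.
result = template.format_map(values)
print(result)  # The name of my cat is Tama. It is (unknown) years old.

# format(**values) copies the keys into a plain dict, so __missing__ is lost.
lost_customization = False
try:
    template.format(**values)
except KeyError:
    lost_customization = True
print(lost_customization)  # True
```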


At the end of this section, we'll also touch on f-strings.

f-strings let you insert variables or evaluate Python expressions inside a string by prefixing the string's opening quotation mark (a single quotation mark, for example) with the letter f.

As with the format method, enclose the variable or Python expression in braces {}.

A simple sample follows. Note that the string is prefixed with the letter f, and that the value of name appears in the result without calling a method such as format.

name = 'Tama'
txt = f'The name of the cat is {name}.'
print(txt)

Output contents of code execution result:

The name of the cat is Tama.

Using f-strings has the advantage that you can insert variables directly and you do not need to call the format method, so the amount of code can be shortened. Although the writing method is quite different, the content is similar to the format method.
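To compare the three approaches covered in this section, here is the same sentence built with the % symbol, the format method, and an f-string; all three produce the identical result.

```python
name = 'Tama'
age = 5

by_percent = 'The name of my cat is %s. It is %d years old.' % (name, age)
by_format = 'The name of my cat is {}. It is {} years old.'.format(name, age)
by_fstring = f'The name of my cat is {name}. It is {age} years old.'

print(by_fstring)                             # The name of my cat is Tama. It is 5 years old.
print(by_percent == by_format == by_fstring)  # True
```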

You can also write Python expressions inside the braces of an f-string. For example, you can do addition inside the string as follows.

age = 5
print(f'Next year I will be {age + 1} years old.')

Output contents of code execution result:

Next year I will be 6 years old.

You can also execute functions in the character string.

def get_name():
    return 'Tama'


print(f'The name of the cat is {get_name()}.')

Output contents of code execution result:

The name of the cat is Tama.

To specify a format (for example, the number of decimal places), use a half-width colon (:) just as with %-style formatting and the format method, and write the specification to its right.

value = 123.456789
print(f'The value to three decimal places is {value:.3f}.')

Output contents of code execution result:

The value to three decimal places is 123.457.

This specification part (the .3f in the code) is called a format specifier. If you read Python's internal code, you will see arguments named format_spec and the like; that name comes from "format specifier".

In f-strings, you can also nest another pair of braces inside the format specifier to supply it from a variable (so you can, for example, change the format specifier depending on conditions).

format_spec = '.2f'
value = 123.456789
print(f'The value to two decimal places is {value:{format_spec}}.')

Output contents of code execution result:

The value to two decimal places is 123.46.
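Nesting is handy when the format depends on a value decided at run time. A sketch, assuming you want to vary the number of decimal places and the field width:

```python
value = 123.456789

# Change the number of decimal places with a variable
results = []
for precision in range(1, 4):
    results.append(f'{value:.{precision}f}')
print(results)  # ['123.5', '123.46', '123.457']

# Width and precision can both come from variables;
# > right-aligns the value in a field of the given width.
width = 10
precision = 2
text = f'[{value:>{width}.{precision}f}]'
print(text)  # [    123.46]
```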

Join the list values into a single string with any string: join method

The join method joins the values of a list (or similar) of strings into a single string, placing the specified string between each value.

You can use a variable for the separator, but a fixed character or string is usually written directly. For example, to join the values in a list with commas, write ','.join. Pass the list (or a tuple or similar) you want to join as the first argument of the method.

Sample that joins a list (a variable named animals) with commas:

animals = ['Cat', 'dog', 'rabbit']
print(','.join(animals))

Output contents of code execution result:

Cat,dog,rabbit

Sample to concatenate lists with two underscores (__):

animals = ['Cat', 'dog', 'rabbit']
print('__'.join(animals))

Output contents of code execution result:

Cat__dog__rabbit

An error will occur if the contents of the list are other than character strings (numerical values, etc.).

int_list = [1, 2, 3]
print(','.join(int_list))
TypeError: sequence item 0: expected str instance, int found

Ordered collections of values, such as lists and tuples, are collectively called sequences. So the error message means roughly "the element (item) at index 0 of the sequence (list) was expected to be a string (str) instance, but an integer (int) was found".
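A common way to avoid this error, sketched below, is to convert each element to a string before joining, for example with a generator expression or the built-in map function.

```python
int_list = [1, 2, 3]

# Convert each element to a string before joining.
joined = ','.join(str(number) for number in int_list)
print(joined)  # 1,2,3

# map(str, ...) does the same conversion.
joined_with_map = ','.join(map(str, int_list))
print(joined_with_map)  # 1,2,3

# join accepts other iterables of strings too, such as a tuple.
print(' / '.join(('cat', 'dog', 'rabbit')))  # cat / dog / rabbit
```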

Get the number of times a particular string appears in a string: count method

The count method returns the number of times the string given as the first argument appears in the target string.

Sample that returns 4 because the string contains "cat" four times:

txt = (
    'I am a cat. '
    'After that, I met many a cat, but I have never met such a one. '
    'A cat has come. Even at night, a cat starts crying loudly.'
)
print(txt.count('cat'))

Output contents of code execution result:

4

The second argument is the start index of the search range, and the third argument is the end index; as with a slice, the search stops just before the end index (the last index searched is the end index minus 1).

Sample that searches only indexes 0 through 39 of the string:

txt = (
    'I am a cat. '
    'After that, I met many a cat, but I have never met such a one. '
    'A cat has come. Even at night, a cat starts crying loudly.'
)
print(txt.count('cat', 0, 40))

Output contents of code execution result:

2

These arguments correspond to the integers to the left and right of the colon in a slice. In other words, you can check the search range by slicing the string as follows (the second argument 0 and third argument 40 from the sample above are used as the slice).

print(txt[0:40])

Output contents of code execution result:

I am a cat. After that, I met many a cat
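Two details of count are worth a quick sketch: the search is case-sensitive, and matches are counted without overlapping. The strings below are arbitrary examples.

```python
txt = 'Cat cat CAT'

# count is case-sensitive: only the lowercase 'cat' matches.
print(txt.count('cat'))          # 1
# Lowercase the string first to count regardless of case.
print(txt.lower().count('cat'))  # 3

# Matches do not overlap: 'aa' is found twice in 'aaaa', not three times.
print('aaaa'.count('aa'))        # 2

# The start and end arguments work like a slice.
print(txt.count('cat', 4) == txt[4:].count('cat'))  # True
```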

Remove certain characters, such as whitespace, from the edges of a string: strip, lstrip, rstrip methods

The strip method removes certain characters at the beginning and end of the string. strip is a word that means "remove".

You can also delete a specific string with the replace method by replacing it with an empty string, but the strip method differs from replace in the following ways:

-- The argument can be omitted.
-- If the argument is omitted, whitespace characters such as spaces and line breaks at both ends of the string are removed.
-- Removal works character by character, not on a whole string. For example, if you pass the string 'cat dog' as the argument, the characters c, a, t, space, d, o, and g are each removed individually, rather than the string 'cat dog' as a unit.

First, let's check the behavior when the argument is omitted. We will use a string with whitespace characters, such as spaces and line breaks (\n represents one line break), at both ends, as shown below.

txt = '  I am a cat.\n\n'
print(txt)

Output contents of code execution result:

  I am a cat.


Passing it through the strip method, you can see that the whitespace characters at both ends have been removed.

txt = '  I am a cat.\n\n'
print(txt.strip())

Output contents of code execution result:

I am a cat.

If you pass a string as the first argument, characters are removed from both ends one at a time as long as they are among the characters in that string. For example, with the argument 'cat dog', removal continues from each end until a character other than c, a, t, space, d, o, or g is reached.

txt = 'cat dog cat rabbit wolf dog cat dog'
print(txt.strip('cat dog'))

Output contents of code execution result:

rabbit wolf
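Because the argument is treated as a set of characters, strip can remove more than you intend. A sketch contrasting it with replace and with removesuffix (a method added in Python 3.9, so this assumes 3.9 or later):

```python
txt = 'good dog'

# strip treats 'dog' as the set of characters d, o, g,
# so it also eats into the word 'good'.
print(repr(txt.strip('dog')))         # ' '

# replace removes the exact string 'dog' wherever it appears.
print(repr(txt.replace('dog', '')))   # 'good '

# removesuffix (Python 3.9+) removes the exact string only at the end.
print(repr(txt.removesuffix('dog')))  # 'good '
```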


The lstrip method works like the strip method but removes characters only from the left end. The l stands for left.

The usage and arguments are the same as the strip method. In the code below, you can see that the letters cat and dog on the far right remain.

txt = 'cat dog cat rabbit wolf dog cat dog'
print(txt.lstrip('cat dog'))

Output contents of code execution result:

rabbit wolf dog cat dog


As you might guess from lstrip, the rstrip method removes characters only from the right end. The r at the beginning of the method name stands for right. Usage is the same as the strip and lstrip methods.

txt = 'cat dog cat rabbit wolf dog cat dog'
print(txt.rstrip('cat dog'))

Output contents of code execution result:

cat dog cat rabbit wolf

Uppercase strings: upper method

The upper method converts a string to all uppercase. Only the alphabetic parts of the string are converted.

The method name comes from "upper case", which means uppercase letters.

txt = 'Apple and orange'
print(txt.upper())

Output contents of code execution result:

APPLE AND ORANGE

Not only half-width characters but also full-width characters can be converted in the same way.

txt = 'ａｐｐｌｅ'
print(txt.upper())

Output contents of code execution result:

ＡＰＰＬＥ

Make strings all lowercase: lower method

The lower method, the opposite of upper, converts the uppercase alphabetic parts of a string to lowercase. The method name comes from "lower case", which means lowercase letters.

The usage and behavior are the same as the upper method, only the case conversion is reversed.

txt = 'Apple And Orange'
print(txt.lower())

Output contents of code execution result:

apple and orange
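A typical use of lower is comparing strings without caring about case, such as a yes/no answer typed by a user. The sketch below also shows casefold, a related built-in method intended for caseless comparison of non-English text as well.

```python
answer = 'YES'

# Normalize the case before comparing.
is_yes = answer.lower() == 'yes'
print(is_yes)  # True

# casefold is a stronger version of lower; e.g. the German ß becomes ss.
print('Straße'.lower())     # straße
print('Straße'.casefold())  # strasse
print('STRASSE'.casefold() == 'Straße'.casefold())  # True
```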

Zero pad a string of numbers: zfill method

Adding zeros to the left of an integer until it reaches a certain number of characters is called zero padding (or zero filling).

For example, zero padding to align the character 135 to 5 characters yields 00135.

The zfill method performs this zero padding. Specify the number of characters of the result as an integer in the first argument: with 5, zeros are added until the string is 5 characters long, and with 7, until it is 7 characters long.

txt = '135'
print(txt.zfill(5))

Output contents of code execution result:

00135
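A few details of zfill, sketched below: it works on strings (so a number must be converted with str() first), it keeps a leading sign in front of the zeros, and it leaves strings that are already long enough unchanged. The last line shows the equivalent format specification for comparison.

```python
# zfill works on strings, so convert a number with str() first.
print(str(135).zfill(5))   # 00135

# A leading sign stays in front of the zeros.
print('-42'.zfill(5))      # -0042

# If the string is already long enough, it is returned unchanged.
print('123456'.zfill(5))   # 123456

# The same result with a format specification: zero-padded, width 5.
print(f'{135:05d}')        # 00135
```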

Add characters to a string until it reaches a certain number of characters: rjust, ljust, center methods

Like zfill, the rjust, ljust, and center methods pad a string with characters until it reaches a given length. However, they differ from zfill in the following ways:

-- You can choose any single character to pad with instead of 0.
-- Depending on the method, you can choose where the padding goes: the left end, the right end, or both ends.


The rjust method places the string on the right and pads the left end with the given character. The name comes from right-justify, meaning "align right".

Pass the final number of characters as an integer in the first argument and the fill character in the second argument. The fill character must be exactly one character (passing a longer string raises a TypeError); if it is omitted, a space is used.

txt = 'dog'
print(txt.rjust(6, '*'))

Output contents of code execution result:

***dog


In the ljust method, the original string is aligned to the left, and the missing characters are filled on the right.

txt = 'dog'
print(txt.ljust(6, '*'))

Output contents of code execution result:

dog***


The center method places the original string in the middle and pads both the left and right ends with the specified character.

txt = 'dog'
print(txt.center(7, '*'))

Output contents of code execution result:

**dog**

If the padding cannot be split evenly between the two sides, one side gets one extra character. In the sample below, the extra character goes on the left.

txt = 'ab'
print(txt.center(5, '*'))

Output contents of code execution result:

**ab*
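A short sketch of center's details. Which side receives the extra character when the padding is odd follows an implementation rule in CPython (it depends on the parity of both the width and the string length), so it can be the right side too; treat the exact split as an implementation detail rather than something to rely on.

```python
# The fill character defaults to a space.
default_fill = 'dog'.center(7)
print(repr(default_fill))  # '  dog  '

# Odd padding: the extra character can land on either side.
extra_left = 'ab'.center(5, '*')
extra_right = 'dog'.center(6, '*')
print(repr(extra_left))    # '**ab*'
print(repr(extra_right))   # '*dog**'

# The fill character must be exactly one character.
rejected_long_fill = False
try:
    'dog'.center(7, 'ab')
except TypeError:
    rejected_long_fill = True
print(rejected_long_fill)  # True
```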

Uppercase the first letter and lowercase the other: capitalize method

Importance: ★★☆☆☆ (You don't have to know at first)


The capitalize method converts the first letter of the alphabet to uppercase and the others to lowercase. This is useful for English sentences.

Besides money-related meanings such as "to finance", the word capitalize also means "to write with a capital first letter", and the method name comes from that meaning.

txt = 'apple and orange'
print(txt.capitalize())

Output contents of code execution result:

Apple and orange

Not only lowercase letters are converted, but if the string contains uppercase letters, all but the first letter are converted from uppercase to lowercase.

txt = 'APPLE AND ORANGE'
print(txt.capitalize())

Output contents of code execution result:

Apple and orange

Capitalize the first letter of each English word: title method

The title method converts the first letter of each word in the alphabet to uppercase and the others to lowercase.

In English, titles and headings are often written with the first letter of each main word capitalized and the rest lowercase; this style is called title case. The name of the title method comes from title case.

Note that in English title case, words such as and and the are usually kept lowercase while words such as nouns are capitalized (for example, Apple and Orange). Python's title method, however, capitalizes the first letter of every word.

txt = 'apple and orange'
print(txt.title())

Output contents of code execution result:

Apple And Orange
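One more caveat, sketched below: title starts a new "word" after any non-letter, so apostrophes give odd results. The standard library function string.capwords, which splits only on whitespace, is one alternative.

```python
import string

txt = "it's my cat's toy"

# title treats the apostrophe as a word boundary.
titled = txt.title()
print(titled)  # It'S My Cat'S Toy

# string.capwords splits on whitespace only, then capitalizes each word.
capworded = string.capwords(txt)
print(capworded)  # It's My Cat's Toy
```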

Swap case: swapcase method

The swapcase method converts lowercase letters to uppercase and uppercase letters to lowercase. Swap means "to exchange".

txt = 'Apple And Orange'
print(txt.swapcase())

Output contents of code execution result:

aPPLE aND oRANGE

Check if all strings are uppercase: isupper method

The isupper method returns the boolean True if all the alphabetic characters in the string are uppercase, and False otherwise. As mentioned in other sections, upper case means uppercase, so the name isupper reads as "the string is upper case".

txt = 'APPLE AND ORANGE'
print(txt.isupper())

Output contents of code execution result:

True

If the string contains lowercase letters, False will be returned instead of True.

txt = 'Apple And Orange'
print(txt.isupper())

Output contents of code execution result:

False

Check if all strings are lowercase: islower method

The islower method is the opposite of the isupper method and returns the boolean value True if the string alphabet is all lowercase. lower case means lowercase.

txt = 'apple and orange'
print(txt.islower())

Output contents of code execution result:

True

If even one character contains uppercase letters, it will be False.

txt = 'Apple and Orange'
print(txt.islower())

Output contents of code execution result:

False

As with isupper, characters that have no case, such as symbols and Japanese, do not affect the result. The condition is that the string contains lowercase letters and no uppercase letters.

txt = 'apple りんご'
print(txt.islower())

Output contents of code execution result:

True
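Conversely, isupper and islower need at least one cased character to return True; a string made only of digits, symbols, or Japanese returns False for both, as this sketch shows.

```python
# Strings with no cased characters give False for both methods.
print('123'.islower(), '123'.isupper())  # False False
print('りんご'.islower())                  # False

# One cased character is enough for the check to apply.
print('123abc'.islower())                 # True
print('ABC 123!'.isupper())               # True
```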

Check if the first letter of each word is all uppercase and the others are lowercase: istitle method

The istitle method returns a boolean indicating whether the string is in title case (the first letter of each word uppercase and the rest lowercase), as described in the title method section a little earlier (although, as noted there, this is not exactly English title case).

txt = 'Apple And Orange'
print(txt.istitle())

Output contents of code execution result:

True

If any word does not start with a capital letter, the result will be False.

txt = 'Apple and orange'
print(txt.istitle())

Output contents of code execution result:

False

Check if all strings are alphanumeric: isdecimal, isdigit, isnumeric, isascii methods

In this section, you will learn about methods that return a boolean indicating whether a string consists only of a particular kind of character (only decimal digits, only alphabetic characters, and so on).


The isdecimal method returns a boolean indicating whether the string is an integer that can be written in decimal (the base-10 notation using the ten digits 0 to 9 that we use in everyday life; "decimal" means base 10). If the string contains a decimal point, the result is False. It is also False if the string contains symbols or non-numeric characters such as Japanese or English letters.

A string of numbers, for example 150, is True.

txt = '150'
print(txt.isdecimal())

Output contents of code execution result:

True

Even if it is full-width, if the content is only an integer, the result will be True.

txt = '１５０'
print(txt.isdecimal())

Output contents of code execution result:

True

If a value other than an integer, such as a decimal point, is included, it will be False.

txt = '3.14'
print(txt.isdecimal())

Output contents of code execution result:

False

It is also False if the string contains symbols or whitespace characters. For example, even a space at the left end, as below, makes it False. If your program may receive strings with such whitespace, you can avoid unexpected results by first removing the extra whitespace with the strip method from the earlier section.

txt = '  150'
print(txt.isdecimal())

Output contents of code execution result:

False
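Combining strip and isdecimal gives a simple check to run before converting input to an integer, as in the sketch below. parse_count is a hypothetical helper name used only for this example. A minus sign or a decimal point also makes isdecimal return False, so this check only accepts non-negative integers.

```python
def parse_count(text):
    """Hypothetical helper: strip surrounding whitespace, then return the text
    as an int if it consists only of decimal digits; otherwise return None."""
    cleaned = text.strip()
    if cleaned.isdecimal():
        # int() also accepts full-width decimal digits such as '１５０'.
        return int(cleaned)
    return None


for raw in ['  150', '150\n', '１５０', '3.14', '-5', '']:
    print(repr(raw), '->', parse_count(raw))
```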


The isdigit method behaves like isdecimal and returns a boolean value indicating whether the string is an integer. However, it accepts more than the usual digits 0 to 9. It also accepts some special numeric characters, such as circled digits like ① and ②, and superscript digits (the small raised digits used for exponents, like the ³ in 2³). In other words, it returns True in more cases than isdecimal does.

As its name suggests, isdecimal checks for decimal numbers. The word "digit", on the other hand, means an Arabic numeral (characters such as 1, 2, 3). So isdigit checks whether the string consists of Arabic numerals, including these special forms.

As with isdecimal, an ordinary integer string returns True:

txt = '150'
print(txt.isdigit())

Output contents of code execution result:

True

As with isdecimal, a string that contains non-digit characters returns False:

txt = '150 yen'
print(txt.isdigit())

Output contents of code execution result:

False

Special circled characters such as ① also return True:

txt = '①②'
print(txt.isdigit())

Output contents of code execution result:

True

Special superscript characters such as ² also return True. These characters are hard to type, so copy the code sample when you run it:

txt = '²³⁴'
print(txt.isdigit())

Output contents of code execution result:

True
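One practical consequence is that a True result from isdigit does not mean the string can be passed to int(). The int() function only accepts decimal digits, which is the same set that isdecimal checks. That makes isdecimal the safer check before a conversion. A quick sketch comparing the two:

```python
rows = []
for txt in ['150', '²³⁴', '①②']:
    try:
        value = int(txt)  # int() only understands decimal digits
    except ValueError:
        value = None  # conversion failed even though isdigit() may be True
    rows.append((txt, txt.isdigit(), txt.isdecimal(), value))

for txt, digit, decimal, value in rows:
    print(txt, 'isdigit:', digit, 'isdecimal:', decimal, 'int():', value)
```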


The isnumeric method goes one step further and removes the "Arabic numeral" restriction of isdigit, so it accepts even more characters than isdigit does. It returns True if every character in the string represents a number.

Because the Arabic numeral restriction is removed, Roman numerals such as Ⅰ, Ⅲ, and Ⅵ also return True.

txt = 'ⅠⅢⅥ'
print(txt.isnumeric())

Output contents of code execution result:

True

Kanji numerals also return True. For example, 七五三 (shichigosan, "seven-five-three"):

txt = '七五三'
print(txt.isnumeric())

Output contents of code execution result:

True
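The sketch below runs the same sample strings through all three methods so you can see how they relate. Each method accepts everything the previous one accepts, plus more. The sketch also uses the standard unicodedata module to show that isnumeric only tells you whether the characters represent numbers. It does not convert them: 七五三 is read character by character as 7, 5, and 3, not as 753.

```python
import unicodedata

samples = ['150', '１５０', '²³⁴', '①②', 'ⅠⅢⅥ', '七五三', '3.14']
table = {}
for txt in samples:
    table[txt] = (txt.isdecimal(), txt.isdigit(), txt.isnumeric())
    decimal, digit, numeric = table[txt]
    print(txt, 'isdecimal:', decimal, 'isdigit:', digit, 'isnumeric:', numeric)

# unicodedata.numeric() returns the numeric value of a single character as a float.
kanji_values = [unicodedata.numeric(ch) for ch in '七五三']
print(kanji_values)  # [7.0, 5.0, 3.0]
```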


The isascii method (available since Python 3.7) returns True if every character in the string is an ASCII character. ASCII characters include half-width letters and digits, some symbols (such as @), and special characters such as line breaks.

txt = '~@abcABC123'
print(txt.isascii())

Output contents of code execution result:

True

Full-width (double-byte) characters return False.

txt = 'ＡＢＣ'
print(txt.isascii())

Output contents of code execution result:

False


There are also methods such as isalpha and isalnum, whose names come from "alphabetic" (isalpha) and "alphanumeric" (isalnum). However, they seem to be used infrequently, and their behavior is a little counterintuitive: full-width characters such as kanji also count as alphabetic. For these reasons, I will omit a detailed explanation here.
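For reference, the counterintuitive part looks like this. isalpha and isalnum follow Unicode's definition of letters, so kanji and full-width letters count as alphabetic. If you want to accept only half-width letters and digits, one common approach is to combine isalnum with isascii (Python 3.7+). is_ascii_alnum is a hypothetical helper name used only for this sketch.

```python
print('漢字'.isalpha())     # True: kanji count as letters
print('ａｂｃ'.isalpha())   # True: full-width letters count too
print('abc123'.isalnum())   # True


def is_ascii_alnum(text):
    """Hypothetical helper: True only for non-empty strings made of
    half-width (ASCII) letters and digits."""
    return text.isascii() and text.isalnum()


for txt in ['user01', '漢字', 'ｕｓｅｒ01', 'user_01', '']:
    print(repr(txt), is_ascii_alnum(txt))
```

Only 'user01' passes. The underscore in 'user_01' is neither a letter nor a digit. The empty string fails because isalnum returns False when there are no characters.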

Check whether a string is all whitespace: isspace

The isspace method returns a boolean value indicating whether every character in the string is whitespace. Whitespace characters include half-width spaces, full-width spaces, line breaks (often written as \n in a string), and tabs (often written as \t in a string).

txt = '  \n\t'
print(txt.isspace())

Output contents of code execution result:

True

If the string contains any non-whitespace character, the result is False, even if the string also contains spaces.

txt = 'cat dog\n\t'
print(txt.isspace())

Output contents of code execution result:

False
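A common use for isspace is skipping blank lines. The sketch below, which uses a made-up list of lines, shows two details. First, the full-width space (written here as \u3000) counts as whitespace. Second, an empty string returns False from isspace, so it needs its own check.

```python
lines = ['first', '   ', '\u3000', '', '\t\n', 'second\n']

print('\u3000'.isspace())  # True: full-width space
print(''.isspace())        # False: there are no characters to check

# Keep only lines that contain at least one non-whitespace character.
non_blank = [line for line in lines if line and not line.isspace()]
print(non_blank)
```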

Convert character encodings: encode and decode methods

The encode and decode methods convert a string to and from a specific character encoding. Text in Python is usually handled as UTF-8. Occasionally, however, you have to deal with other encodings (such as Shift_JIS) because of old files, particular environments, file formats, and so on.

However, in many cases you specify these encodings when reading or writing a text file, rather than calling a string's encode or decode method directly.

The string methods themselves are rarely used, so I will only touch on them lightly here (file operations are covered in a later chapter).
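As a brief preview of specifying the encoding when working with files, the built-in open function takes an encoding argument. The sketch below writes 猫犬 ("cat dog") in Shift_JIS to a temporary file with a hypothetical file name. It then reads the file back twice: once as raw bytes and once as text.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_sjis.txt')  # hypothetical file name

    # Writing with encoding='shift_jis' stores the text as Shift_JIS bytes.
    with open(path, 'w', encoding='shift_jis') as f:
        f.write('猫犬')

    # Reading in binary mode shows the raw bytes stored in the file.
    with open(path, 'rb') as f:
        raw = f.read()

    # Reading with the same encoding gives back a normal str.
    with open(path, encoding='shift_jis') as f:
        text = f.read()

print(raw)   # b'\x94L\x8c\xa2'
print(text)  # 猫犬
```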

First is the encode method. It converts a normal Python string into a value in a specific character encoding. The converted value is an instance of the bytes class. For example, converting to Shift_JIS produces a value such as \x94L, which is unreadable at first glance.

Specify the encoding as the first argument of the encode method. Here we convert to Shift_JIS, so we pass sjis. Various other encoding names are available, such as utf-8.

txt = '猫犬'
sjis_txt = txt.encode('sjis')
print('Text content:', sjis_txt, '\nType:', type(sjis_txt))

Output contents of code execution result:

Text content: b'\x94L\x8c\xa2' 
Type: <class 'bytes'>

The decode method does the opposite of encode. It converts a value in a character encoding such as Shift_JIS back into a normal Python string that humans can read, such as 猫犬 ("cat dog"). Specify the encoding of the target value as the first argument.

txt = sjis_txt.decode('sjis')
print(txt)

Output contents of code execution result:

猫犬
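Two situations come up often in practice with encode and decode. First, decoding bytes with the wrong encoding raises UnicodeDecodeError. Second, encoding a character that Shift_JIS cannot represent (such as an emoji) raises UnicodeEncodeError unless you pass the errors argument. The sketch below reuses the 猫犬 example to show both.

```python
txt = '猫犬'
sjis_bytes = txt.encode('sjis')
utf8_bytes = txt.encode('utf-8')
# The same text takes a different number of bytes in each encoding.
print(len(sjis_bytes), len(utf8_bytes))  # 4 6

# Decoding Shift_JIS bytes as UTF-8 fails.
try:
    sjis_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print('decode with wrong codec failed:', wrong_codec_failed)

# Shift_JIS has no emoji; errors='replace' substitutes '?' instead of raising.
print('a😀'.encode('sjis', errors='replace'))  # b'a?'
```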

