This article summarizes the basic string operations in Python. Experienced users may find it too elementary.
(Added 2018-12-23: the `print` syntax has been made compatible with Python 3. If code copied from this article does not work in Python 2, put `from __future__ import print_function` at the beginning of the code.)
Python strings are immutable, so even when you want to rewrite only part of one, the result is assembled as a new string object.
For example, the `replace` method, which performs string replacement, returns a new string object with the replaced content.
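The following is a small sketch of what immutability means in practice. The variable names are only for illustration; the point is that `replace` leaves the original untouched and that assigning to an index is rejected.

```python
s = 'Today is Monday.'
t = s.replace('Monday', 'Sunday')

print(s)       # the original string is unchanged
print(t)       # the replaced content lives in a new object
print(s is t)  # False: two distinct objects

try:
    s[0] = 't'  # item assignment is not supported on str
except TypeError as e:
    print('TypeError:', e)
```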
Use the `+` operator for concatenation.
a = 'Python'
b = '2.7'
c = a + b
print(c) # => 'Python2.7'
Because `+` is evaluated from left to right, you can chain as many strings as you like.
a = 'Python'
b = ' is '
c = 'fancy'
print(a + b + c) # => 'Python is fancy'
You can also join strings using the `join` method with a list or tuple.
As an aside, Ruby's `join` is an Array method (it takes the separator string as an argument), whereas Python's `join` is a str method (it takes a list or tuple as an argument). People coming from Ruby need to be careful.
strings = ['dog', 'cat', 'penguin']
print(','.join(strings)) #=> 'dog,cat,penguin'
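One thing that tends to trip people up: `join` only accepts strings, so other values have to be converted first. A minimal sketch, with `values` as a made-up example list:

```python
values = [1, 2, 3]

# ','.join(values) would raise TypeError because the items are ints,
# so convert each item with str() first.
joined = ','.join(str(v) for v in values)
print(joined)  # => '1,2,3'
```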
To repeat the same content, apply the `*` operator with an integer; a string that repeats the original the specified number of times is generated.
s = 'dog?'
print(s * 3) #=> 'dog?dog?dog?'
There are three ways to expand the value of a variable into a string. There may be others that I am not aware of.
'%s, %s' % ('Hello', 'World')
'%(a)s, %(b)s' % dict(a='Hello', b='World')
'{0}, {1}'.format('Hello', 'World')
(Note) I am not sure of the official name of the second one, so I have taken the liberty of calling it the extended sprintf style.
If you apply the `%` operator to a string with a single value or a tuple on the right-hand side, the values are expanded as follows.
a = 'Python'
b = 'a programming language'
print('%s is %s' % (a, b)) # => 'Python is a programming language'
c = 'World'
print('Hello, %s!' % c) # => 'Hello, World!'
You need exactly as many values as there are expansion symbols (such as `%s`) in the string; there cannot be more or fewer. If there is only one expansion symbol, the value after `%` does not need to be a tuple (a one-element tuple is also expanded). In the example above, the template string of the first print call contains two `%s` expansion symbols, so the tuple after `%` also has two elements. If you want to keep the character `%` literally in the template string, write `%%`, that is, two `%` characters.
The following format specifiers are available. If you are not sure which one to use, `%s` is a reasonable choice for the time being. For details on how to write format specifiers, I will defer to the Wikipedia page on printf.
- `%s`
  - Expand as a string
- `%d`
  - Expand as an integer
- `%f`
  - Expand as a floating-point number
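A short sketch of the specifiers listed above, plus width, precision, and the literal `%%`. The variable names are only for illustration.

```python
as_string = '%s' % 42            # any value can go through %s
padded = '%5d|' % 42             # right-aligned in a field of width 5
default_float = '%f' % 3.14159   # six digits after the decimal point by default
two_places = '%.2f' % 3.14159    # precision of two digits
percent = '%d%%' % 50            # %% produces a literal percent sign

print(as_string)      # => '42'
print(padded)         # => '   42|'
print(default_float)  # => '3.141590'
print(two_places)     # => '3.14'
print(percent)        # => '50%'
```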
When you want to expand a tuple (or list) as a string like '(1, 2, 3)', wrap it in a one-element tuple:
tuple_var = (1, 2, 3)
print('tuple_var is: %s' % (tuple_var,))
If you don't do this, Python complains that there is only one placeholder to fill even though three values were given.
Put the key of the dict object in parentheses after the `%` in the format string, and give the dict object on the right-hand side of the `%` operator. This is useful when you embed the same value repeatedly, or when you already have a dict variable.
v = dict(first='Michael', family='Jackson')
print('He is %(first)s, %(first)s %(family)s.' % v)
The `format` method lets you use a template mini-language dedicated to it.
print('{0}, {1}'.format('Hello', 'World')) #=> 'Hello, World'
For details, see the Format Specification Mini-Language section of the Python documentation.
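As a rough sketch of what the `format` mini-language allows beyond positional fields, here are repeated indexes, named fields, and a format spec after the colon. The variable names are made up for this example.

```python
by_position = '{0}, {1}, {0}'.format('Hello', 'World')   # an index can be reused
by_name = '{first} {family}'.format(first='Michael', family='Jackson')
with_spec = '{0:.2f} / {1:>5}'.format(3.14159, 'ab')     # precision and right alignment

print(by_position)  # => 'Hello, World, Hello'
print(by_name)      # => 'Michael Jackson'
print(with_spec)    # => '3.14 /    ab'
```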
s = 'Today is Monday.'
ss = s.replace('Monday', 'Sunday') #=> 'Today is Sunday.'
print(ss)
s2 = 'Hello Hello'
ss2 = s2.replace('Hello', 'Bye') #=> 'Bye Bye' (without a third argument, every occurrence is replaced)
print(ss2)
s3 = 'World World'
ss3 = s3.replace('World', 'Hello', 1) #=> 'Hello World' (the third argument limits the number of replacements)
print(ss3)
To replace strings that match a pattern, use the `sub` function of the `re` (regular expression) module.
import re
s = 'Hello World'
print(re.sub(r"[a-z]", "A", s)) #=> 'HAAAA WAAAA'
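`re.sub` can also refer back to captured groups and limit the number of replacements. The date string below is just sample data for this sketch.

```python
import re

date = '2018-12-23'
# \1, \2, \3 refer to the three captured groups in the pattern
swapped = re.sub(r'(\d+)-(\d+)-(\d+)', r'\2/\3/\1', date)
# count=1 replaces only the first match
first_only = re.sub(r'[a-z]', 'A', 'Hello World', count=1)

print(swapped)     # => '12/23/2018'
print(first_only)  # => 'HAllo World'
```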
s = 'abc'
n = 1  # we want 'a', the 1st character
print(s[n-1])  # indexes are 0-based, so subtract 1
s2 = 'xyz'
print(s2[-1])  # 'z', the last character
s = "This is a pen."
n = 1
m = 4
print(s[n-1:n-1+m]) # 'This'
print(s[0:4]) # 'This'
print(s[-4:-1]) # 'pen'
To search for a substring, use `find`. If you want to search from the end, use `rfind`.
`find` returns the 0-based position of the match if the string is found, or -1 if it is not.
s = 'abcabcabc'
index = s.find('b')  # index is 1 (the 2nd character)
You can specify the position at which to start the search with the second argument.
s = 'abcabcabc'
index = s.find('b', 2)  # index is 4 (the 5th character)
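For comparison, here is a sketch of `rfind`, the not-found case, and the optional third argument that ends the search range. The variable names are only for illustration.

```python
s = 'abcabcabc'

last_b = s.rfind('b')        # search from the right
missing = s.find('x')        # not found
in_range = s.find('b', 2, 4) # only looks at s[2:4], which is 'ca'

print(last_b)    # => 7
print(missing)   # => -1
print(in_range)  # => -1
```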
You can find all the targets in the string with the following code.
s = 'abcabcabc'
target = 'b'
index = -1
while True:
    index = s.find(target, index + 1)
    if index == -1:
        break
    print('start=%d' % index)
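The same loop can be wrapped in a reusable generator. `find_all` is a hypothetical helper name, not a built-in; like the loop above, it advances by one character, so overlapping matches are reported too.

```python
def find_all(s, target):
    """Yield the start index of every occurrence of target in s."""
    index = s.find(target)
    while index != -1:
        yield index
        # Continue one character after the previous hit
        index = s.find(target, index + 1)

print(list(find_all('abcabcabc', 'b')))  # => [1, 4, 7]
print(list(find_all('aaaa', 'aa')))      # => [0, 1, 2]
```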
A string is also iterable, so it can be processed with `for` as follows. If you want a list of characters, you can use `list(strvalue)`.
for c in 'aiueo':
    print(c)
print(list('hoge')) # => ['h', 'o', 'g', 'e']
You can also retrieve the characters one at a time by index.
s = 'aiueo'
for i in range(len(s)):
    c = s[i]
    print(c)
To remove whitespace from the ends of a string, you can use `strip`, `lstrip`, and `rstrip`.
`strip` returns a string with spaces, tab characters, and line breaks (\r and \n) removed from both ends.
`lstrip` applies the same processing to the left end only, and `rstrip` applies it to the right end only.
s = ' x '
print('A' + s.strip() + 'B') # => 'AxB'
print('A' + s.lstrip() + 'B') # => 'Ax B'
print('A' + s.rstrip() + 'B') # => 'A xB'
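The strip methods also accept an argument. As a sketch: the argument is treated as a set of characters to remove from the ends, not as a prefix or suffix string. The sample strings are made up.

```python
only_x = 'xxhelloxx'.strip('x')      # remove 'x' from both ends
mixed = '--=title=--'.strip('-=')    # remove any of '-' and '=' from both ends
default = '\t x \r\n'.strip()        # no argument: whitespace is removed

print(only_x)   # => 'hello'
print(mixed)    # => 'title'
print(default)  # => 'x'
```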
To remove a trailing line break, you can use `rstrip`. However, if a line may end with a space followed by a line break and you want to delete only the line break, you need to pass the characters to remove as an argument.
line = 'hoge\n'
msg = line.rstrip() + 'moge'
print(msg) # => 'hogemoge'
with open('./test.txt') as fh:
    for line in fh:
        no_line_break_line = line.rstrip()
        # Do something
# Delete only line breaks without removing spaces
line_with_space = 'line \n'  # we don't want to remove the space before the line break
print(line_with_space.rstrip('\n')) # => 'line '
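If the input may contain Windows-style line endings as well, passing both characters is one option. This is a sketch with made-up sample lines.

```python
lines = ['unix \n', 'windows \r\n', 'no newline ']

# '\r\n' here is a set of characters: any trailing '\r' or '\n' is removed,
# while the space before the line break is kept.
cleaned = [line.rstrip('\r\n') for line in lines]
print(cleaned)  # => ['unix ', 'windows ', 'no newline ']
```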
To convert a string to uppercase, use the `upper()` method.
print('hello'.upper()) # => 'HELLO'
To convert a string to lowercase, use the `lower()` method.
print('BIG'.lower()) # => 'big'
# The in operator tests whether a string contains a substring
s = 'abc'
print('b' in s) #=> True
print('x' in s) #=> False
To count the occurrences of a substring, you could do it yourself with the `find` method shown earlier, but there is a convenient method called `count`.
s = 'aaabbc'
print(s.count('b')) #=> 2
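One detail worth knowing: `count` counts non-overlapping occurrences, and it accepts an optional start position. A small sketch with illustrative variable names:

```python
non_overlapping = 'aaaa'.count('aa')  # matches at 0 and 2; the overlap at 1 is not counted
from_index = 'aaabbc'.count('a', 1)   # only counts within 'aabbc'

print(non_overlapping)  # => 2
print(from_index)       # => 2
```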
# Convert a number to a string with str() or a format specifier
v = 1
print(str(v))
print('%d' % v)
f = 1.234
print(str(f)) #=> '1.234'
print('%f' % f) #=> '1.234000'
Sometimes you want to express a list as a string, for example in a debug print.
v = [1,2,3]
print(str(v)) #=> '[1, 2, 3]'
print('%s' % v) #=> '[1, 2, 3]'
If you try to display a single tuple with `%s`, Python interprets the tuple as the list of values for the template, and you get an error.
v = (1, 2, 3)
print(str(v)) #=> '(1, 2, 3)' (good example)
print('%s' % v) # you might expect '(1, 2, 3)', but this raises a TypeError
print('%s' % (v,)) #=> '(1, 2, 3)' (good example)
You can also assemble the string yourself using `join` and the like.
v = [1,2,3]
print('<' + ('/'.join([ str(item) for item in v ])) + '>') #=> '<1/2/3>'
The same works for tuple objects.
Sometimes you want to express a dict as a string, for example in a debug print.
v = dict(a=1, b=2)
print(str(v)) #=> "{'a': 1, 'b': 2}"
print('%s' % v) #=> "{'a': 1, 'b': 2}"
You can also use `keys`, a list comprehension, and `join` to generate the string in a one-liner.
v = dict(a=1, b=2)
print('<' + ', '.join([ '%s=%s' % (k, v[k]) for k in v.keys() ]) + '>') #=> '<a=1, b=2>'
Data read from a file or socket opened in binary mode is a raw byte string, so unless you interpret it as a Unicode string you cannot operate on it character by character. In the Python 2 series (2.7 etc.), str (byte string) and unicode (character string) are distinct types, and where multibyte characters are expected as input, as in web applications, it is better to handle strings as unicode objects. Use the `decode()` method to interpret a byte string as a Unicode string in the specified encoding.
In the Python 3 series, the `str` type is the character string type (corresponding to the Python 2 unicode type), and the `bytes` type is the byte string type (corresponding to the Python 2 str type).
with open('utf8_content_file.txt', 'rb') as fh:  # 'rb' means binary mode
    byte_content = fh.read()  # read everything; this is a byte string at this point
    print(len(byte_content))  # number of bytes
    unicode_string = byte_content.decode('utf-8')  # interpret as characters encoded in UTF-8
    print(len(unicode_string))  # number of characters
In Python 3, the default encoding of the `decode()` method is `utf-8`, so if you know that the byte string is UTF-8, you can omit the encoding.
bytes_data = b'\xe3\x83\x90\xe3\x82\xa4\xe3\x83\x88\xe5\x88\x97'
print(bytes_data.decode()) # => 'バイト列' ("byte string" in Japanese)
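To make the difference between bytes and characters concrete, here is a Python 3 sketch that compares lengths and shows what happens when the wrong encoding is used. The variable names are only for illustration.

```python
text = 'バイト列'
data = text.encode('utf-8')

print(len(text))  # => 4 (characters)
print(len(data))  # => 12 (bytes; each of these characters takes 3 bytes in UTF-8)

try:
    data.decode('ascii')  # these bytes are not valid ASCII
except UnicodeDecodeError as e:
    print('cannot decode as ASCII:', e.reason)
```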
The encodings often used for Japanese are listed below.
- `utf_8`: UTF-8 (aliases: `utf-8`, `U8`, `utf8`, `cp65001`)
- `shift_jis`: Shift JIS (aliases: `csshiftjis`, `shiftjis`, `sjis`, `s_jis`)
- `cp932`: Shift JIS, extended (aliases: `932`, `ms932`, `mskanji`, `ms-kanji`)
- `euc_jp`: EUC-JP (aliases: `eucjp`, `ujis`, `u-jis`)
- `iso2022_jp`: JIS, ISO-2022-JP (aliases: `csiso2022jp`, `iso2022jp`, `iso-2022-jp`)
Other encodings supported by Python are listed on the `codecs` module page: https://docs.python.org/ja/3/library/codecs.html
Conversely, when writing to a file or socket opened in binary mode, the string must be converted to a byte string. In that case, use the `encode()` method of the unicode object.
unicode_string = u'String of multibyte characters'
with open('./utf8_content_file.txt', 'wb') as fh:  # open for writing in binary mode
    byte_content = unicode_string.encode('utf-8')  # get the byte string in UTF-8 encoding
    fh.write(byte_content)  # write the byte string
Likewise, in Python 3, if you do not pass an encoding to the `encode()` method, it behaves as if `utf-8` had been passed.
str_data = 'バイト列'  # "byte string" in Japanese
print(str_data.encode()) # => b'\xe3\x83\x90\xe3\x82\xa4\xe3\x83\x88\xe5\x88\x97'
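The same text produces different byte strings depending on the encoding. The sketch below, for Python 3, compares a few of the encodings listed earlier and shows the `errors` argument for characters that cannot be represented. The names `sizes` and `fallback` are made up for this example.

```python
text = 'バイト列'

# Byte length of the same text in several encodings
sizes = {}
for name in ('utf-8', 'shift_jis', 'euc_jp', 'iso2022_jp'):
    encoded = text.encode(name)
    sizes[name] = len(encoded)
    print(name, len(encoded))

# errors='replace' substitutes '?' for characters the encoding cannot express
fallback = 'abc☃'.encode('ascii', errors='replace')
print(fallback)  # => b'abc?'
```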
Template engines are so feature-rich that only a few major libraries are mentioned here.
jinja2 is probably the best known.
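This is not jinja2, which is a third-party package with far more features. As a minimal stand-in that needs no installation, the standard library's `string.Template` gives a feel for what template substitution looks like. The template text and field names are made up for this sketch.

```python
from string import Template

t = Template('Hello, $name! You have $count new messages.')

# substitute() requires every placeholder to be supplied
rendered = t.substitute(name='World', count=3)
# safe_substitute() leaves missing placeholders as they are
partial = t.safe_substitute(name='World')

print(rendered)  # => 'Hello, World! You have 3 new messages.'
print(partial)   # => 'Hello, World! You have $count new messages.'
```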
- 7.1. string — Common string operations (Python 2.7)
- 6.1. string — Common string operations (Python 3.5.1)