*** Addendum *** Part of this article turned out to be wrong, so please also read the addendum at the end.
One day I was writing a web app on Google App Engine (Python).
In a web service, you sometimes want to restrict which characters can be used in a password, right?
When I looked it up, I found that str.isalnum() checks whether a string is alphanumeric, so I used that.
But something is off.
```python
password = 'abc123'
password.isalnum()  #--> True
password = u'あ'
password.isalnum()  #--> True ?!
```
Apparently the isalnum method also judges double-byte characters such as あ to be alphanumeric. Terrible...
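As far as I can tell, the reason is that for unicode text isalnum() is True when every character is a letter or a number according to the Unicode character database, and hiragana such as あ is classified as a letter (general category Lo). A quick check, sketched in Python 3 (where every str is unicode, like u'...' in Python 2); the helper name describe is just for illustration:

```python
import unicodedata


def describe(ch):
    """Return (ch.isalnum(), Unicode general category) for a single character."""
    return ch.isalnum(), unicodedata.category(ch)


# ASCII letter, ASCII digit, hiragana, full-width digit one (U+FF11)
for ch in ['a', '1', 'あ', '１']:
    print(repr(ch), describe(ch))
```

All four characters report True; only the category differs (Ll, Nd, Lo, Nd).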
So I had no choice but to use regular expressions. While I was at it, I wrote checks for several character classes.
```python
# -*- coding: utf-8 -*-
import re

#\Z is used as the end anchor instead of $, because $ also matches
#just before a trailing newline, so 'abc\n' would otherwise be accepted.

#Half-width (ASCII) lowercase letters
lowerReg = re.compile(r'^[a-z]+\Z')
def islower(s):
    return lowerReg.match(s) is not None
#Half-width capital letters
upperReg = re.compile(r'^[A-Z]+\Z')
def isupper(s):
    return upperReg.match(s) is not None
#Half-width alphabetic characters
alphaReg = re.compile(r'^[a-zA-Z]+\Z')
def isalpha(s):
    return alphaReg.match(s) is not None
#Half-width digits
digitReg = re.compile(r'^[0-9]+\Z')
def isdigit(s):
    return digitReg.match(s) is not None
#Half-width alphanumeric characters
alnumReg = re.compile(r'^[a-zA-Z0-9]+\Z')
def isalnum(s):
    return alnumReg.match(s) is not None
#Half-width alphanumeric characters or underscore
alnum_Reg = re.compile(r'^[a-zA-Z0-9_]+\Z')
def isalnum_(s):
    return alnum_Reg.match(s) is not None
#Half-width symbols: the ranges !-/ :-@ [-` {-~ cover all printable ASCII punctuation
symbolReg = re.compile(r'^[!-/:-@[-`{-~]+\Z')
def issymbol(s):
    return symbolReg.match(s) is not None
#Printable ASCII characters 0x21-0x7E (space is excluded)
asciiReg = re.compile(r'^[!-~]+\Z')
def isascii(s):
    return asciiReg.match(s) is not None

isalnum('abc123')  #--> True
isalnum(u'あ')  #--> False
```
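A pitfall worth knowing about end anchors: in Python's re, $ (without MULTILINE) also matches just before a final newline, so a pattern like r'^[a-zA-Z0-9]+$' accepts 'abc123\n'. A small Python 3 comparison (the names dollar and strict are only for this demo):

```python
import re

# Same character class, two different end anchors.
dollar = re.compile(r'^[a-zA-Z0-9]+$')   # $ also matches just before a final '\n'
strict = re.compile(r'^[a-zA-Z0-9]+\Z')  # \Z matches only at the very end

print(dollar.match('abc123\n') is not None)  # True: the trailing newline slips through
print(strict.match('abc123\n') is not None)  # False
# Python 3.4+ alternative: fullmatch() must consume the whole string.
print(re.fullmatch(r'[a-zA-Z0-9]+', 'abc123\n') is not None)  # False
```

For password validation, where a stray newline from a form field should be rejected, \Z (or fullmatch on Python 3) is the safer choice.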
Changing + to * in the patterns (r'^[...]*$' instead of r'^[...]+$') makes them accept the empty string as well. I couldn't find much information on this topic, so I'm leaving it here as a note.
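The effect of the quantifier, sketched in Python 3 (\Z is used here as a strict end-of-string anchor; the names plus and star are only for this demo):

```python
import re

plus = re.compile(r'^[a-zA-Z0-9]+\Z')  # one or more characters: '' is rejected
star = re.compile(r'^[a-zA-Z0-9]*\Z')  # zero or more characters: '' is accepted

for s in ['', 'abc123']:
    print(repr(s), plus.match(s) is not None, star.match(s) is not None)
```

For a password field, + is usually what you want, since an empty password should fail the check.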
...
This may be unrelated, but at the time of writing Python 3 isn't available on Google App Engine (it only goes up to Python 2.7). So there are two string types, unicode and str (strictly speaking, byte strings), which is inconvenient...
The following was pointed out to me in the comments.
For unicode strings, isalnum doesn't behave the way I wanted: it accepts any Unicode letter or digit, not just ASCII ones. Encoding to UTF-8 first gives the expected result.
Besides isalnum, you can use isalpha (letters), isdigit (digits), islower (lowercase), and isupper (uppercase) the same way.
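One caveat, sketched in Python 3 with bytes (Python 2 str should behave the same under the default locale): islower() and isupper() do not mean "every character is a lowercase/uppercase letter". They only require at least one cased character and none of the opposite case, so digits and non-ASCII bytes are ignored. That differs from the regex-based islower in the main text, which accepts only a-z.

```python
# bytes.islower(): at least one lowercase ASCII letter and no uppercase ASCII letters.
print(b'abc123'.islower())              # True: digits are uncased and ignored
print('あa'.encode('utf-8').islower())  # True: non-ASCII bytes are ignored too
print(b'abc123'.isalpha())              # False: isalpha checks every byte
print(b'123'.islower())                 # False: no cased character at all
```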
There are no built-in methods for other checks (such as an ASCII check), so use the regular expressions from the main text for those.
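If you prefer to avoid regular expressions, the printable-ASCII check can also be written with a plain comparison. A Python 3 sketch; is_printable_ascii is a hypothetical helper name, and it uses the same character set as [!-~]:

```python
def is_printable_ascii(s):
    """Hypothetical helper: non-empty, and every character is in '!'..'~' (0x21-0x7E).
    Space (0x20) is excluded, matching the character class [!-~]."""
    return len(s) > 0 and all('!' <= c <= '~' for c in s)


print(is_printable_ascii('abc123'))   # True
print(is_printable_ascii('abc 123'))  # False: space is outside the range
print(is_printable_ascii('あ'))        # False

# Python 3.7+ has str.isascii(), but it is looser: any code point below 128 counts,
# including spaces and control characters, and the empty string returns True.
print('abc 123'.isascii())  # True
print(''.isascii())         # True
```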
For str (byte) strings, isalnum already works as expected.
However, when you handle double-byte text such as Japanese, unicode strings are more convenient, because with str, len() counts bytes rather than characters. In that case, keep the text as unicode and encode it to str only for the check.
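To illustrate the length problem, a Python 3 sketch (Python 3 str corresponds to Python 2 unicode, and bytes to Python 2 str): UTF-8 encodes each hiragana character as 3 bytes.

```python
text = 'あいう'              # three hiragana characters
encoded = text.encode('utf-8')

print(len(text))     # 3: counts characters
print(len(encoded))  # 9: counts bytes, each hiragana takes 3 bytes in UTF-8
```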
```python
u'あ'.encode('utf-8').isalnum()  #--> False
```
In Python 3.x, str and unicode were unified: the type is named str, but it is actually unicode. So even ordinary string literals need to be encoded to UTF-8 (which gives bytes) before the check.
```python
'あ'.encode('utf-8').isalnum()  #--> False
```
*** End of addendum ***