[PYTHON] [Note] Why __unicode__ is needed when defining a Django model class

Background

I'm building a blog to practice Django, and I defined the following model as the textbook instructed. At the time I didn't understand what __unicode__ was for, so I wrote up what I found.

from django.db import models

class Blog(models.Model):
	title = models.CharField(max_length=100, unique=True)
	slug = models.SlugField(max_length=100, unique=True)
	body = models.TextField()
	posted = models.DateField(db_index=True, auto_now_add=True)
	category = models.ForeignKey('blog.Category')

	def __unicode__(self):
		# Text used whenever this entry has to be displayed (Python 2 style)
		return '%s' % self.title

What is Unicode?

Many people seem to know this already, but as a beginner I did not. So, in the spirit of "ggrks" (just google it), I googled it.

Unicode is a standard for handling the characters of the world's many languages in a unified way on a computer.

There are many ways for a computer to represent the text of natural languages, for example ASCII and utf-8, both of which come up below.

It's confusing.

These methods are encodings: schemes that let a computer represent text. How a character is represented differs from one encoding to another.

So when you handle a string in ordinary Python, it is stored in some encoding. But people reading on the web come from many different environments, and each of them exchanges text with their computer using a different encoding, so the text can end up garbled depending on the reader's environment.
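To check this for myself, here is a small sketch in Python 3 (my assumption; the model above is written in Python 2 style). The character あ and the three encoding names are just examples I picked. The same character has a single Unicode code point, but it turns into different bytes under each encoding.

```python
# One character has exactly one Unicode code point...
ch = "あ"
code_point = ord(ch)

# ...but it becomes different bytes depending on the encoding.
as_utf8 = ch.encode("utf-8")
as_sjis = ch.encode("shift_jis")
as_eucjp = ch.encode("euc_jp")

print(hex(code_point))
print(as_utf8, as_sjis, as_eucjp)
```

So "which bytes mean あ" only has an answer once you say which encoding you are talking about.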

A little more detail

For example, suppose the letter A is encoded with one encoding, say ASCII, and stored on a computer. Then let's think about what happens when the reader's environment uses a different one, say utf-8. (The code and the result below are made up to show the idea. Real ASCII text is also valid utf-8, so the actual trouble occurs between encodings that disagree, such as Shift_JIS and utf-8.)

Character A → ASCII → character code = 123456

Suppose it is saved with a code like this.

Then, if the reader's environment is utf-8, ...

Character code = 123456 → utf-8 → Letter B

A different character comes back. In the end the original string is not reproduced, and the whole web page turns into ??????????????.
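Here is a sketch of the same situation in Python 3. The sample text and the choice of Shift_JIS are my own assumptions, picked because Shift_JIS and utf-8 really are incompatible. Plain ASCII text such as the letter A happens to read back correctly as utf-8, so it is included only for comparison.

```python
text = "日本語"

# The writer stores the text as Shift_JIS bytes.
stored = text.encode("shift_jis")

# A reader that assumes UTF-8 cannot make sense of those bytes.
# errors="replace" substitutes U+FFFD for each undecodable part.
wrong = stored.decode("utf-8", errors="replace")

# A reader that uses the same encoding gets the original text back.
right = stored.decode("shift_jis")

# ASCII is a subset of UTF-8, so this round trip is harmless.
ascii_ok = "A".encode("ascii").decode("utf-8")

print(repr(wrong))
print(right == text, ascii_ok)
```

The bytes on disk never change. Only the reader's guess about the encoding does, and a wrong guess is what produces the garbled page.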

What are the problems with Unicode?

I got this far and thought, "Then shouldn't every string just be handled as Unicode?", but the world is not that convenient. They say heaven does not give two gifts, and the human world does seem to be all trade-offs.

Unicode gained the ability to be a unified standard for computers, but in exchange a Unicode string cannot be **shown to humans** as it is: it first has to be converted into some concrete encoding.

That is how it looks to me. Or rather, the story doesn't add up otherwise, so I decided to go with this understanding for now. (That is why the title says "Note".)

How are character codes handled by Django?

Let's get back to Django.

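In Django, text seems to be handled as Unicode strings everywhere inside the framework. When a model object has to be shown to a human, for example in the admin site or in the shell, Django needs a text representation of it, and __unicode__ is where the model says what that text should be. Without it, every entry shows up as something like "Blog object", so the method has to be written in the model.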
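The effect of such a method can be seen without Django at all. The sketch below assumes Python 3, where __str__ plays the role that __unicode__ plays for Python 2 models. PlainPost and TitledPost are hypothetical classes I made up for the comparison; they are not part of Django.

```python
class PlainPost:
    """No text representation defined: Python falls back to a generic one."""

    def __init__(self, title):
        self.title = title


class TitledPost:
    """Defines its own text representation, like the Blog model does."""

    def __init__(self, title):
        self.title = title

    def __str__(self):
        # Python 3 counterpart of __unicode__ in a Python 2 model
        return '%s' % self.title


plain_text = str(PlainPost("Hello"))
titled_text = str(TitledPost("Hello"))

print(plain_text)   # generic text containing "PlainPost object"
print(titled_text)  # the title itself
```

The first object prints as an anonymous "PlainPost object at ..." line, while the second prints its title, which is the difference the method makes in a list of blog entries.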

This article ends with plenty of holes left to poke at, but for now I'll wrap it up here and move on.

Now I know what I need to know (or at least I should).
