[PYTHON] Is a space replaced by a plus sign or %20 in percent-encoding?

Overview

This article examines whether several languages encode a space as a plus sign (+) or as %20 when percent-encoding.

When building the query string of a URL or an `application/x-www-form-urlencoded` body, the HTML5 and URL Standard specifications require spaces to be converted to a plus sign (+).

On the other hand, HTTP servers treat both + and %20 as spaces, because there was no clear specification for percent-encoding. Some server-side languages provide no encoder or option for `application/x-www-form-urlencoded` and substitute an encoder that follows RFC 3986 instead.

In most cases you don't need to care whether spaces are encoded as a plus sign (+) or as %20, but with OAuth the difference matters.

OAuth 1.0 requires spaces to be percent-encoded as %20 (RFC 3986), so libraries that encode spaces as a plus sign cause an error. Twitter has published an explanatory article about percent-encoding.

JavaScript

`URLSearchParams` is based on the URL Standard. Spaces are replaced with a plus sign.

const params = new URLSearchParams();
params.append("msg", "hello world");
console.log("msg=hello+world" === params.toString());

`URLSearchParams` also treats `%20` as a space when parsing. The parse results are as follows.

const params2 = new URLSearchParams("msg=hello+world");
console.log("hello world" === params2.get("msg"));

const params3 = new URLSearchParams("msg=hello%20world");
console.log("hello world" === params3.get("msg"));

`encodeURIComponent` replaces the space with `%20`.

console.log("hello%20world" === encodeURIComponent("hello world"));

If you want to replace the space with a plus sign, add a call to replace.

const ret = encodeURIComponent("hello world").replace(/%20/g, '+')
console.log("hello+world" === ret);

`decodeURIComponent` decodes `%20` to a space, but leaves the plus sign unchanged.

console.log("hello world" === decodeURIComponent("hello%20world"));
console.log("hello+world" === decodeURIComponent("hello+world"));

Node.js

Prefer `URLSearchParams`, which is supported from Node.js v7.0. The rest of this section shows the methods used before v7.0.

`querystring.stringify` from the standard `querystring` module replaces spaces with `%20`.

const querystring = require('querystring');

const ret = querystring.stringify({ msg: "hello world" });
console.log("msg=hello%20world" === ret);

`querystring.parse` replaces both + and `%20` with spaces.

const querystring = require("querystring");

console.log("hello world" === querystring.parse("msg=hello+world")["msg"]);
console.log("hello world" === querystring.parse("msg=hello%20world")["msg"]);

If you need an encoding that conforms to the RFC 3986 specification, install ljharb/qs.

const qs = require("qs");

console.log("msg=hello%20world" === qs.stringify({ msg: "hello world" }));
console.log("hello world" === qs.parse("msg=hello+world")["msg"]);
console.log("hello world" === qs.parse("msg=hello%20world")["msg"]);

Python 3

Use `urllib`. By default, `urlencode` replaces a space with +.

>>> from urllib.parse import urlencode
>>> urlencode({"msg": "hello world"})
'msg=hello+world'

If you specify `quote_via=quote`, the space is replaced with `%20`.

>>> from urllib.parse import urlencode, quote
>>> urlencode({"msg": "hello world"}, quote_via=quote)
'msg=hello%20world'
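
For the OAuth 1.0 case mentioned in the overview, the same `quote` function can be used with an empty `safe` set so that only the RFC 3986 unreserved characters are left as-is. The following is a minimal sketch, not a complete OAuth implementation; the helper names `oauth_percent_encode` and `oauth_encode_params` are hypothetical, and it assumes Python 3.7 or later, where `quote` does not escape `~`.

```python
from urllib.parse import quote, urlencode


def oauth_percent_encode(value: str) -> str:
    """Hypothetical helper: encode a space as %20, never as +."""
    # safe="" overrides quote()'s default of leaving "/" unescaped.
    return quote(value, safe="")


def oauth_encode_params(params: dict) -> str:
    """Hypothetical helper: encode key/value pairs sorted by key."""
    # quote_via=quote makes urlencode emit %20 instead of + for spaces.
    return urlencode(sorted(params.items()), quote_via=quote, safe="")


encoded_value = oauth_percent_encode("hello world")
encoded_query = oauth_encode_params({"status": "hello world", "count": "1"})
print(encoded_value)
print(encoded_query)
```

Sorting the pairs is included only because signature schemes usually need a stable parameter order; drop it if the order of the input should be preserved.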

Both `parse_qs` and `parse_qsl` replace + and `%20` with spaces.

>>> from urllib.parse import parse_qs
>>> parse_qs("msg=hello+world")
{'msg': ['hello world']}
>>> parse_qs("msg=hello%20world")
{'msg': ['hello world']}
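
For encoding or decoding a single value rather than a whole query string, `urllib.parse` also offers `quote` / `quote_plus` and `unquote` / `unquote_plus`. The sketch below only illustrates how each pair handles the space and the plus sign; the variable names are chosen for this example.

```python
from urllib.parse import parse_qsl, quote, quote_plus, unquote, unquote_plus

# quote() percent-encodes the space; quote_plus() uses the form-style plus sign.
percent_style = quote("hello world")       # 'hello%20world'
form_style = quote_plus("hello world")     # 'hello+world'

# unquote() leaves "+" untouched; unquote_plus() turns it into a space.
plus_kept = unquote("hello+world")         # 'hello+world'
plus_decoded = unquote_plus("hello+world") # 'hello world'

# parse_qsl() returns (key, value) pairs and accepts both encodings.
pairs = parse_qsl("msg=hello+world&msg2=hello%20world")

print(percent_style, form_style, plus_kept, plus_decoded, pairs)
```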

Go

Use the `net/url` package. The `Encode` method of the `url.Values` type replaces spaces with +. `ParseQuery` replaces both + and `%20` with spaces.

package main

import (
	"fmt"
	"net/url"
)

func main() {
	v := url.Values{}
	v.Set("msg", "hello world")
	fmt.Println(v.Encode())
	// msg=hello+world

	m, _ := url.ParseQuery("msg=hello+world")
	fmt.Println(m["msg"][0])
	// hello world

	m2, _ := url.ParseQuery("msg=hello%20world")
	fmt.Println(m2["msg"][0])
	// hello world
}

`QueryEscape` converts spaces to +, while `PathEscape` converts them to `%20`. In addition, `PathUnescape` does not convert + to spaces.

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// QueryUnescape and PathUnescape return (string, error),
	// so Println also prints the nil error as <nil>.
	fmt.Println("QueryEscape")
	fmt.Println(url.QueryEscape("hello world"))
	// hello+world
	fmt.Println(url.QueryUnescape("hello+world"))
	// hello world <nil>
	fmt.Println(url.QueryUnescape("hello%20world"))
	// hello world <nil>

	fmt.Println()
	fmt.Println("PathEscape")
	fmt.Println(url.PathEscape("hello world"))
	// hello%20world
	fmt.Println(url.PathUnescape("hello+world"))
	// hello+world <nil>
	fmt.Println(url.PathUnescape("hello%20world"))
	// hello world <nil>
}
