What began as a quick lookup turned into a miscellaneous memo on various kinds of string splitting.
Treat it as rough notes.
For usage, see https://kagamihoge.hatenablog.com/entry/2017/01/03/225054. I considered measuring speed and comparing the implementations, but that is already covered in "Convert CamelCase and snake_case to each other", so refer to that article. Here I only quote the relevant source code.
Google Guava
https://github.com/google/guava/blob/master/guava/src/com/google/common/base/CaseFormat.java#L84
private static String firstCharOnlyToUpper(String word) {
  return word.isEmpty()
      ? word
      : Ascii.toUpperCase(word.charAt(0)) + Ascii.toLowerCase(word.substring(1));
}
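To get a feel for what this helper contributes to a case conversion, here is a small Python sketch. It is not Guava's conversion loop: the function names are made up for illustration, and Python's `str.upper()`/`str.lower()` are Unicode-aware, whereas Guava's `Ascii` helpers only touch ASCII letters.

```python
def first_char_only_to_upper(word: str) -> str:
    # Same rule as the quoted helper: first character upper-cased, rest lower-cased.
    # Slicing keeps the empty string unchanged, like the isEmpty() check.
    return word[:1].upper() + word[1:].lower()


def lower_underscore_to_upper_camel(name: str) -> str:
    # Hypothetical driver: split on "_" and normalize every word.
    return "".join(first_char_only_to_upper(w) for w in name.split("_"))


if __name__ == "__main__":
    print(lower_underscore_to_upper_camel("user_account_id"))
```

Because the rest of each word is lower-cased, an input word such as `hTTP` comes out as `Http`, so acronyms are not preserved by this rule.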
Commons Lang
https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L7539
private static String[] splitByCharacterType(final String str, final boolean camelCase) {
    if (str == null) {
        return null;
    }
    if (str.isEmpty()) {
        return ArrayUtils.EMPTY_STRING_ARRAY;
    }
    final char[] c = str.toCharArray();
    final List<String> list = new ArrayList<>();
    int tokenStart = 0;
    int currentType = Character.getType(c[tokenStart]);
    for (int pos = tokenStart + 1; pos < c.length; pos++) {
        final int type = Character.getType(c[pos]);
        if (type == currentType) {
            continue;
        }
        if (camelCase && type == Character.LOWERCASE_LETTER && currentType == Character.UPPERCASE_LETTER) {
            final int newTokenStart = pos - 1;
            if (newTokenStart != tokenStart) {
                list.add(new String(c, tokenStart, newTokenStart - tokenStart));
                tokenStart = newTokenStart;
            }
        } else {
            list.add(new String(c, tokenStart, pos - tokenStart));
            tokenStart = pos;
        }
        currentType = type;
    }
    list.add(new String(c, tokenStart, c.length - tokenStart));
    return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
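To experiment with this tokenization rule in a REPL, here is a rough Python port. It is only a sketch: the name `split_by_character_type` is made up, `unicodedata.category` codes such as 'Lu' and 'Ll' stand in for Java's `Character.getType`, and the loop walks code points rather than UTF-16 chars.

```python
import unicodedata


def split_by_character_type(s, camel_case=False):
    """Split s wherever the Unicode general category changes."""
    if s is None:
        return None
    if not s:
        return []
    tokens = []
    token_start = 0
    current_type = unicodedata.category(s[0])
    for pos in range(1, len(s)):
        char_type = unicodedata.category(s[pos])
        if char_type == current_type:
            continue
        if camel_case and char_type == "Ll" and current_type == "Lu":
            # Upper -> lower transition: the last capital belongs to the new token,
            # so "HTTPServer" breaks before "S", not after it.
            new_token_start = pos - 1
            if new_token_start != token_start:
                tokens.append(s[token_start:new_token_start])
                token_start = new_token_start
        else:
            tokens.append(s[token_start:pos])
            token_start = pos
        current_type = char_type
    tokens.append(s[token_start:])
    return tokens


if __name__ == "__main__":
    print(split_by_character_type("fooBarHTTPServer2", camel_case=True))
    print(split_by_character_type("fooBar"))
```

With `camel_case=True` the first call yields `['foo', 'Bar', 'HTTP', 'Server', '2']`; without it, `fooBar` splits into `['foo', 'B', 'ar']`, because every category change is a boundary.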
ModeShape
https://github.com/HexarA/Json2Pojo/blob/master/src/org/jboss/dna/common/text/Inflector.java#L325
public String underscore( String camelCaseWord,
                          char... delimiterChars ) {
    if (camelCaseWord == null) return null;
    String result = camelCaseWord.trim();
    if (result.length() == 0) return "";
    result = result.replaceAll("([A-Z]+)([A-Z][a-z])", "$1_$2");
    result = result.replaceAll("([a-z\\d])([A-Z])", "$1_$2");
    result = result.replace('-', '_');
    if (delimiterChars != null) {
        for (char delimiterChar : delimiterChars) {
            result = result.replace(delimiterChar, '_');
        }
    }
    return result.toLowerCase();
}
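The two regular expressions do the real work here, so a small Python sketch of the same approach may help when tracing them by hand. This is a port written for illustration, not ModeShape code. `re.ASCII` is passed so that `\d` matches only ASCII digits, as it does by default in Java, and Python's `strip()` is only an approximation of Java's `trim()`.

```python
import re


def underscore(camel_case_word, *delimiter_chars):
    """Illustrative port of the two-regex camelCase -> snake_case approach."""
    if camel_case_word is None:
        return None
    result = camel_case_word.strip()
    if not result:
        return ""
    # 1) Separate an acronym run from the capitalized word that follows it:
    #    "HTTPResponse" -> "HTTP_Response"
    result = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", result)
    # 2) Separate a lowercase letter or digit from the capital that follows it:
    #    "parseHTTP" -> "parse_HTTP"
    result = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", result, flags=re.ASCII)
    result = result.replace("-", "_")
    for ch in delimiter_chars:
        result = result.replace(ch, "_")
    return result.lower()


if __name__ == "__main__":
    print(underscore("parseHTTPResponse"))
    print(underscore("user-id"))
```

The order matters: the acronym rule runs first so that `parseHTTPResponse` becomes `parse_http_response` rather than splitting between every capital.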
Java Case Converter
https://github.com/toolpage/java-case-converter
https://en.toolpage.org/cat/case-converter
https://github.com/toolpage/java-case-converter/blob/master/src/org/toolpage/util/text/CaseConverter.java#L121
public static String convertToSnakeCase(String value) {
    String throwAwayChars = "()[]{}=?!.:,-_+\\\"#~/";
    value = value.replaceAll("[" + Pattern.quote(throwAwayChars) + "]", " ");
    value = CaseConverter.convertToStartCase(value);
    return value.trim().replaceAll("\\s+", "_");
}
NetBeans Case Converter, a NetBeans plugin: https://github.com/eviweb/netbeans-case-converter
While researching these English-language conventions, I suddenly remembered the article "Creating a perfect Yubaba with NameDivider", which sent me off in the following direction.
NameDivider https://internet.watch.impress.co.jp/docs/yajiuma/1289735.html https://github.com/rskmoi/namedivider-python https://github.com/rskmoi/NameDivider
example (the input 菅義偉 is the name Yoshihide Suga written in kanji without a separator):
-----------------------------------------------------
>>> namedivider = NameDivider()
>>> divided_name = namedivider.divide_name("菅義偉")
>>> print(divided_name)
"菅 義偉"
>>> print(divided_name.to_dict())
{'family': '菅', 'given': '義偉', 'separator': ' ', 'score': 0.6328842762252201, 'algorithm': 'kanji_feature'}
-----------------------------------------------------
name-divider: browsing GitHub turned up a few similar projects, so I list one here too. https://github.com/iszk/name-divider
That made me wonder whether anyone had tried splitting Japanese addresses, and of course someone already had. There are pioneers everywhere.
Related articles: "An extreme sport: splitting an address into prefecture / municipality / the rest with as short a regular expression as possible" and "Splitting an address into prefecture, municipality, and the rest". See also the list of municipalities whose names contain the characters 都, 道, 府, or 県: https://uub.jp/zat/todofukenmoji.html
The article opens with the final result, for those who find the full explanation too long to read:
(...??[都道府県])((?:旭川|伊達|石狩|盛岡|奥州|田村|南相馬|那須塩原|東村山|武蔵村山|羽村|十日町|上越|富山|野々市|大町|蒲郡|四日市|姫路|大和郡山|廿日市|下松|岩国|田川|大村)市|.+?郡(?:玉村|大町|.+?)[町村]|.+?市.+?区|.+?[市区町村])(.+)
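A minimal Python sketch of applying this pattern follows. It assumes the literals are the original Japanese characters (都道府県, 市区町村, and kanji city names) rather than their English translations. The helper name `split_address` and the sample addresses are chosen for illustration only.

```python
import re

# Group 1: prefecture, group 2: municipality, group 3: the rest.
ADDRESS_PATTERN = re.compile(
    "(...??[都道府県])"
    "((?:旭川|伊達|石狩|盛岡|奥州|田村|南相馬|那須塩原|東村山|武蔵村山|羽村"
    "|十日町|上越|富山|野々市|大町|蒲郡|四日市|姫路|大和郡山|廿日市|下松|岩国"
    "|田川|大村)市"                      # cities the generic rules would cut wrongly
    "|.+?郡(?:玉村|大町|.+?)[町村]"      # county followed by a town or village
    "|.+?市.+?区"                        # city followed by a ward
    "|.+?[市区町村])"                    # any other city, ward, town, or village
    "(.+)"
)


def split_address(address):
    """Return (prefecture, municipality, rest), or None if nothing matches."""
    m = ADDRESS_PATTERN.fullmatch(address)
    return m.groups() if m else None


if __name__ == "__main__":
    for sample in ("東京都千代田区霞が関1-1-1", "三重県四日市市諏訪町1-5"):
        print(split_address(sample))
```

The explicit city list exists because the lazy `.+?[市区町村]` alternative stops at the first matching character: without it, 四日市市 would be cut after 四日市.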
After that, I wondered whether similar tools exist for Chinese names, but the following is about all I found.
chinese-contact-name-separator splits the Chinese names in an address book into family name and given name and saves them in the respective fields.
https://github.com/chengyin/chinese-contact-name-separator
mingpipe is a name matcher for Chinese names. It takes two names and predicts whether they can refer to the same entity (person, organization, or location).
Example: 佛罗伦萨 (Florence) and 翡冷翠 (another Chinese rendering of Florence) → true
https://github.com/hltcoe/mingpipe
Today's survey ends with the following as a punch line.
https://github.com/derek73/python-nameparser https://github.com/derek73/python-nameparser/issues/83
The issue reports that the parser seems to parse Chinese names written in English incorrectly (the example below uses a Malaysian Chinese name).
Names without nickname.
Current:
>>> name = HumanName('Tham Jun Hoe')
>>> name
<HumanName : [
title: ''
first: 'Tham'
middle: 'Jun'
last: 'Hoe'
suffix: ''
nickname: ''
]>
Expected:
>>> name = HumanName('Tham Jun Hoe')
>>> name
<HumanName : [
title: ''
first: 'Jun Hoe'
middle: ''
last: 'Tham'
suffix: ''
nickname: ''
]>