This is a summary of my challenge record for 100 Language Processing Knock 2015.
:warning: **This is not a challenge record of 100 Language Processing Knock 2020; it targets the old 2015 version. Please note.** :bangbang:
Environment: Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). (Only Problems 00 and 01 use Python 2.7.)
Review some slightly advanced programming-language topics through exercises on text and strings.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 00 | slice, print() |
Problem 01 | slice |
Problem 02 | Anaconda, zip(), itertools.zip_longest(), unpacking an iterable into arguments with a leading *, str.join(), functools.reduce() |
Problem 03 | len(), list.append(), str.split(), list.count() |
Problem 04 | enumerate(), hash randomization enabled by default since Python 3.3 |
Problem 05 | n-gram, range() (see the sketch after this table) |
Problem 06 | set(), set.union(), set.intersection(), set.difference() |
Problem 07 | str.format(), string.Template, string.Template.substitute() |
Problem 08 | chr(), str.islower(), input(), ternary operator |
Problem 09 | Typoglycemia, random.shuffle() |
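As a small illustration of the zip() and * unpacking ideas from Problems 02 and 05, here is a minimal n-gram sketch (the `ngrams` function name is my own shorthand, not code from the posts):

```python
# A minimal n-gram sketch: zip() over shifted slices, with * unpacking
# the list of slices into zip()'s arguments.
def ngrams(seq, n=2):
    return list(zip(*[seq[i:] for i in range(n)]))

print(ngrams("paraparaparadise"))       # character bi-grams
print(ngrams("I am an NLPer".split()))  # word bi-grams
```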
Experience useful UNIX tools for research and data analysis. By reimplementing these tools, you get a feel for the ecosystem of existing tools while improving your programming skills.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 10 | [UNIX commands] man (Japanese localization), open(), shell scripts, [UNIX commands] wc, chmod, file execute permission |
Problem 11 | str.replace(), [UNIX commands] sed, tr, expand |
Problem 12 | io.TextIOBase.write(), [UNIX commands] cut, diff, short and long options of UNIX commands |
Problem 13 | [UNIX commands] paste, str.rstrip(), Python's definition of "whitespace" |
Problem 14 | [UNIX commands] echo, read, head |
Problem 15 | io.IOBase.readlines(), [UNIX commands] tail |
Problem 16 | [UNIX commands] split, math.ceil(), str.format(), floor division with // |
Problem 17 | set.add(), [UNIX commands] cut, sort, uniq |
Problem 18 | Lambda expressions |
Problem 19 | List comprehensions, itertools.groupby(), list.sort() (see the sketch after this table) |
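For Problem 19, itertools.groupby() ends up playing the role of `uniq -c`. A rough sketch, assuming a one-column input file named col1.txt:

```python
# A rough Python analogue of `sort | uniq -c | sort -r`:
# itertools.groupby() only groups adjacent equal items, so the data
# must be sorted first. "col1.txt" is a placeholder file name.
from itertools import groupby

with open("col1.txt") as f:
    names = sorted(line.rstrip("\n") for line in f)

counts = [(len(list(group)), key) for key, group in groupby(names)]
for count, name in sorted(counts, reverse=True):
    print(count, name)
```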
By applying regular expressions to the markup in Wikipedia pages, various pieces of information and knowledge can be extracted.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 20 | JSON manipulation, gzip.open(), json.loads() |
Problem 21 | Regular expressions, raw string notation, raise, re.compile(), re.regex.findall() |
Problem 22 | [Regular expressions] Greedy and non-greedy matching (see the sketch after this table) |
Problem 23 | [Regular expressions] Backreferences |
Problem 24 | |
Problem 25 | [Regular expressions] Positive lookahead, sorted() |
Problem 26 | re.regex.sub() |
Problem 27 | |
Problem 28 | |
Problem 29 | Use of web services, urllib.request.Request(), urllib.request.urlopen(), bytes.decode() |
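The greedy vs. non-greedy distinction from Problem 22 fits in a few lines. A minimal sketch on a made-up MediaWiki-style string:

```python
# Greedy vs. non-greedy matching: .* grabs as much as it can,
# .*? as little as it can.
import re

text = "[[Category:England|*]] and [[Category:Island nations]]"
print(re.findall(r"\[\[Category:(.*)\]\]", text))   # greedy: one long match
print(re.findall(r"\[\[Category:(.*?)\]\]", text))  # non-greedy: two matches
```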
Apply the morphological analyzer MeCab to Natsume Soseki's novel "I Am a Cat" and obtain statistics on the words in the novel.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 30 | conda, pip, apt, [MeCab] Installation, how to use, morphological analysis, generators, yield |
Problem 31 | [Morphological analysis] Surface form |
Problem 32 | [Morphological analysis] Base form, list comprehensions |
Problem 33 | [Morphological analysis] Nouns of サ変 (suru-verb) connection, list comprehension over a double loop |
Problem 34 | |
Problem 35 | [Morphological analysis] Concatenation of nouns |
Problem 36 | collections.Counter, collections.Counter.update() (see the sketch after this table) |
Problem 37 | [matplotlib] Installation, bar graphs, Japanese text display, axis ranges, grid display |
Problem 38 | [matplotlib] Histograms |
Problem 39 | [matplotlib] Scatter plots, Zipf's law |
Apply the dependency analyzer CaboCha to "I Am a Cat" and get hands-on experience with dependency trees and syntactic analysis.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 40 | [CaboCha] Installation, how to use, __str__(), __repr__(), repr() (see the sketch after this table) |
Problem 41 | [Dependency analysis] Phrases (chunks) and dependencies |
Problem 42 | |
Problem 43 | |
Problem 44 | [pydot-ng] Installation, directed graphs, and how to check the source of modules written in Python |
Problem 45 | [Dependency analysis] Case, [UNIX commands] grep |
Problem 46 | [Dependency analysis] Case frames / case grammar |
Problem 47 | [Dependency analysis] Functional verbs |
Problem 48 | [Dependency analysis] Paths from nouns to the root |
Problem 49 | [Dependency analysis] Dependency paths between nouns |
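The __str__()/__repr__() point from Problem 40 can be shown with a toy class. The Morph name and the fields below are only an assumed layout for illustration:

```python
# A minimal __str__ vs. __repr__ sketch. The Morph class and its
# fields are assumptions for illustration only.
class Morph:
    def __init__(self, surface, base, pos, pos1):
        self.surface = surface
        self.base = base
        self.pos = pos
        self.pos1 = pos1

    def __repr__(self):
        # repr() and the interactive prompt use this
        return "Morph({!r}, {!r}, {!r}, {!r})".format(
            self.surface, self.base, self.pos, self.pos1)

    def __str__(self):
        # print() and str() use this
        return self.surface

m = Morph("猫", "猫", "名詞", "一般")
print(m)        # 猫
print(repr(m))  # Morph('猫', '猫', '名詞', '一般')
```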
Get an overview of various basic natural language processing technologies through English text processing with Stanford CoreNLP.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 50 | generator |
Problem 51 | |
Problem 52 | Stems, stemming, how to use the Snowball stemmer |
Problem 53 | [Stanford CoreNLP] Installation, how to use, subprocess.run(), XML parsing, xml.etree.ElementTree.ElementTree.parse(), xml.etree.ElementTree.ElementTree.iter() (see the sketch after this table) |
Problem 54 | [Stanford CoreNLP] Part of speech, lemma, XML parsing, xml.etree.ElementTree.Element.findtext() |
Problem 55 | [Stanford CoreNLP] Named entities, XPath, xml.etree.ElementTree.Element.iterfind() |
Problem 56 | [Stanford CoreNLP] Coreference |
Problem 57 | [Stanford CoreNLP] Dependencies, [pydot-ng] Directed graphs |
Problem 58 | [Stanford CoreNLP] Subject, predicate, object |
Problem 59 | [Stanford CoreNLP] Phrase structure analysis, S-expressions, recursive calls, sys.setrecursionlimit(), threading.stack_size() |
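Problems 53 to 55 revolve around parsing the analyzer's XML output with xml.etree.ElementTree. A minimal sketch, where the file name nlp.txt.xml and the tokens/token/word/lemma tag layout are assumptions based on typical Stanford CoreNLP output:

```python
# Parse an XML file and pull out word/lemma pairs with an
# XPath-style search.
import xml.etree.ElementTree as ET

tree = ET.parse("nlp.txt.xml")            # whole document as an ElementTree
root = tree.getroot()
for token in root.iterfind(".//tokens/token"):
    print(token.findtext("word"), token.findtext("lemma"))
```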
Learn how to build and search databases using a key-value store (KVS) and NoSQL. You will also develop a demo system using CGI.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 60 | [LevelDB] Installation, how to use, str.encode(), bytes.decode() |
Problem 61 | [LevelDB] Search, Unicode code points, ord() |
Problem 62 | [LevelDB] Enumeration |
Problem 63 | JSON manipulation, json.dumps() |
Problem 64 | [MongoDB] Installation, how to use, interactive shell, bulk insert, indexes |
Problem 65 | [MongoDB] Search, ObjectId and handling types not covered by the JSON format conversion table |
Problem 66 | |
Problem 67 | |
Problem 68 | [MongoDB] sort |
Problem 69 | Web server, CGI, HTML escaping, html.escape(), html.unescape(), [MongoDB] Searching with multiple conditions (see the sketch after this table) |
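For the HTML escaping in Problem 69, the standard library already covers it. A minimal html.escape()/html.unescape() sketch:

```python
# html.escape() replaces characters that are special in HTML with
# entities, and html.unescape() reverses the substitution.
import html

raw = '<b>"Tom & Jerry"</b>'
escaped = html.escape(raw)
print(escaped)                 # &lt;b&gt;&quot;Tom &amp; Jerry&quot;&lt;/b&gt;
print(html.unescape(escaped))  # back to the original string
```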
Build a sentiment analyzer (positive/negative classifier) with machine learning. You will also learn how to evaluate the method.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 70 | [Machine learning] Automatic classification, labels, supervised vs. unsupervised learning |
Problem 71 | Stop words, assertions, assert |
Problem 72 | [Machine learning] Features |
Problem 73 | [NumPy] Installation, matrix operations, [Machine learning] Logistic regression, vectorization, hypothesis function, sigmoid function, objective function, steepest descent (gradient descent), learning rate and number of iterations |
Problem 74 | [Machine learning] Prediction |
Problem 75 | [Machine learning] Feature weights, [NumPy] Getting the indices of a sorted result |
Problem 76 | |
Problem 77 | Accuracy, precision, recall, F1 score (see the sketch after this table) |
Problem 78 | [Machine learning] 5-fold cross-validation |
Problem 79 | [matplotlib] Line graphs |
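The evaluation metrics from Problem 77 reduce to a few counts. A minimal sketch with made-up gold and predicted labels (+1 = positive, -1 = negative):

```python
# Accuracy / precision / recall / F1 from hypothetical labels.
gold = [+1, +1, -1, -1, +1, -1, +1, -1]
pred = [+1, -1, -1, -1, +1, +1, +1, -1]

tp = sum(1 for g, p in zip(gold, pred) if g == +1 and p == +1)
fp = sum(1 for g, p in zip(gold, pred) if g == -1 and p == +1)
fn = sum(1 for g, p in zip(gold, pred) if g == +1 and p == -1)

accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```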
Build a word-context co-occurrence matrix from a large corpus and learn vectors that represent word meanings. The word vectors are then used to compute word similarities and analogies.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 80 | Word vectorization, bz2.open() |
Problem 81 | [Word vector] Dealing with compound words |
Problem 82 | |
Problem 83 | Object serialization, pickle.dump(), pickle.load() |
Problem 84 | [Word vector] Word-context matrix, PPMI (positive pointwise mutual information), [SciPy] Installation, handling sparse matrices, serialization, collections.OrderedDict |
Problem 85 | Principal component analysis (PCA), [scikit-learn] Installation, PCA |
Problem 86 | |
Problem 87 | Cosine similarity (see the sketch after this table) |
Problem 88 | |
Problem 89 | Additive composition, analogy |
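Cosine similarity (Problem 87) is just a dot product divided by the two vector norms. A minimal NumPy sketch:

```python
# Cosine similarity of two vectors with NumPy.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(u, v))  # ~1.0, since v is parallel to u
```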
Use word2vec to learn vectors that represent word meanings and evaluate them against gold-standard data. You will also experience clustering and vector visualization.
Link to post | Main things I learned (including from the comments) |
---|---|
Problem 90 | [word2vec] Installation, how to use |
Problem 91 | |
Problem 92 | |
Problem 93 | |
Problem 94 | |
Problem 95 | Spearman's rank correlation coefficient, dynamically adding attributes to instances, exponentiation with ** |
Problem 96 | |
Problem 97 | Classification, clustering, K-Means, [scikit-learn] K-Means (see the sketch after this table) |
Problem 98 | Hierarchical clustering, Ward's method, dendrograms, [SciPy] Ward method, dendrogram |
Problem 99 | t-SNE, [scikit-learn] t-SNE, [matplotlib] Labeled scatter plots |
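For the K-Means clustering in Problem 97, scikit-learn's KMeans does the heavy lifting. A minimal sketch on random stand-in data instead of real word vectors:

```python
# Cluster the rows of a matrix into 5 groups with scikit-learn K-Means.
# X is random data standing in for 300-dimensional word vectors.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 300)            # 100 dummy "word vectors"
kmeans = KMeans(n_clusters=5, random_state=0)
labels = kmeans.fit_predict(X)          # cluster index for each row
print(labels[:10])
```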
It took eight months, but I managed to make it through all 100 knocks. I am very grateful to Dr. Okazaki for publishing such a wonderful set of problems together with the data corpus.
I was also really encouraged by the comments, edit requests, likes, stocks, follows, and mentions on blogs and social media. Thanks to everyone, I was able to keep going to the end. Thank you very much.
I hope these posts will be helpful to those who take on the challenge after me.