As material for an in-house presentation, I built a web application with Watson Developer Cloud. This post is a summary of that work and a memo for myself.
Type or speak a question, and Watson recommends restaurants from among those within about 1 km of a fixed location. You can ask things like "I want a karaage set meal" or "A fish-focused izakaya with a budget of up to 4,000 yen".
Shops are displayed as a list, and more ★ means a stronger recommendation. You can pin the shops you are interested in to the list. You then decide where to go by checking each shop's page on the Gurunavi website and its distance and direction from your current location.
After visiting a shop (even one you went to long ago), you can post a comment with your impressions. The comments are analyzed and learned from to improve search results. For example, a shop described as "very delicious" becomes one to recommend to others. A shop where "the clerks were quarreling" becomes one not to recommend.
At first I planned to build everything on Bluemix. However, ClearDB's free tier was too small (5 MB), and this was also a good opportunity to set up a development environment on my personal PC, so part of the work is done locally.
Client-side JavaScript calls REST APIs built on Node-RED to display search results and to evaluate user comments.
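To make the Node-RED side more concrete, here is a minimal sketch of logic that a function node could use to forward a search to Retrieve and Rank's `fcselect` endpoint (a Solr search re-ordered by a trained ranker) through a following HTTP request node. The endpoint path reflects my understanding of the R&R API. The cluster id, collection name, ranker id, field list and the `msg.payload` shape are hypothetical placeholders, not this project's actual flow.

```javascript
// Hypothetical Node-RED function-node logic: build an R&R "fcselect" URL.
// cfg.clusterId, cfg.collection and cfg.rankerId are placeholders.
var RAR_BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1";

function buildFcselectUrl(cfg, question, filterQuery) {
  var params = {
    ranker_id: cfg.rankerId,
    q: question,
    wt: "json",
    // Fields returned to the client; the names follow schema.xml
    fl: "id,shop_id,shop_name,shop_url,latitude,longitude,budget"
  };
  if (filterQuery) params.fq = filterQuery; // optional Solr filter, e.g. a budget range
  // encodeURIComponent avoids relying on URLSearchParams inside the function-node sandbox
  var qs = Object.keys(params).map(function (k) {
    return k + "=" + encodeURIComponent(params[k]);
  }).join("&");
  return RAR_BASE + "/solr_clusters/" + cfg.clusterId + "/solr/" +
    cfg.collection + "/fcselect?" + qs;
}

// Function-node body: the HTTP request node after it takes msg.url and msg.method
// (assuming its URL field is empty and its method is "- set by msg.method -").
function handleSearch(msg, cfg) {
  msg.url = buildFcselectUrl(cfg, msg.payload.q, msg.payload.fq);
  msg.method = "GET";
  return msg;
}
```

Building the URL in a function node keeps the HTTP request node generic, so the same node can serve searches with or without a filter.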
Retrieve and Rank
- I had used it only the other day and got overconfident, so I accidentally developed against a high-availability cluster (that is, one created with a cluster size specified).
- I noticed only after a few days of work and hurriedly recreated it, but I had already been charged about 8,000 yen.
- Create the cluster as follows (leave cluster_size empty):
curl -k -X POST -u "{username}:{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters" -d "{\"cluster_size\":\"\",\"cluster_name\":\"WatsonRestaurantCluster\"}"
- The collection is configured as follows. I mostly copied what others had done before me. While writing this article, though, I checked the tutorial (https://www.ibm.com/watson/developercloud/doc/retrieve-rank/configure.shtml) and found that it shows a different way of writing this, so mine may be wrong...
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="shop_id" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="vote_id" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="shop_name" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="shop_name_kana" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="menu_name" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="menu_name_kana" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="latitude" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="longitude" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="shop_url" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="image_url" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="pr_text" type="string" indexed="false" stored="true" required="true" multiValued="false" />
<field name="shop_text" type="watson_text_ja" indexed="false" stored="true" required="true" multiValued="false" />
<field name="budget" type="int" indexed="true" stored="true" required="true" multiValued="false" />
schema.xml
<fieldType name="watson_text_ja" indexed="true" stored="true" class="com.ibm.watson.hector.plugins.fieldtype.WatsonTextField">
<analyzer type="index">
<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.JapaneseTokenizerFactory" userDictionary="lang/userdict_ja.txt"/>
<filter class="solr.JapaneseBaseFormFilterFactory"/>
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.JapaneseTokenizerFactory" userDictionary="lang/userdict_ja.txt"/>
<filter class="solr.JapaneseBaseFormFilterFactory"/>
<filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
<filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
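Both analyzers above include `solr.CJKWidthFilterFactory`. To show what that step normalizes, here is a sketch of its first half: folding fullwidth ASCII variants (U+FF01 to U+FF5E) to basic Latin by subtracting 0xFEE0. The real filter also folds halfwidth katakana to fullwidth, which this sketch does not reproduce. The budget parser later in this post applies the same shift, but only to digits.

```javascript
// Sketch of CJKWidthFilter's fullwidth-ASCII folding only (no halfwidth katakana handling).
function foldFullwidthAscii(text) {
  return text.replace(/[\uFF01-\uFF5E]/g, function (c) {
    // U+FF01..U+FF5E map one-to-one onto U+0021..U+007E
    return String.fromCharCode(c.charCodeAt(0) - 0xfee0);
  });
}
```

Without this folding, "４０００円" and "4000円" would be indexed as different tokens.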
- About 1,500 documents are generated from the Gurunavi API.
rar_documents.json
{
"id" : "5497472",
"shop_id" : "5497472",
"vote_id" : "",
"shop_name" : "Tenkaippin Gotanda store",
"shop_name_kana" : "Tenkaippinn Gotandaten",
"menu_name" : "",
"menu_name_kana" : "",
"latitude" : "35.624377",
"longitude" : "139.723394",
"shop_url" : "http://r.gnavi.co.jp/b5tzzw2g0000/",
"image_url" : "",
"pr_text" : "It is a soup made by boiling chicken and several kinds of ingredients over time. It is full of collagen that is good for beauty and health.",
"shop_text" : "Tenkaippin Gotanda store. Tenkaippinn Gotandaten. Ramen noodle dishes and others. Rich in collagen, which is good for beauty and health, it is a soup that you can never taste anywhere else. In addition, a set meal that includes half fried rice and Chinese soba. Gyoza set meal that includes gyoza, half rice, and Chinese soba. We also have a wide variety of menus such as half-fried rice, gyoza, and service set meals that include Chinese soba.",
"budget" : -1
}
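The following is a minimal sketch of how one shop record could be turned into a document like the one above. The input field names (`id`, `name`, `name_kana`, `category`, `latitude`, `longitude`, `url`, `image_url.shop_image1`, `pr.pr_long`, `budget`) are assumptions about the Gurunavi response and are not taken from this post. The author did this step in Java, and only the output field names follow schema.xml. The `shop_text` built here is simpler than the sample's.

```javascript
// Hypothetical converter: Gurunavi shop record (assumed shape) -> R&R document.
function str(v) {
  return v == null ? "" : String(v);
}

function toRarDocument(shop) {
  var pr = (shop.pr && shop.pr.pr_long) || "";
  var budget = Number.parseInt(shop.budget, 10);
  return {
    id: str(shop.id),
    shop_id: str(shop.id),
    vote_id: "",
    shop_name: str(shop.name),
    shop_name_kana: str(shop.name_kana),
    menu_name: "",
    menu_name_kana: "",
    latitude: str(shop.latitude),
    longitude: str(shop.longitude),
    shop_url: str(shop.url),
    image_url: str(shop.image_url && shop.image_url.shop_image1),
    pr_text: pr,
    // Searchable text: name, reading, category and PR text joined into one field
    shop_text: [shop.name, shop.name_kana, shop.category, pr].filter(Boolean).join(". "),
    // -1 marks an unknown budget, as in the sample document
    budget: Number.isFinite(budget) ? budget : -1
  };
}
```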
- About 750 training rows are created for the initial rollout, before users of the system have generated any training data.
rar_training.csv
"%E3%83%AF%E3%83%83%E3%83%91%E3%83%BC","6364602.681550","2"
"%E3%82%BF%E3%82%A4%E3%82%AC%E3%83%91%E3%82%AA","7255599","1","7255599.4618610","3"
"%E5%A1%A9%E3%83%AC%E3%83%A2%E3%83%B3%E3%82%AC%E3%83%91%E3%82%AA","e584801.1192601","2","6408790.4601796","2","geyc200.4614278","4","g044108.4609358","4","6085706.1451291","1"
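Each row above appears to be a percent-encoded question followed by (document id, relevance) pairs. The sketch below writes rows in that shape. It also applies a constraint stated later in this post: train.py raises an error for documents that Solr does not return for the question, so unreturned documents are dropped first. The function names and the `{ id, relevance }` input shape are hypothetical.

```javascript
// Hypothetical helpers for building R&R ranker training rows.

// Keep only judged documents that Solr actually returned for this question.
function keepReturned(judgments, returnedIds) {
  const returned = new Set(returnedIds.map(String));
  return judgments.filter((j) => returned.has(String(j.id)));
}

// One CSV row: encoded question, then id/relevance pairs, every cell quoted.
function toTrainingRow(question, judgments) {
  // encodeURIComponent also turns commas in the question into %2C
  const cells = [encodeURIComponent(question)];
  for (const { id, relevance } of judgments) {
    cells.push(String(id), String(relevance));
  }
  return cells.map((c) => '"' + c.replace(/"/g, '""') + '"').join(",");
}
```

For example, `toTrainingRow("ワッパー", [{ id: "6364602.681550", relevance: 2 }])` reproduces the first sample row.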
- I generated the dictionary with Kuromoji neologd. It recognizes far more words than plain Kuromoji, and I was impressed that it is available for free.
- The Gurunavi API returns shop names together with their readings, so I use those as well.
userdict_ja.txt
Tenkaippin,Tenkaippin,Tenkaippin,Custom noun
Chicken taste bird,Chicken taste bird,Toridori Midori,Custom noun
Fishmaker Gotanda store,Fishmaker Gotanda store,Wosho Gotandaten,Custom noun
〆 Mackerel,〆 Mackerel,〆 Mackerel,Custom noun
Rafute,Rafute,Rafute,Custom noun
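Since shop names and readings come from the Gurunavi API, the dictionary lines can be generated mechanically. Here is a sketch, assuming each shop name should stay a single token. The columns are surface form, segmentation, reading and part-of-speech tag (Kuromoji's user-dictionary format). The tag "カスタム名詞" is the label that appears translated as "Custom noun" above. The helper name is hypothetical.

```javascript
// Hypothetical helper: one Kuromoji user-dictionary line per shop name.
function toUserDictLine(name, reading) {
  // Commas would break the CSV columns; spaces (including U+3000) would split segments
  const clean = (s) => s.replace(/[,\s]+/g, "");
  const surface = clean(name);
  // Single token: the segmentation column repeats the surface form
  return [surface, surface, clean(reading), "カスタム名詞"].join(",");
}
```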
synonyms.txt
organic => medicinal food, organic food, vegetarian food, organic
creative, creative cuisine => creative Japanese cuisine, creative cuisine, fusion cuisine
Italian => Italian, Italian cuisine, pasta, pizza
French => French, French cuisine, bistro
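These lines are explicit mappings: a token that matches any term on the left of `=>` is replaced by all the terms on the right. The toy model below shows only that rule. It handles single tokens and lowercases everything. The real SynonymFilterFactory also handles multi-word phrases and token positions. The function names are hypothetical.

```javascript
// Toy model of an explicit synonym mapping "a, b => x, y" (single tokens only).
function parseMapping(line) {
  const [lhs, rhs] = line.split("=>");
  if (rhs === undefined) throw new Error("only explicit '=>' mappings are handled");
  const split = (s) => s.split(",").map((t) => t.trim().toLowerCase()).filter(Boolean);
  return { from: split(lhs), to: split(rhs) };
}

function applyMappings(tokens, mappings) {
  return tokens.flatMap((tok) => {
    const m = mappings.find((mp) => mp.from.includes(tok.toLowerCase()));
    return m ? m.to : [tok]; // replace the token, or keep it unchanged
  });
}
```

With the Italian line above, a query token "Italian" would also match documents mentioning pasta or pizza.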
Natural Language Classifier
- It was much easier to use than R&R (which is not to say it always classifies correctly).
- About 1,600 training examples in total are generated for two classes, positive and negative.
- My impression was that R&R did not handle negation well. For example, a search for "not delicious" also matches "delicious". NLC, in contrast, classified even fairly complicated expressions correctly (as long as the training data covered them), which I found interesting.
nlc_training.csv
"Satisfied with the stomach","yummy"
"My son's favorite food","yummy"
"The dignity of the fresh craftsmen is also pleasant","yummy"
"Big tummy","yummy"
"Nostalgic taste","yummy"
"It was a terrible store","yacky"
"The freshness of the sashimi was bad","yacky"
"The quality is poor. Meh","yacky"
"Half-hearted, with nothing good about it","yacky"
"Not tasty. Cheesy","yacky"
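The training file is a plain two-column CSV (text, class). User comments can contain quotes or commas, so here is a sketch of writing such rows safely. It also includes a per-class count for checking the positive/negative balance before uploading. Both helpers and the `{ text, label }` input shape are hypothetical.

```javascript
// Hypothetical helpers for producing NLC training CSV from labeled comments.
function toNlcCsv(examples) {
  // Quote every cell and double embedded quotes (standard CSV escaping)
  const quote = (s) => '"' + String(s).replace(/"/g, '""') + '"';
  return examples.map(({ text, label }) => quote(text) + "," + quote(label)).join("\n");
}

function countByClass(examples) {
  const counts = {};
  for (const { label } of examples) counts[label] = (counts[label] || 0) + 1;
  return counts;
}
```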
- Sorting out fonts and icons, and getting to grips with Bootstrap and the Google Maps API (which I adopted out of curiosity), took more time than R&R and NLC.
- Searches are narrowed down by budget: the amount is extracted from the search text and applied with Solr's filter query.
Budget extraction
function getBudgets(query) {
  var B_MIN = 0;      // lowest budget accepted, in yen
  var B_MAX = 100000; // highest budget accepted, in yen
  var B_RANGE = 500;  // margin used when only an amount is given ("around N yen")
  var budgets = null;
  // Convert full-width digits (０ to ９) to ASCII digits by shifting the code point by 0xFEE0
  query = query.replace(/[０-９]/g, function(s) {
    return String.fromCharCode(s.charCodeAt(0) - 0xFEE0);
  });
  // Amounts followed by 円 (yen), plus the condition words that follow an amount in Japanese:
  // から/以上 ("from"/"or more") and まで/以内/以下/未満/足らず ("up to"/"within"/"or less"/"less than"/"just under")
  var matches = query.match(/\d+(?=円)|(から|以上)|(まで|以内|以下|未満|足らず)/g);
  if (matches) {
    // Keep only the numeric matches inside the accepted range
    var yens = matches.filter(function(y) {
      return isFinite(y) && y > B_MIN && y < B_MAX;
    });
    if (yens.length == 1) {
      var yen = Number(yens[0]);
      // The match right after the amount (undefined if there is none) gives the direction
      var condition = matches[matches.indexOf(yens[0]) + 1];
      if (condition) {
        if (/から|以上/.test(condition)) {
          budgets = { budget_min: yen, budget_max: B_MAX };  // "N yen or more"
        } else {
          budgets = { budget_min: B_MIN, budget_max: yen };  // "up to N yen"
        }
      } else {
        budgets = { budget_min: yen - B_RANGE, budget_max: yen + B_RANGE };  // "around N yen"
      }
    } else if (yens.length >= 2) {
      // Two amounts: treat them as the ends of a range, in either order
      var a = Number(yens[0]);
      var b = Number(yens[1]);
      budgets = { budget_min: Math.min(a, b), budget_max: Math.max(a, b) };
    }
  }
  if (budgets) {
    // Clamp to the accepted range
    budgets.budget_min = Math.max(B_MIN, budgets.budget_min);
    budgets.budget_max = Math.min(B_MAX, budgets.budget_max);
  }
  return budgets;
}
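The object returned by getBudgets() still has to become a Solr filter. Below is a sketch using Solr's standard range syntax in an `fq` parameter. The helper name and the `includeUnknown` option are hypothetical. getBudgets() never returns a minimum below 0, so documents stored with `budget` -1 (unknown, as in the sample document) would always be filtered out. The option keeps them with an extra clause.

```javascript
// Hypothetical helper: getBudgets() result -> Solr filter query string.
function toBudgetFilter(budgets, includeUnknown) {
  if (!budgets) return null; // no amount in the question: do not filter
  var range = "budget:[" + budgets.budget_min + " TO " + budgets.budget_max + "]";
  // Optionally also keep shops whose budget is unknown (-1)
  return includeUnknown ? "(" + range + " OR budget:[-1 TO -1])" : range;
}
```

Whether shops with an unknown budget should appear for "up to 4,000 yen" is a product decision. Excluding them is stricter, while including them avoids hiding shops that simply lack data.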
- On the Node-RED side: ~~R&R only worked through an HTTP request node, and NLC only with its dedicated node...~~ I had originally entered the credentials in the node's form. Checking again now, perhaps the dedicated node works as-is once the service connection is configured on the Bluemix side?
- I rarely touch Java at work, but it was fun to do all kinds of things with it: REST calls, JSON → DB, and DB → JSON or CSV.
- Maven in Eclipse throws errors every time I add or remove a dependency, then suddenly recovers for no apparent reason. I never figured out why.
- I feel that about 3/5 of the stress I experienced during this work came from Maven...
- When generating training data for the ranker, train.py raises an error if you assign a relevance score to a document that Solr does not return for the given question. So the Java code first queries Solr and chooses the documents to score only from what it returns.
- The whole thing took about 4 to 5 person-days, done over the year-end and New Year holidays.
- Ignoring the free tiers, maintenance would cost roughly 5,000 yen/month (1 ranker instance at 1,000 yen + 1 NLC instance at 2,000 yen + Node-RED at up to about 1,500 yen + learning API calls).
- With the free tiers, and if this is the only app you run, the cost can (probably) be brought down to 0 yen.
- For production use you would presumably want a high-availability cluster, which costs this much: [Using a high availability cluster ...](https://www.google.co.jp/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=(0.3+*+24+*+30)%E3%83%89%E3%83%AB+%E6%97%A5%E6%9C%AC%E5%86%86)
- I still don't really understand the essential learning part (how the Ranker and NLC work, and how to make them better).
- For R&R in particular, the search results did not look convincing. I suspect my Ranker training data is poor...
- I'd like to improve it soon if I can, while keeping an eye on the charges.
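Here is the cost arithmetic spelled out. The monthly estimate just adds the listed items (Node-RED taken at its upper figure). The high-availability figure reads the linked search query (0.3 × 24 × 30, in dollars) as 0.3 dollars per hour over 24 hours × 30 days.

```latex
\begin{aligned}
\text{Ranker} + \text{NLC} + \text{Node-RED} &\approx 1000 + 2000 + 1500 = 4500\ \text{yen/month} \quad (\text{plus learning API calls}) \\
\text{HA cluster} &= 0.3 \times 24 \times 30 = 216\ \text{dollars/month}
\end{aligned}
```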
I have published the code on GitHub.