[PYTHON] Can artificial intelligence predict the price of Yu-Gi-Oh! Cards?

"The card predicted"

Overview

-- Predicted the prices of Yu-Gi-Oh! monster cards with machine learning
-- 58% of predictions fell within an error of 200 yen
-- Importance for prediction: market information > card information > deck information, in that order

Preface

Recently, Yu-Gi-Oh! has suddenly become a boom among the people around me. Watching friends go to card shops and buy expensive cards, I started wondering: "Why are some cards so expensive?" "Are strong cards expensive?" "How is a card's price determined?" If a card can be evaluated purely by its strength, then it should be possible to predict its price from the card information alone. And if so, can machine learning do it? So:

** Can artificial intelligence predict the price of Yu-Gi-Oh! cards? **

Advance preparation

Collecting card information itself is easy; the hard part is narrowing it down to cards that are actually being played these days. For that, I relied on the site "Kiriburo!" and collected card information mainly from the recent deck recipes posted there. This time I referred to the following pages:

- [April 2017 environment] Yu-Gi-Oh! tournament championship/winning deck recipe summary [new restrictions, new rules]
- [Yu-Gi-Oh! deck recipes] Tournament winner/winning deck recipe summary [January-March 2017]
- [Shinryu] Deck recipe: tournament winner/winning deck summary
- [Yu-Gi-Oh! Dinosaur Tribe] Deck recipes: 5 tournament winning decks compiled!
- [Kozmo] Deck recipe: tournament winner/winning deck summary
- [Junishi] Deck recipe: tournament winner/winning deck summary
- [Inferno] Deck recipe: tournament winner/winning deck summary

I got the deck recipes from these seven URLs. The scraping script looks like this.

yugioh.py


```python
# coding: utf-8
from pyquery import PyQuery as pq
import pandas as pd
import time
import os.path


def extract_card(url):
    # The price table lives on the "&Price=1" variant of the deck page.
    price_df = pd.DataFrame(
        extract_card_price(url + "&Price=1"),
        columns=[
            "name",
            "price",
            "shop_num"
        ]
    )

    detail_df = pd.DataFrame(
        extract_card_detail(url),
        columns=[
            "num",
            "name",
            "limit",
            "kind",
            "rarity", "variety", "race",
            "attack",
            "defense"
        ])

    return pd.merge(detail_df, price_df, left_on="name", right_on="name")


def extract_card_detail(url):
    # The third <table> on the deck page holds the card details.
    for table_tag in pq(url).find("table").eq(2):
        for tr_tag in pq(table_tag).find("tr"):
            td_tag = pq(tr_tag).find("td")
            # Skip header/spacer rows that span multiple columns.
            if td_tag.eq(4).attr("colspan") is None:
                record = pq(td_tag)
                name = record.eq(2).find("a").eq(0).text().strip()

                if len(name) != 0:
                    # Exclude monsters whose attack or defense is "?".
                    if record.eq(7).text() == "?" or record.eq(8).text() == "?":
                        continue

                    num = int(record.eq(0).text())
                    limit = record.eq(1).text()
                    # An empty limit column means the card is unrestricted.
                    limit = "Usually" if len(limit) == 0 else limit
                    kind = record.eq(3).text()
                    rarity = int(record.eq(4).text())
                    variety = record.eq(5).text()
                    race = record.eq(6).text()
                    attack = int(record.eq(7).text())
                    defense = int(record.eq(8).text())

                    yield num, name, limit, kind, rarity, variety, race, attack, defense


def extract_card_price(url):
    # The fourth <table> on the price page holds name/price/shop counts.
    for table_tag in pq(url).find("table").eq(3):
        for tr_tag in pq(table_tag).find("tr"):
            td_tag = pq(tr_tag).find("td")

            if td_tag.eq(7).text().isdigit():
                record = pq(td_tag)
                name = record.eq(2).text().strip()
                # Keep only the digits (drops currency symbols and commas).
                price = int("".join(filter(str.isdigit, record.eq(6).text())))
                shop_num = int(record.eq(7).text())

                yield name, price, shop_num


def dumpData(fulldata_filename):
    entry_points = [
        "http://yugioh-resaler.com/2017/03/20/post-15584/",
        "http://www.yugioh-resaler.com/2017/02/16/post-13395/",
        "http://www.yugioh-resaler.com/2017/02/16/post-13502/",
        "http://www.yugioh-resaler.com/2017/02/27/post-14493/",
        "http://www.yugioh-resaler.com/2017/02/15/post-13443/",
        "http://www.yugioh-resaler.com/2017/02/16/post-13514/",
        "http://www.yugioh-resaler.com/2017/02/15/post-13399/"
    ]

    stats = dict()    # card name -> number of decks adopting it
    history = set()   # deck files already processed
    all_df = None

    for entry_point in entry_points:
        print(entry_point)
        dom = pq(entry_point)

        count = 0
        for a_tag in dom.find("a"):
            link_url = a_tag.get("href")

            # Extract only deck URLs.
            if link_url and "https://ocg.xpg.jp/deck/deck.fcgi" in link_url:
                filename = "data_%s" % link_url.split("=")[-1]

                if filename in history:
                    print("skip deck")
                    continue
                else:
                    history.add(filename)

                # Cache each deck locally so the server isn't hit twice.
                if os.path.exists(filename):
                    print("filename:" + filename + " read:" + link_url)
                    df = pd.read_pickle(filename)
                else:
                    print("filename:" + filename + " write:" + link_url)
                    df = extract_card(link_url)
                    df.to_pickle(filename)
                    time.sleep(1)

                print(df)
                del df["num"]

                # Count how many decks each card appears in.
                for idx, row in df.iterrows():
                    name = row["name"]
                    if name in stats:
                        stats[name] += 1
                    else:
                        stats[name] = 1

                if all_df is None:
                    all_df = df
                else:
                    all_df = pd.concat([all_df, df]).drop_duplicates()

                print(len(all_df))
                print(entry_point)

                count += 1

    hist_df = pd.DataFrame(
        list(stats.items()),
        columns=[
            "name",
            "hist"
        ]
    )

    return pd.merge(all_df, hist_df, left_on="name", right_on="name")
```

The data is basically passed around as pandas DataFrames. For scraping I used a library called pyquery. Until now I had always used BeautifulSoup, but pyquery also seems to be popular, so I gave it a try. The selling point is that "pyquery lets you scrape the way jQuery does"; I have never used jQuery myself, but pyquery was easy enough to use (a toy example follows below). Each deck is also pickled locally so as not to put a load on the server. Note that this time I look only at ** monster cards **, because it is hard to build one model that handles spells, traps, and monsters at the same time. Monsters whose attack or defense is "?" are excluded for the same reason.
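As a taste of that jQuery-like style, here is a toy pyquery snippet; the HTML string is made up for illustration.

```python
from pyquery import PyQuery as pq

# Toy example of pyquery's jQuery-like selector chaining (HTML is made up).
d = pq("<table><tr><td>Blue-Eyes White Dragon</td><td>142</td></tr></table>")
print(d("tr td").eq(0).text())  # -> Blue-Eyes White Dragon
```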

As a result of the scraping, the following data was collected:

-- Number of decks: 274
-- Number of cards: 506

** "I collected card information" **

Collected data

** Card information (Basic) **

-- Level
-- Attack
-- Defense
-- Attribute
-- Race
-- Monster type (Normal/Effect/Fusion/Synchro/Xyz, etc.)
-- Limit status (Forbidden/Limited/Unlimited)

This is very ordinary card information, though in practice some adjustments are made. For example, "attribute" is encoded as a one-hot vector: a Dark-attribute card is represented as is_Dark = 1, a Light-attribute card as is_Light = 1, and so on. The same applies to monster types: is_Xyz = 1 for an Xyz monster, is_Fusion = 1 for a Fusion monster. Card effects are not considered, because they are hard to represent as features.
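As a sketch of what this encoding might look like in pandas (the sample rows and column values here are illustrative, not the actual dataset):

```python
import pandas as pd

# Hypothetical sample of scraped card rows (names and values are made up).
cards = pd.DataFrame({
    "name": ["Blue-Eyes White Dragon", "Moonlight Tiger"],
    "attribute": ["Light", "Dark"],
    "variety": ["Normal", "Xyz"],
})

# One-hot encode attribute and monster type, yielding is_Light, is_Xyz, etc.
encoded = pd.get_dummies(cards, columns=["attribute", "variety"], prefix="is")
print(encoded)
```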

** Market Information (Market) **

-- Number of adopting decks
-- Number of shops stocking the card

My initial idea, "predict the price from the card information", seemed to be impossible. The reason, I figured, is that a card's price does not depend on the card's strength; it is determined purely by the balance of supply and demand, the usual economics view. So I hypothesized that demand = the number of decks that adopt the card, and supply = the number of shops that stock it. In other words: cards in high demand (adopted by many decks) should be expensive, and cards in high supply (stocked by many shops) should be cheap.

** Deck Information (Deck) **

--Conditional probability

This feature comes from the hypothesis that cards often used in decks of a certain pattern are expensive. For example, to use "Blue-Eyes Ultimate Dragon" you need three copies of "Blue-Eyes White Dragon". In general, using card A may require card B, so if a strong card A is expensive, the card B it requires should become expensive too. This feature is meant to capture that relationship; here A is "Blue-Eyes Ultimate Dragon" and B is "Blue-Eyes White Dragon". Let me explain the concrete calculation with a simple example. Consider the following decks A and B:

Deck A: "Blue-Eyes White Dragon", "Blue-Eyes Ultimate Dragon", "Dragon's Treasure"
Deck B: "Blue-Eyes White Dragon", "Blue-Eyes Ultimate Dragon", "Rush"

We enumerate all card pairs across these decks and count how many decks contain each pair:

|  | Blue-Eyes White Dragon | Blue-Eyes Ultimate Dragon | Dragon's Treasure | Rush |
|---|---|---|---|---|
| Blue-Eyes White Dragon | 0 | 2 | 1 | 1 |
| Blue-Eyes Ultimate Dragon | 2 | 0 | 1 | 1 |
| Dragon's Treasure | 1 | 1 | 0 | 0 |
| Rush | 1 | 1 | 0 | 0 |

For example, the pair of "Blue-Eyes White Dragon" and "Blue-Eyes Ultimate Dragon" appears in both Deck A and Deck B, so its entry is 2. "Blue-Eyes White Dragon" and "Rush", on the other hand, co-occur only in Deck B, so that entry is 1. Finally, each row is normalized so that it sums to 1:

|  | Blue-Eyes White Dragon | Blue-Eyes Ultimate Dragon | Dragon's Treasure | Rush |
|---|---|---|---|---|
| Blue-Eyes White Dragon | 0.00 | 0.50 | 0.25 | 0.25 |
| Blue-Eyes Ultimate Dragon | 0.50 | 0.00 | 0.25 | 0.25 |
| Dragon's Treasure | 0.50 | 0.50 | 0.00 | 0.00 |
| Rush | 0.50 | 0.50 | 0.00 | 0.00 |

This matrix is used as the feature values. It is, in effect, a conditional probability: row i gives the probability of seeing each other card given that card i is in the deck.
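Here is a minimal sketch of that computation on the toy decks above; this is my own pandas rendering, not the article's original code.

```python
import pandas as pd

# The toy decks from the example above, as plain lists of card names.
decks = [
    ["Blue-Eyes White Dragon", "Blue-Eyes Ultimate Dragon", "Dragon's Treasure"],
    ["Blue-Eyes White Dragon", "Blue-Eyes Ultimate Dragon", "Rush"],
]

cards = sorted({card for deck in decks for card in deck})
cooc = pd.DataFrame(0, index=cards, columns=cards)

# For each pair of distinct cards, count how many decks contain both.
for deck in decks:
    for a in deck:
        for b in deck:
            if a != b:
                cooc.loc[a, b] += 1

# Normalize each row to sum to 1: row i gives P(other card | card i in deck).
cond_prob = cooc.div(cooc.sum(axis=1), axis=0)
print(cond_prob.round(2))
```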

Verification method

The main libraries used are pandas and scikit-learn, with numpy used here and there internally.

I won't go into much detail, but the machine learning method is Lasso regression. When I tried Linear, Ridge, and Lasso regression, Lasso and Ridge performed well, and I settled on Lasso because its coefficients are easy to interpret.

Also, the deck-information matrix above is not used as-is: it is reduced to 50 dimensions with SVD. There were simply too many explanatory variables, and when I actually ran the dimensionality reduction, the cumulative sum of explained_variance_ratio_ exceeded 90% at 50 dimensions, so that is the value I used.
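A sketch of that reduction with scikit-learn's TruncatedSVD, using a random matrix to stand in for the real ~500 x 500 conditional-probability matrix (the 4 x 4 toy above is too small for 50 components):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random stand-in for the real card-by-card conditional-probability matrix.
rng = np.random.default_rng(0)
cond_prob = rng.random((500, 500))
cond_prob /= cond_prob.sum(axis=1, keepdims=True)  # rows sum to 1

svd = TruncatedSVD(n_components=50, random_state=0)
deck_features = svd.fit_transform(cond_prob)

# The article reports this cumulative ratio passing 90% at 50 dimensions
# on the real data (it will not on this random stand-in).
print(svd.explained_variance_ratio_.sum())
```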

The model is evaluated with 3-fold cross-validation. For hyperparameters, the alpha parameter documented on scikit-learn's Lasso page is searched from 0.1 to 1.0 in 0.1 increments, and GridSearchCV also tries the normalize parameter as True or False. The model with the best R^2 score is selected. For the verification below, that best model is then fitted on all the training data and its predictions are examined.
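A sketch of that selection procedure, with dummy data standing in for the real features and prices. Note that the normalize flag searched in the article existed in 2017-era scikit-learn but has since been removed; scaling the features beforehand plays that role today.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Dummy stand-ins for the real feature matrix X and price vector y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))
y = rng.normal(loc=300, scale=100, size=500)

# alpha from 0.1 to 1.0 in 0.1 steps, 3-fold CV, scored by R^2.
param_grid = {"alpha": np.arange(0.1, 1.01, 0.1)}
search = GridSearchCV(Lasso(max_iter=10000), param_grid, cv=3, scoring="r2")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```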

Verification results

This time I ran three kinds of verification. What I call a "1-feature model" is a regression on just one of the three feature groups: card information (Basic), market information (Market), or deck information (Deck). A "2-feature model" combines two of the three groups, e.g. card information (Basic) + market information (Market). Finally, the "3-feature model" uses all of them.
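Besides R^2, the tables below report the percentage of cards whose predicted price lands within a given error band of the actual price. A minimal sketch of that metric (the helper name is my own):

```python
import numpy as np

def within_error_pct(y_true, y_pred, threshold):
    """Percentage of predictions whose absolute error is below `threshold` yen."""
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return 100.0 * np.mean(errors < threshold)

# e.g. within_error_pct(prices, model.predict(X), 200) -> "error < 200 yen (%)"
print(within_error_pct([100, 300, 900], [120, 350, 500], 200))  # -> 66.66...
```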

1-feature model

| Features | R^2 | Error < 50 yen (%) | Error < 100 yen (%) | Error < 150 yen (%) | Error < 200 yen (%) |
|---|---|---|---|---|---|
| Basic | 0.00782 | 9.09 | 20.75 | 34.78 | 51.19 |
| Market | 0.03645 | 11.46 | 23.72 | 36.96 | 51.78 |
| Deck | -0.03684 | 8.89 | 16.80 | 27.47 | 38.34 |

Three things can be seen from these results. First, every model has a low R^2; none of them fits very well. Second, the prediction was most accurate when using only market information (Market). Its R^2 is still only about 0.03, but 11% of cards were predicted within an error of 50 yen, roughly 2 points higher than with the other feature groups, so it does seem to contribute to the regression. Third, the model regressed on deck information (Deck) has the largest errors: its R^2 is the lowest, and while its share of errors under 50 yen is almost the same as Basic's, the other bands are worse by 4 to more than 10 points.

From this, the factors important for price prediction rank, in order: market information > card information > deck information.

2-feature model

| Features | R^2 | Error < 50 yen (%) | Error < 100 yen (%) | Error < 150 yen (%) | Error < 200 yen (%) |
|---|---|---|---|---|---|
| Basic + Market | 0.08830 | 12.85 | 28.46 | 44.27 | 58.89 |
| Basic + Deck | -0.00915 | 9.49 | 20.55 | 35.38 | 55.53 |
| Market + Deck | 0.03582 | 12.45 | 24.90 | 40.12 | 54.55 |

Two things can be seen from these models. First, card information (Basic) + market information (Market) has the highest R^2, about 0.09; considering that the Market-only model's R^2 was 0.03, this is a large jump, and every metric of the 2-feature models improves on the 1-feature models. Second, the models that include deck information still do not perform well: comparing "Basic + Market" against "Market + Deck", Basic + Market wins on every evaluation item, the same ranking as in the 1-feature models, which again says card information matters more for prediction than deck information. But here is the interesting part: the "error < 150 yen" and "error < 200 yen" bands are not so bad. In the 1-feature models there was a gap of about 7 points between Basic and Deck on errors under 150 yen, yet among the 2-feature models the gap between "Basic + Market" and "Market + Deck" on that band is about 4 points. Most striking is the "error < 200 yen" gap: 13 points in the 1-feature models, but only 4 points in the 2-feature models. So deck information does seem to help reduce the overall error.

3-feature model

| Features | R^2 | Error < 50 yen (%) | Error < 100 yen (%) | Error < 150 yen (%) | Error < 200 yen (%) |
|---|---|---|---|---|---|
| Basic + Market + Deck | 0.07645 | 13.83 | 27.47 | 44.47 | 58.30 |

Finally, the 3-feature model. The number to note is the "error < 50 yen" percentage, which is about 1 point better than the "Basic + Market" model. With roughly 500 cards in total, that is about 5 more cards predicted accurately. So deck information may contribute to prediction accuracy after all.


Coefficient survey

Next, I examined the coefficients of the Lasso regression, using those of the 3-feature model. Coefficients whose names start with svd are the deck-information features after SVD reduction. Among the card elements, whether a monster is Xyz seems to affect the price considerably. The common wisdom that "vanilla (Normal) cards are cheap" also appears correct, as one would expect. On the other hand, it is a little puzzling that Pendulum has a negative coefficient. The hypotheses "cards in high demand (many adopting decks) are expensive" and "cards in high supply (many stocking shops) are cheap" seem broadly correct: the coefficient for the number of adopting decks is positive and the one for the number of stocking shops is negative. Among races, Dragon and Sea Serpent tend to be expensive. Another surprise is that some attribute coefficients are non-zero; I have not analyzed why, but the attributes of expensive, easy-to-use monsters may simply be biased. As for the deck features, even though all 50 SVD components enter the regression, only 8 of them are actually used. One caveat: some attributes and races are missing entirely, because the scraped cards are themselves biased and the lists of attributes and races were extracted from that data.

| Element | Coefficient |
|---|---|
| Level | 0.000 |
| Attack | 0.000 |
| Defense | 0.006 |
| Normal (vanilla) | -37.348 |
| Effect | 0.000 |
| Fusion | 0.000 |
| Tuner | 0.000 |
| Synchro | 0.000 |
| Xyz | 139.618 |
| Pendulum | -2.289 |
| Special Summon | 0.000 |
| Flip | 0.000 |
| Union | 0.000 |
| Gemini | 0.000 |
| Ritual | 0.000 |
| Forbidden | 0.000 |
| Limited | 0.000 |
| Unlimited | 0.000 |
| Plant | 0.000 |
| Winged Beast | 0.000 |
| Pyro | 0.000 |
| Machine | 0.000 |
| Warrior | 0.000 |
| Fiend | 0.000 |
| Dinosaur | 0.000 |
| Psychic | 0.000 |
| Aqua | 0.000 |
| Beast-Warrior | -2.044 |
| Fairy | 3.232 |
| Spellcaster | 0.000 |
| Rock | 0.000 |
| Zombie | 0.000 |
| Reptile | 0.000 |
| Dragon | 139.839 |
| Fish | 0.000 |
| Beast | 0.000 |
| Sea Serpent | 190.029 |
| Wyrm | 0.000 |
| Thunder | 0.000 |
| Insect | -32.625 |
| Dark attribute | 0.000 |
| Light attribute | 62.002 |
| Wind attribute | 12.426 |
| Fire attribute | 0.000 |
| Earth attribute | -8.750 |
| Water attribute | 0.000 |
| Number of shops stocking | -4.901 |
| Number of adopting decks | 2.901 |
| svd_0 | 0.000 |
| svd_1 | 0.000 |
| svd_2 | 0.000 |
| svd_3 | 136.713 |
| svd_4 | 0.000 |
| svd_5 | 0.000 |
| svd_6 | 0.000 |
| svd_7 | 0.000 |
| svd_8 | 0.000 |
| svd_9 | 0.000 |
| svd_10 | 81.461 |
| svd_11 | 306.910 |
| svd_12 | 0.000 |
| svd_13 | 286.363 |
| svd_14 | 0.000 |
| svd_15 | 0.000 |
| svd_16 | 0.000 |
| svd_17 | 406.618 |
| svd_18 | 0.000 |
| svd_19 | 0.000 |
| svd_20 | 0.000 |
| svd_21 | 0.000 |
| svd_22 | 0.000 |
| svd_23 | 0.000 |
| svd_24 | 971.848 |
| svd_25 | 0.000 |
| svd_26 | 565.228 |
| svd_27 | 0.000 |
| svd_28 | 0.000 |
| svd_29 | 0.000 |
| svd_30 | 0.000 |
| svd_31 | 0.000 |
| svd_32 | 0.000 |
| svd_33 | 0.000 |
| svd_34 | 0.000 |
| svd_35 | 0.000 |
| svd_36 | 0.000 |
| svd_37 | 0.000 |
| svd_38 | 0.000 |
| svd_39 | 0.000 |
| svd_40 | 0.000 |
| svd_41 | 0.000 |
| svd_42 | 0.000 |
| svd_43 | 0.000 |
| svd_44 | 0.000 |
| svd_45 | 0.000 |
| svd_46 | 0.000 |
| svd_47 | 385.641 |
| svd_48 | 0.000 |
| svd_49 | 0.000 |
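For reference, a listing like the one above can be produced directly from the fitted model. A minimal sketch, with dummy data standing in for the real feature matrix:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Dummy features/prices; the real run would use the 3-feature matrix.
rng = np.random.default_rng(0)
feature_names = ["level", "attack", "defense", "is_Xyz", "svd_0"]
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=feature_names)
y = 100 * X["is_Xyz"] + rng.normal(scale=10, size=200)

model = Lasso(alpha=0.5).fit(X, y)
coef = pd.Series(model.coef_, index=feature_names)
# Lasso drives most coefficients to exactly 0; list only the survivors.
print(coef[coef != 0].sort_values(ascending=False))
```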

Was the deck information effective?

Given the story so far, you may well ask whether the deck information was actually effective. I originally included it from the hypothesis that cards often used in decks of a certain pattern are expensive. Along the way I found some interesting data, so let me introduce it.

Frog deck

This deck fights with monsters called Frogs, and I think it is an example of the deck information working. The errors are smaller with deck information added than when predicting from card and market information alone. What is more, the corrections go in both directions. For example, the demon frog was predicted at about 137 yen, and adding deck information raises that to about 171 yen, closer to the actual price. Conversely, for the mochi frog the prediction drops and approaches the actual price when deck information is added. The predicted price moves with the deck information, so it does seem to be effective here.

| Card name | Price (yen) | Basic+Market | Basic+Market+Deck |
|---|---|---|---|
| Demon frog | 197 | 136.98 | 171.25 |
| Stylish frog | 444 | 202.34 | 229.41 |
| Machi Frog | 319 | 228.81 | 253.91 |
| Mochi frog | 162 | 253.00 | 230.94 |

Moonlight deck


Here I compare the prices of the Moonlight monsters, and the result is not very good. In a sense it is as expected: the predictions with deck information are consistently higher. In particular, the moonlight black sheep rises by nearly 50 yen, far from its actual 35 yen. My guess is that the predictions are pushed up simply because these cards belong to the Moonlight theme.

| Card name | Price (yen) | Basic+Market | Basic+Market+Deck |
|---|---|---|---|
| Moonlight tiger | 29 | 149.88 | 179.03 |
| Moonlight black sheep | 35 | 87.14 | 134.17 |
| Moonlight dance leopard princess | 165 | 90.09 | 114.96 |
| Moonlight Dance Lion Princess | 95 | 92.06 | 116.14 |
| Moonlight Aya Hina | 271 | 123.68 | 148.71 |
| Moonlight blue cat | 39 | 132.29 | 156.05 |
| Moonlight wolf | 28 | 158.15 | 185.12 |

Super-frequent cards

These are cards used in 80% to 90% or more of decks. As the table shows, adding deck information lowers their predicted prices. The reason is simple: they do not depend on any particular deck, so the deck features carry no signal for them and they get no deck-driven price boost; hence the predictions drop. As a comparison of the coefficients shows, the deck features basically contribute positive price corrections, so without that effect the prediction comes out lower than it would with no deck information at all.

| Card name | Adopting decks | Price (yen) | Basic+Market | Basic+Market+Deck |
|---|---|---|---|---|
| Proliferating G | 283 | 791 | 954.94 | 948.52 |
| Ash flow back | 265 | 1845 | 1039.92 | 984.87 |
| Ghost Rabbit | 238 | 1546 | 1032.86 | 966.43 |

Considerations on unpredictable cards

High-priced cards

This time, expensive cards could not be predicted. Below are summary statistics of the prices. Compare the mean with the 50% point (median): the mean is 312 yen while the median is 151 yen. A small number of high-priced cards pulls the overall mean up. As this shows, high-priced cards are overwhelmingly rare, which makes their prices intrinsically hard to predict.

| Item | Value (yen) |
|---|---|
| Mean | 312.31 |
| Std. deviation | 413.44 |
| Min | 23.00 |
| 25% | 48.25 |
| 50% (median) | 151.50 |
| 75% | 400.75 |
| Max | 3547.00 |
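Incidentally, these figures have the shape of pandas' describe() output; a tiny sketch with dummy prices in place of the real price column:

```python
import pandas as pd

# Dummy prices standing in for the real all_df["price"] column.
prices = pd.Series([23, 48, 151, 400, 3547])
print(prices.describe())  # count / mean / std / min / quartiles / max
```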

Cards with varying prices

In fact, even the price of "Blue-Eyes White Dragon" cannot be predicted:

| Card name | Price (yen) | 3-feature model prediction |
|---|---|---|
| Blue-Eyes White Dragon | 142 | 587.06 |

There is certainly a gap between the predicted value and the price. Curious, I investigated the prices of "Blue-Eyes White Dragon".

Blue-Eyes White Dragon

| Rarity | Price (yen) |
|---|---|
| Normal | 155 |
| Super | 238 |
| Ultra | 325 |
| Secret | 2136 |
| Parallel | 226 |
| Ultimate | 13835 |
| Holographic | 4846 |
| N-Parallel | 145 |
| Millennium | 1032 |
| KC | 862 |
| KC-Ultra | 1397 |
| Holo-Parallel | 2754 |

As the table shows, Yu-Gi-Oh! cards come in many rarities, and the price moves up and down with them. This is an extreme example: between Normal and Ultimate there is a price difference of nearly 100x. So it is hard to state the price of "Blue-Eyes White Dragon" as a single number; the rarity has to be taken into account. Since I used the average price here, the predictions are affected by these rarity-driven swings. It would probably have been better to restrict the data, say to Normal rarity only, at the data-preparation stage. The catch is that doing so would drop Xyz and Synchro monsters, which exist only as rare cards, from the data.
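A tiny sketch of that averaging pitfall, using a few of the rarity prices from the table above (the DataFrame layout is mine):

```python
import pandas as pd

# A few per-rarity listings for one card (prices from the table above).
listings = pd.DataFrame({
    "name": ["Blue-Eyes White Dragon"] * 3,
    "rarity": ["Normal", "Ultra", "Ultimate"],
    "price": [155, 325, 13835],
})

# Averaging across rarities mixes a 155-yen print with a 13835-yen one.
print(listings.groupby("name")["price"].mean())
```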

Cards with many themes to combine

Although this is still at the investigation stage, prediction accuracy turns out to be low for cards that can be combined with many different themes. Below are the results of a survey of the Shinryu cards.

| Card name | Price (yen) | Basic+Market | Basic+Market+Deck |
|---|---|---|---|
| True Dragon Machine Soldier Darth Metatron | 76 | 152.08 | 141.94 |
| True Dragon Knight Dryas III | 26 | 117.07 | 120.43 |
| True Dragon Sword Emperor Master P | 711 | 506.38 | 469.86 |
| True Dragon Swordsman Master P | 67 | 263.12 | 286.72 |
| Shinryu Kenshi Dynamite K | 29 | 323.41 | 324.99 |
| True Dragon Emperor V.F.D. | 214 | 328.58 | 330.36 |
| True Dragon Emperor Agni Mazud V | 329 | 235.98 | 242.48 |
| True Dragon Emperor Baharstos F | 588 | 187.68 | 170.31 |
| True Dragon Emperor Lithos Azim D | 1234 | 329.10 | 326.80 |
| True Dragon Warrior Ignis H | 28 | 250.82 | 247.79 |
| True Dragon Guru Majesty M | 30 | 287.44 | 265.60 |
| Shinryu Mariamne | 290 | 218.55 | 202.92 |

Even with deck information added, the prices do not move in the right direction. For the Frogs, I think prediction worked because, in the current Yu-Gi-Oh! environment, there are fewer viable Frog deck patterns than there used to be and the effective cards are settled, so the dataset lends itself to price prediction. For Shinryu, however, there are decks such as:

-- Dinosaur Shinryu
-- Life-shaving True Dragon
-- Shinryu Metal Kozmo
-- Shinryu Kozmo
-- Shinryu WW (Shinryu Summon Beast WW)
-- True Dragon Summon Beast

and even a little research turns up many more deck patterns. Probably because of that, the deck information does not work well and the prices cannot be predicted. Similar results can be seen for "Kozmo", "Summon Beast", "Inferno", and so on. These themes are still new, so their standard cards are at the trial-and-error stage and not yet settled, which I think is one of the causes. This is consistent with the reasoning about the super-frequent cards above.

Summary

-- Predicted card prices with machine learning by combining card, market, and deck information
-- 58% of predictions fell within an error of 200 yen
-- Because of rarity, a card's price is hard to express as a single average value
-- With deck information added, some themes predict well (Frogs) and some do not (Shinryu)
-- Themes whose standard cards are not yet settled are hard to predict
-- High-priced cards (200 yen and above) have little data and are hard to predict

Impressions

It was harder than I expected. I knew prediction from card information alone would be difficult, but getting even reasonable accuracy after adding market information was also quite hard. I had been away from the game for too long, and my lack of Yu-Gi-Oh! knowledge added to the difficulty. After regressing on card information plus market information and still not predicting well, I wavered: "Hmm, add a feature for whether the card name contains 'Blue-Eyes'? No, that's far too ad hoc a tuning." "Analyze the monster effect text with natural language processing? Too heavy for a Golden Week project." Then, while looking through the list of unpredictable cards, I saw "Legendary Akaishi", "Red-Eyes Black Flame Dragon", and "Red-Eyes Evil Star Dragon - Meteor Dragon" lined up, thought "I have to take card combinations into account", and remembered the distributional hypothesis, the story that comes up around word2vec in natural language processing. Including the co-occurrence probabilities of cards sounded interesting, so I incorporated them as the deck information. In the end it did not work that well, but analytically it was quite interesting. I want to do fun things with machine learning, yet I keep catching myself grumbling "if only the accuracy were a little better". It's hard. And since I was writing a Yu-Gi-Oh! article anyway, I wanted to pay homage to [Slow live commentary] Can a person duel with just the cards he picks up? [Yu-Gi-Oh]. With that, I am satisfied. So:

** "Artificial intelligence can predict the price of Yu-Gi-Oh! Card" **

** "But it's difficult, so if you do it, get serious!" ** image
