[PYTHON] The pandas features I was in charge of in our team project

Introduction

This time, over six weeks, I participated in a project that went from team building all the way to product release. Please see the link below for the project as a whole. I would also like you to read my past articles, but in short: I am an inexperienced engineer currently looking for a job.

Click here for the project

In this article, I will write up what I learned while implementing the features I was in charge of.

About pandas

pandas is a Python module. Once installed, it gives you access to a wide range of functions for data analysis and manipulation. For more information, see the official documentation and this article.

It was only decided at the first meeting that we would use pandas in production. Since I was new to pandas, I asked to use one of the project's six weeks for preliminary study, learned the basics, and after that picked things up while building and testing. That said, I did not use it in real depth; what I mainly used was:

- Reading a CSV file and converting it to a DataFrame
- Reading multiple CSV files, combining them, and exporting the result
- Getting only the fields you want from a CSV and returning them
- Editing a CSV

That is, the basic part. Let's look at each one.

What is a CSV file in the first place?

→ In short, CSV is the format used by spreadsheet software such as Excel: data arranged in rows and columns, with values separated by commas. With pandas you can read and process CSV files, and convert Python data structures to this format. Incidentally, the DataFrame type is what pandas converts a CSV into so that it can be handled in Python.

#                  Price  Quantity sold  Inventory rate
# Apple              300             10             80%
# Mandarin orange    100              2             20%
# Melon              900              6             60%

If you display a DataFrame with print() and the like, it looks something like the table above. You have probably seen tables like this before.
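As a minimal sketch, here is how the table above could be built as a DataFrame (the fruit data is just the illustrative table shown):

```python
import pandas as pd

# Illustrative data matching the fruit table above
df = pd.DataFrame(
    {
        "Price": [300, 100, 900],
        "Quantity sold": [10, 2, 6],
        "Inventory rate": ["80%", "20%", "60%"],
    },
    index=["Apple", "Mandarin orange", "Melon"],
)
print(df)
```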

Read CSV file

Let's actually try pandas. First, we read a CSV file. Incidentally, here is what a CSV file looks like:


client_id,client_name
1,Sample B Co., Ltd.
2,Sample I Co., Ltd.
3,Sample F Co., Ltd.
4,Sample D Co., Ltd.
5,Sample E Co., Ltd.
6,Sample C Co., Ltd.
7,Sample G Co., Ltd.
8,Sample A Co., Ltd.
9,Sample H Co., Ltd.

It is a file like this. The first line defines the field names, separated by commas. You can see that the second and subsequent lines hold the data that goes into those fields. To actually read it, write the following.



import glob
import pandas as pd

sample_data = pd.read_csv('sample.csv')

#If there are multiple files
df = pd.DataFrame() #Create an empty DataFrame
#Specify the file path pattern. With the glob module, a pattern such as
#`sample/data*` matches every CSV file under that directory whose name
#starts with "data". Note that glob does not guarantee any order, so sort
#the list if you want the files processed in ascending name order.
file_list = sorted(glob.glob(CSV_PATH))


for file_count in file_list:
    dfn = pd.read_csv(file_count)
    df = pd.concat([df, dfn])

The read_csv() method reads a CSV file; more precisely, it converts the CSV file into the DataFrame type. The argument is the file path. Remember that pandas is conventionally imported with `import pandas as pd` and used through `pd.<method>`. The important point is reading multiple CSV files: as above, you may want to combine several CSV files, one by one, into a single dataset. That is what the loop does. What it is doing:

  1. Create an empty instance
  2. Specify the file path of the directory containing the CSV file in the variable
  3. Combine them in order with a for statement

That is the flow. Since concat() is a method for combining DataFrames, you can see that the CSVs are concatenated one after another onto the initially empty DataFrame to build one combined dataset. It takes a list as its argument. file_list holds the file paths; because sample/data* is specified, the files under sample whose names start with "data" are picked up (note that glob itself does not guarantee ordering, so sort the list if order matters). For example:


sample
  |
  |-----data1.csv
  |-----data2.csv
  |-----data3.csv

In such a case, the files are processed starting from data1.csv. The loop here combines them one at a time, although concat() can also take the whole list of DataFrames in a single call. Incidentally, there is also a merge() method for joining, but that seems to be used to join tables that share a key but have different columns.
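To see concat() at work without real files, here is a minimal sketch with made-up in-memory DataFrames (it also shows the equivalent single call):

```python
import pandas as pd

frames = [
    pd.DataFrame({"id": [1], "name": ["A"]}),
    pd.DataFrame({"id": [2], "name": ["B"]}),
    pd.DataFrame({"id": [3], "name": ["C"]}),
]

# Loop version, as in the article: concatenate one at a time
df = pd.DataFrame()
for dfn in frames:
    df = pd.concat([df, dfn])

# Equivalent single call: concat accepts the whole list at once
df_once = pd.concat(frames, ignore_index=True)
print(len(df_once))  # → 3
```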

Try to make the CSV file a python dictionary type

Next, let's make the read CSV editable with python.


import csv

CSV_COLUMN = ['date','sales','client_id','staff_id']

#Define an empty list
data = []
field_names = CSV_COLUMN
#Open the file read-only (file_path holds the path to the CSV file)
with open(file_path, 'r', encoding='utf-8') as f:
    #Get each field in dictionary type and add it to data line by line
    for row in csv.DictReader(f, fieldnames=field_names):
        data.append(row)


CSV_COLUMN holds the field names we define. Then with open(...) as f: opens the CSV file read-only. This syntax is not limited to CSV; it is the standard way to read a file in Python. Earlier we converted the file to a DataFrame, but this time we simply read its contents. The arguments are the file path, the mode, and the encoding, and the opened file object is bound to the variable f. Here it is opened read-only as UTF-8. Then a for statement iterates over the contents: csv.DictReader() reads a CSV file row by row as dictionaries. Note that because fieldnames is passed explicitly, the header line of the file is also read as an ordinary data row and becomes the first element. An example of the result is as follows.

#If you read such a CSV file

date,sales,client_id,staff_id
2020-01-01,8390,8,9
2020-01-02,8712,1,8
2020-01-03,8146,6,8

#It will be stored in data like this (the header line comes first,
#because fieldnames was passed explicitly)

[{'date': 'date', 'sales': 'sales', 'client_id': 'client_id', 'staff_id': 'staff_id'},
 {'date': '2020-01-01', 'sales': '8390', 'client_id': '8', 'staff_id': '9'},
 {'date': '2020-01-02', 'sales': '8712', 'client_id': '1', 'staff_id': '8'},
 {'date': '2020-01-03', 'sales': '8146', 'client_id': '6', 'staff_id': '8'}]


Try to edit the dictionary type data

Let's add a field to the previous data.

#The first element holds the field names, so add a field called `tax_sales` there
data[0]['tax_sales'] = 'tax_sales'

#Then, for the second and subsequent elements (= the actual values of each row),
#fill in the new field with a for statement.
for row in data[1:]:
    #Insert sales multiplied by the tax rate into the tax_sales field
    #(TAX_RATE is a constant defined elsewhere)
    row['tax_sales'] = int(row['sales']) * (1.0 + TAX_RATE)
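Putting the two snippets together, here is a self-contained sketch (the CSV content and the TAX_RATE value are made up for illustration; an in-memory file stands in for a real CSV so it runs anywhere):

```python
import csv
import io

TAX_RATE = 0.1  # assumed tax rate, for illustration only
CSV_COLUMN = ['date', 'sales', 'client_id', 'staff_id']

# An in-memory file stands in for a real CSV file
csv_text = "date,sales,client_id,staff_id\n2020-01-01,8390,8,9\n"

data = []
with io.StringIO(csv_text) as f:
    # Because fieldnames is passed, the header line is read as data[0]
    for row in csv.DictReader(f, fieldnames=CSV_COLUMN):
        data.append(row)

# data[0] holds the field names; add the new field there
data[0]['tax_sales'] = 'tax_sales'
# Fill the new field for the value rows
for row in data[1:]:
    row['tax_sales'] = int(row['sales']) * (1.0 + TAX_RATE)
```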

Try to merge different CSV files with merge() and output the CSV

import pandas as pd

CSV_COLUMN_ADD = ['date','sales','tax_sales','client_id', 'client_name','staff_id','staff_name']

#Convert each piece of dictionary-type data to the DataFrame type.
sales_data = pd.DataFrame(sales_data)
client_data = pd.DataFrame(client_data)
staff_data = pd.DataFrame(staff_data)

#Merge
merge1 = pd.merge(sales_data, client_data, how='outer')
merge2 = pd.merge(merge1, staff_data, how='outer')
#Reorder the fields into CSV_COLUMN_ADD order
merge_data_result = merge2.loc[:, CSV_COLUMN_ADD]
#Drop the first row (the duplicated header row)
merge_data_result = merge_data_result.drop(merge_data_result.index[0])
#To export the result as CSV, write it out with to_csv, e.g.:
#merge_data_result.to_csv('output.csv', index=False)


Like concat(), merge() takes the two tables you want to join as arguments. how is an option that specifies the join type: how='outer' performs an outer join, which looks at the shared key and keeps all rows, including data that exists in only one of the tables. Also, since our dictionary-type data still contained the header row, it can end up duplicated in the merged result, so here it is removed with the drop() method. You can select columns by name with `DataFrame.loc[]`; applying `.loc[:, CSV_COLUMN_ADD]` reorders all the columns into the order of CSV_COLUMN_ADD.
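A small sketch of what how='outer' does, with made-up tables:

```python
import pandas as pd

sales = pd.DataFrame({"client_id": [1, 2], "sales": [100, 200]})
clients = pd.DataFrame({"client_id": [2, 3], "client_name": ["B", "C"]})

# An inner join would keep only client_id 2; the outer join keeps all
# three ids and fills the missing values with NaN
outer = pd.merge(sales, clients, how="outer", on="client_id")
print(outer)
```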

This is the very basic usage of pandas and python.

Advanced version

Now let's look at using pandas in Django while actually interacting with the DB. For example, suppose you have the following models and want to perform this kind of processing on them.

Model that stores brand information (Mutual Fund model)

from django.db import models
from accounts.models import CustomUser
# Note: Category is referenced below; import it from wherever it is defined in the project


class MutualFund(models.Model):
    class Meta:
        db_table = 'MutualFund'
        verbose_name_plural = 'Investment trust information'

    DELETE_FLAG = ((0, 'Not deleted'), (1, 'Delete'))

    # id = AutoField(primary_key=True)  #No definition required as it is added automatically
    url = models.CharField('Fund URL', max_length=255, null=True, blank=True)
    fund_name = models.CharField(
        'Fund name', max_length=255, null=True, blank=True)
    company = models.CharField('company name', max_length=255, null=True, blank=True)
    category_obj = models.ForeignKey(
        Category,
        verbose_name='Category',
        on_delete=models.CASCADE
    )
    rate = models.IntegerField('Overall rating', null=True, blank=True)
    return_percent = models.FloatField('Return rate(3 years)', null=True, blank=True)
    risk = models.FloatField('Risk value(3 years)', null=True, blank=True)
    fee = models.FloatField('Trust fees, etc. (tax included)', null=True, blank=True)
    net_assets = models.IntegerField('Net assets (million yen)', null=True, blank=True)
    delete_flag = models.IntegerField('Delete flag', choices=DELETE_FLAG, default=0)

    def __str__(self):
        return self.fund_name

Model that refers to information from related models (Portfolio model)

from django.db import models
import sys
import pathlib
# Get the absolute path of the directory containing this file
# current_dir = pathlib.Path(__file__).resolve().parent
# # Add the path to the module search path
# sys.path.append(str(current_dir) + '/../')

# print(sys.path)
from accounts.models import CustomUser
from fund.models import MutualFund


#Information is referenced from each related model.
#customuser_obj and mutual_fund_obj hold references to the related records.

class Portfolio(models.Model):
    customuser_obj = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
    mutual_fund_obj = models.ForeignKey(MutualFund, on_delete=models.CASCADE)
    amount = models.IntegerField(null=True, blank=True)

Processing code


import pandas as pd
import numpy as np

from fund.models import MutualFund


# Tentative definitions of risk_col and return_col
risk_col = 0
return_col = 0


# Function to compute the risk difference
def risk_differ(x):
    return risk_col - x.loc["risk"]


# Function to compute the return difference
def return_differ(x):
    return return_col - x.loc["return_percent"]


def find_fund_near_risk(fund_id, num_fund_obj):
    """
    Convert the fetched records to a DataFrame, create a new field, store
    the absolute risk difference from the specified fund in it, then sort
    and return the result.

    Arguments:
        fund_id : int
            Primary key of the fund.
        num_fund_obj : int
            Number of records to return.

    Returns:
        brand_risk_near : DataFrame
    """
    # risk_differ above reads the module-level risk_col, so declare it
    # global here; a plain local assignment would not be visible to it.
    global risk_col

    # Fetch the records as dictionaries.
    brand_info = MutualFund.objects.values(
        "id", "company", "fund_name", "risk")

    # Convert to a DataFrame
    brand_info_df = pd.DataFrame(brand_info)

    # Extract the risk value of the specified fund from the DataFrame
    find_obj = brand_info_df[brand_info_df["id"] == fund_id]
    risk_col = find_obj["risk"].iloc[0]  # take the scalar value

    # Create a field to hold the computed risk difference
    brand_info_df["differ"] = np.nan

    # Store the risk difference in the differ field as an absolute value
    brand_info_df["differ"] = brand_info_df.apply(risk_differ, axis=1).abs()

    # Delete the row of the fund specified by the argument
    deleterow = brand_info_df.index[brand_info_df["id"] == fund_id]
    brand_info_df = brand_info_df.drop(deleterow)

    # Sort in ascending order and remove the differ and id fields
    brand_info_df = brand_info_df.sort_values("differ")
    brand_info_df = brand_info_df.drop(columns=["id", "differ"])

    # Limit the number of rows
    brand_risk_near = brand_info_df.head(num_fund_obj)

    return brand_risk_near

However, this is hard to grasp all at once, so let's first review how to operate the DB in Django. For the details and more complicated queries, please refer to the documentation or reference articles, which explain it in an easy-to-understand way. Here I will only take a brief look.

Get

There are two patterns for fetching: fetching everything, and fetching by searching. For all records, models.objects.all() returns every record of that model. For searching, you use either the get() method or the filter() method, and you choose between them because their return values differ. get() returns the single record that matches the argument as an **object**; so, for example, models.objects.get(pk=1) is used when you want one specific record. In contrast, filter() returns the matching records as a **QuerySet** (a list-like collection of objects), so it is basically used when you expect records registered multiple times, e.g. models.objects.filter(name='sample'). Naturally, since the return value is a collection of objects, it cannot always be processed as-is, which is where the values() method described later comes in. If you want get()-like behavior from filter(), chain the first() method after it: first() returns the first element of the retrieved QuerySet as an object.

Convert the acquired record

You may want parts of the fetched records as dictionaries or lists. In that case, use the values() method or its sibling, the values_list() method.

For example, use it as follows.


#Extract the record whose name field value is sample from the model
query = models.objects.filter(name='sample')

#Get the values of the id field from the extracted records as a list
#(of 1-tuples; pass flat=True to get plain values)
models_column_id_list = query.values_list('id')

#Get records of id, name, email fields directly from the model in a list of dictionaries
models_column_dict_list = models.objects.values("id", "name", "email")


Illustration

Now let's go back to the code at the beginning. This time I implemented a function that takes the absolute difference in risk (or return) between funds, sorts it in ascending order, and lists the funds whose risk (return) is closest to the specified fund. That is the code below, shown again from the beginning.


from fund.models import MutualFund
from portfolio.models import Portfolio
import pandas as pd
import numpy as np
from django.db.models import Count


# Tentative definitions of risk_col and return_col
risk_col = 0
return_col = 0


# Function to compute the risk difference
def risk_differ(x):
    return risk_col - x.loc["risk"]


# Function to compute the return difference
def return_differ(x):
    return return_col - x.loc["return_percent"]


def find_fund_near_risk(fund_id, num_fund_obj):
    """
    Convert the fetched records to a DataFrame, create a new field, store
    the absolute risk difference from the specified fund in it, then sort
    and return the result.

    Arguments:
        fund_id : int
            Primary key of the fund.
        num_fund_obj : int
            Number of records to return.

    Returns:
        brand_risk_near : DataFrame
    """
    # risk_differ above reads the module-level risk_col, so declare it
    # global here; a plain local assignment would not be visible to it.
    global risk_col

    # Fetch the records as dictionaries.
    brand_info = MutualFund.objects.values(
        "id", "company", "fund_name", "risk")

    # Convert to a DataFrame
    brand_info_df = pd.DataFrame(brand_info)

    # Extract the risk value of the specified fund from the DataFrame
    find_obj = brand_info_df[brand_info_df["id"] == fund_id]
    risk_col = find_obj["risk"].iloc[0]  # take the scalar value

    # Create a field to hold the computed risk difference
    brand_info_df["differ"] = np.nan

    # Store the risk difference in the differ field as an absolute value
    brand_info_df["differ"] = brand_info_df.apply(risk_differ, axis=1).abs()

    # Delete the row of the fund specified by the argument
    deleterow = brand_info_df.index[brand_info_df["id"] == fund_id]
    brand_info_df = brand_info_df.drop(deleterow)

    # Sort in ascending order and remove the differ and id fields
    brand_info_df = brand_info_df.sort_values("differ")
    brand_info_df = brand_info_df.drop(columns=["id", "differ"])

    # Limit the number of rows
    brand_risk_near = brand_info_df.head(num_fund_obj)

    return brand_risk_near


    # Fetch the records as dictionaries.
    brand_info = MutualFund.objects.values(
        "id", "company", "fund_name", "risk")

    # Convert to a DataFrame
    brand_info_df = pd.DataFrame(brand_info)

In this part, the records are fetched from the DB with only the needed fields specified, turned into a list of dictionaries, and converted to the DataFrame type. In brand_info_df, the columns are the fields and the rows are the records. Since we have to calculate the risk (return) difference between the specified fund and every other fund, we fetch all of them for now.




    # Extract the risk value of the specified fund from the DataFrame
    find_obj = brand_info_df[brand_info_df["id"] == fund_id]
    risk_col = find_obj["risk"].iloc[0]  # take the scalar value

Next, edit the DataFrame. The id field contains the primary key value, so we use it to identify the row of brand_info_df that matches fund_id. And since we want the risk value from that row, we extract it further.
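The same row lookup, sketched with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "risk": [5.0, 2.0, 9.0]})

fund_id = 2
# Boolean indexing keeps only the rows whose id matches fund_id
find_obj = df[df["id"] == fund_id]
risk_value = find_obj["risk"].iloc[0]  # extract the scalar value
```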




    #Create a field to enter the calculation result of the risk difference
    brand_info_df["differ"] = np.nan

    #Store the value of the risk difference in the differ field and make the value absolute.
    brand_info_df["differ"] = brand_info_df.apply(risk_differ, axis=1).abs()

    #Delete the line of information of the brand specified by the argument
    deleterow = brand_info_df.index[brand_info_df["id"] == fund_id]
    brand_info_df = brand_info_df.drop(deleterow)

    #Sort in ascending order and remove difference and id fields
    brand_info_df = brand_info_df.sort_values("differ")
    brand_info_df = brand_info_df.drop(columns=["id", "differ"])

    #Number limit
    brand_risk_near = brand_info_df.head(num_fund_obj)

    return brand_risk_near


Here we manipulate the DataFrame in earnest. What this function has to do is **compute the absolute risk difference between the specified fund and every other fund in the records, and sort in ascending order**. I could not express this with the QuerySet alone (the mechanism that pulls data from the DB), so I implemented it with pandas like this. This may not be the best approach, so don't take it as gospel; if you have code that can express this processing with just a QuerySet, I would be glad to see it.

Digression aside, let's look at the code.

First, since the original model has no field representing the difference, we add one in the DataFrame. We create the new field by filling it with missing values using numpy's np.nan for the time being. Next, we replace the missing values with the absolute differences. The apply() method applies a function to a DataFrame; its arguments are apply(function to apply, where to apply it). For the "where", you can target elements, columns, or rows: with axis=1, as here, the function is applied to each row of the DataFrame. abs() is a method that returns absolute values. As for risk_differ,




def risk_differ(x):
    return risk_col - x.loc["risk"]

it is defined as above. Since risk_col holds the risk value of the specified fund, the function simply subtracts each row's risk value from it. This inserts the absolute difference into brand_info_df["differ"] for every row.
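The same apply-then-abs pattern, sketched with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "risk": [5.0, 2.0, 9.0]})
target_risk = 5.0  # risk value of the "specified" fund

# Apply a function to every row (axis=1), then take the absolute value
df["differ"] = df.apply(lambda row: target_risk - row["risk"], axis=1).abs()
print(df["differ"].tolist())  # → [0.0, 3.0, 4.0]
```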

At this point, all that remains is tidying up the result. First, as it stands, the specified fund itself is still in the table. Since the point of the process is that a small risk (return) difference from the specified fund = a fund with similar risk (return), the specified fund's own row should be deleted. First, the index of the specified fund's row is looked up via the DataFrame's index and assigned to the deleterow variable; passing that to the drop() method completes the deletion.

Next is sorting. There are various ways to sort a DataFrame, but since we want to sort by value, we use the sort_values() method. Its argument is the field to sort by; if you want descending order instead, pass ascending=False as the second argument. After that, the fields that are unnecessary on the actual screen are removed with the drop() method from earlier. Finally, the head() method takes the specified number of rows from the top, and the result is returned with return. Doing the same for the return values completes the feature.
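The tidy-up steps, sketched on the same kind of made-up table:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "fund_name": ["A", "B", "C", "D"],
    "differ": [0.0, 3.0, 1.0, 4.0],
})

fund_id = 1  # the specified fund: drop its own row
df = df.drop(df.index[df["id"] == fund_id])

# Sort ascending by differ, remove helper columns, keep the top 2
result = df.sort_values("differ").drop(columns=["id", "differ"]).head(2)
print(result["fund_name"].tolist())  # → ['C', 'B']
```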

Through this project, I learned that you can convert information fetched from the DB into a DataFrame and return the processed result.

Application 2

Up to this point we have processed DataFrames, but aggregation that references fields of the fetched records can also be done within the scope of DB operations themselves. As an example, the processing is as follows.




def find_fund_popular_user(fund_id, num_fund_obj):
    """
    Extract the other funds held by the users who hold the fund given as
    the argument.

    Arguments:
        fund_id : int
        num_fund_obj : int

    Returns:
        fund_list : list
    """
    # Filter the Portfolio model by the argument fund to extract the users
    # who hold that fund
    query = Portfolio.objects.filter(mutual_fund_obj__id__exact=fund_id)

    # Extract only the user IDs (customuser_obj) as a flat list
    query = query.values_list('customuser_obj', flat=True)

    # Extract all records belonging to the extracted user IDs
    customuser_obj_id_list = Portfolio.objects.filter(customuser_obj__in=query)

    # Exclude the argument fund itself (chain exclude() onto the filtered
    # QuerySet; a fresh Portfolio.objects.exclude(...) would discard the
    # user filter above)
    customuser_obj_id_list = customuser_obj_id_list.exclude(
        mutual_fund_obj__id__exact=fund_id)

    # Extract only the mutual_fund_obj field
    mutual_fund_obj_list = customuser_obj_id_list.values('mutual_fund_obj')

    # Aggregate the occurrence counts and take the top num_fund_obj
    fund_count = mutual_fund_obj_list.annotate(portfolio_fund_count=Count(
        expression='mutual_fund_obj'))

    fund_count_list = fund_count.order_by(
        '-portfolio_fund_count')[0:num_fund_obj]

    # Get the MutualFund object for each entry in fund_count_list,
    # store them in a list, and return it
    fund_list = []
    for fund_record in fund_count_list:
        fund = MutualFund.objects.get(pk=fund_record['mutual_fund_obj'])
        fund_list.append(fund)

    return fund_list





    # Filter the Portfolio model by the argument fund to extract the users
    # who hold that fund
    query = Portfolio.objects.filter(mutual_fund_obj__id__exact=fund_id)

    # Extract only the user IDs (customuser_obj) as a flat list
    query = query.values_list('customuser_obj', flat=True)

    # Extract all records belonging to the extracted user IDs
    customuser_obj_id_list = Portfolio.objects.filter(customuser_obj__in=query)

    # Exclude the argument fund itself (chaining onto the previous QuerySet)
    customuser_obj_id_list = customuser_obj_id_list.exclude(mutual_fund_obj__id__exact=fund_id)

    # Extract only the mutual_fund_obj field
    mutual_fund_obj_list = customuser_obj_id_list.values('mutual_fund_obj')


This part follows a similar flow to before. The difference is that the information is obtained through related models. The Portfolio model is related to the MutualFund model and to the CustomUser model (the latter is not referenced in this process). The Portfolio model is the side that references each model. See the image below for what this looks like.

2020-06-27_03h02_47.png

As you can see, the Customuser obj and Mutual fund obj fields are pull-down menus. This means that the Customuser obj field holds the record of the CustomUser model referenced by the foreign key, and likewise the Mutual fund obj field holds all the information of the MutualFund record referenced by the foreign key. In other words, if you know the foreign key, you can pull MutualFund information through the Portfolio model. That is what the first line of the code above does.

This time, we first want to extract the users who hold the fund with pk = fund_id, i.e. the fund passed as the argument. That means filtering by the id value inside the Mutual fund obj field. When you want to refer to a field of the referenced model, the lookup takes the form `<field holding the reference>__<field of the referenced model>__exact=...`. Since exact means exact match, we extract the rows that exactly match fund_id. That gives us the users who hold the specified fund, so next we use the values_list() method to extract just the customuser_obj values, which hold the user information, as a list.

Now we want the funds held by those extracted users, so we filter the Portfolio model again, using the customuser_obj list from before as the search condition. With the lookup `<field holding the reference>__in=...` you can match rows whose referenced field is contained in a given collection. Of course, the result still includes the argument fund itself, so we remove it with the exclude() method, chained onto the filtered QuerySet. At this stage we have extracted the funds other than the specified one, held by users who hold the specified one; since only the mutual_fund_obj field is needed for aggregation, we finally narrow the QuerySet down to the mutual_fund_obj field with values().

When you come to this point, the rest is the aggregation process.

There are several ways to aggregate over a QuerySet. The annotate() used here adds an aggregated value to each entry of the QuerySet; there is also aggregate(), but annotate() is what you use with models that have relationships. Specifically, it seems to be used when you want to aggregate over a field that holds references to another model. This time I want to implement **sorting and displaying the other funds held by users of the specified fund in order of popularity (the number of users holding them)**. Since the previous steps prepared mutual_fund_obj for aggregation, annotate() is used. The actual code is below; let's check it.




    #Aggregate the number of occurrences and get the number of arguments from the top
    fund_count = mutual_fund_obj_list.annotate(portfolio_fund_count=Count(
        expression='mutual_fund_obj'))

    fund_count_list = fund_count.order_by(
        '-portfolio_fund_count')[0:num_fund_obj]

    # Get the MutualFund object for each entry in fund_count_list,
    # store them in a list, and return it
    fund_list = []
    for fund_record in fund_count_list:
        # We want the fund's data, so call MutualFund.objects.get() with
        # the mutual_fund_obj value taken from each fund_record
        fund = MutualFund.objects.get(pk=fund_record['mutual_fund_obj'])
        fund_list.append(fund)

    return fund_list


What did mutual_fund_obj_list look like?


    # Exclude the argument fund itself (chaining onto the previous QuerySet)
    customuser_obj_id_list = customuser_obj_id_list.exclude(mutual_fund_obj__id__exact=fund_id)

    # Extract only the mutual_fund_obj field
    mutual_fund_obj_list = customuser_obj_id_list.values('mutual_fund_obj')

It was like this. With that in mind, let's explain annotate(). First,

mutual_fund_obj_list.annotate(portfolio_fund_count=Count(expression='mutual_fund_obj'))

this part creates a portfolio_fund_count field on each entry of the QuerySet and inserts the result of the Count() method into it. It is similar in spirit to what we did with the DataFrame: create a new field and insert a computed result. Count(expression='mutual_fund_obj') counts how many times each mutual_fund_obj has appeared; what is being counted is the references held in mutual_fund_obj. So, to put it very roughly,

Brand A is owned by 5 users
Brand B is owned by 3 users

……


is the kind of aggregation you get (the above is just an illustration). After that, order_by sorts the aggregated results; with the leading minus sign, '-portfolio_fund_count' sorts in descending order. As for the [0:num_fund_obj] part, num_fund_obj holds the number of items to take.
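The same counting-and-top-N idea, sketched in plain Python with collections.Counter instead of the Django ORM (the fund names are made up):

```python
from collections import Counter

# One entry per portfolio row: which fund each (other) user holds
held_funds = ["A", "B", "A", "C", "A", "B"]

num_fund_obj = 2
# Count occurrences and take the top num_fund_obj, most common first
top = Counter(held_funds).most_common(num_fund_obj)
print(top)  # → [('A', 3), ('B', 2)]
```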

Finally, to pass the data to the template, a for statement iterates over fund_count_list and stores each object in an empty list, which completes the work.

(pk=fund_record['mutual_fund_obj'])

This part specifies pk using the mutual_fund_obj value extracted by the for statement. That works because mutual_fund_obj naturally holds the primary key of the fund.

Creating the process (views.py) that passes data to the template

Once the data is ready, we will write the process of passing it to the template for display on the screen. The code is below.


from django.contrib.auth.mixins import LoginRequiredMixin
from django.shortcuts import render
from django.views import View


class FundDetail(LoginRequiredMixin, View):

    """
    - Get information on funds with similar risk and funds with similar return.
    - Extract the other funds held by users who hold the argument fund.
    """

    def get(self, request, fund_id, *args, **kwargs):

        #Acquisition number setting
        num_fund_obj = 5

        #Substitute a list of stocks with similar risks and returns into variables
        brand_risk_near_list = find_fund_near_risk(fund_id, num_fund_obj)
        brand_return_near_list = find_fund_near_return(fund_id, num_fund_obj)

        #DataFrame header information
        brand_risk_near_header = brand_risk_near_list.columns.tolist()
        brand_return_near_header = brand_return_near_list.columns.tolist()

        #DataFrame content information
        brand_risk_near_contents = brand_risk_near_list.values.tolist()
        brand_return_near_contents = brand_return_near_list.values.tolist()

        #Get the other brands that the user who has the argument brand has from the top of the registered number
        fund_popular_list = find_fund_popular_user(fund_id, num_fund_obj)

        context = {
            'risk_header': brand_risk_near_header,
            'risk_contents': brand_risk_near_contents,
            'return_header': brand_return_near_header,
            'return_contents': brand_return_near_contents,
            'popular_user_funds': fund_popular_list,
        }
        return render(request, 'fund/fund_detail.html', context=context)



This time we pass both DataFrame-type data and objects, and the way each is passed differs. For the DataFrame type, the data is split in views.py into the header (field information) and the contents (the values of each field) before being passed; objects are passed as they are. After storing the data in variables, define in context the names the template will use to call them: for example, to refer to brand_risk_near_header in the template as 'risk_header', write it as 'risk_header'. Then the render method passes context to the template, where it is retrieved with for.
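How the header/contents split behaves, sketched on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"fund_name": ["A", "B"], "risk": [1.5, 2.5]})

header = df.columns.tolist()    # field names
contents = df.values.tolist()   # one list per row

print(header)    # → ['fund_name', 'risk']
print(contents)  # → [['A', 1.5], ['B', 2.5]]
```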

The template is as follows.



<div class="row">
    {% comment %}List of stock information with similar risks{% endcomment %}
    <div class="col">
        <h5>Stocks with similar risk to this stock</h5>
        <div class="">
            <table class="table">
                    <thead>
                    <tr>{% for row in risk_header %}
                        <th>{{ row }}</th>{% endfor %}</tr>
                    </thead>
                    <tbody>
                    {% for i in risk_contents %}
                        <tr>{% for row1 in i %}
                            <td>{{ row1 }}</td>{% endfor %}</tr>
                    {% endfor %}
                    </tbody>
                </table>
            </div>
        </div>
            {% comment %}List of stock information with similar returns{% endcomment %}
        <div class="col">
            <h5>Stocks with similar returns to this stock</h5>
            <div class="">
                <table class="table">
                    <thead>
                    <tr>{% for row in return_header %}
                        <th>{{ row }}</th>{% endfor %}</tr>
                    </thead>
                    <tbody>
                    {% for i in return_contents %}
                        <tr>{% for row1 in i %}
                            <td>{{ row1 }}</td>{% endfor %}</tr>
                    {% endfor %}
                    </tbody>
                </table>
            </div>
        </div>
    </div>
            <div class="">
            <h5>People who buy this brand also buy this brand</h5>
            <div class="">
                <table class="table">
                    <thead>
                    <tr>
                        <th>Fund name</th>
                        <th>return</th>
                        <th>risk</th>
                    </tr>
                    </thead>

                    <tbody>
                    {% for fund in popular_user_funds %}
                        <tr>
                            <th>
                                <a href=" /fund/detail/{{ fund.id }} ">{{ fund.fund_name }}</a>
                            </th>
                            <th>{{ fund.return_percent }}</th>
                            <th>{{ fund.risk }}</th>
                        </tr>
                    {% endfor %}

                    </tbody>
                </table>
            </div>
        </div>

One thing to be careful about is how `risk_contents` is retrieved. It holds the values of each field of the DataFrame, converted to a list with `values.tolist()` before being passed. The data exchanged this time has rows and columns like the example below.

#       col1  col2  col3
# row1     0     1     2
# row2     3     4     5

# Converting the above DataFrame with df.values.tolist() gives:

# [[0, 1, 2], [3, 4, 5]]

# Incidentally, df.columns.tolist() on this DataFrame gives:

# ['col1', 'col2', 'col3']

Now suppose you want to retrieve the data of row1: it spans three columns, col1 through col3. Conversely, the data of col1 spans two rows, row1 and row2. Because of this, the values cannot be stored in a one-dimensional list like the result of `columns.tolist()`; they have to be stored as a two-dimensional (nested) list, as in the example above, or they cannot be retrieved cleanly in the template.
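The shapes involved can be checked directly (a runnable version of the comment block above, assuming pandas is installed):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    index=["row1", "row2"],
    columns=["col1", "col2", "col3"],
)

rows = df.values.tolist()    # two-dimensional: one inner list per row
cols = df.columns.tolist()   # one-dimensional: just the field names

print(rows)     # [[0, 1, 2], [3, 4, 5]]
print(cols)     # ['col1', 'col2', 'col3']
print(rows[0])  # row1's values: [0, 1, 2]
```

The template's nested `{% for %}` loops mirror this structure: the outer loop walks the rows, the inner loop walks the values within each row.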

Retrieving an object in the template works like an ordinary for statement. First, `{% for fund in popular_user_funds %}` assigns each `mutual_fund_obj` to the `fund` variable. Then, as mentioned repeatedly, the `mutual_fund_obj` passed from the Portfolio model is related to MutualFund, so writing `fund.<field name>` gives you the field of the MutualFund model you want to extract, e.g. `fund.fund_name`.
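Outside Django, the same attribute-style access can be sketched with plain Python objects standing in for the MutualFund instances (the class and field names below are illustrative stand-ins, not the project's actual models):

```python
from dataclasses import dataclass

# Stand-in for the MutualFund model: three fields the template reads
@dataclass
class MutualFund:
    fund_name: str
    return_percent: float
    risk: float

# Stand-in for the queryset behind popular_user_funds
popular_user_funds = [
    MutualFund("Fund A", 3.2, 0.12),
    MutualFund("Fund B", 1.8, 0.08),
]

# The template's {% for fund in popular_user_funds %} loop corresponds
# to ordinary attribute access in Python
for fund in popular_user_funds:
    print(fund.fund_name, fund.return_percent, fund.risk)
```

In the template, `{{ fund.fund_name }}` performs the same lookup that `fund.fund_name` does here.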

reference

- Get row number and column number of pandas.DataFrame
- sort_values, sort_index to sort pandas.DataFrame, Series
- [Flask] Pass pandas.Series and pandas.DataFrame to the client side
- Convert pandas.DataFrame, Series and Python standard list to each other
- Frequent Pandas basic operations in data analysis
- Summary of Django database operations
- Django Document 1
- Django Document 2
- Django Reverse Lookup Cheat Sheet (QuerySet)
- Aggregate with Django
