[Python] How to add rows and columns to a table (pandas DataFrame)

[Python] How to add rows and columns to a table (pandas DataFrame)

Some ways to add new columns and rows to an created table.

There are other functions and methods that can be used, but for the time being, only the basics.


**table of contents**
  1. [Original table](# 1 Original table)
  2. [Add Column](# Add Column 2)
    1. df['A']
  3. Add Column (list) (#Add Column list)
  4. [Add column (number)](#Add column)
  5. [Add Column (Table)](#Add Column Table)
  6. [Add Column (Formula)](#Add Column Formula)
  7. [assign method](#assign method)
  8. [join method](#join method)
  9. [Add table with all same row names](# Add table with all same row names No duplicate column names)
  10. [Add table with non-existing row names](#Add table with non-existing row names)
  11. [When column names are duplicated](# When column names are duplicated)
  12. [When some column names are duplicated](#When some column names are duplicated)
  13. [concat function](#concat function sequence)
  14. [Join table](# join table column)
  15. [Presence / absence of axis = 1](presence / absence column of # axis1)
  16. [sort option](#sort option column)
  17. [join options](#join option column)
  18. [Add line](# 3 line added)
  19. [loc method](#loc method)
  20. [Add line (list)](# Add line list)
  21. [Add line (numerical value)](#Additional number of line)
  22. [Add Row (Table)](#Add Row Table)
  23. [Add Row (Formula)](#Add Row Formula)
  24. [concat function](#concat function line)
  25. [Join table](# join table row)
  26. Default
  27. [sort option](#sort option line)
  28. [join option](#join option line)

## 1. 1. Original table Add rows and columns based on a 3-by-3 table .. Stored in variable df.

image.png

Base table creation


row0 = [1, 2, 3]
row1 = [10, 20, 30]
row2 = [100, 200, 300]

df = pd.DataFrame([row0,row1,row2], columns=['col0','col1','col2'])
df.index = ['row0', 'row1', 'row2']

df

## 2. Add column Intuitively easy to use are `df ['A']` and ʻassign`.

・ Df ['A'] -Assign method ・ Join method ・ Concat function

■df['A'] df['A'] = B └ "df": Original table └ "A": Name of the column to add └ "B": What to add

The content to be added can be specified relatively freely. Numerical values, values, table data, formulas, lists, etc. can be used.

** ▼ Case Study ** ① Add column (list) ② Addition of columns (numerical value) ③ Add column (table) ④ Add column (formula)


#### ① Add column (list)

Add column (list)


df['col3'] = [4, 40, 400]
df

image.png


#### ② Addition of columns (numerical value)

Add column (number)


df['col3'] = 4
df

image.png


#### ③ Add column (table)

Add column (table)


df['col3'] = df['col2']
df

image.png


#### ④ Add column (formula)

Add column (formula)


df['col3'] = df['col2'] * 100
df

image.png


## ■ assign method You can also add by specifying the column name with the assign method. `df.assign(A=[a,b,,], B=[c,d,,],,)` └ "A" "B": Column name to add └ "a, b ,," "c, d ,,": Contents of each column

・ Not overwritten ・ Column names do not require quotation marks -If the element is specified by list, it must match the number of rows in the base table.


> Error example ・ There is a quotation mark in the column name SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

・ No column name TypeError: assign() takes 1 positional argument but 2 were given

・ The number of elements in list does not match ValueError: Length of values does not match length of index


** ▼ 1 column added **

1 row added


df.assign(A=[1,2,3])

image.png


** ▼ 1 column added (table data / formula) **

1 column added (table data / formula)


df.assign(A=df['col0']*100)

image.png


** ▼ 2 columns added **

2 columns added


df.assign(A=[1,2,3], B=100)

image.png

If you add "column name = contents", the number of columns will increase.


## ■ join method The join method can be used to join tables. It cannot be used for things that do not have a line number (index) such as list and numbers.

join(dfA, rsuffix='_a') └ "dfA": Table to join └ "rsuffix ='_a'": If the column name is duplicated, add "_a" (optional) to the added column name.


** ▼ Case Study ** Check the processing for each pattern of the table to be added.

(1) Add a table with the same row name (no duplicate column name) (2) Add a table with row names that do not exist ③ When the column name is duplicated ④ When some column names are duplicated


#### (1) Add a table with the same row name (no duplicate column name)

The row name matches the row name of the base table and the column name does not duplicate an existing one.

Table with the same row name


dfA = pd.DataFrame([100,200,300])
dfA.index = ['row0', 'row1', 'row2']
dfA

image.png

** ▼ Add **

add to


df.join(dfA)

image.png

A new column has been added.


#### (2) Add a table with row names that do not exist

If there is a row name that does not exist in the base table └ ** No new rows are added to the base table **. └ ** Data whose row names do not match will be NaN ** (missing value).


** ▼ Table to add ** Only row0 exists. XXX and YYY are not in the base table.

Table with row names that do not exist


dfB = pd.DataFrame([100,200,300])
dfB.index = ['row0', 'XXX', 'YYY']
dfB

image.png

** ▼ Add **

add to


df.join(dfB)

image.png


#### ③ When the column name is duplicated `join(dfA, rsuffix='_a')`

If the column name is duplicated, specify the last character to be added to the duplicate column name with "rsuffix ='_a'".

** ▼ Options ** -"Rsuffix ='_a'": ** Add the specified character to the duplicate column name ** to be added (abbreviation of right suffix)

-"Lsuffix ='_a'": ** Add the specified character to the existing duplicate column name ** (abbreviation of left suffix)


** If not specified, an error ** will occur.
** ▼ Table to add ** A table that contains the same column name (col0) as an existing table.

Duplicate column name


dfC = pd.DataFrame([100,200,300], columns=['col0'])
dfC.index = ['row0', 'row1', 'XXX']
dfC

image.png

** ▼ Add (for rsuffix) **

rsuffix


df.join(dfC, rsuffix='_@')

image.png


** ▼ Add (for lsuffix) **

rsuffix


df.join(dfC, lsuffix='_@')

image.png


#### ④ When some column names are duplicated The process is the same as when the column name in ③ is duplicated.

-Duplicate columns have the suffix specified in the column name. -Column names that do not overlap are added as they are.


** ▼ Table to add ** A table that contains the same column names (col0, col1) as an existing table.

Partially duplicate column names


list1 = [100,200,300]
list2 = ['A','B','C']
list3 = ['AAA','BBB','CCC']
list4 = ['10A','20B','30C']

dfD = pd.DataFrame([list1,list2, list3, list4], columns=['col0', 111,'col1'])
dfD.index = ['row0', 'row1', 'XXX', 'YYY']
dfD

image.png


** ▼ Add **

rsuffix


df.join(dfD, rsuffix='_@')

image.png

--The specified character string is added to the duplicate column name --Rows that do not exist in the base table are not added --In the column to be added, NaN will be entered in the row that did not exist


## ■ concat function (column) Join the tables using the concat function.

pd.concat([df, dfA], axis=1) └ "df": Base table └ "dfA": Table to add └ "axis = 1": Instruction to add column

-Add as it is even if the column name is duplicated -If the line name is different, add a new line (default: join ='outer') -With join ='inner' option, leave only those with matching line names


① Table to join ② Presence or absence of axis = 1 ③ sort option ④ join option
> ・ List and numerical values cannot be combined -TypeError: cannot concatenate object of type ''; only Series and DataFrame objs are valid

・ Meaning of "concat" concatenate: concatenate. connect.


### ① Table to join (column) Use the table "dfA" in 3 rows and 3 columns below.

image.png

-Columns "col0" and "col1" overlap with the base table. -The row "xxx" is not in the base table

Table to join


list1 = [1,100,'AAA']
list2 = [2,200,'BBB']
list3 = [3,300,'CCC']

dfE = pd.DataFrame([list1,list2, list3], columns=['col0', 'col1', 'aaa'])
dfE.index = ['row0', 'row1', 'XXX']
dfE

### ② Presence / absence of axis = 1 (column) `pd.concat([df, dfA], axis=1)`

** "axis = 1" is required when adding columns **. If it is not attached, it will be added in the row direction and the result will be significantly different.


** ▼ There is "axis = 1" **

「axis=1 ”Yes


pd.concat([df, dfA], axis=1)

image.png

・ Add columns -Add as it is even if the column name is duplicated (different from join method) -The value of the row that is not in the table to be added is NaN. -Added new rows that are not in the base table.


** ▼ "axis = 1" None ** The form in which "axis = 0" is omitted. It will be combined in the row direction.

「axis=1 "none


pd.concat([df, dfA])

image.png

-If the column names do not match, a new column is added. -Even if the line name is duplicated, all are newly added.

** * Columns are added, but the rows are not integrated, so the content is different from what you want to execute. ** **


### ③ sort option (column) If you join the tables without the sort option, they will be automatically sorted by row name. └ Default: `sort = True`
** ▼ sort = True (default) **

sort=True


pd.concat([df, dfA], axis=1)

image.png

The line "XXX" comes up with automatic sorting.


**▼sort=False**

sort=False


pd.concat([df, dfA], axis=1, sort=False)

image.png

If you write "sort = False", the added line will be added at the end.


** ▼ When the base table is replaced **

sort=False


pd.concat([dfA, df], axis=1, sort=False)

image.png

A line that does not exist in the base "dfA" is added to the end.


### ④ join option (column) Determine line processing (when axis = 1)

join='outer' └ Default setting. └ Leave a line that does not exist.

join='inner' └ Leave only duplicate lines.


**▼join='outer'**

join='outer'


pd.concat([dfA, df], axis=1)

image.png

The line "XXX" that does not exist before and after the merge is added.


**▼join='inner'**

join='outer'


pd.concat([dfA, df], axis=1, join='inner')

image.png

Deleted rows "row2" and "XXX" that exist only in either one.


## 3. 3. Add line The loc method is intuitively easy to use.

・ Loc method ・ Concat function

■ loc method

df.loc['A']=B └ "df": Original table └ "A": Name of the line to add └ "B": What to add

  • If you specify an existing line name, it will be overwritten.

The content to be added can be specified relatively freely. Numerical values, values, table data, formulas, lists, etc. can be used.

① Add line (list) ② Add line (numerical value) ③ Add row (table) ④ Add line (formula)


#### ① Add line (list)

Add line (list)


df.loc['AAA'] = [4, 40, 400]
df

image.png


#### ② Add line (numerical value)

Add line (number)


df.loc['AAA'] = 4
df

image.png


#### ③ Add row (table)

Add row (table)


df.loc['AAA'] = df.loc['row2']
df

image.png image.png


#### ④ Add line (formula)

Add line (formula)


df.loc['AAA'] = df.loc['row2'] * 100
df

image.png


## ■ concat function (line) Join the tables using the concat function.

pd.concat([df, dfA]) └ "df": Base table └ "dfA": Table to add

-Add as it is even if the line name is duplicated -Add a new column if the column name is different (default: join ='outer') -With join ='inner' option, leave only those with matching column names


① Table to join ② Default ③ sort option ④ join option
> ・ List and numerical values cannot be combined -TypeError: cannot concatenate object of type ''; only Series and DataFrame objs are valid

・ Meaning of "concat" concatenate: concatenate. connect.


### ① Table to join (row) Use the table "dfA" in 2 rows and 3 columns below.

image.png

Table to join


list1 = [1,100,'AAA']
list2 = [2,200,'BBB']

dfA = pd.DataFrame([list1,list2], columns=['col0', 'col1', 'aaa'])
dfA.index = ['row0', 'XXX']
dfA

#### ② Default `pd.concat([df, dfA])`

Default


pd.concat([df, dfA])

image.png

-Even if the line name is duplicated, all are newly added. -A new column is added for columns that do not match. ・ Not applicable cells are filled with NaN (missing value)


### ③ sort option (row) If you join the tables without the sort option, they will be automatically sorted by column name. └ Default: `sort = True`
** ▼ sort = True (default) **

sort=True


pd.concat([df, dfA])

image.png

The "aaa" column added later comes to the top by automatic sorting.


**▼sort=False**

sort=False


pd.concat([df, dfA], sort=False)

image.png

The "aaa" column added later is joined at the end.


### ④ join option (line) Decide how to handle the columns.

join='outer' └ Default setting. └ Leave a column that does not exist.

join='inner' └ Leave only duplicate columns.


**▼join='outer'**

join='outer'(Default)


pd.concat([df, dfA])

image.png

The column "aaa" that does not exist in the table before and after the join remains.


**▼join='inner'**

join='inner'


pd.concat([df, dfA], join='inner')

image.png

Only the columns that exist in both the tables before and after the join remain.


Back to top (#python how to add rows and columns to a table pandas dataframe)

Recommended Posts