It's 2020, so I'm going to stop VBA and use Pandas & python to manage Excel in the company. The following is for those who have read the introductory Python article, so detailed explanations are omitted. There is a lot of information about Pandas and python on the web, so if you're a good boy who wants to try Pandas & python, if you have a problem, check it out.
In a so-called user company where many people know the name in TV commercials, the query in Heisei's ancient Acesss issues a query to some database and then files containing personal information. Output to Excel ... After that, there is office work such as cutting and pasting locally and using it for some other purpose. In the case of listed companies, it is often said that such work is compliance-wise (an example of compliance-like request is to clarify who has seen personal information). In such a case, if you hurry up and run Access on Windows on a server such as AWS and complete the processing on the server side of the CSV output by scheduled execution, compliance is okay for the time being. In AWS, I run a script in Systems Manager and put a file in S3 ... etc. However, in a company with old Access, there is a VB script attached to it, VBA, which is painful, so it is said that it should be replaced with Pandas & python.
$ pip install xlwt
$ pip install openpyxl
If you are wondering what pnp is, please google various things from around nkmk here, the iron plate that you do not remove in the Japanese explanation of Pandas & python:
https://note.nkmk.me/python-pandas-to-excel/
It would be convenient if you could read any one of the Excel sheets quickly. With Pandas, you can quickly convert an Excel sheet to something called DataFrame (df).
n=0 #In pandas, 0 from the right of the Excel sheet,1,2...Specify.
#Example of reading the nth sheet with the file name target
df= pd.read_excel(target, sheet_name=n)
, The Excel sheet is read as a DataFrame (df) where you can extract only a part of it, or delete or add a specific column.
Like this. Note that for serial numbers, Excel is a culture that starts at 1 and pandas is a culture that starts at 0. Please specify the column name you want to assign serial numbers as appropriate.
# new_index_col column(The default value is"Serial number")に1からSerial numberを付与する関数
def getPartDf(df,start, step=5, debug= True, new_index_col="Serial number"):
dfx = df[start:start+step].reset_index()
dfx[new_index_col] = dfx.index +1 #The serial number is given again. If you set this to something other than 1,
dfy = dfx.set_index(new_index_col)
return dfy if debug else dfy.drop("index", axis=1)
If you use a DataFrame read from a ridiculous excel file (like me), you'll want to make sure you've read it, so we also have a debug mode that keeps the original index (Note).
Use the range function to rotate the loop at regular steps (step_num).
core_name = """File after cutting out"""
step_num= 10000
debug = False
# step_Processing to cut out Excel lines in separate files by num.
for i in range(0, len(df), step_num):
print(f"{i}that's all{i+step_num}Less than")
savedDf = getPartDf(df,i, step_num, debug)
if (debug):
print(savedDf.head())
fn = f"{out}/{core_name}{i}_{i+step_num -1}.csv"
savedDf.to_csv(fn, encoding="cp932")
Fractions are handled nicely by pandas.
If debug = False
Screen output:
0 or more and less than 10000
10000 or more and less than 20000
20000 or more and less than 30000
30,000 or more and less than 40,000
40,000 or more and less than 50,000
50,000 or more and less than 60,000
60,000 or more and less than 70,000
(Abbreviation)
len 836543
Part of the output excel file (what if the extension is not displayed ...):
By the way, it takes a lot of time to process with a big Excel partner. This article is also said to have been written during that waiting time. ..
Recommended Posts