Python3 Engineer Certification Data Analysis Exam Self-made Questions

Overview

This is a collection of self-made questions that I wrote as one of my study methods for the Python 3 Engineer Certification Data Analysis Exam, which I took in November 2020. I hope it helps those who are going to take the exam.

My report on taking the exam is summarized in this article: https://qiita.com/pon_maeda/items/a6c008fb3d993278fccb

Important notes

- These questions are written in short-answer and fill-in-the-blank form so that you can work through them easily in spare moments.
- **Please note that the actual exam is in a four-choice format (as of November 15, 2020).**
- **They are a little more difficult than the actual exam.**
- Since they were written roughly for personal use, some of them may not be well-formed as question statements. Please forgive me.

Exercise books

1. Role of data analysis engineer

Machine learning is broadly divided into three types: () learning, () learning, and () learning.

Answer
- Supervised learning
- Unsupervised learning
- Reinforcement learning

The () variable, also known as the correct label, is used only for () learning.

Answer
- Objective variable
- Supervised learning

The method used when this correct label is a continuous value is (), and the method used when it is any other value is ().

Answer
- Continuous value: Regression
- Other values: Classification

What are the two main methods of unsupervised learning?

Answer
- Clustering
- Dimensionality reduction

2. Python and environment

venv is a tool that allows you to use different versions of Python. (Yes / No)

Answer
No. venv creates virtual environments on top of an existing Python installation, so it cannot manage the version of Python itself.

A function that allows you to specify a file name with a wildcard in Python.

Answer
glob function
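
As a minimal sketch of wildcard matching with `glob.glob`, the example below creates a temporary directory and picks out only the CSV files. The file names are hypothetical and exist only for illustration.

```python
import glob
import os
import tempfile

# Create a throwaway directory containing a few empty files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["a.csv", "b.csv", "notes.txt"]:
        open(os.path.join(tmp_dir, name), "w").close()

    # "*" matches any run of characters, so only the .csv files are returned.
    csv_paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
    csv_names = [os.path.basename(path) for path in csv_paths]

print(csv_names)  # ['a.csv', 'b.csv']
```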

3. Foundations of mathematics

How are sin, cos, and tan read?

Answer
- sin: sine
- cos: cosine
- tan: tangent

What is the value of Napier's constant?

Answer
2.7182…

What is the logarithm of 1?

Answer
0

What is the factorial of 1?

Answer
1

Suppose a six-sided die is rolled once and, although the number rolled is unknown, you are told that it is odd. The probability in this situation is called the () probability, which is the basis of the () theorem.

Answer
- Conditional probability
- Bayes' theorem
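
To make the die example concrete, here is a small sketch that computes the probability of having rolled a 1 given that the roll is odd, first from the definition of conditional probability and then through Bayes' theorem. The choice of "rolled a 1" as the event of interest is an assumption for illustration.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
odd_faces = [face for face in faces if face % 2 == 1]

# P(B): the roll is odd.
p_odd = Fraction(len(odd_faces), len(faces))
# P(A and B): the roll is 1 (which is also odd).
p_one_and_odd = Fraction(1, len(faces))

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_one_given_odd = p_one_and_odd / p_odd

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_one = Fraction(1, len(faces))
p_odd_given_one = Fraction(1, 1)  # a 1 is always odd
p_bayes = p_odd_given_one * p_one / p_odd

print(p_one_given_odd, p_bayes)  # 1/3 1/3
```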

4. Practice of analysis by library

4.1. NumPy

4.1.1. Overview of NumPy

NumPy has a type for arrays () and a type for matrices ().

Answer
- For arrays: ndarray
- For matrices: matrix

* In the data analysis exam, ndarray plays the leading role.

Can the above (ndarray) hold elements of multiple types, or must all elements have a single type?

Answer
Must be one type. This is the difference from DataFrame.

4.1.2. Handle data with NumPy

How to check the shape (size along each dimension) of an array

Answer
shape attribute (for example a.shape; it is an attribute, not a function)

The ravel function returns (), while the flatten function returns ().

Answer
- ravel function: returns a reference (a view of the original data, where possible)
- flatten function: returns a (deep) copy
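
The difference can be checked with a short sketch: writing through the result of `ravel` changes the original array, while writing to the result of `flatten` does not. The array values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

r = a.ravel()    # a view of a (for a contiguous array like this one)
f = a.flatten()  # always an independent copy

r[0] = 100  # writes through to a
f[1] = 200  # leaves a untouched

print(a)                       # [[100   2] [  3   4]]
print(np.shares_memory(a, r))  # True
print(np.shares_memory(a, f))  # False
```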

How to check the element type of an array

Answer
dtype attribute

Method to convert the element type of an array

Answer
astype method

A function that generates a uniform random number of integers

Answer
np.random.randint function

* Values are generated in the range from {{first argument}} (inclusive) to {{second argument}} (exclusive).
* If you pass a tuple as the third argument, an array of that shape is generated.

A function that generates uniform random floating-point numbers

Answer
np.random.uniform function

* The arguments are the same as for the np.random.randint function.

A function that generates random numbers from the standard normal distribution

Answer
np.random.randn function

The standard normal distribution is the distribution with mean () and variance ().

Answer
Distribution of mean 0, variance 1

What is the function to generate a normal distribution random number by specifying the mean and standard deviation?

Answer
np.random.normal function
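
The four random-number functions above can be compared side by side. This is a sketch only: the seed, ranges, shapes, and the mean of 50 with standard deviation 10 are arbitrary values chosen for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

# Integers in [0, 10), returned as a 2x3 array.
ints = np.random.randint(0, 10, (2, 3))
# Floating-point numbers in [0.0, 1.0), returned as a 2x3 array.
floats = np.random.uniform(0.0, 1.0, (2, 3))
# 1000 samples from the standard normal distribution (mean 0, variance 1).
std_normal = np.random.randn(1000)
# 1000 samples from a normal distribution with mean 50 and standard deviation 10.
normal = np.random.normal(50, 10, 1000)

print(ints.shape, floats.shape)
print(std_normal.mean(), normal.mean())
```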

A function that creates an identity matrix of the specified size

Answer
np.eye function

np.eye(3) produces the following:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

A function that creates an array of specified values for all elements

Answer
np.full function

Example: np.full((2, 4), np.pi)

A function that creates an evenly divided array in a specified range

Answer
np.linspace function

Example: np.linspace(0, 1, 5)  # → array([0., 0.25, 0.5, 0.75, 1.])
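
The three array-creation functions above can be tried together in a short sketch, using the same arguments as the examples in the answers.

```python
import numpy as np

identity = np.eye(3)             # 3x3 identity matrix
filled = np.full((2, 4), np.pi)  # 2x4 array where every element is pi
points = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1, both ends included

print(identity)
print(filled.shape)  # (2, 4)
print(points)        # [0.   0.25 0.5  0.75 1.  ]
```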

A function that returns the differences between adjacent elements of an array

Answer
np.diff function

a = [1, 2, 3]
b = [4, 5, 6]
np.concatenate([a, b])

Which of the following is the result of the code above?

  1. [1, 2, 3, 4, 5, 6]
  2. [[1, 2, 3],[4, 5, 6]]
  3. [1, 2, 3, [4, 5, 6]]
Answer
1. `[1, 2, 3, 4, 5, 6]`

When concatenating one-dimensional arrays, the np.concatenate function concatenates in the (row or column) direction.

Answer
Connected in the column direction. (Same behavior as hstack function)

When concatenating two-dimensional arrays, the np.concatenate function concatenates in the (row or column) direction by default.

Answer
Concatenated in the row direction. (Same behavior as vstack function)

If axis=1 is specified for this function, the arrays are concatenated in the () direction.

Answer
Connected in the column direction. (Same behavior as hstack function)
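
The three concatenation cases above can be confirmed with a short sketch. The arrays `m` and `n` are hypothetical examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
flat = np.concatenate([a, b])  # one-dimensional: [1 2 3 4 5 6]

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
rows = np.concatenate([m, n])          # default axis=0: stacked as rows, shape (4, 2)
cols = np.concatenate([m, n], axis=1)  # axis=1: placed side by side, shape (2, 4)

print(flat)
print(rows.shape, cols.shape)  # (4, 2) (2, 4)
```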

A function that splits a two-dimensional array in the column direction

Answer
np.hsplit function

Example: first, second = np.hsplit(hoge_array, [2])  # → split before the 3rd column (index 2)

A function that splits a two-dimensional array in the row direction

Answer
np.vsplit function

Example: first, second = np.vsplit(hoge_array, [2])  # → split before the 3rd row (index 2)
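
A minimal sketch of both splits follows. The 3x4 contents of `hoge_array` are an assumption made here for illustration.

```python
import numpy as np

hoge_array = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array

# Split before column index 2: columns 0-1 and columns 2-3.
left, right = np.hsplit(hoge_array, [2])
# Split before row index 2: rows 0-1 and row 2.
top, bottom = np.vsplit(hoge_array, [2])

print(left.shape, right.shape)  # (3, 2) (3, 2)
print(top.shape, bottom.shape)  # (2, 4) (1, 4)
```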

What does transpose of a two-dimensional array mean?

Answer
Swap rows and columns

If you have a two-dimensional array called a, how do you transpose it?

Answer
a.T

What lets you add a dimension to a one-dimensional array without specifying the number of elements?

Answer
np.newaxis (a constant used inside an index expression, not a function)

* If you can specify the number of elements, you can also use the reshape function.

a = np.array([1, 5, 4])
# array([[1, 5, 4]])

How can np.newaxis be used to turn a into the two-dimensional array shown in the comment above?

Answer
a[np.newaxis, :]

a = np.array([1, 5, 4])
# array([[1],
#        [5],
#        [4]])

How can np.newaxis be used to turn a into the two-dimensional array shown in the comment above?

Answer
a[:, np.newaxis]
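
Both uses of `np.newaxis` can be checked by looking at the resulting shapes. The `reshape` calls are included only as the equivalent form for when the number of elements is known.

```python
import numpy as np

a = np.array([1, 5, 4])

row = a[np.newaxis, :]  # new leading axis: shape (1, 3)
col = a[:, np.newaxis]  # new trailing axis: shape (3, 1)

# Equivalent results with reshape, which needs the element count (or -1).
row_reshaped = a.reshape(1, 3)
col_reshaped = a.reshape(-1, 1)

print(row.shape, col.shape)  # (1, 3) (3, 1)
```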

What is the function that generates the grid data?

Answer
np.meshgrid function

np.arange(1, 10, 3)

What is the result of the code above?

Answer
array([1, 4, 7])

Values are generated from 1 (inclusive) up to 10 (exclusive), that is up to 9, in steps of 3.

4.1.3. NumPy features

What is the group of NumPy convenience functions, such as sin() and log(), that transform all elements of an array at once?

Answer
Universal functions (ufuncs)

A function that returns the absolute value of an array element

Answer
np.abs function

a = np.array([0, 1, 2])
b = np.array([[-3, -2, -1],
              [0, 1, 2]])
a + b

What is the result of adding the one-dimensional array to the two-dimensional array above?

Answer
array([[-3, -1, 1], [0, 2, 4]])

a is added to each row of b, as if a had been repeated to form two rows.

What is the mechanism called that makes operations like this, or between an array and a scalar, possible?

Answer
Broadcasting
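
As a sketch of what broadcasting does, the example below repeats `a` explicitly with `np.tile` and confirms that the result matches the plain `a + b`. The use of `np.tile` and the scalar multiplication are additions for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([[-3, -2, -1],
              [0, 1, 2]])

# What broadcasting does implicitly: treat a as if it were repeated for each row of b.
explicit = np.tile(a, (2, 1)) + b
broadcast = a + b

# A scalar is broadcast to every element in the same way.
scaled = b * 10

print(broadcast)  # [[-3 -1  1] [ 0  2  4]]
```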

What does the @ operator mean?

Answer
Infix operator for matrix multiplication

A_matrix @ B_matrix

Write the above in a different way.

Answer
np.dot(A_matrix, B_matrix) or A_matrix.dot(B_matrix)
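
The three spellings give the same result for two-dimensional arrays, as this sketch shows. The matrix values are arbitrary.

```python
import numpy as np

A_matrix = np.array([[1, 2], [3, 4]])
B_matrix = np.array([[5, 6], [7, 8]])

product = A_matrix @ B_matrix
product_dot = np.dot(A_matrix, B_matrix)
product_method = A_matrix.dot(B_matrix)

print(product)  # [[19 22] [43 50]]
```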

A function that counts the number of True values in a Boolean array.

Answer
np.count_nonzero function, or the np.sum function

- np.count_nonzero function
  - Returns the number of non-zero elements.
  - Python treats False as 0, so this counts the number of True values.
- np.sum function
  - Adds up the elements.
  - Python treats True as 1, so the result is the number of True values.

A function that checks whether a Boolean array contains at least one True.

Answer
np.any function

A function that checks whether all elements of a Boolean array are True.

Answer
np.all function
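
The four functions can be compared on one Boolean array. The values and the condition `values > 0` are hypothetical.

```python
import numpy as np

values = np.array([3, -1, 4, -1, 5])
mask = values > 0  # [ True False  True False  True]

true_count = np.count_nonzero(mask)  # 3
true_sum = np.sum(mask)              # 3, because True counts as 1
has_true = np.any(mask)              # True: at least one element is True
all_true = np.all(mask)              # False: not every element is True

print(true_count, true_sum, has_true, all_true)
```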

4.2. pandas

4.2.1. Overview of pandas

By default, df.head() and df.tail() output only the first and last () rows of the DataFrame.

Answer
5 rows

How to check the size (number of rows and columns) of df

Answer
df.shape (an attribute)

How to get two columns, A and B, from df

Answer
`df[["A", "B"]]` or `df.loc[:, ["A", "B"]]` etc.

4.2.2. Reading / writing data

4.2.3. Data shaping

How to extract only the records with 10,000 steps or more, assuming df is a DataFrame of step counts and calories consumed

Answer
`df[df["steps"] >= 10000]`

Or `df[df.loc[:, "steps"] >= 10000]`, `df.query('steps >= 10000')`, etc.

How to sort in descending order of steps, assuming df is a DataFrame of step counts and calories consumed

Answer
`df.sort_values(by="steps", ascending=False)`
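
Filtering and sorting can be tried on a small DataFrame. This is a sketch only: the `calories` column name and all the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: daily step counts and calories consumed.
df = pd.DataFrame({
    "steps": [12000, 8000, 15000, 5000],
    "calories": [2100, 1900, 2500, 1800],
})

# Two equivalent ways to keep only rows with 10,000 steps or more.
over_10000 = df[df["steps"] >= 10000]
over_10000_query = df.query("steps >= 10000")

# Sort by steps, largest first.
sorted_df = df.sort_values(by="steps", ascending=False)

print(over_10000)
print(sorted_df)
```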

One-hot encode the exercise index column, which contains the three values High, Mid, and Low, adding "exercise" as the prefix.

Answer
`pd.get_dummies(df.loc[:, "exercise index"], prefix="exercise")`
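
Note that `get_dummies` is a top-level pandas function, called as `pd.get_dummies`. A minimal sketch with a hypothetical four-row column:

```python
import pandas as pd

# Hypothetical column holding the three categories.
df = pd.DataFrame({"exercise index": ["High", "Low", "Mid", "High"]})

dummies = pd.get_dummies(df.loc[:, "exercise index"], prefix="exercise")

# One column per category, named "<prefix>_<value>".
print(list(dummies.columns))  # ['exercise_High', 'exercise_Low', 'exercise_Mid']
```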

4.2.4. Time series data

How to create an array of dates from 2020-01-01 to 2020-10-01.

Answer
`pd.date_range(start="2020-01-01", end="2020-10-01")`

Create an array of dates for 100 days from 2020-01-01.

Answer
`pd.date_range(start="2020-01-01", periods=100)`

Create an array of only the Saturdays among the dates from 2020-01-01 to 2020-10-01.

Answer
`pd.date_range(start="2020-01-01", end="2020-10-01", freq="W-SAT")`
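
The three calls can be checked by inspecting their lengths and end points. Note that the keyword for the number of dates is `periods` (plural).

```python
import pandas as pd

# Every day from 2020-01-01 to 2020-10-01, both ends included.
full = pd.date_range(start="2020-01-01", end="2020-10-01")
# 100 consecutive days starting on 2020-01-01.
hundred_days = pd.date_range(start="2020-01-01", periods=100)
# Only the Saturdays in the same range as `full`.
saturdays = pd.date_range(start="2020-01-01", end="2020-10-01", freq="W-SAT")

print(len(full))         # 275
print(hundred_days[-1])  # 2020-04-09 00:00:00
print(saturdays[0])      # 2020-01-04 00:00:00
```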

Group the time series data df by month and take the mean.

Answer
`df.groupby(pd.Grouper(freq='M')).mean()`

Or `df.resample('M').mean()` etc.

4.2.5. Missing value processing

Argument used with the fillna function when you want to fill NaN with the previous value.

Answer
`df.fillna(method='ffill')`

For a DataFrame, each NaN is filled with the value one row above. With bfill, it is filled with the value one row below.

How do you fill missing values with the median using the fillna function?

Answer
`df.fillna(df.median())`

* Note that it is not `method='median'`.
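
The three ways of filling can be compared on a small column. This sketch makes one substitution: it uses `df.ffill()` and `df.bfill()`, which give the same result as `fillna(method='ffill')` and `fillna(method='bfill')`, because newer pandas releases deprecate the `method` argument of `fillna`. The data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"steps": [1000.0, np.nan, 3000.0, np.nan, 8000.0]})

forward = df.ffill()                # each NaN takes the value one row above
backward = df.bfill()               # each NaN takes the value one row below
by_median = df.fillna(df.median())  # each NaN takes the column median (3000.0)

print(forward["steps"].tolist())    # [1000.0, 1000.0, 3000.0, 3000.0, 8000.0]
print(backward["steps"].tolist())   # [1000.0, 3000.0, 3000.0, 8000.0, 8000.0]
print(by_median["steps"].tolist())  # [1000.0, 3000.0, 3000.0, 3000.0, 8000.0]
```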

4.2.6. Data consolidation

Create df_merge by concatenating df_1 and df_2 in the column direction.

Answer
df_merge = pd.concat([df_1, df_2], axis=1)
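
A minimal sketch of column-direction concatenation, with two hypothetical single-column DataFrames that share the same index:

```python
import pandas as pd

df_1 = pd.DataFrame({"steps": [12000, 8000]})
df_2 = pd.DataFrame({"calories": [2100, 1900]})

# axis=1 places the DataFrames side by side, aligned on the index.
df_merge = pd.concat([df_1, df_2], axis=1)

print(df_merge.shape)          # (2, 2)
print(list(df_merge.columns))  # ['steps', 'calories']
```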

4.2.7. Handling of statistical data

Function to check the mode

Answer
mode function

Function that gives the median

Answer
median function

A function that yields the standard deviation (sample standard deviation)

Answer
std function

Functions and arguments that give the standard deviation (population)

Answer
Pass the ddof = 0 argument to the std function
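
The statistics functions above can be tried on a small Series. The values are chosen for illustration so that the population standard deviation comes out to exactly 2.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

mode_values = s.mode()          # most frequent value(s): 4
median_value = s.median()       # 4.5
sample_std = s.std()            # default ddof=1: divides by n - 1
population_std = s.std(ddof=0)  # ddof=0: divides by n

print(mode_values.tolist(), median_value)
print(sample_std, population_std)
```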

4.3. Matplotlib

Where does a pie chart conventionally start?

Answer
From the top (the 12 o'clock position)

In which direction (clockwise or counterclockwise) are the slices of a pie chart conventionally arranged?

Answer
Clockwise

For pie charts, pass the () argument to the () method to draw the slices clockwise.

Answer
Pass `counterclock=False` to the `pie` method. The default is `counterclock=True`, the reverse of the usual convention. I don't know why (lol).

To specify where drawing starts in a pie chart, pass the () argument to the () method.

Answer
Pass `startangle={{angle at which you want to start drawing}}` to the `pie` method. With the default value (None at the time of writing), drawing starts from the 3 o'clock position. Specifying 90 degrees makes it start from 12 o'clock.
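
Putting the two arguments together, the following sketch draws a pie chart that starts at 12 o'clock and proceeds clockwise. The sizes and labels are hypothetical, and the non-interactive `Agg` backend is assumed so that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

sizes = [50, 30, 20]
labels = ["A", "B", "C"]

fig, ax = plt.subplots()
# startangle=90 starts at 12 o'clock; counterclock=False draws clockwise.
ax.pie(sizes, labels=labels, startangle=90, counterclock=False)
ax.axis("equal")  # keep the pie circular

wedge_count = len(ax.patches)  # one wedge per value
plt.close(fig)
```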

4.4. scikit-learn

4.4.1. Preprocessing

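Missing values

Which class is used to fill in the data if there are missing values?

Answer
Imputer class (removed in scikit-learn 0.22; its replacement is the SimpleImputer class in sklearn.impute)

What does each value passed to the strategy argument of the above class mean?

mean = (1), median = (2), most_frequent = (3)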

Answer
1. Mean
2. Median
3. Mode
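
Because the Imputer class is no longer available in recent scikit-learn releases, this sketch assumes its successor, `sklearn.impute.SimpleImputer`, which takes the same `strategy` values. The data is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Two feature columns, each with one missing value.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

# strategy can be "mean", "median", or "most_frequent".
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)  # the NaNs become 3.0 (column 0 mean) and 20.0 (column 1 mean)
```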

Categorical variable encoding

What is the class that encodes categorical variables?

Answer
LabelEncoder class

What is the attribute that confirms the original value after encoding?

Answer
.classes_ attribute
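
A short sketch of label encoding and the `classes_` attribute, using blood types as hypothetical categories:

```python
from sklearn.preprocessing import LabelEncoder

blood_types = ["A", "B", "O", "AB", "A"]

encoder = LabelEncoder()
codes = encoder.fit_transform(blood_types)

# classes_ holds the original values, sorted; each code is an index into it.
print(list(encoder.classes_))  # ['A', 'AB', 'B', 'O']
print(codes.tolist())          # [0, 2, 3, 1, 0]
```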

Besides the encoding of categorical variables above, what is the other major processing method?

Answer
`One-hot encoding`

If there are 4 blood types, 4 columns are added, each acting as a flag.

What is another name for this encoding?

Answer
Conversion to dummy variables

What do you call a matrix in which most components are 0, and a matrix in which most components are non-zero?

Answer
Sparse matrix and dense matrix

Feature normalization

Variance normalization (standardization) is the process of converting features so that the mean of each feature is () and the standard deviation is ().

Answer
The feature's `mean is 0` and its `standard deviation is 1`.

What is the class that performs variance normalization?

Answer
StandardScaler class

Min-max normalization is the process of converting features so that the minimum value of each feature is () and the maximum value is ().

Answer
The feature's `minimum value is 0` and its `maximum value is 1`.

What is the class that performs minimum / maximum normalization?

Answer
MinMaxScaler class
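
Both normalizations can be checked on a single hypothetical feature column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

standardized = StandardScaler().fit_transform(X)  # mean 0, standard deviation 1
min_max = MinMaxScaler().fit_transform(X)         # minimum 0, maximum 1

print(standardized.mean(), standardized.std())
print(min_max.min(), min_max.max())
```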

4.4.2. Classification

Classification is a typical task of () learning.

Answer
Supervised learning

Classification uses known data as the teacher and learns a model that assigns each data point to a class.

The above uses the correct label, which is called the () variable.

Answer
Objective variable

Three typical classification algorithms

Answer
- Support vector machine
- Decision tree
- Random forest

Flow of classification model construction

To build a classification model, the data at hand is first ().

Answer
Divided into a training dataset and a test dataset.

"Learning" in classification refers to building a classification model using the () dataset.

Answer
Training dataset

What is the ability to handle unknown data, evaluated from the constructed model's predictions on the test dataset, called?

Answer
Generalization ability

What is the function that splits the data into these two datasets?

Answer
model_selection.train_test_split function

scikit-learn uses the () method for learning and the () method for prediction.

Answer
- Learning: fit method
- Prediction: predict method
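
The flow of splitting, learning, and predicting can be sketched as follows. The iris dataset bundled with scikit-learn, the SVC classifier, the 30% test size, and the random seed are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data as the test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = SVC()
model.fit(X_train, y_train)     # learning on the training dataset
y_pred = model.predict(X_test)  # prediction on the test dataset

# Share of correct predictions on data the model has not seen.
accuracy = (y_pred == y_test).mean()
print(accuracy)
```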

Support vector machine

Support vector machines are algorithms that can be used not only for classification and regression, but also for ().

Answer
Outlier detection

When considering 2D data belonging to two classes, what is the data closest to the boundary among the data of each class?

Answer
Support vector

When considering 2D data belonging to two classes, a straight line called the () is drawn so that its distance from the support vectors is as () as possible.

Answer
- Decision boundary
- Large (far)

The distance between this straight line and the support vector is called ().

Answer
Margin

Random forest

What is the data made of randomly selected samples and features (explanatory variables), used in random forests, called?

Answer
Bootstrap data

A random forest is a collection of decision trees. What is learning with multiple learners in this way called?

Answer
Ensemble learning

4.4.3. Regression

Regression is the task of explaining the () variable with () variables, which are represented by features.

Answer
- Objective variable
- Explanatory variable

In linear regression, when there is one explanatory variable, it is called (), and when there are two or more, it is called ().

Answer
- Simple regression
- Multiple regression
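
Simple and multiple regression differ only in the number of columns passed as explanatory variables. This sketch fits `LinearRegression` to noise-free synthetic data, so the known coefficients should be recovered. The coefficients, sample size, and seed are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple regression: one explanatory variable, y = 2x + 1.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0
simple = LinearRegression().fit(x, y)

# Multiple regression: two explanatory variables, y = 2*x1 + 3*x2 + 1.
rng = np.random.default_rng(0)
X_multi = rng.random((20, 2))
y_multi = 2.0 * X_multi[:, 0] + 3.0 * X_multi[:, 1] + 1.0
multiple = LinearRegression().fit(X_multi, y_multi)

print(simple.coef_, simple.intercept_)
print(multiple.coef_, multiple.intercept_)
```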

4.4.4. Dimensionality reduction

A task that () data while preserving as much of the information it carries as possible.

Answer
Compresses

Principal component analysis

In scikit-learn, which class of which module is used for principal component analysis?

Answer
decomposition.PCA class
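
A minimal sketch of principal component analysis with `sklearn.decomposition.PCA`, reducing four features to two. The iris dataset and the choice of two components are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # compressed to 2 features

# Share of the original variance kept by the two components.
kept_variance = pca.explained_variance_ratio_.sum()

print(X.shape, X_2d.shape)  # (150, 4) (150, 2)
print(kept_variance)
```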

4.4.5. Model evaluation

Category classification accuracy

Four indicators that quantify how well data has been assigned to categories.

(), (), () value, and ()

Answer
- Precision
- Recall
- F-value
- Accuracy

In addition, these indicators are calculated from the () matrix.

Answer
Confusion matrix

There is a trade-off between () and ().

Answer
- Precision
- Recall
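
The relationship between the confusion matrix and the four indicators (precision, recall, F-value, accuracy) can be checked with `sklearn.metrics`. The labels below are hypothetical and chosen so that every indicator comes out to 0.75.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# Rows are true classes (0, 1), columns are predicted classes (0, 1).
cm = confusion_matrix(y_true, y_pred)  # [[TN, FP], [FN, TP]] = [[3, 1], [1, 3]]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f_value = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples

print(cm)
print(precision, recall, f_value, accuracy)
```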

Prediction probability accuracy

The () curve and () calculated from it are used as indicators to quantify the accuracy of the prediction probability for the data.

Answer
- ROC curve
- AUC
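
A small sketch of computing the ROC curve and the AUC from predicted probabilities. The labels and scores are hypothetical.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1

# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
# AUC is the area under that curve; 1.0 is perfect, 0.5 is random guessing.
auc = roc_auc_score(y_true, y_score)

print(auc)  # 0.75
```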

4.4.6. Hyperparameter optimization

Hyperparameter values are (determined / not determined) during training.

Answer
Not determined. The user needs to specify the values separately from training.

Two typical methods for optimizing hyperparameters.

Answer
- Grid search
- Random search
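
Both methods are available in scikit-learn as `GridSearchCV` and `RandomizedSearchCV`. The sketch below assumes an SVC classifier on the iris dataset; the candidate values, the 3-fold cross-validation, and the seed are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values: 3 x 2 = 6 combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Grid search tries every combination.
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X, y)

# Random search tries only n_iter randomly chosen combinations.
random_search = RandomizedSearchCV(
    SVC(), param_grid, n_iter=3, cv=3, random_state=0
)
random_search.fit(X, y)

print(grid.best_params_)
print(random_search.best_params_)
```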

In closing

These questions are rough, but I hope they help someone. If you find any mistakes, I would be grateful if you could point them out in the comments. Thank you for reading to the end.
