So far I have introduced [various data plotting with pandas + matplotlib](http://qiita.com/ynakayama/items/68eff3cb146181329b48) and data visualization methods with matplotlib (+ pandas).
This time, let's step back and organize the whole flow, from extracting and processing the data through to visualizing it.
First, bring the dataset into the world of pandas. There are two main routes:

1. Read structured data from an external file
2. Convert data generated by your own Python code
Route 1 is used when structured data that can be used as is already exists in an external file. For example, given a file called iris.csv, turn it into a pandas object as follows:
```python
import pandas as pd

df = pd.read_csv("iris.csv")
```
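Since iris.csv itself is not included here, the following sketch feeds read_csv an in-memory stand-in through io.StringIO. The three rows and the column names are assumptions chosen to look like Iris, not the author's actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for iris.csv: a header row plus the first three rows
csv_text = """sepal length,sepal width,petal length,petal width,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 5): 3 rows, 5 columns
```

If your copy of the file has no header row (the original UCI iris.data file does not), pass `header=None` together with `names=[...]` so that the first data row is not taken as column names.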
Route 2 is used when you have already extracted or processed data to some extent in Python code and want to continue working with it in pandas. pandas has rich documentation, so it is worth consulting. `DataFrame.from_dict` converts a dictionary into a data frame as is. If you want to specify the index explicitly, `DataFrame.from_records` is convenient.
```python
# my_dic and my_array stand for data produced by your own code
df = pd.DataFrame.from_records(my_dic, index=my_array)
```
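As a small runnable sketch with made-up values, here is how `from_dict` and `from_records` turn plain Python data into data frames. The `from_records` call below uses a list of tuples (one tuple per row) with explicit column names and index labels; the labels "a", "b", "c" are purely illustrative.

```python
import pandas as pd

# Hypothetical data, as it might come out of your own extraction code
my_dic = {"sepal length": [5.1, 4.9, 4.7], "sepal width": [3.5, 3.0, 3.2]}

# from_dict: by default (orient="columns") the dict keys become columns
df1 = pd.DataFrame.from_dict(my_dic)

# orient="index" makes the dict keys the row labels instead
df2 = pd.DataFrame.from_dict(my_dic, orient="index")

# from_records: one tuple per row, with explicit column names and index labels
records = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2)]
df3 = pd.DataFrame.from_records(
    records, columns=["sepal length", "sepal width"], index=["a", "b", "c"]
)

print(df1.shape)  # (3, 2)
print(df2.shape)  # (2, 3)
```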
In a dataset, the rows and columns are often the opposite of how the observer wants to look at them. Even then, with a pandas data frame you can always get the [transpose](http://ja.wikipedia.org/wiki/%E8%BB%A2%E7%BD%AE%E8%A1%8C%E5%88%97) easily through the `.T` attribute. This is used very often and is worth remembering.
```python
dft = df.T
```
Many pandas tutorials write `df = df.T`, but I prefer the non-destructive form above, which leaves the original `df` untouched.
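To make the non-destructive point concrete, this sketch (with made-up values) shows that `.T` returns a new data frame while `df` keeps its original shape.

```python
import pandas as pd

df = pd.DataFrame({"sepal length": [5.1, 4.9, 4.7], "sepal width": [3.5, 3.0, 3.2]})

# .T returns a new, transposed DataFrame; df itself is not modified
dft = df.T

print(df.shape)   # (3, 2): rows are samples, columns are features
print(dft.shape)  # (2, 3): rows are features, columns are samples
```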
Writing matplotlib code also involves a lot of trial and error. In that situation, it is efficient to repeatedly plot the data frame quickly in IPython and check the result.
The `-i` option of `ipython` takes a Python script as an argument: IPython runs the script and then leaves you in the interactive shell with everything it defined still loaded. This is very convenient.
For example, suppose my_class.py contains a class like this:
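```python
class MyClass:
    def __init__(self, args):
        self.my_var = args[1]  # e.g. the second element of a command-line style list
        self.my_array = []     # containers that my_method fills with data
        self.my_dic = {}

    def my_method(self):
        ...  # extraction / processing logic (omitted)
```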
If you start the shell with `ipython -i my_class.py`, MyClass is already defined, so you can create an instance and retrieve its data as follows.
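```python
# MyClass expects a list argument; these values are only placeholders
my_instance = MyClass(["my_class.py", "some_value"])
arr = my_instance.my_array
dic = my_instance.my_dic
```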
If `my_method` stores its results in an instance variable such as `self.my_dic`, you can pull the data out of that variable as shown above and plot it from there, visualizing interactively.
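Here is a minimal end-to-end sketch of that workflow. The body of `my_method` and the "x"/"y" data are hypothetical, invented only so the example runs; the `matplotlib.use("Agg")` line selects a headless backend so the code also works as a plain script, and can be dropped inside IPython.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed in an interactive IPython session
import pandas as pd


class MyClass:
    """Hypothetical variant of the class above with a concrete my_method."""

    def __init__(self, args):
        self.my_var = args[1]
        self.my_array = []
        self.my_dic = {}

    def my_method(self):
        # Stand-in for real extraction/processing: fill my_dic column by column
        self.my_dic["x"] = [1, 2, 3, 4]
        self.my_dic["y"] = [1, 4, 9, 16]


my_instance = MyClass(["my_class.py", "placeholder"])
my_instance.my_method()

# The instance variable holds plain Python data: convert it, then plot
df = pd.DataFrame.from_dict(my_instance.my_dic)
ax = df.plot(legend=True)  # in IPython, follow with plt.show() to display it
```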
Once the data has been converted into a data frame, it is just ordinary two-dimensional data, so at this point it should be fairly clear what to do next.
Here are some visualization methods to try first.
The well-known Iris dataset is used here.
I have already described the structure of this dataset many times, so please refer to my past articles for the details.
First, the standard scatter plot matrix.
```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix  # pandas.tools.plotting in old pandas versions

plt.figure()  # prepare the canvas
scatter_matrix(df)  # draw the scatter plot matrix
plt.savefig("1.png")  # to write the figure to an image file (save before show)
plt.show()  # to display the figure interactively
```
This is an excellent tool that gives you a bird's-eye view of the correlation between every pair of columns. Once looking at a scatter plot matrix puts your mind at ease, you have gotten used to it.
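To see what `scatter_matrix` actually builds, the sketch below runs it headlessly on random numbers standing in for the four numeric Iris columns (the values are not the real measurements). It returns a 2-D NumPy array of Axes with one panel per pair of columns; the diagonal shows each column's own distribution. Non-numeric columns, such as a species label, are not plotted.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a plain script
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in for the four numeric iris columns
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["sepal length", "sepal width", "petal length", "petal width"],
)

axes = scatter_matrix(df, diagonal="hist")  # diagonal="kde" draws density curves instead
print(axes.shape)  # (4, 4)
```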
From here on, the steps for preparing the canvas and outputting the image are omitted.
```python
df.plot(legend=True)  # line plot: one line per column
```
As I have mentioned many times, `legend` defaults to True when plotting a data frame in pandas. If the legend hides part of the figure, set `legend=False`.
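A quick way to confirm the default is to inspect the Axes object that `df.plot` returns, as in this sketch with toy data (the Agg line only makes it run without a display).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a plain script
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 3]})

ax_on = df.plot()               # legend=True is the default for a DataFrame
ax_off = df.plot(legend=False)  # suppress the legend when it covers the data

print(ax_on.get_legend() is None)   # False
print(ax_off.get_legend() is None)  # True
```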
Plotting only the first 10 rows of the data frame as a stacked horizontal bar chart looks like this.
```python
df10 = df.head(10)
df10.plot(kind='barh', stacked=True, alpha=0.5, legend=True)
```
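In a stacked bar chart, each row becomes one bar and each column contributes one segment of it. This sketch, on random stand-in data rather than the real Iris values, counts the drawn segments to make that structure visible.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a plain script
import numpy as np
import pandas as pd

# Random stand-in for the iris measurements (not real data)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(0, 5, size=(150, 4)),
    columns=["sepal length", "sepal width", "petal length", "petal width"],
)

df10 = df.head(10)  # first 10 rows only
ax = df10.plot(kind="barh", stacked=True, alpha=0.5, legend=True)

# One segment per (row, column) pair: 10 rows x 4 columns
print(len(ax.patches))  # 40
```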
A histogram is useful when you narrow the data down to a single column, that is, a one-dimensional vector.
```python
df['sepal width'].hist()
```
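`Series.hist` sorts the values into bins (10 by default) and draws one bar per bin. The sketch below uses random numbers as a stand-in for the sepal width column and checks that the bar heights add up to the number of values.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a plain script
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"sepal width": rng.normal(3.0, 0.4, size=150)})  # stand-in data

ax = df["sepal width"].hist(bins=10)  # bins=10 is also the default

# Each bar is one bin; its height is the count of values falling in that bin
heights = [p.get_height() for p in ax.patches]
print(len(heights), sum(heights))  # 10 bars, heights summing to 150
```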
An area plot is useful for tracking how several series change over time.
```python
df.plot(kind='area', legend=True)
```
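Because area plots are meant for changes over time, this sketch gives the data frame a monthly DatetimeIndex. The three series and their values are hypothetical. By default pandas stacks the areas; `stacked=False` overlays them instead, which reads better with some transparency.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a plain script
import pandas as pd

# Hypothetical monthly values for three series, indexed by date
idx = pd.date_range("2014-01-01", periods=6, freq="MS")
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5, 6], "b": [2, 2, 2, 2, 2, 2], "c": [0, 1, 0, 1, 0, 1]},
    index=idx,
)

ax = df.plot(kind="area", legend=True)               # stacked by default
ax2 = df.plot(kind="area", stacked=False, alpha=0.4)  # overlapping, semi-transparent
```

Note that stacked area plots require all values to be of one sign; pandas raises an error when a column mixes positive and negative values.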
How was that? Once you get used to it, you will find yourself reaching for the interactive shell and plotting whenever you face new data. IPython, which allows quick trial and error, and pandas + matplotlib, which integrate seamlessly with Python, are highly productive tools.