Geographic visualization with R and Python in Power BI

This article is an entry in Power BI Advent Calendar 2019 by Prince @yugoes1021.

I am running out of material

I have written a series of geography-related articles on Power BI, but when it comes to simply visualizing data on a map there is not much material left, so this article ends up nitpicking in the corners. (That said, I have neither the resources nor the time to touch the web version or Embedded.)

  • Geographical analysis with Power BI (basic)
  • Geographical analysis with Power BI (Application 1)
  • Geographical analysis with Power BI (Application 2)
  • US map with Power BI 2017 Advent Calendar
  • Geographical analysis with Power BI (2018 summary) 2018 Advent Calendar

So I settled on drawing maps with R or Python in Power BI Desktop. The R and Python extensions are introduced in many places, so I only link to the official documentation here.

  • Create Power BI visuals using R (https://docs.microsoft.com/en-us/power-bi/desktop-r-visuals)
  • Run Python scripts in Power BI Desktop (https://docs.microsoft.com/en-us/power-bi/desktop-python-scripts)

As the task, I reuse the point-data visualization that has served as the benchmark in this series, evaluated with the same Uber open data as before: taxi probe data from San Francisco.

Since the goal is simple display, I focus on how to call R and Python from Power BI, how to display point data with each library, and how to change the display range dynamically. Each library has a very different design philosophy, and I would have liked to convey that as well, but that is not possible in an article of this size...

R

R offers more variety, and its support in Power BI has a longer history than Python's. One small stumbling block is which R version and installation location Power BI uses. You can set this on the options page shown below, so specify the R interpreter you normally use. That saves you the trouble of installing the libraries again.

image.png

However, even with the same interpreter, libraries may have been installed into a per-user folder, in which case you need to install them into the interpreter's global environment.

library(maps)

It is an old library. Basically, it draws various blank maps and you plot the data on top. (ggmap appears in the code only for its convenience function that computes the bounding box.) Points are overlaid by calling points inside with.

library(maps)
library(ggmap)
sbbox <- make_bbox(lon = dataset$longitude, lat = dataset$latitude, f = 0)
map('usa', col = "grey", fill = TRUE, bg = "white", border = 0, 
  xlim = c(sbbox[1], sbbox[3]), ylim = c(sbbox[2], sbbox[4]))
with(dataset, points(longitude, latitude, pch = 1, col = 'blue', cex = .2))

image.png
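For readers working on the Python side, the bounding-box step above can be sketched without ggmap. This is only a minimal sketch: it assumes that make_bbox takes the longitude and latitude ranges and pads each side by the fraction f of the range, and the helper name bbox_from_points is hypothetical.

```python
from typing import Iterable, Tuple


def bbox_from_points(lons: Iterable[float], lats: Iterable[float],
                     f: float = 0.0) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top), padded by fraction f of each range.

    Hypothetical helper; assumed to mirror the behaviour of ggmap's make_bbox.
    """
    lons, lats = list(lons), list(lats)
    if not lons or len(lons) != len(lats):
        raise ValueError("lons and lats must be non-empty and of equal length")
    left, right = min(lons), max(lons)
    bottom, top = min(lats), max(lats)
    pad_x = (right - left) * f  # horizontal padding
    pad_y = (top - bottom) * f  # vertical padding
    return (left - pad_x, bottom - pad_y, right + pad_x, top + pad_y)


# Three made-up points around San Francisco
box = bbox_from_points([-122.45, -122.40, -122.42], [37.75, 37.80, 37.77])
print(box)
```

The tuple order explains the indexing in the R code: elements 1 and 3 (left, right) feed xlim, and elements 2 and 4 (bottom, top) feed ylim.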

library(sf)

A library for handling spatial data properly. You first need to convert the data to an sf data frame, which can then be plotted directly.

library(sf)
# WGS84 longitude/latitude is EPSG:4326
dfsf <- st_as_sf(dataset, coords = c('longitude', 'latitude'), crs = 4326)
plot(dfsf, col = "blue", pch = 21)

image.png

library(tmap)

This library lets you draw various thematic maps relatively easily. It is convenient because you can switch between the normal plot mode and a view mode that launches a Leaflet viewer. As with the other libraries, though, Power BI cannot launch an external Leaflet viewer in a browser. And as you can see below, a basemap cannot be added in plot mode. A pity.

library(tmap)
library(dplyr)
library(sf)
library(sp)
dfsf <- dataset %>% st_as_sf(coords = c('longitude', 'latitude'), crs = 4326)
tmap_mode("plot")
map <- tm_shape(dfsf, name = "uber") +
    tm_symbols(shape = 21, col = "blue", size = 0.05) +
    tm_basemap("Stamen.Watercolor")
map

image.png

library(ggplot2)

Map drawing is integrated into ggplot2. For most people who process data day to day, this familiar tool is probably the most comfortable choice.

library(ggplot2)
library(mapproj)
library(ggmap)
sbbox <- make_bbox(lon = dataset$longitude, lat = dataset$latitude, f = 0)
usmap <- map_data("state") 
ggplot() +
    geom_polygon(data = usmap, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.5) +
    geom_point(data = dataset, aes(x = longitude, y = latitude)) +
    theme_void() + coord_map(xlim = c(sbbox[1], sbbox[3]), ylim = c(sbbox[2], sbbox[4]))

image.png

library(ggmap)

If you want a more detailed background map, this is the one. An API key must be registered, probably because the restrictions on the Google Maps API have been tightened.

Install the latest development version in your R environment as follows; it includes register_google, a function for setting the key.

# Run once in your R environment, not inside the Power BI visual
devtools::install_github("dkahle/ggmap")
library(ggplot2)
library(mapproj)
library(ggmap)
register_google(key = "YOUR_API_KEY")
sbbox <- make_bbox(lon = dataset$longitude, lat = dataset$latitude, f = 0)
map <- get_stamenmap(bbox = sbbox, zoom = 13, maptype = "toner-lite")
ggmap(map) +
    geom_point(aes(x = longitude, y = latitude), color = "blue" ,data = dataset, alpha = .5)

image.png
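The zoom = 13 in the call above is chosen by hand. A rough way to derive a zoom level from the bounding box is sketched below in Python. It assumes the usual Web Mercator tiling (256-pixel tiles, the world spanning 2**zoom tiles across 360 degrees) and a hypothetical target map width of 640 pixels; estimate_zoom is an illustrative name, not a ggmap function.

```python
import math


def estimate_zoom(lon_min: float, lon_max: float,
                  map_width_px: int = 640, tile_px: int = 256,
                  max_zoom: int = 18) -> int:
    """Largest tile zoom at which the longitude span still fits the map width.

    Assumes Web Mercator tiling: at a given zoom the world is 360 degrees
    wide and tile_px * 2**zoom pixels across.
    """
    span = lon_max - lon_min
    if span <= 0:
        return max_zoom  # a single point: zoom in as far as allowed
    zoom = math.floor(math.log2(360.0 * map_width_px / (tile_px * span)))
    return max(0, min(max_zoom, zoom))


print(estimate_zoom(-122.50, -122.35))  # city-sized longitude span
print(estimate_zoom(-122.50, -115.10))  # span stretched far to the east
```

Only the longitude span is considered here, so a tall, narrow extent would need the same check on the latitude side.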

Python

Python has full-fledged libraries for map-based work such as Folium and Shapely, and GeoPandas makes geographic data very easy to handle, but when I tried them in Power BI they did not work easily. Someone else also wanted to run Folium, but as shown below, only a limited set of libraries seems to work in Power BI at present, so I gave up without a fight...

Help to implement Python Script - Microsoft Power BI Community

The following Python packages (non-Intel MKL) are currently supported for use in your Power BI reports. Reference: Python packages and versions

  • Matplotlib

For Python, too, you set the interpreter on the options page below. It will probably be Anaconda, but note that installing a new library does not mean it will work in Power BI.

image.png

Matplotlib

Matplotlib, for its part, has an add-on toolkit called Basemap (mpl_toolkits.basemap). It is not part of standard Matplotlib and must be installed separately. At the time of writing it could not be installed with pip, so conda or similar is used.

conda install -c anaconda basemap

After installing it this way, it can be used in the Anaconda environment.

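import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Assumed definition (BBox is not defined in the original snippet):
# (lon_min, lon_max, lat_min, lat_max) of the data
BBox = (dataset.longitude.min(), dataset.longitude.max(),
        dataset.latitude.min(), dataset.latitude.max())
# Default cylindrical projection, limited to the bounding box
m = Basemap(llcrnrlon=BBox[0], llcrnrlat=BBox[2], urcrnrlon=BBox[1], urcrnrlat=BBox[3])
m.drawcoastlines()
# Convert longitude/latitude to map coordinates and plot the points
x, y = m(dataset.longitude.values, dataset.latitude.values)
m.plot(x, y, 'o')
plt.show()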

Results in VS Code: image.png

However, it did not work in Power BI, since it is a library other than Matplotlib itself. orz

Annual performance evaluation

I am using the same data as before, so let's compare against the standard map visuals. I narrowed down the number of records in advance in the query editor. For Python, I made do with a plain 2D plot.
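The plain 2D plot on the Python side can be sketched as follows. This is a minimal sketch, not the exact script used for the screenshots: it assumes the column names longitude and latitude from the R examples, and it builds a small made-up stand-in for the dataset DataFrame that Power BI passes to a Python visual.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Power BI
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the DataFrame Power BI passes to a Python visual as `dataset`.
# Column names are assumed to match the R examples; the values are made up.
dataset = pd.DataFrame({
    "longitude": [-122.45, -122.42, -122.40, -122.41],
    "latitude": [37.75, 37.77, 37.80, 37.78],
})

fig, ax = plt.subplots(figsize=(6, 6))
# Longitude on x, latitude on y: a scatter plot with no background map
ax.scatter(dataset["longitude"], dataset["latitude"], s=4, c="blue", alpha=0.5)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
# In a Power BI Python visual, finish the script with plt.show() to render the figure.
```

Inside Power BI the stand-in DataFrame is dropped, because dataset is supplied by the visual from the fields you add to it.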

1,000 records

Everything, including the standard map, displays without any problem. It is a comfortable number of points to display.

image.png

10,000 records

The standard map shows a message that not all points are displayed. Compared with the other visuals, though, there do not seem to be any major omissions. The speed does not change much either.

image.png

100,000 records

ArcGIS starts rejecting points. The standard map appears to sample randomly, so the overall extent of the points looks much the same. The other libraries seem unaware that they are running inside Power BI and appear to show every point. (Really?) Not much changes except that tmap and ggmap get a little slow. Nothing takes as long as a minute.

image.png
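If you would rather control the thinning yourself than depend on whatever sampling a visual applies, the points can be down-sampled inside the script before plotting. The following is a minimal sketch, assuming uniform random sampling is acceptable for your data; thin_points and max_points are hypothetical names, and the DataFrame is a made-up stand-in for dataset.

```python
import pandas as pd


def thin_points(df: pd.DataFrame, max_points: int = 10_000,
                seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a random sample of max_points rows."""
    if len(df) <= max_points:
        return df
    # A fixed seed keeps the sample stable between refreshes
    return df.sample(n=max_points, random_state=seed)


# Stand-in for Power BI's `dataset`: 100,000 made-up points
dataset = pd.DataFrame({
    "longitude": [-122.5 + (i % 1000) * 0.0001 for i in range(100_000)],
    "latitude": [37.7 + (i // 1000) * 0.001 for i in range(100_000)],
})
plotted = thin_points(dataset)
print(len(plotted))  # 10000
```

Fixing the seed is a design choice: the same points are drawn on every refresh, so the picture does not flicker when a slicer changes something unrelated.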

1,000,000 records

At this point, the data seems to be thinned out for the R visuals as well. Also, the Uber data includes a car that drives all the way to Las Vegas, so ggmap takes time to display the whole extent (the map zoom level needs adjusting).

image.png
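One way to keep a single long trip, such as the car that goes to Las Vegas, from stretching the map extent is to compute the bounding box from quantiles instead of the minimum and maximum. The sketch below is an assumption-laden illustration rather than what the R code above does: robust_bbox is a hypothetical helper, the 1% and 99% cut-offs are arbitrary, and the coordinates are made up.

```python
from typing import Sequence, Tuple


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Pick the value at quantile q from pre-sorted data (no interpolation)."""
    idx = round(q * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def robust_bbox(lons: Sequence[float], lats: Sequence[float],
                lower: float = 0.01, upper: float = 0.99
                ) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) from quantiles, ignoring extreme points."""
    xs, ys = sorted(lons), sorted(lats)
    return (_quantile(xs, lower), _quantile(ys, lower),
            _quantile(xs, upper), _quantile(ys, upper))


# 99 made-up points in San Francisco plus one far to the south-east
lons = [-122.50 + 0.001 * i for i in range(99)] + [-115.14]
lats = [37.70 + 0.001 * i for i in range(99)] + [36.17]

print((min(lons), min(lats), max(lons), max(lats)))  # extent stretched by the outlier
print(robust_bbox(lons, lats))                       # extent of the bulk of the points
```

The trimmed box can then be used for the axis limits or the basemap request, while all points are still passed to the plotting call.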

Summary

For a map this simple there is little point in using R code to visualize it, but if you need special drawing or calculations, I think R visuals could earn their place, since you can bring a full-fledged library along with them.
