I’m gobsmacked. It’s breathtakingly simple to begin accessing datasets in Google Cloud’s BigQuery data warehouse (it’s a serverless analytics service, not a traditional RDBMS). No convoluted connection configuration. Just a few lines of code and a normal Google login.
This is the primary set of instructions to follow:
https://cloud.google.com/vertex-ai/docs/workbench/user-managed/use-r-bigquery
However, you must first create your own project in Google Cloud (which is mentioned in the instructions too):
https://cloud.google.com/vertex-ai/docs/workbench/user-managed/create-new
Note that you must create a personal project even when accessing the Google Cloud Public Data project (your own project is what handles permissions and billing).
So, for some example code to run, try the following in RStudio:
install.packages("bigrquery")
library(bigrquery) # Google Cloud BigQuery query utility
# Load data from Google Cloud Public Data - NOAA GSOD weather data.
projectid <- "YOUR_PROJECT_HERE"
# Test data extract from GSOD weather data
sql <- "SELECT DISTINCT stn FROM `bigquery-public-data.noaa_gsod.gsod1929`"
# Run the query
tb <- bq_project_query(projectid, sql)
# Store the first 10 rows of the data in a tibble
sample <- bq_table_download(tb, n_max = 10)
# Print the 10 rows of data
sample
One thing the instructions don’t cover is an interactive prompt you may hit the first time you run a query, but it’s very simple to deal with.
> tb <- bq_project_query(projectid, sql)
Selection: 1
Just type 1 and hit Enter. A web page will open in your browser with a Google login prompt. Log in to Google as you would on any other page, then tick the boxes when prompted to grant the Tidyverse API packages access to your Google account.
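If you’d rather authenticate up front (or avoid the interactive menu on later runs), bigrquery exposes a bq_auth() function you can call before querying. A minimal sketch, where "you@example.com" is a placeholder for your own Google account:

```r
library(bigrquery)

# Authenticate explicitly before running any queries.
# Pinning the email makes bigrquery reuse the cached token for
# that account instead of prompting you to choose interactively.
# "you@example.com" is a placeholder for your own account.
bq_auth(email = "you@example.com")
```

After this, bq_project_query() and friends run without the selection prompt.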
And that’s it! No convoluted setup of connection strings or anything. I couldn’t believe it.
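As a side note, bigrquery also plugs into R’s standard DBI interface, so you can treat BigQuery like any other database connection if you prefer that style. A minimal sketch, using the same YOUR_PROJECT_HERE placeholder as above:

```r
library(DBI)
library(bigrquery)

# Connect to the public NOAA GSOD dataset; queries are billed to
# your own project ("YOUR_PROJECT_HERE" is a placeholder).
con <- dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "noaa_gsod",
  billing = "YOUR_PROJECT_HERE"
)

# The same station query as above, via the generic DBI API.
# With the dataset set on the connection, the table name can be bare.
stations <- dbGetQuery(con, "SELECT DISTINCT stn FROM gsod1929", n = 10)

dbDisconnect(con)
```

This route is handy if you already have DBI- or dplyr-based code, since dplyr’s tbl() works against the same connection.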
Explore the freely available Google Cloud BigQuery public datasets here: