R, Google Cloud: Uploading data from R to BigQuery

This is a rather simple task in practice, but the documentation (or lack thereof) has made it surprisingly hard for a number of people to figure out. So here's the gist.

  1. You need an existing project and dataset in BigQuery (a dataset being analogous to a schema or database, depending on your DBMS background).
    ###
    library(bigrquery)
    bq_dataset_create("project.dataset")
    ###
  2. It's best to create the table before you upload, based on the dataframe definition. (If you let the upload function create the table for you, it seems to infer the schema from a sample of the data rather than matching the dataframe definition, so it can assume integer for a character field, etc.)
    ###
    bq_table_create("project.dataset.table_name", fields = as_bq_fields(predictions))
    ###
  3. This is the confusing part. In bq_table_upload(), the first argument x is a bq_table object, which is a reference to the table, not the table itself. Everyone (including me) seems to get stuck trying to pass the dataframe as x, but x is just the table reference; the dataframe is passed as values. So it's actually very easy, because a plain string of the table reference (the same form you'd use in a SQL FROM clause) is coercible to a bq_table object.
    ###
    bq_table_upload("project.dataset.table_name", values = predictions)
    ###
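
Putting the steps together, here's a minimal end-to-end sketch. It assumes you've authenticated with bq_auth(); the project, dataset, and table names are placeholders, and the predictions dataframe stands in for your own data:

###
library(bigrquery)

# Authenticate (opens a browser the first time)
bq_auth()

# A stand-in dataframe; replace with your own
predictions <- data.frame(
  id    = c("a", "b", "c"),
  score = c(0.12, 0.87, 0.45)
)

# 1. Create the dataset (skip if it already exists)
bq_dataset_create("project.dataset")

# 2. Create the table with a schema derived from the dataframe,
#    so BigQuery doesn't infer types from a sample of the data
bq_table_create("project.dataset.table_name",
                fields = as_bq_fields(predictions))

# 3. Upload: the first argument is the table *reference* (a string
#    coercible to bq_table), the dataframe goes in `values`
bq_table_upload("project.dataset.table_name", values = predictions)
###

Note that as_bq_fields() maps the dataframe's column types to BigQuery types (character to STRING, double to FLOAT, and so on), which is why pre-creating the table gives you a schema that matches the dataframe rather than whatever the sampler guesses.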

Done!

https://stackoverflow.com/questions/73722424/how-to-upload-table-data-frame-from-r-to-bigquery/77315474#77315474
