This is a rather simple task in practice, but the documentation (or lack thereof) has made it needlessly hard for a number of people to figure out. So here’s the gist.
- You need to have an existing project and dataset in BigQuery (dataset being analogous to a schema or database, depending on your DBMS background).
```r
bq_dataset_create("project.dataset")
```

It’s best to create the table yourself before you upload, based on the dataframe definition. If you let the upload function create it for you, it seems to infer the schema from a sample of the data instead of matching the dataframe definition, so it can assume integer for a character field, etc.
```r
bq_table_create("project.dataset.table_name", fields = as_bq_fields(predictions))
```

This is the confusing part. The first argument, `x`, is actually a `bq_table` object, which is a reference to the table, not the table itself. Everyone (including me) seems to get stuck trying to pass the dataframe as `x`, but it is just the table reference; the dataframe is passed as `values`. It turns out to be very easy, because a plain string holding the table reference (the same one you’d write in a SQL `FROM` clause) is coercible to a `bq_table` object.
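To see that coercion for yourself, a small sketch (assuming bigrquery is loaded; the table name is a placeholder and nothing here touches the network):

```r
library(bigrquery)

# A plain "project.dataset.table" string coerces to a bq_table reference
tbl <- as_bq_table("project.dataset.table_name")

tbl$project  # the project component of the reference
tbl$table    # the table component of the reference
```

This is why you can pass the string directly to `bq_table_create()` and `bq_table_upload()` without constructing the object explicitly.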
```r
bq_table_upload("project.dataset.table_name", values = predictions)
```
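Putting the three calls together, a minimal end-to-end sketch (assuming you’ve authenticated with `bq_auth()` or a service-account key, and that `my-project` is a project you can write to — all names here are placeholders standing in for your own):

```r
library(bigrquery)

# Authenticate interactively (or point bq_auth() at a service account key)
bq_auth()

# A toy dataframe standing in for your predictions
predictions <- data.frame(
  id    = c("a", "b", "c"),   # character column: the schema should say STRING
  score = c(0.1, 0.5, 0.9)
)

# 1. Create the dataset (skip this if it already exists)
bq_dataset_create("my-project.my_dataset")

# 2. Create the table with a schema derived from the dataframe itself,
#    so character columns stay STRING instead of being guessed from a sample
bq_table_create("my-project.my_dataset.predictions",
                fields = as_bq_fields(predictions))

# 3. Upload: the table reference goes in x, the dataframe in values
bq_table_upload("my-project.my_dataset.predictions",
                values = predictions)
```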
Done!