I have some data that I would like to partition by date, and also partition by an internally-defined client id.
Currently, we store this data uses the table-per-date model. It works well, but querying individual client ids is slow and expensive.
We have considered creating a table per client id, and using date partitioning within those tables. The only issue here is that would force us to incur thousands of load jobs per day, and also have the data partitioned by client id in advance.
Here is a potential solution I came up with: -Stick with the table-per-date approach (eg log_20170110) -Create a dummy date column which we use as the partition date, and set that date to -01-01 (eg for client id 1235, set _PARTITIONTIME to 1235-01-01)
This would allow us to load data per-day, as we do now, would give us partitioning by date, and would leverage the date partitioning functionality to partition by client id. Can you see anything wrong with this approach? Will BigQuery allow us to store data for the year 200, or the year 5000?
PS: We could also use a scheme that pushes the dates to post-zero-unixtime, eg add 2000 to the year, or push the last two digits to the month and day, eg 1235 => 2012-03-05.