Search for large data sources for performance profiling #975
aaronsteers started this conversation in Ideas
Replies: 2 comments 7 replies
- Something in the 20 GB range: show extraction record-by-record, with SDK-based batching, and with native batching on both ends. This seems like a good resource: https://www.reddit.com/r/bigquery/wiki/datasets
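To make the record-by-record vs. batching comparison concrete, here is a minimal micro-benchmark sketch, assuming an in-memory SQLite target as a stand-in for a real loader. The table name, row shape, and row count are illustrative assumptions, not anything from the SDK:

```python
# Hypothetical micro-benchmark: compare per-record inserts against
# executemany() batching on an in-memory SQLite table.
import sqlite3
import time

# Illustrative dataset: 50k small rows (a real profiling run would use
# something in the 20 GB range, as discussed above).
ROWS = [(f"user-{i}", i) for i in range(50_000)]

def time_insert(batched: bool) -> float:
    """Insert ROWS one way or the other and return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (name TEXT, value INTEGER)")
    start = time.perf_counter()
    if batched:
        conn.executemany("INSERT INTO records VALUES (?, ?)", ROWS)
    else:
        for row in ROWS:
            conn.execute("INSERT INTO records VALUES (?, ?)", row)
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

per_record = time_insert(batched=False)
batch = time_insert(batched=True)
print(f"record-by-record: {per_record:.3f}s, batched: {batch:.3f}s")
```

The same harness shape (swap the loader, keep the row generator) would let the record-by-record, SDK-batching, and native-batching modes be timed against one another on a fixed dataset.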
- Would getting good metrics logging in the SDK help with this? I imagine it would at least help compare record-by-record vs. batch processing by looking at a record-count timeseries in, e.g., Prometheus. Backpressure, for example, would become apparent.
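As a sketch of the kind of metric the comment describes, here is a stdlib-only record counter that reports throughput over wall time; a real setup would expose this via something like the Prometheus client library instead of printing. The class name and API here are illustrative assumptions, not SDK code:

```python
# Minimal throughput counter: count records and report records/second,
# the timeseries a scraper could graph to spot backpressure.
import time

class RecordCounter:
    """Counts records and reports throughput over elapsed wall time."""

    def __init__(self) -> None:
        self.count = 0
        self.start = time.perf_counter()

    def increment(self, n: int = 1) -> None:
        self.count += n

    def records_per_second(self) -> float:
        elapsed = time.perf_counter() - self.start
        return self.count / elapsed if elapsed > 0 else 0.0

# Illustrative usage: a sync loop would call increment() per record
# (or per batch, with n=batch_size).
counter = RecordCounter()
for _ in range(10_000):
    counter.increment()
print(f"{counter.count} records, {counter.records_per_second():.0f} rec/s")
```

Comparing this rate curve between a record-by-record run and a batched run of the same dataset is what would make the difference visible.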
- Wanted to open this discussion to locate large data sources for performance benchmarking.
  Related to:
  A few known options to kick off the discussion:
  - tap-socrata to access one or more large datasets in the public domain.
  - tap-smoke-test to fabricate large randomized datasets on-demand.
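For the fabrication option, here is a hedged sketch of what on-demand randomized data generation could look like: emitting Singer-style RECORD messages with random field values. The stream name, field names, and record shape are illustrative assumptions, not the actual tap-smoke-test output:

```python
# Sketch: generate randomized Singer-style RECORD messages on demand,
# the approach a smoke-test tap could use to fabricate large datasets.
import json
import random
import string

def fake_record(stream: str) -> dict:
    """Build one RECORD message with randomized field values."""
    return {
        "type": "RECORD",
        "stream": stream,
        "record": {
            "id": random.randint(1, 10**9),
            "name": "".join(random.choices(string.ascii_lowercase, k=12)),
            "value": random.random(),
        },
    }

def generate(stream: str, n: int):
    """Yield n JSON-encoded records; n can be scaled up arbitrarily."""
    for _ in range(n):
        yield json.dumps(fake_record(stream))

# Illustrative usage: three records here; a benchmark run would stream
# millions of rows straight to the target without materializing a list.
lines = list(generate("smoke_test", 3))
print(lines[0])
```

The appeal of this route over public datasets is that record size and count can be dialed to exactly the target volume (e.g. the 20 GB range mentioned above) without any download step.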