Commit 34c30af

fixup! Creates error handling section
1 parent 2351632 commit 34c30af

1 file changed

content/bigquery.textile

Lines changed: 52 additions & 46 deletions
@@ -3,64 +3,68 @@ title: BigQuery rule
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure and analyze your data efficiently."
---

-Stream events published to Ably directly into a table in BigQuery for analytical or archival purposes. Typical use cases include:
+Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery for analytical or archival purposes. General use cases include:

* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
-* Historical auditing of messages with at least one delivery guarantee.
+* Historical auditing of messages.

<aside data-type='note'>
-<p>Ably's BigQuery integration rule for Firehose is in development status.</p>
+<p>Ably's BigQuery integration rule for "Firehose":/integrations/streaming is in development.</p>
</aside>

h3(#create-rule). Create a BigQuery rule

-Create a BigQuery rule using the Ably Dashboard or the Control API.
-
-Before creating the rule in Ably, ensure the following:
+Set up the necessary BigQuery resources, permissions, and authentication to enable Ably to securely write data to a BigQuery table:

2220
* Create or select a BigQuery dataset in the Google Cloud Console.
2321
* Create a BigQuery table in that dataset:
24-
** Use the JSON schema provided below.
25-
** For large volumes of data, partition the table (recommended daily partitioning by ingestion time).
26-
* Create a GCP service account with the minimal required BigQuery permissions:
27-
** *@bigquery.tables.get@* to read table metadata.
28-
** *@bigquery.tables.updateData@* to insert records.
29-
* Add table-level access control to grant the service account permission on the specific table.
30-
* Generate and securely store the JSON key file for the service account. Ably requires this key file to authenticate and write data for your table.
31-
32-
33-
h4(#dashboard). Create a BigQuery rule in the Dashboard
34-
35-
* Log in to the Ably Dashboard and select the application from which you want to stream data.
36-
* Navigate to the *Integrations* tab.
37-
* Click *New Integration Rule*.
38-
* Select *Firehose*.
39-
* Choose *BigQuery* from the list of available Firehose integrations.
40-
* Configure the rule settings as described below.Then, click *Create*.
22+
** Use the "JSON schema":#schema.
23+
** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance.
24+
* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create?utm_source=chatgpt.com with the minimal required BigQuery permissions.
25+
* Grant the service account table-level access control to allow access to the specific table.
26+
** @bigquery.tables.get@: to read table metadata.
27+
** @bigquery.tables.updateData@: to insert records.
28+
* Generate and securely store the JSON key file for the service account.
29+
** Ably requires this key file to authenticate and write data to your table.
4130

4231
h3(#settings). BigQuery rule settings
4332

4433
|_. Section |_. Purpose |
4534
| *Source* | Defines the type of event(s) for delivery. |
46-
| *Channel Filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
35+
| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
4736
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
48-
| *Service Account Key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
37+
| *Service account Key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
4938
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before making the rule in Ably. |
5039
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |
5140

52-
h4(#api-rule). Creating a BigQuery rule using the Control API
41+
h4(#dashboard). Create a BigQuery rule in the Dashboard
42+
43+
The following steps to create a BigQuery rule using the Ably dashboard:
44+
45+
* Log in to the "Ably dashboard":https://ably.com/accounts/any and select the application you want to stream data from.
46+
* Navigate to the *Integrations* tab.
47+
* Click *New integration rule*.
48+
* Select *Firehose*.
49+
* Choose *BigQuery* from the list of available Firehose integrations.
50+
* Configure the rule settings as described below.Then, click *Create*.
51+
52+
h4(#api-rule). Create a BigQuery rule using the Control API

-Follow a similar process to other Firehose rules. When calling the Control API, specify:
+The following steps describe how to create a BigQuery rule using the Control API:

-* *ruleType*: @bigquery@
-* The correct settings, for example the destination table or service account credentials.
+* Use the "rules":/control-api#examples-rules endpoint to specify the following parameters:
+** @ruleType@: Set this to @bigquery@ to define the rule as a BigQuery integration.
+** @destinationTable@: Specify the BigQuery table where the data will be stored.
+** @serviceAccountCredentials@: Provide the GCP service account JSON key used to authenticate and authorize data insertion.
+** @channelFilter@ (optional): Use a regular expression to apply the rule to specific channels.
+** @format@ (optional): Define the data format based on how you want messages to be structured.
+* Make an HTTP request to the Control API to create the rule, as sketched below.
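
A minimal sketch of the request body, assuming the rule-creation endpoint @POST https://control.ably.net/v1/apps/{APP_ID}/rules@ used by other Firehose rules; the exact @target@ field names for BigQuery mirror the parameters above and may change while the rule is in development:

```[json]
{
  "ruleType": "bigquery",
  "requestMode": "single",
  "source": {
    "channelFilter": "^my-channel.*",
    "type": "channel.message"
  },
  "target": {
    "destinationTable": "project_id.dataset_id.table_id",
    "serviceAccountCredentials": "<contents of the service account JSON key file>",
    "format": "json"
  }
}
```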

-See the Control API Rules endpoint documentation for examples of creating and managing Firehose rules.

h3(#schema). JSON Schema

-Ably recommends creating your BigQuery table using the schema below, which separates standard message fields from the raw payload:
+Create your BigQuery table using the schema below, which separates standard message fields from the raw payload:

```[json]
{
@@ -71,16 +75,9 @@ Ably recommends creating your BigQuery table using the schema below, which separ
}
```

-Ably transports arbitrary message payloads (JSON, text, or binary). Storing data in a @BYTES@ column ensures all message content is captured. Use the *content_type* field to understand how to interpret the payload.
-
-h3. Data insertion and semantics
-
-* *Protocol:* Ably uses the BigQuery Storage Write API over gRPC.
-* *Delivery guarantee:* At-least-once. You may see duplicate messages in BigQuery under high-throughput or transient failure conditions. You can de-duplicate using the unique *id* in an ETL process or query logic.
-
h3(#queries). Direct queries

-You can run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:
+Run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:

```[sql]
SELECT
@@ -89,14 +86,23 @@ FROM project_id.dataset_id.table_id
WHERE channel = 'my-channel'
```
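
A complete version of this query, assuming the @CAST@ and @PARSE_JSON@ combination described in the components table below:

```[sql]
SELECT
  id,
  channel,
  PARSE_JSON(CAST(data AS STRING)) AS payload
FROM `project_id.dataset_id.table_id`
WHERE channel = 'my-channel';
```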

-However, JSON parsing at query time can be expensive for large datasets.
+The following explains the components of the query:
+
+|_. Query function |_. Purpose |
+| @CAST(data AS STRING)@ | Converts the @data@ column from @BYTES@ (if applicable) into a @STRING@. |
+| @PARSE_JSON(...)@ | Parses the string into a structured JSON object for easier querying. |
+| @WHERE channel = 'my-channel'@ | Filters results to retrieve messages only from a specific Ably channel. |
+
+<aside data-type='note'>
+<p>Parsing JSON at query time can be computationally expensive for large datasets. If your queries need frequent JSON parsing, consider pre-processing and storing structured fields in a secondary table using an ETL pipeline for better performance.</p>
+</aside>

-h4(#etl). ETL (recommended)
+h4(#etl). Extract, Transform, Load (ETL)

-For large-scale analytics, consider an ETL pipeline to move data from the Ably-managed table to a secondary table with a more specific schema:
+ETL is recommended for large-scale analytics and performance optimization, ensuring data is structured, deduplicated, and efficiently stored for querying. Transform raw data (JSON or @BYTES@) into a more structured format, remove duplicates, and write it into a secondary table optimized for analytics:

-* Convert data from raw @BYTES@/JSON into structured columns (for example, geospatial columns, numeric fields).
-* Write these transformed records into a new table optimized for your queries.
-* Use the unique *id* field to eliminate duplicates.
-* Use BigQuery scheduled queries or an external workflow to automate these steps periodically.
+* Convert raw data (@BYTES@/JSON) into structured columns, for example geospatial fields or numeric types, for detailed analysis.
+* Write transformed records to a new optimized table tailored for query performance.
+* Deduplicate records using the unique @id@ field to ensure data integrity.
+* Automate the process using BigQuery scheduled queries or an external workflow to run transformations at regular intervals, as in the sketch below.
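
A minimal sketch of such a step, assuming a hypothetical pre-created destination table @messages_structured@ with a JSON @payload@ column; run it as a BigQuery scheduled query or from an external workflow:

```[sql]
-- Copy today's ingestion-time partition into a structured table,
-- skipping message ids that have already been transformed.
MERGE `project_id.dataset_id.messages_structured` AS dst
USING (
  SELECT id, channel, PARSE_JSON(CAST(data AS STRING)) AS payload
  FROM `project_id.dataset_id.table_id`
  WHERE _PARTITIONDATE = CURRENT_DATE()
) AS src
ON dst.id = src.id
WHEN NOT MATCHED THEN
  INSERT (id, channel, payload) VALUES (src.id, src.channel, src.payload);
```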
