title: BigQuery rule
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure and analyze your data efficiently."
---
Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery for analytical or archival purposes. Typical use cases include:
* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
* Historical auditing of messages.
<aside data-type='note'>
<p>Ably's BigQuery integration rule for "Firehose":/integrations/streaming is currently in development.</p>
</aside>
h3(#create-rule). Create a BigQuery rule

Create a BigQuery rule using the Ably dashboard or the Control API.

Before creating the rule in Ably, set up the necessary BigQuery resources, permissions, and authentication to enable Ably to securely write data to a BigQuery table:
* Create or select a BigQuery dataset in the Google Cloud Console.
* Create a BigQuery table in that dataset:
** Use the "JSON schema":#schema.
** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance (see the sketch after this list).
* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create with the minimal required BigQuery permissions.
* Grant the service account table-level access to the specific table, with the following permissions:
** @bigquery.tables.get@: to read table metadata.
** @bigquery.tables.updateData@: to insert records.
* Generate and securely store the JSON key file for the service account.
** Ably requires this key file to authenticate and write data to your table.
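
If you create the table using SQL DDL, the following is a minimal sketch of an ingestion-time partitioned table. The column names and types are illustrative assumptions based on the fields referenced on this page; use the "JSON schema":#schema below as the source of truth for the recommended schema.

```[sql]
-- Minimal sketch only: column names and types are illustrative assumptions,
-- not the authoritative Ably schema.
CREATE TABLE `project_id.dataset_id.table_id` (
  id STRING,           -- unique message ID, useful for de-duplication
  channel STRING,      -- the Ably channel the message was published on
  data BYTES,          -- raw message payload
  content_type STRING  -- indicates how to interpret the payload
)
PARTITION BY _PARTITIONDATE;  -- daily partitioning by ingestion time
```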
h3(#settings). BigQuery rule settings
|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
| *Service account key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |
h4(#dashboard). Create a BigQuery rule in the Dashboard
To create a BigQuery rule using the Ably dashboard:
* Log in to the "Ably dashboard":https://ably.com/accounts/any and select the application you want to stream data from.
* Navigate to the *Integrations* tab.
* Click *New integration rule*.
* Select *Firehose*.
* Choose *BigQuery* from the list of available Firehose integrations.
* Configure the rule settings as described above, then click *Create*.
h4(#api-rule). Create a BigQuery rule using the Control API
To create a BigQuery rule using the Control API:
* Use the "rules":/control-api#examples-rules endpoint to specify the following parameters:
** @ruleType@: Set this to @bigquery@ to define the rule as a BigQuery integration.
** @destinationTable@: Specify the BigQuery table where the data will be stored.
** @serviceAccountCredentials@: Provide the necessary GCP service account JSON key to authenticate and authorize data insertion.
** @channelFilter@ (optional): Use a regular expression to apply the rule to specific channels.
** @format@ (optional): Define the data format based on how you want messages to be structured.
* Make an HTTP request to the Control API to create the rule; a sketch of a possible request body follows this list.
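
As an illustration only, the request body might carry the parameters above in a flat shape like the following sketch. The field names mirror the list above rather than the exact Control API schema, and all values are placeholders; refer to the "rules":/control-api#examples-rules examples for the authoritative request format.

```[json]
{
  "ruleType": "bigquery",
  "destinationTable": "project_id.dataset_id.table_id",
  "serviceAccountCredentials": "<contents of the service account JSON key file>",
  "channelFilter": "^my-channel.*",
  "format": "json"
}
```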
h3(#schema). JSON Schema

Ably recommends creating your BigQuery table using the schema below, which separates standard message fields from the raw payload:

```[json]
{
  ...
}
```
Ably transports arbitrary message payloads (JSON, text, or binary). Storing data in a @BYTES@ column ensures all message content is captured. Use the *content_type* field to understand how to interpret the payload.
h3(#queries). Direct queries
Run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:
```[sql]
SELECT
  PARSE_JSON(CAST(data AS STRING)) AS data_json
FROM project_id.dataset_id.table_id
WHERE channel = 'my-channel'
```
The following explains the components of the query:
|_. Query function |_. Purpose |
| @CAST(data AS STRING)@ | Converts the @data@ column from @BYTES@ (if applicable) into a @STRING@. |
| @PARSE_JSON(...)@ | Parses the string into a structured JSON object for easier querying. |
| @WHERE channel = 'my-channel'@ | Filters results to retrieve messages only from a specific Ably channel. |
<aside data-type='note'>
<p>Parsing JSON at query time can be computationally expensive for large datasets. If your queries need frequent JSON parsing, consider pre-processing and storing structured fields in a secondary table using an ETL pipeline for better performance.</p>
</aside>
h4(#etl). Extract, Transform, Load (ETL)
ETL is recommended for large-scale analytics and performance optimization, ensuring data is structured, deduplicated, and efficiently stored for querying. Transform raw data (JSON or BYTES) into a more structured format, remove duplicates, and write it into a secondary table optimized for analytics:
* Convert raw data (@BYTES@/JSON) into structured columns, for example geospatial fields or numeric types, for detailed analysis.
* Write the transformed records to a new table optimized for query performance.
* Deduplicate records using the unique @id@ field to ensure data integrity.
* Automate the process using BigQuery scheduled queries or an external workflow to run transformations at regular intervals, as in the sketch below.
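
The following scheduled-query sketch illustrates these steps; the destination table name and the parsed column are assumptions for illustration, so adapt the column list to your schema.

```[sql]
-- Illustrative only: table and column names are assumptions.
-- Keep one row per unique id and store the parsed payload in a curated table.
CREATE OR REPLACE TABLE `project_id.dataset_id.table_id_curated` AS
SELECT
  id,
  ANY_VALUE(channel) AS channel,
  PARSE_JSON(CAST(ANY_VALUE(data) AS STRING)) AS data_json
FROM `project_id.dataset_id.table_id`
GROUP BY id;  -- duplicate deliveries share the same id, so this deduplicates
```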