`to` tasks handle all data exports from your data lake to a destination. You can configure the items below; click on a name for more info.
The `extract` part of the configuration defines which data is selected from your data lake to be uploaded to a destination.
Below are the types of extract. Click on an extract type for an explanation.
Extract data from a BigQuery table.
Example of extracting the data for the `to_aws_sqs` task.

source: database
query: |
  SELECT
    'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
    '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
    123 AS DelaySeconds
property | type | required | description |
---|---|---|---|
conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
source | enumerator (database, storage) | no | Default is `database`. |
query | string | yes, unless `template` is used | Query to be executed, whose results will be uploaded to the destination. |
template | string | yes, unless `query` is used | Link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
project_id | string | no | Project ID of the destination table. If not declared, client_cloud.project_id is used. |
dataset_id | string | no | Dataset ID of the destination table. If not declared, client_cloud.dataset_id is used. |
table_id | string | no | Table ID of the destination table. If not declared, task.id is used. |
use_legacy_sql | yesno (boolean) | no | Default is `no`. Whether legacy SQL should be used. |
params | object | no | Parameters that can be set. Useful for templating. |
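For illustration, a minimal sketch of a BigQuery extract that uses `template` with `params` instead of an inline query is shown below. The file path, project, dataset, table and parameter names are hypothetical; the sketch assumes the `{{ ... }}` templating shown in the query examples also resolves `params`.

source: database
template: includes/my_export_query.sql
project_id: my-gcp-project
dataset_id: my_dataset
table_id: my_export_table
use_legacy_sql: no
params:
  country: NL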
Extract data from a Snowflake table.
Example of extracting the data for the `to_aws_sqs` task.
source: database
query: |
  SELECT
    'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
    '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
    123 AS DelaySeconds
property | type | required | description |
---|---|---|---|
source | enumerator (database, storage) | no | Default is `database`. |
conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
query | string | yes, unless `template` is used | Query to be executed, whose results will be uploaded to the destination. |
template | string | yes, unless `query` is used | Link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
database | string | no | Database of the destination table. If not declared, client_cloud.database is used. |
schema | string | no | Schema of the destination table. If not declared, client_cloud.schema is used. |
table | string | no | Table of the destination table. If not declared, task.id.upper() is used. |
params | object | no | Parameters that can be set. Useful for templating. |
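For illustration, a minimal sketch of a Snowflake extract using a `template` file and explicit `database`, `schema` and `table` is shown below. The connection name, file path, identifiers and parameter names are hypothetical.

source: database
conn_id: my_snowflake_conn
template: includes/my_export_query.sql
database: MY_DATABASE
schema: MY_SCHEMA
table: MY_EXPORT_TABLE
params:
  country: NL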
Example of extracting the data files with prefix `s3://my-test-bucket/some/folder/part-`.

extract:
  source: storage
  conn_id: s3
  bucket: my-test-bucket
  prefix: some/folder/part-
Extracts data files from Amazon S3, Google Cloud Storage or Azure Blob Storage (in beta).
property | type | required | description |
---|---|---|---|
source | enumerator (database, storage) | yes | Set to storage |
conn_id | string | no | Name of the connection. If not declared, client_cloud.storage_conn_id is used. |
bucket | string | no | Name of the bucket. If not declared, client_cloud.bucket is used. |
prefix | string | no | Prefix of the file(s). |
project_id | string | no | Google Cloud only. Project ID of the bucket. If not declared, client_cloud.project_id is used. |
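For illustration, a Google Cloud Storage variant of the example above could look like the sketch below; the connection name, bucket and project ID are hypothetical.

extract:
  source: storage
  conn_id: gcs
  bucket: my-test-bucket
  prefix: some/folder/part-
  project_id: my-gcp-project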
The `load` part of the configuration defines how the data is loaded to external destinations. The vocabulary of the destination is leading in this part: where Workflows normally talks about fields and records, it uses columns and rows here if the data destination uses those terms.
We identify two types of data destinations: management and data uploads. Management tasks sync a table with a destination, for example syncing a table of all DoubleClick display ads, including text and bidding amounts, with DV360. Data upload tasks upload data to an endpoint, for example uploading a file to an external FTP server or uploading audiences to Facebook.
New task types are added to Workflows on a monthly basis. Currently Workflows supports the following `to` tasks: