Workflows · To Tasks

To tasks handle all data exports from your data lake to a destination. You can configure the items below. Click on the name for more info.

Extract

The extract part of the configuration defines which data is selected from your data lake to be uploaded to a destination.

Extract types

Below are the available extract types. Click on an extract type for an explanation.

extract: database (BigQuery)

Extract data from a BigQuery table.

Example usage

Example of extracting the data for the to_aws_sqs task.

extract:
    source: database
    query: |
        SELECT
            'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
            '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
            123 AS DelaySeconds
| property | type | required | description |
| --- | --- | --- | --- |
| conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
| source | enumerator (database, storage) | no | Default is database. |
| query | string | yes | Use either query or template (below). Query to be executed, whose results will be uploaded to the destination. |
| template | string | yes | Use either query (above) or template. Link to a file in the `includes` folder of your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. See the example below this table. |
| project_id | string | no | Project ID of the destination table. If not declared, client_cloud.project_id is used. |
| dataset_id | string | no | Dataset ID of the destination table. If not declared, client_cloud.dataset_id is used. |
| table_id | string | no | Table ID of the destination table. If not declared, task.id is used. |
| use_legacy_sql | yes/no (boolean) | no | Whether legacy SQL should be used. Default is `no`. |
| params | object | no | Parameters that can be set. Useful for templating. |
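
A minimal sketch of the same extract using `template` and `params` instead of an inline query. The file path includes/sqs_message.sql and the parameter name queue_url are hypothetical placeholders; point the template at a SQL file in the `includes` folder of your own repository.

extract:
    source: database
    # hypothetical path to a SQL file in the includes folder of your repository
    template: includes/sqs_message.sql
    params:
        # hypothetical parameter, made available to the SQL file for templating
        queue_url: https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue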

extract: database (Snowflake)

Extract data from a Snowflake table.

Example usage

Example of extracting the data for the to_aws_sqs task.

extract:
    source: database
    query: |
        SELECT
            'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
            '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
            123 AS DelaySeconds
| property | type | required | description |
| --- | --- | --- | --- |
| source | enumerator (database, storage) | no | Default is database. |
| conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
| query | string | yes | Use either query or template (below). Query to be executed, whose results will be uploaded to the destination. |
| template | string | yes | Use either query (above) or template. Link to a file in the `includes` folder of your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
| database | string | yes | Database of the destination table. If not declared, client_cloud.database is used. |
| schema | string | no | Schema of the destination table. If not declared, client_cloud.schema is used. |
| table | string | no | Table of the destination table. If not declared, task.id.upper() is used. |
| params | object | no | Parameters that can be set. Useful for templating. See the example below this table. |
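
A minimal sketch of a Snowflake extract that uses `params` for templating. The parameter name delay_seconds is a hypothetical placeholder, and referencing it as {{ params.delay_seconds }} assumes params are exposed to the query with the same templating syntax used for {{ task.run_id }} above.

extract:
    source: database
    query: |
        SELECT
            'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
            '{"run_id": "{{ task.run_id }}"}' AS MessageBody,
            -- assumes params are available via {{ params.<name> }} templating
            {{ params.delay_seconds }} AS DelaySeconds
    params:
        delay_seconds: 123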

extract: storage

Extracts data files from Amazon S3, Google Cloud Storage or Azure Blob Storage (in beta).

Example usage

Example of extracting the data files with prefix s3://my-test-bucket/some/folder/part-

extract:
    source: storage
    conn_id: s3
    bucket: my-test-bucket
    prefix: some/folder/part-

| property | type | required | description |
| --- | --- | --- | --- |
| source | enumerator (database, storage) | yes | Set to storage. |
| conn_id | string | no | Name of the connection. If not declared, client_cloud.storage_conn_id is used. |
| bucket | string | no | Name of the bucket. If not declared, client_cloud.bucket is used. |
| prefix | string | no | Prefix of the file(s). |
| project_id | string | no | Google Cloud only. Project ID of the bucket. If not declared, client_cloud.project_id is used. See the example below this table. |
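
A minimal sketch of the same storage extract against Google Cloud Storage, where project_id applies. The connection name gcs, the bucket name and the project ID are hypothetical placeholders.

extract:
    source: storage
    conn_id: gcs                   # hypothetical connection name
    bucket: my-test-bucket         # hypothetical bucket
    prefix: some/folder/part-
    project_id: my-gcp-project     # Google Cloud only; hypothetical project ID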

Load

The `load` part of the configuration covers loading the data to external destinations. The vocabulary of the destination takes precedence in this part: where Workflows normally talks about fields and records, it will use columns and rows here if the data destination uses those.

We identify two types of data destinations: management and data uploads. Management tasks sync a table with a destination, for example a table with all DoubleClick display ads, including text and bidding amounts, synced with DV360. Data upload tasks upload data to an endpoint, for example uploading a file to an external FTP server or uploading audiences to Facebook.

New task types are added to Workflows on a monthly basis. Currently Workflows supports the following to tasks:

  1. to_aws_s3
  2. to_aws_sns
  3. to_aws_sqs
  4. to_dcm
  5. to_doubleclick_offline_conversions
  6. to_meta
  7. to_meta_custom_audience
  8. to_meta_offline_conversions
  9. to_google_analytics_data_import
  10. to_google_analytics_management
  11. to_google_measurement_protocol_v3
  12. to_ftp
  13. to_xandr
  14. to_xandr_server_side_segmentation