Customize output settings

Pipelines convert a stream of records into output files and deliver those files to an R2 bucket in your account. This guide explains how to change the output destination and how to customize batch settings to generate query-ready files.

Configure an R2 bucket as a destination

To create or update a pipeline using Wrangler, run the following command in a terminal:

Terminal window
npx wrangler pipelines create [PIPELINE-NAME] --r2-bucket [R2-BUCKET-NAME]

After running this command, you'll be prompted to authorize Cloudflare Workers Pipelines to create an R2 API token on your behalf. Your pipeline uses this R2 API token to load data into your bucket. You can approve the request through the browser link, which opens automatically.

If you prefer not to authenticate this way, you can pass your own R2 API token credentials to Wrangler:

Terminal window
npx wrangler pipelines create [PIPELINE-NAME] --r2-bucket [R2-BUCKET-NAME] --r2-access-key-id [ACCESS-KEY-ID] --r2-secret-access-key [SECRET-ACCESS-KEY]

File format and compression

Output files are generated as newline-delimited JSON (NDJSON) files. Each line in an output file maps to a single record.

By default, output files are compressed with gzip. To turn compression off, use the --compression flag:

Terminal window
npx wrangler pipelines update [PIPELINE-NAME] --compression none

Output files are named using a ULID slug, followed by an extension.
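
As a rough illustration of what the format means in practice, the sketch below builds a small gzip-compressed NDJSON file locally and reads it back. The sample records, the file name, and the helper functions count_records and first_record are made up for this example; a real output file downloaded from your bucket could be inspected the same way, assuming the default gzip compression.

```shell
# Sketch: inspect a gzip-compressed NDJSON file locally.
# The records and file name below are illustrative, not real pipeline output.
sample="$(mktemp -d)/sample.json.gz"

# Stand-in for an output file: one JSON record per line, gzip-compressed.
printf '%s\n' '{"user_id":1,"event":"view"}' '{"user_id":2,"event":"click"}' | gzip > "$sample"

# Each line of the decompressed file is a single record, so counting lines counts records.
count_records() {
  gzip -dc "$1" | wc -l | tr -d ' '
}

# Print the first record without writing a decompressed copy to disk.
first_record() {
  gzip -dc "$1" | sed -n '1p'
}

count_records "$sample"
first_record "$sample"
```

If compression has been turned off with --compression none, the gzip -dc step would not apply and the file could be read directly.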

Customize batch behavior

When configuring your pipeline, you can define how records are batched before they are delivered to R2. Each batch of records is written to a single output file.

Batching can:

  1. Reduce the number of output files written to R2, and thus reduce the cost of writing data to R2
  2. Increase the size of output files, making them more efficient to query

There are three ways to define how ingested data is batched:

  1. batch-max-mb: The maximum amount of data that will be batched, in megabytes. Default is 10 MB, maximum is 100 MB.
  2. batch-max-rows: The maximum number of rows or events in a batch before data is written. The default and the maximum are both 10,000 rows.
  3. batch-max-seconds: The maximum duration of a batch before data is written, in seconds. Default is 15 seconds, maximum is 300 seconds.

Batch definitions are hints. A pipeline will follow these hints closely, but batches might not be exact.

All three batch definitions work together. Whichever limit is reached first triggers the delivery of a batch.

For example, with batch-max-mb set to 100 and batch-max-seconds set to 100, a batch is delivered as soon as 100 MB of events have been posted to the pipeline. However, if it takes longer than 100 seconds for 100 MB of events to arrive, the batch is delivered after 100 seconds and contains all the events that were posted during that time.
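
The "whichever limit is reached first" rule can be sketched as a small shell function. This is only a model of the rule described above, not how Pipelines is implemented, and because batch definitions are hints, real batches may not match it exactly. The function name batch_trigger and the whole-number inputs are assumptions made for the example.

```shell
# Sketch: report which batch limit, if any, would trigger delivery.
# Limits mirror the example above: batch-max-mb = 100 and batch-max-seconds = 100,
# with batch-max-rows left at its default of 10,000.
BATCH_MAX_MB=100
BATCH_MAX_ROWS=10000
BATCH_MAX_SECONDS=100

batch_trigger() {
  # $1 = current batch size in MB, $2 = row count, $3 = batch age in seconds (whole numbers)
  if [ "$1" -ge "$BATCH_MAX_MB" ]; then
    echo "batch-max-mb"
  elif [ "$2" -ge "$BATCH_MAX_ROWS" ]; then
    echo "batch-max-rows"
  elif [ "$3" -ge "$BATCH_MAX_SECONDS" ]; then
    echo "batch-max-seconds"
  else
    echo "none"
  fi
}

batch_trigger 100 500 20    # size limit reached before the time limit
batch_trigger 40 500 100    # time limit reached before the size limit
batch_trigger 40 500 20     # no limit reached yet
```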

Defining batch settings using Wrangler

You can use the following batch settings flags while creating or updating a pipeline:

  • --batch-max-mb
  • --batch-max-rows
  • --batch-max-seconds

For example:

Terminal window
npx wrangler pipelines update [PIPELINE-NAME] --batch-max-mb 100 --batch-max-rows 10000 --batch-max-seconds 300

Batch size limits

Setting | Default | Minimum | Maximum
Maximum Batch Size (batch-max-mb) | 10 MB | 0.001 MB | 100 MB
Maximum Batch Timeout (batch-max-seconds) | 15 seconds | 0 seconds | 300 seconds
Maximum Batch Rows (batch-max-rows) | 10,000 rows | 1 row | 10,000 rows
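
If batch settings are generated by a script, it can help to check them against these limits before calling Wrangler. The following is a minimal sketch; validate_batch_settings is a hypothetical helper, not part of Wrangler, and it only encodes the minimum and maximum values listed above.

```shell
# Sketch: check proposed batch settings against the documented limits.
# Returns 0 when all three values are in range, 1 otherwise.
validate_batch_settings() {
  # $1 = batch-max-mb, $2 = batch-max-rows, $3 = batch-max-seconds
  awk -v mb="$1" -v rows="$2" -v secs="$3" 'BEGIN {
    ok = (mb >= 0.001 && mb <= 100) && (rows >= 1 && rows <= 10000) && (secs >= 0 && secs <= 300)
    if (ok) exit 0
    exit 1
  }'
}

if validate_batch_settings 100 10000 300; then
  echo "settings are within limits"
else
  echo "settings are out of range"
fi
```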

Deliver partitioned data

Partitioning organizes data into directories based on specific fields to improve query performance. Partitions reduce the amount of data scanned for queries, enabling faster reads.

Output file paths are prefixed with the event date and hour. For example, the output from a pipeline in your R2 bucket might look like this:

Terminal window
- event_date=2025-04-01/hr=15/01JQWBZCZBAQZ7RJNZHN38JQ7V.json.gz
- event_date=2025-04-01/hr=15/01JQWC16FXGP845EFHMG1C0XNW.json.gz
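
Because the partition values are part of each object key, they can be recovered from the key alone. The sketch below uses shell parameter expansion on a key taken from the listing above; partition_field is a hypothetical helper, and it assumes the key actually contains the requested field.

```shell
# Sketch: extract a partition field (event_date or hr) from an object key.
key="event_date=2025-04-01/hr=15/01JQWBZCZBAQZ7RJNZHN38JQ7V.json.gz"

partition_field() {
  # $1 = object key, $2 = field name
  local rest
  rest=${1#*"$2"=}              # drop everything up to and including "field="
  printf '%s\n' "${rest%%/*}"   # keep the text before the next "/"
}

partition_field "$key" event_date
partition_field "$key" hr
```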

Deliver data to a prefix

You can specify an optional prefix for all the output files stored in your specified R2 bucket, using the flag --r2-prefix.

For example:

Terminal window
npx wrangler pipelines update [PIPELINE-NAME] --r2-prefix test

After running the above command, the output files generated by your pipeline will be stored under the prefix "test". Files will remain partitioned. Your output will look like this:

Terminal window
- test/event_date=2025-04-01/hr=15/01JQWBZCZBAQZ7RJNZHN38JQ7V.json.gz
- test/event_date=2025-04-01/hr=15/01JQWC16FXGP845EFHMG1C0XNW.json.gz
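
To read a single partition, a query or listing tool only needs the key prefix for that date and hour. The following sketch builds such a prefix, with or without a configured --r2-prefix value. partition_prefix is a hypothetical helper, and the date and hour are assumed to be passed in the same form they take in the keys above.

```shell
# Sketch: build the key prefix for one hourly partition.
partition_prefix() {
  # $1 = event date (YYYY-MM-DD), $2 = hour as it appears in the key, $3 = optional prefix
  local prefix="${3:-}"
  # Emit "prefix/" only when a prefix was given.
  printf '%sevent_date=%s/hr=%s/\n' "${prefix:+$prefix/}" "$1" "$2"
}

partition_prefix 2025-04-01 15 test   # prints test/event_date=2025-04-01/hr=15/
partition_prefix 2025-04-01 15        # prints event_date=2025-04-01/hr=15/
```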