S3 (Data Sync)

Humm syncs files from your S3 bucket into your warehouse via Airbyte, then queries the synced tables through your warehouse integration. Use this when source data already lands in S3 and you want it available for analysis alongside your other business data. S3 Data Sync is the copy-based path: if your data is already in Parquet and should stay in its existing S3 location, use Amazon Athena external tables instead.

Source Type

Sync

Humm copies matching files from S3 into your warehouse on a schedule. When you ask a question, Humm queries the synced tables rather than reading from the bucket directly.

See Choosing a Source Type for help deciding.

Supported Files

S3 data sync supports one configured stream per connector. A stream is a set of files with the same structure that should become one table. Supported file formats:

CSV
JSON Lines (.jsonl)
Parquet
Avro

Use file glob patterns from the bucket root to choose which files belong in the stream. For example:

exports/usage/*.csv
events/**/*.jsonl
orders/*.parquet|archive/orders/*.parquet
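To preview which object keys a pattern would pick up before configuring the connector, a rough local check can help. The sketch below is an approximation, not the connector's own matcher: it assumes `*` stays within one path segment, `**/` spans zero or more directories, and `|` separates alternative patterns (as the third example above suggests). The function names `glob_to_regex` and `matches` are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate one glob (relative to the bucket root) into a regex.

    Assumed semantics, not confirmed against the connector:
    '**/' = zero or more directories, '*' = any run without '/',
    '?' = one character other than '/'.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(key: str, globs: str) -> bool:
    """Return True if the key fully matches any '|'-separated glob."""
    return any(glob_to_regex(g).fullmatch(key) for g in globs.split("|"))


if __name__ == "__main__":
    # Hypothetical object keys, for illustration only.
    keys = [
        "exports/usage/jan.csv",
        "exports/usage/2024/jan.csv",
        "events/2024/01/a.jsonl",
        "archive/orders/x.parquet",
    ]
    for key in keys:
        print(key, matches(key, "exports/usage/*.csv"))
```

If a key you expect to sync does not match here, check whether the pattern needs `**` to reach nested folders.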

Credentials

AWS Access Keys

To connect S3 data sync, provide:

Bucket: The S3 bucket name
AWS Region: The bucket region, such as us-east-1
AWS Access Key ID: Access key for an IAM user with read access
AWS Secret Access Key: Secret key for that IAM user
Output Stream Name: The table name for the synced files
File Globs: One or more file patterns to sync
File Format: CSV, JSON Lines, Parquet, or Avro

Humm currently supports access-key authentication for this connector. IAM role authentication is not available in the S3 data sync setup.

Required AWS Permissions

Create an IAM user with read-only access to the bucket and prefixes you want to sync, then generate an access key for that user. Minimum permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/path-prefix/*"
    }
  ]
}

If your file glob reads from multiple prefixes, include each prefix in the s3:GetObject resources. If you need to sync files across the whole bucket, use arn:aws:s3:::your-bucket-name/*.
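When a stream reads from several prefixes, writing the resource list by hand is error-prone. The following is a minimal sketch that generates the same policy shape shown above for any set of prefixes. The helper name `build_read_policy` and the example prefixes are hypothetical; an empty prefix list is treated as whole-bucket access.

```python
import json


def build_read_policy(bucket: str, prefixes: list[str]) -> dict:
    """Build a read-only policy document for one bucket.

    Each prefix becomes one s3:GetObject resource; with no prefixes,
    the policy covers every object in the bucket.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    if prefixes:
        # Strip stray slashes so "orders/" and "orders" give the same ARN.
        object_arns = [f"{bucket_arn}/{p.strip('/')}/*" for p in prefixes]
    else:
        object_arns = [f"{bucket_arn}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_read_policy("your-bucket-name", ["orders", "archive/orders"])
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor after replacing the bucket name and prefixes with your own.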

Sync Behavior

Data is refreshed on a schedule after setup
The connector reads files matching the configured glob patterns
Incremental sync uses file modification history
Humm reads the synced warehouse tables after the sync completes
Schema can be inferred from files, or you can provide an input schema during setup
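The idea behind incremental sync based on file modification history can be illustrated with a short sketch. This is not the connector's actual bookkeeping, which is not described here; it only assumes that each run remembers the latest modification time it has seen and picks up files modified after it. The function name `files_to_sync` and the sample listing are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def files_to_sync(
    listing: list[tuple[str, datetime]],
    cursor: Optional[datetime],
) -> tuple[list[str], Optional[datetime]]:
    """Pick files modified after the cursor and compute the next cursor.

    listing: (object key, last-modified time) pairs for matched files.
    cursor: latest modification time seen by the previous run, or None
            for a first sync, in which case every file is selected.
    """
    fresh = [(key, ts) for key, ts in listing if cursor is None or ts > cursor]
    fresh.sort(key=lambda item: item[1])  # process oldest changes first
    new_cursor = max((ts for _, ts in fresh), default=cursor)
    return [key for key, _ in fresh], new_cursor


if __name__ == "__main__":
    utc = timezone.utc
    listing = [
        ("orders/a.parquet", datetime(2024, 1, 1, tzinfo=utc)),
        ("orders/b.parquet", datetime(2024, 1, 3, tzinfo=utc)),
    ]
    keys, cursor = files_to_sync(listing, datetime(2024, 1, 2, tzinfo=utc))
    print(keys, cursor)
```

Under this model, a file that is overwritten in place gets a newer modification time and is picked up again on the next run.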

Best For

Product or usage exports already written to S3
Batch exports from internal systems
Partner or vendor files delivered to your bucket
Joining file-based operational data with warehouse, CRM, billing, and support data