Skip to main content

COPY INTO <table>

COPY loads data into Databend or unloads data from Databend.

This command loads data into a table from files in one of the following locations:

  • Named internal stage: Databend internal named stages. Files can be staged using the PUT to Stage API.
  • Named external stage: Stages created in AWS S3 compatible object storage services and Azure Blob storage.
  • External location: This can be a bucket in AWS S3 compatible object storage services, Azure Blob storage, Google Cloud Storage, or Huawei OBS. The exteranl location can be also just a remote server from where you can access the file by a URL (starting with "https://...").

See Also: COPY INTO location

Syntax

COPY INTO [<database>.]<table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( TYPE = { CSV | JSON | NDJSON | PARQUET } [ formatTypeOptions ] ) ]
[ copyOptions ]

Where:

internalStage

internalStage ::= @<internal_stage_name>[/<path>]

externalStage

externalStage ::= @<external_stage_name>[/<path>]

externalLocation

AWS S3 Compatible Object Storage Service

externalLocation ::=
's3://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
REGION = '<region-name>'
ENABLE_VIRTUAL_HOST_STYLE = true|false
)
ParameterDescriptionRequired
s3://<bucket>[<path>]External files located at the AWS S3 compatible object storage.Optional
ENDPOINT_URLThe bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml.Optional
ACCESS_KEY_IDYour access key ID for connecting the AWS S3 compatible object storage. If not provided, Databend will access the bucket anonymously.Optional
SECRET_ACCESS_KEYYour secret access key for connecting the AWS S3 compatible object storage.Optional
REGIONAWS region name. For example, us-east-1.Optional
ENABLE_VIRTUAL_HOST_STYLEIf you use virtual hosting to address the bucket, set it to "true".Optional

Azure Blob storage

externalLocation ::=
'azblob://<container>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCOUT_NAME = '<your-account-name>'
ACCOUNT_KEY = '<your-account-key>'
)
ParameterDescriptionRequired
azblob://<container>[<path>]External files located at the Azure Blob storage.Required
ENDPOINT_URLThe container endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml.Optional
ACCOUNT_NAMEYour account name for connecting the Azure Blob storage. If not provided, Databend will access the container anonymously.Optional
ACCOUNT_KEYYour account key for connecting the Azure Blob storage.Optional

Google Cloud Storage

externalLocation ::=
'gcs://<container>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
CREDENTIAL = '<your-credential>'
)
ParameterDescriptionRequired
gcs://<bucket>[<path>]External files located at the Google Cloud StorageRequired
ENDPOINT_URLThe container endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml.Optional
CREDENTIALYour credential for connecting the GCS. If not provided, Databend will access the container anonymously.Optional

Huawei Object Storage

externalLocation ::=
'obs://<container>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
ParameterDescriptionRequired
obs://<bucket>[<path>]External files located at the obsRequired
ENDPOINT_URLThe container endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml.Optional
ACCESS_KEY_IDYour access key ID for connecting the OBS. If not provided, Databend will access the bucket anonymously.Optional
SECRET_ACCESS_KEYYour secret access key for connecting the OBS.Optional

HTTP

externalLocation ::=
'https://<url>'

Please note that, HTTP supports glob patterns. For example, use

  • ontime_200{6,7,8}.csv to represents ontime_2006.csv,ontime_2007.csv,ontime_2008.csv.
  • ontime_200[6-8].csv to represents ontime_2006.csv,ontime_2007.csv,ontime_2008.csv.

FILES = ( 'file_name' [ , 'file_name' ... ] )

Specifies a list of one or more files names (separated by commas) to be loaded.

PATTERN = 'regex_pattern'

A regular expression pattern string, enclosed in single quotes, specifying the file names to match.

formatTypeOptions

formatTypeOptions ::=
RECORD_DELIMITER = '<character>'
FIELD_DELIMITER = '<character>'
SKIP_HEADER = <integer>
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

RECORD_DELIMITER = '<character>'

Description: One character that separate records in an input file.

Default: '\n'

FIELD_DELIMITER = '<character>'

Description: One character that separate fields in an input file.

Default: ',' (comma)

SKIP_HEADER = '<integer>'

Description: Number of lines at the start of the file to skip.

Default: 0

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE

Description: String that represents the compression algorithm.

Default: NONE

Values:

ValuesNotes
AUTOAuto detect compression via file extensions
GZIP
BZ2
BROTLIMust be specified if loading/unloading Brotli-compressed files.
ZSTDZstandard v0.8 (and higher) is supported.
DEFLATEDeflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATEDeflate-compressed files (without any header, RFC1951).
XZ
NONEIndicates that the files have not been compressed.

copyOptions

copyOptions ::=
[ SIZE_LIMIT = <num> ]
[ PURGE = <bool> ]
ParametersDescriptionRequired
SIZE_LIMIT = <num>Number (> 0) that specifies the maximum rows of data to be loaded for a given COPY statement. Default 0Optional
PURGE = <bool>True that specifies the command will purge the files in the stage if they are loaded successfully into table. Default falseOptional

Examples

Loading Files from Internal Stage

First, create a named internal stage:

CREATE STAGE my_internal_s1;

Then, PUT a local file to my_internal_s1 stage with PUT to Stage API:

curl  -H "stage_name:my_internal_s1" -F "upload=@books.parquet" -XPUT "http://localhost:8081/v1/upload_to_stage"

Finally, copy the file into mytable from the my_internal_s1 named internal stage:

LIST @my_internal_s1;
COPY INTO mytable FROM @my_internal_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');

Loading Files from External Stage

First, create a named external stage:

CREATE STAGE my_external_s1 url = 's3://testbucket/admin/data/' credentials=(aws_key_id='minioadmin' aws_secret_key='minioadmin');

Then, copy the file into mytable from the my_external_s1 named external stage:

COPY INTO mytable FROM @my_external_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');

Loading Files Directly from External Location

AWS S3 compatible object storage services

This example reads 10 rows from a CSV file and inserts them into a table:

COPY INTO mytable
FROM 's3://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;

This example reads 10 rows from a CSV file compressed as GZIP and inserts them into a table:

COPY INTO mytable
FROM 's3://mybucket/data.csv.gz'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 compression = GZIP) size_limit=10;

This example loads data from a CSV file without specifying the endpoint URL:

COPY INTO mytable
FROM 's3://mybucket/data.csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;

This is an example loading data from a Parquet file:

COPY INTO mytable
FROM 's3://mybucket/data.parquet'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'PARQUET');

Azure Blob storage

This example reads data from a CSV file and inserts it into a table:

COPY INTO mytable
FROM 'azblob://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<account_name>.blob.core.windows.net'
ACCOUNT_NAME = '<account_name>'
ACCOUNT_KEY = '<account_key>'
)
FILE_FORMAT = (type = 'CSV');

HTTP

This example reads data from three CSV files and inserts it into a table:

COPY INTO mytable
FROM 'https://repo.databend.rs/dataset/stateful/ontime_200{6,7,8}_200.csv'
FILE_FORMAT = (type = 'CSV');