Arrow Flight interface
Overview
ClickHouse supports the Apache Arrow Flight protocol — a high-performance RPC framework for efficient columnar data transport using the Arrow IPC format over gRPC.
The implementation includes support for Arrow Flight SQL, enabling BI tools and applications that speak the Flight SQL protocol to query ClickHouse directly.
Key capabilities:
- Execute SQL queries and retrieve results in Apache Arrow format.
- Insert data into tables using the Arrow format.
- Query metadata (catalogs, schemas, tables, primary keys) via Flight SQL commands.
- Create, bind, execute, and close server-side prepared statements via Flight SQL.
- Manage sessions and settings via Flight SQL actions.
- TLS encryption and username/password authentication.
- Incremental result retrieval via
PollFlightInfo. - Query cancellation via
CancelFlightInfo.
Enabling the Arrow Flight Server
To enable the Arrow Flight server, add the arrowflight_port setting to the ClickHouse server configuration:
Upon startup, a log message confirms the interface is active:
TLS configuration
To enable TLS for the Arrow Flight interface, configure the following settings:
When TLS is enabled, clients must connect using the grpc+tls:// scheme instead of grpc://.
Authentication
The Arrow Flight interface supports two authentication methods:
Basic Authentication
Clients authenticate with a username and password via the standard HTTP Authorization: Basic header. On successful authentication, the server returns a Bearer token in the response header.
Bearer Token Authentication
Subsequent requests can use the Bearer token returned from Basic authentication via the Authorization: Bearer <token> header. The token is automatically refreshed on each use and expires based on the default_session_timeout server setting (default: 60 seconds).
Python Example
With TLS:
Session Management
The Arrow Flight interface supports ClickHouse sessions through custom gRPC metadata headers:
| Header | Description |
|---|---|
x-clickhouse-session-id | Session identifier. If provided, multiple requests share the same session state (temporary tables, settings). |
x-clickhouse-session-timeout | Session timeout in seconds. Must not exceed max_session_timeout. |
x-clickhouse-session-check | Set to 1 to check if the session exists without creating one. |
x-clickhouse-session-close | Set to 1 to close the session after the request completes. Requires enable_arrow_close_session to be true in the server config. |
Because Arrow Flight uses gRPC over HTTP/2, metadata header names are case-sensitive and must be specified in lowercase exactly as shown (e.g., x-clickhouse-session-id, not X-ClickHouse-Session-Id). This is required by RFC 9113, Section 8.2, which mandates that HTTP/2 field names contain only lowercase characters. This differs from HTTP/1.1 where header names are case-insensitive.
Sessions allow setting persistent ClickHouse settings via the SetSessionOptions action (see DoAction).
Server Configuration Reference
| Setting | Default | Description |
|---|---|---|
arrowflight_port | — | Port for the Arrow Flight server. The server starts only if this setting is specified. |
arrowflight.enable_ssl | false | Enable TLS encryption. |
arrowflight.ssl_cert_file | — | Path to the TLS certificate file. Required when TLS is enabled. |
arrowflight.ssl_key_file | — | Path to the TLS private key file. Required when TLS is enabled. |
arrowflight.tickets_lifetime_seconds | 600 | Time in seconds before flight tickets expire and are cleaned up. Set to 0 to disable automatic ticket expiration. |
arrowflight.cancel_ticket_after_do_get | false | If true, tickets are cancelled immediately after being consumed by DoGet, freeing memory. |
arrowflight.poll_descriptors_lifetime_seconds | 600 | Time in seconds before poll descriptors expire. Set to 0 to disable automatic expiration. |
arrowflight.cancel_flight_descriptor_after_poll_flight_info | false | If true, poll descriptors are cancelled after being consumed by PollFlightInfo. |
arrowflight.max_prepared_statements_per_user | 100 | Maximum number of open prepared statements per user. Set to 0 to disable the limit. |
arrowflight.prepared_statements_lifetime_seconds | -1 | Prepared statement lifetime mode. > 0: use this value as lifetime and refresh expiration on each request for both session-bound and session-less statements. 0: disable automatic expiration. -1: for session-bound statements, use session timeout as lifetime and refresh it on each request; session-less statements do not expire automatically. |
enable_arrow_close_session | true | Allow clients to close sessions via the x-clickhouse-session-close header. |
default_session_timeout | 60 | Default session timeout in seconds. Also controls Bearer token expiration. |
max_session_timeout | 3600 | Maximum allowed session timeout in seconds. |
Supported RPC Methods
GetFlightInfo
Executes a query and returns a FlightInfo containing the result schema, endpoints with tickets for data retrieval, row count, and byte count.
Accepts a FlightDescriptor which can be:
- PATH descriptor: A single-component path interpreted as a table name. Generates
SELECT * FROM <table>. - CMD descriptor: Either a raw SQL query string, or a serialized Flight SQL protobuf command (see Flight SQL Commands).
The query is executed fully, and results are stored in server-side tickets. Each block of data produces a separate endpoint/ticket, allowing clients to retrieve data in parallel.
PollFlightInfo
Enables incremental result retrieval for long-running queries. Instead of waiting for the entire query to complete (as GetFlightInfo does), PollFlightInfo returns results block by block.
On the first call, the query starts executing. The response includes:
- A
FlightInfowith endpoints for any data blocks available so far. - A
FlightDescriptorfor the next poll (if more results are expected).
Subsequent calls with the returned descriptor retrieve additional blocks. When no more data is available, the response contains no next descriptor.
The current implementation blocks until a data block is available rather than returning immediately with no data.
GetSchema
Returns the Arrow schema for a query result without executing the full query. Accepts the same descriptor types as GetFlightInfo.
DoGet
Retrieves data for a given ticket. Accepts either:
- A ticket returned by
GetFlightInfoorPollFlightInfo. - A raw SQL query string as the ticket value.
DoPut
Sends data to ClickHouse. Accepts a FlightDescriptor and a stream of Arrow record batches.
Insert by table name (PATH descriptor):
Insert by SQL (CMD descriptor):
Execute DDL/DML via Flight SQL CommandStatementUpdate:
Flight SQL clients use CommandStatementUpdate to execute DDL/DML statements (CREATE, INSERT, ALTER, etc.). The response includes the affected row count.
Bulk ingest via Flight SQL CommandStatementIngest:
Only appending to existing tables is supported (TABLE_NOT_EXIST_OPTION_FAIL + TABLE_EXISTS_OPTION_APPEND). Catalogs and temporary tables are not supported for this command.
transaction_id is not supported for CommandStatementUpdate or CommandStatementIngest. If provided, ClickHouse returns a NotImplemented error.
Only the Arrow format is accepted for data transfer. Specifying other formats in SQL (e.g., FORMAT JSON) results in an error.
DoAction
Executes named actions. The following actions are supported:
CancelFlightInfo
Cancels a running query associated with a FlightInfo. The query ID is extracted from the FlightInfo's app_metadata field. Also cancels any poll descriptors associated with the query.
SetSessionOptions
Sets ClickHouse server settings for the current session. Requires a session ID to be set via the x-clickhouse-session-id header.
Supported value types: string, boolean, integer, double, and string lists.
If a setting name is unknown, the error INVALID_NAME is returned. If a value cannot be parsed, the error INVALID_VALUE is returned.
GetSessionOptions
Returns all current ClickHouse settings and their values for the session. Returns a map of setting names to string values (queries system.settings internally).
CreatePreparedStatement
Creates a server-side prepared statement and returns a statement handle. The request contains the SQL query text with ? placeholders.
transaction_id is not supported for this action. If it is provided, ClickHouse returns a NotImplemented error.
For query statements, the response may include:
dataset_schema: schema of the result set.parameter_schema: schema of statement parameters.
If schema inference fails for a valid query (for example, when replacing placeholders with NULL is not valid for that query), ClickHouse still creates the prepared statement and returns the handle without dataset_schema.
Prepared statements are owned by the authenticated user, not by a single session. If you open multiple sessions as the same user, you can execute, re-bind, and close the same statement handle from any of those sessions.
Other users cannot execute, bind, or close a statement handle they did not create.
arrowflight.prepared_statements_lifetime_seconds controls expiration behavior:
> 0: use the configured value as statement lifetime. Expiration is refreshed on each request for both session-bound and session-less statements.0: prepared statements do not expire automatically.-1(default): if the statement is created in a session, its lifetime follows that session timeout and is refreshed on each request in that session. If the statement is created without a session, it does not expire automatically.
Expired statements are removed and no longer count toward arrowflight.max_prepared_statements_per_user.
ClosePreparedStatement
Closes a prepared statement and releases the associated server-side resources when the request contains a non-empty statement handle.
ClickHouse also supports bulk close with ClosePreparedStatement when the handle is empty:
- If
x-clickhouse-session-idis present, it closes all prepared statements for the authenticated user in that session. - If no session ID is present, it closes only session-less prepared statements for the authenticated user.
If a prepared statement is created in a session (via x-clickhouse-session-id), it is also closed automatically when that session is closed.
Flight SQL Commands
When a CMD descriptor contains a serialized Flight SQL protobuf message, ClickHouse handles the following commands:
Supported via GetFlightInfo / GetSchema
| Command | Description |
|---|---|
CommandStatementQuery | Execute an arbitrary SQL query. transaction_id is not supported. |
CommandGetSqlInfo | Retrieve server metadata (name, version, Arrow version, capabilities). |
CommandGetCatalogs | List catalogs. Returns an empty result (ClickHouse does not use catalogs). |
CommandGetDbSchemas | List databases. Supports optional db_schema_filter_pattern (SQL LIKE pattern). |
CommandGetTables | List tables. Supports filters for schema, table name, table types, and optional schema inclusion. |
CommandGetTableTypes | List table engine types (from system.table_engines). |
CommandGetPrimaryKeys | Retrieve primary key columns for a specified table. |
CommandPreparedStatementQuery | Execute a prepared SELECT-style statement by handle. |
Supported via DoPut
| Command | Description |
|---|---|
CommandStatementUpdate | Execute a DDL/DML statement (CREATE, INSERT, ALTER, etc.). Returns affected row count. transaction_id is not supported. |
CommandStatementIngest | Bulk insert Arrow data into an existing table. Only append mode is supported. transaction_id is not supported. |
CommandPreparedStatementQuery | Bind parameter values for a prepared statement when sent via DoPut, then return DoPutPreparedStatementResult with the statement handle. Only one parameter set (one row) is accepted, and the number of bound values must exactly match the number of ? placeholders. |
CommandPreparedStatementUpdate | Execute a prepared DDL/DML statement by handle and return affected row count. |
Unsupported in ClickHouse
These commands map to features that ClickHouse does not provide, so they are not supported by the Arrow Flight SQL interface.
| Command | Reason |
|---|---|
CommandGetCrossReference | ClickHouse is not a relational database and does not implement foreign key constraints, so cross-reference metadata is not available. |
CommandGetExportedKeys | ClickHouse is not a relational database and does not implement foreign key constraints, so exported-key metadata is not available. |
CommandGetImportedKeys | ClickHouse is not a relational database and does not implement foreign key constraints, so imported-key metadata is not available. |
CommandStatementSubstraitPlan | ClickHouse does not support Substrait plans. |
Complete Example
Output:
Data Format
All data is transferred in Apache Arrow IPC format. Only the Arrow format is supported — specifying other ClickHouse formats (e.g., FORMAT JSON, FORMAT CSV) results in an error.
ClickHouse data types are mapped to Arrow types during serialization. The setting output_format_arrow_unsupported_types_as_binary controls whether unsupported ClickHouse types are serialized as binary blobs.
Compatibility
The Arrow Flight interface is compatible with any client or tool that supports the Arrow Flight or Arrow Flight SQL protocol, including:
- Python (
pyarrow) - Java (
org.apache.arrow.flight) - C++ (
arrow::flight) - Go (
apache/arrow/go) - ADBC (Arrow Database Connectivity) drivers
- DBeaver, and other tools with Flight SQL support
If a native ClickHouse connector is available for your tool (e.g., JDBC, ODBC, native protocol), prefer using it unless Arrow Flight is specifically required for performance or format compatibility.
Client-side ArrowFlight features
ClickHouse can also act as a Flight client to read data from external Arrow Flight servers. See: