Implementation Guide
Airbyte Self-Managed Enterprise is in an early access stage for select priority users. Once you have qualified for a Self-Managed Enterprise license key, you can deploy Airbyte with the following instructions.

Airbyte Self-Managed Enterprise must be deployed using Kubernetes, which enables Airbyte's best performance and scale. The core components (API server, scheduler, etc.) run as deployments, while the scheduler launches connector-related pods on different nodes.
Prerequisites
For a production-ready deployment of Self-Managed Enterprise, various infrastructure components are required. We recommend deploying to Amazon EKS or Google Kubernetes Engine. The following diagram illustrates a typical Airbyte deployment running on AWS:
Prior to deploying Self-Managed Enterprise, we recommend having each of the following infrastructure components ready to go. When possible, it's easiest to have all components running in the same VPC. The provided recommendations are for customers deploying to AWS:
| Component | Recommendation |
|---|---|
| Kubernetes Cluster | Amazon EKS cluster running in 2 or more availability zones on a minimum of 6 nodes. |
| Ingress | Amazon ALB and a URL for users to access the Airbyte UI or make API requests. |
| Object Storage | Amazon S3 bucket with two directories for log and state storage. |
| Dedicated Database | Amazon RDS Postgres with at least one read replica. |
| External Secrets Manager | Amazon Secrets Manager for storing connector secrets. |
You must also install and configure the following Kubernetes tooling:

- Install `helm` by following these instructions.
- Install `kubectl` by following these instructions.
- Configure `kubectl` to connect to your cluster by using `kubectl config use-context my-cluster-name`:
Configure kubectl to connect to your cluster:

Amazon EKS:

- Configure your AWS CLI to connect to your project.
- Install eksctl.
- Run `eksctl utils write-kubeconfig --cluster=$CLUSTER_NAME` to make the context available to kubectl.
- Use `kubectl config get-contexts` to show the available contexts.
- Run `kubectl config use-context $EKS_CONTEXT` to access the cluster with kubectl.
GKE:

- Configure `gcloud` with `gcloud auth login`.
- On the Google Cloud Console, the cluster page will have a "Connect" button, with a command to run locally: `gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE_NAME --project $PROJECT_NAME`.
- Use `kubectl config get-contexts` to show the available contexts.
- Run `kubectl config use-context $GKE_CONTEXT` to access the cluster with kubectl.
Deploy Airbyte Enterprise
Add Airbyte Helm Repository
Follow these instructions to add the Airbyte helm repository:
- Run `helm repo add airbyte https://airbytehq.github.io/helm-charts`, where `airbyte` is the name of the repository that will be indexed locally.
- Perform the repo indexing process, and ensure your helm repository is up-to-date, by running `helm repo update`.
- You can then browse all charts uploaded to your repository by running `helm search repo airbyte`.
Clone & Configure Airbyte
- `git clone` the latest revision of the airbyte-platform repository.
- Create a new `airbyte.yml` file in the `configs` directory of the `airbyte-platform` folder. You may also copy `airbyte.sample.yml` to use as a template:

  ```shell
  cp configs/airbyte.sample.yml configs/airbyte.yml
  ```

- Add your Airbyte Self-Managed Enterprise license key to your `airbyte.yml`.
- Add your auth details to your `airbyte.yml`.
Configuring auth in your airbyte.yml file
To configure SSO with Okta, add the following at the end of your `airbyte.yml` file:

```yaml
auth:
  identity-providers:
    - type: okta
      domain: $OKTA_DOMAIN
      app-name: $OKTA_APP_INTEGRATION_NAME
      client-id: $OKTA_CLIENT_ID
      client-secret: $OKTA_CLIENT_SECRET
```
See the following guide on how to collect this information for Okta.
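For orientation, here is a filled-in sketch of the same section. All values below are hypothetical placeholders; your domain, app name, and credentials will come from your own Okta app integration:

```yaml
auth:
  identity-providers:
    - type: okta
      domain: myorg.okta.com                 ## hypothetical Okta domain
      app-name: airbyte-sso                  ## hypothetical app integration name
      client-id: 0oa1a2b3c4d5e6f7g8h9        ## example client ID
      client-secret: example-client-secret   ## example only; store securely
```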
To configure SSO with any identity provider via OpenID Connect (OIDC), such as Azure Entra ID (formerly Azure Active Directory), add the following at the end of your `airbyte.yml` file:

```yaml
auth:
  identity-providers:
    - type: oidc
      domain: $DOMAIN
      app-name: $APP_INTEGRATION_NAME
      client-id: $CLIENT_ID
      client-secret: $CLIENT_SECRET
```
See the following guide on how to collect this information for Azure Entra ID (formerly Azure Active Directory).
To configure basic auth (deploy without SSO), remove the entire `auth:` section from your `airbyte.yml` config file. You will authenticate with the instance admin user and password included in your `airbyte.yml`.
To modify auth configurations after Airbyte is installed, you will need to redeploy Airbyte with the additional environment variable `RESET_KEYCLOAK_REALM=TRUE`. As this also resets the list of Airbyte users and permissions, use it with caution.
Configuring the Airbyte Database
For Self-Managed Enterprise deployments, we recommend using a dedicated database instance (such as AWS RDS or GCP Cloud SQL) for better reliability and backups, instead of the default internal Postgres database (`airbyte/db`) that Airbyte spins up within the Kubernetes cluster.

Currently, Airbyte requires a connection to a Postgres 13 instance.

The following steps assume you've already configured a Postgres instance:
External database setup steps
- In the `charts/airbyte/values.yaml` file, disable the default Postgres database (`airbyte/db`):

  ```yaml
  postgresql:
    enabled: false
  ```
- In the `charts/airbyte/values.yaml` file, enable and configure the external Postgres database:

  ```yaml
  externalDatabase:
    host: ## Database host
    user: ## Non-root username for the Airbyte database
    database: db-airbyte ## Database name
    port: 5432 ## Database port number
  ```

For the non-root user's password, which has database access, you may use `password`, `existingSecret`, or `jdbcUrl`. We recommend using `existingSecret`, or injecting sensitive fields from your own external secret store. These parameters are mutually exclusive:
```yaml
externalDatabase:
  ...
  password: ## Password for non-root database user
  existingSecret: ## The name of an existing Kubernetes secret containing the password.
  existingSecretPasswordKey: ## The Kubernetes secret key containing the password.
  jdbcUrl: "jdbc:postgresql://<user>:<password>@localhost:5432/db-airbyte" ## Full database JDBC URL. You can also add additional arguments.
```
The optional `jdbcUrl` field should be entered in the following format: `jdbc:postgresql://localhost:5432/db-airbyte`. We recommend against using it unless you need to pass additional arguments to the JDBC driver (e.g. to handle SSL).
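The JDBC URL is assembled from the same host, port, and database values configured under `externalDatabase`. A minimal shell sketch, using a hypothetical RDS endpoint:

```shell
# Assemble a Postgres JDBC URL from its parts (endpoint values are hypothetical).
DB_HOST="airbyte-db.example.us-east-1.rds.amazonaws.com"
DB_PORT="5432"
DB_NAME="db-airbyte"
JDBC_URL="jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "${JDBC_URL}"
```

User and password would be appended via driver arguments or supplied separately, as shown in the `externalDatabase` fields above.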
Configuring External Logging
For Self-Managed Enterprise deployments, we recommend spinning up standalone log storage, such as S3 or GCS, for additional reliability, instead of using the default internal Minio storage (`airbyte/minio`). It is then common practice to forward logs from the external log storage into your observability tool.
External log storage setup steps
- In the `charts/airbyte/values.yaml` file, disable the default Minio instance (`airbyte/minio`):

  ```yaml
  minio:
    enabled: false
  ```
- In the `charts/airbyte/values.yaml` file, enable and configure external log storage:

S3:
```yaml
global:
  ...
  log4jConfig: "log4j2-no-minio.xml"
  logs:
    storage:
      type: "S3"
    minio:
      enabled: false
    s3:
      enabled: true
      bucket: "" ## S3 bucket name that you've created.
      bucketRegion: "" ## e.g. us-east-1
    accessKey: ## AWS Access Key
      password: ""
      existingSecret: "" ## The name of an existing Kubernetes secret containing the AWS Access Key.
      existingSecretKey: "" ## The Kubernetes secret key containing the AWS Access Key.
    secretKey: ## AWS Secret Access Key
      password: ""
      existingSecret: "" ## The name of an existing Kubernetes secret containing the AWS Secret Access Key.
      existingSecretKey: "" ## The Kubernetes secret key containing the AWS Secret Access Key.
```
For each of `accessKey` and `secretKey`, the `password` and `existingSecret` fields are mutually exclusive.
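If you use `existingSecret`, the referenced Kubernetes secret could be shaped like the following sketch. The secret name, key names, and values here are hypothetical and must match the `existingSecret`/`existingSecretKey` values you set in `values.yaml`:

```yaml
# Hypothetical Kubernetes Secret holding both AWS keys for log storage.
apiVersion: v1
kind: Secret
metadata:
  name: airbyte-log-secrets ## example name; reference it from existingSecret
type: Opaque
stringData:
  aws-access-key-id: AKIAIOSFODNN7EXAMPLE ## example value only
  aws-secret-access-key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY ## example value only
```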
- Ensure your access key is tied to an IAM user with the following policies, allowing the user access to S3 storage:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
    }
  ]
}
```
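Once the bucket name is filled in, the policy can be attached to the IAM user with the AWS CLI. A minimal sketch — the bucket, user, policy, and file names below are all hypothetical:

```shell
# Write the IAM policy for a concrete bucket, then attach it to the IAM user
# (bucket, user, policy, and file names are examples).
BUCKET="my-airbyte-logs"
cat > /tmp/airbyte-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"},
    {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
     "Resource": "arn:aws:s3:::${BUCKET}"},
    {"Effect": "Allow",
     "Action": ["s3:PutObject", "s3:PutObjectAcl", "s3:GetObject",
                "s3:GetObjectAcl", "s3:DeleteObject"],
     "Resource": "arn:aws:s3:::${BUCKET}/*"}
  ]
}
EOF
# Attach to the IAM user whose access key you configured above, e.g.:
# aws iam put-user-policy --user-name airbyte-logs-user \
#   --policy-name airbyte-s3-access --policy-document file:///tmp/airbyte-s3-policy.json
```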
GCS:

```yaml
global:
  ...
  log4jConfig: "log4j2-no-minio.xml"
  logs:
    storage:
      type: "GCS"
    minio:
      enabled: false
    gcs:
      bucket: airbyte-dev-logs ## GCS bucket name that you've created.
      credentials: ""
      credentialsJson: "" ## Base64-encoded JSON GCP credentials file contents
```
Note that the `credentials` and `credentialsJson` fields are mutually exclusive.
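To populate `credentialsJson`, the service-account key file must be base64-encoded without line wraps. A minimal sketch — the file path and contents below are placeholders standing in for a real GCP service-account key:

```shell
# Base64-encode a GCP service-account key file for credentialsJson
# (path and contents are placeholders for a real key file).
printf '%s' '{"type": "service_account", "project_id": "my-project"}' > /tmp/gcp-credentials.json
CREDENTIALS_JSON_B64="$(base64 < /tmp/gcp-credentials.json | tr -d '\n')"
echo "${CREDENTIALS_JSON_B64}"
```

The resulting string is what goes into the `credentialsJson` field; stripping newlines avoids the line wrapping some `base64` implementations add by default.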