Prerequisites
Multi-tenant databases
Due to the AWS RDS limitation where you cannot connect to the CDB and only being able to create local users, Streamkap does not support Oracle multi-tenant databases on AWS RDS.Standby databases
An Oracle database can be configured with either a physical or a logical standby database for recovery after a production failure. At this time, Streamkap does not support them.- Oracle 12c or above, Standard or Enterprise Edition
- AWS RDS endpoint and port of the database
- (single-tenant architecture): Database name
- The Oracle database master user credentials or equivalent
- An AWS console account with administrator access to the database
Oracle Setup
For the Connector to ingest changes from your database it is dependent on Oracle’s redo logs and archive logs. It is important that these redo logs are large enough and the archive logs are retained for long enough to ensure all changes are captured.1. Enable Archive Logs
When redo logs fill up, Oracle archives groups of them into archive logs. For Oracle on RDS, archiving is enabled when AWS automated backups is enabled.1.1) Enable AWS automated backups
Reboot required
When automated backups are enabled, your RDS instance and database are taken offline and a backup is immediately created which can take some time- Sign in to and open the Amazon RDS console at https://console.aws.amazon.com/rds/
- In the left-side navigation menu, choose Databases. The Databases page should appear
- Select the DB instance that you want to modify
- Click Modify. The Modify DB instance page should appear
- For Backup retention period, choose at least 1 or higher
- Click Continue
- Select Apply immediately
- On the confirmation page, click Modify DB instance to enable automated backups
LOG_MODE
is ARCHIVELOG
then it is enabled.
Or
- Sign into your AWS account
- Once signed in, navigate to the RDS dashboard by clicking on Services in the top left corner, Databases and then RDS or by typing RDS into the top left search box
- From the AWS RDS Dashboard, click on DB Instances or Databases in the left side menu
- Click on the DB identifier for the Oracle database you want Streamkap to use
- Click on the Maintenance & backups tab
- Under the Backups section, check Automated backups says “Enabled (N Days)“
1.2) Configure log retention
Archive logs should be retained for at least 24 hours. However, we recommend retaining them for longer, if possible. Too short a retention period and changes may not be captured and processed.Retention periods and database storage
Archive logs are retained on your database instance using up its storage capacity. It is important to make sure it has enough space, otherwise, performance issues and outages can occur.In general, the more tables (and columns) there are, the more capacity is required. For the Connector, additional capacity is essential because it depends on Oracle’s supplemental logging.Assuming your Oracle database has supplemental logging enabled already, to estimate storage capacity you can look at the last 1 hour of log storage usage and multiply that byarchivelog retention hours
. Here’s an example script for that:SQL
2. Enable LogMiner
For the Connector to query the redo and archive logs, it is dependent on the Oracle LogMiner utility. To enable that, supplemental logging must be enabled.SUPPLEMENTAL_LOG_DATA_MIN
is YES
then it is enabled.
2.1 Resize redo logs
An Amazon RDS Oracle instance starts with four, online redo log files, 128MB each. That is too small; more log files are necessary, and the logs should be resized to at least 1024MB or more, especially for production databases. Before making any changes, run this query to check the current log sizes:GROUP# | SIZE_MB | STATUS |
---|---|---|
1 | 128 | INACTIVE |
2 | 128 | CURRENT |
3 | 128 | INACTIVE |
4 | 128 | INACTIVE |
GROUP# | SIZE_MB | STATUS |
---|---|---|
1 | 128 | INACTIVE |
2 | 128 | CURRENT |
3 | 128 | INACTIVE |
4 | 128 | INACTIVE |
5 | 1024 | UNUSED |
6 | 1024 | UNUSED |
… | … | … |
INACTIVE
, drop them and run the query from earlier to confirm they have been dropped:
GROUP# | SIZE_MB | STATUS |
---|---|---|
2 | 128 | CURRENT |
5 | 1024 | UNUSED |
6 | 1024 | UNUSED |
… | … | … |
CURRENT
log so we can drop it and run the query from earlier to confirm it has switched.
GROUP# | SIZE_MB | STATUS |
---|---|---|
2 | 128 | ACTIVE |
5 | 1024 | CURRENT |
6 | 1024 | UNUSED |
… | … | … |
STATUS
of the log we want to drop is still ACTIVE
, we need to issue a checkpoint to make it INACTIVE
.
GROUP# | SIZE_MB | STATUS |
---|---|---|
5 | 1024 | CURRENT |
6 | 1024 | UNUSED |
… | … | … |
3. Create Database User
The script below creates a database user with privileges for the Connector.4. Enable Snapshots
To backfill your data, the Connector needs to be able to perform Snapshots (See Snapshots & Backfilling for more information). To enable this process, a table must be created for the Connector to use.Please create the signal table with the name
STREAMKAP_SIGNAL
. It will not be recognised if given another name.Streamkap Setup
- Go to Sources and choose Oracle on Amazon RDS
-
Input the following information:
- Name - A unique and memorable name for this Connector
- Hostname - The database endpoint
- Port (optional) - The database port
- Username (case sensitive) -
STREAMKAP_USER
or the username you chose - Password - The database user’s password
- Heartbeat - Required for low volume connectors
- Signal Table Schema - Streamkap will use a collection in this schema to manage snapshots e.g.
public
. See Enable Snapshots for more information - Database - The database name (single-tenant architecture)
- Connect via SSH Tunnel. See SSH Tunnel
- Advanced Parameters
- Represent Binary Data As (Default
bytes
)
- Represent Binary Data As (Default
- Add Schemas/Tables. Can also bulk upload here. The format is a simple list of each schema or table per row saved in csv format without a header.
- Click Save
Frequently Asked Questions
How do I find the database endpoint and port?
How do I find the database endpoint and port?
If you use any database tools such as Oracle SQL Developer to interact with the database, you’ll find them in the connection configuration you have saved in those tools.Alternatively, you can find them in the AWS account:
- Sign into your AWS account
- Once signed in, navigate to the RDS dashboard by clicking on Services in the top left corner, Databases and then RDS or by typing RDS into the top left search box
- From the AWS RDS Dashboard, click on DB Instances or Databases in the left side menu
- Click on the DB identifier for the Oracle database you want Streamkap to use. The Database Details page should appear
- Under the Connectivity & security section you will find Endpoint & port
Does the AWS automated backups retention period need to be large enough to accommodate the `archivelog retention hours`?
Does the AWS automated backups retention period need to be large enough to accommodate the `archivelog retention hours`?
No, they are mostly independent of each other.
archivelog retention hours
determines how long the logs are retained locally in the database storage- AWS automated backups retention period determines how long the logs are retained by AWS outside of the database storage
archivelog retention hours
, they are removed from the database storage and then retained by AWS - outside of the database storage - for the AWS automated backups retention period.The AWS automated backups are there so you can recover your database in the event of a disaster. The retention period doesn’t impact the Streamkap Connector, only the archivelog retention hours
does.Why does the Connector need CREATE TABLE privileges?
Why does the Connector need CREATE TABLE privileges?
Oracle has a background process called the “Log Writer” or LGWR for short. Logs are written to an in-memory buffer first, and then the LGWR writes them to disk.The Connector needs to keep track of the last recorded system change number (SCN) that the LGWR process records for each committed transaction.It’s best to persist that SCN somewhere outside of the Connector so it creates a very small, 1 column table named
LOG_MINING_TABLE
. If the Connector fails, it can use the last recorded value from that table to recover.Do I need to resize my redo logs for a database with low data volume and traffic?
Do I need to resize my redo logs for a database with low data volume and traffic?
We recommend it because the supplemental logging configuration from earlier in this guide increases the level of change tracking to include data that’s essential for the Streamkap Connector.We would also recommend for databases with low traffic to also enable the heartbeats feature.