> ## Documentation Index
> Fetch the complete documentation index at: https://docs.streamkap.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Oracle Source FAQ

## Oracle Sources FAQ for Streamkap

This FAQ focuses on using Oracle Database as a source in Streamkap, including self-hosted setups and cloud variants (AWS RDS Oracle). Streamkap's Oracle connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations.

<AccordionGroup>
  <Accordion title="What log reading methods does Streamkap support for Oracle?">
    Streamkap's Oracle Source supports **LogMiner**, **XStream**, and **OpenLogReplicator** as log reading methods.
  </Accordion>

  <Accordion title="What is an Oracle source in Streamkap?">
    An Oracle source in Streamkap enables real-time Change Data Capture (CDC) from Oracle databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses LogMiner to read the redo logs and stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic databases. Streamkap abstracts complexity, offering a serverless setup via UI or API.
  </Accordion>

  <Accordion title="What Oracle versions are supported as sources?">
    * Oracle Database 11g+ for basic CDC; 12c+ for enhanced features like multitenant architecture and JSON support; 19c+ for latest optimizations and cloud compatibility.
    * Compatible with Oracle Database 11.2+
    * Oracle RAC (Real Application Clusters) supported
  </Accordion>

  <Accordion title="What Oracle deployments are supported?">
    Streamkap supports:

    * Self-hosted (on-prem/VM)
    * AWS RDS Oracle (Standard Edition, Enterprise Edition)
    * Oracle Cloud Infrastructure (OCI)
    * Oracle RAC (Real Application Clusters)

    Streamkap also supports standalone instances, Data Guard configurations, and Oracle Exadata.
  </Accordion>

  <Accordion title="What are the key features of Oracle sources in Streamkap?">
    * **CDC**: LogMiner-based via redo logs; captures inserts/updates/deletes with SCN (System Change Number) tracking
    * **Snapshots**: Ad-hoc/initial backfills using signal table methods; locking or non-locking modes
    * **Schema Evolution**: Automatic handling of DDL changes when enabled; tracks table structure modifications
    * **Heartbeats**: Enabled by default to prevent redo log buildup in low-traffic databases
    * **Data Types**: Standard mappings (numerics, temporals, strings, LOBs, RAW as bytes/base64/hex, XML, JSON in 12c+)
    * **Ingestion Modes**: Inserts (append) or upserts
    * **Security**: SSL/TLS, Oracle wallet, IP allowlisting
    * **Monitoring**: Latency, lag, redo log metrics, LogMiner session stats in-app
    * Streamkap adds transaction metadata and ad-hoc snapshots with filters
  </Accordion>

  <Accordion title="How does CDC work for Oracle sources?">
    Streamkap uses Oracle LogMiner to read and parse redo log files, extracting committed change events. LogMiner queries are executed against redo logs to retrieve SQL statements and change data, which are then emitted as change events. The connector tracks progress using SCN (System Change Number).

    **Requirements:**

    * Supplemental logging must be enabled (database and table level)
    * Archive log mode must be enabled
    * LogMiner privileges required for the Streamkap user
  </Accordion>

  <Accordion title="What is supplemental logging and why does it matter?">
    Supplemental logging instructs Oracle to include additional information in redo logs beyond what's needed for database recovery. This extra data is essential for CDC.

    **Types:**

    * **Database-level**: `ADD SUPPLEMENTAL LOG DATA`
    * **Table-level**: `ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS` (recommended for complete change records)

    **Why it matters:**

    * Without supplemental logging, redo logs may not contain complete before/after images
    * Affects ability to capture deletes and updates with full column data
    * Must be enabled before CDC starts capturing

    **Setting:**

    ```sql theme={null}
    -- Database level
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

    -- Table level (for complete records)
    ALTER TABLE schema.table ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    ```
  </Accordion>

  <Accordion title="How do snapshots work for Oracle sources?">
    * Trigger ad-hoc at source/table level
    * Requires a signal table (`streamkap_signal`)
    * Methods: Blocking (uses locks) or incremental (chunked by ROWID or primary key)
    * Modes: `initial` (default), `always`, `initial_only`, `no_data`, `when_needed`, `configuration_based`, `custom`

    Streamkap simplifies triggering via UI.
  </Accordion>

  <Accordion title="What are heartbeats and how do they work?">
    Connectors use "offsets"—like bookmarks—to track their position in the database's log or change stream. When no changes occur for long periods, these offsets may become outdated, and the Connector might lose its place or stop capturing changes.

    Heartbeats ensure the Connector stays active and continues capturing changes.

    There are two layers of heartbeat protection:

    #### Layer 1: Connector heartbeats (enabled by default)

    The Connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness.

    No configuration is necessary for this layer; it is automatically enabled. We recommend keeping this layer enabled for all deployments.

    #### Layer 2: Source database heartbeats (recommended)

    <Info>
      **Why we recommend configuring Layer 2**

      While Layer 2 is **crucial** for low-traffic or intermittent databases, we recommend configuring it for all deployments. It provides additional resilience and helps prevent issues during periods of inactivity.
    </Info>

    You can configure regular updates to a dedicated heartbeat table in the source database. This simulates activity, ensuring change events are generated consistently, maintaining log progress and providing additional resilience.

    How this layer is configured depends on the connection type (if supported by the Source):

    * **Read-write connections** (when **Read only** is **No** during Streamkap Setup): The Connector updates the heartbeat table directly.
    * **Read-only connections** (when **Read only** is **Yes** during Streamkap Setup): A scheduled job on the **primary** database updates the heartbeat table, and these changes replicate to the read replica for the Connector to consume.

    This layer requires you to set up a heartbeat table—and for read-only connections, a scheduled job (e.g., `pg_cron` for PostgreSQL, `event_scheduler` for MySQL)—on your source database.

    For read-write connections (when **Read only** is **No** during Streamkap Setup), the Connector writes to the heartbeat table directly.

    <CodeGroup>
      ```SQL SQL theme={null}
      -- Replace {...} placeholders as needed
      ALTER SESSION SET CONTAINER={PDB};

      -- Create the heartbeat table with id, text, and last_update fields
      CREATE TABLE STREAMKAP_USER.STREAMKAP_HEARTBEAT (
          id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY PRIMARY KEY,
          text VARCHAR2(4000),
          last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );

      -- Grant permission to the Streamkap user
      GRANT SELECT, UPDATE, INSERT, DELETE ON STREAMKAP_USER.STREAMKAP_HEARTBEAT TO STREAMKAP_USER;

      -- Grant necessary privileges on the table to the common user
      GRANT SELECT, UPDATE, INSERT, DELETE ON STREAMKAP_USER.STREAMKAP_HEARTBEAT TO C##STREAMKAP_USER;

      -- Insert the first row into the heartbeat table
      INSERT INTO STREAMKAP_USER.STREAMKAP_HEARTBEAT (text) VALUES ('test_heartbeat');
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="What data types are supported?">
    * **Basics**: Numbers (NUMBER, INTEGER, FLOAT), strings (VARCHAR2, CHAR, NVARCHAR2, NCHAR, CLOB), dates/timestamps (DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE)
    * **Advanced**: Binary (RAW, LONG RAW, BLOB as bytes/hex), XML (XMLTYPE), JSON (12c+, stored as VARCHAR2/CLOB)
    * **Custom**: ROWID, UROWID, spatial types (SDO\_GEOMETRY) with limitations
    * **Unsupported**: BFILE (external file pointers), deprecated LONG types (use CLOB instead), nested tables, VARRAYs without flattening, Oracle object types without conversion
  </Accordion>

  <Accordion title="How to monitor redo logs for Oracle sources?">
    * Check archive log generation rate:
      ```sql theme={null}
      SELECT name, sequence#, first_time, next_time 
      FROM v$archived_log 
      WHERE first_time > SYSDATE - 1
      ORDER BY first_time;
      ```
    * Monitor LogMiner sessions:
      ```sql theme={null}
      SELECT * FROM v$logmnr_contents WHERE ROWNUM <= 10;
      ```
    * Check redo log retention and space:
      ```sql theme={null}
      SELECT * FROM v$log;
      ```

    **Best Practices**:

    * Retain archive logs for at least 3-5 days
    * Alert on archive log destination disk space
    * Monitor supplemental logging overhead
    * Track LogMiner query performance
  </Accordion>

  <Accordion title="Can CDC capture database Views, Materialized Views, and other virtual objects?">
    **No, CDC cannot capture Views or most virtual database objects.**

    **Why Views cannot be captured:**\
    CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don't store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don't store data, they don't generate transaction log entries.

    **What cannot be captured:**

    * **Views**: Virtual tables with no physical storage or redo log entries
    * **Materialized Views (MVs)**: While they have physical storage, their refresh operations may not generate standard DML redo logs, or may generate bulk operations that are difficult to track incrementally. **Solution**: Capture the base tables that feed the materialized view, then recreate the MV logic downstream.
    * **Materialized View Logs**: Internal structures for fast refresh; not user data
    * **Global Temporary Tables (GTT)**: Session or transaction-scoped, redo logging depends on ON COMMIT setting, generally excluded from CDC
    * **External Tables**: Reference files outside the database, no redo logging
    * **Index-Organized Tables (IOT)**: Can be captured, but with special considerations for ROWID-based snapshots
    * **System Tables** (DBA\_\*, ALL\_\*, USER\_\*, V\$\*): Metadata views, not user data
    * **CTEs (Common Table Expressions)**: Query-time constructs with no persistent storage
    * **Object Tables**: Tables based on Oracle object types; require special handling

    **Solution:**\
    Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.

    **Oracle-specific notes:**

    * **Materialized Views**: Depending on refresh method (COMPLETE, FAST, FORCE), may generate redo logs, but tracking incremental changes is problematic. Always capture the base tables instead.
    * **Materialized View with ON COMMIT refresh**: Generates redo logs but as part of transaction commit; difficult to associate with source changes
    * **Partitioned Views**: Capture each base table in the partition view union
    * **Index-Organized Tables**: Can be captured, but ensure supplemental logging includes the primary key (which is the physical organization key)

    **Example:**\
    If you have a view `V_CUSTOMER_ORDERS` that joins tables `CUSTOMERS` and `ORDERS`, enable supplemental logging and CDC on the `CUSTOMERS` and `ORDERS` base tables, then recreate the join logic in your destination or transformation layer.

    If you have a materialized view `MV_DAILY_SALES` that aggregates from `SALES` table, capture the `SALES` table and perform the aggregation downstream.
  </Accordion>

  <Accordion title="What are common limitations?">
    * Archive log mode must be enabled (not enabled by default in many deployments)
    * Supplemental logging overhead on write performance
    * LogMiner has performance impact on high-transaction databases
    * Large transactions can cause memory pressure in LogMiner sessions
    * Redo log space management critical for continuous CDC
    * Schema changes may require connector restart to refresh table metadata
    * Long-running transactions can delay change event visibility
    * Oracle RAC requires all nodes to have archive logs accessible
  </Accordion>

  <Accordion title="How to handle deletes?">
    Captures deletes as events with before-images, provided supplemental logging is enabled with ALL COLUMNS. Without proper supplemental logging, delete events may only contain primary key values.
  </Accordion>

  <Accordion title="What security features are available?">
    Encrypted connections (SSL/TLS), Oracle wallet for credential management, role-based access, IP allowlisting, support for Oracle Network Encryption.
  </Accordion>

  <Accordion title="Why does the Connector need CREATE TABLE privileges?">
    Oracle has a background process called the "Log Writer" or LGWR for short. Logs are written to an in-memory buffer first, and then the LGWR writes them to disk.

    The Connector needs to keep track of the last recorded system change number (SCN) that the LGWR process records for each committed transaction.

    It's best to persist that SCN somewhere outside of the Connector so it creates a very small, 1 column table named `LOG_MINING_TABLE`. If the Connector fails, it can use the last recorded value from that table to recover.
  </Accordion>

  <Accordion title="Do I need to resize my redo logs for a database with low data volume and traffic?">
    We recommend it because the supplemental logging configuration increases the amount of data in the redo logs, data that's required for tracking changes to your data, including its schema.

    We would also recommend for databases with low traffic to also enable the heartbeats feature.
  </Accordion>

  <Accordion title="How do I find the Container and pluggable database names?">
    Connect to the database and run this script to list the available container (CDB) and pluggable database (PDB) names.

    ```sql theme={null}
    SELECT NAME, CDB, CON_ID FROM V$DATABASE ORDER BY CON_ID;
    ```

    For the Container database names, the `CDB` column should show `YES`.
  </Accordion>

  <Accordion title="Can I create an Oracle local user instead?">
    At this time, no. The Connector makes calls to LogMiner APIs from within the root database. Additionally, it consults several `V$` tables which are not available from within a PDB.
  </Accordion>

  <Accordion title="How do I find the database endpoint and port? (AWS RDS)">
    If you use any database tools such as Oracle SQL Developer to interact with the database, you'll find them in the connection configuration you have saved in those tools.

    Alternatively, you can find them in the AWS account:

    1. Sign into your AWS account
    2. Once signed in, navigate to the **RDS** dashboard by clicking on **Services** in the top left corner, **Databases** and then **RDS** or by typing **RDS** into the top left search box
    3. From the AWS RDS Dashboard, click on **DB Instances** or **Databases** in the left side menu
    4. Click on the **DB identifier** for the Oracle database you want Streamkap to use. The **Database Details** page should appear
    5. Under the **Connectivity & security** section you will find **Endpoint & port**
  </Accordion>

  <Accordion title="Does the AWS automated backups retention period need to be large enough to accommodate the archivelog retention hours?">
    No, they are mostly independent of each other.

    * `archivelog retention hours` determines how long the logs are retained **locally** in the database storage
    * AWS automated backups retention period determines how long the logs are retained by AWS **outside** of the database storage

    When archive logs have existed for longer than the `archivelog retention hours`, they are removed from the database storage and then retained by AWS - outside of the database storage - for the AWS automated backups retention period.

    The AWS automated backups are there so you can recover your database in the event of a disaster. The retention period doesn't impact the Streamkap Connector, only the `archivelog retention hours` does.
  </Accordion>

  <Accordion title="Troubleshooting common issues">
    * **CDC Not Working**: Verify archive log mode enabled; check supplemental logging at database and table levels
    * **Redo Log Buildup**: Enable heartbeats; monitor archive log destination space; adjust retention
    * **Missing Events**: Ensure supplemental logging includes ALL COLUMNS; verify LogMiner privileges
    * **Performance Issues**: Monitor LogMiner session resource usage; limit captured tables; tune redo log size
    * **Connection Failures**: Check listener status, TNS configuration, firewall rules
    * **Schema Changes Not Reflected**: Restart connector to refresh metadata after DDL operations
  </Accordion>

  <Accordion title="How can I optimize schema history for large databases?">
    For large database instances with many databases or tables, schema history can impact performance. Learn about optimization settings that can reduce schema history topic size and improve connector startup times.

    See the [Schema History Optimization](/schema-history-optimization) guide for detailed information on when and how to use these settings.
  </Accordion>

  <Accordion title="Best practices for Oracle sources">
    * Use dedicated CDC user with minimal required privileges (LogMiner, SELECT on tables)
    * Enable supplemental logging with ALL COLUMNS for complete change records
    * Ensure archive log mode is enabled and monitored
    * Set appropriate archive log retention (3-5 days minimum)
    * Limit capture to needed schemas/tables to reduce LogMiner overhead
    * Monitor redo log generation rate and space
    * Test schema evolution procedures in staging
    * For cloud: Use provider monitoring; enable heartbeats for low traffic
    * Document supplemental logging configuration for team reference
    * Consider impact of supplemental logging on write-heavy workloads
  </Accordion>

  <Accordion title="Planning a database version upgrade?">
    Database version upgrades require careful planning to avoid data loss and minimize downtime for your CDC pipelines. See our [Database Upgrade Guide](/database-upgrade-guide) for step-by-step instructions.
  </Accordion>
</AccordionGroup>
