SQL Server on Google Cloud SQL
SQL Server Change Data Capture on Google Cloud SQL with Streamkap
Prerequisites
- SQL Server version ≥ 2016 Service Pack 1 Standard or Enterprise Edition
- A database user with sufficient privileges to configure the database, including enabling CDC, CT and creating users
SQL Server Setup
1. Grant Database Access
- Configure one of the Connection Options to ensure Streamkap can reach your database.
2. Enable Change Data Capture
Before you can use CDC with your Google Cloud SQL instance, you must enable it on the database.
-- Replace { ... } placeholders as required
EXEC msdb.dbo.gcloudsql_cdc_enable_db '{database}'
Microsoft recommends keeping CDC data files separate from your primary database files.
-- Replace { ... } placeholders as required
ALTER DATABASE {database} ADD FILEGROUP Streamkap_ChangeTracking;
ALTER DATABASE {database} ADD FILE (
NAME = Streamkap_ChangeTracking_Data,
FILENAME = N'{path}\Streamkap_ChangeTracking_Data.ndf'
) TO FILEGROUP Streamkap_ChangeTracking;
GO
After CDC is enabled on the database, for every table you want Streamkap to capture, enable change tracking on that using the script below:
-- Replace { ... } placeholders as required
EXEC sys.sp_cdc_enable_table
@source_schema = N'{schema}',
@source_name = N'{table}',
@role_name = N'streamkap_role',
@filegroup_name = N'Streamkap_ChangeTracking',
@supports_net_changes = 0
GO
3. Create Database User
It's recommended to create a separate user and role for Streamkap to access your SQL Server database. Below is an example script that does that.
-- Replace {database}, {schema} and {password} placeholders as required
USE {database};
GO
CREATE LOGIN streamkap_user WITH PASSWORD = '{password}';
CREATE USER streamkap_user FOR LOGIN streamkap_user;
CREATE ROLE streamkap_role;
ALTER ROLE streamkap_role ADD MEMBER streamkap_user;
GRANT SELECT ON SCHEMA::{schema} TO streamkap_role;
GO
4. Enable Snapshots
To backfill your data, the Connector needs to be able to perform snapshots (See Snapshots & Backfilling for more information). To enable this process, a table must be created for the Connector to use.
Please create the signal table with the name
streamkap_signal
. It will not be recognised if given another name.
CREATE SCHEMA streamkap;
-- Create the table within the schema
CREATE TABLE streamkap.streamkap_signal (
id VARCHAR(255) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2000) NULL
);
-- Enable change tracking on the signal table
EXEC sys.sp_cdc_enable_table
@source_schema = N'streamkap',
@source_name = N'streamkap_signal',
@role_name = N'streamkap_role',
@filegroup_name = N'Streamkap_ChangeTracking',
@supports_net_changes = 0
GO
-- Grant permission on the signal table to the Streamkap role
GRANT SELECT, UPDATE, INSERT ON streamkap.streamkap_signal TO streamkap_role;
Streamkap Setup
Follow these steps to configure your new connector:
1. Create the Source
- Navigate to Add Connectors.
- Choose SQL Server.
2. Connection Settings
-
Name: Enter a name for your connector.
-
Endpoint: Specify the endpoint of the SQL Server database.
-
Port: Default is
1433
. -
Connect via SSH Tunnel: The Connector will connect to an SSH server in your network which has access to your database. This is necessary if the Connector cannot connect directly to your database.
- See SSH Tunnel for setup instructions.
-
Username: Username to access the database. By default, Streamkap scripts use
streamkap_user
. -
Password: Password to access the database.
-
Source Database: Specify the database to stream data from.
-
Heartbeats: Crucial for low and intermittent traffic databases. Enabled by default.
- Heartbeat Table Schema: If configured, this allows the connector to receive and acknowledge changes from low and intermittent traffic databases and prevent offsets from become stale.
- See SQL Server Heartbeats for setup instructions.
- Heartbeat Table Schema: If configured, this allows the connector to receive and acknowledge changes from low and intermittent traffic databases and prevent offsets from become stale.
3. Snapshot Settings
- Signal Table Schema: The Connector will use a table in this schema to manage snapshots. See Enable Snapshots for setup instructions.
4. Advanced Parameters
- Represent binary data as: Specifies how the data for binary columns e.g.
blob
,binary
,varbinary
should be interpreted. Your destination for this data can impact which option you choose. Default isbytes
.
Click Next.
5. Schema and Table Capture
- Add Schemas/Tables: Specify the schema(s) and table(s) for capture
- You can bulk upload here. The format is a simple list of schemas and tables, with each entry on a new row. Save as a
.csv
file without a header.
- You can bulk upload here. The format is a simple list of schemas and tables, with each entry on a new row. Save as a
Click Save.
Troubleshooting
Table schema evolution and change tracking tables
When CDC is enabled for a table and changes occur to it, change events are persisted to a 'change table' on the SQL Server database server.
If you introduce a change in the structure of the source table, for example, by adding a new column, that change is not reflected in the change table.
For as long as the change table continues to use the outdated table structure, the Streamkap SQL Server Source is unable to capture change events for the table correctly.
You must intervene to refresh the change table structure because of the way that CDC is implemented in SQL Server.
Refresh change table structure (Online)
Online refresh limitationIn the interval between the source table structure changing in the source database, and _before _the change table structure is refreshed, change events continue to be captured with the outdated table structure.
For example, if you added a new column to a source table, change events that are captured before the change table is refreshed will not contain the new column.
If this cannot be tolerated, an Offline refresh has to be performed. However, that means downtime for whatever system made structural changes to your tables and your Streamkap pipelines.
Please contact us for assistance if an Offline refresh is required.
The procedure for completing an Online refresh is simpler than running it Offline, and you can complete it without any downtime to your systems and Streamkap pipelines.
For every source table that has changed, copy paste the script below into SQL Server Management Studio, change placeholders as required, and then run all queries.
If you're not sure what
{refresh_table}
name to use, use{schema}_{table}_v{N}
. For example, if the source table issales.orders
then you'd usesales_orders_v2
-- Replace {database}, {schema} and {table} placeholders
-- {database}: name of the CDC enabled database
-- {schema}: name of the schema with tables to refresh
-- {table}: name of the table to refresh
-- {refresh_table}: a unique name for the refreshed change table
USE {database};
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'{schema}',
@source_name = N'{table}',
@role_name = N'streamkap_role',
@filegroup_name = N'Streamkap_ChangeTracking',
@supports_net_changes = 0,
@capture_instance = N'{refresh_table}'
GO
Refresh table limitationThere cannot be more than 2 change tables for every source table.
After refreshing a change table using the above script, confirm with Streamkap Support that your SQL Server Source has started streaming from the refreshed change table. Once confirmed, disable CDC on the outdated change table. Use the script below to do that.
-- Replace {database}, {schema}, {table} and {refresh_table} placeholders -- {database}: name of the CDC enabled database -- {schema}: name of the schema with the table refreshed earlier -- {table}: name of the table refreshed earlier -- {refresh_table}: name of the previous refresh table, usually {schema}_{table} USE {database}; GO EXEC sys.sp_cdc_disable_table @source_schema = N'{schema}', @source_name = N'{table}', @capture_instance = N'{refresh_table}' GO
Two capture instances already exist for source table
If you're getting this error message when refreshing the change table structure, it's because there cannot be more than 2 change tables for every source table.
To fix the problem, 1 of the 2 change tables for the source table need to be disabled. The scripts below can help you do that.
-- Replace {database}, {schema} and {table} placeholders
-- {database}: name of the CDC enabled database
-- {schema}: name of the schema with the table refreshed earlier
-- {table}: name of the table refreshed earlier
USE {database};
GO
EXEC sys.sp_cdc_help_change_data_capture
@source_schema = N'{schema}',
@source_name = N'{table}'
GO
The above script should return 2 results, the 2 change tables for the {table}
specified. Typically you would disable the oldest change table, so use the create_date
column to identify the oldest one.
When you've identified the change table to disable, use itssource_schema
, source_table
and capture_instance
names in the query below and execute. Then, try your refresh table script again.
-- Replace {database}, {schema}, {table} and {refresh_table} placeholders
-- {database}: name of the CDC enabled database
USE {database};
GO
EXEC sys.sp_cdc_disable_table
@source_schema = N'{source_schema}',
@source_name = N'{source_table}',
@capture_instance = N'{capture_instance}'
GO
The Connector's running but there is no data
There can be a number of reasons. The most common are misconfiguration of CDC and privilege grants.
To diagnose, run the following script:
-- Replace {database} placeholder
-- {database}: name of the CDC enabled database
USE {database};
GO
EXEC sys.sp_cdc_help_change_data_capture
GO
If the query returns an error or no results check:
- you are connected to the SQL Server database with a user that has
DB_OWNER
role privileges. - if the
EXEC sys.sp_cdc_help_change_data_capture
stored procedure returns no results, the source tables you want Streamkap to capture may not be CDC enabled.
If the query returns results, check:
- the table(s) captured by the Connector are listed.
- there are
capture_instance
s for the capture tables. - the database user or role used by the Connector are members of the
rolename
for the capture table(s).
If you are still having issues after following the above steps, please don't hesitate to reach out to us.
The SQL Server Setup scripts are failing
There can be many reasons for the Setup scripts to fail, but the scripts below can help you diagnose the issues.
-- Replace {database} placeholder
-- {database}: name of the CDC enabled database
USE {database};
GO
SELECT name, database_id, source_database_id, compatibility_level, is_read_only, state, state_desc, is_in_standby, is_cleanly_shutdown, is_cdc_enabled, is_encrypted, replica_id
FROM sys.databases
WHERE name = '{database}' AND is_cdc_enabled=1;
EXEC sys.sp_cdc_help_change_data_capture
GO
If any of the queries return an error or no results:
- Check you connected to the SQL Server database with a user that has
DB_OWNER
role privileges - If the
SELECT ... FROM sys.databases ...
returns no results, the database you want Streamkap to capture may not be CDC enabled - If the
EXEC sys.sp_cdc_help_change_data_capture
stored procedure returns no results, the source tables you want Streamkap to capture may not be CDC enabled
If you are still having issues after following the above steps, please don't hesitate to reach out to us.
Updated 3 days ago