Sitecore collects a lot of data for analytics and this data will be stored in Shard databases. Over a period of time, some of this data may be redundant. Holding on to a large amount of data can be expensive especially if you are using Azure databases. Also, a large amount of data can have an impact on performance when transacting, For example: rebuilding XDB indexes.
Some of the data in these databases can be removed as they are no longer valuable, especially the anonymous contact data old than a month. We can clean up this data in multiple ways and in Sitecore 10.1, Sitecore introduced the xConnect purging tool with Web client Api or the DevEx CLI xConnect plugin.
But the approach I’ve taken here is to clean up data directly in databases instead of using the web API client or the DevEx CLI xConnect plugin to avoid writing custom code to achieve this.
If you are interested, please check the xConnect Purging Tool
The below scripts has to be executed on both Shard0 and Shard1 databases.
Database Script to Identify Contacts older than 1 month and capture the contact id’s in the TempContacts table
SELECT ContactId INTO TempContacts FROM [xdb_collection].[Contacts]
WHERE LastModified < DATEADD(month, -1, GetDate())
AND ContactId IN (SELECT DISTINCT contactId
FROM [xdb_collection].[ContactIdentifiers]
WHERE IdentifierType = 0
AND ContactId NOT IN (
SELECT contactId
FROM [xdb_collection].[ContactIdentifiers]
WHERE IdentifierType = 1
)
)
TempContacts table will have all the anonymous contacts that are older than 1 month
Database Script to delete Analytics data for these anonymous contacts
DELETE FROM [xdb_collection].[ContactFacets] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[ContactIdentifiersIndex] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[InteractionFacets] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[Interactions] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[ContactIdentifiers] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[Contacts] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
DELETE FROM [xdb_collection].[DeviceProfiles] WHERE LastKnownContactId IN (select contactId from TempContacts)
GO
Depending on the size of the Shard databases and the resources assigned for the Databases, these scripts can take anywhere from a few mins to days. I would suggest you change the database plans in Azure to higher plans until these scripts are executed.
Once these scripts are executed, depending on the amount of data in your databases, the database size will be drastically reduced (wait for index rebuild to get accurate size).
Now that we deleted Anonymous contacts, we no longer need the temp contact ids stored in the temp table, you can delete the temp table.
Drop Table TempContacts
Rebuild Indexes on SQL tables to improve the database performance, use the below script to create index rebuild scripts
SELECT 'ALTER INDEX ' + QUOTENAME(I.NAME) + ' ON ' +QUOTENAME(OBJECT_SCHEMA_NAME(I.OBJECT_ID)) + '.'+ QUOTENAME(OBJECT_NAME(I.OBJECT_ID)) + ' REBUILD ' AS [SQLQUERY]
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS(DB_ID(), NULL, NULL, NULL, NULL) INS
INNER JOIN SYS.INDEXES I ON I.OBJECT_ID = INS.OBJECT_ID
AND I.INDEX_ID = INS.INDEX_ID
WHERE INS.AVG_FRAGMENTATION_IN_PERCENT > 10
AND I.NAME IS NOT NULL
ORDER BY INS.AVG_FRAGMENTATION_IN_PERCENT DESC
The result of the above query will generate an SQL script to rebuild indexes. Copy the script from the result and execute them to rebuild indexes.
With this clean up on anonymous content is done.
Rebuild XDB indexes
Now that we have done the clean-up of data, it’s time to rebuild XDB indexes. To do this, go to the XConnect Search application and go to the following path (if using azure use Kudu Console)
/wwwroot/App_Data/jobs/continuous/IndexWorker
Run the following command to rebuild the index
Sitecore.XConnectSearchIndexer.exe -rr
You can also monitor the status of the index rebuild by running the following command
Sitecore.XConnectSearchIndexer.exe -rm
Your index rebuild will be a lot quicker than earlier.