Clean-up xConnect Anonymous Contacts

Sitecore collects a lot of data for analytics and this data will be stored in Shard databases. Over a period of time, some of this data may be redundant. Holding on to a large amount of data can be expensive especially if you are using Azure databases. Also, a large amount of data can have an impact on performance when transacting, For example: rebuilding XDB indexes.

Some of the data in these databases can be removed as they are no longer valuable, especially the anonymous contact data old than a month. We can clean up this data in multiple ways and in Sitecore 10.1, Sitecore introduced the xConnect purging tool with Web client Api or the DevEx CLI xConnect plugin.

But the approach I’ve taken here is to clean up data directly in databases instead of using the web API client or the DevEx CLI xConnect plugin to avoid writing custom code to achieve this.

If you are interested, please check the xConnect Purging Tool

The below scripts has to be executed on both Shard0 and Shard1 databases.

Database Script to Identify Contacts older than 1 month and capture the contact id’s in the TempContacts table

SELECT ContactId INTO TempContacts FROM [xdb_collection].[Contacts] 
	WHERE LastModified < DATEADD(month, -1, GetDate()) 
	AND ContactId IN (SELECT DISTINCT contactId
			  FROM [xdb_collection].[ContactIdentifiers]
			  WHERE IdentifierType = 0 
			  AND ContactId NOT IN (
						SELECT contactId
						FROM [xdb_collection].[ContactIdentifiers]
						WHERE IdentifierType = 1
					       )
			)

TempContacts table will have all the anonymous contacts that are older than 1 month

Database Script to delete Analytics data for these anonymous contacts

DELETE FROM [xdb_collection].[ContactFacets] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO

DELETE FROM [xdb_collection].[ContactIdentifiersIndex] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO 

DELETE FROM [xdb_collection].[InteractionFacets] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO

DELETE FROM [xdb_collection].[Interactions] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO

DELETE FROM [xdb_collection].[ContactIdentifiers] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO
  
DELETE FROM [xdb_collection].[Contacts] WHERE ContactId IN (SELECT contactId FROM TempContacts)
GO

DELETE FROM [xdb_collection].[DeviceProfiles] WHERE LastKnownContactId IN (select contactId from TempContacts)
GO

Depending on the size of the Shard databases and the resources assigned for the Databases, these scripts can take anywhere from a few mins to days. I would suggest you change the database plans in Azure to higher plans until these scripts are executed.

Once these scripts are executed, depending on the amount of data in your databases, the database size will be drastically reduced (wait for index rebuild to get accurate size).

Now that we deleted Anonymous contacts, we no longer need the temp contact ids stored in the temp table, you can delete the temp table.

Drop Table TempContacts

Rebuild Indexes on SQL tables to improve the database performance, use the below script to create index rebuild scripts

SELECT 'ALTER INDEX ' + QUOTENAME(I.NAME) + ' ON ' +QUOTENAME(OBJECT_SCHEMA_NAME(I.OBJECT_ID)) + '.'+ QUOTENAME(OBJECT_NAME(I.OBJECT_ID)) + ' REBUILD ' AS [SQLQUERY]
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS(DB_ID(), NULL, NULL, NULL, NULL) INS
INNER JOIN SYS.INDEXES I ON I.OBJECT_ID = INS.OBJECT_ID
AND I.INDEX_ID = INS.INDEX_ID
WHERE INS.AVG_FRAGMENTATION_IN_PERCENT > 10 
AND I.NAME IS NOT NULL
ORDER BY INS.AVG_FRAGMENTATION_IN_PERCENT DESC

The result of the above query will generate an SQL script to rebuild indexes. Copy the script from the result and execute them to rebuild indexes.

With this clean up on anonymous content is done.

Rebuild XDB indexes

Now that we have done the clean-up of data, it’s time to rebuild XDB indexes. To do this, go to the XConnect Search application and go to the following path (if using azure use Kudu Console)

/wwwroot/App_Data/jobs/continuous/IndexWorker

Run the following command to rebuild the index

Sitecore.XConnectSearchIndexer.exe -rr

You can also monitor the status of the index rebuild by running the following command

Sitecore.XConnectSearchIndexer.exe -rm

Your index rebuild will be a lot quicker than earlier.

Spread the love

Leave a Reply

Your email address will not be published. Required fields are marked *