Ever had random errors show up when performing an upgrade? Maybe the upgrade won't succeed or it succeeds but gives you some random errors later on? Read on to find out what problems I experienced and how I found the root cause to a "successful" upgrade which actually failed miserably.
We run between 20-30 Kentico upgrades each year on sites with little or no customizations to many integrations and customizations and everywhere in between. Being able to properly estimate those upgrades is very critical for yours and your client's success. Having some random database issues popup in the middle of this upgrade can cause hours or days of wasted time. I'm hoping this article will help save others time and help you be successful with your upgrades.
The Problem
I recently upgraded a Kentico 9 website to Kentico 10. The database upgrade succeeded 3 times in the development environment. We upgraded the code and it compiled without issue. We did QA testing and it all passed. Then we schedule a deployment to production and the database upgrade fails miserably. Thankfully we used Microsoft Azure and slots for this upgrade so there was no downtime!
As I dug into the issue, I found a lot of data which wasn't even being used like EMS features and web analytics. So I cleaned those up and tried the upgrade again. Without issue the upgrade succeeded. I received the succeed message in the Kentico Upgrade Tool as well as all the proper messages in the database event log for starting and properly finishing the upgrade. Once again, we scheduled the production upgrade and once again the database upgrade succeeds but overall the upgrade failed. This time, after days of digging, I found out it was corrupt data. I'm unsure if the reason for corruption is due to starting the project on Kentico+ then migrating to a regular Kentico install, upgrading from v8 to v9, directly accessing the database and modifying records or improper use of the API but I do know it was caused by at least 1 of those items.
I had a sneaking suspicion it was an issue with pages and version history although I wasn't able to put my finger directly on it. After some searching on the web I found fellow Kentico MVP Jeroen Furst's blog post about some similar experiences I was having and an article from Kentico on how to clean up version history.
The Solution
I ended up using a modified version of the code from the Kentico article mentioned above. Really the only change I made was to check if the page was checked out AND did not remove object history as I didn't see this was an issue. If the document was checked out, I simply checked it back in. The biggest problem I found with this solution is it still left orphaned version history ID values in the CMS_Document table data. This solution allowed me to truncate all the version history but the problem still persisted where I wasn't able to get a comptely successful upgrade AND I found a new issue: the content tree was not displaying properly for a specific set of nodes which had these corrupt version history ID on the document.
The next bit of SQL code helped me to erase ALL of the version history AND clean up the CMS_Document table.
update cms_document
set documentcheckedoutversionhistoryid = null, documentpublishedversionhistoryid = null, documentworkflowstepid = null
where documentid in(select documentid
from cms_versionhistory)
delete cms_workflowhistory
where versionhistoryid in (select versionhistoryid
from cms_versionhistory)
delete cms_versionattachment
delete cms_versionhistory
Be very careful when running this as it will permanently delete your records. Always take a backup prior to running this.
Final Thoughts
While the final fix for this was essentially running SQL against the database (NOT recommended) it is what fixed the underlying issue. I was NOT able to verify why those old version history records were not being deleted when a page type was deleted and I wasn't ready to continue looking either. Please verify with your client they are OK losing the hisory of their data when they upgrade. Some are OK with this and some are NOT. Error on the side of caution and assume they want to keep it and use this as a last resort.
Good Luck and Happy Coding!