Thursday, April 7, 2011

SharePoint 2007: Troubleshooting MSSQLSERVER Application Error 9002: One Approach

Problem

Users are reporting that they cannot connect to a SharePoint Server 2007 Enterprise website.  Checking the Windows Server 2003 Application Event log, I find it filled with Error 9002:

Event Type:       Error
Event Source:    MSSQLSERVER
Event Category: (2)
Event ID:           9002
Date:                 [date]
Time:                 12:07:31 PM
User:                 [domain]\[account]
Computer:         [machine name]
Description:
The transaction log for database 'SharePoint_Config' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases.  For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Background

This error involved a SharePoint 2007 instance (version 12.0.0.6421), with a SQL Server 2005 (version 9.00.3042) backend. The SharePoint instance hosts four web applications (Shared Services, MySites, CA, and the primary content web).  There are four collections, where three have incidental purposes and one is the primary collection servicing user information needs.  The primary collection content is strictly static and its lock status is set to Read-only [1, 2].  It contains legacy information that is no longer updated but still frequently accessed and needed for other projects.

The original administrator of this instance (now since long gone) had installed both the SharePoint and SQL Server instances.  He had deployed both in essentially default configurations.  Daily, weekly, and monthly backups were configured and operational.  About three years ago, the project supporting this SharePoint instance had been terminated.  The data on this instance was determined to still be valuable, and thus it was kept operational, but its status was set to locked to prevent further modifications to it.  Another engineer was assigned to look in on it now and then to keep it operational and resolve any issues that arose.  After about two years I was assigned to manage this and other SharePoint instances.

This was not a SharePoint instance I had given much thought to for some time.  It was low priority, and my primary responsibilities involved administering and maintaining three other SharePoint instances, all of which were actively in use.  I was responsible for both these SharePoint instances and their database (SQL Server 2005) backends.  I am not a DBA but have sufficient experience and knowledge of database administration to be able to perform most routine functions.

In this article, I would like to share the approach I took to resolving a common problem in the hope it might be of benefit to others encountering similar problems with SharePoint.

Troubleshooting

I first checked up on this error, finding that it indicates that the transaction log associated with a SharePoint content database has become so large that it has consumed all available disk space and thus SQL Server can no longer write to this disk to maintain itself [3].  I then began the process of examining the SharePoint host generating this error.

I first checked the disk, containing the Content_Config database. I discovered that only a few KB space remained.

Next, checking the directory containing the content database, I discovered that the Content_Config log file was significantly larger than the content database.


The immediate solution to the problem was fairly obvious: free up more space on the disk.  So, to get things up and running as soon as possible, I simply looked through disk E: to see what could be comfortably deleted, removed, archived, etc.  Fortunately, there was considerable scope for disk cleanup, and so I removed enough files to free up about 12 GB of disk space.  I then restarted IIS, and then restarted the SQL Server service.  After this, I was able to connect to the primary collection without issue and determined that the problem was at least temporarily resolved.  I also verified that my other users were also now able to connect to and retrieve information from the primary collection. I then began investigating more permanent resolution of the problem, as it was bound to recur.

To get started with understanding this error more thoroughly, I performed some cursory searches of the literature and reviewed what others had done with regard to SharePoint instances and full transaction log files [4-10], but I was not entirely satisfied with what I read.  The intitial articles that I retrieved were interesting, and seemed to provide solutions, but it seemed that I was missing something somewhere, that perhaps I needed to step back and take a more productive approach to resolving this problem.  So I continued my search, and then I came across a good article by Michael Van Cleave that helped me to see more clearly what the actual underlying problem was and put me on the right track [11].  Essentially, I had to step back and rethink the most intelligent and efficient approach to maintaining this particular SharePoint instance.

It was Cleave's article that motivated me to give greater attention to the database recovery model configured for the Content_Config database and to recovery models in general.  I'm not a DBA, and though I recalled having read about recovery models somewhere, years ago, I was not very familiar with it in general.  Searching the literature for "SharePoint" and "database" and "recovery model" was much more fruitful, and I was quickly able to find a number of good references to study [12 - 20].  In fact, I came across a reference that I had never heard about, a Microsoft Resource Center dedicated to SharePoint recovery [18]!  I read through these articles, and then went back to examining the machine, armed with a better understanding of what needed to be done.

I proceded to look through each of the SharePoint databases to determine what their recovery models were.  I found that the SharePoint_Config database recovery model was set to Full.  So were a number of other SharePoint databases.  This undoubtedly was a legacy of that time when this SharePoint instance was undergoing constant content modification.  At the time that the previous SharePoint administrator had been maintaining this SharePoint instance, there was considerably more interaction with the SharePoint instance, including the addition, modification, and deletion of content.  Therefore, a full recovery model would have been justified. Since that time, the SharePoint instance has been locked to prevent further modifications, and over this time it has received minimal attention: so long as it worked, it was left alone.  I then went back to Cleave's article, and began to plan what I was going to do.

Reflecting on Cleave's article, it seemed to me that, as the role of the SharePoint instance changed, so too must its recovery model.  When the data on this SharePoint instance was constantly changing, and users are constantly making changes to the collection (adding lists, adding libraries, add/modifying/deleting content, etc) a full recovery model was best.  Now that the data is static, and users can no longer make any changes to the collection, a full recovery model was unnecessary.  Once the databases are backed up, that's it: they don't change.  With this in mind, I went back to the machine hosting the SharePoint instance and changed the recovery model of the Content_Config database from Full to Simple.  Here's the procedure I followed, using SQL Server Management Studio for all tasks.

First, I performed full backups of all the SharePoint databases.  Next, right-clicking on the SharePoint_Config database, I chose Properties.


I change the Recovery model to Simple, and then clicked OK.  Then, right-clicking on the database again, I pointed to Tasks > Shrink > Files.  For the File type, I selected Log.  Note the Available free space field: it indicates that 99% of the log file space can be recovered.


I then clicked OK.  After about two minutes, the Shrink file dialog closed.  I then went back to the directory containing the SharePoint_Config database and log files.


As a last check, I reconnected to the primary collection and did so without difficulty, verifying that no data was lost.

The Simple recovery model will work fine for the SharePoint_Config database, given the static nature of its content.  I have mutliple database backups created, and I have created full SharePoint backups using Central Administration and stsadm.  Thus, I have a number of restoration options available if need be.  Taking the course of least action, I left the recovery models of the other SharePoint databases as-is: there's no need to change them at this point, and their growth rate is extremely slow.

Lessons Learned

This issue was caused by inadequate attention to transitional issues - issues likely readily apparent to a DBA but outside the immediate knowledge base of a SharePoint administrator.  Once the original SharePoint administrator left the project, and before I was permanently assigned to supporting this SharePoint instance, there had been other SharePoint administrators temporarily assigned to it and who delt with problems as they came up.  And the results of their solutions were evident as I delved deeper into this problem - no criticism of their efforts: their efforts resolved the immediate problem (archiving log files, moving the log file to a new disk, etc) given limited availability to attend to the problem, but did not go to the heart of the problem, which essentially involved a change in requirements.  The requirement is: shall be able to recover the database without loss of data in the event of failure.  Effectively meeting this requirement depends upon circumstances.  Codifying what I've learned and summing it all up leaves me with the following:
  • Shall be able to recover the database without loss of data in the event of failure:
    • Scenario A: dynamic critical content
      • Recovery model: full
      • Backup plan: daily, weekly, monthly as appropriate
    • Scenario B: static critical content
      • Recovery model: simple
      • Backup plan: one-time or annual, as appropriate
This lessons-learned would be well-appreciated and known to any experienced and competent DBA.  But this small farm was not assisted by a DBA: it was the SharePoint administrator's responsibility to handle everything.  Reflecting on all that was done to resolve this error, I was struck by the importance of SharePoint administrators having significant experience in other technologies impacting SharePoint, in this case database administration. 

Conclusion

This concludes this article on one approach to troubleshooting Application Error 9002.  I hope that you found it useful with regard to your own troubleshooting efforts.  By the way, if you happen to know a Microsoft reference discussing database configuration defaults during installation of SharePoint - particularly recovery model defaults - please leave a note.  Jameson's article was the first such structured reference to how the SharePoint installation engine configures the databases that I had come across [14], and many thanks to him for documenting it.  However, I would like to find a reference to these configuration details somewhere in the actual Microsoft SharePoint documentation.  Happy Computing!

References
  1. Microsoft TechNet: SharePoint Server 2007
  2. Microsoft TechNet: Locking down Office SharePoint Server sites
  3. MSDN: Troubleshooting a Full Transaction Log (Error 9002)
  4. MSDN - SQL Server 2008 R2: Adding and Deleting Data and Transaction Log Files
  5. Microsoft Support (873235): Recover from a full transaction log in a SQL Server database
  6. Microsoft Support (317375): A transaction log grows unexpectedly or becomes full on a computer that is running SQL Server
  7. SQLAuthority: SQL SERVER – Shrinking Truncate Log File – Log Full
  8. SQL Server: BACKUP Log With TRUNCATE_ONLY
  9. JMPInline: How to empty a SQL Server database transaction log file
  10. StackOverflow: How do you clear the transaction log in a SQL Server 2005 database?
  11. Michael Van Cleave: Very Large SharePoint Database Files
  12. MSDN: How to: View or Change the Recovery Model of a Database (SQL Server Management Studio)
  13. Microsoft SharePoint Forums: sql recovery models for sharepoint databases
  14. MSDN Blogs: Random Musings of Jeremy Jameson: Default Recovery Models for SharePoint Databases
  15. TechNet Blogs: SharePoint Shenanigans: Setting the Recovery Model of all SharePoint Databases
  16. TechNet Blogs: Jose Barreto's Blog: Maintaining your MOSS 2007 SQL Databases
  17. TechNet Magazine: Paul S. Randal: Understanding SQL Server Backups
  18. TechNet: SharePoint Server 2007: Backup, Recovery, and Availability Resource Center for Microsoft Office SharePoint Server 2007
  19. MSSQLTips: SQL Server Simple Recovery Model
  20. MSDN Blogs: Sharing SharePoint: Back Up and Restore by Using Stsadm
  21. Restarting Your SQL Error Logs Using Sp_Cycle_ErrorLog, myITforum.com, Don Hite
Notes
  • None

1 comment:

Anonymous said...

I've to confess that i sometimes get bored to read the entire thing however i think you'll be able to add some value. Bravo !