Tuesday, September 1, 2015

SharePoint 2013: [workflow] failed to start or run

Problem

You have created a workflow attached to a list that is triggered via a custom information management policy involving retention date. You find that the workflow does complete, but only after a long delay.  Looking in the Workflow History for the list item, you see the following:

Date Occurred      Event Type  User ID         Description                  Outcome
9/1/2015 10:25 AM  Error       System Account  [workflow] failed to start
9/1/2015 10:25 AM  Error       System Account  [workflow] failed to run
9/1/2015 10:49 AM  Comment     System Account  [workflow] triggered

This behavior is repeatable.  Each time the workflow is triggered, it first fails to start, then fails to run, and then, a few minutes later, does in fact complete.  The workflow history provides no immediate clues as to the cause.  It does, however, provide a time stamp, and that time stamp can be matched against the ULS logs to identify and resolve the cause.  In this case, the cause was found to involve McAfee On-Access Scan (OAS), and the solution was simply to turn OAS off and then back on.  My troubleshooting steps follow.

Troubleshooting

  1. Reviewed list item workflow history.
  2. Created new list items configured to trigger information management retention policy, and then re-ran appropriate timer jobs: issue was repeatable.
  3. Checked server system and application event logs on both WFEs, seeking events that occurred at about the same time as the workflow time stamps: no events found.
  4. Reviewed ULS logs on both WFEs for events occurring at about the same time: identified key ULS entry (among others having the same correlation ID):
    Process: OWSTIMER.EXE
    Product: SharePoint Foundation
    Category: Legacy Workflow Infrastructure
    Level: Unexpected
    tbdRunWorkflow: Microsoft.SharePoint.SPException: <error><compilererror column="-1" line="-1" text="Compilation failed. Could not load file or assembly 'Xoml.8a0e3a04_01c8_4fa3_a7dc_a23122cdcf10.2.4096.-1.0.dll' or one of its dependencies. Access is denied."></compilererror></error>  
     at Microsoft.SharePoint.Workflow.SPNoCodeXomlCompiler.LoadXomlAssembly(SPWorkflowAssociation association, SPWeb web)  
     at Microsoft.SharePoint.Workflow.SPWinOeHostServices.LoadDeclarativeAssembly(SPWorkflowAssociation association, Boolean fallback)  
     at Microsoft.SharePoint.Workflow.SPWinOeHostServices.CreateInstance(SPWorkflow workflow)  
     at Microsoft.SharePoint.Workflow.SPWinOeEngine.RunWorkflow(SPWorkflowHostService host, SPWorkflow workflow, Collection`1 events, TimeSpan timeOut)  
     at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow workflow, Collection`1 events, SPWorkflowRunOptionsInternal runOptions)
  5. Created new list item configured to trigger information management retention policy, re-ran appropriate timer jobs, and then reviewed ULS logs to determine if above entry was repeated: it was.
  6. Performed Internet search: found promising posting [4] indicating that anti-virus may be involved.
  7. Disabled McAfee OAS, created new list item configured to trigger information management retention policy, and then re-ran appropriate timer jobs: workflow completed without any issue.
  8. Created new list items configured to trigger information management retention policy, and then re-ran appropriate timer jobs: workflow completed without any issue.
  9. Re-enabled McAfee OAS, created new list item configured to trigger information management retention policy, and then re-ran appropriate timer job: workflow completed without any issue.
  10. Created new list items configured to trigger information management retention policy, and then re-ran appropriate timer jobs: workflow completed without any issue.
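
Steps 3 and 4 above come down to filtering the ULS logs from each WFE by a time window around the workflow history time stamp and then grouping what is left by correlation ID. A minimal Python sketch of that filter follows. It assumes the ULS files are tab-separated with the nine columns listed in COLUMNS and a time stamp such as 09/01/2015 10:25:03.12; check both against the header line of your own logs. The function names, and the file names in the usage note, are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed ULS column order (tab-separated); verify against your own log header.
COLUMNS = ["timestamp", "process", "tid", "area", "category",
           "event_id", "level", "message", "correlation"]


def parse_uls_line(line):
    """Split one ULS line into a dict, or return None if it does not fit the layout."""
    parts = [p.strip() for p in line.rstrip("\n").split("\t")]
    if len(parts) != len(COLUMNS):
        return None
    entry = dict(zip(COLUMNS, parts))
    try:
        # Assumption: a trailing "*" on the time stamp marks a continuation line.
        stamp = entry["timestamp"].rstrip("*")
        entry["time"] = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f")
    except ValueError:
        return None  # header line or unrecognised time stamp
    return entry


def entries_near(lines, when, minutes=2, level="Unexpected"):
    """Group entries of the given level within +/- minutes of `when` by correlation ID."""
    window = timedelta(minutes=minutes)
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_uls_line(line)
        if entry is None or entry["level"] != level:
            continue
        if abs(entry["time"] - when) <= window:
            grouped[entry["correlation"]].append(entry)
    return dict(grouped)


def scan_logs(paths, when, minutes=2):
    """Run entries_near over several log files, for example one from each WFE."""
    results = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            results[path] = entries_near(handle, when, minutes)
    return results
```

For the history shown earlier, a call such as scan_logs(["wfe1.log", "wfe2.log"], datetime(2015, 9, 1, 10, 25)) would search both servers' logs around the first error. Grouping by correlation ID keeps the key entry together with the related entries mentioned in step 4.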

Solution

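  • Disable and then re-enable the anti-virus on-access scanner (here, McAfee OAS). After OAS was turned off and back on, the workflow continued to complete without errors.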

References

  • Information Management Policy for retention involves two key timer jobs: Expiration policy and Information management policy.  The first looks at each list item and applies the information management policy retention setting that you configured, updating the hidden Expiration Date column for the list item.  The second job reviews the Expiration Date column and then executes the information management policy action configured for the condition, if the condition requirement is met.
  • Farm topology: small, three-tier, having one application server, two WFEs, and one SQL Server instance.
  • Because the farm uses network load balancing (in this case Windows NLB) across two WFEs, you will need to check the ULS logs on both WFEs to find the ULS entries mentioned above.  The information management policy may be run on either WFE, and you don't know in advance which one.
  • .NET version being used: to determine this, launch IIS Manager, and then go to the Application Pools listing. One of the columns displayed is the .NET Framework Version.
  • Location of Temporary ASP.NET Files folder: for default installations, patched through June 2015, this would be: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files
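
The two-job sequence described in the first reference above can be pictured as a two-stage pipeline: one pass stamps each item with an expiration date, and a later pass acts on items whose date has been reached. The Python sketch below is only a conceptual model of that sequence. It does not use any SharePoint API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class ListItem:
    title: str
    created: date
    # Stands in for the hidden Expiration Date column.
    expiration_date: Optional[date] = None
    action_done: bool = False


def stamp_expiration_dates(items: List[ListItem], retention: timedelta) -> None:
    """Stage 1: apply the retention setting and record each item's expiration date."""
    for item in items:
        item.expiration_date = item.created + retention


def run_expired_actions(items: List[ListItem], today: date,
                        action: Callable[[ListItem], None]) -> List[str]:
    """Stage 2: run the policy action for items whose expiration date has been reached."""
    processed = []
    for item in items:
        if item.action_done or item.expiration_date is None:
            continue  # already handled, or stage 1 has not run for this item yet
        if item.expiration_date <= today:
            action(item)  # in the scenario above, this would start the workflow
            item.action_done = True
            processed.append(item.title)
    return processed
```

In this model, running stage 2 before stage 1 does nothing, because no item has an expiration date yet. That is why the troubleshooting steps re-run both timer jobs after creating each test item.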
