Technical Stuff – SYSVol DFS Replication Annoyances

So today I just happened to notice that one of our DCs at work didn’t have the latest version of a GPO.

I was creating a new policy using Group Policy Preferences (amazing bit of kit btw – have a look at Alan Burchill’s how to get rid of your logon scripts video here: http://www.grouppolicy.biz/2012/09/teched-2012-video-how-to-get-rid-of-your-logon-scripts-the-easy-and-free-way/), and I was using the Group Policy Results Wizard to see if my changes were being applied to a remote machine – they weren’t.

My first response was to try a gpupdate /force on the machine in question. This didn’t help, but it did make me wonder if this was an AD replication issue.

Back in the Group Policy Management Console, I clicked on our domain and hit ‘Detect Now’ at the bottom of the screen. Sure enough, it came back with: 1 Domain Controller(s) with replication in progress, listing one of our DCs. Clicking the link under SysVol gave me a box telling me that 2 GPOs were out of date with the baseline controller, one of which was the policy I was editing… yay(!)

Having given the DCs plenty of time to replicate (the next day in this case), I decided to investigate a little further. All of our DCs run Server 2012, and the Forest and Domain Functional Levels are set to 2012.

Looking on the server in question, a line in the error logs for DFSR drew my attention:

The DFS replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication. wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="B4A015E2-A116-11DE-89FB-806E6F6E6963" call ResumeReplication


Okay – so the server must’ve crashed at some point in the past and caused this error. Not a problem – the error message helpfully gives you the command to run:

wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="B4A015E2-A116-11DE-89FB-806E6F6E6963" call ResumeReplication

After copying the SysVol folder to my desktop as a backup, I tried to run the command in an elevated PowerShell prompt (cos Microsoft love you if you use PowerShell!) and got the following error: Unexpected switch at this level.

Edit: Microsoft have since published an article about this issue – Adam’s comment below explains it – thanks!

According to this article: http://support.microsoft.com/kb/2846759 (about half way down)

Note for PowerShell users: You will need to add single quotations to the WMIC command to run it from PowerShell:

wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where 'volumeGuid="F1CF316E-6A40-11E2-A826-00155D41C919"' call ResumeReplication
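
If you end up scripting this, here is a minimal Python sketch of the quoting difference between the two shells. The function name resume_replication_command and its shell argument are made up for illustration. The reason given in the comments (PowerShell consuming the double quotes before wmic sees them) is my reading of the KB note, not something it spells out. The sketch only builds the command text; it does not run anything.

```python
import uuid


def resume_replication_command(volume_guid: str, shell: str = "cmd") -> str:
    """Build the ResumeReplication WMIC command line for cmd or PowerShell.

    Hypothetical helper for illustration; it returns text and executes nothing.
    """
    # Normalise and validate the GUID; uuid.UUID raises ValueError on bad input.
    guid = str(uuid.UUID(volume_guid)).upper()
    where = f'volumeGuid="{guid}"'
    if shell == "powershell":
        # Assumption: PowerShell strips the double quotes itself, so the whole
        # WHERE clause is wrapped in single quotes, as the KB note suggests.
        where = f"'{where}'"
    elif shell != "cmd":
        raise ValueError(f"unsupported shell: {shell!r}")
    return (
        r"wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig "
        f"where {where} call ResumeReplication"
    )


# The GUID below is the one from the event log message in this post.
print(resume_replication_command("B4A015E2-A116-11DE-89FB-806E6F6E6963"))
print(resume_replication_command("B4A015E2-A116-11DE-89FB-806E6F6E6963", shell="powershell"))
```

Swap in the volume GUID from your own event log entry; the one above will not match your server.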

Running the command again, but this time in an elevated command prompt window, yields a lovely response stating that replication will occur.

Looking back at the event log, I thought all was sorted until this one popped up:

The DFS Replication service stopped replication on the folder with the following local path: C:\Windows\SYSVOL\domain. This server has been disconnected from other partners for 64 days, which is longer than the time allowed by the MaxOfflineTimeInDays parameter (60). DFS Replication considers the data in this folder to be stale, and this server will not replicate the folder until this error is corrected.


Which indicates that this DC has been out of sync since sometime in November last year! We don’t tend to change our policies that often, so I hadn’t noticed this issue before.
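
The arithmetic behind that is simple enough to sketch in Python. The function names and the "today" date are hypothetical, chosen only so the numbers line up with the 64 days in the message. I am also assuming the check is strictly "longer than", which is how the message is worded.

```python
from datetime import date, timedelta

DEFAULT_MAX_OFFLINE_DAYS = 60  # the limit quoted in the event message


def is_stale(days_offline: int, max_offline_days: int = DEFAULT_MAX_OFFLINE_DAYS) -> bool:
    """True when the member has been offline for longer than the limit."""
    return days_offline > max_offline_days


def last_contact(today: date, days_offline: int) -> date:
    """Work back from today to the day the member last talked to a partner."""
    return today - timedelta(days=days_offline)


# Hypothetical date, for illustration only.
today = date(2013, 1, 20)

print(is_stale(64))                       # True: 64 is over the default of 60
print(is_stale(64, max_offline_days=65))  # False: under a raised limit of 65
print(last_contact(today, 64))            # 2012-11-17
```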

Unfortunately, the advice given in the message (to use the DFS management snap in to remove and replace the DFS member) isn’t applicable when you’re trying to replicate the SysVol folder – it doesn’t let you!

Instead, after making sure I had a backup of the SysVol folder, I ran the following:

wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=65

Then re-ran the ResumeReplication call. After a few minutes, I received message ID 4002 from DFSR: The DFS Replication service successfully initialized the replicated folder at local path C:\Windows\SYSVOL\domain. YAY!
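
As a rough sketch of the sequence above, this Python snippet prints the two commands in the order I ran them, given the number of days reported in the event log. The function name recovery_commands and the margin parameter are hypothetical. It deliberately only generates text: take your backup first, then paste the output into an elevated command prompt yourself.

```python
def recovery_commands(days_offline: int, volume_guid: str, margin: int = 1) -> list[str]:
    """Return the wmic commands used in this post, in the order they were run.

    Hypothetical helper: it builds command text only and runs nothing.
    """
    if days_offline < 0 or margin < 1:
        raise ValueError("days_offline must be >= 0 and margin must be >= 1")
    # Raise the limit just above the number of days reported in the event log.
    new_limit = days_offline + margin
    return [
        rf"wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays={new_limit}",
        rf'wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="{volume_guid}" call ResumeReplication',
    ]


# 64 days and this GUID are the values from the event messages in this post.
for command in recovery_commands(64, "B4A015E2-A116-11DE-89FB-806E6F6E6963"):
    print(command)
```

With 64 days offline and the default margin of 1, the first command sets the limit to 65, which is the value I used.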

Why did this occur? Well, in Server 2008 R2 (and included in 2012), Microsoft introduced a lovely registry value called ‘StopReplicationOnAutoRecovery’ located here: HKLM\System\CurrentControlSet\Services\DFSR\Parameters. When it is set to 1, whenever the server experiences an unclean (dirty) shutdown, DFS replication just stops. No brightly coloured message to tell you this, no message in ADDS, it just stops. Very helpful. Apparently, the reason they’ve done this is to stop files from being lost on downstream servers when upstream servers experience a power loss or whatever. It is up to you as the administrator to resume replication and see which version of each file is the most up to date.

Although this makes sense, when we’re talking about the SYSVOL folder, which doesn’t seem to change much in our case, I’d like it to resume replication automatically. Luckily, there’s an easy way to make this happen!

1. Change the HKLM\System\CurrentControlSet\Services\DFSR\Parameters\StopReplicationOnAutoRecovery registry key to a DWORD value of 0 (or delete it).

2. Run in an elevated command prompt:

wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set StopReplicationOnAutoRecovery=FALSE

If you’re running an RODC, you’ll be fairly safe to make this change, as it will just re-sync the data from the sending server whenever there’s a dirty shutdown. For other kinds of DC, you may want to proceed with a bit of caution! Remember you can attach tasks to specific events in Event Viewer: get it to send you an email or create a file on your desktop?

The Microsoft KB that deals with this is here: http://support.microsoft.com/kb/2663685

27 Comments

  • Alex

    THANK YOU for this, I was looking for a solution for weeks

  • Dave Hallwas

    Just found the same problem at a client site. I didn’t think about changing the MaxOfflineTimeInDays value until I saw this post. Brilliant! Worked like a charm. Thanks for this!

  • I have had this similar problem for months now, and when I tried to do what you say here, I noticed that I do not have that registry key on any of my domain controllers, and also when I run that wmic command it says Error: Description=Not found

    C:\Windows\system32>wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set StopReplicationOnAutoRecovery=FALSE
    ERROR:
    Description = Not found

    It would be great if you could email me with another way to solve this problem. I do not know what to do to fix my SYSVOL replication. I am only getting two event log messages: one is about an overlapped SYSVOL folder and the other is the one about the SYSVOL being offline for 175 days (the same message you mentioned).
    I have tried to fix this problem using all the tricks I have found so far on the web, but there are not many for event IDs 4012 and 6410. Those are my errors. Thanks in advance.

  • Jeff Graves

    Great post – just ran into this same issue with recently deployed Server 2012 DC’s. Was finally able to get this resolved by following this procedure.

  • Ray

    You are the man! Thank you. For some reason Sysvol and Netlogon would not create the shares. After fumbling through other posts I was able to get these folders shared. Next challenge was replicating the policies in sysvol with proper permissions. Your post gave me the clue which showed me the command to re-initialize after a “dirty shutdown”. Thanks again!

    Ray

  • Andrew

    Thank you Thank you Thank you

    Had read just about everything, was all set to do an authoritative restore, and had rebuilt about 4 DCs.

    Thank you once again.

  • Rupesh

    Thank you. We were breaking our head and this document fixed our issue.

    You are a life saver

  • Mark Harrigan

    Awesome, thanks for this!

  • Matt Stevens

    Just had a similar issue at our site, deployed 2 new DCs to discover this issue. Turns out we hadn’t been syncing for almost 200 days O.o

  • Jim

    Thanks for your post saved me!!

  • Steve

    Thanks!!! Been struggling with this for way too long (122 days apparently). I can’t believe using the cmd line instead of PS was the key!

  • Patrick Gotsch

    Hello,

    great article. After some hours of working through MS articles I remembered that when I installed my Hyper-V cluster (3 servers, 1 DC, 2 Hyper-V, 1 additional DC as a VM) we had some hardware failures, which were followed by a hard reset of the DC… Nevertheless… 240 days of not syncing… problem solved now. ty

  • Yuriy

    THX!

  • Adam

    Thank you very much for this post. I was having an issue with our Server 2012 not replicating its sysvol. Note, I came across an MS article that explained the error “Unexpected switch at this level.”

    According to this article: http://support.microsoft.com/kb/2846759 (about half way down)

    Note for PowerShell users: You will need to add single quotations to the WMIC command to run it from PowerShell:

    wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where 'volumeGuid="F1CF316E-6A40-11E2-A826-00155D41C919"' call ResumeReplication

    After I added the single quotes the command worked in PowerShell. I don’t like how MS is pushing the use of PS and yet commands run differently in PS and CMD.

    Hope this helps anyone that is coming across that issue.

    Great article, and thanks again.

  • Chris

    Man I have been chasing this rabbit for a couple of days. This has fixed my problems.

  • Ludovic

    You are my god, man!

  • Deji

    It saved the day. Thanks for the post.

  • Sherman

    You are a beautiful genius!

  • Jeff

    AWESOME Write-up!

    If it wasn’t for Microsoft issues, I would be out of a job. What would possess MS to have replication stop due to a power outage? You are a life saver!

  • Eamonn

    We encountered this same issue with our RODC running on a VM, caused by a vNIC that was not compatible with VMware vSphere 5.5 and Windows Server 2012, which left the VM not responding to any requests although the server was up. No one could log on because the network was not available, which required a dirty shutdown to resolve the issue, and as a result replication was stopped without any bells or whistles in the event logs.

    Your solution worked, although I had some issues with the command wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where 'volumeGuid="F1CF316E-6A40-11E2-A826-00155D41C919"' call ResumeReplication in both a Run as administrator command prompt and Windows PowerShell; it did eventually work for me in a command prompt without the single quotes.

    Although sysvol replication restarted, I was concerned that the KCC errors and warnings for EVENT 2847 and 1925 in the Directory Service logs would not clear, but they eventually did clear and the RODC is back.

    The Knowledge Consistency Checker located a replication connection for the local read-only directory service and attempted to update it remotely on the following directory service instance. The operation failed. It will be retried.

    Additional Data
    Connection:
    CN=RODC Connection (FRS),CN=NTDS Settings,CN=RODC0,CN=Servers,CN=OFFSITE,CN=Sites,CN=Configuration,DC=wintest,DC=local
    Remote Directory Service:
    CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=wintest,DC=local

    Additional Data
    Error value:
    The RPC server is unavailable. 1722

    The attempt to establish a replication link for the following writable directory partition failed.

    Directory partition:
    DC=wintest,DC=local
    Source directory service:
    CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=wintest,DC=local
    Source directory service address:
    253d871a-56b2-4e2a-98ad-3cb33f6413a5._msdcs.wintest.local
    Intersite transport (if any):
    CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=wintest,DC=local

    This directory service will be unable to replicate with the source directory service until this problem is corrected.

    User Action
    Verify if the source directory service is accessible or network connectivity is available.

    Additional Data
    Error value:
    1722 The RPC server is unavailable.

    Many thanks for your help with this, as without it I would have had to forcibly remove and recreate the RODC, which would only have added to my workload.

    AWESOME!

    • Eamonn

      Actually KCC EVENT 2847 and 1925 in the Directory Service logs did not clear on the RODC.

      Instead for some reason we have Event 1129 resulting in more 2847 events because DC1 fails to initiate a replication connection to the RODC after DC2 replication connection was removed.

      To improve the replication load of Active Directory Domain Services, a replication connection from the following source directory service to the local directory service was deleted.

      Source directory service:
      CN=NTDS Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=wintest,DC=local
      Local directory service:
      CN=NTDS Settings,CN=OFFSITE-RODC0,CN=Servers,CN=OFFSITE,CN=Sites,CN=Configuration,DC=wintest,DC=local

      Additional Data
      Reason Code:
      0x8
      Deletion Point Internal ID:
      f0c092b

      Any help appreciated.

  • Jeganraj K

    Hi,
    I am seeing the DFS Replication service get disabled automatically on a Windows Server 2008 R2 server.
    In my setup, two Windows Server 2008 R2 servers are configured with DFS replication. Please share if you are aware of this issue.

    Thanks.

  • Keith W

    I am having the same issue with a new 2012 R2 domain controller, which is the 2nd DC in a 2DC domain with the PDC emulator DC running 2012. I like the idea of just setting the MaxOfflineTimeInDays setting which seems very simple. I’ve seen other articles that have you go through a bunch of ADSI Edit procedures, etc. https://support.microsoft.com/en-us/kb/2958414 and https://support.microsoft.com/en-us/kb/2218556 for example. If it’s this simple, why are others recommending to go through a lot more hoops to get to the same result? Is there any risk in adjusting the MaxOfflineTimeInDays ? My SYSVOL folder and subfolders on the new DC are completely empty and the new DC is waiting to start replication because of the MaxOfflineTimeInDays being exceeded.

    • hdic

      Keith – if the second DC is new, have you tried just demoting and re-promoting it again? I would have thought this would fix the issue and is certainly better than messing with DFSR!

      As far as I know, the only ill effects that can come of this are if the two DCs have conflicting versions of files and it doesn’t know how to merge the two sets. In your case it should be fine as one SYSVOL folder is totally empty (and therefore won’t have any conflicts).

      The default in pre-2012 was actually to do this automatically (and silently!).

      Matt

  • Lotfi

    great post, thank you 🙂
