File Synchronization Help


Manu
01-19-2006, 03:19 PM
So, here's my deal...

I have a main office and a branch office. I have about 20GB of data on a NAS at the branch office.

I have a current copy of this data at a NAS at my corporate office.

I would like to look into software that will do file synchronization both ways. I have people working on the data at both sides, so I need to synchronize both ways, but I also need to have a file lock on the data not being actively used....

Does that make sense? Anyone have any ideas for me?

RightWingZealot
01-19-2006, 04:07 PM
Maybe Double-Take?

http://www.nsisoftware.com/what-we-offer/double-take/

92Notch
01-20-2006, 08:57 PM
Not sure on that one.

Is setting up one of the sites as the "data center", with the NAS there and the other site accessing it remotely, out of the question? Perhaps with some VPN hardware at both sites to protect the information if it's going out on public lines.

If I understand, you want to sync in "real time". That sounds like it could be a bitch, and a possible (probable) point of failure.

What do you mean by "file lock"?

Gibson
01-20-2006, 09:08 PM
Are they actual NAS appliances? Do the OSes support DFS? You could just use that :shrug:
If they're Windows boxes you could do DFS; if they're *ix boxes you can write a cron job that'll do a shadow copy of the latest data files on each end. Just use S/FTP on each side :shrug:
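As a rough illustration of what such a scheduled copy job might do, here is a minimal Python sketch of a two-way "newest copy wins" pass between two directory trees. It is not from the thread: the function name `sync_newest` and the two directory arguments are hypothetical, both trees are assumed to be reachable as local paths (for example over a mounted share), and modification times are assumed to be trustworthy on both ends.

```python
import os
import shutil

def sync_newest(dir_a, dir_b):
    """Copy each file to the other side when it is missing there or older.

    Returns a list of (relative_path, direction) tuples for the copies made.
    """
    # Collect every relative file path that exists on either side.
    names = set()
    for root_dir in (dir_a, dir_b):
        for folder, _subdirs, files in os.walk(root_dir):
            for name in files:
                full = os.path.join(folder, name)
                names.add(os.path.relpath(full, root_dir))

    copied = []
    for rel in sorted(names):
        a = os.path.join(dir_a, rel)
        b = os.path.join(dir_b, rel)
        a_time = os.path.getmtime(a) if os.path.exists(a) else None
        b_time = os.path.getmtime(b) if os.path.exists(b) else None
        if b_time is None or (a_time is not None and a_time > b_time):
            src, dst, direction = a, b, "a->b"
        elif a_time is None or b_time > a_time:
            src, dst, direction = b, a, "b->a"
        else:
            continue  # same timestamp on both sides: nothing to do
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append((rel, direction))
    return copied
```

Note what this sketch does not do: it never propagates deletions, and if the same file is edited at both sites between runs, the older edit is silently overwritten. That gap is the reason the original question asks for file locking on top of synchronization.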

Manu
04-06-2006, 05:21 PM
Long term update :)

Notch-

That's how we'd been doing it, but the bandwidth needs were getting out of hand.

It looks like we're going to go with the newest version of DFS....

GROFF200
04-06-2006, 05:50 PM
Don't know if this will address your problem, but...
You could set up a Subversion repository and put your files in it.
When users want to edit a file, they update their local copy, lock the file they want to work on, then commit it back to the repository.
This, at least, is how developers keep synchronized when multiple people are writing software. Not sure if it applies to your environment though.
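The lock, edit, commit cycle described above can be modeled with a toy in-memory class. This is only a sketch of the idea, not Subversion's implementation: the class `LockingRepository` and its methods are hypothetical, and unlike real Subversion it refuses any commit made without holding the lock. With the actual client the corresponding steps would be `svn update`, `svn lock`, and `svn commit`.

```python
class LockingRepository:
    """Toy in-memory store illustrating a lock-modify-unlock workflow."""

    def __init__(self):
        self.files = {}  # path -> latest committed contents
        self.locks = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        """Take the lock on a path, or fail if someone else holds it."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def commit(self, path, user, contents):
        """Store new contents; only the lock holder may commit."""
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        self.files[path] = contents
        del self.locks[path]  # committing releases the lock
```

The point of the model is that a second user who tries to lock the same file gets an immediate error instead of discovering a conflict after both sides have made changes.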

PlatyGuy
04-06-2006, 08:24 PM
Several questions immediately pop to mind.

What client operating systems are involved?
What kinds of NAS boxes are these?
Do you need transparent file locking, or is an explicit checkin/checkout such as Groff suggests OK?
Do updates need to be real-time, near real-time (seconds to minutes), or periodic (hours)?

Explicit checkin/checkout with periodic updates is by far the easiest case, and you could use something like Unison (http://www.cis.upenn.edu/~bcpierce/unison/) or Onion Networks's WAN Transport (http://onionnetworks.com/products/) to optimize the bandwidth. There are plenty of other "WAN optimization" (http://www.byteandswitch.com/document.asp?doc_id=91559) products out there, but those are two that I know particularly well.

If you need transparent locking and/or real-time updates, you're kind of stuck with a distributed filesystem - now often called Wide Area File System or WAFS in storage-industry lingo. The Byte&Switch article cited above links to many players in that space as well, but many of those are pricey and tied to proprietary hardware. If your NAS is beige-box Linux you might try InterMezzo (http://www.inter-mezzo.org/), or OpenAFS (http://openafs.org/) for broader platform support. NFSv4 has some specific features to allow hierarchies of servers and proxies to function efficiently even over wide area networks, but it's not widely implemented or deployed. The most difficult case would be if you're using specialized NAS boxes (e.g. EMC, NetApp, BlueArc), in which case you're pretty much at the vendor's mercy for such extra functionality.

If you have any further questions, feel free to PM or email me. I used to do this stuff for a living (at my last job so no conflict of interest wrt my current one); if I can understand your specific requirements in more detail it might help jar some more relevant memories loose.

Manu
04-06-2006, 08:51 PM
Client - Windows XP SP2
Server - Windows 2003 Server (and Windows 2003 Storage Edition)
Transparent file locking, no check-in/check-out
Near real time updates, if there's file locking, then updates can be slower...

Especially given the cost of DFS (free), I think I'm 'stuck' with it.

What potential up and downsides do you see with DFS?

PlatyGuy
04-06-2006, 09:58 PM
What potential up and downsides do you see with DFS?
There are at least three potential disadvantages. The first is performance; naively implemented WAFS can suffer from either latency or bandwidth limitations if it's not designed to use the network efficiently. The last time I looked at it Microsoft's DFS seemed pretty far on the naive side, providing a unified namespace but not a whole lot in terms of efficient updates or synchronization. Since OpenAFS is also free and does support Windows I personally would look at that first.

The second problem, which 92Notch already alluded to, is potential for failure. The software might support multiple redundant servers and connections etc. but if you're not actually deploying in that fashion then you now have more components in your (single) data path and thus lower reliability than if you had fewer.

The third potential problem, which 92Notch also touched on, is security. Besides the issue of sending data over public lines, there's also an issue of authenticating users and enforcing permissions. If you're already doing some kind of synchronization I'd guess you've already had to address the need for a single system to authenticate across sites, so maybe it's not actually a problem in this case, but it always deserves to be considered.

One possible advantage to Microsoft DFS over OpenAFS is that the former is more likely to be supported by commercial WAN optimization products that can alleviate performance issues. Since OpenAFS is inherently better designed for network efficiency, though, such add-ons are less likely to be necessary.

RightWingZealot
04-07-2006, 11:47 AM
Isn't there a limit to the file/folder number when synchronizing with DFS?

Manu
04-07-2006, 07:35 PM
There are at least three potential disadvantages. The first is performance; naively implemented WAFS can suffer from either latency or bandwidth limitations if it's not designed to use the network efficiently. The last time I looked at it Microsoft's DFS seemed pretty far on the naive side, providing a unified namespace but not a whole lot in terms of efficient updates or synchronization.
Have you worked with MS DFS since Windows 2003 R2 was released? From what I've read they've implemented RDC (Remote Differential Compression) and the reviews are pretty good. I take the MS demo with a grain of salt of course, but their perfmon stuff looked good.
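To make the idea of differential transfer concrete, here is a minimal Python sketch that sends only the blocks of a file that changed. It is not RDC's actual algorithm, which is more sophisticated than a fixed-block comparison; the block size, function names, and the shape of the delta are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, hypothetical value so the example is easy to follow

def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def make_delta(old, new, block_size=BLOCK_SIZE):
    """Return (new_length, {block_index: bytes}) for blocks that differ."""
    old_hashes = block_hashes(old, block_size)
    changed = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        same = (index < len(old_hashes)
                and hashlib.sha256(block).hexdigest() == old_hashes[index])
        if not same:
            changed[index] = block
    return len(new), changed

def apply_delta(old, delta, block_size=BLOCK_SIZE):
    """Rebuild the new data from the old copy plus the changed blocks."""
    new_length, changed = delta
    out = bytearray()
    for index, start in enumerate(range(0, new_length, block_size)):
        if index in changed:
            out += changed[index]
        else:
            out += old[start:start + block_size]
    return bytes(out[:new_length])
```

A fixed-block scheme like this breaks down when bytes are inserted near the start of a file, because every later block shifts and appears changed. Handling that case well is one of the things real differential tools have to solve.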

Since OpenAFS is also free and does support Windows I personally would look at that first.

Thanks for the name, will check it out.

The second problem, which 92Notch already alluded to, is potential for failure. The software might support multiple redundant servers and connections etc. but if you're not actually deploying in that fashion then you now have more components in your (single) data path and thus lower reliability than if you had fewer.
I think that kind of goes without saying; that is more a setup concern than a technological limitation, no? It's like saying I have a RAID controller and 2 HDs, but not configuring the array.

The implementation we're looking at would be a hub/branch setup, where the branch will replicate data back to the hub.


The third potential problem, which 92Notch also touched on, is security. Besides the issue of sending data over public lines, there's also an issue of authenticating users and enforcing permissions. If you're already doing some kind of synchronization I'd guess you've already had to address the need for a single system to authenticate across sites, so maybe it's not actually a problem in this case but it always deserves to be considered.
The sites are on the same domain and connected via hardware VPN clients. Security is not a concern as the tunnels are encrypted. In terms of authentication, it will all be our Windows domain authentication. That's the nice thing with MS DFS: it will use my established permissions.

One possible advantage to Microsoft DFS over OpenAFS is that the former is more likely to be supported by commercial WAN optimization products that can alleviate performance issues. Since OpenAFS is inherently better designed for network efficiency, though, such add-ons are less likely to be necessary.
I'll need to just do some perfmon testing I think...

RWZ-

In domain mode, each root does have a 5000-folder limit. But on the whole, the namespace does not have a limit.

PlatyGuy
04-07-2006, 08:25 PM
Have you worked with MS DFS since Windows 2003 R2 was released? From what I've read they've implemented RDC (Remote Differential Compression) and the reviews are pretty good. I take the MS demo with a grain of salt of course, but their perfmon stuff looked good.
No, I haven't worked with a version of DFS that recent. Differential compression is a great addition - it's what the WAN optimization folks mostly do - but it's not quite a substitute for an inherently efficient protocol. To put it another way, compression mostly helps with bandwidth but it's often latency that's the real killer. If your protocol requires more messages between nodes than are truly necessary, then you can see significant performance degradation long before the network becomes saturated. Make sure you look for "latency bubbles" when you're evaluating performance, and also make sure you look at some network traces in addition to perfmon data to get the full picture.
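The bandwidth-versus-latency point can be put into rough numbers with a simple model: total time is the number of round trips times the round-trip time, plus payload size divided by bandwidth. The sketch below uses that model; the function name and every figure in it (1 MB file, 50 ms round trip, roughly 1.5 Mbit/s link, 200 versus 4 round trips) are made-up assumptions, not measurements of DFS or any other product.

```python
def transfer_seconds(payload_bytes, round_trips, rtt_seconds,
                     bandwidth_bytes_per_second):
    """Time spent waiting on round trips plus time spent moving bytes."""
    return (round_trips * rtt_seconds
            + payload_bytes / bandwidth_bytes_per_second)

RTT = 0.05          # assumed 50 ms round trip between the two offices
BANDWIDTH = 187_500  # assumed link speed in bytes/s (about 1.5 Mbit/s)

# A chatty protocol that needs 200 round trips to move a 1 MB file.
chatty = transfer_seconds(1_000_000, 200, RTT, BANDWIDTH)

# The same chatty protocol with compression halving the payload.
chatty_compressed = transfer_seconds(500_000, 200, RTT, BANDWIDTH)

# A leaner protocol: full payload, but only 4 round trips.
lean = transfer_seconds(1_000_000, 4, RTT, BANDWIDTH)
```

Under these assumptions the chatty protocol spends 10 seconds just waiting on round trips, and compression does nothing to reduce that part. The lean protocol finishes sooner than the compressed chatty one even though it moves twice as many bytes.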
I think that kind of goes without saying; that is more a setup concern than a technological limitation, no? It's like saying I have a RAID controller and 2 HDs, but not configuring the array.
Yeah, it's a lot like that. BTW, you'd be amazed at how many people do exactly that. The reason it's an issue is that the consequences of a failure in your data path are much more severe when using a DFS (where things get blocked in the kernel) than when using user-level synchronization tools. Putting a WAN connection in your data path makes that even worse, so a DFS solution basically forces you to adopt a redundant solution whereas it's kind of optional otherwise. If you need a DFS because of immediacy/transparency requirements then it's just something you have to live with, but it's worth noting.
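The reliability argument, that more components in a single data path means lower overall reliability, can be sketched with basic availability arithmetic. The code below assumes component failures are independent, and the availability figures are invented for illustration; the function names are hypothetical.

```python
import math

def series_availability(components):
    """All components must be up: multiply their availabilities."""
    return math.prod(components)

def redundant_availability(copies):
    """At least one copy must be up: 1 minus the chance that all are down."""
    return 1 - math.prod(1 - a for a in copies)

NAS = 0.999  # assumed availability of one file server
WAN = 0.99   # assumed availability of one WAN link

local_only = series_availability([NAS])
through_wan = series_availability([NAS, WAN, NAS])
redundant_link = redundant_availability([WAN, WAN])
through_redundant_wan = series_availability([NAS, redundant_link, NAS])
```

With these figures, putting a single WAN link and a second server in the path lowers availability below that of the local server alone, and a redundant link recovers most, but not all, of the loss.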
