In this podcast, we look at file synchronisation software, with Jimmy Tan, CEO of Peer Software.
We talk about how it works and the differing methodologies that can be used, such as peer-to-peer and hub-spoke topologies.
Tan also talks about key use cases for file synchronisation, such as large file shares in collaboration scenarios, content dissemination within organisations, and ensuring virtual desktop profiles are harmonised.
Antony Adshead: What is real-time file sync, how does it work, and how do customers deploy it?
Jimmy Tan: Real-time file synchronisation can be described as a process that ensures that files or data from multiple devices or systems are always kept up to date and consistent with each other. When changes are made on one file system or device, the same changes are quickly and automatically made on all the other devices, thus ensuring the most current version of a file is available across all devices in real time.
You can break down real-time file synchronisation into a three-step process.
First, monitoring changes. Second, the transmission of those changes. Third, the commitment of those changes.
When it comes to monitoring the changes, the first step, generally there’s a software or a service responsible for file synchronisation that continuously monitors designated folders or directories on a connected device. This monitoring is best served through some sort of real-time file system event tracking. A lot of devices these days have some sort of API [application programming interface] or file event log where you can grab that information and use that as the event tracker.
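To make the monitoring step concrete, here is a minimal sketch in Python that polls a directory and diffs successive snapshots of modification times. A production tool would instead hook the platform's real-time file event API (inotify, FSEvents, ReadDirectoryChangesW, and so on); the `snapshot` and `diff` names here are purely illustrative, not any vendor's API.

```python
import os

def snapshot(root):
    """Record a path -> modification-time map for every file under root."""
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime
    return state

def diff(old, new):
    """Compare two snapshots and classify the differences as file events."""
    events = []
    for path in new:
        if path not in old:
            events.append(("added", path))
        elif new[path] != old[path]:
            events.append(("modified", path))
    for path in old:
        if path not in new:
            events.append(("deleted", path))
    return events
```

Each event produced by `diff` is what would then be handed to stage two for transmission.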
Then, when a change is detected on a system, such as a file addition, modification, deletion or rename, the synchronisation software notes that change and prepares to propagate it to the other devices, which moves us onto stage two: the transmission of those changes.
The synchronisation software can then transmit those changes directly from one device to another in a peer-to-peer architecture, or it can send those changes to a central server first, whether on-premises or in the cloud, where the changes are recorded to a log and coordinated before being further disseminated to target systems.
This second methodology uses a central server as a mediator between all the connected devices, and follows more of a hub-and-spoke topology versus a peer-to-peer architecture.
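As a rough sketch of the hub-and-spoke model Tan describes, the toy classes below (the names are my own, not Peer Software's) show a central hub recording each event to an ordered log and fanning it out to every connected spoke:

```python
class Spoke:
    """A connected device that applies events received from the hub."""
    def __init__(self, name):
        self.name = name
        self.files = {}          # path -> content

    def apply(self, event, path, content=None):
        if event in ("added", "modified"):
            self.files[path] = content
        elif event == "deleted":
            self.files.pop(path, None)

class Hub:
    """Central mediator: records each event, then fans it out to all spokes."""
    def __init__(self):
        self.spokes = []
        self.log = []            # ordered event log used for coordination

    def publish(self, event, path, content=None):
        self.log.append((event, path))
        for spoke in self.spokes:
            spoke.apply(event, path, content)
```

In a peer-to-peer topology there would be no `Hub`; each device would send its events straight to the others.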
The third part is the easiest: the commitment of those changes. As each device receives the file events, the changes are applied to the respective files on its local storage, ensuring each device stays in sync with the others.
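The commit step itself can be sketched as a small function that applies a received event to local storage. This is an illustrative assumption of how a receiving device might behave, not a description of any particular product's implementation:

```python
import os

def commit(root, event, relpath, content=None):
    """Apply a received file event to local storage under root."""
    path = os.path.join(root, relpath)
    if event in ("added", "modified"):
        # Create any missing parent directories, then write the new content.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(content or "")
    elif event == "deleted":
        if os.path.exists(path):
            os.remove(path)
```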
In addition, real-time file synchronisation can be categorised as one-way or bi-directional. In one-way scenarios, there is typically a single master source system that generates the file events, and all other devices are merely recipients of file changes made at the master site.
In contrast, bi-directional synchronisation would typically connote a multi-master or active-active scenario where changes can be made on any system. In these multi-master environments, file version conflicts can occur when different devices make changes to the same file simultaneously.
To resolve this, modern synchronisation software will generally include some sort of file version conflict prevention capability, like file locking, or some sort of conflict resolution mechanism, such as prioritising the newest version of a file or prompting the user to choose the correct version.
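A newest-version-wins policy like the one Tan mentions can be sketched in a few lines. Real products typically do more (file locking up front, or richer version tracking), so treat this purely as an illustration of the resolution rule:

```python
def resolve_newest(versions):
    """Newest-wins conflict resolution.

    versions: a list of (device, timestamp, content) tuples describing
    competing edits to the same file; the latest timestamp wins.
    """
    return max(versions, key=lambda v: v[1])
```

So given simultaneous edits from two sites, the edit with the later timestamp is the one that gets committed everywhere.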
Adshead: What are the benefits, what workloads are most commonly able to use it, and what can’t it be used for?
Tan: In terms of key benefits for real-time file synchronisation, the most obvious for customers looking for this type of technology are productivity and availability.
Productivity is enhanced, obviously, because real-time synchronisation makes information immediately available on all connected devices and to all end users, regardless of where they are. You know, there are no delays in information access.
From a high-availability perspective, because of the redundant nature of the connected devices, real-time file synchronisation is a core component of automatic failover. When any one system fails, and the environment is overlaid with some sort of auto-failover technology, end users are redirected to a second site that houses the redundant data and can continue to work, therefore reducing downtime.
So, those are the two key benefits.
With regards to the workloads that most benefit from file synchronisation, I’ll name three off the top of my head.
First off is any sort of file sharing and project collaboration for distributed teams. I have a lot of conversations with engineering firms, for example, that have very large CAD files that are shared across distributed team members and they have a follow-the-sun workflow model.
Now imagine that you have all those large files stored in a single site and accessing it remotely from around the world can be very slow. So, real-time file synchronisation is the core component piece to enabling collaboration in a follow-the-sun workflow model.
The second example is content publishing and distribution. Just recently, I had a conversation with a large media firm that had a single master dataset of content stored in Connecticut. As they updated that information, they wanted to make sure the content was pushed out to all their affiliate TV stations around the world, so those stations had the most up-to-date logos, imagery and videos to publish.
The third use case that I have seen a lot recently, and perhaps not as obvious, is virtual desktop profile synchronisation. Virtual desktops were very much in use pre-pandemic, but with the pandemic, their use has exploded.
Most people don’t think about it, but enabling virtual desktop profile synchronisation (a profile is just a set of files) gives a consistent virtual desktop experience across companies that are multi-site. As an example, we were recently talking to a healthcare system that had many hospitals around the country, with doctors going between hospitals and surgeries, and they wanted to make sure their virtual desktops were consistent across sites.
Now, where is it not useful? It’s not useful in any high-latency or limited-bandwidth scenario, because synchronisation across such links is simply taxing to the network and the storage systems.