In this blog post I’d like to talk about a new file system for Windows. This file system, which we call ReFS, has been designed from the ground up to meet a broad set of customer requirements, both today’s and tomorrow’s, for all the different ways that Windows is deployed. The key goals of ReFS are:We wanted to continue our dialog about data storage by talking about the next generation file system being introduced in Windows 8. Today, NTFS is the most widely used, advanced, and feature rich file system in broad use. But when you’re reimagining Windows, as we are for Windows 8, we don’t rest on past successes, and so with Windows 8 we are also introducing a newly engineered file system. ReFS, (which stands for Resilient File System), is built on the foundations of NTFS, so it maintains crucial compatibility while at the same time it has been architected and engineered for a new generation of storage technologies and scenarios. In Windows 8, ReFS will be introduced only as part of Windows Server 8, which is the same approach we have used for each and every file system introduction. Of course at the application level, ReFS stored data will be accessible from clients just as NTFS data would be. As you read this, let’s not forget that NTFS is by far the industry’s leading technology for file systems on PCs. This detailed architectural post was authored by Surendra Verma, a development manager on our Storage and File System team, though, as with every feature, a lot of folks contributed. We have also used the FAQ approach again in this post.
--Steven PS: Don't forget to track us on @buildwindows8 where we were providing some updates from CES. <hr />
- Maintain a high degree of compatibility with a subset of NTFS features that are widely adopted while deprecating others that provide limited value at the cost of system complexity and footprint.
- Verify and auto-correct data. Data can get corrupted due to a number of reasons and therefore must be verified and, when possible, corrected automatically. Metadata must not be written in place to avoid the possibility of “torn writes,” which we will talk about in more detail below.
- Optimize for extreme scale. Use scalable structures for everything. Don’t assume that disk-checking algorithms, in particular, can scale to the size of the entire file system.
- Never take the file system offline. Assume that in the event of corruptions, it is advantageous to isolate the fault while allowing access to the rest of the volume. This is done while salvaging the maximum amount of data possible, all done live.
- Provide a full end-to-end resiliency architecture when used in conjunction with the Storage Spaces feature, which was co-designed and built in conjunction with ReFS.
- Metadata integrity with checksums
- Integrity streams providing optional user data integrity
- Allocate on write transactional model for robust disk updates (also known as copy on write)
- Large volume, file and directory sizes
- Storage pooling and virtualization makes file system creation and management easy
- Data striping for performance (bandwidth can be managed) and redundancy for fault tolerance
- Disk scrubbing for protection against latent disk errors
- Resiliency to corruptions with "salvage" for maximum volume availability in all cases
- Shared storage pools across machines for additional failure tolerance and load balancing
D:>format /fs:refs /q /i:enable <volume>
D:>format /fs:refs /q /i:disable <volume></pre>
By default, when the /i switch is not specified, the behavior that the system chooses depends on whether the volume resides on a mirrored space. On a mirrored space, integrity is enabled because we expect the benefits to significantly outweigh the costs. Applications can always override this programmatically for individual files.
Battling “bit rot”
As we described earlier, the combination of ReFS and Storage Spaces provides a high degree of data resiliency in the presence of disk corruptions and storage failures. A form of data loss that is harder to detect and deal with happens due to “bit rot,” where parts of the disk develop corruptions over time that go largely undetected since those parts are not read frequently. By the time they are read and detected, the alternate copies may have also been corrupted or lost due to other failures.
In order to deal with bit rot, we have added a system task that periodically scrubs all metadata and Integrity Stream data on a ReFS volume residing on a mirrored Storage Space. Scrubbing involves reading all the redundant copies and validating their correctness using the ReFS checksums. If checksums mismatch, bad copies are fixed using good ones.
The file attribute FILE_ATTRIBUTE_NO_SCRUB_DATA indicates that the scrubber should skip the file. This attribute is useful for those applications that maintain their own integrity information, when the application developer wants tighter control over when and how those files are scrubbed.
The Integrity.exe command line tool is a powerful way to manage the integrity and scrubbing policies.
When all else fails…continued volume availability
We expect many customers to use ReFS in conjunction with mirrored Storage Spaces, in which case corruptions will be automatically and transparently fixed. But there are cases, admittedly rare, when even a volume on a mirrored space can get corrupted – for example faulty system memory can corrupt data, which can then find its way to the disk and corrupt all redundant copies. In addition, some customers may not choose to use a mirrored storage space underneath ReFS.
For these cases where the volume gets corrupted, ReFS implements “salvage,” a feature that removes the corrupt data from the namespace on a live volume. The intention behind this feature is to ensure that non-repairable corruption does not adversely affect the availability of good data. If, for example, a single file in a directory were to become corrupt and could not be automatically repaired, ReFS will remove that file from the file system namespace while salvaging the rest of the volume. This operation can typically be completed in under a second.
Normally, the file system cannot open or delete a corrupt file, making it impossible for an administrator to respond. But because ReFS can still salvage the corrupt data, the administrator is able to recover that file from a backup or have the application re-create it without taking the file system offline. This key innovation ensures that we do not need to run an expensive offline disk checking and correcting tool, and allows for very large data volumes to be deployed without risking large offline periods due to corruption.
A clean fit into the Windows storage stack
We knew we had to design for maximum flexibility and compatibility. We designed ReFS to plug into the storage stack just like another file system, to maximize compatibility with the other layers around it. For example, it can seamlessly leverage BitLocker encryption, Access Control Lists for security, USN journal, change notifications, symbolic links, junction points, mount points, reparse points, volume snapshots, file IDs, and oplocks. We expect most file system filters to work seamlessly with ReFS with little or no modification. Our testing bore this out for example, we were able to validate the functionality of the existing Forefront antivirus solution.
Some filters that depend on the NTFS physical format will need greater modification. We run an extensive compatibility program where we test our file systems with third-party antivirus, backup, and other such software. We are doing the same with ReFS and will work with our key partners to address any incompatibilities that we discover. This is something we have done before and is not unique to ReFS.
An aspect of flexibility worth noting is that although ReFS and Storage Spaces work well together, they are designed to run independently of each other. This provides maximum deployment flexibility for both components without unnecessarily limiting each other. Or said another way, there are reliability and performance tradeoffs that can be made in choosing a complete storage solution, including deploying ReFS with underlying storage from our partners.
With Storage Spaces, a storage pool can be shared by multiple machines and the virtual disks can seamlessly transition between them, providing additional resiliency to failures. Because of the way we have architected the system, ReFS can seamlessly take advantage of this.
Usage
We have tested ReFS using a sophisticated and vast set of tens of thousands of tests that have been developed over two decades for NTFS. These tests simulate and exceed the requirements of the deployments we expect in terms of stress on the system, failures such as power loss, scalability, and performance. Therefore, ReFS is ready to be deployment-tested in a managed environment. Being the first version of a major file system, we do suggest just a bit of caution. We do not characterize ReFS in Windows 8 as a “beta” feature. It will be a production-ready release when Windows 8 comes out of beta, with the caveat that nothing is more important than the reliability of data. So, unlike any other aspect of a system, this is one where a conservative approach to initial deployment and testing is mandatory.
With this in mind, we will implement ReFS in a staged evolution of the feature: first as a storage system for Windows Server, then as storage for clients, and then ultimately as a boot volume. This is the same approach we have used with new file systems in the past.
Initially, our primary test focus will be running ReFS as a file server. We expect customers to benefit from using it as a file server, especially on a mirrored Storage Space. We also plan to work with our storage partners to integrate it with their storage solutions.
Conclusion
Along with Storage Spaces, ReFS forms the foundation of storage on Windows for the next decade or more. We believe this significantly advances our state of the art for storage. Together, Storage Spaces and ReFS have been architected with headroom to innovate further, and we expect that we will see ReFS as the next massively deployed file system.
-- Surendra
FAQ:
Q) Why is it named ReFS?
ReFS stands for Resilient File System. Although it is designed to be better in many dimensions, resiliency stands out as one of its most prominent features.
Q) What are the capacity limits of ReFS?
The table below shows the capacity limits of the on-disk format. Other concerns may determine some practical limits, such as the system configuration (for example, the amount of memory), limits set by various system components, as well as time taken to populate data sets, backup times, etc.
<tbody>
Attribute
Limit based on the on-disk format
Maximum size of a single file
2^64-1 bytes
Maximum size of a single volume
Format supports 2^78 bytes with 16KB cluster size (2^64 * 16 * 2^10). Windows stack addressing allows 2^64 bytes
Maximum number of files in a directory
2^64
Maximum number of directories in a volume
2^64
Maximum file name length
32K unicode characters
Maximum path length
32K
Maximum size of any storage pool
4 PB
Maximum number of storage pools in a system
No limit
Maximum number of spaces in a storage pool
No limit
</tbody>
Q) Can I convert data between NTFS and ReFS?
In Windows 8 there is no way to convert data in place. Data can be copied. This was an intentional design decision given the size of data sets that we see today and how impractical it would be to do this conversion in place, in addition to the likely change in architected approach before and after conversion.
Q) Can I boot from ReFS in Windows Server 8?
No, this is not implemented or supported.
Q) Can ReFS be used on removable media or drives?
No, this is not implemented or supported.
Q) What semantics or features of NTFS are no longer supported on ReFS?
The NTFS features we have chosen to not support in ReFS are: named streams, object IDs, short names, compression, file level encryption (EFS), user data transactions, sparse, hard-links, extended attributes, and quotas.
Q) What about parity spaces and ReFS?
ReFS is supported on the fault resiliency options provided by Storage Spaces. In Windows Server 8, automatic data correction is implemented for mirrored spaces only.
Q) Is clustering supported?
Failover clustering is supported, whereby individual volumes can failover across machines. In addition, shared storage pools in a cluster are supported.
Q) What about RAID? How do I use ReFS capabilities of striping, mirroring, or other forms of RAID? Does ReFS deliver the read performance needed for video, for example?
ReFS leverages the data redundancy capabilities of Storage Spaces, which include striped mirrors and parity. The read performance of ReFS is expected to be similar to that of NTFS, with which it shares a lot of the relevant code. It will be great at streaming data.
Q) How come ReFS does not have deduplication, second level caching between DRAM & storage, and writable snapshots?
ReFS does not itself offer deduplication. One side effect of its familiar, pluggable, file system architecture is that other deduplication products will be able to plug into ReFS the same way they do with NTFS.
ReFS does not explicitly implement a second-level cache, but customers can use third-party solutions for this.
ReFS and VSS work together to provide snapshots in a manner consistent with NTFS in Windows environments. For now, they don’t support writable snapshots or snapshots larger than 64TB.
Source: [url=http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx]Windows 8 Blog
Last edited: