The Dedupeer Project was developed by Paulo Fernando, a Master's candidate in Computer Science at the Federal University of Pernambuco, with Silvio Meira as advisor and Vinicius Garcia as co-advisor. It aims to create a software component that can be integrated into storage systems to save space; the reduced disk usage in turn contributes to green storage.
"EPA understands that there are many software-based approaches to improving the energy efficiency of storage products. The benefits of virtualization, data deduplication, and other software-based data management techniques are well documented. These software solutions, perhaps even more so than the hardware itself, are heavily customized for specific customers and applications. Achieving maximum efficiency gains is highly dependent upon proper software architecture, implementation, operation, and maintenance by individual users. A key objective of the ENERGY STAR specification is to identify and reward storage solutions that seamlessly integrate software and hardware efficiency strategies that provide verifiable benefits without user intervention."
To see Dedupeer in operation, you can download the Dedupeer File Storage (DeFS), a project that manages distributed data and makes it easy to create a distributed storage system based on Apache Cassandra.
Screenshots from DeFS
This first screenshot shows the GUI of DeFS version 0.1.5 and the storage savings from deduplicating two virtual machine files (a 2.97 GB base file and a 4.28 GB updated file) using 128 KB chunks.
The second screenshot shows the modification map of the deduplicated file.
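To make the idea behind the two screenshots concrete, here is a minimal sketch of chunk-level deduplication in Python. It is not Dedupeer's actual algorithm or API: the function names (`chunk_hashes`, `modification_map`, `bytes_saved`), the use of SHA-256, and the fixed-offset chunking are all assumptions made for illustration. Only the 128 KB chunk size comes from the description above.

```python
import hashlib

# 128 KB, the chunk size mentioned for the screenshot above.
CHUNK_SIZE = 128 * 1024


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk.

    Hypothetical helper: fixed-offset chunking is an assumption, chosen
    because it is the simplest scheme to show.
    """
    return [
        hashlib.sha256(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def modification_map(base: bytes, updated: bytes,
                     chunk_size: int = CHUNK_SIZE) -> list[bool]:
    """Return one flag per chunk of `updated`.

    True means the chunk already exists somewhere in `base` (it can be stored
    as a reference); False means the chunk is new and must be stored in full.
    """
    known = set(chunk_hashes(base, chunk_size))
    return [digest in known for digest in chunk_hashes(updated, chunk_size)]


def bytes_saved(updated: bytes, dedup_map: list[bool],
                chunk_size: int = CHUNK_SIZE) -> int:
    """Count the bytes of `updated` covered by chunks already present in the base."""
    saved = 0
    for index, is_duplicate in enumerate(dedup_map):
        if is_duplicate:
            start = index * chunk_size
            # The last chunk may be shorter than chunk_size.
            saved += min(chunk_size, len(updated) - start)
    return saved
```

Calling `modification_map(base, updated)` on the contents of two file versions yields a list of flags that plays the role of the modification map, and `bytes_saved` gives the corresponding storage savings. Fixed-offset chunking stops finding matches after an insertion shifts the data, so practical deduplication systems often locate chunk boundaries with a rolling hash instead.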