09 November 2011
A blog about making HPC things (kind of) work
An HPC user found that his local file system was filling up with data files, and he asked me whether he should get more storage. My first response was to ask how he was using the storage server. He explained that he kept the results of his protein folding runs on the file system. Each directory held one model and represented a good deal of compute time. He wanted to keep the historical record, but had no need for the data in his daily research. I then examined the file server and found that he could save a lot of space simply by archiving and compressing older directories. A typical directory compressed by a factor of 3-4, and the compression took about 10-15 minutes with bzip2. The file server had eight cores and the system load was low, so to speed things up I decided to try pbzip2, a parallel version of bzip2. Perhaps by using all eight cores I could compress the "compression" time!

To test pbzip2, I first archived an example directory with the following size and MD5 checksum:

Original archive:
- Uncompressed File Size: 4.2G
- MD5: 370256432a78c6314f7808eafeb347b4

Compressed with bzip2:
- Wall clock time: 13 minutes, 34 seconds
- Compressed size: 1.3G

Compressed with pbzip2:
- Wall clock time: 4 minutes, 22 seconds
- Compressed size: 1.3G

After decompression:
- Restored File Size: 4.2G
- MD5: 370256432a78c6314f7808eafeb347b4
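Parallel bzip2 works by splitting the input into chunks and compressing each chunk independently on its own core; pbzip2 writes the result as a series of concatenated bzip2 streams, which standard bzip2 can still decompress. The Python sketch below illustrates that idea on an in-memory byte string. It is not how pbzip2 itself is written: the 900 kB chunk size (picked to match bzip2's largest block size), the helper names `compress_chunk` and `parallel_bzip2`, and the use of threads are all assumptions for illustration, and a real tool would stream a multi-gigabyte archive instead of holding it in memory. Threads keep the sketch short on the assumption that CPython's `bz2` module releases the GIL while compressing; `ProcessPoolExecutor` is a drop-in alternative if it does not. The MD5 comparison at the end mirrors the round-trip check in the results.

```python
import bz2
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: 900 kB chunks, matching bzip2's largest block size (level 9).
CHUNK_SIZE = 900_000


def compress_chunk(chunk: bytes) -> bytes:
    """Compress one chunk into a complete, self-contained bzip2 stream."""
    return bz2.compress(chunk, compresslevel=9)


def parallel_bzip2(data: bytes, workers: int = 8) -> bytes:
    """Hypothetical helper: compress chunks concurrently, then concatenate the streams."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(compress_chunk, chunks))


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, used only to confirm a lossless round trip."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for a directory archive.
    original = b"model 42: folding trajectory frame\n" * 200_000
    packed = parallel_bzip2(original)
    # bz2.decompress accepts multiple concatenated streams (Python 3.3+).
    restored = bz2.decompress(packed)
    print(f"original {len(original)} bytes, compressed {len(packed)} bytes")
    print("MD5 match:", md5_hex(original) == md5_hex(restored))
```

Because every chunk is a full bzip2 stream, the output can be decompressed by any multi-stream-aware bzip2 decoder, at the cost of slightly more header overhead than a single stream.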
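The two wall-clock times can be turned into a speedup and a parallel efficiency with a little arithmetic. This Python sketch uses only the numbers listed in the results. It reads the first wall-clock time as the bzip2 run and the second as the pbzip2 run, and assumes pbzip2 used all eight cores. The compression ratio is approximate because the post rounds sizes to 0.1G.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes-and-seconds wall-clock reading to seconds."""
    return minutes * 60 + seconds


BZIP2_SECONDS = to_seconds(13, 34)   # first run: 814 s
PBZIP2_SECONDS = to_seconds(4, 22)   # second run: 262 s
CORES = 8                            # assumption: pbzip2 used every core

speedup = BZIP2_SECONDS / PBZIP2_SECONDS
efficiency = speedup / CORES
# Sizes are rounded to 0.1G in the post, so this ratio is approximate.
compression_ratio = 4.2 / 1.3

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")                     # 3.11x
    print(f"parallel efficiency: {efficiency:.0%}")       # 39%
    print(f"compression ratio: {compression_ratio:.1f}")  # 3.2
```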