Analysis of File sizes on disk


Analysing the files on a computer disk drive is very enlightening in terms of usage, backup requirements, and even backup strategies.

While there are plenty of applications like WinDirStat that can analyze the disks and present useful statistics, nothing beats generating raw data, and importing it into a spreadsheet for your own analysis.

The challenge, then, is to generate the data in a spreadsheet-compatible format such as CSV or TSV.

Approach #1:
I found this as a down-voted answer on StackOverflow; it was down-voted for the simple reason that the original answer did not work. A few quick tweaks made it work like a charm.

Ensure that you execute this command in Windows Command Prompt or PowerShell.

forfiles /p c:\ /s /c "cmd /c if /i @isdir==false echo @file:::@fsize" >>d:\dirlist.txt

Explanation:

forfiles is a rarely used command that runs an action on every file and folder that matches a condition. The /p switch sets the folder where the search starts (c:\ here), the /s switch makes it descend into sub-directories, and the /c switch supplies the command that forfiles must run for each matched file or folder.

cmd /c launches a child process of the Command Processor (the program behind a Command Prompt window). The /c switch tells cmd to run the command that follows it and then exit. It is needed here because if and echo are built into cmd rather than being standalone programs.

if /i @isdir==false runs the if command, which tests a condition. The /i switch instructs if to ignore case (uppercase, lowercase) when comparing. @isdir is a value generated by forfiles; it is TRUE if the matched item is a directory (aka folder) and FALSE if it is a file. Comparing @isdir with false ensures that the if command succeeds only for files and not folders.

echo @file:::@fsize runs the echo command, which simply prints the string after it to the terminal. @file and @fsize are variables defined by forfiles that contain the name of the matched file (wrapped in double quotes) and its size in bytes. I am using a triple colon as the field separator so that the data imports reliably into a spreadsheet program. A colon cannot appear in a Windows file name, so it can never be mistaken for part of the name. You can use anything you like, e.g. a # or a $, but note that characters such as space, hash and dollar can appear legitimately in file names, and this can confuse the Import Data functionality of the spreadsheet program.
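If the spreadsheet's import dialog struggles with the triple-colon separator, the listing can be converted to ordinary CSV first. The following is only a minimal Python sketch, not part of the original StackOverflow answer. The names parse_line and to_csv are made up for illustration, and the sketch assumes each useful line looks like "name":::size and that any other line (blank lines, error messages) can be dropped.

```python
import csv
import io

SEPARATOR = ":::"  # same separator as in the echo command above


def parse_line(line):
    """Split one '"name":::size' line into (name, size_in_bytes), or None."""
    # rpartition splits on the LAST separator, so the size is always the final field.
    name, sep, size = line.strip().rpartition(SEPARATOR)
    if not sep or not size.isdigit():
        return None  # blank line or anything that is not a listing entry
    return name.strip('"'), int(size)


def to_csv(lines, out):
    """Write a header plus one CSV row per valid listing line to `out`."""
    writer = csv.writer(out)
    writer.writerow(["name", "size_bytes"])
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)


# Hypothetical sample lines; real input would come from dirlist.txt.
sample = ['"report.docx":::48213', '', '"notes, final.txt":::120']
buffer = io.StringIO()
to_csv(sample, buffer)
print(buffer.getvalue())
```

To process the real file, pass an open file object instead of the sample list. If the listing was produced by PowerShell redirection, it may be UTF-16 rather than plain ASCII, so the encoding argument of open() may need adjusting.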

>>d:\dirlist.txt redirects the output to a file located in the root directory of the D drive. Because the redirection sits outside the quoted string, it applies to the forfiles command as a whole, which relays the output of every echo it runs. Please note the use of the double greater-than symbol. A single greater-than symbol overwrites dirlist.txt every time the command is run. A double greater-than symbol appends the output to whatever dirlist.txt already contains, so delete the file first if you want a fresh list.

Note that I am writing to a file in the root directory of the D drive. On some configurations, this may fail as Windows may prevent creation of files in the root directory by programs that are not being run with Administrator privileges. Writing to a file in a sub-directory such as D:\Temp\DirList.txt is a better idea.

Also note that, in my case, the D drive is actually a RAM drive. My main HDD (C drive) contains over 5 lakh (500,000) files, and if dirlist.txt were located on a physical HDD, it would put quite a bit of strain on the mechanics of the HDD, as a line is written to the file and the file-system metadata is updated for every file in C drive. Writing this file to an SSD uses up write cycles, which are a limited resource.

To create a RAM disk, you can use ImDisk or SoftPerfect RAM Disk. ImDisk can create really large RAM disks (I created a 24 GB disk on my laptop with 32 GB RAM); however, it has a bug: if you locate the Windows TEMP folder on the RAM disk, Windows Update fails to work, along with many other software installers. SoftPerfect RAM Disk is limited to 2 GB in size but works well with Windows Update etc. I use both: SoftPerfect for a 1 GB RAM disk on which the Windows TEMP folder resides, and ImDisk to create and destroy RAM disks as per my needs.

The forfiles command defines many variables such as @ext, @fdate etc., which can be used to analyze the types of files residing on your computer. The full list is in Microsoft's official documentation for forfiles.
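A per-type breakdown can also be computed from the name-and-size listing already generated, without re-running forfiles with @ext. The sketch below is an illustration only: size_by_extension is a made-up helper name, the sample lines are invented, and the extension is derived from the file name in Python rather than taken from forfiles.

```python
import os
from collections import Counter


def size_by_extension(lines, separator=":::"):
    """Return total bytes per lower-cased extension for '"name":::size' lines."""
    totals = Counter()
    for line in lines:
        name, sep, size = line.strip().rpartition(separator)
        if not sep or not size.isdigit():
            continue  # skip blank lines and anything that is not a listing entry
        # splitext returns ('name', '.ext'); files without an extension get a label.
        ext = os.path.splitext(name.strip('"'))[1].lower() or "(none)"
        totals[ext] += int(size)
    return totals


# Hypothetical sample lines; real input would come from dirlist.txt.
sample = ['"a.JPG":::100', '"b.jpg":::50', '"README":::7']
for ext, total in size_by_extension(sample).most_common():
    print(ext, total)
```

Lower-casing the extension groups .JPG and .jpg together, which matches how Windows treats file names.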

The redirectors (greater-than symbols) send the output of the forfiles command to the text file, and no output is shown in the window. To view the output on the screen as well as save it to the file, you need something like the tee command in Unix. Fortunately, PowerShell provides one: tee is a built-in alias for the Tee-Object cmdlet. Simply execute the following in Windows PowerShell (not the classic Command Prompt).

forfiles /p c:\ /s /c "cmd /c if /i @isdir==false echo @file:::@fsize" | tee -append d:\dirlist.txt

Explanation:

The pipe symbol (|) after the forfiles command sends its output to another program instead of saving it to a file.

The tee command saves the output to the file specified. Due to the -append switch, it will add the output to the end of the file. Tee also sends a copy of the output to the terminal, hence you will be able to see the output of the forfiles command as it executes.

Bug:

forfiles has a major bug that can introduce errors in your data. The command follows actual files and folders as well as links to files and folders. For example, Windows 10 by default stores the files related to the users of the machine in the C:\USERS folder. It also creates a link (a junction point, not an ordinary shortcut) called “C:\Documents and Settings” that points to C:\USERS. Because forfiles traverses links, every file in C:\USERS can be processed twice. God forbid there are cyclic links; forfiles will simply wear out the HDD till it dies.
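One way to sidestep the double counting is to generate the listing with a tool that can be told not to follow links. The following Python sketch is an alternative to forfiles, not a fix for it. list_files is a made-up name, and the sketch relies on os.walk not descending into symbolic links when followlinks is False. Whether Windows junctions are also skipped depends on the Python version, so treat that part as an assumption to verify on your own machine.

```python
import os
import tempfile


def list_files(root):
    """Yield (path, size_in_bytes) for files under root without following links."""
    # followlinks=False: linked directories are reported but never entered.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a linked file would otherwise be counted a second time
            try:
                yield path, os.path.getsize(path)
            except OSError:
                continue  # unreadable file, or one that vanished mid-scan


# Small demonstration on a throwaway folder, using the same ::: separator.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "a.txt"), "w") as handle:
        handle.write("hello")
    for path, size in list_files(demo_root):
        print(f"{os.path.basename(path)}:::{size}")
```

Pointing list_files at a drive root instead of the throwaway folder produces the same kind of listing as the forfiles command, with one entry per real file.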

