Organizing Images or Other Files with Hashed Folder Names

I finally decided to tackle a site that had over 20GB of images in a single folder. I won’t say who created this monster (whoops), but while moving the site to a new dedicated server, I decided it was time to remedy the situation.

The first issue is that anything you use to (graphically) browse or process folders is limited greatly by the GUI and by how much data has to be processed for each request. This is why I decided a hashed folder naming convention would be the best approach. The folder names themselves do not matter, just so long as there is reasonable grouping and/or separation.

Initially I thought hashed folder names with 4 hex characters would be fine. That is, of course, until I tried loading said folders over FTP or through the cPanel File Manager. Using a 4-character hex prefix results in 65,536 [0000-ffff] possible folder names. This was way too many for File Manager, and either my server or FileZilla limits directory listings to 9,999 entries.

Okay, fine… so I lowered it to a 3-character prefix, thinking 4,096 [000-fff] possible folder names would perform efficiently enough. Wrong! While it did eventually load the folders and performed better in both GUIs, it still took too long for my tastes.

I finally settled on a 2-character prefix (e.g. 00-ff), limiting each nested level to 256 possible folder names. For this particular application, I only nested 2 levels deep since there are fewer than 65,536 images [256^2]. You could of course nest up to 16 levels [256^16 ≈ 3.4e+38 combinations] since MD5 hashes are 32 characters long.
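
To make the idea concrete, here is a minimal sketch (the file name is just a placeholder) of how a basename maps to a two-level nested path using the first characters of its MD5 hash:

$basename = 'some-image.jpg';
$hash = md5($basename);

// First two hex characters pick the top-level folder, the next two pick the subfolder
$path = substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $basename;

echo $path; // e.g. "1f/9c/some-image.jpg" -- the actual folders depend on the hash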

Here is the procedural function I used to process images received as URLs…

function copyImageFromURL($url, $length = 2, $depth = 2, $chmod = 0755) {
    if (!empty($url)) {
        $base_folder = '/path/to/images/';
        
        $dst = $base_folder;
        
        $basename = basename(parse_url($url, PHP_URL_PATH));
        
        $hash = md5($basename);
        
        $folders = array();
        
        // Build the list of nested folder names from the hash prefix
        for ($i = 0; $i < ($depth * $length); $i += $length) {
            $folders[] = substr($hash, $i, $length);
        }
        
        // Make sure each nested folder exists, building up the destination path
        foreach ($folders as $path) {
            if (!is_dir($dst . $path)) {
                mkdir($dst . $path, $chmod);
            }
            
            $dst .= $path . '/';
        }
        
        $dst .= $basename;
        
        if (file_exists($dst) || @copy($url, $dst)) {
            return substr($dst, strlen($base_folder));
        }
    }
    
    return '';
}

There you have it. You take the base image name (from a URL in this case), generate an MD5 hash of that string, then split away to your heart’s content, creating as many levels of organizational folders as you need to suit your quantity of images.

Call it like so…

$new_image = copyImageFromURL('http://www.domain.com/i/some-image.jpg');
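
The function returns the path relative to $base_folder (or an empty string on failure), so you prepend whatever base you need when using it. A quick sketch, assuming the images folder is served from a hypothetical /images/ web path:

if ($new_image !== '') {
    // Prepend the public path that maps to $base_folder on disk
    echo '<img src="/images/' . $new_image . '" alt="">';
}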

This method will eliminate duplicate filenames (assuming they all come from the same source; otherwise, use a different $base_folder for each source) and organize them into [00-ff] folders that each contain [00-ff] folders, giving you 65,536 folders overall.
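
Since the destination path is derived entirely from the basename, you can also recompute it later instead of storing it. This helper is not part of the function above, just a sketch using the same hashing logic and default $length/$depth values:

function hashedImagePath($basename, $length = 2, $depth = 2) {
    $hash = md5($basename);
    $path = '';
    
    // Rebuild the same nested folder names the copy function generated
    for ($i = 0; $i < ($depth * $length); $i += $length) {
        $path .= substr($hash, $i, $length) . '/';
    }
    
    return $path . $basename;
}

echo hashedImagePath('some-image.jpg'); // same relative path copyImageFromURL() returned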

You can of course pass higher $length and $depth arguments, but I found $length=2 works best with the various GUIs, and $depth=2, providing 65,536 folders, suited my needs just fine.
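
For example, a hypothetical call like this would nest three levels deep, giving you roughly 16.7 million [256^3] possible folders:

$new_image = copyImageFromURL('http://www.domain.com/i/some-image.jpg', 2, 3);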
