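WordPress Source Code Deep Dive: WordPress Password Hashing and the Internals of the `wp_hash_password()` Function - 智猿学院: lectures on frontier technology in front-end and back-end development, databases, AI, cloud computing, and more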

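Good evening, fellow techies! I'm tonight's speaker, and I'm glad to be here to talk through WordPress password hashing with you. No fluff today: we're going to pop the hood on WordPress and see exactly how wp_hash_password() turns our passwords into a pile of gibberish.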

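Opening: passwords, the first line of defense, and possibly the weakest one

A password is like the lock on your front door. Lock it properly and burglars stay out; lock it badly and your home is open for free tours. On the internet, passwords matter even more. The problem is that storing passwords in plaintext is like running around naked: anyone who can read the database can see them. That is why password hashing exists.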

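Enter the star: the wp_hash_password() function

wp_hash_password() is the tool WordPress uses to hash user passwords. It does three things:

Accepts the user's raw password: the plaintext password the user types when registering or changing a password.
Processes it with a secure hashing scheme: the plaintext becomes a string with no visible pattern.
Returns the hashed password: this hash is stored in the database in place of the original password.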

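Source walkthrough: lifting the veil step by step

Let's start with a simplified version of wp_hash_password() (some compatibility checks and plugin hooks are omitted; the core logic is kept):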

function wp_hash_password( $password ) {
    global $wp_hasher;

    if ( empty( $wp_hasher ) ) {
        require_once ABSPATH . WPINC . '/class-phpass.php';
        $wp_hasher = new PasswordHash( 8, true );
    }

    return $wp_hasher->HashPassword( trim( $password ) );
}

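This code is short, but it packs in a lot.

global $wp_hasher;: declares the global variable $wp_hasher, which holds an instance of the PasswordHash class. A global is used so that the PasswordHash object is not rebuilt on every call to wp_hash_password(), which saves work.
if ( empty( $wp_hasher ) ) { ... }: checks whether $wp_hasher is empty. If it is, no PasswordHash object exists yet, so one has to be created.
require_once ABSPATH . WPINC . '/class-phpass.php';: loads the file class-phpass.php, which defines the PasswordHash class at the heart of WordPress password hashing. ABSPATH is the WordPress root directory and WPINC is the wp-includes directory.
$wp_hasher = new PasswordHash( 8, true );: creates a PasswordHash instance and assigns it to $wp_hasher. The constructor takes two arguments:
- 8: the cost parameter. It is the base-2 logarithm of the iteration count, not the count itself. On PHP 5 and later, gensalt_private() in the PasswordHash class adds 5 to it, so the password goes through 2^13 = 8192 rounds of MD5. A higher value makes hashing slower and brute forcing more expensive, at the price of more CPU time.
- true: use portable hashes. With this flag, phpass uses its own MD5-based scheme (hashes that start with $P$) rather than an algorithm that depends on what the local crypt() supports, so a hash created on one server can still be verified on another. The hashes are not identical from run to run, because each one gets a fresh random salt.
return $wp_hasher->HashPassword( trim( $password ) );: calls the HashPassword() method of PasswordHash to hash the password and returns the result. trim( $password ) strips whitespace from both ends of the password string.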
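
To make the cost argument concrete, here is a minimal sketch of the arithmetic, assuming PHP 5 or later. The helper name sketch_effective_rounds() is made up for illustration and is not part of WordPress or phpass; it only mirrors the range check in the PasswordHash constructor and the adjustment in gensalt_private().

```php
<?php
// Hypothetical helper (not a WordPress or phpass function): maps the
// constructor's cost argument to the number of MD5 rounds.
function sketch_effective_rounds( $cost ) {
    $itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

    // The constructor falls back to 8 for out-of-range values.
    if ( $cost < 4 || $cost > 31 ) {
        $cost = 8;
    }

    // On PHP 5+, gensalt_private() adds 5 and caps the result at 30.
    $log2 = min( $cost + 5, 30 );

    return array(
        'log2'   => $log2,            // exponent stored in the hash
        'char'   => $itoa64[ $log2 ], // character that encodes it
        'rounds' => 1 << $log2,       // loop count in crypt_private()
    );
}

$info = sketch_effective_rounds( 8 );
echo $info['char'], ' ', $info['rounds'], "\n"; // B 8192
```

So the 8 in new PasswordHash( 8, true ) ends up as the letter B in the stored hash and as 8192 loop rounds.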

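The unsung hero: the PasswordHash class

The PasswordHash class is the one doing the real work. It wraps the core password hashing logic. Let's look at its internals (again a simplified version that keeps only the core parts):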

class PasswordHash {
    var $itoa64;
    var $iteration_count_log2;
    var $portable_hashes;
    var $random_state;

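    # PHP 8 no longer treats a method named after the class as the
    # constructor, so it has to be called __construct().
    function __construct( $iteration_count_log2, $portable_hashes ) {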
        $this->itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

        if ( $iteration_count_log2 < 4 || $iteration_count_log2 > 31 )
            $iteration_count_log2 = 8;
        $this->iteration_count_log2 = $iteration_count_log2;

        $this->portable_hashes = $portable_hashes;

        $this->random_state = microtime();
        if (function_exists('getmypid'))
            $this->random_state .= getmypid();
    }

    function HashPassword( $password ) {
        $random = $this->get_random_bytes(16);
        $hash = $this->crypt_private( $password, $this->gensalt_private( $random ) );
        if ( strlen( $hash ) == 34 )
            return $hash;

        # Returning '*' means an error occurred
        return '*';
    }

    function gensalt_private( $input ) {
        $output = '$P$';
        $output .= $this->itoa64[min($this->iteration_count_log2 + ((PHP_VERSION >= '5' ) ? 5 : 3), 30)];
        $output .= $this->encode64( $input, 16 );
        return $output;
    }

    function crypt_private( $password, $setting ) {
        $output = '*0';
        if ( substr( $setting, 0, 3 ) != '$P$' )
            return $output;

        $count_log2 = strpos( $this->itoa64, $setting[3] );
        if ( $count_log2 < 7 || $count_log2 > 30 )
            return $output;

        $count = 1 << $count_log2;

        $salt = substr( $setting, 4, 8 );
        if ( strlen( $salt ) != 8 )
            return $output;

        # We're kind of forced to use MD5 here since it's the only
        # cryptographic primitive guaranteed to be available in all
        # PHP installations.
        if (PHP_VERSION >= '5') {
            $hash = md5($salt . $password, TRUE);
            do {
                $hash = md5($hash . $password, TRUE);
            } while (--$count);
        } else {
            $hash = pack('H*', md5($salt . $password));
            do {
                $hash = pack('H*', md5($hash . $password));
            } while (--$count);
        }

        $output = substr( $setting, 0, 12 );
        $output .= $this->encode64( $hash, 16 );

        return $output;
    }

    function encode64( $input, $count ) {
        $output = '';
        $i = 0;
        do {
            $value = ord($input[$i++]);
            $output .= $this->itoa64[$value & 0x3f];
            if ($i < $count)
                $value |= ord($input[$i]) << 8;
            $output .= $this->itoa64[($value >> 6) & 0x3f];
            if ($i++ >= $count)
                break;
            if ($i < $count)
                $value |= ord($input[$i]) << 16;
            $output .= $this->itoa64[($value >> 12) & 0x3f];
            if ($i++ >= $count)
                break;
            $output .= $this->itoa64[($value >> 18) & 0x3f];
        } while ($i < $count);

        return $output;
    }

    function get_random_bytes( $count ) {
        $output = '';
        if (is_readable('/dev/urandom') &&
            ($fh = @fopen('/dev/urandom', 'rb'))) {
            $output = fread($fh, $count);
            fclose($fh);
        }

        if (strlen($output) < $count) {
            $output = '';
            for ($i = 0; $i < $count; $i += 16) {
                $this->random_state = md5(microtime() . $this->random_state);
                $output .= pack('H*', md5($this->random_state));
            }
            $output = substr($output, 0, $count);
        }

        return $output;
    }

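    function CheckPassword( $password, $stored_hash ) {
        $hash = $this->crypt_private( $password, $stored_hash );
        # crypt_private() reports errors as '*0', so test the first
        # character; comparing the whole string to '*' never matches.
        if ( '*' === $hash[0] )
            return false;

        return $hash === $stored_hash;
    }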
}

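Let's take them one at a time:

$itoa64: a string of 64 distinct characters used to encode binary data as printable text. Think of it as a codebook in which each character stands for one 6-bit value from 0 to 63.
$iteration_count_log2: the base-2 logarithm of the iteration count, not the count itself. A value of 8, for example, stands for 2^8 = 256 iterations before the adjustment made in gensalt_private(). The constructor only accepts values from 4 to 31 and falls back to 8 otherwise. More iterations make hashing slower and brute forcing more expensive.
$portable_hashes: a boolean that says whether to use portable hashes.
$random_state: internal state used by the fallback random number generator.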

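The HashPassword() method

This method is the entry point for hashing a password.

$random = $this->get_random_bytes(16);: generates 16 random bytes as raw material for the salt. The salt makes identical passwords hash to different values, which defeats precomputed rainbow tables. Note that crypt_private() only reads 8 encoded salt characters, so only the first 6 of these bytes actually end up in the hash.
$hash = $this->crypt_private( $password, $this->gensalt_private( $random ) );: calls crypt_private() to hash the password with the generated salt. gensalt_private() builds the setting string, which holds the prefix, the iteration count, and the encoded salt.
if ( strlen( $hash ) == 34 ) return $hash;: checks that the result is 34 characters long and, if so, returns it.
return '*';: if something went wrong during hashing, returns '*'.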
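
The 34 characters have a fixed layout, which is easier to see when a hash is pulled apart. Below is a minimal sketch under a few assumptions: sketch_parse_hash() is a hypothetical helper, not a phpass method, and the sample string is made up to have the right shape rather than being a real password hash.

```php
<?php
// Hypothetical helper: splits a portable phpass hash into its parts.
function sketch_parse_hash( $stored ) {
    $itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

    if ( strlen( $stored ) !== 34 || substr( $stored, 0, 3 ) !== '$P$' ) {
        return null; // not a portable phpass hash
    }

    $count_log2 = strpos( $itoa64, $stored[3] );
    if ( false === $count_log2 ) {
        return null; // count character is outside the alphabet
    }

    return array(
        'id'         => substr( $stored, 0, 3 ), // scheme prefix
        'count_log2' => $count_log2,             // exponent of the round count
        'rounds'     => 1 << $count_log2,
        'salt'       => substr( $stored, 4, 8 ), // 8 encoded salt characters
        'digest'     => substr( $stored, 12 ),   // 22 encoded digest characters
    );
}

// Made-up sample with the right shape (single quotes keep $P literal).
$parts = sketch_parse_hash( '$P$B12345678abcdefghijklmnopqrstuv' );
echo $parts['rounds'], ' ', $parts['salt'], "\n"; // 8192 12345678
```

Everything needed to re-check a password later (scheme, round count, salt) travels inside the stored string itself.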

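The gensalt_private() method

This method builds the setting string that carries the salt.

$output = '$P$';: the string starts with the fixed prefix $P$, which identifies this hashing scheme.
$output .= $this->itoa64[min($this->iteration_count_log2 + ((PHP_VERSION >= '5' ) ? 5 : 3), 30)];: encodes the iteration count into the string as a single character. On PHP 5 and later, 5 is added to $iteration_count_log2, so WordPress's 8 becomes 13, which maps to the character B. min() caps the value at 30 so the iteration count cannot grow out of hand and hurt performance.
$output .= $this->encode64( $input, 16 );: encodes the random bytes with encode64() and appends them. This is a Base64-like encoding built on the $itoa64 alphabet, not standard Base64. Only the first 8 of the resulting characters are later read back as the salt.
return $output;: returns the finished setting string.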

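The crypt_private() method

This method is the heart of the hashing.

$output = '*0';: the value that is returned if hashing fails.
if ( substr( $setting, 0, 3 ) != '$P$' ) return $output;: checks that the setting string starts with $P$. If it does not, the format is wrong and the error value is returned.
$count_log2 = strpos( $this->itoa64, $setting[3] );: reads the iteration count (as a base-2 logarithm) back from the fourth character of the setting string.
if ( $count_log2 < 7 || $count_log2 > 30 ) return $output;: checks that the value is within the allowed range.
$count = 1 << $count_log2;: computes the actual number of iterations, 2 to the power of $count_log2.
$salt = substr( $setting, 4, 8 );: extracts the 8-character salt from the setting string.
if ( strlen( $salt ) != 8 ) return $output;: checks that the salt is 8 characters long.
if (PHP_VERSION >= '5') { ... } else { ... }: both branches use MD5; they only differ in how they obtain the raw 16-byte digest. On PHP 5 and later, md5() takes a second argument that makes it return raw binary directly. On older versions, the hex digest is converted to binary with pack('H*', md5(...)).
do { ... } while (--$count);: the iteration loop. After the initial MD5 of salt plus password, each round hashes the previous digest concatenated with the password, $count times in total.
$output = substr( $setting, 0, 12 );: the first 12 characters of the setting string ($P$, the iteration-count character, and the 8-character salt) become the prefix of the output.
$output .= $this->encode64( $hash, 16 );: encodes the 16-byte digest with encode64() into 22 characters and appends them, for 34 characters in total.
return $output;: returns the finished hash.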
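
The iteration loop is the part that makes each guess expensive, so it is worth seeing in isolation. Here is a minimal standalone sketch, assuming PHP 5 or later (raw md5() output). sketch_stretch() is a hypothetical name, and it returns the raw digest without the prefix or the final encoding step.

```php
<?php
// Hypothetical standalone version of the loop inside crypt_private().
function sketch_stretch( $password, $salt, $count_log2 ) {
    // The stored exponent is turned back into a round count.
    $count = 1 << $count_log2;

    // Round zero mixes the salt into the password.
    $hash = md5( $salt . $password, true );

    // Every further round feeds the previous raw digest back in,
    // together with the password again.
    do {
        $hash = md5( $hash . $password, true );
    } while ( --$count );

    return $hash; // 16 raw bytes, not yet encoded
}

$digest = sketch_stretch( 'secret', 'abcdefgh', 13 );
echo strlen( $digest ), "\n"; // 16
```

Because each round depends on the previous digest, the 8192 rounds cannot be skipped or run in parallel for a single guess.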

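The encode64() method

This method encodes binary data as a Base64-like string. It emits 6 bits at a time, lowest bits first, using the $itoa64 alphabet, so its output differs from standard Base64.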
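
A few tiny inputs make the bit order visible. The sketch below is a standalone function that follows the same steps as the method (sketch_encode64() is a hypothetical name, with the alphabet inlined so it runs on its own).

```php
<?php
// Hypothetical standalone encoder following the steps of encode64().
function sketch_encode64( $input, $count ) {
    $itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $output = '';
    $i = 0;
    do {
        // Pack up to three bytes into $value, least significant byte first.
        $value = ord( $input[ $i++ ] );
        $output .= $itoa64[ $value & 0x3f ];          // bits 0-5
        if ( $i < $count ) {
            $value |= ord( $input[ $i ] ) << 8;
        }
        $output .= $itoa64[ ( $value >> 6 ) & 0x3f ]; // bits 6-11
        if ( $i++ >= $count ) {
            break;
        }
        if ( $i < $count ) {
            $value |= ord( $input[ $i ] ) << 16;
        }
        $output .= $itoa64[ ( $value >> 12 ) & 0x3f ]; // bits 12-17
        if ( $i++ >= $count ) {
            break;
        }
        $output .= $itoa64[ ( $value >> 18 ) & 0x3f ]; // bits 18-23
    } while ( $i < $count );

    return $output;
}

echo sketch_encode64( "\x01\x00\x00", 3 ), "\n"; // /...
echo sketch_encode64( "\xff\xff\xff", 3 ), "\n"; // zzzz
```

Three input bytes always become four characters, which is why a 6-byte salt is 8 characters long and a 16-byte MD5 digest is 22.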

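The get_random_bytes() method

This method generates random bytes. It first tries to read them from /dev/urandom. If that fails, it falls back to pseudo-random bytes derived from md5() over microtime() and the internal $random_state, which is a much weaker source.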
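
On PHP 7 and later the built-in random_bytes() covers the same job, so the time-based fallback is not needed there. A minimal sketch, assuming PHP 7+; sketch_random_bytes() is a hypothetical wrapper and not something the class above actually contains.

```php
<?php
// Hypothetical replacement for get_random_bytes(), assuming PHP 7 or later.
function sketch_random_bytes( $count ) {
    // random_bytes() reads from the operating system's secure random
    // source and throws an exception rather than returning weak output.
    return random_bytes( $count );
}

$salt_bytes = sketch_random_bytes( 6 );
echo bin2hex( $salt_bytes ), "\n"; // 12 hex characters, different on every run
```

Failing loudly is the point of this design: a salt built from microtime() still works, but it is far easier to predict.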

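The CheckPassword() method

This method verifies a password. It hashes the password the user typed, using the stored hash as the setting string so that the same salt and iteration count are reused, and then compares the result with the hash stored in the database.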
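
The final comparison deserves a little care. Here is a minimal sketch of a stricter comparison step, assuming PHP 5.6 or later for hash_equals(). sketch_hashes_match() is a hypothetical helper; swapping in hash_equals() is my substitution for illustration, not what the class above does.

```php
<?php
// Hypothetical helper: compares a freshly computed hash with the stored one.
function sketch_hashes_match( $computed, $stored ) {
    // crypt_private() signals failure with a value starting with '*',
    // and a failure must never count as a match.
    if ( '' === $computed || '*' === $computed[0] ) {
        return false;
    }

    // hash_equals() compares in time that does not depend on where
    // the two strings first differ.
    return hash_equals( $stored, $computed );
}

var_dump( sketch_hashes_match( '*0', '*0' ) ); // bool(false)
var_dump( sketch_hashes_match( '$P$Bsample', '$P$Bsample' ) ); // bool(true)
```

The first call shows why the error value is checked before comparing: if the database somehow held '*0', a plain equality test would accept any password.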

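Choice of hash algorithm: why MD5?

You might ask: it's 2024, and WordPress is still using MD5? Wasn't MD5 broken long ago?

True, MD5 has known weaknesses, but WordPress does not simply run the user's password through MD5 once. It combines MD5 with salting and many iterations.

Salt: a random salt is generated for every password and hashed together with it. Even if two users pick the same password, their hashes differ, which makes cracking much harder.
Iteration: the hash function is applied over and over, each round taking the previous result as input. This raises the computational cost and makes brute forcing harder.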
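
Both effects are easy to show with plain md5(). In this small sketch the password and the two salts are made-up values, and the salts are fixed only so the result is repeatable; real salts are random.

```php
<?php
$password = 'hunter2'; // two users happen to pick the same password

// Without a salt, equal passwords give equal hashes.
$unsalted_alice = md5( $password );
$unsalted_bob   = md5( $password );

// With a per-user salt, the hashes no longer match.
$salted_alice = md5( 'saltAAAA' . $password );
$salted_bob   = md5( 'saltBBBB' . $password );

var_dump( $unsalted_alice === $unsalted_bob ); // bool(true)
var_dump( $salted_alice === $salted_bob );     // bool(false)

// Iteration: one initial md5() plus 2^13 loop rounds per guess.
$md5_calls_per_guess = 1 + ( 1 << 13 );
echo $md5_calls_per_guess, "\n"; // 8193
```

An attacker therefore has to attack each hash separately, and pays 8193 MD5 calls for every single guess.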

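MD5 itself has its problems, but with salting and iteration on top, the scheme still offers a reasonable level of protection. Of course, if WordPress moves to a stronger algorithm in the future (bcrypt or Argon2, say), so much the better.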
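
For comparison, this is roughly what bcrypt looks like with PHP's built-in password API. It is a sketch only, assuming PHP 5.5 or later, and it is not how the WordPress code in this walkthrough works; the password string and cost values are arbitrary examples.

```php
<?php
// bcrypt through PHP's built-in API; the salt is generated automatically.
$hash = password_hash( 'correct horse battery staple', PASSWORD_BCRYPT, array( 'cost' => 10 ) );

var_dump( password_verify( 'correct horse battery staple', $hash ) ); // bool(true)
var_dump( password_verify( 'wrong guess', $hash ) );                  // bool(false)

// Reports whether a stored hash was made with settings other than the
// requested ones, so it can be replaced at the user's next login.
var_dump( password_needs_rehash( $hash, PASSWORD_BCRYPT, array( 'cost' => 12 ) ) ); // bool(true)
```

As with the $P$ format, the algorithm, cost, and salt are all stored inside the hash string, so verification needs nothing but the hash and the candidate password.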

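Summary: the art of password hashing

wp_hash_password() and the PasswordHash class together form the core of WordPress password hashing. Through salting and iteration, they turn a user's password into gibberish that is hard to crack, which protects user accounts.

Salting: makes every password hash unique.
Iteration: raises the computational cost of cracking.
PasswordHash class: the implementation of the hashing scheme.

I hope today's talk gave you a deeper understanding of WordPress password hashing. Remember, keeping passwords safe is everyone's responsibility!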

Q&A

Now it's open question time. Ask whatever you like and I'll do my best to answer.

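Some thoughts (possible directions for future improvement)

Algorithm upgrade: consider moving to a stronger hashing algorithm such as bcrypt or Argon2.
Key management: improve how keys are managed so that they do not leak.
Stronger verification: add extra safeguards such as two-factor authentication.

That's all for today's session. Thanks for joining! Next time we'll dig into some deeper technical topics together. Bye!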
