Dissecting the source of `do_shortcode()`: how does it use regular expressions to parse shortcodes and call the matching handler functions?

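Good evening, everyone! Today we're going to talk about the unsung hero behind WordPress shortcodes: the do_shortcode() function. This fellow is a regular-expression expert whose job is to turn things like [my_shortcode] in your posts into real content.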

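1. Shortcodes: the magic wand that turns the ordinary into something special

First, what is a shortcode? Simply put, it is a convenience WordPress provides for inserting complex HTML or PHP-generated output into posts, pages, or widgets without editing theme files directly. Imagine you need an ad banner in every post. Without shortcodes you would have to copy and paste the markup N times. With shortcodes you define a tag such as [ad_banner] once, write it in the post, and WordPress replaces it with your ad code automatically. Handy, isn't it?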

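2. do_shortcode(): the commander of shortcodes

do_shortcode() is the function that performs these replacements. Its main tasks are:

- Find shortcodes in the text.
- Extract each shortcode's name and attributes.
- Call the handler function registered under that name.
- Replace the shortcode in the original text with the handler's return value.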
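
To make those four tasks concrete before we read the real source, here is a toy sketch in plain PHP. It assumes a deliberately simplified pattern (no closing tags, no escaping) and is not the WordPress implementation; `toy_do_shortcode()` and `$handlers` are hypothetical names invented for this illustration.

```php
<?php
// Hypothetical handler table: shortcode name => callable.
$handlers = [
    'ad_banner' => function (string $rawAttrs): string {
        return '<div class="ad">Ad goes here</div>';
    },
];

function toy_do_shortcode(string $content, array $handlers): string
{
    // Tasks 1 and 2: find "[name ...]" and capture the name and the raw attribute text.
    return preg_replace_callback(
        '/\[(\w+)([^\]]*)\]/',
        function (array $m) use ($handlers): string {
            [$whole, $name, $rawAttrs] = $m;
            if (!isset($handlers[$name])) {
                return $whole; // Unknown name: leave the text untouched.
            }
            // Tasks 3 and 4: call the handler; its return value replaces the match.
            return $handlers[$name](trim($rawAttrs));
        },
        $content
    );
}

echo toy_do_shortcode('Before [ad_banner] after [unknown].', $handlers), "\n";
// Before <div class="ad">Ad goes here</div> after [unknown].
```

Unlike this toy, the real function only matches names that have actually been registered, which is why its pattern has to be built at run time.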

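3. Source walkthrough: clearing away the fog

Let's dig into wp-includes/shortcodes.php and see what do_shortcode() really looks like. To keep things easy to follow, I have simplified the source and added comments.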

/**
 * Run shortcodes.
 *
 * @param string $content Text to process.
 * @return string Processed text.
 */
function do_shortcode( $content ) {
    global $shortcode_tags; // Global registry of every registered shortcode.

    if ( empty( $shortcode_tags ) || ! is_array( $shortcode_tags ) ) {
        return $content; // No shortcodes registered: return the text unchanged.
    }

    $pattern = get_shortcode_regex(); // Build the regex that matches shortcodes.

    // Match every shortcode and replace it with its handler's output.
    return preg_replace_callback( "/$pattern/s", 'do_shortcode_tag', $content );
}

/**
 * Build the regular expression used to match shortcodes.
 *
 * @return string The regex body, without delimiters.
 */
function get_shortcode_regex() {
    global $shortcode_tags;
    $tagnames = array_keys($shortcode_tags);
    $tagregexp = join( '|', array_map('preg_quote', $tagnames) );

    // WARNING! Do not change this regex without changing do_shortcode_tag() and strip_shortcodes()
    return
        '\['                              // Opening bracket
        . '(\[?)'                           // 1: Optional second opening bracket for escaping shortcodes: [[tag]]
        . "($tagregexp)"                     // 2: Shortcode name
        . '(?![\w-])'                       // Word boundary
        . '('                                // 3: Unroll the loop: Inside the opening shortcode tag
        .     '[^\]\/]*'                   // Not a closing bracket or forward slash
        .     '(?:'
        .         '\/(?!\])'               // A forward slash not followed by a closing bracket
        .         '[^\]\/]*'               // Not a closing bracket or forward slash
        .     ')*?'
        . ')'
        . '(?:'
        .     '(\/)'                        // 4: Self closing tag ...
        .     '\]'                          // ... and closing bracket
        . '|'
        .     '\]'                          // Closing bracket
        .     '(?:'
        .         '('                        // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags
        .             '[^\[]*+'             // Not an opening bracket
        .             '(?:'
        .                 '\[(?!\/\2\])' // An opening bracket not followed by the closing shortcode tag
        .                 '[^\[]*+'         // Not an opening bracket
        .             ')*+'
        .         ')'
        .         '\[\/\2\]'             // Closing shortcode tag
        .     ')?'
        . ')'
        . '(\]?)';                          // 6: Optional second closing bracket for escaping shortcodes: [[tag]]
}

/**
 * Process a single shortcode tag.
 *
 * @param array $m Regex match array from preg_replace_callback().
 * @return string Replacement text.
 */
function do_shortcode_tag( $m ) {
    global $shortcode_tags;

    // allow [[foo]] syntax for escaping a tag
    if ( $m[1] == '[' && $m[6] == ']' ) {
        return substr( $m[0], 1, -1 );
    }

    $tag = $m[2];
    $attr = shortcode_parse_atts( $m[3] );

    // $m[1] and $m[6] hold a stray extra bracket (as in "[[tag]"); keep it around the output.
    if ( empty( $m[5] ) ) {
        // self-closing or bare tag: no enclosed content
        return $m[1] . $shortcode_tags[$tag]( $attr, null, $tag ) . $m[6];
    } else {
        // enclosing tag: pass the content between the opening and closing tags
        return $m[1] . $shortcode_tags[$tag]( $attr, $m[5], $tag ) . $m[6];
    }
}

/**
 * Parse shortcode attributes.
 *
 * @param string $text Raw attribute string.
 * @return array|string Attribute array, or the left-trimmed string if nothing matched.
 */
function shortcode_parse_atts( $text ) {
    $atts = array();
    $pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
    $text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
    if ( preg_match_all( $pattern, $text, $match, PREG_SET_ORDER ) ) {
        foreach ( $match as $m ) {
            if ( ! empty( $m[1] ) ) {
                $atts[ strtolower( $m[1] ) ] = stripcslashes( $m[2] );
            } elseif ( ! empty( $m[3] ) ) {
                $atts[ strtolower( $m[3] ) ] = stripcslashes( $m[4] );
            } elseif ( ! empty( $m[5] ) ) {
                $atts[ strtolower( $m[5] ) ] = stripcslashes( $m[6] );
            } elseif ( ! empty( $m[7] ) ) {
                $atts[] = stripcslashes( $m[7] );
            } elseif ( ! empty( $m[8] ) ) {
                $atts[] = stripcslashes( $m[8] );
            }
        }
    } else {
        $atts = ltrim( $text );
    }
    return $atts;
}

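4. Code flow: one step at a time

Check whether any shortcodes are registered: do_shortcode() first checks the global variable $shortcode_tags, an associative array holding every registered shortcode. Each key is a shortcode name and each value is its handler. If $shortcode_tags is empty, no shortcodes are registered and the function returns the original text unchanged.
Build the regular expression: if shortcodes are registered, do_shortcode() calls get_shortcode_regex() to build the regular expression that matches them. The expression is complex, but at its core it matches the following forms:
- [shortcode]: a simple shortcode with no attributes.
- [shortcode attribute="value"]: a shortcode with attributes.
- [shortcode]content[/shortcode]: a shortcode with enclosed content.
- [shortcode attribute="value"]content[/shortcode]: a shortcode with both attributes and content.
- [[shortcode]]: an escaped shortcode, which is not parsed.
get_shortcode_regex() does two main things:
- It collects the names of all registered shortcodes.
- It combines those names into a regular expression that matches shortcodes in the post content.
This regular expression is the heart of the whole mechanism: it defines the syntax of shortcodes.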
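
Here is a small sketch of just the name-collecting step. The registry `$demo_tags` and its handler names are hypothetical placeholders, and `a.b` is a contrived tag name chosen only to show what preg_quote() does.

```php
<?php
// Hypothetical registry standing in for the global $shortcode_tags.
// Only the keys matter here; the handler names are placeholders.
$demo_tags = [
    'gallery'   => 'gallery_handler',
    'ad_banner' => 'ad_banner_handler',
    'a.b'       => 'dotted_handler', // contrived name, only to show escaping
];

// The same two steps as get_shortcode_regex(): collect the names, then
// join them with "|" after escaping regex metacharacters.
$tagnames  = array_keys($demo_tags);
$tagregexp = implode('|', array_map('preg_quote', $tagnames));

echo $tagregexp, "\n"; // gallery|ad_banner|a\.b

// In the full pattern this alternation is capture group 2. Here it is
// embedded in a much shorter pattern just to show a name being picked out.
preg_match('/\[(' . $tagregexp . ')\]/', 'Intro [ad_banner] outro', $m);
echo $m[1], "\n"; // ad_banner
```

Because the alternation is built from the registry, text such as [not_registered] never matches and is left in the post exactly as written.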
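Match and replace with the regular expression: do_shortcode() passes the expression built above to preg_replace_callback(), which searches the text for shortcodes and calls do_shortcode_tag() for each match.
Process a single shortcode tag: do_shortcode_tag() receives the match array from preg_replace_callback(). The array holds the individual parts of the shortcode, such as its name, attributes, and content.

The main tasks of do_shortcode_tag() are:
- Extract the shortcode name.
- Parse the shortcode's attributes with shortcode_parse_atts().
- Call the handler registered under that name, passing it the attributes and the content.
- Return the handler's return value as the replacement text for the shortcode.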
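
To see what the match array actually contains, the sketch below runs the pattern from get_shortcode_regex() over a sample string and prints every capture group. It assumes a single registered tag named `note`, which is made up for this example, and the callback returns each match unchanged so nothing is replaced.

```php
<?php
// Pattern from get_shortcode_regex() with one hypothetical tag, "note".
$pattern = '\[(\[?)(note)(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)'
    . '(?:(\/)\]|\](?:([^\[]*+(?:\[(?!\/\2\])[^\[]*+)*+)\[\/\2\])?)(\]?)';

$labels = [
    1 => 'extra opening bracket',
    2 => 'tag name',
    3 => 'raw attributes',
    4 => 'self-closing slash',
    5 => 'enclosed content',
    6 => 'extra closing bracket',
];

$seen = []; // every match array, kept for inspection
preg_replace_callback(
    "/$pattern/s",
    function (array $m) use ($labels, &$seen): string {
        $seen[] = $m;
        echo $m[0], "\n";
        foreach ($labels as $i => $label) {
            printf("  \$m[%d] %-22s %s\n", $i, $label, var_export($m[$i], true));
        }
        return $m[0]; // leave the text unchanged
    },
    '[note type="tip"]Hello[/note] and [note /]'
);
```

For the first tag, `$m[3]` holds the raw text ` type="tip"` and `$m[5]` holds `Hello`. For the second, `$m[4]` holds the slash and `$m[5]` is an empty string.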
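Parse the shortcode attributes: shortcode_parse_atts() is responsible for parsing the attributes. It handles the following forms:
- attribute="value"
- attribute='value'
- attribute=value
- "value"
- value
shortcode_parse_atts() matches the attributes with a regular expression. Named attributes are stored in an associative array under their lowercased names, and unnamed values are appended with numeric keys.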
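
The five forms are easier to appreciate with a concrete input. The following is a stand-alone sketch that mirrors the logic shown earlier, with the pattern split across lines so each alternative can be labeled. `demo_parse_atts()` is a hypothetical name, and the non-breaking-space normalization is left out for brevity.

```php
<?php
/**
 * Stand-in for shortcode_parse_atts(), using the same five alternatives.
 */
function demo_parse_atts(string $text)
{
    $pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)'    // name="value"
        . '|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)'      // name='value'
        . '|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)'       // name=value
        . '|"([^"]*)"(?:\s|$)'                     // "value"
        . '|(\S+)(?:\s|$)/';                       // value

    if (!preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        return ltrim($text); // nothing matched: return the raw string
    }

    $atts = [];
    foreach ($matches as $m) {
        if (!empty($m[1])) {
            $atts[strtolower($m[1])] = stripcslashes($m[2]);
        } elseif (!empty($m[3])) {
            $atts[strtolower($m[3])] = stripcslashes($m[4]);
        } elseif (!empty($m[5])) {
            $atts[strtolower($m[5])] = stripcslashes($m[6]);
        } elseif (!empty($m[7])) {
            $atts[] = stripcslashes($m[7]); // unnamed, quoted
        } elseif (!empty($m[8])) {
            $atts[] = stripcslashes($m[8]); // unnamed, bare
        }
    }
    return $atts;
}

$atts = demo_parse_atts(' Name="John" size=\'10\' color=red "quoted value" bare');
var_export($atts);
```

The result has the keys `name`, `size`, and `color` (note that `Name` was lowercased), followed by `quoted value` and `bare` under the numeric keys 0 and 1.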

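5. The regular expression: the soul of shortcodes

The expression returned by get_shortcode_regex() is the soul of the shortcode mechanism, and understanding it is essential to understanding how shortcodes work. Let's take it apart: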

'\['                              // Opening bracket
. '(\[?)'                           // 1: Optional second opening bracket for escaping shortcodes: [[tag]]
. "($tagregexp)"                     // 2: Shortcode name
. '(?![\w-])'                       // Word boundary
. '('                                // 3: Unroll the loop: Inside the opening shortcode tag
.     '[^\]\/]*'                   // Not a closing bracket or forward slash
.     '(?:'
.         '\/(?!\])'               // A forward slash not followed by a closing bracket
.         '[^\]\/]*'               // Not a closing bracket or forward slash
.     ')*?'
. ')'
. '(?:'
.     '(\/)'                        // 4: Self closing tag ...
.     '\]'                          // ... and closing bracket
. '|'
.     '\]'                          // Closing bracket
.     '(?:'
.         '('                        // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags
.             '[^\[]*+'             // Not an opening bracket
.             '(?:'
.                 '\[(?!\/\2\])' // An opening bracket not followed by the closing shortcode tag
.                 '[^\[]*+'         // Not an opening bracket
.             ')*+'
.         ')'
.         '\[\/\2\]'             // Closing shortcode tag
.     ')?'
. ')'
. '(\]?)';                          // 6: Optional second closing bracket for escaping shortcodes: [[tag]]

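No.	Regex fragment	Description
1	`\[`	Matches an opening square bracket `[`.
2	`(\[?)`	Capture group 1: an optional second opening bracket `[`, used to escape a shortcode, as in `[[shortcode]]`.
3	`($tagregexp)`	Capture group 2: the shortcode name. `$tagregexp` is generated at run time as an alternation of every registered shortcode name, for example `my_shortcode1|my_shortcode2`.
4	`(?![\w-])`	Negative lookahead: the name must not be followed by a letter, digit, underscore, or hyphen. This prevents a registered `shortcode` from matching the start of a longer name such as `[shortcode-text]`.
5	`(`…`)`	Capture group 3: the raw attribute text inside the opening tag. It is written as an unrolled loop with a lazy quantifier (`*?`) and accepts a `/` only when that slash is not immediately followed by `]`.
6	`(?:(\/)\]|\]`…`)`	Ends the opening tag in one of two ways: a self-closing `/]` (the slash is capture group 4) or a plain `]`.
7	`(?:`…`)?`	Optional: the enclosed content (capture group 5) followed by the closing tag `[/name]`, where the back-reference `\2` requires the same name as the opening tag. The possessive quantifiers (`*+`) stop the engine from backtracking into text it has already consumed.
8	`(\]?)`	Capture group 6: an optional second closing bracket `]`, used to escape a shortcode, as in `[[shortcode]]`.

The expression is carefully designed: it covers every shortcode form listed earlier, and its unrolled loops and possessive quantifiers keep matching both accurate and efficient.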
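
Two of these pieces, the negative lookahead after the name and the escape brackets, are easiest to understand by trying them. The sketch below assumes a single registered tag called `gallery` (a made-up choice for this example) and tests a few inputs against the full pattern.

```php
<?php
// Full pattern with one hypothetical registered tag, "gallery".
$pattern = '/\[(\[?)(gallery)(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)'
    . '(?:(\/)\]|\](?:([^\[]*+(?:\[(?!\/\2\])[^\[]*+)*+)\[\/\2\])?)(\]?)/s';

$samples = [
    '[gallery]',            // plain tag
    '[gallery-old]',        // rejected by the lookahead (hyphen after the name)
    '[gallery_v2]',         // rejected by the lookahead (word character after the name)
    '[gallery ids="1,2"/]', // self-closing form
    '[[gallery]]',          // escaped form
];

foreach ($samples as $text) {
    if (preg_match($pattern, $text, $m)) {
        // Both extra brackets present means do_shortcode_tag() would only
        // strip the outer pair instead of calling the handler.
        $escaped = ($m[1] === '[' && $m[6] === ']') ? ' (escaped, left as text)' : '';
        echo "$text: match$escaped\n";
    } else {
        echo "$text: no match\n";
    }
}
```

The second and third samples print "no match" because of the lookahead, and the last one matches with both escape groups filled in.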

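6. Registering a shortcode: bringing your shortcode on stage

For do_shortcode() to recognize your shortcode, you first have to register it. WordPress provides the add_shortcode() function for this.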

/**
 * Register a shortcode.
 *
 * @param string   $tag      Shortcode name.
 * @param callable $callback Handler function.
 */
function add_shortcode( $tag, $callback ) {
    global $shortcode_tags;

    if ( is_callable( $callback ) ) {
        $shortcode_tags[$tag] = $callback;
    }
}

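add_shortcode() takes two parameters:

- $tag: the name of the shortcode.
- $callback: the handler, which can be any PHP callable (for example, a function name).

For example, to register a shortcode named my_shortcode, you can use the following code: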

function my_shortcode_handler( $atts, $content = null ) {
    $atts = shortcode_atts( array(
        'name' => 'World',
    ), $atts );

    $output = 'Hello, ' . esc_html( $atts['name'] ) . '!';

    if ( ! is_null( $content ) ) {
        $output .= ' Content: ' . esc_html( $content );
    }

    return $output;
}

add_shortcode( 'my_shortcode', 'my_shortcode_handler' );

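In this example, my_shortcode_handler() is the handler for the my_shortcode shortcode. It declares two parameters (the shortcode name is also passed as a third argument, which this handler ignores):

- $atts: an associative array of the shortcode's attributes.
- $content: the enclosed content, or null when there is none.

shortcode_atts() fills in default values and merges in the attributes supplied by the user.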
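
The merging behavior can be sketched in a few lines. `demo_shortcode_atts()` below is a hypothetical stand-in rather than the WordPress function itself: it keeps only the keys listed in the defaults, so an attribute the handler never declared (here `color`) is silently dropped.

```php
<?php
/**
 * Stand-in for shortcode_atts(): keep only the keys listed in $defaults,
 * taking the user's value when present and the default otherwise.
 */
function demo_shortcode_atts(array $defaults, $atts): array
{
    $atts = (array) $atts; // a shortcode without attributes may pass a string
    $out  = [];
    foreach ($defaults as $name => $default) {
        $out[$name] = array_key_exists($name, $atts) ? $atts[$name] : $default;
    }
    return $out;
}

$merged = demo_shortcode_atts(
    ['name' => 'World', 'greeting' => 'Hello'],
    ['name' => 'John', 'color' => 'red'] // "color" is not a declared attribute
);
var_export($merged); // name => John, greeting => Hello
```

Listing every supported attribute in the defaults array therefore doubles as a whitelist for the handler.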

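7. In practice: putting shortcodes to work

Once a shortcode is registered, you can use it in posts, pages, or widgets. For example, insert the following into a post: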

[my_shortcode name="John"]

WordPress automatically replaces it with Hello, John!.

If you insert the following instead:

[my_shortcode name="John"]This is the content[/my_shortcode]

WordPress automatically replaces it with Hello, John! Content: This is the content.
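
If you would like to reproduce these two results without a WordPress installation, the following self-contained sketch strings the pieces together. Everything named `mini_*` is a hypothetical stand-in written for this illustration: the registry is passed as a plain array instead of a global, attribute parsing is reduced to name="value" pairs, and htmlspecialchars() stands in for esc_html().

```php
<?php
// Build the shortcode pattern for the given registry (name => callable).
function mini_regex(array $tags): string
{
    $names = implode('|', array_map('preg_quote', array_keys($tags)));
    return '/\[(\[?)(' . $names . ')(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)'
        . '(?:(\/)\]|\](?:([^\[]*+(?:\[(?!\/\2\])[^\[]*+)*+)\[\/\2\])?)(\]?)/s';
}

// Simplification: only name="value" pairs are recognized here.
function mini_parse_atts(string $text): array
{
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $text, $matches, PREG_SET_ORDER);
    $atts = [];
    foreach ($matches as $m) {
        $atts[strtolower($m[1])] = $m[2];
    }
    return $atts;
}

function mini_do_shortcode(string $content, array $tags): string
{
    if (empty($tags)) {
        return $content; // nothing registered
    }
    return preg_replace_callback(
        mini_regex($tags),
        function (array $m) use ($tags): string {
            if ($m[1] === '[' && $m[6] === ']') {
                return substr($m[0], 1, -1); // [[tag]] escape
            }
            $inner = ($m[5] === '') ? null : $m[5];
            return $m[1] . $tags[$m[2]](mini_parse_atts($m[3]), $inner, $m[2]) . $m[6];
        },
        $content
    );
}

$tags = [
    'my_shortcode' => function (array $atts, ?string $content, string $tag): string {
        $atts += ['name' => 'World']; // default value, as shortcode_atts() would supply
        $output = 'Hello, ' . htmlspecialchars($atts['name']) . '!';
        if ($content !== null) {
            $output .= ' Content: ' . htmlspecialchars($content);
        }
        return $output;
    },
];

echo mini_do_shortcode('[my_shortcode name="John"]', $tags), "\n";
// Hello, John!
echo mini_do_shortcode('[my_shortcode name="John"]This is the content[/my_shortcode]', $tags), "\n";
// Hello, John! Content: This is the content
echo mini_do_shortcode('[[my_shortcode]]', $tags), "\n";
// [my_shortcode]
```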

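8. Summary: the secret of shortcodes

do_shortcode() is the core of the WordPress shortcode mechanism. It uses a regular expression to match and replace shortcodes and calls the corresponding handler to generate the actual content. Understanding how do_shortcode() works helps you use and extend shortcodes more effectively, making your WordPress site more powerful and flexible.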

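9. Optimization tips

Powerful as do_shortcode() is, on a high-traffic site the repeated regex matching and handler calls can affect performance. Here are some suggestions:

- Cache shortcode output: for static content, cache the rendered result so that it is not regenerated on every request.
- Limit where shortcodes are used: use them only where they are needed rather than enabling shortcode processing across the whole site.
- Optimize the handlers: keep handler code efficient and avoid unnecessary computation.
- Avoid deeply nested shortcodes: deep nesting can hurt performance, so avoid it where you can.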
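
As an illustration of the first tip, here is a minimal in-memory memoization sketch. It assumes the rendered output depends only on the input text. `cached_render()`, `$render`, and the array cache are hypothetical; on a real WordPress site one might reach for a persistent store such as the Transients API instead of a plain array.

```php
<?php
/**
 * Return the rendered text for $content, calling $render only on a cache miss.
 */
function cached_render(string $content, callable $render, array &$cache): string
{
    $key = md5($content); // identical input text maps to the same key
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $render($content);
    }
    return $cache[$key];
}

// Stand-in for an expensive do_shortcode() call; counts how often it runs.
$calls  = 0;
$render = function (string $content) use (&$calls): string {
    $calls++;
    return str_replace('[ad_banner]', '<div class="ad">Ad</div>', $content);
};

$cache  = [];
$first  = cached_render('Top [ad_banner]', $render, $cache);
$second = cached_render('Top [ad_banner]', $render, $cache);

echo $first, "\n";              // Top <div class="ad">Ad</div>
echo "render calls: $calls\n";  // render calls: 1
```

This only pays off when the handler's output is stable; a shortcode that shows the current user or the time of day should not be cached this way.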

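I hope today's talk has given you a deeper understanding of WordPress shortcodes. Remember: you can only make the most of a tool once you understand how it works. Thank you, everyone!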
