Root-Cause Analysis of TraceId Loss Between Spring Cloud Microservices and Best Practices for Log Tracing (智猿学院 technical lecture)

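Hello, everyone! Today we look at why the TraceId gets lost when tracing logs across Spring Cloud microservices, and at how to build a robust log-tracing setup. In a microservice architecture a single request usually passes through several services. Once the TraceId is lost, the call chain can no longer be followed end to end, and troubleshooting becomes much harder.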

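1. Common root causes of TraceId loss

TraceId loss has many possible causes, but most of them fall into the following categories:

Missing thread-context propagation: asynchronous calls or misused thread pools mean the TraceId is not handed from one thread to the next.
Missing HTTP header propagation: a service-to-service HTTP call goes out without the trace headers.
Middleware misconfiguration: a load balancer, message queue, or similar component is configured in a way that drops the TraceId.
Inconsistent logging configuration: services configure their logging frameworks differently, so the TraceId never reaches the log line.
Code logic errors: in some special cases application code overwrites or clears the TraceId by mistake.

The following sections analyze each case and give a corresponding fix.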

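2. Fixing missing thread-context propagation

Asynchronous calls and thread pools are everywhere in a microservice architecture. If the TraceId is not handed over correctly when work moves to another thread, part of the call chain loses its TraceId.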

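2.1 Passing the TraceId with a hand-rolled ThreadLocal (not recommended)

A common approach is to store the TraceId in a ThreadLocal of your own. It is simple, but it has several problems:

Memory leaks: if the TraceId is never removed from the ThreadLocal, the value stays attached to long-lived threads and can leak memory.
Thread-pool reuse: pooled threads are reused, so a TraceId left behind by one task can show up in the next, unrelated task.
Invasive code: every place that needs the TraceId has to set and clear the ThreadLocal explicitly.

A ThreadLocal by itself also does nothing to carry the value to another thread. For these reasons, managing the TraceId through your own ThreadLocal is strongly discouraged.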
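The thread-pool reuse problem is easy to reproduce with the JDK alone. The following is a minimal sketch, not production code: the class name StaleThreadLocalDemo, the TRACE_ID holder, and the value "trace-A" are all made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StaleThreadLocalDemo {

    // Hypothetical hand-rolled holder, standing in for "the TraceId in a ThreadLocal"
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Runs two tasks on the same pooled thread and returns what the second one sees. */
    static String traceIdSeenBySecondTask(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Task 1 belongs to request A and sets its TraceId
            Runnable requestA = () -> {
                TRACE_ID.set("trace-A");
                if (cleanUp) {
                    TRACE_ID.remove(); // the step that is easy to forget
                }
            };
            pool.submit(requestA).get();

            // Task 2 belongs to an unrelated request and never sets a TraceId
            Callable<String> unrelated = TRACE_ID::get;
            return pool.submit(unrelated).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without cleanup: " + traceIdSeenBySecondTask(false)); // trace-A
        System.out.println("with cleanup:    " + traceIdSeenBySecondTask(true));  // null
    }
}
```

Without the remove() call, the second task logs under a TraceId that belongs to a different request, which is often worse than having no TraceId at all.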

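2.2 Using InheritableThreadLocal (not recommended)

InheritableThreadLocal is a variant of ThreadLocal that copies the parent thread's value into a child thread when the child is created. It shares the problems of ThreadLocal and works poorly with thread pools: pooled threads are created once and then reused, so the copy happens only at creation time and later tasks see a stale value or none at all.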
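The copy-at-creation behavior can be seen in a few lines. This is a sketch under simple assumptions: a pool with a single worker, and the hypothetical names InheritablePoolDemo, TRACE_ID, "trace-A", and "trace-B".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InheritablePoolDemo {

    static final InheritableThreadLocal<String> TRACE_ID = new InheritableThreadLocal<>();

    /** Returns the TraceId a pooled worker sees while the caller is handling request B. */
    static String seenWhileHandlingRequestB() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Callable<String> readTraceId = TRACE_ID::get;
        try {
            TRACE_ID.set("trace-A");
            // The first submit creates the worker thread, which copies "trace-A"
            pool.submit(readTraceId).get();

            // The caller moves on to request B
            TRACE_ID.set("trace-B");
            // The worker is reused, so it still holds the copy made at creation time
            return pool.submit(readTraceId).get();
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(seenWhileHandlingRequestB()); // trace-A, not trace-B
    }
}
```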

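2.3 Using the Span mechanism of Spring Cloud Sleuth

Spring Cloud Sleuth offers a cleaner solution. It manages the TraceId through Spans and, for the components it instruments (for example @Async methods and executors registered as Spring beans), carries the trace context across threads automatically.

Sample code:

First, make sure the project includes the Spring Cloud Sleuth dependency: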

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

Then, where asynchronous work is done, use the Tracer to create a Span:

import brave.Span;
import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class AsyncService {

    @Autowired
    private Tracer tracer;

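    @Async
    public void doSomethingAsync() {
        // nextSpan() creates a child of the current span, so it keeps the same TraceId
        Span newSpan = tracer.nextSpan().name("asyncTask").start();
        // withSpanInScope takes the Span itself and binds it to the current thread
        try (Tracer.SpanInScope ws = tracer.withSpanInScope(newSpan)) {
            // Run the asynchronous task here
            System.out.println("Executing asynchronous task...");
            // Simulate a slow operation
            Thread.sleep(100);
        } catch (InterruptedException e) {
            newSpan.error(e);                   // record the failure on the span
            Thread.currentThread().interrupt(); // restore the interrupt flag
        } finally {
            newSpan.finish(); // finish the span so that it gets reported
        }
    }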
}

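In this example, tracer.nextSpan() creates a new Span that is a child of the current one and therefore inherits the current TraceId. Tracer.SpanInScope keeps that Span bound to the thread context for the whole asynchronous task. The try-with-resources statement closes the scope, and the finally block finishes the Span so that it gets reported. Because Sleuth already instruments @Async methods, the explicit Span here mainly serves to give the task its own named Span.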
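To make the idea of carrying the trace context across threads concrete, here is a plain-JDK sketch of the hand-off: capture the TraceId on the submitting thread, install it on the worker for the duration of the task, then restore the worker's previous state. It is only an illustration of the technique, not Sleuth's actual implementation; TraceContextHandOff, CURRENT_TRACE_ID, and wrap are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TraceContextHandOff {

    // Hypothetical stand-in for the tracer's "current trace" slot
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's TraceId at wrap time and installs it around the task. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = CURRENT_TRACE_ID.get(); // runs on the submitting thread
        return () -> {
            String previous = CURRENT_TRACE_ID.get();   // runs on the worker thread
            CURRENT_TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker's previous state so nothing leaks into the next task
                if (previous == null) {
                    CURRENT_TRACE_ID.remove();
                } else {
                    CURRENT_TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Returns what a wrapped task and a plain task see on the same pooled thread. */
    static String[] demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> readTraceId = CURRENT_TRACE_ID::get;
        try {
            CURRENT_TRACE_ID.set("trace-A");
            String wrapped = pool.submit(wrap(readTraceId)).get();
            String plain = pool.submit(readTraceId).get();
            return new String[] {wrapped, plain};
        } finally {
            CURRENT_TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] seen = demo();
        System.out.println("wrapped task: " + seen[0]); // trace-A
        System.out.println("plain task:   " + seen[1]); // null
    }
}
```

The design point is that capture happens at submission time, not at thread creation time, which is exactly what InheritableThreadLocal cannot offer in a pool.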

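2.4 Using Spring Cloud Stream messaging

If asynchronous messages go through Spring Cloud Stream, Sleuth adds the TraceId to the message headers automatically and restores it on the consumer side. You only need to make sure the message channels are configured correctly; Sleuth takes care of propagating the TraceId.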
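The inject-on-send, extract-on-receive pattern behind this can be sketched without any messaging library. Everything here is hypothetical and for illustration only: the Message type, the choice of X-B3-TraceId as header name, and the in-memory "delivery".

```java
import java.util.HashMap;
import java.util.Map;

class MessageHeaderPropagation {

    static final String TRACE_HEADER = "X-B3-TraceId"; // assumed header name
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    /** Hypothetical message type: a payload plus string headers. */
    static final class Message {
        final String payload;
        final Map<String, String> headers = new HashMap<>();

        Message(String payload) {
            this.payload = payload;
        }
    }

    /** Producer side: copy the caller's TraceId into the outgoing headers. */
    static Message send(String payload) {
        Message message = new Message(payload);
        String traceId = CURRENT_TRACE_ID.get();
        if (traceId != null) {
            message.headers.put(TRACE_HEADER, traceId);
        }
        return message;
    }

    /** Consumer side: restore the TraceId for the duration of the handler. */
    static String handle(Message message) {
        CURRENT_TRACE_ID.set(message.headers.get(TRACE_HEADER));
        try {
            return "[" + CURRENT_TRACE_ID.get() + "] processed " + message.payload;
        } finally {
            CURRENT_TRACE_ID.remove();
        }
    }

    static String demo() {
        CURRENT_TRACE_ID.set("trace-A");
        Message message = send("order-42");
        CURRENT_TRACE_ID.remove(); // the producer's request is over
        return handle(message);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [trace-A] processed order-42
    }
}
```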

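3. Fixing missing HTTP header propagation

When services call each other over HTTP, the trace headers must travel with the request, whether an instrumented client adds them or your own code does. Otherwise the downstream service cannot obtain the TraceId and the call chain breaks.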

3.1 Using Spring Cloud OpenFeign

If you call services through Spring Cloud OpenFeign, Sleuth adds the trace headers to the HTTP request automatically. As long as the Feign client is configured correctly, no extra propagation code is needed.

Sample code:

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;

@FeignClient(name = "downstream-service")
public interface DownstreamServiceClient {

    @GetMapping("/api/data")
    String getData();
}

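3.2 Using RestTemplate or WebClient

Sleuth also instruments RestTemplate and WebClient automatically, but only when they are registered as Spring beans. An instance created with new outside the container is not instrumented, and neither is a client in a service that does not use Sleuth. In those cases you have to add the trace headers to the HTTP request yourself.

Sample code (RestTemplate):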

import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class RestTemplateService {

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private Tracer tracer;

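    public String callDownstreamService() {
        HttpHeaders headers = new HttpHeaders();

        // currentSpan() returns null when no trace is in progress
        if (tracer.currentSpan() != null) {
            // B3 propagation expects the span ID alongside the trace ID;
            // use other header names if your tracing system requires them
            headers.set("X-B3-TraceId", tracer.currentSpan().context().traceIdString());
            headers.set("X-B3-SpanId", tracer.currentSpan().context().spanIdString());
        }

        HttpEntity<String> entity = new HttpEntity<>(headers);

        return restTemplate.exchange("http://downstream-service/api/data", HttpMethod.GET, entity, String.class).getBody();
    }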
}

Sample code (WebClient):

import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class WebClientService {

    @Autowired
    private WebClient webClient;

    @Autowired
    private Tracer tracer;

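    public Mono<String> callDownstreamService() {
        WebClient.RequestHeadersSpec<?> request = webClient.get()
                .uri("http://downstream-service/api/data");

        // currentSpan() returns null when no trace is in progress
        if (tracer.currentSpan() != null) {
            // B3 propagation expects the span ID alongside the trace ID;
            // use other header names if your tracing system requires them
            request = request
                    .header("X-B3-TraceId", tracer.currentSpan().context().traceIdString())
                    .header("X-B3-SpanId", tracer.currentSpan().context().spanIdString());
        }

        return request.retrieve().bodyToMono(String.class);
    }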
}

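In these examples we first read the current TraceId from the Tracer and then add it to the HTTP headers. Use the header names of the propagation format your tracing system expects: B3 (used by Zipkin, and by Sleuth by default) uses X-B3-TraceId and X-B3-SpanId or the single b3 header, W3C Trace Context uses traceparent, and Jaeger's native format uses uber-trace-id.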
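Because the header layout differs between formats, a receiving side that has to cope with more than one of them needs a small amount of parsing. The sketch below pulls the trace ID out of the multi-header B3 form, the single b3 header ({TraceId}-{SpanId}-{Sampled}-{ParentSpanId}), and the W3C traceparent header ({version}-{trace-id}-{parent-id}-{flags}). It is a simplified illustration with a hypothetical class name and no validation of the hex values; real tracers do considerably more.

```java
import java.util.Map;
import java.util.Optional;

class TraceHeaderParser {

    /** Extracts the trace ID from whichever supported header is present. */
    static Optional<String> extractTraceId(Map<String, String> headers) {
        // Multi-header B3: the trace ID has a header of its own
        String multi = headers.get("X-B3-TraceId");
        if (multi != null) {
            return Optional.of(multi);
        }
        // Single-header B3: the trace ID is the first dash-separated field
        String single = headers.get("b3");
        if (single != null && single.contains("-")) {
            return Optional.of(single.substring(0, single.indexOf('-')));
        }
        // W3C Trace Context: the trace ID is the second of four fields
        String w3c = headers.get("traceparent");
        if (w3c != null) {
            String[] parts = w3c.split("-");
            if (parts.length == 4) {
                return Optional.of(parts[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTraceId(
                Map.of("b3", "80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1")));
        System.out.println(extractTraceId(
                Map.of("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")));
        System.out.println(extractTraceId(Map.of("Accept", "*/*")));
    }
}
```

Note that HTTP header names are case-insensitive, so a real implementation would look the names up through the framework's header API rather than a plain Map.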

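3.3 Handling it in one place with an interceptor

To avoid adding the trace headers by hand in every HTTP call, you can do it once in an interceptor. This is mainly useful for clients that Sleuth does not instrument.

Sample code (RestTemplate interceptor):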

import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.stereotype.Component;

import java.io.IOException;

@Component
public class RestTemplateTraceIdInterceptor implements ClientHttpRequestInterceptor {

    @Autowired
    private Tracer tracer;

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {
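        if (tracer.currentSpan() != null) {
            String traceId = tracer.currentSpan().context().traceIdString();
            String spanId = tracer.currentSpan().context().spanIdString();
            // B3 header names; use other names if your tracing system requires them
            request.getHeaders().set("X-B3-TraceId", traceId);
            request.getHeaders().set("X-B3-SpanId", spanId);
        }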
        return execution.execute(request, body);
    }
}

Then register the interceptor on the RestTemplate:

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

import java.util.Collections;

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder, RestTemplateTraceIdInterceptor interceptor) {
        return builder.interceptors(Collections.singletonList(interceptor)).build();
    }
}

Sample code (WebClient filter):

import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.ClientResponse;
import org.springframework.web.reactive.function.client.ExchangeFilterFunction;
import reactor.core.publisher.Mono;

@Component
public class WebClientTraceIdFilter {

    @Autowired
    private Tracer tracer;

    public ExchangeFilterFunction traceIdFilter() {
        return ExchangeFilterFunction.ofRequestProcessor(clientRequest -> {
            if (tracer.currentSpan() != null) {
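                String traceId = tracer.currentSpan().context().traceIdString();
                String spanId = tracer.currentSpan().context().spanIdString();
                // ClientRequest is immutable, so build a copy that carries the B3 headers
                ClientRequest request = ClientRequest.from(clientRequest)
                        .header("X-B3-TraceId", traceId) // or the header name your tracing system uses
                        .header("X-B3-SpanId", spanId)
                        .build();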
                return Mono.just(request);
            }
            return Mono.just(clientRequest);
        });
    }
}

Then register the filter on the WebClient:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient(WebClientTraceIdFilter webClientTraceIdFilter) {
        return WebClient.builder()
                .filter(webClientTraceIdFilter.traceIdFilter())
                .build();
    }
}

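4. Fixing middleware misconfiguration

Some middleware, such as load balancers and message queues, can drop the TraceId when it is not configured correctly.

4.1 Load balancers

Make sure the load balancer forwards all required HTTP headers, including the trace headers. Configuration differs between load balancers, so consult the relevant documentation. With Nginx, for example, request headers are passed to the upstream by default, but header names containing underscores are dropped unless underscores_in_headers is turned on, and proxy_set_header can be used to set or override a header explicitly.

4.2 Message queues

If services communicate through a message queue, make sure the messages carry all required headers, including the trace headers. With RabbitMQ, for example, the header has to be set on the message's messageProperties. If you use Spring Cloud Stream, Sleuth handles header propagation through the message queue automatically.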

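5. Fixing inconsistent logging configuration

When services configure their logging frameworks differently, the TraceId may never make it into the log output.

5.1 Use a uniform log format

Make sure all services use the same log format and that the format includes the TraceId.

Sample code (logback.xml):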

<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - [%X{traceId:-},%X{spanId:-}] %msg%n</pattern>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>

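In this example, %X{traceId:-} and %X{spanId:-} read the TraceId and SpanId from the MDC. The :- operator introduces a default value. Since nothing follows it here, a missing key is printed as an empty string, so a log line without a trace shows [,].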

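5.2 Using the MDC (Mapped Diagnostic Context)

The MDC is a mechanism for storing and reading per-thread context information in a multi-threaded environment. Spring Cloud Sleuth puts the TraceId and SpanId into the MDC automatically. You only need to reference those keys in the log pattern for them to appear in every log line.

Sample code: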

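import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class LogService {

    private static final Logger log = LoggerFactory.getLogger(LogService.class);

    public void logMessage(String message) {
        // Sleuth normally puts traceId and spanId into the MDC automatically.
        // Calling MDC.put("traceId", ...) by hand is not recommended unless really needed.

        // Log through SLF4J: System.out.println bypasses the logging framework,
        // so the MDC values referenced in the log pattern would never be printed.
        log.info("Logging message: {}", message);
    }
}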

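6. Fixing code logic errors

In some special cases application code overwrites or clears the TraceId by mistake.

6.1 Review the code carefully

Review the code carefully, especially around thread switches, asynchronous calls, and HTTP calls, and make sure nothing overwrites or clears the TraceId.

6.2 Use unit tests

Write unit tests that simulate the relevant scenarios and verify that the TraceId is propagated correctly.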
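One way to structure such a test is to start a stub downstream server, call it, and assert on the header it actually received. The sketch below uses only the JDK (com.sun.net.httpserver.HttpServer and java.net.http.HttpClient, Java 11 or later) so that it runs on its own. In a real project the request would be sent by the application's own RestTemplate, WebClient, or Feign client pointed at the stub. The class name, the /api/data path, and the sample ID are assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class TraceHeaderPropagationCheck {

    /** Sends one request carrying traceId and returns the value the stub server received. */
    static String roundTrip(String traceId) throws Exception {
        // Port 0 lets the operating system pick a free port
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // The stub downstream echoes the trace header it received back in the body
        server.createContext("/api/data", exchange -> {
            String received = exchange.getRequestHeaders().getFirst("X-B3-TraceId");
            byte[] body = String.valueOf(received).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/api/data");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("X-B3-TraceId", traceId)
                    .GET()
                    .build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String sent = "463ac35c9f6413ad";
        String received = roundTrip(sent);
        if (!sent.equals(received)) {
            throw new AssertionError("TraceId lost in transit: " + received);
        }
        System.out.println("TraceId propagated: " + received);
    }
}
```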

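7. Best practices for log tracing

Beyond fixing TraceId loss, we also need a robust log-tracing setup that makes troubleshooting easier.

7.1 Use one TraceId generation strategy

To keep TraceIds unique, use the same generation strategy everywhere. By default Spring Cloud Sleuth generates a random 64-bit TraceId (128-bit IDs can be enabled with spring.sleuth.trace-id128=true), and you can customize the strategy to suit your needs.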
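For services that have to mint a TraceId themselves (for instance an edge component that does not use Sleuth), a generator compatible with the usual lower-hex encoding is short. This is a sketch of one possible approach, not Sleuth's generator; the class and method names are made up, and the zero check reflects the fact that W3C Trace Context treats an all-zero ID as invalid.

```java
import java.util.concurrent.ThreadLocalRandom;

class TraceIdGenerator {

    /** Returns a random non-zero 64-bit ID as 16 lower-hex characters. */
    static String next64() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L); // avoid the all-zero ID
        return String.format("%016x", id);
    }

    /** Returns a 128-bit ID as 32 lower-hex characters, built from two 64-bit halves. */
    static String next128() {
        return next64() + next64();
    }

    public static void main(String[] args) {
        System.out.println(next64());
        System.out.println(next128());
    }
}
```

Whichever strategy is chosen, the important part is that every entry point into the system uses the same length and encoding, so that IDs stay comparable across services and in the log store.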

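7.2 Centralized log management

Collect the logs of all services in one place, for example with the ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk. This makes searching, analyzing, and visualizing logs much easier.

7.3 Use Span annotations

Add Span annotations in key sections of the code to record when important events happen. They help you pinpoint problems more precisely.

Sample code: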

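import brave.Span;
import brave.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class BusinessService {

    @Autowired
    private Tracer tracer;

    public void doSomethingImportant() {
        // brave.Tracer has no annotate method; annotations are recorded on the current Span
        Span span = tracer.currentSpan(); // null when no trace is in progress
        if (span != null) {
            span.annotate("start_important_task");
        }

        // Do some important work
        System.out.println("Doing something important...");

        if (span != null) {
            span.annotate("end_important_task");
        }
    }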
}

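7.4 Monitoring and alerting

Set up solid monitoring and alerting so that TraceId loss is noticed early. Besides the usual per-service metrics such as request count, response time, and error rate, it helps to track how many log lines or requests arrive without a TraceId, and to alert when that share rises.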
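As one hedged illustration of such a check, the sketch below computes the share of log lines whose trace segment is empty. It assumes the Logback pattern from section 5.1, where a line without a trace contains " - [,] " because missing MDC keys print as empty strings. The class name, the regular expression, and the sample lines are assumptions; in practice this kind of query would normally run in the central log store rather than in application code.

```java
import java.util.List;
import java.util.regex.Pattern;

class MissingTraceIdMonitor {

    // Matches the "[traceId,spanId]" segment of the assumed pattern when the traceId is empty
    private static final Pattern EMPTY_TRACE = Pattern.compile(" - \\[,[0-9a-f]*\\] ");

    /** Returns the fraction of lines (0.0 to 1.0) that carry no TraceId. */
    static double missingRatio(List<String> lines) {
        if (lines.isEmpty()) {
            return 0.0;
        }
        long missing = lines.stream()
                .filter(line -> EMPTY_TRACE.matcher(line).find())
                .count();
        return (double) missing / lines.size();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "2024-01-01 10:00:00.000 [main] INFO  c.e.OrderService - [463ac35c9f6413ad,a2fb4a1d1a96d312] order created",
                "2024-01-01 10:00:00.120 [pool-1-thread-1] INFO  c.e.MailService - [,] mail sent");
        System.out.println("share of lines without TraceId: " + missingRatio(sample)); // 0.5
    }
}
```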

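8. Summary: keep the TraceId flowing and build a sound log-tracing setup

TraceId loss is a common problem in microservice architectures, but sensible configuration and coding can prevent it. Spring Cloud Sleuth provides powerful tracing features that help build a robust log-tracing setup. The keys are to choose a suitable way of propagating the TraceId, unify the log format, manage logs centrally, and put solid monitoring and alerting in place. Finally, use code review to confirm that the TraceId is propagated on every path.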