An in-depth look at front-end error monitoring and performance monitoring: how to capture errors, collect metrics, and report them for analysis with JavaScript - 智猿学院

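Hello, dear front-end folks, and good morning! (Or good afternoon, or good evening, depending on when you are reading this lecture.) Today we are going to give the front end a full health checkup: error monitoring and performance monitoring.

Monitoring is like assigning your code a personal doctor who watches its health around the clock and raises the alarm the moment something looks off. That way you can step in early, before your user experience falls off a cliff and your users run straight into the arms of a competitor.

Enough small talk, let's get to the good stuff.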

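Part 1: Error Monitoring (Bugs, You Have Nowhere to Hide!)

Error monitoring, as the name suggests, means watching for errors in your code and catching every one that slips through. Front-end errors fall into two main categories:

JavaScript runtime errors: The most common kind, such as reading a property of undefined, type errors, calling a function that is not defined, and so on.
Resource loading errors: For example, an image, a CSS file, or a JS file that fails to load.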

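1. Capturing JavaScript Runtime Errors

JavaScript provides the try...catch statement for catching errors in synchronous code. For asynchronous code it falls short: an error thrown later inside a callback (for example in setTimeout), or a Promise rejection that is not awaited, happens after the try block has already finished, so the catch never sees it.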

Using try...catch

try {
  // Code that might throw
  console.log(a.b.c); // `a` is not declared, so this throws a ReferenceError
} catch (error) {
  // Handle the caught error
  console.error("Error caught:", error);
  // The error can be reported here
}
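Since try...catch only covers code that runs while the block is active, one common workaround is to move the try...catch inside the callback itself. Here is a minimal sketch of that idea. The names guard, loadSafely, and onError are hypothetical helpers made up for this example, not browser APIs.

```javascript
// guard() is a hypothetical helper: it wraps a callback so that any error
// thrown when the callback eventually runs is passed to onError.
function guard(callback, onError) {
  return function guarded(...args) {
    try {
      return callback.apply(this, args);
    } catch (error) {
      onError(error);
      return undefined;
    }
  };
}

// With async/await, a rejected promise surfaces as a throw at the await,
// so an ordinary try...catch around the await does catch it.
async function loadSafely(task, onError) {
  try {
    return await task();
  } catch (error) {
    onError(error);
    return undefined;
  }
}

// Usage: the throw happens later, inside the timer callback, yet it is caught.
const lateErrors = [];
setTimeout(
  guard(() => { throw new Error("late failure"); }, (e) => lateErrors.push(e.message)),
  0
);
```

Anything you cannot wrap this way (third-party callbacks, forgotten promises) is what the global handlers in this part are for.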

Global capture with window.onerror

window.onerror is a global event handler that captures JavaScript runtime errors on the page that were not caught by a try...catch.

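window.onerror = function(message, source, lineno, colno, error) {
  console.error("Global error caught:", message, source, lineno, colno, error);
  // The error can be reported here, for example:
  reportError({
    message: message,
    source: source,
    lineno: lineno,
    colno: colno,
    // An Error object becomes "{}" under JSON.stringify, so send its stack string instead
    stack: error && error.stack
  });

  // Suppress the default error handling (optional)
  return true; // Returning true stops the browser from printing the error to the console
};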

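message: The error message.
source: The URL of the script where the error occurred.
lineno: The line number where the error occurred.
colno: The column number where the error occurred.
error: The Error object itself (if available).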
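One thing to watch out for when reporting the error argument: JSON.stringify drops an Error's message and stack, because they are not enumerable properties. A small sketch of a workaround follows. serializeError is a hypothetical helper name, and the shape of the returned object is just an assumption for illustration.

```javascript
// JSON.stringify skips an Error's message and stack because they are not
// enumerable own properties, so copy them into a plain object first.
function serializeError(error) {
  if (error instanceof Error) {
    return { name: error.name, message: error.message, stack: error.stack || "" };
  }
  // window.onerror may receive no Error object at all (null or undefined)
  return { name: "NonError", message: String(error), stack: "" };
}

const naive = JSON.stringify(new TypeError("x is not a function"));
const useful = JSON.stringify(serializeError(new TypeError("x is not a function")));
console.log(naive); // {}
console.log(useful.includes("x is not a function")); // true
```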

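window.addEventListener('error', ...)

window.addEventListener('error', ...) with the capture flag set can catch resource loading errors, such as an image that fails to load. Note that window.onerror cannot catch them: the error event fired on a failed element does not bubble up to window, so it is only visible there during the capture phase. The same listener also receives JavaScript runtime errors, in which case event.target is window and none of the element checks below match.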

window.addEventListener('error', function(event) {
  const target = event.target || event.srcElement;
  if (target instanceof HTMLImageElement) {
    console.error("Image load error:", target.src);
    // The error can be reported here, for example:
    reportError({
      type: "image_load_error",
      url: target.src
    });
  } else if (target instanceof HTMLLinkElement && target.rel === 'stylesheet') {
      console.error("CSS load error:", target.href);
      reportError({
          type: "css_load_error",
          url: target.href
      });
  } else if (target instanceof HTMLScriptElement) {
      console.error("Script load error:", target.src);
      reportError({
          type: "script_load_error",
          url: target.src
      });
  }
}, true); // The third argument must be true so the listener runs in the capture phase

event.target: The element on which the error occurred.
event.srcElement: A deprecated alias of event.target, kept for compatibility with older browsers.
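Because one capture-phase listener sees both resource errors and runtime errors, it can help to pull the classification into a small function of its own. The sketch below does that. classifyErrorEvent and the type strings other than the three used in the listener above are hypothetical names chosen for this example.

```javascript
// classifyErrorEvent() is a hypothetical helper. It only reads plain
// properties, so it accepts a real event or a test double.
function classifyErrorEvent(event) {
  const target = event.target;
  const tag = target && target.tagName ? String(target.tagName).toUpperCase() : "";
  if (tag === "IMG") return { type: "image_load_error", url: target.src };
  if (tag === "SCRIPT") return { type: "script_load_error", url: target.src };
  if (tag === "LINK" && target.rel === "stylesheet") {
    return { type: "css_load_error", url: target.href };
  }
  // Any other element (video, audio, ...) is reported generically
  if (tag) return { type: "resource_load_error", url: target.src || target.href || "" };
  // No element tag: the event came from window, so it is a runtime error
  return { type: "js_error", message: event.message || "" };
}

// In a browser it could be wired up like this (capture phase, as above):
if (typeof window !== "undefined") {
  window.addEventListener("error", (event) => console.log(classifyErrorEvent(event)), true);
}
```

Keeping the function free of instanceof checks is a design choice: it makes the logic easy to exercise outside a browser.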

The unhandledrejection event

Used to capture Promise rejections that have no handler.

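window.addEventListener('unhandledrejection', function(event) {
  console.error("Unhandled promise rejection:", event.reason);
  // The error can be reported here, for example:
  const reason = event.reason;
  reportError({
    type: "unhandled_rejection",
    // An Error becomes "{}" under JSON.stringify, so send a string instead
    reason: reason instanceof Error ? (reason.stack || reason.message) : String(reason)
  });
});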

event.reason: The reason the Promise was rejected.

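2. Reporting Error Information

Once the errors above have been captured, they need to be reported to a server so they can be analyzed and dealt with. There are several ways to send them:

XMLHttpRequest: The most traditional approach, with good compatibility.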

function reportError(errorInfo) {
  const url = "/api/report-error"; // Your error reporting endpoint
  const xhr = new XMLHttpRequest();
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/json");
  xhr.send(JSON.stringify(errorInfo));
}

fetch: The approach recommended for modern browsers, with a more concise syntax.

async function reportError(errorInfo) {
  const url = "/api/report-error"; // Your error reporting endpoint
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json"
      },
      body: JSON.stringify(errorInfo)
    });
    if (!response.ok) {
      console.error("Error reporting failed:", response.status, response.statusText);
    }
  } catch (error) {
    console.error("Error reporting failed:", error);
  }
}

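navigator.sendBeacon: Queues the data so the browser can send it in the background, even while the page is unloading. It does not block the unload, and at that point it is more reliable than an ordinary request.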

function reportError(errorInfo) {
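  const url = "/api/report-error"; // Your error reporting endpoint
  const data = JSON.stringify(errorInfo); // A string body is sent as text/plain, so the server must parse it as JSON
  navigator.sendBeacon(url, data); // Returns false if the browser could not queue the data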
}
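The three transports can also be combined: try sendBeacon first and fall back to fetch when it is missing or refuses the data. The following is only a sketch under that assumption. sendReport and its env parameter are hypothetical, and env exists purely so navigator and fetch can be swapped out.

```javascript
// sendReport() is a hypothetical helper. `env` defaults to the global object
// so that navigator and fetch can be replaced in tests.
function sendReport(url, payload, env = globalThis) {
  const body = JSON.stringify(payload);
  const nav = env.navigator;
  if (nav && typeof nav.sendBeacon === "function") {
    // sendBeacon returns false when the data could not be queued
    if (nav.sendBeacon(url, body)) return "beacon";
  }
  if (typeof env.fetch === "function") {
    // keepalive lets the request outlive the page, similar to a beacon
    env.fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: body,
      keepalive: true
    }).catch(() => {}); // reporting must never throw into page code
    return "fetch";
  }
  return "dropped";
}
```

The return value ("beacon", "fetch", or "dropped") is only there to make the chosen path visible. A real reporter might count the dropped cases instead.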

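3. Processing Error Information

Reported errors usually need some processing, for example:

Deduplication: Report the same error only once within a short time window, to avoid flooding the system.
Aggregation: Group similar errors together to make analysis easier.
Analysis: Look at each error's type, frequency, and scope of impact to find the root cause.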
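Deduplication is the easiest of the three to show in code. Below is a minimal sketch of a time-window deduper. createDeduper is a hypothetical helper, and which fields make up the "same error" fingerprint (message, source, line, column here) is an assumption you would tune for your own data.

```javascript
// createDeduper() is a hypothetical helper. It returns a function that says
// whether an error should be reported, allowing each fingerprint once per window.
function createDeduper(windowMs, now = Date.now) {
  const lastReported = new Map();
  return function shouldReport(errorInfo) {
    // Fingerprint: the fields assumed to identify "the same" error
    const key = [errorInfo.message, errorInfo.source, errorInfo.lineno, errorInfo.colno].join("|");
    const time = now();
    const previous = lastReported.get(key);
    if (previous !== undefined && time - previous < windowMs) return false;
    lastReported.set(key, time);
    return true;
  };
}

// Usage: report at most once per 10 seconds for each distinct error
const shouldReport = createDeduper(10000);
const sample = { message: "x is undefined", source: "/app.js", lineno: 10, colno: 5 };
console.log(shouldReport(sample)); // true
console.log(shouldReport(sample)); // false
```

The same fingerprint could serve as the grouping key for aggregation on the server. Note that the Map here grows without bound, so a long-lived page would want to prune old keys.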

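Part 2: Performance Monitoring (Make Your Website Fly!)

Performance monitoring means tracking your site's performance metrics, such as load speed, rendering speed, and interaction speed, to find the bottlenecks and optimize them.

1. Collecting Performance Metrics

Navigation Timing API: Provides a set of timestamps that can be used to compute how long the page took to load.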

window.addEventListener("load", function() {
  const performance = window.performance;
  if (!performance || !performance.timing) {
    console.log("Performance API is not supported.");
    return;
  }

  // loadEventEnd is still 0 while the load handler is running,
  // so read the timings on the next task instead
  setTimeout(function() {
    // performance.timing is deprecated but still widely supported
    const timing = performance.timing;
    const loadTime = timing.loadEventEnd - timing.navigationStart;
    console.log("Page load time:", loadTime, "ms");

    // Other performance metrics
    const dnsLookupTime = timing.domainLookupEnd - timing.domainLookupStart;
    const tcpConnectTime = timing.connectEnd - timing.connectStart;
    const requestTime = timing.responseEnd - timing.requestStart;
    const domReadyTime = timing.domComplete - timing.domLoading;

    reportPerformance({
      loadTime: loadTime,
      dnsLookupTime: dnsLookupTime,
      tcpConnectTime: tcpConnectTime,
      requestTime: requestTime,
      domReadyTime: domReadyTime
    });
  }, 0);
});

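navigationStart: When the navigation started.
domLoading: When the browser started parsing the document.
domComplete: When the document and its sub-resources finished loading.
loadEventEnd: When the load event handler finished, that is, when the page load completed.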
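performance.timing is deprecated in favor of the PerformanceNavigationTiming entry, which exposes the same milestones relative to the start of the navigation. Here is a rough sketch of the same calculations on top of it. summarizeNavigation and the metric names it returns are hypothetical, chosen for this example.

```javascript
// summarizeNavigation() is a hypothetical helper. It takes a
// PerformanceNavigationTiming entry (or any object with the same fields).
function summarizeNavigation(entry) {
  return {
    dnsLookupTime: entry.domainLookupEnd - entry.domainLookupStart,
    tcpConnectTime: entry.connectEnd - entry.connectStart,
    waitingTime: entry.responseStart - entry.requestStart, // request sent to first byte
    requestTime: entry.responseEnd - entry.requestStart,
    domContentLoaded: entry.domContentLoadedEventEnd - entry.startTime,
    loadTime: entry.loadEventEnd - entry.startTime
  };
}

if (typeof window !== "undefined") {
  window.addEventListener("load", () => {
    // Wait one task so that loadEventEnd has been filled in
    setTimeout(() => {
      const [entry] = performance.getEntriesByType("navigation");
      if (entry) console.log(summarizeNavigation(entry));
    }, 0);
  });
}
```

The values are high-resolution milliseconds measured from the start of the navigation rather than epoch timestamps, so startTime is normally 0.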

Resource Timing API: Provides the load timing of every resource on the page (images, CSS, JS, and so on).

function getResourceTiming() {
  if (!window.performance || !window.performance.getEntriesByType) {
    console.log("Resource Timing API is not supported.");
    return [];
  }

  const resources = performance.getEntriesByType("resource");
  const resourceTimings = resources.map(resource => {
    return {
      name: resource.name,
      duration: resource.duration,
      initiatorType: resource.initiatorType,
      transferSize: resource.transferSize,
      encodedBodySize: resource.encodedBodySize,
      decodedBodySize: resource.decodedBodySize
    };
  });
  return resourceTimings;
}

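// addEventListener is used so this does not overwrite another window.onload handler
window.addEventListener("load", function() {
  const resourceTimings = getResourceTiming();
  console.log("Resource timings:", resourceTimings);
  reportPerformance({
    resourceTimings: resourceTimings
  });
});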

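name: The resource URL.
duration: The load time in milliseconds.
initiatorType: What initiated the request (img, script, link, css, fetch, xmlhttprequest, and so on), which is not always the same as the file type.
transferSize: The size of the resource as transferred over the network, in bytes.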
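Reporting every resource on every page view gets noisy quickly, so one option is to send only the slowest few. A small sketch, assuming a hypothetical helper named slowestResources that works on entries shaped like the ones returned by getResourceTiming above:

```javascript
// slowestResources() is a hypothetical helper: it returns the n entries
// with the longest duration, slowest first.
function slowestResources(entries, n) {
  return entries
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.duration - a.duration)
    .slice(0, n)
    .map((entry) => ({
      name: entry.name,
      initiatorType: entry.initiatorType,
      duration: Math.round(entry.duration)
    }));
}

const sampleEntries = [
  { name: "/app.js", initiatorType: "script", duration: 320.4 },
  { name: "/logo.png", initiatorType: "img", duration: 45.2 },
  { name: "/api/user", initiatorType: "fetch", duration: 810.7 }
];
console.log(slowestResources(sampleEntries, 2).map((entry) => entry.name));
// [ '/api/user', '/app.js' ]
```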

Long Tasks API: Detects long-running tasks (longer than 50 ms) that can block the main thread and make the page feel janky.

const observer = new PerformanceObserver((list) => {
  list.getEntries().forEach(entry => {
    console.log("Long task:", entry.name, entry.duration);
    reportPerformance({
      type: "long_task",
      name: entry.name,
      duration: entry.duration
    });
  });
});

observer.observe({ entryTypes: ['longtask'] });

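entry.name: The attribution of the task, that is, which browsing context it came from (for example "self" or "same-origin-ancestor"), not a function name.
entry.duration: How long the task ran, in milliseconds.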
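Instead of reporting each long task on its own, you can fold them into a single number: the total time by which tasks exceeded the 50 ms budget. The sketch below is a rough field approximation in the spirit of Total Blocking Time, not the lab metric itself, which only counts tasks between FCP and Time to Interactive. sumBlockingTime is a hypothetical helper name.

```javascript
// sumBlockingTime() is a hypothetical helper: for every long task it adds
// up the part of the duration that goes beyond the budget (50 ms by default).
function sumBlockingTime(entries, budgetMs = 50) {
  let total = 0;
  for (const entry of entries) {
    if (entry.duration > budgetMs) total += entry.duration - budgetMs;
  }
  return total;
}

console.log(sumBlockingTime([{ duration: 120 }, { duration: 60 }, { duration: 50 }])); // 80
```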

First Contentful Paint (FCP): The time at which the browser first paints any text, image, non-white canvas, or SVG.

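new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
        // The 'paint' type also delivers 'first-paint'; keep only FCP
        if (entry.name !== 'first-contentful-paint') continue;
        console.log('FCP:', entry.startTime);
        // Paint entries have no size property, so only startTime is reported
        reportPerformance({
            type: "fcp",
            startTime: entry.startTime
        });
    }
}).observe({ type: 'paint', buffered: true });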

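Largest Contentful Paint (LCP): The render time of the largest image or text block visible in the viewport. The browser may emit several candidates as the page loads; the last one is the final value.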

new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
        console.log('LCP candidate:', entry.startTime, entry.size);
        reportPerformance({
            type: "lcp",
            startTime: entry.startTime,
            size: entry.size
        });
    }
}).observe({ type: 'largest-contentful-paint', buffered: true });

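First Input Delay (FID): The time from the user's first interaction with the page (clicking a link, tapping a button, and so on) to the moment the browser is able to start processing the event handlers for it. It is measured with the PerformanceObserver API and is only produced after a real user interaction. Note that Interaction to Next Paint (INP) has since replaced FID as a Core Web Vital.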

new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
        const delay = entry.processingStart - entry.startTime;
        console.log('FID candidate:', delay, entry);
        reportPerformance({
            type: "fid",
            delay: delay
        });
    }
}).observe({ type: 'first-input', buffered: true });

Cumulative Layout Shift (CLS): Measures the total of the unexpected layout shifts that happen on the page.

let clsValue = 0;
new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
        if (!entry.hadRecentInput) {
            clsValue += entry.value;
            console.log('CLS candidate:', clsValue, entry);
            reportPerformance({
                type: "cls",
                value: clsValue
            });
        }
    }
}).observe({ type: 'layout-shift', buffered: true });
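The observer above keeps a running sum over the whole page lifetime. CLS as currently defined for Web Vitals instead groups shifts into session windows (shifts less than 1 second apart, with each window capped at 5 seconds) and reports the largest window. A sketch of that calculation, assuming a hypothetical helper named computeCls that receives layout-shift entries sorted by startTime:

```javascript
// computeCls() is a hypothetical helper: it returns the largest
// session-window total among the given layout-shift entries.
function computeCls(entries, gapMs = 1000, maxWindowMs = 5000) {
  let best = 0;
  let windowValue = 0;
  let windowStart = 0;
  let previousTime = -Infinity;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts caused by user input do not count
    const startsNewWindow =
      entry.startTime - previousTime >= gapMs ||
      entry.startTime - windowStart >= maxWindowMs;
    if (startsNewWindow) {
      windowValue = 0;
      windowStart = entry.startTime;
    }
    windowValue += entry.value;
    previousTime = entry.startTime;
    best = Math.max(best, windowValue);
  }
  return best;
}

const shifts = [
  { startTime: 100, value: 0.25, hadRecentInput: false },
  { startTime: 600, value: 0.25, hadRecentInput: false },
  { startTime: 700, value: 1, hadRecentInput: true },
  { startTime: 2500, value: 0.125, hadRecentInput: false }
];
console.log(computeCls(shifts)); // 0.5
```

In the example the first two shifts share a window (0.5), the input-driven shift is ignored, and the last shift starts a new window because it comes 1.9 seconds after the previous one.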

2. Reporting Performance Data

As with error reporting, performance data needs to be sent to a server for analysis. You can use XMLHttpRequest, fetch, or navigator.sendBeacon.

async function reportPerformance(performanceData) {
  const url = "/api/report-performance"; // Your performance reporting endpoint
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json"
      },
      body: JSON.stringify(performanceData)
    });
    if (!response.ok) {
      console.error("Performance reporting failed:", response.status, response.statusText);
    }
  } catch (error) {
    console.error("Performance reporting failed:", error);
  }
}
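The observers earlier in this part call reportPerformance once per entry, which can mean many small requests. One option is to queue the items and send them in batches. Here is a minimal sketch. createBatcher is a hypothetical helper, and the batch size of 10 and the flush on visibilitychange are assumptions, not requirements.

```javascript
// createBatcher() is a hypothetical helper: it collects items and hands
// them to `send` as one array when the queue is full or flush() is called.
function createBatcher(send, maxSize = 10) {
  let queue = [];
  function flush() {
    if (queue.length === 0) return;
    const batch = queue;
    queue = [];
    send(batch);
  }
  function add(item) {
    queue.push(item);
    if (queue.length >= maxSize) flush();
  }
  return { add, flush };
}

// In a browser: flush whatever is left when the page is hidden
if (typeof document !== "undefined") {
  const batcher = createBatcher((batch) => {
    navigator.sendBeacon("/api/report-performance", JSON.stringify(batch));
  });
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "hidden") batcher.flush();
  });
  // Observers would then call batcher.add(...) instead of reportPerformance(...)
}
```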

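3. Analyzing Performance Data

The reported performance data then needs to be analyzed to find the bottlenecks and optimize them. Common methods include:

Trend analysis: Watch how the metrics change over time, find the periods where performance dropped, and work out why.
Comparative analysis: Compare the metrics across versions, devices, and regions, find the differences, and work out why.
Top N analysis: Find the worst-performing resources, pages, APIs, and so on, and optimize those first.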
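For trend and comparative analysis, averages tend to hide the slow tail, so percentiles are a common choice for the number you actually chart. A small sketch using the nearest-rank method follows. The function name percentile and the choice of the 75th percentile in the usage are assumptions for illustration.

```javascript
// percentile() uses the nearest-rank method: the smallest sample such that
// at least p percent of all samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Usage: compare the 75th percentile load time of two releases
const releaseA = [1200, 900, 3100, 1500];
const releaseB = [1100, 950, 1800, 1300];
console.log(percentile(releaseA, 75), percentile(releaseB, 75)); // 1500 1300
```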

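Part 3: Architecture of a Monitoring System

A complete front-end monitoring system usually consists of the following parts:

Data collection: Gathers error information and performance data.
Data transport: Sends the data to the server.
Data storage: Stores the collected data.
Data analysis: Analyzes the data and produces reports and alerts.
Data presentation: Displays the reports and alerts.

The following diagram gives a simple picture: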

+------------------------------+      +------------------------------+      +------------------------------+
| Front end (data source)      | ---> | Monitoring service (backend) | ---> | Analysis platform (dashboard)|
+------------------------------+      +------------------------------+      +------------------------------+
| Error and performance data   |      | Receive, process, store data |      | Reports, alerts, trends      |
+------------------------------+      +------------------------------+      +------------------------------+
| JavaScript script            |      | Node.js, Python, Java        |      | Grafana, Kibana, or in-house |
+------------------------------+      +------------------------------+      +------------------------------+

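Part 4: Some Best Practices

Use source maps: Map minified code back to the original source so errors are easy to locate.
Add custom error messages: Attach your own error messages at critical points in the code to make troubleshooting easier.
Separate environments: Use a different monitoring configuration for each environment, so that debug settings and test noise do not affect production.
Link errors to user behavior: Tie each error to what the user was doing, so the problem is easier to reproduce.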
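Linking errors to user behavior is often done with "breadcrumbs": a short rolling log of recent actions that gets attached to every error report. A minimal sketch, assuming a hypothetical helper named createBreadcrumbs and a made-up event shape:

```javascript
// createBreadcrumbs() is a hypothetical helper: it keeps only the most
// recent `limit` user actions so they can be attached to an error report.
function createBreadcrumbs(limit = 20) {
  const trail = [];
  return {
    add(type, detail) {
      trail.push({ type: type, detail: detail, time: Date.now() });
      if (trail.length > limit) trail.shift(); // drop the oldest entry
    },
    snapshot() {
      return trail.slice(); // copy, so callers cannot mutate the trail
    }
  };
}

// Usage: record actions as they happen, then attach them when an error occurs
const breadcrumbs = createBreadcrumbs(3);
breadcrumbs.add("navigation", "/cart");
breadcrumbs.add("click", "button#checkout");
console.log(breadcrumbs.snapshot().map((crumb) => crumb.type)); // [ 'navigation', 'click' ]
```

A report could then carry breadcrumbs.snapshot() next to the error fields, so whoever reads it can see the steps that led up to the failure.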

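Part 5: Third-Party Monitoring Tools

Besides building your own monitoring system, you can also use third-party tools such as:

Sentry: A powerful error monitoring platform that supports many languages and frameworks.
Fundebug: A China-based error monitoring platform that emphasizes attentive service.
阿里云 ARMS (Alibaba Cloud ARMS): Alibaba Cloud's Application Real-Time Monitoring Service, which offers comprehensive monitoring capabilities.
Google Analytics: Mainly for user behavior analytics, but it can also collect some performance data.

Which tool to choose depends on your needs and budget.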

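Part 6: Summary

Front-end monitoring is a continuous process: you keep collecting data, analyzing it, and optimizing the code. I hope today's lecture helps you understand front-end monitoring better and makes your site more stable and smoother.

Remember, monitoring is not the goal; optimization is what matters! If your code is a sports car, monitoring is the dashboard that constantly tells you how the car is doing, so you can drive more steadily and faster.

That's all for today. Any questions?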
