Building a front-end logging system: designing a complete log collection, reporting, and analysis system for troubleshooting production issues. - 智猿学院

Building a Front-End Logging System: From Collection to Analysis, Supporting Production Troubleshooting

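Hello everyone. Today we will talk about building a front-end logging system. As front-end engineers, we frequently run into production issues, and an effective logging system is a powerful tool for tracking them down. A well-designed logging system not only helps us locate errors quickly, but also supplies important data for user behavior analysis and performance monitoring. This talk covers the design and implementation of a front-end logging system across its three core stages: log collection, reporting, and analysis.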

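I. Log Collection: Capturing the Key Information

Log collection is the foundation of the whole system. We want to collect as much information as is useful for troubleshooting, while taking care that over-collection does not degrade performance.

1. Log types:

First, we classify logs so that later analysis and processing are easier. Common log types include: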

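Log type	Description	Examples
`info`	General information that records system state, user actions, and so on.	"User clicked button A", "Page loaded"
`warn`	Warnings that point to a potential problem without affecting normal operation.	"Deprecated API used", "Image failed to load"
`error`	Errors indicating that something went wrong and may affect some features or the user experience.	"Network request failed", "Data parsing error"
`debug`	Debugging information used during development. Usually disabled in production.	"Value of x: 123", "Reached function A"
`performance`	Performance information that records metrics such as page load time and API latency.	"Page load time: 200ms", "API A took 500ms"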

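2. Log content design:

Beyond the log type, the content of each log matters just as much. A good log entry should contain the following:

Timestamp: A timestamp with millisecond precision, used to trace the order of events.
Log level: info, warn, error, debug, performance, and so on.
Log message: A description of what happened.
Error stack: For error logs, the full stack trace must be included so the failing location can be found.
User information: User ID, user name, and so on, used to tell different users' behavior apart.
Device information: Browser type, operating system, device model, and so on, used to tell problems in different environments apart.
Page information: Page URL, page title, and so on, used to tell problems on different pages apart.
Custom fields: Fields added for business needs, for example order ID or product ID.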
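
As an illustration of the fields listed above, here is a minimal sketch of a helper that assembles one log entry. The function name `buildLogEntry` and the shape of its `context` argument (`error`, `user`, `device`, `page`, `extra`) are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper: assembles one log entry with the fields listed above.
// The second-level field names (user, device, page, extra) are assumptions.
function buildLogEntry(level, message, { error, user, device, page, extra } = {}) {
  const entry = {
    timestamp: new Date().toISOString(), // millisecond precision, UTC
    level,
    message,
  };
  // Error entries carry the stack trace so the failing location can be found.
  if (error instanceof Error) {
    entry.stack = error.stack;
  }
  if (user) entry.user = user;       // e.g. { id, name }
  if (device) entry.device = device; // e.g. { browser, os, model }
  if (page) entry.page = page;       // e.g. { url, title }
  if (extra) entry.extra = extra;    // business fields such as an order ID
  return entry;
}

const sample = buildLogEntry('error', 'Data parsing error', {
  error: new SyntaxError('Unexpected token'),
  user: { id: 'u-1001' },
  page: { url: 'https://example.com/cart', title: 'Cart' },
  extra: { orderId: 'o-42' },
});
console.log(JSON.stringify(sample, null, 2));
```

Keeping optional fields out of the entry when they are absent keeps payloads small, which matters once many entries are reported per page view.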

3. Implementation:

We can wrap everything in a Logger class that manages log collection and reporting in one place.

class Logger {
  constructor(options = {}) {
    this.options = {
      level: 'info', // default log level
      prefix: '', // log prefix
      ...options,
    };
  }

  setLevel(level) {
    this.options.level = level;
  }

  setPrefix(prefix) {
    this.options.prefix = prefix;
  }

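  _log(level, message, ...args) {
    if (this.shouldLog(level)) {
      const timestamp = new Date().toISOString();
      const logMessage = `[${timestamp}] [${level.toUpperCase()}] ${this.options.prefix} ${message}`;
      // console has no "performance" method, so fall back to console.log for such levels
      const output = typeof console[level] === 'function' ? console[level] : console.log;
      output.call(console, logMessage, ...args); // print to the console for easier debugging
      this.reportLog(level, message, ...args); // forward to the reporting method (covered in detail later)
    }
  }

  shouldLog(level) {
    // "performance" is ranked alongside "info" so that performance entries are not silently dropped
    const ranks = { debug: 0, info: 1, performance: 1, warn: 2, error: 3 };
    const current = ranks[this.options.level] ?? ranks.info;
    return (ranks[level] ?? ranks.info) >= current;
  }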

  info(message, ...args) {
    this._log('info', message, ...args);
  }

  warn(message, ...args) {
    this._log('warn', message, ...args);
  }

  error(message, ...args) {
    this._log('error', message, ...args);
  }

  debug(message, ...args) {
    this._log('debug', message, ...args);
  }

  performance(message, ...args) {
    this._log('performance', message, ...args);
  }

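  reportLog(level, message, ...args) {
    // Collect more detailed information
    const logData = {
      timestamp: new Date().toISOString(),
      level: level,
      message: message,
      prefix: this.options.prefix,
      url: window.location.href,
      userAgent: navigator.userAgent,
      // Include the extra arguments; Error objects contribute their message and stack trace
      args: args.map((arg) => (arg instanceof Error ? { message: arg.message, stack: arg.stack } : arg)),
      // More fields can be added here, such as user ID and device information
    };
    // Serialize the log data to a string
    const logString = JSON.stringify(logData);

    // Send the log to the server (using the Beacon API)
    navigator.sendBeacon('/api/log', logString); // assumes an /api/log endpoint exists
  }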
}

// Usage example
const logger = new Logger({ level: 'debug', prefix: '[My App]' });

logger.info('Page loaded');
logger.warn('Deprecated API used');
logger.error('Network request failed', new Error('Network Error'));
logger.debug('Value of x:', 123);

try {
  // Simulate an error
  throw new Error('Something went wrong!');
} catch (error) {
  logger.error('Caught an error', error);
}

4. Getting the error stack:

For error logs, the stack trace is essential. We can catch errors with a try...catch statement and read the stack from the Error object.

try {
  // Code that may throw
  throw new Error('Something went wrong!');
} catch (error) {
  console.error('Error:', error);
  console.error('Stack:', error.stack);
}

5. Global error listener:

To capture global errors that no try...catch statement caught, we can use the window.onerror handler.

window.onerror = function(message, source, lineno, colno, error) {
  console.error('Global Error:', message, source, lineno, colno, error);
  logger.error('Global Error:', message, source, lineno, colno, error); // report through the logger
  return true; // suppress the browser's default error handling
};

6. Capturing Promise errors:

For Promise errors, we need either the .catch() method or the unhandledrejection event.

// Using .catch()
fetch('/api/data')
  .then(response => response.json())
  .catch(error => {
    console.error('Fetch Error:', error);
    logger.error('Fetch Error:', error); // report through the logger
  });

// Using the unhandledrejection event
window.addEventListener('unhandledrejection', function(event) {
  console.error('Unhandled Rejection:', event.reason);
  logger.error('Unhandled Rejection:', event.reason); // report through the logger
  event.preventDefault(); // suppress the browser's default handling
});

7. Performance monitoring:

The Performance API can be used to collect metrics such as page load time and API latency.

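// Page load time
window.addEventListener('load', () => {
  // loadEventEnd is still 0 while the load handler is running, so read it on the next task
  setTimeout(() => {
    const [nav] = performance.getEntriesByType('navigation');
    if (!nav) return; // navigation timing entries are not available in this browser
    const loadTime = Math.round(nav.loadEventEnd - nav.startTime);
    console.log('Page Load Time:', loadTime + 'ms');
    logger.performance('Page Load Time:', loadTime + 'ms'); // report through the logger
  }, 0);
});

// API call duration
const startTime = performance.now();
fetch('/api/data')
  .then(response => response.json())
  .then(data => {
    const endTime = performance.now();
    const duration = endTime - startTime;
    console.log('API Duration:', duration + 'ms');
    logger.performance('API Duration:', duration + 'ms'); // report through the logger
  });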

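II. Log Reporting: Efficient and Reliable Data Transfer

Log reporting is the process of sending the collected log data to the server. We need to pick a suitable reporting method that keeps the data reliable and the transfer efficient.

1. Choosing a reporting method:

XMLHttpRequest: The traditional approach, with the broadest compatibility. Asynchronous requests do not block rendering, but requests still in flight may be cancelled when the page unloads.
fetch: A Promise-based API that is more concise and easier to use, though older browsers lack support for it. With the keepalive option, a request can outlive the page.
navigator.sendBeacon: An API designed for sending analytics data. It does not block rendering, and the browser queues the request and attempts to deliver it even while the page is closing. It only sends POST requests, returns only a boolean indicating whether the request was queued, and limits the payload size. Recommended.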
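
The trade-offs above suggest trying sendBeacon first and falling back to fetch. The sketch below shows one way to do that. The function name `sendLog`, its return values, and the `env` parameter (which defaults to the global object and exists only so the function can be exercised outside a browser) are assumptions for illustration.

```javascript
// Hypothetical transport helper: prefer sendBeacon, fall back to fetch.
// `env` defaults to the global object; pass a stand-in object for testing.
function sendLog(url, entry, env = globalThis) {
  const body = JSON.stringify(entry);
  const nav = env.navigator;

  // Preferred: sendBeacon queues the request and returns true or false
  // immediately. It never exposes the server response.
  if (nav && typeof nav.sendBeacon === 'function') {
    if (nav.sendBeacon(url, body)) return 'beacon';
  }

  // Fallback: fetch with keepalive, so the request can outlive the page.
  if (typeof env.fetch === 'function') {
    env.fetch(url, {
      method: 'POST',
      body,
      headers: { 'Content-Type': 'application/json' },
      keepalive: true,
    }).catch(() => { /* reporting should never break the page */ });
    return 'fetch';
  }

  return 'dropped'; // no transport available
}
```

When a string is passed to sendBeacon, the request body is sent as text/plain, so the receiving endpoint has to parse the JSON itself.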

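2. Data format:

JSON: A common data format that is easy to parse and process.
Plain text: Simple to produce, but less readable and harder to process.

3. Reporting strategy:

Immediate reporting: Highly real-time, but it may affect performance.
Batch reporting: Merge multiple logs into a single request to reduce the number of requests and improve efficiency.
Deferred reporting: Report while the page is idle to avoid blocking rendering.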
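
The batch strategy can be sketched as a small queue that flushes either when it is full or after a delay. This is a minimal sketch: `createBatchReporter`, `maxBatchSize`, and `flushInterval` are names invented for this example, and `send` stands for whatever transport the page uses.

```javascript
// Hypothetical batching queue. `send` receives an array of log entries.
function createBatchReporter(send, { maxBatchSize = 10, flushInterval = 5000 } = {}) {
  let queue = [];
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (queue.length === 0) return;
    const batch = queue;
    queue = [];
    send(batch); // one request for the whole batch
  }

  function add(entry) {
    queue.push(entry);
    if (queue.length >= maxBatchSize) {
      flush(); // size threshold reached: send right away
    } else if (timer === null) {
      timer = setTimeout(flush, flushInterval); // otherwise send after a delay
    }
  }

  return { add, flush };
}

// Usage: two entries are queued, then flushed manually.
const reporter = createBatchReporter(
  (batch) => console.log('sending', batch.length, 'entries'),
  { maxBatchSize: 3 }
);
reporter.add({ level: 'info', message: 'Page loaded' });
reporter.add({ level: 'warn', message: 'Deprecated API used' });
reporter.flush(); // prints "sending 2 entries"
```

In a browser, it is usually worth calling `flush()` when the page becomes hidden (the `visibilitychange` event), so queued entries are not lost when the user leaves.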

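4. Implementation (using navigator.sendBeacon):

// Add the following code to the reportLog method of the Logger class
reportLog(level, message, ...args) {
  // Collect more detailed information
  const logData = {
    timestamp: new Date().toISOString(),
    level: level,
    message: message,
    prefix: this.options.prefix,
    url: window.location.href,
    userAgent: navigator.userAgent,
    // Include the extra arguments; Error objects contribute their message and stack trace
    args: args.map((arg) => (arg instanceof Error ? { message: arg.message, stack: arg.stack } : arg)),
    // More fields can be added here, such as user ID and device information
  };
  // Serialize the log data to a string
  const logString = JSON.stringify(logData);

  // Send the log to the server (using the Beacon API)
  navigator.sendBeacon('/api/log', logString); // assumes an /api/log endpoint exists
}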

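5. Retrying on failure:

If a report fails, it can be retried, with a maximum number of retries and a retry interval. Note that navigator.sendBeacon returns only a boolean and never exposes the server response, so the retry logic below uses fetch, which does.

// Retry after a failed report
// sendBeacon cannot signal delivery failures, so this version uses fetch instead
function reportLogWithRetry(logData, maxRetries = 3, retryInterval = 1000) {
  let retries = 0;

  function tryReport() {
    fetch('/api/log', { // assumes an /api/log endpoint exists
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(logData),
      keepalive: true, // let the request finish even if the page is unloading
    })
      .then(response => {
        if (!response.ok) {
          throw new Error('Network response was not ok.');
        }
      })
      .catch(error => {
        retries++;
        if (retries <= maxRetries) {
          console.warn(`Log report failed, retrying in ${retryInterval}ms (attempt ${retries}/${maxRetries})`);
          setTimeout(tryReport, retryInterval);
        } else {
          console.error('Log report failed after multiple retries:', error);
        }
      });
  }

  tryReport();
}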

6. Throttling:

Limit the reporting frequency to avoid putting too much pressure on the server. A throttle function can do this.

function throttle(func, delay) {
  let timeoutId;
  let lastExecTime = 0;

  return function(...args) {
    const currentTime = Date.now();

    if (!timeoutId) {
      if (currentTime - lastExecTime >= delay) {
        func.apply(this, args);
        lastExecTime = currentTime;
      } else {
        timeoutId = setTimeout(() => {
          func.apply(this, args);
          lastExecTime = Date.now();
          timeoutId = null;
        }, delay);
      }
    }
  };
}

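// Using the throttle
// Calls made while a delayed call is pending are ignored, so some logs are dropped during bursts
const throttledReportLog = throttle(logger.reportLog.bind(logger), 1000); // report at most once per second
throttledReportLog('info', 'Page loaded');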

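III. Log Analysis: Uncovering the Value Behind the Data

Log analysis is the process of processing and analyzing the collected log data to find problems and patterns.

1. Log storage:

Databases: Suitable for storing structured data and convenient for querying and analysis. Examples: MySQL, MongoDB.
Log platforms: Tools built specifically for storing and analyzing log data, self-hosted or cloud-hosted. Examples: the ELK Stack (Elasticsearch, Logstash, Kibana), Splunk.

2. Data cleaning:

Remove duplicates: Avoid analyzing the same data more than once.
Filter invalid data: Drop logs that are of no use.
Convert data formats: Convert the data to one uniform format to make analysis easier.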
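
The three cleaning steps above can be combined in a single pass. The following is a rough sketch, not a complete pipeline: the function name `cleanLogs`, the choice of deduplication key (timestamp, level, message, and url), and the `unknown` fallback level are assumptions for illustration.

```javascript
const KNOWN_LEVELS = new Set(['debug', 'info', 'warn', 'error', 'performance']);

// Hypothetical cleaning pass: filter, normalize, then deduplicate.
function cleanLogs(rawLogs) {
  const seen = new Set();
  const cleaned = [];
  for (const raw of rawLogs) {
    // Filter: skip entries without a usable message or timestamp.
    if (!raw || typeof raw.message !== 'string' || raw.message.trim() === '') continue;
    const time = new Date(raw.timestamp);
    if (Number.isNaN(time.getTime())) continue;

    // Normalize: one timestamp format, lowercase levels, trimmed messages.
    const level = String(raw.level || '').toLowerCase();
    const entry = {
      ...raw,
      timestamp: time.toISOString(),
      level: KNOWN_LEVELS.has(level) ? level : 'unknown',
      message: raw.message.trim(),
    };

    // Deduplicate on an assumed key of timestamp + level + message + url.
    const key = [entry.timestamp, entry.level, entry.message, entry.url || ''].join('|');
    if (seen.has(key)) continue;
    seen.add(key);
    cleaned.push(entry);
  }
  return cleaned;
}
```

Normalizing before deduplicating matters here: two entries that differ only in timestamp format or level casing collapse into one.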

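3. Analysis methods:

Statistical analysis: Count the number of logs of each type, how often errors occur, and so on.
Trend analysis: Analyze how log data changes over time, for example traffic and error rate.
Correlation analysis: Analyze relationships between logs, for example whether an error is tied to specific users or devices.
Anomaly detection: Automatically detect abnormal logs, for example a sudden surge of error logs.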
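
To make three of these methods concrete, here is a deliberately naive sketch over an in-memory array of log entries. The function names and the spike rule (an hour is flagged when its error count exceeds `factor` times the mean of the other hours) are assumptions for illustration. A real system would typically run such queries in the storage layer.

```javascript
// Statistical analysis: count entries per level.
function countByLevel(logs) {
  const counts = {};
  for (const { level } of logs) counts[level] = (counts[level] || 0) + 1;
  return counts;
}

// Trend analysis: count error entries per hour bucket.
// Assumes `timestamp` is an ISO 8601 string such as "2024-01-01T10:15:00.000Z".
function errorsPerHour(logs) {
  const buckets = {};
  for (const log of logs) {
    if (log.level !== 'error') continue;
    const hour = log.timestamp.slice(0, 13); // "YYYY-MM-DDTHH"
    buckets[hour] = (buckets[hour] || 0) + 1;
  }
  return buckets;
}

// Anomaly detection (naive): flag hours whose error count exceeds
// `factor` times the mean count of all the other hours.
function findSpikes(buckets, factor = 3) {
  const hours = Object.keys(buckets);
  return hours.filter((hour) => {
    const others = hours.filter((h) => h !== hour);
    if (others.length === 0) return false;
    const mean = others.reduce((sum, h) => sum + buckets[h], 0) / others.length;
    return buckets[hour] > factor * mean;
  });
}
```

Comparing each hour against the mean of the others, and not against the overall mean, keeps a single large spike from hiding itself by inflating the baseline.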

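4. Visualization:

Present the analysis results as charts, which are more intuitive and easier to understand. A charting library such as ECharts or Chart.js can be used.

5. Alerting:

When an anomaly is detected, automatically send an alert notification, for example by email or SMS.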
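
A threshold check with a cooldown is one simple way to trigger such alerts without flooding the recipients. In the sketch below, `createAlerter`, `threshold`, and `cooldownMs` are hypothetical names, and `notify` stands for whichever channel (email, SMS) is actually wired up.

```javascript
// Hypothetical alerter: calls `notify` when the error count reaches the
// threshold, but at most once per cooldown window.
function createAlerter(notify, { threshold = 50, cooldownMs = 10 * 60 * 1000 } = {}) {
  let lastAlertAt = -Infinity;
  return function check(errorCount, now = Date.now()) {
    if (errorCount < threshold) return false;
    if (now - lastAlertAt < cooldownMs) return false; // still cooling down
    lastAlertAt = now;
    notify(`Error count ${errorCount} reached threshold ${threshold}`);
    return true;
  };
}

// Usage: check the error count of the latest time window.
const check = createAlerter((text) => console.log('ALERT:', text), { threshold: 5 });
check(3); // below the threshold, nothing is sent
check(8); // prints "ALERT: Error count 8 reached threshold 5"
```

The `now` parameter is there only to make the cooldown easy to test. In normal use it can be left out.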

6. Code example (Node.js backend, simple log storage and analysis):

Suppose we store the logs in MongoDB.

// Install the MongoDB Node.js driver
// npm install mongodb

const { MongoClient } = require('mongodb');

// MongoDB connection string
const uri = 'mongodb://localhost:27017';
const client = new MongoClient(uri);

async function main() {
  try {
    // Connect to MongoDB
    await client.connect();
    console.log('Connected successfully to server');

    const db = client.db('myproject');
    const logsCollection = db.collection('logs');

    // Simulate a log received from the front end
    const logData = {
      timestamp: new Date().toISOString(),
      level: 'error',
      message: 'Network request failed',
      url: 'https://example.com/page1',
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    };

    // Insert the log into MongoDB
    const insertResult = await logsCollection.insertOne(logData);
    console.log('Inserted log:', insertResult);

    // Query the error logs
    const errorLogs = await logsCollection.find({ level: 'error' }).toArray();
    console.log('Error logs:', errorLogs);

    // Count the error logs
    const errorCount = await logsCollection.countDocuments({ level: 'error' });
    console.log('Error count:', errorCount);

  } catch (e) {
    console.error(e);
  } finally {
    // Close the connection
    await client.close();
  }
}

main().catch(console.error);

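This example shows how to store logs in MongoDB and run simple queries and counts. A real log analysis system is more complex and may call for more powerful tools such as Elasticsearch, Logstash, and Kibana.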

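IV. Best Practices

Access control: Restrict access to log data to prevent sensitive information from leaking.
Log cleanup: Purge expired logs regularly so they do not take up too much storage.
Monitoring: Monitor the logging system itself to make sure it is working properly.
Standardization: Unify log formats and conventions to make analysis and processing easier.
Privacy protection: Comply with the applicable privacy regulations when collecting and processing log data.
Gradual rollout: Release a new logging system, or changes to it, to a small share of users first, and roll it out fully once it proves stable.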
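
For the privacy point in particular, one common precaution is to mask sensitive fields before an entry leaves the browser. The sketch below assumes entries are plain JSON-like data without circular references. The key list and the name `redact` are illustrative assumptions, and which fields count as sensitive depends on the product and the regulations that apply.

```javascript
// Hypothetical list of field names to mask (compared case-insensitively).
const SENSITIVE_KEYS = new Set(['password', 'token', 'phone', 'email', 'idcard']);

// Returns a copy of `value` with sensitive fields replaced at any depth.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === 'object') {
    const result = {};
    for (const [key, inner] of Object.entries(value)) {
      result[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return result;
  }
  return value; // primitives are returned unchanged
}

const safe = redact({
  message: 'Login failed',
  user: { id: 'u-1001', email: 'someone@example.com' },
  form: { Password: 'hunter2' },
});
console.log(JSON.stringify(safe));
```

Because the function returns a copy, the original object that the application is still using stays untouched.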

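V. Summary and Future Directions

Building a complete front-end logging system is a process of continuous iteration. By collecting, reporting, and analyzing logs, we can understand user behavior better, find and fix production issues in time, and improve the user experience.

Going forward, the following directions are worth exploring:

Intelligent analysis: Use machine learning algorithms to analyze log data automatically and predict potential problems.
Real-time monitoring: Implement real-time log monitoring and alerting to respond to production issues faster.
User behavior analysis: Analyze user behavior in more depth to provide data that supports product optimization.

I hope this talk has been helpful. Thank you!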