mirror of
https://github.com/yeasy/docker_practice.git
synced 2026-03-11 12:21:17 +00:00
Add performance optimization
638
19_observability/19.3_performance_optimization.md
Normal file
@@ -0,0 +1,638 @@
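## 19.3 Container Performance Optimization and Troubleshooting

Containers are lightweight, but that does not make performance problems go away. In day-to-day operations, bottlenecks can come from CPU limits, memory exhaustion, disk I/O, network congestion, and other layers. This section covers container performance monitoring, diagnostic methods, and optimization strategies.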

### 19.3.1 Container Performance Metrics

#### Core Performance Metrics

Container performance monitoring revolves around the key metrics below. The names are representative: exact file and field names differ between cgroup v1, cgroup v2, and the Docker stats API.

**CPU metrics:**
- `cpu.usage_usec`: cumulative CPU time consumed by the container (microseconds)
- `cpu.stat.nr_throttled`: number of times the container was CPU-throttled
- `cpu.stat.throttled_usec`: total time spent throttled (microseconds)
- `cpu_percent`: CPU utilization as a percentage
- `cpu_quota`: configured CPU quota (microseconds)

**Memory metrics:**
- `memory.usage_bytes`: current memory usage
- `memory.max_usage_bytes`: peak memory usage
- `memory.limit_in_bytes`: memory limit
- `memory.fail_cnt`: number of times usage hit the memory limit (a rising value often precedes an OOM kill)
- `memory.stat.cache`: page cache usage
- `memory.stat.rss`: resident set size (RSS), the memory actually held by processes
- `memory.stat.swap`: swap usage

**Network metrics:**
- `rx_bytes`: bytes received
- `tx_bytes`: bytes sent
- `rx_packets`: packets received
- `tx_packets`: packets sent
- `rx_errors`: receive errors
- `tx_errors`: transmit errors
- `rx_dropped`: packets dropped on receive
- `tx_dropped`: packets dropped on transmit

**I/O metrics:**
- `io_service_bytes`: bytes transferred by I/O operations
- `io_service_time`: time spent servicing I/O operations
- `io_queued`: I/O queue length
- `fs_limit_bytes`: filesystem capacity limit
- `fs_usage_bytes`: filesystem usage
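
As a rough illustration of how the raw CPU counters above turn into the percentages shown on dashboards, the sketch below derives utilization from two `cpu.usage_usec` readings and expresses a quota as a number of cores. The function names and the `period_usec` parameter (the CFS period, commonly 100000 microseconds) are assumptions made for this example and do not appear in the metric list.

```python
def cpu_percent(usage_start_usec, usage_end_usec, interval_sec):
    """Average CPU utilization over the interval, as a percentage of one core."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    used_sec = (usage_end_usec - usage_start_usec) / 1_000_000
    return used_sec / interval_sec * 100


def quota_in_cores(quota_usec, period_usec=100_000):
    """CPU limit expressed in cores; None means no quota is configured.

    period_usec is an assumed CFS period, not a value taken from the text.
    """
    if quota_usec <= 0:
        return None
    return quota_usec / period_usec


if __name__ == "__main__":
    # Two hypothetical readings of cpu.usage_usec taken one second apart
    print(cpu_percent(1_000_000, 1_500_000, 1.0))
    print(quota_in_cores(150_000))
```

Comparing the first value with the second (multiplied by 100) gives a sense of how close a container is running to its limit.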

### 19.3.2 Real-Time Monitoring with docker stats

`docker stats` is the most basic monitoring tool and still a very useful one: it streams live resource usage for running containers.

**Basic usage:**

```bash
# Live view of all running containers
docker stats

# Sample output:
# CONTAINER ID   NAME    CPU %   MEM USAGE / LIMIT    MEM %   NET I/O         BLOCK I/O
# abc123def456   nginx   0.45%   24.3 MiB / 256 MiB   9.49%   1.2kB / 3.4kB   0 B / 0 B
# def789ghi012   redis   0.23%   12.5 MiB / 512 MiB   2.44%   2.1kB / 1.5kB   0 B / 0 B

# Monitor specific containers only
docker stats nginx redis

# Print a single snapshot instead of streaming
docker stats --no-stream

# docker stats has no refresh-interval flag; to sample every 2 seconds,
# wrap a one-shot call in watch (or a shell loop)
watch -n 2 docker stats --no-stream

# Custom output using a Go template
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" --no-stream

# Export JSON for logging (one JSON object per line)
docker stats --format "{{json .}}" --no-stream > stats.json
```

**Using it in a script:**

```bash
#!/bin/bash

# Sample continuously and append the results to a file
while true; do
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    docker stats --no-stream --format "{{.Container}},{{.CPUPerc}},{{.MemUsage}}" | \
        awk -v ts="$timestamp" '{print ts","$0}' >> container_stats.log
    sleep 10
done
```
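
The log written by the script above is plain comma-separated text, so it needs parsing before it can be graphed or checked against thresholds. The following is a minimal sketch of such a parser. It assumes lines of the form `timestamp,container,cpu%,used / limit` with binary size units; the function names and the unit table are illustrative choices, not part of Docker.

```python
import re

# Binary units as printed in the MEM USAGE column (assumed set)
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Convert a size such as '24.3MiB' or '24.3 MiB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def parse_stats_line(line):
    """Parse one 'timestamp,container,cpu%,used / limit' log line into a dict."""
    timestamp, container, cpu, mem = line.strip().split(",", 3)
    used, limit = (parse_size(part) for part in mem.split("/"))
    return {
        "timestamp": timestamp,
        "container": container,
        "cpu_percent": float(cpu.rstrip("%")),
        "mem_used_bytes": used,
        "mem_limit_bytes": limit,
        "mem_percent": used / limit * 100 if limit else 0.0,
    }


if __name__ == "__main__":
    print(parse_stats_line("2026-03-11 12:00:00,abc123def456,0.45%,24.3MiB / 256MiB"))
```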
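
**Reading the numbers:**

```bash
# CPU % persistently above 80% of the container's limit: raise the CPU limit or optimize the application
#   (CPU % is measured against a single core, so it can exceed 100% on multi-core hosts)
# MEM % close to 100%: the container is at risk of an OOM kill; add memory or look for a memory leak
# NET I/O shows only bytes received / sent; to check for dropped packets, use `ip -s link`
#   inside the container or the cAdvisor network metrics
```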

### 19.3.3 cAdvisor Container Monitoring

cAdvisor is a container monitoring tool developed by Google. It provides more detailed performance data than `docker stats`.

**Deploying cAdvisor with Docker Compose:**

```yaml
version: '3.9'

services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    container_name: cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge
```

Once it is running, open `http://localhost:8080` to view:
- Container performance statistics
- System resource usage
- Historical performance data

**Extracting metrics from cAdvisor:**

```bash
# Host machine information (cores, memory, filesystems) as JSON
curl -s http://localhost:8080/api/v1.3/machine | jq .

# List the Docker containers cAdvisor tracks (the response is a map keyed by container path)
curl -s http://localhost:8080/api/v1.3/docker | jq 'keys' | head -5

# Latest stats sample for one container
curl -s http://localhost:8080/api/v1.3/docker/abc123/ | jq '.[] | .stats[-1]'
```

**Integrating with Prometheus:**

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      # If Prometheus itself runs in a container on the same Docker network,
      # use the service name instead, e.g. 'cadvisor:8080'
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
```

### 19.3.4 Prometheus Configuration for Container Monitoring

Use Prometheus and node-exporter, together with cAdvisor and Grafana, for long-term container performance monitoring.

**Deploying the full monitoring stack:**

```yaml
version: '3.9'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    container_name: cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      # Default credentials: change them before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
```

**Prometheus configuration file (prometheus.yml):**

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'docker'
    static_configs:
      # Docker daemon metrics; requires "metrics-addr" in /etc/docker/daemon.json.
      # Inside the Prometheus container, localhost is the container itself, so
      # replace it with an address where the host's Docker daemon is reachable.
      - targets: ['localhost:9323']
```

**Common Prometheus queries (PromQL):**

```promql
# Container CPU usage (percentage of one core)
rate(container_cpu_usage_seconds_total[5m]) * 100

# Container memory usage as a percentage of its limit
# (containers without a limit report 0, so they are filtered out)
(container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)) * 100

# Container inbound network traffic (MB/s)
rate(container_network_receive_bytes_total[5m]) / 1024 / 1024

# Container outbound network traffic (MB/s)
rate(container_network_transmit_bytes_total[5m]) / 1024 / 1024

# Container disk read rate (MB/s)
rate(container_fs_reads_bytes_total[5m]) / 1024 / 1024

# CPU throttling (seconds throttled per second)
rate(container_cpu_cfs_throttled_seconds_total[5m])

# Share of memory usage that is page cache
container_memory_cache / container_memory_usage_bytes

# Number of containers per image
count(container_memory_usage_bytes{image!=""}) by (image)
```
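
To make the arithmetic behind these queries concrete, here is a small sketch of what a rate over a counter and a usage-to-limit percentage compute. It is a deliberate simplification: the real PromQL `rate()` also handles counter resets and extrapolates to the window edges, which this code does not. The function names and sample values are invented for illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) pair.

    Simplified stand-in for PromQL rate(): no counter-reset handling,
    no extrapolation to the window boundaries.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time range")
    return (v1 - v0) / (t1 - t0)


def memory_percent(usage_bytes, limit_bytes):
    """Usage as a percentage of the limit; None when no limit is set (limit of 0)."""
    if limit_bytes <= 0:
        return None
    return usage_bytes / limit_bytes * 100


if __name__ == "__main__":
    # Hypothetical container_cpu_usage_seconds_total samples over 5 minutes
    cpu_samples = [(0, 100.0), (150, 140.0), (300, 175.0)]
    print(simple_rate(cpu_samples) * 100)  # percent of one core
    print(memory_percent(256 * 1024**2, 512 * 1024**2))
```

Returning `None` for an unlimited container mirrors the reason for filtering on a non-zero limit in the query: dividing by a limit of 0 does not produce a meaningful percentage.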

### 19.3.5 Diagnosing Container OOM Kills and Tuning Memory Limits

#### Diagnosing OOM Problems

```bash
# Check whether the container was killed by the OOM killer
docker inspect --format '{{.State.OOMKilled}}' <container_id>

# List containers that exited with code 137 (128 + SIGKILL). An OOM kill is a
# common cause, but docker kill or a stop timeout produces the same code
docker ps -a --format "{{.ID}}\t{{.Status}}" | grep "(137)"

# Search the container log for OOM messages (present only if the application logs them)
docker logs <container_id> 2>&1 | grep -i "out of memory\|oom"

# Look for OOM events in the host logs
dmesg | grep -i "oom\|kill"
journalctl -u docker -n 100 | grep -i "oom"
```
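
The exit code 137 mentioned above follows the common convention of 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). The sketch below decodes exit codes on that basis. `describe_exit_code` is a hypothetical helper written for this example, and it cannot tell an OOM kill from any other SIGKILL; the `OOMKilled` flag is still needed for that.

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, describe_exit_code(code))
```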

#### Detecting Memory Leaks

Use dedicated tools to analyze an application's memory usage.

**Detecting memory leaks in a Python application:**

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
# tracemalloc is part of the standard library; only memory_profiler needs installing
RUN pip install --no-cache-dir -r requirements.txt memory_profiler

COPY app.py .
CMD ["python", "-m", "memory_profiler", "app.py"]
```

```python
# app.py - memory leak example
import tracemalloc

from memory_profiler import profile


@profile
def memory_leak(iterations=20):
    # Keep appending large lists that are never released
    # (bounded so the demo finishes instead of running into the memory limit)
    data = []
    for _ in range(iterations):
        data.append([0] * 1000000)
        print(f"List size: {len(data)}")
    return data


# Track memory allocations with tracemalloc
tracemalloc.start()

# Run the code suspected of leaking
leaked = memory_leak()

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
```
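
Totals such as current and peak usage show that memory is growing but not where. One way to narrow it down, sketched below using only the standard-library `tracemalloc` module, is to compare two snapshots and list the source lines whose allocations grew the most. The helper `top_growth` and the `leaky` workload are made up for this example.

```python
import tracemalloc


def top_growth(workload, limit=3):
    """Run workload() and return (its result, the lines whose allocations grew most)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = workload()  # keep a reference so the allocations stay alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


def leaky():
    """Hypothetical workload that retains roughly 1 MB of buffers."""
    return [bytearray(1024) for _ in range(1000)]


if __name__ == "__main__":
    _, growth = top_growth(leaky)
    for stat in growth:
        print(stat)
```

Each printed entry names a file and line number together with the size and count difference, which usually points at the allocation site worth inspecting first.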
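
**Analyzing memory in a Java application:**

```bash
# Set heap bounds and the G1 collector, and open a JDWP remote-debugging port on 5005
# (JAVA_OPTS only takes effect if the image's entrypoint passes it to the JVM)
docker run -e JAVA_OPTS="-Xmx512m -Xms256m -XX:+UseG1GC -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" \
    -p 5005:5005 \
    myapp:latest

# Check garbage collection with jstat, run inside the container (requires a JDK in the image)
docker exec <container_id> jstat -gc <pid> 1000   # sample once per second

# Sample output:
# S0C   S1C   S0U  S1U   EC     EU     OC      OU     MC     MU     CCSC  CCSU
# 6144  6144  0    6144  39424  12288  149504  84320  50552  47689  6464  5989
```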
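
The jstat columns come in capacity/used pairs (for example `OC`/`OU` for the old generation and `EC`/`EU` for Eden), so utilization is simply used divided by capacity. A minimal sketch of that calculation follows, using the sample row above as input; the function names are illustrative.

```python
def parse_jstat_gc(output):
    """Parse `jstat -gc` text (a header line plus sample lines) into a list of dicts."""
    lines = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def utilization(sample, used_key, capacity_key):
    """Used / capacity as a percentage, or 0.0 when the capacity is zero."""
    capacity = sample[capacity_key]
    return sample[used_key] / capacity * 100 if capacity else 0.0


SAMPLE = """
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU
6144 6144 0 6144 39424 12288 149504 84320 50552 47689 6464 5989
"""

if __name__ == "__main__":
    latest = parse_jstat_gc(SAMPLE)[-1]
    print(f"Old generation: {utilization(latest, 'OU', 'OC'):.1f}%")
    print(f"Eden:           {utilization(latest, 'EU', 'EC'):.1f}%")
```

An old-generation figure that keeps climbing across samples, even after collections, is one common sign of a leak worth investigating.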

#### Memory Limit Best Practices

```bash
# Set a memory limit on a container
docker run -m 512m --memory-swap 1g myapp:latest

# Options:
# -m / --memory:  memory limit (512 MB here)
# --memory-swap:  total of memory + swap (1 GB here, which leaves 512 MB of swap)
# If --memory-swap is not set, the container may use as much swap as --memory
#   (memory + swap = twice --memory), provided the host has swap configured.
# Set --memory-swap to the same value as --memory to give the container no swap.
```

```yaml
# Docker Compose configuration
version: '3.9'
services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
```
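
Because `--memory-swap` is a combined total rather than a swap size, the amount of swap a container may use has to be derived. The sketch below works through that arithmetic for the cases described above. `parse_bytes` and `swap_allowance` are hypothetical helpers, and the unit handling covers only the simple `b`/`k`/`m`/`g` suffixes.

```python
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_bytes(value):
    """Convert a Docker-style size such as '512m' or '1g' to bytes."""
    value = value.strip().lower()
    if value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)


def swap_allowance(memory, memory_swap=None):
    """Swap in bytes that a container may use; None means unlimited swap."""
    mem = parse_bytes(memory)
    if memory_swap is None:
        return mem  # unset: swap equal to the memory limit
    if memory_swap.strip() == "-1":
        return None  # -1: unlimited swap
    # The total includes memory, so subtract it (clamped at zero for this sketch)
    return max(parse_bytes(memory_swap) - mem, 0)


if __name__ == "__main__":
    print(swap_allowance("512m", "1g"))    # explicit total
    print(swap_allowance("512m"))          # --memory-swap unset
    print(swap_allowance("512m", "512m"))  # swap disabled
```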

**Memory overcommit:**

```yaml
# Distinguishing limits from reservations in Docker Compose
# mem_limit:        hard ceiling that can never be exceeded
# mem_reservation:  soft limit, enforced when the host runs low on memory

services:
  web:
    image: myapp:latest
    mem_limit: 512M         # hard limit
    memswap_limit: 1G       # memory + swap limit

  db:
    image: mydb:latest      # placeholder image name
    mem_limit: 2G
    mem_reservation: 1G     # reserve 1 GB, allow bursts up to 2 GB
```

### 19.3.6 Image Size Optimization and Multi-Stage Builds

#### Image Size Analysis Tools

**Analyzing image layers with dive:**

```bash
# Install dive
wget https://github.com/wagoodman/dive/releases/download/v0.11.0/dive_0.11.0_linux_amd64.deb
sudo apt install ./dive_0.11.0_linux_amd64.deb

# Analyze an image
dive myapp:latest

# dive shows detailed layer information: the size and contents of every layer
```

**Linting a Dockerfile with hadolint:**

```bash
# Install hadolint
curl -L https://github.com/hadolint/hadolint/releases/download/v2.12.0/hadolint-Linux-x86_64 -o hadolint
chmod +x hadolint

# Check the Dockerfile against best practices
./hadolint Dockerfile
```

#### Multi-Stage Build Best Practices

**A minimal image for a Go application:**

```dockerfile
# Stage 1: build
FROM golang:1.20-alpine AS builder

WORKDIR /build

# Install build dependencies
RUN apk add --no-cache git ca-certificates tzdata

COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Build a static binary (so it can run on the scratch base image)
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -a -installsuffix cgo \
    -ldflags="-w -s" \
    -o app .

# Stage 2: runtime
FROM scratch

# Copy only the files the binary needs from the builder
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /build/app /app

EXPOSE 8080
ENTRYPOINT ["/app"]

# The final image is typically under 15 MB, versus several hundred MB for the golang:1.20-alpine build image
```

**Multi-stage build for a Node.js application:**

```dockerfile
# Stage 1: install production dependencies
FROM node:18-alpine AS dependencies

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && \
    npm cache clean --force

# Stage 2: build
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 3: runtime
FROM node:18-alpine

WORKDIR /app

# Copy node_modules from the dependencies stage
COPY --from=dependencies /app/node_modules ./node_modules
# Copy the build output from the builder stage
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

# Sources, tests, and dev dependencies never enter this stage, so there is nothing to delete

USER node
EXPOSE 3000

CMD ["node", "dist/index.js"]

# Image size comparison:
# Unoptimized: ~500MB
# After the multi-stage build: ~120MB (76% smaller)
```

**Multi-stage build for a Python application:**

```dockerfile
# Stage 1: build
FROM python:3.11-slim AS builder

WORKDIR /build

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies into a virtual environment outside /root,
# so the unprivileged runtime user can read them
RUN python -m venv /opt/venv
COPY requirements.txt .
RUN /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime
FROM python:3.11-slim

WORKDIR /app

# Copy the virtual environment from the builder
COPY --from=builder /opt/venv /opt/venv

# Put the virtual environment first on PATH
ENV PATH=/opt/venv/bin:$PATH \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

COPY . .

USER nobody
EXPOSE 5000

CMD ["python", "app.py"]
```

#### Image Size Optimization Checklist

- Use a slim base image (Alpine, Distroless)
- Clean package manager caches (`apt-get clean`, `rm -rf /var/cache/*`)
- Install and clean up dependencies in the same `RUN` instruction
- Use `.dockerignore` to exclude unnecessary files
- Use multi-stage builds so build dependencies stay out of the final image
- Strip debug symbols: `-ldflags="-w -s"` (Go), the `strip` command (C/C++)
- Compress static assets and application files
- Use BuildKit cache optimizations to speed up builds

```dockerfile
# Optimization example:
FROM ubuntu:22.04

# ❌ Not recommended: three layers, and the final cleanup cannot shrink the earlier ones
RUN apt-get update
RUN apt-get install -y curl wget git
RUN apt-get clean

# ✓ Recommended: install and clean up in a single layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        wget \
        git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```
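
### 19.3.7 Common Performance Problems and Solutions

**Problem 1: The container is repeatedly OOM-killed**

Symptom: the container's process is killed for no apparent reason, with exit code 137.

Solutions:

```bash
# Raise the memory limit (raise --memory-swap with it; the update is rejected
# if the new limit exceeds the existing memory + swap limit)
docker update -m 1g --memory-swap 2g <container_id>

# Look for a memory leak: list the processes using the most resident memory
# (requires a procps-style ps in the image)
docker exec <container_id> ps -eo pid,rss,vsz,comm --sort=-rss | head

# Watch usage live with docker stats
docker stats <container_id>

# Enable swap (as a last resort)
docker run -m 512m --memory-swap 1g myapp:latest
```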
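
**Problem 2: CPU throttling**

Symptom: application performance drops suddenly even though CPU usage does not look high.

Diagnosis:

```bash
# Read the CPU throttling counters
docker exec <container_id> cat /sys/fs/cgroup/cpu.stat       # cgroup v2
docker exec <container_id> cat /sys/fs/cgroup/cpu/cpu.stat   # cgroup v1

# If nr_throttled and throttled_usec (throttled_time on cgroup v1) keep growing,
# the container is being CPU-throttled
# Fix: raise the CPU limit
docker update --cpus 2 <container_id>
```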
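
A non-zero throttling counter on its own says little; the share of scheduling periods that were throttled is usually more telling. The sketch below parses the `key value` lines of a `cpu.stat` file and computes that share. The sample text and the function names are invented for illustration, and the field names follow the cgroup v2 layout (`nr_periods`, `nr_throttled`, `throttled_usec`).

```python
def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical cpu.stat contents
SAMPLE = """\
usage_usec 5000000
nr_periods 200
nr_throttled 50
throttled_usec 1250000
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE)
    print(f"Throttled in {throttled_fraction(stats):.0%} of periods")
```

Sampling the file twice and comparing the differences, rather than the totals since container start, shows whether throttling is happening now.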

**Problem 3: Packet loss or high network latency**

Diagnosis:

```bash
# Inspect interface counters (errors, drops) inside the container
docker exec <container_id> ip -s link show

# Check the DNS configuration
docker exec <container_id> cat /etc/resolv.conf

# Measure network latency
docker exec <container_id> ping -c 4 8.8.8.8

# Inspect the container's network settings
docker inspect <container_id> | grep -A 10 NetworkSettings

# Fix: switch the network driver or adjust the MTU
docker run --net=host myapp:latest   # host networking: lowest overhead, but no network isolation
```
@@ -9,9 +9,20 @@
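- **Container monitoring**: centered on Prometheus, explaining how to collect and visualize container performance metrics.
- **Log management**: uses the ELK stack (Elasticsearch, Logstash, Kibana) as the example of a centralized log collection platform.

So that readers can put this to real use in production, this chapter fills in the following "minimal closed loops":

* How to verify key metrics and logs
* Common troubleshooting paths
* A minimal alerting loop (Prometheus -> Alertmanager -> receiver)
* Minimal practices for managing log volume

## Chapter Contents

* [Prometheus Monitoring](19.1_prometheus.md)
    * Container monitoring basics, metric collection, and alert configuration.

* [ELK Log Management](19.2_elk.md)
    * Centralized log collection, storage, and search.

* [Performance Optimization](19.3_performance_optimization.md)
    * Container and application performance optimization in practice.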