Mirror of https://github.com/yeasy/docker_practice.git (synced 2026-03-26 11:45:33 +00:00)
Remove blank lines after code block markers
@@ -48,7 +48,6 @@ scrape_configs:
rule_files:
  - /etc/prometheus/rules.yml
```

#### 2. Write the Docker Compose file

Create `compose.yaml` (or `docker-compose.yml`):
@@ -106,13 +105,11 @@ networks:
volumes:
  prometheus_data:
```

#### 3. Start the services

```bash
$ docker compose up -d
```

After startup, the following addresses are available:

* Prometheus: `http://localhost:9090`
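@@ -208,7 +205,6 @@ groups:
summary: "Insufficient available disk space"
description: "Instance={{ $labels.instance }}, Mountpoint={{ $labels.mountpoint }}"
```

Note: this rule is a threshold alert that fires when available space drops below 10%. It does not predict whether the disk will fill up within the next 24 hours. In production, filter more precisely on specific filesystems and mount points.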
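
To make the difference between a threshold check and a prediction concrete, here is a minimal Python sketch. It is not part of the original configuration: the function names and the sample numbers are hypothetical, and the prediction is a naive linear extrapolation chosen only for illustration.

```python
from typing import Optional


def below_threshold(avail_bytes: float, size_bytes: float, threshold: float = 0.10) -> bool:
    """Threshold check: True when the available fraction is under the threshold."""
    return avail_bytes / size_bytes < threshold


def hours_until_full(avail_then: float, avail_now: float, hours_elapsed: float) -> Optional[float]:
    """Naive linear extrapolation of the hours left until available space hits zero.

    Returns None when available space is not shrinking.
    """
    consumed_per_hour = (avail_then - avail_now) / hours_elapsed
    if consumed_per_hour <= 0:
        return None
    return avail_now / consumed_per_hour


if __name__ == "__main__":
    size = 100e9  # hypothetical 100 GB filesystem
    # 30% free: the threshold check stays silent even though space is shrinking fast.
    print("threshold fires:", below_threshold(30e9, size))
    print("hours until full:", hours_until_full(60e9, 30e9, 6.0))
```

The two checks answer different questions, which is why a disk can be well above the 10% line and still be on course to fill up soon.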

##### 2. Configure Prometheus to load the rules and connect to Alertmanager
@@ -224,7 +220,6 @@ alerting:
    - static_configs:
        - targets: ["alertmanager:9093"]
```

Then mount the rules file in the Compose file.

##### 3. Deploy Alertmanager
@@ -240,7 +235,6 @@ receivers:
    webhook_configs:
      - url: http://example.com/webhook
```

Then add the service to `compose.yaml`:

```yaml
@@ -253,7 +247,6 @@ receivers:
    networks:
      - monitoring
```

In production, route alerts to a traceable channel (such as an IM bot, an incident platform, or a ticketing system), and include dashboard links and troubleshooting entry points in each alert so that alerts do not turn into noise.

#### Suggested file list
@@ -71,7 +71,6 @@ volumes:
networks:
  logging:
```

#### 2. Configure Fluentd

Create `fluentd/conf/fluent.conf`:
@@ -102,7 +101,6 @@ networks:
  </store>
</match>
```

#### 3. Configure the application container to use the fluentd driver

Start a test container with the logging driver set to `fluentd`:
@@ -115,7 +113,6 @@ docker run -d \
  --name nginx-test \
  nginx
```

**Note**: make sure the `fluentd` container is already running and listening on `localhost:24224`. In production, if fluentd runs on a different machine, replace `localhost` with the IP address of the host running fluentd.
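
Before starting the test container, it can help to confirm that something is actually listening on that port. The following Python sketch is an illustration rather than part of the original guide. The helper name `is_port_open` is hypothetical, and a successful TCP connection only shows that the port accepts connections, not that fluentd itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 24224 is the forward port mentioned in the note above.
    print("fluentd port reachable:", is_port_open("localhost", 24224))
```

When fluentd runs on another machine, pass that host's IP address instead of `localhost`.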

#### 4. View logs in Kibana
@@ -48,7 +48,6 @@
**Basic usage:**

```bash

# Monitor all running containers in real time
docker stats

@@ -72,7 +71,6 @@ docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" --no-s
# Export as JSON for logging
docker stats --format json --no-stream > stats.json
```

**Using it in scripts:**

```bash
@@ -86,16 +84,13 @@ while true; do
  sleep 10
done
```

**Interpreting the metrics:**

```bash

# CPU % above 80%: raise the CPU limit or optimize the application
# MEM % close to 100%: the container is about to be OOM-killed; add memory or investigate a memory leak
# Non-zero dropped count in NET I/O: network congestion or packet loss
```
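
The CPU and memory thresholds above can be applied automatically to exported statistics. The sketch below assumes the one-JSON-object-per-line output of `docker stats --format json --no-stream` with the `Name`, `CPUPerc`, and `MemPerc` fields. The function names are hypothetical, and the 90% memory cutoff is an arbitrary stand-in for "close to 100%".

```python
import json


def parse_percent(text: str) -> float:
    """Convert a string such as '85.32%' to the float 85.32."""
    return float(text.strip().rstrip("%"))


def flag_hot_containers(lines, cpu_limit=80.0, mem_limit=90.0):
    """Return (name, cpu, mem) tuples for containers above either limit."""
    flagged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        try:
            cpu = parse_percent(row["CPUPerc"])
            mem = parse_percent(row["MemPerc"])
        except ValueError:
            # Skip rows whose values are placeholders rather than numbers.
            continue
        if cpu > cpu_limit or mem > mem_limit:
            flagged.append((row["Name"], cpu, mem))
    return flagged


if __name__ == "__main__":
    # Made-up sample rows, not real measurements.
    sample = [
        '{"Name": "web", "CPUPerc": "85.32%", "MemPerc": "40.10%"}',
        '{"Name": "db", "CPUPerc": "12.00%", "MemPerc": "35.00%"}',
    ]
    for name, cpu, mem in flag_hot_containers(sample):
        print(f"{name}: cpu={cpu}% mem={mem}%")
```

In practice the input lines would come from a file such as the `stats.json` export shown earlier.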

### 19.3.3 cAdvisor container monitoring

cAdvisor is a container monitoring tool developed by Google that provides more detailed performance data than `docker stats`.
@@ -125,7 +120,6 @@ networks:
  monitoring:
    driver: bridge
```

After startup, open `http://localhost:8080` to view:
- Container performance statistics
- System resource usage
@@ -134,7 +128,6 @@ networks:
**Extracting metrics from cAdvisor:**

```bash

# Get performance data for all containers in JSON format
curl http://localhost:8080/api/v1.3/machine | jq .

@@ -144,11 +137,9 @@ curl http://localhost:8080/api/v1.3/docker | jq '.docker | keys' | head -5
# Get container statistics
curl http://localhost:8080/api/v1.3/docker/abc123/ | jq '.stats[-1]'
```

**Integrating with Prometheus:**

```yaml

# prometheus.yml configuration
global:
  scrape_interval: 15s
@@ -159,7 +150,6 @@ scrape_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
```

### 19.3.4 Prometheus container monitoring configuration

Use Prometheus and node-exporter for long-term monitoring of container performance.
@@ -234,7 +224,6 @@ networks:
  monitoring:
    driver: bridge
```

**Prometheus configuration file (prometheus.yml):**

```yaml
@@ -259,11 +248,9 @@ scrape_configs:
    static_configs:
      - targets: ['localhost:9323']
```

**Common Prometheus queries (PromQL):**

```text

# Container CPU usage percentage
rate(container_cpu_usage_seconds_total[5m]) * 100

@@ -288,13 +275,11 @@ container_memory_cache_bytes / container_memory_usage_bytes
# Count containers by image
count(container_memory_usage_bytes) by (image)
```

### 19.3.5 Troubleshooting container OOM and tuning memory limits

#### Diagnosing OOM problems

```bash

# Check whether the container was killed by the OOM killer
docker inspect <container_id> | grep OOMKilled

@@ -308,7 +293,6 @@ docker logs <container_id> 2>&1 | grep -i "out of memory\|oom"
dmesg | grep -i "oom\|kill"
journalctl -u docker -n 100 | grep -i "oom"
```
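
Instead of grepping the text output, the `OOMKilled` flag can be read directly from the JSON that `docker inspect` prints. This is a minimal sketch: the function name `oom_killed` and the abbreviated sample document are hypothetical, and a real `docker inspect` result contains many more fields.

```python
import json


def oom_killed(inspect_json: str) -> bool:
    """Return the State.OOMKilled flag of the first container in `docker inspect` output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    return bool(containers[0]["State"]["OOMKilled"])


if __name__ == "__main__":
    # Abbreviated, made-up sample of `docker inspect <container_id>` output.
    sample = '[{"Id": "abc123", "State": {"Status": "exited", "OOMKilled": true, "ExitCode": 137}}]'
    print("OOM killed:", oom_killed(sample))
```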

#### Detecting memory leaks

Use dedicated tools to analyze the application's memory usage:
@@ -316,7 +300,6 @@ journalctl -u docker -n 100 | grep -i "oom"
**Detecting memory leaks in Python applications:**

```python

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
@@ -326,9 +309,7 @@ RUN pip install -r requirements.txt memory_profiler tracemalloc
COPY app.py .
CMD ["python", "-m", "memory_profiler", "app.py"]
```

```python

# app.py - memory leak example
from memory_profiler import profile
import tracemalloc
@@ -351,11 +332,9 @@ current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
```

**Memory analysis for Java applications:**

```bash

# Enable JVM remote debugging in the container
docker run -e JAVA_OPTS="-Xmx512m -Xms256m -XX:+UseG1GC" \
  -p 5005:5005 \
@@ -368,11 +347,9 @@ jstat -gc <pid> 1000 # sample once per second
# S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU
# 6144 6144 0 6144 39424 12288 149504 84320 50552 47689 6464 5989
```

#### Memory limit best practices

```bash

# Set a memory limit for the container
docker run -m 512m --memory-swap 1g myapp:latest

@@ -392,11 +369,9 @@ services:
        reservations:
          memory: 256M
```

**Memory overcommit:**

```bash

# Distinguish limits from reservations in Docker Compose
# limits: the hard maximum that must never be exceeded
# reservations: a reference value used by Compose for scheduling
@@ -410,7 +385,6 @@ services:
memory: 2G
memory_reservation: 1G # reserve 1 GB, allow bursts up to 2 GB
```

### 19.3.6 Image size optimization and multi-stage builds

#### Image size analysis tools
@@ -418,7 +392,6 @@ services:
**Analyzing image layers with dive:**

```bash

# Install dive
wget https://github.com/wagoodman/dive/releases/download/v0.11.0/dive_0.11.0_linux_amd64.deb
sudo apt install ./dive_0.11.0_linux_amd64.deb
@@ -428,11 +401,9 @@ dive myapp:latest

# Prints detailed per-layer information, showing the size and contents of each layer
```

**Using a Dockerfile analysis tool:**

```bash

# Install hadolint
curl https://github.com/hadolint/hadolint/releases/download/v2.12.0/hadolint-Linux-x86_64 -L -o hadolint
chmod +x hadolint
@@ -440,13 +411,11 @@ chmod +x hadolint
# Check the Dockerfile against best practices
./hadolint Dockerfile
```

#### Multi-stage build best practices

**Minimal image build for a Go application:**

```dockerfile

# Stage 1: build stage
FROM golang:1.20-alpine AS builder

@@ -479,11 +448,9 @@ ENTRYPOINT ["/app"]

# The final image is typically < 15 MB (compared with ~1 GB for golang:1.20-alpine)
```

**Multi-stage build for a Node.js application:**

```dockerfile

# Stage 1: install dependencies
FROM node:18-alpine AS dependencies

@@ -526,11 +493,9 @@ CMD ["node", "dist/index.js"]
# Unoptimized: ~500 MB
# After multi-stage build: ~120 MB (a 76% reduction)
```

**Multi-stage build for a Python application:**

```dockerfile

# Stage 1: build stage
FROM python:3.11-slim AS builder

@@ -563,11 +528,9 @@ EXPOSE 5000

CMD ["python", "app.py"]
```

#### Image size optimization checklist

```bash

# Checklist
□ Use a slim base image (Alpine, Distroless)
□ Clean package manager caches (apt-get clean, rm -rf /var/cache/*)
@@ -595,7 +558,6 @@ RUN apt-get update && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```

### 19.3.7 Common performance problems and solutions

**Problem 1: the container is frequently OOM-killed**
@@ -603,7 +565,6 @@ RUN apt-get update && \
Symptom: the container process is killed for no apparent reason, with exit code 137
Solution:
```bash

# Raise the memory limit
docker update -m 1g <container_id>

@@ -616,13 +577,11 @@ docker stats <container_id>
# Enable swap (as a last resort)
docker run -m 512m --memory-swap 1g myapp:latest
```
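
Exit code 137 follows the common convention of 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). The sketch below decodes such codes. The function name is hypothetical and the signal names assume a Linux host. A SIGKILL by itself does not prove an OOM kill, so the `OOMKilled` flag is still worth checking.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + signal number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (0, 137, 143):
        print(code, "->", describe_exit_code(code))
```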

**Problem 2: CPU throttling**

Symptom: application performance drops suddenly while CPU usage stays low
Diagnosis:
```bash

# View CPU throttling statistics
docker exec <container_id> cat /sys/fs/cgroup/cpu/cpu.stat

@@ -630,12 +589,10 @@ docker exec <container_id> cat /sys/fs/cgroup/cpu/cpu.stat
# Solution: raise the CPU limit
docker update --cpus 2 <container_id>
```
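
The counters in `cpu.stat` are easier to judge as a ratio of throttled periods to total periods. The following sketch parses the file's `key value` lines and computes that ratio. The function names and the sample numbers are hypothetical. The path used above is the cgroup v1 layout; on cgroup v2 hosts the file is typically `/sys/fs/cgroup/cpu.stat` and reports `throttled_usec` instead of `throttled_time`.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from a cgroup cpu.stat file into integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Made-up sample content of cpu.stat.
    sample = "nr_periods 1000\nnr_throttled 250\nthrottled_time 123456789\n"
    stats = parse_cpu_stat(sample)
    print(f"throttled in {throttled_ratio(stats):.0%} of periods")
```

A ratio that keeps rising while CPU usage looks modest matches the symptom described above.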

**Problem 3: packet loss or high network latency**

Diagnosis:
```bash

# Enter the container and check the network status
docker exec <container_id> ip -s link show