
# Platform Sensitive-Word Lexicon

## Overview

This document describes the sensitive-word categories and the sensitive-word configuration of each platform.

## Sensitive-Word Categories

### SENSITIVE_WORDS

Location: `backend/app/services/content/sensitive_filter.py`

```python
SENSITIVE_WORDS = {
    "politics": [
        "台湾", "西藏", "新疆", "香港", "澳门",
        "分裂", "独立", "抗议", "游行", "示威",
        "政治", "敏感词",
    ],
    "medical": [
        "药品", "治疗", "疗效", "治愈",
        "处方", "医生", "医院", "手术",
        "医疗", "敏感词",
    ],
    "finance": [
        "投资", "理财", "收益率", "回报",
        "股票", "基金", "债券", "期货",
    ],
    "adult": [
        "色情", "赌博", "毒品", "暴力",
    ],
}
```

### SENSITIVE_CATEGORIES

Location: `backend/app/services/distribution/platform_rules.py`

```python
SENSITIVE_CATEGORIES = {
    "politics": ["政治敏感词库"],
    "medical": ["医疗敏感词库"],
    "finance": ["金融敏感词库"],
    "adult": ["低俗敏感词库"],
}
```

## Platform Sensitive-Word Configuration

### Zhihu (zhihu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,   # zero tolerance
    "auto_filter": True,
}
```

### WeChat Official Accounts (wechat)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Baijiahao (baijiahao)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Toutiao (toutiao)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Weibo (weibo)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],  # politics and adult only
    "max_tolerance": 2,  # a small number of hits is allowed
    "auto_filter": True,
}
```

### Xiaohongshu (xiaohongshu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["adult"],  # adult content only
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Bilibili (bilibili)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Jianshu (jianshu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Juejin (juejin)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics"],  # politics only
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Douyin (douyin)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```
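
The `categories` list in each platform block decides which parts of `SENSITIVE_WORDS` apply to that platform. The lookup could be sketched as follows. This is an illustration only: `words_for_platform`, `SAMPLE_WORDS`, and `WEIBO_CONFIG` are hypothetical names, the lexicon is trimmed to a few entries, and the real lookup in `sensitive_filter.py` may work differently.

```python
# Hypothetical helper and sample data; not taken from the actual code base.
SAMPLE_WORDS = {
    "politics": ["抗议", "游行"],
    "adult": ["色情", "赌博"],
    "finance": ["股票", "基金"],
}

# Mirrors the Weibo block above.
WEIBO_CONFIG = {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 2,
    "auto_filter": True,
}


def words_for_platform(config: dict, lexicon: dict) -> list:
    """Collect the words of every category that the platform config enables."""
    if not config.get("check_required", False):
        return []
    words = []
    for category in config.get("categories", []):
        for word in lexicon.get(category, []):
            if word not in words:  # keep first-seen order, drop duplicates
                words.append(word)
    return words


weibo_words = words_for_platform(WEIBO_CONFIG, SAMPLE_WORDS)
```

With this sample data, `weibo_words` contains only the `politics` and `adult` entries; the `finance` words are skipped because Weibo does not list that category.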

## Filtering Mechanism

### SensitiveFilter

Location: `backend/app/services/content/sensitive_filter.py`

```python
class SensitiveFilter:
    def filter(self, content: str, platform: str) -> FilterResult:
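        """
        Filter sensitive words.
        1. Load the platform's sensitive-word configuration
        2. Merge the base lexicon with the custom lexicon
        3. Check each word and replace it where found
        """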
```

### Filter Result

```python
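from dataclasses import dataclass


@dataclass
class FilterResult:
    filtered_content: str      # content after filtering
    found_words: list          # sensitive words that were found
    replacements: dict         # mapping from each word to its replacement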
```

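### Replacement Rules

- Each sensitive word is replaced with `*` characters.
- The number of `*` characters equals the length of the original word.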
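
The two rules above can be illustrated with a short sketch. `mask_word` is a hypothetical helper name, and plain substring matching via `str.replace` is an assumption; the source does not show how matching is actually done.

```python
def mask_word(content: str, word: str) -> str:
    """Replace every occurrence of `word` with one `*` per character."""
    if not word:
        # An empty word would make str.replace insert the mask everywhere.
        return content
    return content.replace(word, "*" * len(word))


original = "这是一条赌博广告"
masked = mask_word(original, "赌博")
```

Because the mask has the same length as the word it replaces, the filtered text keeps the same length as the input.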

## AI Writing Patterns

### AI_PATTERNS

Location: `backend/app/services/distribution/platform_rules.py`

```python
AI_PATTERNS = {
    "banned_transitions": [
        "总之", "综上所述", "值得注意的是", "让我们",
        "总而言之", "不可否认", "毋庸置疑",
        "首先", "其次", "最后", "最后但同样重要",
        "换句话说", "也就是说", "更重要的是", "可以说",
    ],
    "banned_modifiers": [
        "至关重要", "不可或缺", "举足轻重", "蓬勃发展",
        "日新月异", "深远影响", "全面提升", "显著成效",
        "重大突破", "核心要素",
    ],
    "banned_structures": [
        r"第一[，、].*第二[，、].*第三",  # symmetric three-part structure
        r"一方面[，、].*另一方面",
    ],
    "safe_patterns": [
        "根据研究表明", "调研数据显示", "经验告诉我们",
        "事实上", "说白了", "说实话", "说真的",
    ],
}
```
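
The source does not show how `AI_PATTERNS` is consumed. One plausible reading is that the phrase lists are matched as plain substrings and the `banned_structures` entries as regular expressions. The sketch below follows that assumption; `find_ai_patterns` and `SAMPLE_PATTERNS` are hypothetical names, and `safe_patterns` is left out because its role is not described.

```python
import re

# Trimmed-down copy of AI_PATTERNS, for illustration only.
SAMPLE_PATTERNS = {
    "banned_transitions": ["总之", "综上所述"],
    "banned_modifiers": ["至关重要"],
    "banned_structures": [r"一方面[，、].*另一方面"],
}


def find_ai_patterns(text: str, patterns: dict) -> dict:
    """Return the banned phrases and structures that occur in `text`."""
    hits = {}
    # Phrase lists: simple substring checks.
    for key in ("banned_transitions", "banned_modifiers"):
        found = [phrase for phrase in patterns.get(key, []) if phrase in text]
        if found:
            hits[key] = found
    # Structures: regular expressions; DOTALL lets `.*` span line breaks.
    structures = [
        pattern
        for pattern in patterns.get("banned_structures", [])
        if re.search(pattern, text, flags=re.DOTALL)
    ]
    if structures:
        hits["banned_structures"] = structures
    return hits


report = find_ai_patterns(
    "总之，一方面，成本至关重要，另一方面，质量也重要。", SAMPLE_PATTERNS
)
```

Categories with no matches are omitted from the returned dictionary, so an empty result means that none of the listed patterns was found.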

## Platform AI Sensitivity

| Platform | Detection level | humanization_required |
|----------|-----------------|-----------------------|
| Zhihu | high | true |
| WeChat | medium | true |
| Baijiahao | high | true |
| Toutiao | high | true |
| Weibo | low | false |
| Xiaohongshu | low | false |
| Bilibili | medium | true |
| Jianshu | medium | true |
| Juejin | high | true |
| Douyin | low | false |
## Custom Sensitive Words

Custom sensitive words can be added under a category of your choice:

```python
# Named `sensitive_filter` to avoid shadowing the built-in `filter`.
sensitive_filter = SensitiveFilter()
sensitive_filter.add_custom_words("custom_category", ["word1", "word2", "word3"])
```
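
How `add_custom_words` stores its input is not shown in the source. A minimal sketch, assuming custom words are kept per category and merged into the base lexicon on demand, might look like this. `CustomWordStore` and its `merged` method are hypothetical stand-ins, not the real `SensitiveFilter` internals.

```python
class CustomWordStore:
    """Hypothetical stand-in for the custom-word part of SensitiveFilter."""

    def __init__(self) -> None:
        self._custom = {}  # category name -> list of custom words

    def add_custom_words(self, category: str, words: list) -> None:
        """Register words under a category, skipping blanks and duplicates."""
        bucket = self._custom.setdefault(category, [])
        for word in words:
            if word and word not in bucket:
                bucket.append(word)

    def merged(self, base: dict) -> dict:
        """Return a copy of `base` with the custom words merged in."""
        result = {category: list(words) for category, words in base.items()}
        for category, words in self._custom.items():
            target = result.setdefault(category, [])
            for word in words:
                if word not in target:
                    target.append(word)
        return result


store = CustomWordStore()
store.add_custom_words("custom_category", ["word1", "word2"])
store.add_custom_words("adult", ["word3", "赌博"])
base_lexicon = {"adult": ["色情", "赌博"]}
lexicon = store.merged(base_lexicon)
```

Copying the base lists before merging keeps the shared base lexicon unchanged, so custom words added on one filter instance do not leak into others.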

## Detection Flow

```
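1. Load the sensitive-word categories configured for the platform
2. Merge the base sensitive-word lexicon
3. Add the custom sensitive words
4. Scan the content for each word
5. Replace matches and record them
6. Return the filter result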
```
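
The six steps can be put together in one short end-to-end sketch. It is written under several assumptions: `run_filter` is a hypothetical function rather than the real `SensitiveFilter.filter`, matching is plain substring search, custom words always apply regardless of platform categories, and `max_tolerance` is ignored because the source does not define how it is enforced.

```python
from dataclasses import dataclass, field


@dataclass
class FilterResult:
    filtered_content: str
    found_words: list = field(default_factory=list)
    replacements: dict = field(default_factory=dict)


def run_filter(content, categories, base_words, custom_words=None):
    """Hypothetical walk-through of the six detection steps."""
    # Steps 1-3: gather words from the enabled categories, then custom words.
    candidates = []
    for category in categories:
        candidates.extend(base_words.get(category, []))
    for words in (custom_words or {}).values():
        candidates.extend(words)
    # Longest first, so a long word is masked before any shorter word inside it.
    candidates = sorted(set(candidates), key=lambda w: (-len(w), w))

    # Steps 4-5: scan, replace, and record.
    result = FilterResult(filtered_content=content)
    for word in candidates:
        if word and word in result.filtered_content:
            mask = "*" * len(word)
            result.filtered_content = result.filtered_content.replace(word, mask)
            result.found_words.append(word)
            result.replacements[word] = mask

    # Step 6: return the result.
    return result


result = run_filter(
    "投资股票不如远离赌博",
    categories=["adult"],
    base_words={"adult": ["赌博"], "finance": ["投资", "股票"]},
    custom_words={"custom_category": ["远离"]},
)
```

In this run the `finance` words stay untouched because only the `adult` category is enabled, while the custom word is masked alongside the base word.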