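Platform Sensitive-Word Lexicon
Overview
This document describes the sensitive-word categories and the sensitive-word configuration of each supported platform.
Sensitive-Word Categories
SENSITIVE_WORDS
Location: backend/app/services/content/sensitive_filter.py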
SENSITIVE_WORDS = {
    "politics": [
        "台湾", "西藏", "新疆", "香港", "澳门",
        "分裂", "独立", "抗议", "游行", "示威",
        "政治", "敏感词",
    ],
    "medical": [
        "药品", "治疗", "疗效", "治愈",
        "处方", "医生", "医院", "手术",
        "医疗", "敏感词",
    ],
    "finance": [
        "投资", "理财", "收益率", "回报",
        "股票", "基金", "债券", "期货",
    ],
    "adult": [
        "色情", "赌博", "毒品", "暴力",
    ],
}
SENSITIVE_CATEGORIES
Location: backend/app/services/distribution/platform_rules.py
SENSITIVE_CATEGORIES = {
    "politics": ["政治敏感词库"],  # political lexicon
    "medical": ["医疗敏感词库"],   # medical lexicon
    "finance": ["金融敏感词库"],   # financial lexicon
    "adult": ["低俗敏感词库"],     # vulgar-content lexicon
}
Per-Platform Sensitive-Word Configuration
Zhihu (知乎, zhihu)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,  # zero tolerance
    "auto_filter": True,
}
WeChat Official Accounts (微信公众号, wechat)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
Baijiahao (百家号, baijiahao)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
Toutiao (今日头条, toutiao)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
Weibo (微博, weibo)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],  # only political and vulgar content is checked
    "max_tolerance": 2,  # a small number of occurrences is allowed
    "auto_filter": True,
}
Xiaohongshu (小红书, xiaohongshu)
"sensitive_words": {
    "check_required": True,
    "categories": ["adult"],  # only vulgar content is checked
    "max_tolerance": 0,
    "auto_filter": True,
}
Bilibili (B站, bilibili)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
Jianshu (简书, jianshu)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
Juejin (掘金, juejin)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics"],  # only political content is checked
    "max_tolerance": 0,
    "auto_filter": True,
}
Douyin (抖音, douyin)
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
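The `categories` and `max_tolerance` fields above could be applied roughly as follows. This is a minimal sketch, not the project's implementation: the helper names `count_hits` and `within_tolerance` are hypothetical, the lexicon is a short excerpt, and it assumes that `max_tolerance` is the maximum number of sensitive-word occurrences allowed across the configured categories.

```python
# Excerpt of the lexicon, for illustration only.
SENSITIVE_WORDS = {
    "politics": ["抗议", "游行"],
    "finance": ["股票", "基金"],
    "adult": ["赌博", "暴力"],
}

# Rule fragments copied from the Weibo and Juejin sections.
WEIBO_RULE = {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 2,
    "auto_filter": True,
}
JUEJIN_RULE = {
    "check_required": True,
    "categories": ["politics"],
    "max_tolerance": 0,
    "auto_filter": True,
}


def count_hits(content: str, rule: dict, lexicon: dict = SENSITIVE_WORDS) -> int:
    """Count occurrences of words from the categories the rule enables."""
    if not rule.get("check_required", False):
        return 0
    return sum(
        content.count(word)
        for category in rule["categories"]
        for word in lexicon.get(category, [])
    )


def within_tolerance(content: str, rule: dict, lexicon: dict = SENSITIVE_WORDS) -> bool:
    """Assumption: content passes when its hit count does not exceed max_tolerance."""
    return count_hits(content, rule, lexicon) <= rule["max_tolerance"]
```

Under these assumptions, a finance term does not count against a Weibo post because `finance` is not among its categories, while a single political term already fails a zero-tolerance platform such as Juejin.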
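Filtering Mechanism
SensitiveFilter
Location: backend/app/services/content/sensitive_filter.py
class SensitiveFilter:
    def filter(self, content: str, platform: str) -> FilterResult:
        """
        Filter sensitive words.

        1. Load the platform's sensitive-word configuration.
        2. Merge the base lexicon with the custom lexicon.
        3. Check each word and replace any match.
        """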
Filter Result
from dataclasses import dataclass

@dataclass
class FilterResult:
    filtered_content: str  # content after filtering
    found_words: list      # sensitive words that were found
    replacements: dict     # replacement mapping
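Replacement Rules
- Each sensitive word is replaced with `*` characters.
- The number of replacement characters equals the length of the original word.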
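The two replacement rules can be illustrated with a small standalone function. This is a sketch under stated assumptions: `mask_words` is a hypothetical helper, not a function from `sensitive_filter.py`, and masking longer words first is a design choice of this example so that a short word contained in a longer one does not break the longer match.

```python
def mask_words(content: str, words: list[str]) -> tuple[str, list[str], dict[str, str]]:
    """Replace each listed word with '*' repeated to the word's length."""
    found: list[str] = []
    replacements: dict[str, str] = {}
    # Longest words first; ties are broken alphabetically for a stable order.
    for word in sorted(set(words), key=lambda w: (-len(w), w)):
        if word and word in content:
            mask = "*" * len(word)
            content = content.replace(word, mask)
            found.append(word)
            replacements[word] = mask
    return content, found, replacements


masked, found, replacements = mask_words("这只股票的收益率很高", ["股票", "收益率"])
print(masked)  # 这只**的***很高
```

Because every mask has the same length as the word it replaces, the filtered text keeps the length of the original.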
AI Writing Characteristics
AI_PATTERNS
Location: backend/app/services/distribution/platform_rules.py
AI_PATTERNS = {
    "banned_transitions": [
        "总之", "综上所述", "值得注意的是", "让我们",
        "总而言之", "不可否认", "毋庸置疑",
        "首先", "其次", "最后", "最后但同样重要",
        "换句话说", "也就是说", "更重要的是", "可以说",
    ],
    "banned_modifiers": [
        "至关重要", "不可或缺", "举足轻重", "蓬勃发展",
        "日新月异", "深远影响", "全面提升", "显著成效",
        "重大突破", "核心要素",
    ],
    "banned_structures": [
        r"第一[,、].*第二[,、].*第三",  # symmetric three-part structure
        r"一方面[,、].*另一方面",
    ],
    "safe_patterns": [
        "根据研究表明", "调研数据显示", "经验告诉我们",
        "事实上", "说白了", "说实话", "说真的",
    ],
}
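One way such a pattern table might be applied to a draft is sketched below. The function `find_ai_patterns` is hypothetical and the table is a short excerpt. The sketch assumes that the phrase lists are matched as plain substrings and that the `banned_structures` entries are regular expressions passed to `re.search`.

```python
import re

# Excerpt of AI_PATTERNS, for illustration only.
AI_PATTERNS = {
    "banned_transitions": ["总之", "综上所述", "首先"],
    "banned_modifiers": ["至关重要", "不可或缺"],
    "banned_structures": [
        r"第一[,、].*第二[,、].*第三",
        r"一方面[,、].*另一方面",
    ],
}


def find_ai_patterns(text: str, patterns: dict = AI_PATTERNS) -> dict:
    """Return the banned phrases and structures found in text, grouped by key."""
    hits = {}
    # Phrase lists: plain substring matching.
    for key in ("banned_transitions", "banned_modifiers"):
        matched = [phrase for phrase in patterns.get(key, []) if phrase in text]
        if matched:
            hits[key] = matched
    # Structures: regular expressions searched anywhere in the text.
    structures = [p for p in patterns.get("banned_structures", []) if re.search(p, text)]
    if structures:
        hits["banned_structures"] = structures
    return hits


sample = "首先,这一点至关重要。一方面,成本下降;另一方面,效率提升。"
report = find_ai_patterns(sample)
```

Note that `.` does not match a newline by default, so under this sketch a structure that spans several lines would go undetected unless `re.DOTALL` were passed.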
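Platform AI-Detection Sensitivity
| Platform | Detection level | humanization_required |
|---|---|---|
| Zhihu (知乎) | high | true |
| WeChat (微信) | medium | true |
| Baijiahao (百家号) | high | true |
| Toutiao (头条) | high | true |
| Weibo (微博) | low | false |
| Xiaohongshu (小红书) | low | false |
| Bilibili (B站) | medium | true |
| Jianshu (简书) | medium | true |
| Juejin (掘金) | high | true |
| Douyin (抖音) | low | false |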
Custom Sensitive Words
Custom sensitive words can be added under a named category:
sensitive_filter = SensitiveFilter()
sensitive_filter.add_custom_words("custom_category", ["词1", "词2", "词3"])
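Detection Flow
1. Read the sensitive-word categories configured for the platform.
2. Merge in the base sensitive-word lexicon.
3. Add the custom sensitive words.
4. Scan the content for each word.
5. Replace matches and record them.
6. Return the filter result.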
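The six steps can be put together in a compact sketch. Everything here is illustrative rather than the real `SensitiveFilter`: the class name `SketchFilter`, the `BASE_WORDS` excerpt, and the `PLATFORM_CATEGORIES` lookup are hypothetical, and the sketch assumes that custom words are checked on every platform regardless of their category name.

```python
from dataclasses import dataclass

# Hypothetical excerpts standing in for the real lexicon and platform rules.
BASE_WORDS = {"politics": ["抗议"], "adult": ["赌博"], "finance": ["股票"]}
PLATFORM_CATEGORIES = {"juejin": ["politics"], "weibo": ["politics", "adult"]}


@dataclass
class FilterResult:
    filtered_content: str
    found_words: list
    replacements: dict


class SketchFilter:
    def __init__(self) -> None:
        self.custom_words: dict = {}

    def add_custom_words(self, category: str, words: list) -> None:
        self.custom_words.setdefault(category, []).extend(words)

    def filter(self, content: str, platform: str) -> FilterResult:
        # Step 1: categories configured for the platform.
        categories = PLATFORM_CATEGORIES.get(platform, [])
        # Step 2: base lexicon entries for those categories.
        words = [w for c in categories for w in BASE_WORDS.get(c, [])]
        # Step 3: custom words (assumed to apply to every platform).
        for extra in self.custom_words.values():
            words.extend(extra)
        found, replacements = [], {}
        # Steps 4 and 5: scan, mask, and record each match.
        for word in words:
            if word in content:
                mask = "*" * len(word)
                content = content.replace(word, mask)
                found.append(word)
                replacements[word] = mask
        # Step 6: return the result.
        return FilterResult(content, found, replacements)


sketch = SketchFilter()
sketch.add_custom_words("custom_category", ["词1"])
result = sketch.filter("股票和抗议以及词1", "juejin")
```

In this run the finance term is left untouched because Juejin only enables the `politics` category, while the custom word is masked.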