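# Platform Sensitive-Word Lexicon

## Overview

This document describes the sensitive-word categories and the sensitive-word configuration for each platform.

## Sensitive-Word Categories

### SENSITIVE_WORDS

Location: `backend/app/services/content/sensitive_filter.py`

```python
SENSITIVE_WORDS = {
    # Political terms: place names, separatism, protest vocabulary
    "politics": [
        "台湾", "西藏", "新疆", "香港", "澳门",
        "分裂", "独立", "抗议", "游行", "示威",
        "政治", "敏感词",
    ],
    # Medical terms: drugs, treatment claims, medical services
    "medical": [
        "药品", "治疗", "疗效", "治愈", "处方",
        "医生", "医院", "手术", "医疗", "敏感词",
    ],
    # Financial terms: investment products and returns
    "finance": [
        "投资", "理财", "收益率", "回报",
        "股票", "基金", "债券", "期货",
    ],
    # Adult content: pornography, gambling, drugs, violence
    "adult": [
        "色情", "赌博", "毒品", "暴力",
    ],
}
```

### SENSITIVE_CATEGORIES

Location: `backend/app/services/distribution/platform_rules.py`

```python
SENSITIVE_CATEGORIES = {
    "politics": ["政治敏感词库"],  # political lexicon
    "medical": ["医疗敏感词库"],   # medical lexicon
    "finance": ["金融敏感词库"],   # financial lexicon
    "adult": ["低俗敏感词库"],     # vulgar-content lexicon
}
```

## Platform Sensitive-Word Configuration

Each snippet below is the `sensitive_words` entry for one platform.

### Zhihu (zhihu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,  # zero tolerance
    "auto_filter": True,
}
```

### WeChat Official Accounts (wechat)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Baijiahao (baijiahao)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Toutiao (toutiao)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "medical", "finance", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Weibo (weibo)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],  # politics and adult content only
    "max_tolerance": 2,  # a small number of occurrences is allowed
    "auto_filter": True,
}
```

### Xiaohongshu (xiaohongshu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["adult"],  # adult content only
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Bilibili (bilibili)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Jianshu (jianshu)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Juejin (juejin)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics"],  # political content only
    "max_tolerance": 0,
    "auto_filter": True,
}
```

### Douyin (douyin)

```python
"sensitive_words": {
    "check_required": True,
    "categories": ["politics", "adult"],
    "max_tolerance": 0,
    "auto_filter": True,
}
```

## Filtering Mechanism

### SensitiveFilter

Location: `backend/app/services/content/sensitive_filter.py`

```python
class SensitiveFilter:
    def filter(self, content: str, platform: str) -> FilterResult:
        """
        Filter sensitive words.

        1. Load the platform's sensitive-word configuration.
        2. Merge the base lexicon with the custom lexicon.
        3. Check each word and replace every match.
        """
```

### Filter Result

```python
from dataclasses import dataclass


@dataclass
class FilterResult:
    filtered_content: str  # content after filtering
    found_words: list      # sensitive words that were found
    replacements: dict     # mapping from each found word to its replacement
```

### Replacement Rules

- Each sensitive word is replaced with `*` characters.
- The number of `*` characters equals the length of the original word, so a two-character word becomes `**`.

## AI Writing Patterns

### AI_PATTERNS

Location: `backend/app/services/distribution/platform_rules.py`

```python
AI_PATTERNS = {
    # Stock transition phrases ("in short", "in summary", "firstly", ...)
    "banned_transitions": [
        "总之", "综上所述", "值得注意的是", "让我们", "总而言之",
        "不可否认", "毋庸置疑", "首先", "其次", "最后",
        "最后但同样重要", "换句话说", "也就是说", "更重要的是", "可以说",
    ],
    # Inflated modifiers ("crucial", "indispensable", "major breakthrough", ...)
    "banned_modifiers": [
        "至关重要", "不可或缺", "举足轻重", "蓬勃发展", "日新月异",
        "深远影响", "全面提升", "显著成效", "重大突破", "核心要素",
    ],
    # Formulaic sentence structures, expressed as regular expressions
    "banned_structures": [
        r"第一[,、].*第二[,、].*第三",  # symmetric three-part structure
        r"一方面[,、].*另一方面",        # "on one hand ... on the other hand"
    ],
    # Phrases considered safe ("research shows", "in fact", "honestly", ...)
    "safe_patterns": [
        "根据研究表明", "调研数据显示", "经验告诉我们", "事实上",
        "说白了", "说实话", "说真的",
    ],
}
```

## Platform AI Sensitivity

| Platform | Detection level | humanization_required |
|----------|-----------------|-----------------------|
| Zhihu | high | true |
| WeChat | medium | true |
| Baijiahao | high | true |
| Toutiao | high | true |
| Weibo | low | false |
| Xiaohongshu | low | false |
| Bilibili | medium | true |
| Jianshu | medium | true |
| Juejin | high | true |
| Douyin | low | false |

## Custom Sensitive Words

Custom sensitive words can be added at runtime:

```python
# Named sensitive_filter so the built-in filter() is not shadowed
sensitive_filter = SensitiveFilter()
sensitive_filter.add_custom_words("custom_category", ["word1", "word2", "word3"])
```

## Detection Flow

```
1. Get the sensitive-word categories configured for the platform
2. Merge the base sensitive-word lexicon
3. Add custom sensitive words
4. Iterate over the words and check the content
5. Replace matches and record them
6. Return the filter result
```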