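GEO Workflow Business Analysis and Development Plan

This document consolidates the business-rationale analysis of the GEO platform Workflow, the technical design for its missing nodes, and the phased development plan. It is the authoritative reference for subsequent development.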

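Part 1: GEO Workflow Business Rationale Analysis

1.1 Diagnosis of the Current Flow

Original 7-step flow:

1. Analyze the customer's industry, products, and selling points
2. Use AI to analyze the answer patterns of mainstream AI engines
3. Identify "what kind of content AI will cite"
4. Generate an optimization strategy
5. Execute the optimization
6. Run periodic checks
7. Generate the GEO performance report

Problems identified:

Problem	Description	Impact
No competitor benchmarking	The original flow looks only at the customer's own brand and never compares it with competitors in AI answers	Relative competitiveness cannot be measured, so the optimization direction is unclear
No post-publication verification	After step 5 executes the optimization, nothing verifies its effect right away	There is no quick way to confirm that an optimization worked, which wastes iteration cycles
Steps 2 and 3 can be merged	Answer-pattern analysis and citation-pattern identification are two analytical dimensions of the same data source	The flow is redundant and harder for users to understand
No feedback loop	Detection results should feed back into strategy generation to form an iterative optimization cycle	Continuous improvement is impossible and the optimization effect diminishes over time
No customer profiling	Step 1 is too simple and does not analyze how the target audience uses AI	Strategies are not targeted and cannot reach the intended users precisely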

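Mapping to the existing code:

Capabilities already implemented in the codebase:

CitationEngine: queries and citation detection across 7 AI platforms
QueryScheduler: APScheduler-based scheduled query dispatch
BrandMatcher: brand matching (exact/alias/fuzzy)
CompetitorDetector: competitor detection (limited to a predefined brand list)
OptimizationAdvisor: optimization suggestions generated from rules and an LLM
CitationRecord: citation record data model (including sentiment-analysis and citation-source fields)

Gaps in the current codebase:

No citation pattern recognition engine (only brand matching is done today; AI citation preferences are not analyzed)
No GEO performance report generation (only scores exist, no trend report)
No Workflow orchestration engine (the current Pipeline is used only for content generation and does not cover the full GEO flow)
No automated execution of content optimization (suggestions are generated but never applied)
No immediate effect verification (no automatic re-query after an optimization)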

1.2 Optimized Flow Design (10-Step Closed Loop)

┌────────────────────────────────────────────────────────────────────────────────┐
│                          GEO 10-step closed-loop flow                          │
├────────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│  [1] Brand profile ──→ [2] Competitor baseline ──→ [3] AI engine query         │
│                                                      │                         │
│                                                      ▼                         │
│                 ┌────→ [5] Strategy generation ←── [4] Citation patterns       │
│                 │        │                                                     │
│                 │        ▼                                                     │
│                 │      [6] Content optimization ──→ [7] Immediate verification │
│                 │                                     │                        │
│                 │                                     ▼                        │
│               [10] Iteration ←── [9] GEO report ←── [8] Periodic monitoring    │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

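Step 1: Brand profile construction

Attribute	Description
Input	Brand information, industry, products, selling points
Output	Brand profile (industry classification, target audience, differentiated value)
Automation	Semi-automatic (AI-assisted analysis + human confirmation)
Existing basis	The Brand model already has name/aliases/industry fields; the profile dimensions need to be extended

Proposed new field:

# New field on the Brand model
profile_data: Mapped[dict | None] = mapped_column(
    JSONType, nullable=True,
    comment="Brand profile data: {target_audience, differentiators, key_products, ai_usage_habits}"
)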

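Step 2: Competitor baseline setup

Attribute	Description
Input	Competitor list, target query terms
Output	Baseline data on competitors' AI visibility
Automation	Semi-automatic (automatic queries + human confirmation of competitors)
Existing basis	The Competitor model already exists, and CompetitorDetector implements basic competitor detection

Improvement: CompetitorDetector.KNOWN_BRANDS currently holds predefined brands for only 3 industries; the competitor list should instead be loaded dynamically from the database.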
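
One way to picture the move from a hard-coded list to dynamic loading is the minimal sketch below. It is not the project's implementation: `DynamicBrandRegistry`, its `loader` callback, and the sample rows are hypothetical, and the loader stands in for whatever query reads the competitors table.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class DynamicBrandRegistry:
    """Hypothetical stand-in for a hard-coded KNOWN_BRANDS mapping."""

    # Assumed callback: yields (industry, brand_name) rows, e.g. read from the database.
    loader: Callable[[], Iterable[tuple[str, str]]]
    _cache: dict[str, set[str]] = field(default_factory=dict)

    def refresh(self) -> None:
        """Rebuild the industry -> brand-name mapping from the loader."""
        fresh: dict[str, set[str]] = {}
        for industry, name in self.loader():
            fresh.setdefault(industry, set()).add(name.strip())
        self._cache = fresh

    def brands_for(self, industry: str) -> set[str]:
        """Return a copy of the known brand names for an industry."""
        if not self._cache:
            self.refresh()
        return set(self._cache.get(industry, set()))


def fake_loader() -> list[tuple[str, str]]:
    # Sample rows for illustration only; real data would come from the database.
    return [("skincare", "BrandA"), ("skincare", "BrandB "), ("auto", "BrandC")]


registry = DynamicBrandRegistry(loader=fake_loader)
print(registry.brands_for("skincare"))
```

Calling `refresh()` after a competitor is added or removed keeps detection in step with the database without a redeploy.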

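Step 3: AI engine query analysis

Attribute	Description
Input	Query term list, brand name, competitor names
Output	Answer data from each AI engine (citation sources, citation position, citation frequency)
Automation	Fully automatic (scheduled parallel queries)
Existing basis	CitationEngine already queries 7 platforms, and BasePlatformAdapter defines the adapter base class

Existing platform adapters:

Adapter	File	API source
WenxinAdapter	wenxin.py	Wenxin Yiyan (文心一言) API
KimiAdapter	kimi.py	Moonshot API
TongyiAdapter	tongyi.py	Tongyi Qianwen (通义千问) API
DoubaoAdapter	doubao.py	Doubao (豆包) API
QingyanAdapter	qingyan.py	Qingyan (清言) API
TiangongAdapter	tiangong.py	Tiangong (天工) API
XinghuoAdapter	xinghuo.py	Xinghuo (星火) API

Improvement: add ChatGPT and Perplexity adapters, implement batch parallel queries, and store query results in structured form.

Step 4: Citation pattern recognition

Attribute	Description
Input	AI engine answer data
Output	Citation pattern report (content types AI prefers, structural features, authority signals)
Automation	Fully automatic (rule engine + LLM analysis)
Existing basis	citation_extractor.py already extracts citation sources and needs to be extended into pattern recognition

New capability: move from single-query citation detection to cross-query pattern analysis that identifies the content preferences of each AI engine.

Step 5: Optimization strategy generation

Attribute	Description
Input	Brand profile + citation pattern report + competitor baseline
Output	Optimization strategy list (Schema/FAQ/authoritative citations/content format)
Automation	Semi-automatic (AI generation + human review)
Existing basis	OptimizationAdvisor already generates 5 categories of suggestions; its input sources need to be extended

Improvement: strategy generation is currently driven only by score gaps; citation pattern analysis results should be added as an input so that strategies are more precise.

Step 6: Content optimization execution

Attribute	Description
Input	Optimization strategy list
Output	Optimized content / structured data
Automation	Semi-automatic (AI generation + human review before publishing)
Existing basis	ContentPipeline already implements the content processing pipeline

Improvement: add a Schema generator and an FAQ generator so that strategies can be turned into content automatically.

Step 7: Immediate effect verification

Attribute	Description
Input	Optimized content
Output	Comparison of AI engine re-query results
Automation	Fully automatic
Existing basis	CitationEngine can be reused; before/after comparison logic needs to be added

New capability: after an optimization is executed, automatically trigger a re-query of the AI engines and compare citations before and after the change.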
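
The before/after comparison could look roughly like the sketch below. `EngineSnapshot` and `compare_snapshots` are hypothetical names, and the counts are sample values; in the real flow the snapshots would be aggregated from stored query results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSnapshot:
    """Citation counts for one engine at one point in time (hypothetical type)."""

    engine: str
    total_queries: int
    cited_queries: int

    @property
    def citation_rate(self) -> float:
        # Guard against division by zero when no queries were run.
        return self.cited_queries / self.total_queries if self.total_queries else 0.0


def compare_snapshots(
    before: list[EngineSnapshot], after: list[EngineSnapshot]
) -> dict[str, float]:
    """Return per-engine citation-rate deltas (after minus before)."""
    before_by_engine = {snap.engine: snap for snap in before}
    deltas: dict[str, float] = {}
    for snap in after:
        baseline = before_by_engine.get(snap.engine)
        if baseline is None:
            continue  # no baseline for this engine, so nothing to compare
        deltas[snap.engine] = round(snap.citation_rate - baseline.citation_rate, 4)
    return deltas


before = [EngineSnapshot("kimi", 10, 2)]
after = [EngineSnapshot("kimi", 10, 5), EngineSnapshot("doubao", 10, 1)]
deltas = compare_snapshots(before, after)
print(deltas)
```

Engines without a baseline are skipped rather than reported as improvements, which avoids overstating the effect of an optimization.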

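Step 8: Periodic monitoring

Attribute	Description
Input	Monitoring configuration (frequency, metrics, thresholds)
Output	Monitoring data, alert notifications
Automation	Fully automatic (scheduled jobs + alerts)
Existing basis	QueryScheduler already runs scheduled queries, and AlertEngine already handles alerts

Improvement: add custom monitoring metrics and threshold configuration, and support smart alerts based on changes in citation rate.

Step 9: GEO performance report

Attribute	Description
Input	Monitoring data + optimization records
Output	Performance report (trends, comparisons, ROI)
Automation	Semi-automatic (automatic generation + human interpretation)
Existing basis	ScoringService already computes scores and needs to be extended into trend reporting

New capability: trend analysis over multiple time ranges, before/after optimization comparison, and side-by-side competitor comparison.

Step 10: Iterative optimization (feedback loop)

Attribute	Description
Input	Performance report + monitoring data
Output	The next round of optimization strategies
Automation	Semi-automatic (AI suggestions + human decision)
Existing basis	OptimizationAdvisor can be reused; feedback inputs need to be added

New capability: feed the performance report back into strategy generation automatically, closing the iteration loop.

1.3 Comparison with Industry Practice

Dimension	Our flow	Profound	Otterly AI	新榜智汇
Multi-engine queries	7+ engines (extensible)	3 engines	4 engines	3 domestic (China) engines
Competitor benchmarking	✅ closed loop	✅	✅	❌
Citation pattern analysis	✅ rules + LLM	✅	❌	❌
Automated optimization execution	Semi-automatic	Semi-automatic	Manual	Manual
Performance report	✅ trends + comparison	✅	✅	✅
Feedback loop	✅ 10-step closed loop	❌	❌	❌
Immediate effect verification	✅	❌	❌	❌

Differentiation: the feedback loop and immediate effect verification are our core differentiators; per the comparison above, none of the three products listed offers either.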

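Part 2: Missing Nodes, Their Necessity, and Technical Design

2.1 Node Priority Matrix

Node	Importance	Urgency	Technical difficulty	Recommended phase	Existing basis
AI engine query analysis	P0	High	Medium	Phase 1	CitationEngine + 7 adapters
Citation pattern recognition	P0	High	High	Phase 1	citation_extractor
Scheduled automatic detection	P1	High	Low	Phase 1	QueryScheduler + APScheduler
GEO performance report	P1	Medium	Medium	Phase 2	ScoringService
AI-analysis-based strategy generation	P1	Medium	Medium	Phase 2	OptimizationAdvisor
Workflow engine	P2	Low	High	Phase 3	PipelineEngine
Automated website content optimization	P2	Low	High	Phase 3	ContentPipeline

2.2 Detailed Technical Design per Node

Node 1: AI engine query analysis

Current state: CitationEngine already implements queries and citation detection on 7 platforms, but has these shortcomings:

No batch parallel queries (execution is currently serial)
No support for international platforms such as ChatGPT and Perplexity
Query results are stored only as CitationRecord, with no structured answer analysis

Data model: add AIQueryResult (new table ai_query_results) alongside the existing CitationRecord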

class AIQueryResult(Base):
    """AI engine query result: stores the full query response for later analysis."""
    __tablename__ = "ai_query_results"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    brand_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("brands.id", ondelete="CASCADE"), nullable=False)
    query_keyword: Mapped[str] = mapped_column(String(200), nullable=False)
    engine: Mapped[str] = mapped_column(String(50), nullable=False, comment="AI engine identifier")
    raw_response: Mapped[str | None] = mapped_column(Text, nullable=True)
    response_length: Mapped[int | None] = mapped_column(Integer, nullable=True)
    cited: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
    citation_position: Mapped[int | None] = mapped_column(Integer, nullable=True)
    citation_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    source_urls: Mapped[list | None] = mapped_column(JSON, nullable=True)
    source_titles: Mapped[list | None] = mapped_column(JSON, nullable=True)
    confidence: Mapped[float | None] = mapped_column(Float, nullable=True)
    match_type: Mapped[str | None] = mapped_column(String(20), nullable=True)
    competitor_brands: Mapped[list] = mapped_column(JSON, default=list)
    queried_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)

    __table_args__ = (
        Index("idx_ai_query_results_brand_id", "brand_id"),
        Index("idx_ai_query_results_engine", "engine"),
        Index("idx_ai_query_results_queried_at", "queried_at"),
        Index("idx_ai_query_results_brand_engine", "brand_id", "engine"),
    )

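API design:

POST /api/v1/ai-engines/query          # Single-engine query
POST /api/v1/ai-engines/query-batch    # Batch parallel query
GET  /api/v1/ai-engines/results        # List query results (pagination and filtering)
GET  /api/v1/ai-engines/results/{id}   # Get one query result in detail

Core logic: extend the adapter pattern on top of the existing BasePlatformAdapter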

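# Extends the existing BasePlatformAdapter.
# BasePlatformAdapter and CitationInfo are project types that are not defined in this document.
from abc import abstractmethod
from dataclasses import dataclass
from datetime import datetime


# Defined before the adapter so the return annotation below resolves at class-creation time.
@dataclass
class StructuredQueryResult:
    engine: str
    keyword: str
    raw_response: str
    citations: list[CitationInfo]
    response_length: int
    queried_at: datetime
    success: bool
    error_message: str | None = None


class EnhancedPlatformAdapter(BasePlatformAdapter):
    """Enhanced platform adapter with structured responses."""

    @abstractmethod
    async def query_structured(self, keyword: str) -> StructuredQueryResult:
        """Run the query and return a structured result."""

    @abstractmethod
    async def health_check(self) -> bool:
        """Check whether the platform API is available."""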

New adapters:

# ChatGPT adapter, based on the OpenAI API
class ChatGPTAdapter(EnhancedPlatformAdapter):
    platform_name = "chatgpt"
    platform_url = "https://chat.openai.com"
    _api_base = "https://api.openai.com/v1"
    _model = "gpt-4o"

# Perplexity adapter, based on the Perplexity API
class PerplexityAdapter(EnhancedPlatformAdapter):
    platform_name = "perplexity"
    platform_url = "https://www.perplexity.ai"
    _api_base = "https://api.perplexity.ai"
    _model = "pplx-70b-online"  # check against Perplexity's current model list before use

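Batch parallel query service:

import asyncio
import logging
import uuid

logger = logging.getLogger(__name__)


class BatchQueryService:
    """Batch parallel query service."""

    def __init__(self):
        self.engine = CitationEngine()
        self.enhanced_adapters: dict[str, EnhancedPlatformAdapter] = {}

    async def query_batch(
        self,
        brand_id: uuid.UUID,
        keywords: list[str],
        engines: list[str],
        target_brand: str,
        brand_aliases: list[str] | None = None,
    ) -> list[AIQueryResult]:
        """Query several engines in parallel; failed tasks are logged and skipped."""
        tasks = [
            self._query_single(brand_id, keyword, engine_name, target_brand, brand_aliases)
            for keyword in keywords
            for engine_name in engines
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        succeeded = []
        for result in results:
            if isinstance(result, BaseException):
                # Do not drop failures silently: they matter for monitoring and retries.
                logger.warning("AI engine query failed: %r", result)
            else:
                succeeded.append(result)
        return succeeded

    async def _query_single(self, brand_id, keyword, engine, target_brand, brand_aliases) -> AIQueryResult:
        adapter = self.enhanced_adapters.get(engine)
        if adapter is None:
            raise ValueError(f"Unsupported engine: {engine}")
        result = await adapter.query_structured(keyword)
        record = AIQueryResult(
            brand_id=brand_id,
            query_keyword=keyword,
            engine=engine,
            raw_response=result.raw_response,
            response_length=result.response_length,
        )
        # ... store in the database
        # The record must be returned; otherwise query_batch has nothing to collect.
        return record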
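
Fanning out every keyword/engine pair at once can exceed platform rate limits, so a bounded-concurrency variant may be worth considering. The sketch below is self-contained and makes assumptions: `run_batch`, `QueryOutcome`, `fake_query`, and the `max_concurrency` default are illustrative names and values, not part of the existing codebase.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class QueryOutcome:
    """Result of one (keyword, engine) query, successful or not."""

    keyword: str
    engine: str
    ok: bool
    detail: str


async def run_batch(
    keywords: list[str],
    engines: list[str],
    query_fn: Callable[[str, str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[QueryOutcome]:
    """Run every (keyword, engine) pair with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(keyword: str, engine: str) -> QueryOutcome:
        async with semaphore:
            try:
                text = await query_fn(keyword, engine)
                return QueryOutcome(keyword, engine, True, text)
            except Exception as exc:
                # Capture the failure instead of losing it.
                return QueryOutcome(keyword, engine, False, f"{type(exc).__name__}: {exc}")

    tasks = [guarded(k, e) for k in keywords for e in engines]
    return await asyncio.gather(*tasks)  # results keep the order of tasks


async def fake_query(keyword: str, engine: str) -> str:
    # Stand-in for an adapter call; "broken" simulates an unsupported engine.
    if engine == "broken":
        raise ValueError("unsupported engine")
    await asyncio.sleep(0)
    return f"{engine} answer for {keyword}"


outcomes = asyncio.run(run_batch(["geo"], ["kimi", "broken"], fake_query))
print(outcomes)
```

Returning an outcome object for failures as well as successes lets the caller decide whether to retry, alert, or simply record the gap.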

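External dependencies:

OpenAI API (ChatGPT adapter)
Perplexity API (Perplexity adapter)
The 7 existing platform APIs (already integrated)

Node 2: Citation pattern recognition

Current state: citation_extractor.py already extracts citation sources (URL, title, context) but cannot analyze patterns across queries.

Data model: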

class CitationPattern(Base):
    """Citation pattern: regularities in an AI engine's citation preferences."""
    __tablename__ = "citation_patterns"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    brand_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("brands.id", ondelete="CASCADE"), nullable=False)
    pattern_type: Mapped[str] = mapped_column(
        String(50), nullable=False,
        comment="Pattern type: content_structure/authority_signal/citation_format/topic_preference"
    )
    engine: Mapped[str] = mapped_column(String(50), nullable=False, comment="AI engine identifier")
    frequency: Mapped[float] = mapped_column(Float, default=0.0, comment="How often the pattern occurs")
    content_features: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="Content feature description")
    engine_preference: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="Engine preference data")
    sample_count: Mapped[int] = mapped_column(Integer, default=0, comment="Number of samples analyzed")
    confidence: Mapped[float] = mapped_column(Float, default=0.0, comment="Pattern confidence")
    analyzed_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)

    __table_args__ = (
        Index("idx_citation_patterns_brand_id", "brand_id"),
        Index("idx_citation_patterns_pattern_type", "pattern_type"),
        Index("idx_citation_patterns_engine", "engine"),
    )

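API design:

POST /api/v1/citation-patterns/analyze    # Trigger citation pattern analysis
GET  /api/v1/citation-patterns            # List patterns (pagination and filtering)
GET  /api/v1/citation-patterns/{id}       # Get pattern details
GET  /api/v1/citation-patterns/summary    # Get pattern summary (aggregated by engine)

Core logic: rule engine (structured feature extraction) + LLM (semantic analysis)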

import uuid

from sqlalchemy.ext.asyncio import AsyncSession


class CitationPatternEngine:
    """Citation pattern recognition engine."""

    def __init__(self):
        self.rule_analyzers = {
            "content_structure": ContentStructureAnalyzer(),
            "authority_signal": AuthoritySignalAnalyzer(),
            "citation_format": CitationFormatAnalyzer(),
            "topic_preference": TopicPreferenceAnalyzer(),
        }

    async def analyze(self, brand_id: uuid.UUID, db: AsyncSession) -> list[CitationPattern]:
        """Analyze citation patterns for one brand."""
        # 1. Fetch all query results for the brand
        results = await self._fetch_query_results(brand_id, db)

        # 2. Rule engine: analyze structured features
        patterns = []
        for pattern_type, analyzer in self.rule_analyzers.items():
            rule_patterns = analyzer.analyze(results)
            patterns.extend(rule_patterns)

        # 3. LLM semantic analysis (finds patterns the rule engine cannot)
        if settings.ENABLE_LLM:
            llm_patterns = await self._llm_analyze(results)
            patterns.extend(llm_patterns)

        # 4. Persist and return
        for pattern in patterns:
            db.add(pattern)
        await db.commit()
        return patterns


class ContentStructureAnalyzer:
    """Content structure analyzer: identifies the content structures AI prefers."""

    def analyze(self, results: list[AIQueryResult]) -> list[CitationPattern]:
        patterns = []
        # Analyze structural features of the cited content
        # - frequency of lists/tables
        # - frequency of FAQ-style formatting
        # - heading hierarchy
        # - frequency of data/statistics
        return patterns


class AuthoritySignalAnalyzer:
    """Authority signal analyzer: identifies the authority signals AI prefers."""

    def analyze(self, results: list[AIQueryResult]) -> list[CitationPattern]:
        patterns = []
        # Analyze authority signals in the citation sources
        # - share of .gov/.edu/.org domains
        # - frequency of Wikipedia citations
        # - frequency of academic paper citations
        # - frequency of official website citations
        return patterns
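
The domain-share part of the authority analysis can be sketched with the standard library alone. The function name `authority_share`, the suffix tuple, and the sample URLs are assumptions for illustration; a production analyzer would likely need a richer domain classification (for example, country-level suffixes such as `.gov.cn` are not matched here).

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed suffix list, taken from the signals named in the analyzer comments.
AUTHORITY_SUFFIXES = (".gov", ".edu", ".org")


def authority_share(source_urls: list[str]) -> dict[str, float]:
    """Return the share of cited URLs that carry each authority signal."""
    counts: Counter[str] = Counter()
    total = 0
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        if not host:
            continue  # skip strings that are not parseable URLs
        total += 1
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            counts["wikipedia"] += 1
        for suffix in AUTHORITY_SUFFIXES:
            if host.endswith(suffix):
                counts[suffix] += 1
    if total == 0:
        return {}
    return {signal: round(n / total, 4) for signal, n in counts.items()}


sample_urls = [
    "https://en.wikipedia.org/wiki/Example",
    "https://www.nasa.gov/news",
    "https://example.com/blog",
    "not a url",
]
shares = authority_share(sample_urls)
print(shares)
```

A single URL can count toward more than one signal (a Wikipedia page is also a `.org` domain), so the shares are not expected to sum to 1.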

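Node 3: Scheduled automatic detection

Current state: QueryScheduler already implements APScheduler-based scheduled queries, with an hourly check and a per-minute fallback for pending tasks. It needs to be extended to support custom frequencies and smart scheduling.

Data model: extend the existing Query model

# New field on the Query model
detection_config: Mapped[dict | None] = mapped_column(
    JSONType, nullable=True,
    comment="Detection config: {metrics: [...], thresholds: {...}, alert_rules: [...]}"
)

API design:

POST /api/v1/detection/tasks          # Create a detection task
GET  /api/v1/detection/tasks          # List tasks
PUT  /api/v1/detection/tasks/{id}     # Update task configuration
DELETE /api/v1/detection/tasks/{id}   # Delete a detection task
POST /api/v1/detection/tasks/{id}/trigger  # Trigger detection manually

Core logic: extend the existing QueryScheduler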

from apscheduler.triggers.interval import IntervalTrigger


class EnhancedQueryScheduler(QueryScheduler):
    """Enhanced query scheduler with custom frequencies and smart scheduling."""

    def __init__(self):
        super().__init__()
        self._detection_configs: dict[str, DetectionConfig] = {}

    async def create_detection_task(self, config: DetectionConfig, db: AsyncSession):
        """Create a detection task."""
        # 1. Create the Query record
        query = Query(
            user_id=config.user_id,
            keyword=config.keyword,
            target_brand=config.target_brand,
            brand_aliases=config.brand_aliases,
            platforms=config.platforms,
            frequency=config.frequency,
            detection_config=config.to_dict(),
        )
        db.add(query)
        await db.commit()

        # 2. Register the scheduled job
        self._register_job(query)
        return query

    def _register_job(self, query: Query):
        """Register a scheduled job according to the query's frequency."""
        freq_map = {
            "hourly": IntervalTrigger(hours=1),
            "daily": IntervalTrigger(hours=24),
            "weekly": IntervalTrigger(days=7),
            # "monthly" is approximated as a fixed 30-day interval, not a calendar month
            "monthly": IntervalTrigger(days=30),
        }
        # Unknown frequencies fall back to weekly
        trigger = freq_map.get(query.frequency, IntervalTrigger(days=7))
        self.scheduler.add_job(
            self._execute_detection,
            trigger=trigger,
            id=f"detection_{query.id}",
            name=f"Detection task: {query.keyword}",
            args=[query.id],
            replace_existing=True,
        )
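
The scheduler relies on a `DetectionConfig` type that this document does not define. The following is one possible shape, offered as a sketch: the field defaults, the `ALLOWED_FREQUENCIES` set, and the use of a plain string for `user_id` are assumptions, while the keys returned by `to_dict()` follow the `detection_config` column comment.

```python
from dataclasses import dataclass, field

# Assumed to mirror the frequencies the scheduler knows how to map to triggers.
ALLOWED_FREQUENCIES = {"hourly", "daily", "weekly", "monthly"}


@dataclass
class DetectionConfig:
    """Hypothetical shape of a detection task configuration."""

    user_id: str  # assumption: the real model may use a UUID
    keyword: str
    target_brand: str
    platforms: list[str]
    frequency: str = "weekly"
    brand_aliases: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=lambda: ["citation_rate"])
    thresholds: dict[str, float] = field(default_factory=dict)
    alert_rules: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject unknown frequencies early instead of silently falling back.
        if self.frequency not in ALLOWED_FREQUENCIES:
            raise ValueError(f"Unsupported frequency: {self.frequency}")

    def to_dict(self) -> dict:
        """Serialize only the part stored in Query.detection_config."""
        return {
            "metrics": list(self.metrics),
            "thresholds": dict(self.thresholds),
            "alert_rules": list(self.alert_rules),
        }


config = DetectionConfig(
    user_id="user-1",
    keyword="best moisturizer",
    target_brand="BrandA",
    platforms=["kimi", "doubao"],
    thresholds={"citation_rate_drop": 0.1},
)
print(config.to_dict())
```

Validating the frequency at construction time surfaces configuration mistakes at the API boundary rather than as an unexpected weekly schedule.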

Node 4: GEO performance report

Current state: ScoringService already implements V2 scoring (5 dimensions) but has no trend analysis or report generation.

Data model:

class GEOResultReport(Base):
    """GEO performance report."""
    __tablename__ = "geo_result_reports"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    brand_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("brands.id", ondelete="CASCADE"), nullable=False)
    period_start: Mapped[datetime] = mapped_column(nullable=False, comment="Start of the reporting period")
    period_end: Mapped[datetime] = mapped_column(nullable=False, comment="End of the reporting period")
    overall_score: Mapped[float] = mapped_column(Float, nullable=False)
    previous_score: Mapped[float | None] = mapped_column(Float, nullable=True, comment="Score of the previous period")
    score_change: Mapped[float | None] = mapped_column(Float, nullable=True, comment="Score change")
    metrics: Mapped[dict] = mapped_column(JSON, nullable=False, comment="Metric data per dimension")
    trends: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="Trend data")
    competitor_comparison: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="Competitor comparison data")
    recommendations: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="AI-generated recommendations")
    generated_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)

    __table_args__ = (
        Index("idx_geo_result_reports_brand_id", "brand_id"),
        Index("idx_geo_result_reports_period", "period_start", "period_end"),
    )

API design:

POST /api/v1/reports/generate       # Generate a report
GET  /api/v1/reports                # List reports
GET  /api/v1/reports/{id}           # Get report details
GET  /api/v1/reports/trends         # Get trend data

Core logic: data aggregation + trend calculation + LLM interpretation

class GEOReportService:
    """GEO performance report generation service."""

    async def generate_report(
        self, brand_id: uuid.UUID, period: str, db: AsyncSession
    ) -> GEOResultReport:
        """Generate a GEO performance report."""
        # 1. Aggregate data for the current and previous periods
        current_metrics = await self._aggregate_metrics(brand_id, period, db)
        previous_metrics = await self._aggregate_metrics(brand_id, self._previous_period(period), db)

        # 2. Calculate trends
        trends = self._calculate_trends(current_metrics, previous_metrics)

        # 3. Compare with competitors
        competitor_comparison = await self._compare_competitors(brand_id, period, db)

        # 4. LLM interpretation (optional)
        recommendations = None
        if settings.ENABLE_LLM:
            recommendations = await self._llm_interpret(current_metrics, trends, competitor_comparison)

        # 5. Build the report
        current_score = current_metrics["overall_score"]
        previous_score = previous_metrics.get("overall_score")
        report = GEOResultReport(
            brand_id=brand_id,
            period_start=current_metrics["period_start"],
            period_end=current_metrics["period_end"],
            overall_score=current_score,
            previous_score=previous_score,
            # With no previous period, leave the change empty instead of comparing against 0
            score_change=(current_score - previous_score) if previous_score is not None else None,
            metrics=current_metrics,
            trends=trends,
            competitor_comparison=competitor_comparison,
            recommendations=recommendations,
        )
        db.add(report)
        await db.commit()
        return report
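
The trend step is left abstract above. A minimal sketch of what a period-over-period calculation could look like follows; `calculate_trends`, the output keys, and the sample metric values are assumptions rather than the service's actual contract.

```python
from numbers import Real


def calculate_trends(current: dict, previous: dict) -> dict[str, dict]:
    """Compute absolute and relative change for metrics present in both periods."""
    trends: dict[str, dict] = {}
    for name, cur in current.items():
        prev = previous.get(name)
        if not isinstance(cur, Real) or not isinstance(prev, Real):
            continue  # skip non-numeric fields and metrics without a baseline
        change = cur - prev
        # Relative change is undefined when the previous value is zero.
        pct_change = round(change / prev, 4) if prev else None
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        trends[name] = {
            "change": round(change, 4),
            "pct_change": pct_change,
            "direction": direction,
        }
    return trends


# Sample values for illustration only.
current = {"overall_score": 72.0, "citation_rate": 0.3, "period_label": "current"}
previous = {"overall_score": 60.0, "citation_rate": 0.0}
trends = calculate_trends(current, previous)
print(trends)
```

Reporting `pct_change` as `None` for a zero baseline keeps the report from showing an infinite or misleading percentage for metrics that only just started registering.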

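Node 5: AI-analysis-based strategy generation

Current state: OptimizationAdvisor already generates 5 categories of suggestions (content_optimization/platform_targeting/competitor_gap/query_expansion/citation_improvement) in two modes, rule-based and LLM-based. Its input sources need to be extended with citation pattern analysis results.

Approach: extend the BrandAnalysisContext data structure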

@dataclass
class EnhancedBrandAnalysisContext(BrandAnalysisContext):
    """Enhanced brand analysis context that adds citation pattern analysis as input."""
    citation_patterns: list[dict] = field(default_factory=list)
    ai_engine_preferences: dict[str, dict] = field(default_factory=dict)
    content_structure_insights: dict[str, Any] = field(default_factory=dict)
    authority_signal_insights: dict[str, Any] = field(default_factory=dict)

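API design: extend the existing /api/v1/suggestions endpoints

POST /api/v1/suggestions/generate-with-patterns   # Generate strategies from citation patterns
GET  /api/v1/suggestions                          # List suggestions (existing)
PUT  /api/v1/suggestions/{id}/status              # Update suggestion status (existing)

Core logic: citation patterns → strategy template matching → LLM generates concrete suggestions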

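async def generate_pattern_based_suggestions(
    ctx: EnhancedBrandAnalysisContext,
) -> list[SuggestionItem]:
    """Generate optimization strategies from citation pattern analysis."""
    suggestions = []

    # 1. Strategies derived from content structure patterns
    for pattern in ctx.citation_patterns:
        if pattern["pattern_type"] == "content_structure":
            # Key name follows the CitationPattern.content_features column
            if pattern.get("content_features", {}).get("faq_frequency", 0) > 0.5:
                suggestions.append(SuggestionItem(
                    type="content_optimization",
                    priority="high",
                    title="Add structured FAQ content",
                    description=f"AI engines favor FAQ-style content (frequency {pattern['frequency']:.0%})",
                    action="1. Add FAQPage Schema markup to core pages\n2. Create an FAQ page\n3. Use question-style headings",
                    expected_impact="Expected to raise the citation rate by 15-25%",
                    difficulty="medium",
                ))

    # 2. Strategies derived from authority signal patterns
    # 3. Strategies derived from citation format patterns
    # 4. Supplementary LLM generation

    return suggestions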

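Node 6: Workflow engine

Current state: PipelineEngine already implements a DAG execution engine with conditional execution, retries, and variable resolution. It is currently used only for the content generation Pipeline and needs to be extended into a general Workflow engine.

Data model: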

class WorkflowDefinition(Base):
    """Workflow definition."""
    __tablename__ = "workflow_definitions"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name: Mapped[str] = mapped_column(String(100), nullable=False)
    description: Mapped[str | None] = mapped_column(Text, nullable=True)
    steps: Mapped[dict] = mapped_column(JSON, nullable=False, comment="Workflow step definitions (DAG)")
    version: Mapped[int] = mapped_column(Integer, default=1)
    is_active: Mapped[bool] = mapped_column(Boolean, default=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)
    updated_at: Mapped[datetime] = mapped_column(server_default=func.now(), onupdate=func.now(), nullable=False)


class WorkflowExecution(Base):
    """Workflow execution record."""
    __tablename__ = "workflow_executions"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    workflow_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("workflow_definitions.id", ondelete="CASCADE"), nullable=False)
    brand_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("brands.id", ondelete="CASCADE"), nullable=False)
    status: Mapped[str] = mapped_column(String(20), default="pending", comment="pending/running/completed/failed")
    current_step: Mapped[str | None] = mapped_column(String(100), nullable=True)
    step_results: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
    started_at: Mapped[datetime | None] = mapped_column(nullable=True)
    completed_at: Mapped[datetime | None] = mapped_column(nullable=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)

    __table_args__ = (
        Index("idx_workflow_executions_workflow_id", "workflow_id"),
        Index("idx_workflow_executions_brand_id", "brand_id"),
        Index("idx_workflow_executions_status", "status"),
    )

API design:

POST /api/v1/workflows                    # Create a workflow
GET  /api/v1/workflows                    # List workflows
GET  /api/v1/workflows/{id}               # Get workflow details
POST /api/v1/workflows/{id}/execute       # Execute a workflow
GET  /api/v1/workflows/{id}/status        # Get execution status
POST /api/v1/workflows/{id}/pause         # Pause execution
POST /api/v1/workflows/{id}/resume        # Resume execution

Core logic: extend the existing PipelineEngine

class GEOWorkflowEngine:
    """GEO Workflow engine that orchestrates the 10-step closed loop."""

    def __init__(self):
        self.pipeline_engine = PipelineEngine()

    async def execute_geo_workflow(
        self, brand_id: uuid.UUID, db: AsyncSession
    ) -> WorkflowExecution:
        """Run the complete GEO Workflow."""
        execution = WorkflowExecution(
            workflow_id=self._get_geo_workflow_id(),
            brand_id=brand_id,
            status="running",
        )
        db.add(execution)
        await db.commit()

        try:
            # Step 1: brand profile construction
            execution.current_step = "brand_profile"
            await db.commit()
            brand_profile = await self._build_brand_profile(brand_id, db)

            # Step 2: competitor baseline setup
            execution.current_step = "competitor_baseline"
            await db.commit()
            competitor_baseline = await self._set_competitor_baseline(brand_id, db)

            # Step 3: AI engine query analysis
            execution.current_step = "ai_engine_query"
            await db.commit()
            query_results = await self._execute_ai_queries(brand_id, brand_profile, db)

            # Step 4: citation pattern recognition
            execution.current_step = "citation_pattern"
            await db.commit()
            patterns = await self._analyze_citation_patterns(brand_id, query_results, db)

            # Step 5: optimization strategy generation
            execution.current_step = "strategy_generation"
            await db.commit()
            strategies = await self._generate_strategies(brand_profile, patterns, competitor_baseline, db)

            # Step 6: content optimization execution
            execution.current_step = "content_optimization"
            await db.commit()
            optimization_results = await self._execute_optimization(strategies, db)

            # Step 7: immediate effect verification
            execution.current_step = "immediate_verification"
            await db.commit()
            verification = await self._verify_immediately(brand_id, optimization_results, db)

            # Steps 8-10 are driven by scheduled jobs
            execution.status = "completed"
            execution.completed_at = datetime.utcnow()

        except Exception as e:
            execution.status = "failed"
            execution.error_message = str(e)
            execution.completed_at = datetime.utcnow()

        await db.commit()
        return execution
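
Because `WorkflowDefinition.steps` is described as a DAG, the hard-coded step sequence could eventually be derived from the definition itself. The sketch below uses the standard-library `graphlib` to do that; the `GEO_STEPS` dependency map is an assumption inferred from the arguments each step receives in the engine, and `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed dependency map: each step lists the steps whose output it consumes.
GEO_STEPS: dict[str, list[str]] = {
    "brand_profile": [],
    "competitor_baseline": ["brand_profile"],
    "ai_engine_query": ["brand_profile"],
    "citation_pattern": ["ai_engine_query"],
    "strategy_generation": ["brand_profile", "citation_pattern", "competitor_baseline"],
    "content_optimization": ["strategy_generation"],
    "immediate_verification": ["content_optimization"],
}


def execution_order(steps: dict[str, list[str]]) -> list[str]:
    """Return one valid execution order, or raise ValueError on a cyclic definition."""
    try:
        return list(TopologicalSorter(steps).static_order())
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"Workflow definition contains a cycle: {exc.args[1]}") from exc


order = execution_order(GEO_STEPS)
print(order)
```

The feedback edge from step 10 back to step 5 is deliberately absent here: within a single run the steps must stay acyclic, and the loop closes across runs through the scheduled jobs.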

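Node 7: Automated website content optimization

Current state: ContentPipeline already implements the content processing pipeline (rule validation → sensitive-word filtering → SEO optimization → HTML generation) but has no Schema generation or FAQ generation.

Data model: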

class ContentOptimizationTask(Base):
    """Content optimization task."""
    __tablename__ = "content_optimization_tasks"

    id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), primary_key=True, default=uuid.uuid4)
    brand_id: Mapped[uuid.UUID] = mapped_column(Uuid(as_uuid=True), ForeignKey("brands.id", ondelete="CASCADE"), nullable=False)
    suggestion_id: Mapped[uuid.UUID | None] = mapped_column(Uuid(as_uuid=True), ForeignKey("suggestions.id", ondelete="SET NULL"), nullable=True)
    optimization_type: Mapped[str] = mapped_column(
        String(50), nullable=False,
        comment="Optimization type: schema_markup/faq_content/content_rewrite/authority_link"
    )
    target_url: Mapped[str | None] = mapped_column(String(500), nullable=True, comment="Target page URL")
    original_content: Mapped[str | None] = mapped_column(Text, nullable=True)
    optimized_content: Mapped[str | None] = mapped_column(Text, nullable=True)
    changes: Mapped[dict | None] = mapped_column(JSON, nullable=True, comment="Change details")
    status: Mapped[str] = mapped_column(
        String(20), nullable=False, default="pending",
        comment="pending/ai_generated/human_reviewed/published/rejected"
    )
    reviewer_id: Mapped[uuid.UUID | None] = mapped_column(Uuid(as_uuid=True), nullable=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now(), nullable=False)
    updated_at: Mapped[datetime] = mapped_column(server_default=func.now(), onupdate=func.now(), nullable=False)

    __table_args__ = (
        Index("idx_content_optimization_brand_id", "brand_id"),
        Index("idx_content_optimization_status", "status"),
        Index("idx_content_optimization_type", "optimization_type"),
    )

API design:

POST /api/v1/content-optimization              # Create an optimization task
GET  /api/v1/content-optimization              # List tasks
GET  /api/v1/content-optimization/{id}         # Get task details
PUT  /api/v1/content-optimization/{id}/review  # Human review
POST /api/v1/content-optimization/{id}/publish # Publish optimized content

Core logic: semi-automatic mode, AI generation followed by human review and publishing

class ContentOptimizer:
    """Automated content optimization service."""

    async def generate_schema_markup(self, brand: Brand, db: AsyncSession) -> dict:
        """Generate Schema.org structured data."""
        schema_types = {
            "Organization": self._generate_organization_schema,
            "Product": self._generate_product_schema,
            "FAQPage": self._generate_faq_schema,
            "Article": self._generate_article_schema,
        }
        results = {}
        for schema_type, generator in schema_types.items():
            results[schema_type] = await generator(brand)
        return results

    async def generate_faq_content(self, brand: Brand, patterns: list[CitationPattern]) -> list[dict]:
        """Generate FAQ content from citation patterns."""
        # Extract high-frequency questions from the citation patterns
        # Use the LLM to generate FAQ answers
        # Return structured FAQ data
        pass

    async def optimize_content(
        self, task: ContentOptimizationTask, db: AsyncSession
    ) -> ContentOptimizationTask:
        """Run the content optimization for one task."""
        if task.optimization_type == "schema_markup":
            task.optimized_content = await self._apply_schema_optimization(task)
        elif task.optimization_type == "faq_content":
            task.optimized_content = await self._apply_faq_optimization(task)
        elif task.optimization_type == "content_rewrite":
            task.optimized_content = await self._apply_content_rewrite(task)
        else:
            # Do not mark a task as generated when nothing was produced (e.g. authority_link)
            raise ValueError(f"Unsupported optimization type: {task.optimization_type}")

        task.status = "ai_generated"
        await db.commit()
        return task
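
For the FAQPage case, the generated markup has a well-known shape under the schema.org vocabulary. The sketch below shows that shape with hypothetical helper names (`build_faq_schema`, `to_script_tag`) and sample question/answer text; how the FAQ pairs are produced is outside its scope.

```python
import json


def build_faq_schema(faqs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage JSON-LD object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in faqs
            if question.strip() and answer.strip()  # drop incomplete pairs
        ],
    }


def to_script_tag(schema: dict) -> str:
    """Serialize the schema as a JSON-LD script tag for embedding in a page."""
    payload = json.dumps(schema, ensure_ascii=False, indent=2)
    # Escape "</" so answer text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Sample pairs for illustration only.
faq_schema = build_faq_schema([
    ("What is GEO?", "Optimizing content so that AI engines cite it."),
    ("", "This pair is skipped because the question is empty."),
])
print(to_script_tag(faq_schema))
```

Keeping generation (`build_faq_schema`) separate from rendering (`to_script_tag`) fits the semi-automatic flow: the dictionary can be stored in `optimized_content` for review, and the tag is only produced at publish time.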

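Part 3: Development Plan

Phase 1 - Core capability build-out (4 weeks)

Week 1-2: AI engine query analysis

Task	Type	Description
AIQueryResult data model	Backend	New query-result model, including migration script
EnhancedPlatformAdapter	Backend	Extend the adapter base class to support structured responses
ChatGPT adapter	Backend	Implemented on the OpenAI API
Perplexity adapter	Backend	Implemented on the Perplexity API
BatchQueryService	Backend	Batch parallel query service
AI engine query API	Backend	4 endpoints (single query/batch query/result list/result detail)
AI engine query results page	Frontend	Results table + detail dialog
Unit + integration tests	Testing	Cover adapters and the service layer

Key deliverables:

9 AI platform adapters (7 existing + 2 new)
Batch parallel query service
Query results API and frontend display

Week 3-4: Citation pattern recognition + scheduled detection

Task	Type	Description
CitationPattern data model	Backend	New citation pattern model, including migration script
CitationPatternEngine	Backend	Citation pattern recognition engine (4 analyzer types)
ContentStructureAnalyzer	Backend	Content structure feature analysis
AuthoritySignalAnalyzer	Backend	Authority signal analysis
EnhancedQueryScheduler	Backend	Extended scheduler
DetectionConfig extension	Backend	Add the detection config field to the Query model
Citation pattern analysis API	Backend	4 endpoints (analyze/list/detail/summary)
Detection task configuration API	Backend	5 endpoints (CRUD + manual trigger)
Citation pattern analysis page	Frontend	Pattern visualization + engine preference comparison
Detection task configuration page	Frontend	Create/edit/trigger tasks
Unit + integration tests	Testing	Cover analyzers and the scheduler

Key deliverables:

Citation pattern recognition engine (4 analyzer types)
Enhanced scheduler
Analysis and configuration APIs with frontend pages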

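Phase 2 - Strategy and reporting (3 weeks)

Week 5-6: Strategy generation + performance report

Task	Type	Description
EnhancedBrandAnalysisContext	Backend	Extend the analysis context with citation pattern input
generate_pattern_based_suggestions	Backend	Citation-pattern-based strategy generation
GEOResultReport data model	Backend	New report model, including migration script
GEOReportService	Backend	Report generation service (aggregation + trends + LLM interpretation)
Strategy generation API extension	Backend	New endpoint for pattern-based strategy generation
Report API	Backend	4 endpoints (generate/list/detail/trends)
Strategy results page	Frontend	Strategy cards + linked citation patterns
Performance report page	Frontend	Trend charts + competitor comparison + AI interpretation
Unit + integration tests	Testing	Cover strategy generation and the report service

Week 7: Integration testing + optimization

Task	Type	Description
End-to-end tests	Testing	Full flow: brand creation → query → analysis → strategy → report
Performance optimization	Backend	Batch query concurrency tuning, caching strategy
Bug fixes	Full stack	Fix issues found during integration testing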

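Phase 3 - Automation engine (3 weeks)

Week 8-9: Workflow engine

Task	Type	Description
WorkflowDefinition data model	Backend	Workflow definition model
WorkflowExecution data model	Backend	Workflow execution record model
GEOWorkflowEngine	Backend	Workflow engine built on PipelineEngine
Workflow API	Backend	7 endpoints (create/list/detail/execute/status/pause/resume)
Workflow configuration page	Frontend	Visual Workflow editor
Workflow monitoring page	Frontend	Execution status + step progress + logs
Unit + integration tests	Testing	Cover the Workflow engine

Week 10: Automated content optimization

Task	Type	Description
ContentOptimizationTask data model	Backend	Content optimization task model
ContentOptimizer	Backend	Content optimization service
Schema generator	Backend	Organization/Product/FAQPage/Article
FAQ generator	Backend	FAQ content generation based on citation patterns
Semi-automatic review flow	Backend	AI generation → human review → publish
Content optimization API	Backend	5 endpoints (create/list/detail/review/publish)
Unit + integration tests	Testing	Cover the optimizer and the review flow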

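Phase 4 - Polish and launch (2 weeks)

Week 11-12: Full testing + launch preparation

Task	Type	Description
Full E2E tests	Testing	Cover all user flows
Load testing	Testing	Batch query concurrency, report generation performance
Security audit	Security	API permissions, data isolation, sensitive information
Documentation	Docs	Update API docs and the user manual
Production rollout	Ops	Database migrations, service deployment, monitoring setup

Development Timeline Overview

Week 1  ────┬─── Phase 1: AI engine query analysis
Week 2  ────┤
Week 3  ────┼─── Phase 1: Citation pattern recognition + scheduled detection
Week 4  ────┤
Week 5  ────┼─── Phase 2: Strategy generation + performance report
Week 6  ────┤
Week 7  ────┤    Phase 2: Integration testing + optimization
Week 8  ────┼─── Phase 3: Workflow engine
Week 9  ────┤
Week 10 ────┤    Phase 3: Automated content optimization
Week 11 ────┼─── Phase 4: Full testing + launch
Week 12 ────┘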

Part 4: Technical Architecture

4.1 New Module Layout

backend/app/
├── services/
│   ├── ai_engine_query.py          # AI engine query service (batch parallel)
│   ├── citation_pattern.py         # Citation pattern recognition engine
│   ├── detection_scheduler.py      # Enhanced detection task scheduling
│   ├── geo_report.py               # GEO report generation service
│   ├── workflow_engine.py          # GEO Workflow engine
│   └── content_optimizer.py        # Automated content optimization service
├── workers/
│   ├── platforms/
│   │   ├── base.py                 # Adapter base class (extended to EnhancedPlatformAdapter)
│   │   ├── chatgpt.py              # ChatGPT adapter [new]
│   │   ├── perplexity.py           # Perplexity adapter [new]
│   │   ├── kimi.py                 # Kimi adapter (existing)
│   │   ├── wenxin.py               # Wenxin Yiyan adapter (existing)
│   │   ├── tongyi.py               # Tongyi Qianwen adapter (existing)
│   │   ├── doubao.py               # Doubao adapter (existing)
│   │   ├── qingyan.py              # Qingyan adapter (existing)
│   │   ├── tiangong.py             # Tiangong adapter (existing)
│   │   └── xinghuo.py              # Xinghuo adapter (existing)
│   └── citation_engine.py          # Citation detection engine (extended)
├── models/
│   ├── ai_query_result.py          # Query result model [new]
│   ├── citation_pattern.py         # Citation pattern model [new]
│   ├── geo_report.py               # Report model [new]
│   ├── workflow.py                 # Workflow models [new]
│   └── content_optimization.py     # Content optimization model [new]
├── schemas/
│   ├── ai_query.py                 # Query schemas [new]
│   ├── citation_pattern.py         # Citation pattern schemas [new]
│   ├── geo_report.py               # Report schemas [new]
│   ├── workflow.py                 # Workflow schemas [new]
│   └── content_optimization.py     # Content optimization schemas [new]
└── api/
    ├── ai_engines.py               # AI engine API [new]
    ├── citation_patterns.py        # Citation pattern API [new]
    ├── detection.py                # Detection task API [new]
    ├── reports.py                  # Report API [extended]
    ├── workflows.py                # Workflow API [new]
    └── content_optimization.py     # Content optimization API [new]

4.2 Data Flow Architecture

Brand profile ─→ Queries ─→ AI engine query ─→ Citation patterns ─→ Strategy ─→ Content optimization
   ↑                                                                                     ↓
   │                                                                          Immediate verification
   │                                                                                     ↓
   └────────────────── Feedback loop ←── GEO report ←── Periodic monitoring ←── Verification results

Detailed data flow:

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Brand           │────→│  Query           │────→│  AIQueryResult   │
│  (brand profile) │     │  (query terms)   │     │  (query result)  │
└──────────────────┘     └──────────────────┘     └─────────┬────────┘
                                                            │
                                                            ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Suggestion      │←────│  Citation        │←────│  CitationPattern │
│  (suggestions)   │     │  Record          │     │  (patterns)      │
└──────┬───────────┘     └──────────────────┘     └──────────────────┘
       │
       ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  ContentOptim    │────→│  Verification    │────→│  DetectionTask   │
│  Task (optimize) │     │  (immediate)     │     │  (monitoring)    │
└──────────────────┘     └─────────┬────────┘     └─────────┬────────┘
                                   │                        │
                                   ▼                        ▼
                              ┌──────────────────────────────────┐
                              │         GEOResultReport          │
                              │     (GEO performance report)     │
                              └────────────────┬─────────────────┘
                                               │
                                               ▼
                                      ┌──────────────────┐
                                      │  Next-round      │
                                      │  Suggestion      │
                                      │  (iteration)     │
                                      └──────────────────┘

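4.3 Integration Points with the Existing System

New module	Integration point	Integration approach
AI engine query service	CitationEngine	Extend: add batch queries and structured responses
Citation pattern recognition	citation_extractor	Extend: move from single extraction to pattern analysis
Detection task scheduling	QueryScheduler	Extend: add custom frequencies and smart scheduling
GEO report	ScoringService	Extend: add trend calculation and report generation
Strategy generation	OptimizationAdvisor	Extend: add citation patterns as an input source
Workflow engine	PipelineEngine	Extend: move from the content generation Pipeline to a general Workflow
Content optimization	ContentPipeline	Extend: add Schema/FAQ generators

4.4 Database Migration Plan

Phase	Migration script	New tables	Modified tables
Phase 1	`add_ai_query_results_table.py`	ai_query_results	-
Phase 1	`add_citation_patterns_table.py`	citation_patterns	-
Phase 1	`add_detection_config_to_queries.py`	-	queries (new detection_config field)
Phase 2	`add_geo_result_reports_table.py`	geo_result_reports	-
Phase 2	`add_brand_profile_data.py`	-	brands (new profile_data field)
Phase 3	`add_workflow_tables.py`	workflow_definitions, workflow_executions	-
Phase 3	`add_content_optimization_tasks_table.py`	content_optimization_tasks	-

4.5 New Frontend Pages

Phase	Page	Route	Description
Phase 1	AI engine query results	`/dashboard/ai-queries`	Results table + detail dialog
Phase 1	Citation pattern analysis	`/dashboard/citation-patterns`	Pattern visualization + engine preference comparison
Phase 1	Detection task configuration	`/dashboard/detection`	Create/edit/trigger tasks
Phase 2	Performance report	`/dashboard/reports`	Trend charts + competitor comparison + AI interpretation
Phase 2	Strategy details	`/dashboard/strategy`	Strategy cards + linked citation patterns
Phase 3	Workflow configuration	`/dashboard/workflows`	Visual Workflow editor
Phase 3	Workflow monitoring	`/dashboard/workflows/[id]`	Execution status + step progress + logs
Phase 3	Content optimization	`/dashboard/content-optimization`	Optimization task management + review and publishing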
