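# Knowledge Graph Entity Type Definitions

## Overview

This document describes the entity types and relation types used in the knowledge graph.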

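## Entity Types (EntityType)

Location: `backend/app/models/knowledge_graph.py`

### Type Definitions

| Type | Enum value | Description | Examples |
|------|------------|-------------|----------|
| ORGANIZATION | organization | Company or organization | Tencent, Alibaba |
| PRODUCT | product | Product | WeChat, Honor of Kings |
| PERSON | person | Person | Ma Huateng, Zhang Xiaolong |
| LOCATION | location | Location | Shenzhen, Guangzhou |
| TECHNOLOGY | technology | Technology | AI, blockchain |
| BRAND | brand | Brand | WeChat Pay, Tencent Cloud |
| EVENT | event | Event | Tencent 2020 annual meeting |
| CONCEPT | concept | Concept | Digital transformation |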
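The table above maps naturally onto a string-valued enumeration. The following is a minimal sketch that assumes `EntityType` is a `str`-based `Enum`; the actual definition in `knowledge_graph.py` is not shown in this document and may differ.

```python
from enum import Enum


class EntityType(str, Enum):
    """Entity categories; member names and values mirror the table above."""

    ORGANIZATION = "organization"
    PRODUCT = "product"
    PERSON = "person"
    LOCATION = "location"
    TECHNOLOGY = "technology"
    BRAND = "brand"
    EVENT = "event"
    CONCEPT = "concept"


# Look up by stored value (for example, a query parameter such as type=brand).
brand = EntityType("brand")

# Look up by member name (for example, an upper-case name in an LLM reply).
person = EntityType["PERSON"]
```

Deriving from `str` keeps the members directly comparable with plain strings, which is convenient when the value is stored in a `str` column or serialized to JSON.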

### Attributes

```python
class Entity:
    id: str               # UUID
    name: str             # Entity name
    entity_type: str      # Entity type
    description: str      # Description
    properties: dict      # Custom properties
    confidence: str       # Confidence level (high/medium/low)
    source_chunk_id: str  # ID of the source chunk
```

## Relation Types (RelationType)

Location: `backend/app/models/knowledge_graph.py`

### Type Definitions

| Type | Enum value | Description | Example |
|------|------------|-------------|---------|
| COMPETES_WITH | competes_with | Competitor | WeChat ↔ Alipay |
| PARTNERS_WITH | partners_with | Partner | Tencent ↔ JD.com |
| PRODUCES | produces | Produces | Apple → iPhone |
| USES_TECHNOLOGY | uses_technology | Uses technology | WeChat → AI |
| LOCATED_IN | located_in | Located in | Tencent → Shenzhen |
| FOUNDED_IN | founded_in | Founded in | Tencent → 1998 |
| CEO_OF | ceo_of | CEO of | Ma Huateng → Tencent |
| FOUNDER_OF | founder_of | Founder of | Ma Huateng → Tencent |
| RELATED_TO | related_to | Related to | AI ↔ machine learning |
| PART_OF | part_of | Part of | iPhone → Apple |

### Attributes

```python
class Relation:
    id: str                # UUID
    source_entity_id: str  # Source entity ID
    target_entity_id: str  # Target entity ID
    relation_type: str     # Relation type
    properties: dict       # Custom properties
    confidence: str        # Confidence level (high/medium/low)
    source_chunk_id: str   # ID of the source chunk
```
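To show how the two attribute sets fit together, here is a small sketch that models them as dataclasses and links two entities with one relation. The class names `EntitySketch` and `RelationSketch`, the UUID generation, and the confidence check are illustrative assumptions, not the project's actual models, and only a subset of the documented fields is included.

```python
import uuid
from dataclasses import dataclass, field

CONFIDENCE_LEVELS = {"high", "medium", "low"}


def _new_id() -> str:
    """Generate a random UUID string for the `id` field."""
    return str(uuid.uuid4())


def _check_confidence(value: str) -> None:
    """Reject confidence labels outside high/medium/low."""
    if value not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence: {value!r}")


@dataclass
class EntitySketch:
    name: str
    entity_type: str
    confidence: str
    properties: dict = field(default_factory=dict)
    id: str = field(default_factory=_new_id)

    def __post_init__(self) -> None:
        _check_confidence(self.confidence)


@dataclass
class RelationSketch:
    source_entity_id: str
    target_entity_id: str
    relation_type: str
    confidence: str
    properties: dict = field(default_factory=dict)
    id: str = field(default_factory=_new_id)

    def __post_init__(self) -> None:
        _check_confidence(self.confidence)


# A relation refers to its endpoints by ID, not by name.
tencent = EntitySketch("Tencent", "organization", confidence="high")
shenzhen = EntitySketch("Shenzhen", "location", confidence="high")
located_in = RelationSketch(tencent.id, shenzhen.id, "located_in", confidence="high")
```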

## Extraction Pipeline

Location: `backend/app/services/knowledge/entity_extractor.py`

### EntityExtractor

```python
class EntityExtractor:
    ENTITY_TYPES = [
        "ORGANIZATION",
        "PRODUCT",
        "PERSON",
        "LOCATION",
        "TECHNOLOGY",
        "BRAND",
        "EVENT",
        "CONCEPT",
    ]

    RELATION_TYPES = [
        "COMPETES_WITH",
        "PARTNERS_WITH",
        "PRODUCES",
        "USES_TECHNOLOGY",
        "LOCATED_IN",
        "FOUNDED_IN",
        "CEO_OF",
        "FOUNDER_OF",
        "RELATED_TO",
        "PART_OF",
    ]

    async def extract(self, text: str, context: str) -> ExtractionResult:
        """
        Extract entities and relations from text.
        1. Build the extraction prompt
        2. Call the LLM
        3. Parse the returned result
        """
```
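The three steps in the docstring above can be sketched as follows. This is not the project's implementation: `ExtractorSketch`, `ExtractionResultSketch`, the injected `llm` callable, and the fallback to an empty result on malformed JSON are all assumptions made for illustration, and the prompt is deliberately shortened.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# Hypothetical LLM interface: takes a prompt, returns the raw reply text.
LLMCall = Callable[[str], Awaitable[str]]


@dataclass
class ExtractionResultSketch:
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


class ExtractorSketch:
    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    def build_prompt(self, text: str, context: str = "") -> str:
        return (
            "Extract knowledge-graph entities and relations as JSON.\n"
            f"Context: {context}\n"
            f"Text:\n{text}"
        )

    async def extract(self, text: str, context: str = "") -> ExtractionResultSketch:
        prompt = self.build_prompt(text, context)  # 1. build the prompt
        raw = await self._llm(prompt)              # 2. call the LLM
        try:                                       # 3. parse the reply
            data = json.loads(raw)
        except json.JSONDecodeError:
            return ExtractionResultSketch()
        if not isinstance(data, dict):
            return ExtractionResultSketch()
        return ExtractionResultSketch(
            entities=data.get("entities", []),
            relations=data.get("relations", []),
        )


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a fixed JSON reply."""
    return json.dumps({
        "entities": [
            {"name": "Tencent", "entity_type": "ORGANIZATION", "confidence": "high"},
            {"name": "Shenzhen", "entity_type": "LOCATION", "confidence": "high"},
        ],
        "relations": [
            {"source_entity": "Tencent", "target_entity": "Shenzhen",
             "relation_type": "LOCATED_IN", "confidence": "high"},
        ],
    })


result = asyncio.run(ExtractorSketch(fake_llm).extract("Tencent is based in Shenzhen."))
```

Injecting the LLM call as a plain async callable keeps the pipeline testable without network access.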

### Example Extraction Prompt

```
Extract the knowledge graph entities and relations from the following text.

Entity types:
- ORGANIZATION (company/organization)
- PRODUCT (product)
- PERSON (person)
...

Relation types:
- COMPETES_WITH (competitor)
- PARTNERS_WITH (partner)
...

Text:
{text}

Return the result in JSON format:
{
    "entities": [
        {"name": "entity name", "entity_type": "type", "confidence": "high/medium/low"}
    ],
    "relations": [
        {"source_entity": "source", "target_entity": "target", "relation_type": "type", "confidence": "high/medium/low"}
    ]
}
```
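Because the LLM reply is free-form text, it may be worth validating it against the allowed types before anything is stored. The sketch below assumes the JSON shape shown in the prompt; the function name `validate_extraction` and the policy of dropping relations whose endpoints were not accepted as entities are illustrative choices, not documented behavior.

```python
import json

ENTITY_TYPES = {
    "ORGANIZATION", "PRODUCT", "PERSON", "LOCATION",
    "TECHNOLOGY", "BRAND", "EVENT", "CONCEPT",
}
RELATION_TYPES = {
    "COMPETES_WITH", "PARTNERS_WITH", "PRODUCES", "USES_TECHNOLOGY",
    "LOCATED_IN", "FOUNDED_IN", "CEO_OF", "FOUNDER_OF", "RELATED_TO", "PART_OF",
}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def validate_extraction(raw: str) -> dict:
    """Keep only entities and relations that match the documented schema."""
    data = json.loads(raw)
    entities = [
        e for e in data.get("entities", [])
        if e.get("name")
        and e.get("entity_type") in ENTITY_TYPES
        and e.get("confidence") in CONFIDENCE_LEVELS
    ]
    known_names = {e["name"] for e in entities}
    relations = [
        r for r in data.get("relations", [])
        if r.get("relation_type") in RELATION_TYPES
        and r.get("confidence") in CONFIDENCE_LEVELS
        # Assumed policy: both endpoints must be accepted entities.
        and r.get("source_entity") in known_names
        and r.get("target_entity") in known_names
    ]
    return {"entities": entities, "relations": relations}


reply = json.dumps({
    "entities": [
        {"name": "WeChat", "entity_type": "PRODUCT", "confidence": "high"},
        {"name": "Alipay", "entity_type": "PRODUCT", "confidence": "high"},
        {"name": "Foo", "entity_type": "ANIMAL", "confidence": "low"},
    ],
    "relations": [
        {"source_entity": "WeChat", "target_entity": "Alipay",
         "relation_type": "COMPETES_WITH", "confidence": "medium"},
        {"source_entity": "WeChat", "target_entity": "Foo",
         "relation_type": "RELATED_TO", "confidence": "low"},
    ],
})
cleaned = validate_extraction(reply)
```

Here the entity with the unknown type `ANIMAL` is discarded, and so is the relation that points at it.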

## Graph Construction

Location: `backend/app/services/knowledge/graph_builder.py`

### GraphBuilder

```python
class GraphBuilder:
    async def build_from_chunk(
        self,
        session: AsyncSession,
        chunk_id: str,
        context: str | None = None,
    ) -> dict:
        """
        Build the knowledge graph from a chunk.
        1. Fetch the chunk content
        2. Run extraction with EntityExtractor
        3. Store the results in the graph
        """
```

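### Build Statistics

The build returns the following statistics:

```python
{
    "entities_created": 5,     # Number of newly created entities
    "entities_existing": 2,    # Number of entities that already existed
    "relations_created": 3,    # Number of newly created relations
    "relations_existing": 1,   # Number of relations that already existed
}
```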
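The created/existing split suggests a get-or-create step when results are stored. The sketch below illustrates that counting logic with an in-memory stand-in for the database; the class `InMemoryGraph` and the deduplication keys (name plus type for entities, source/type/target for relations) are assumptions, since the document does not say how duplicates are detected.

```python
class InMemoryGraph:
    """In-memory stand-in for the graph store, used only to show the counting."""

    def __init__(self) -> None:
        self.entities = {}      # (name, entity_type) -> entity dict
        self.relations = set()  # (source, relation_type, target)

    def add_extraction(self, entities: list, relations: list) -> dict:
        stats = {
            "entities_created": 0,
            "entities_existing": 0,
            "relations_created": 0,
            "relations_existing": 0,
        }
        for entity in entities:
            key = (entity["name"], entity["entity_type"])
            if key in self.entities:
                stats["entities_existing"] += 1
            else:
                self.entities[key] = entity
                stats["entities_created"] += 1
        for relation in relations:
            key = (
                relation["source_entity"],
                relation["relation_type"],
                relation["target_entity"],
            )
            if key in self.relations:
                stats["relations_existing"] += 1
            else:
                self.relations.add(key)
                stats["relations_created"] += 1
        return stats


graph = InMemoryGraph()
first = graph.add_extraction(
    [{"name": "Tencent", "entity_type": "ORGANIZATION"},
     {"name": "Shenzhen", "entity_type": "LOCATION"}],
    [{"source_entity": "Tencent", "relation_type": "LOCATED_IN",
      "target_entity": "Shenzhen"}],
)
# A second chunk mentions Tencent again and introduces WeChat.
second = graph.add_extraction(
    [{"name": "Tencent", "entity_type": "ORGANIZATION"},
     {"name": "WeChat", "entity_type": "PRODUCT"}],
    [{"source_entity": "Tencent", "relation_type": "LOCATED_IN",
      "target_entity": "Shenzhen"},
     {"source_entity": "Tencent", "relation_type": "PRODUCES",
      "target_entity": "WeChat"}],
)
```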

## Graph Queries

### Entity Query

```
# Query entities
GET /api/v1/knowledge-graph/entities?type=brand&limit=20

# Response
{
    "entities": [
        {
            "id": "uuid",
            "name": "WeChat Pay",
            "type": "brand",
            "description": "...",
            "properties": {...}
        }
    ]
}
```

### Relation Query

```
# Query relations
GET /api/v1/knowledge-graph/relations?source_id=xxx&type=competes_with

# Response
{
    "relations": [
        {
            "id": "uuid",
            "source_id": "xxx",
            "target_id": "yyy",
            "type": "competes_with",
            "properties": {...}
        }
    ]
}
```

### Semantic Search

```
# Semantic search (the query text means "WeChat's competitors")
GET /api/v1/knowledge-graph/search?query=微信的竞争对手

# Response
{
    "results": [
        {"entity": {...}, "score": 0.95},
        {"entity": {...}, "score": 0.87}
    ]
}
```
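When calling these endpoints from code, query parameters need to be URL-encoded, which matters most for non-ASCII search text. The following sketch builds the request paths with the standard library only; the helper name `build_url` is hypothetical, and no host or authentication scheme is assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_PATH = "/api/v1/knowledge-graph"


def build_url(endpoint: str, **params) -> str:
    """Build a request path, skipping parameters whose value is None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    path = f"{BASE_PATH}/{endpoint}"
    return f"{path}?{query}" if query else path


entities_url = build_url("entities", type="brand", limit=20)
relations_url = build_url("relations", source_id="xxx", type="competes_with")

# urlencode percent-encodes the UTF-8 bytes of the Chinese query text.
search_url = build_url("search", query="微信的竞争对手")

# Decoding the query string recovers the original text.
decoded = parse_qs(urlsplit(search_url).query)["query"][0]
```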

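## Confidence Assessment

| Level | Meaning | Recommended handling |
|-------|---------|----------------------|
| high | High confidence | Use directly |
| medium | Medium confidence | Manual review recommended |
| low | Low confidence | For reference only |

Assessment factors:
- How explicitly the text mentions the entity or relation
- How strongly the surrounding context supports it
- How certain the LLM is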
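One way to act on the three levels is to route extracted items into separate buckets. This is a sketch under assumptions: the bucket names and the choice to reject unknown labels with an error are illustrative, not part of the documented system.

```python
# Maps each documented confidence level to a hypothetical handling bucket.
BUCKET_BY_CONFIDENCE = {
    "high": "accepted",          # use directly
    "medium": "needs_review",    # manual review recommended
    "low": "reference_only",     # for reference only
}


def route_by_confidence(items: list) -> dict:
    """Group extracted items by the handling their confidence level calls for."""
    routed = {bucket: [] for bucket in BUCKET_BY_CONFIDENCE.values()}
    for item in items:
        level = item.get("confidence")
        if level not in BUCKET_BY_CONFIDENCE:
            raise ValueError(f"unknown confidence: {level!r}")
        routed[BUCKET_BY_CONFIDENCE[level]].append(item)
    return routed


routed = route_by_confidence([
    {"name": "Tencent", "confidence": "high"},
    {"name": "WeChat Pay", "confidence": "medium"},
    {"name": "Digital transformation", "confidence": "low"},
    {"name": "Shenzhen", "confidence": "high"},
])
```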