feat(v0.7.0): acceptance passed - Dashboard stats enhancement, flow testing, conversation tracing

Acceptance criteria passed:
- AC-ASA-59~64: frontend script-flow and guardrail monitoring features
- AC-AISVC-91~95: Dashboard stats enhancement and full flow testing
- AC-AISVC-108~110: conversation tracing and export

Issues fixed:
- flow_test.py: fixed OrchestratorService import and invocation
- frontend/backend field mismatch: order → step_no, wait_for_input → wait_input
- database migration: added monitoring fields missing from chat_messages

New files:
- ai-service/app/api/admin/flow_test.py
- ai-service/scripts/migrations/add_chat_message_fields.py
- ai-service-admin/src/views/admin/prompt-template/components/VariableManager.vue
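The migration script itself is not shown in this diff; as a hedged sketch of the idea — SQLite assumed, and the column list below is illustrative (the authoritative list lives in ai-service/scripts/migrations/add_chat_message_fields.py):

```python
# Idempotent column backfill: inspect the table, add only what is missing.
# Column names here are assumptions for illustration, not the real schema.
import sqlite3

MISSING_COLUMNS = {
    "latency_ms": "INTEGER",
    "is_error": "BOOLEAN DEFAULT 0",
    "prompt_tokens": "INTEGER DEFAULT 0",
    "completion_tokens": "INTEGER DEFAULT 0",
}

def add_missing_columns(conn: sqlite3.Connection, table: str = "chat_messages") -> list[str]:
    """Add monitoring columns the table lacks; return the names that were added."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, ddl in MISSING_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added
```

Running it twice is safe: the second pass sees the columns already present and adds nothing.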
MerCry 2026-02-28 12:52:50 +08:00
parent 5fbb72aa82
commit 6b21ba8351
84 changed files with 9933 additions and 855 deletions


@@ -188,6 +188,25 @@
- Conversation tracing page
- Monitoring navigation route
### Session 3 (2026-02-28) - Acceptance session
- Done: frontend acceptance criteria AC-ASA-59 ~ AC-ASA-64 passed
- AC-ASA-59: flow simulation dialog - step visualization ✅
- AC-ASA-60: script-flow monitoring page - flow activation stats ✅
- AC-ASA-61: flow execution record detail dialog - pagination support ✅
- AC-ASA-62: guardrail test dialog - forbidden-word detection results ✅
- AC-ASA-63: output guardrail monitoring page - guardrail block stats ✅
- AC-ASA-64: guardrail block record detail dialog - pagination support ✅
- Done: backend acceptance criteria AC-AISVC-91 ~ AC-AISVC-95 and AC-AISVC-108 ~ AC-AISVC-110 passed
- AC-AISVC-91/92: Dashboard stats enhancement - four monitoring stat cards + time filtering ✅
- AC-AISVC-93/94/95: full flow test - 12-step execution timeline + step details ✅
- AC-AISVC-108/109/110: conversation tracing - list + detail + export ✅
- Issues fixed:
- flow_test.py import error: Orchestrator → OrchestratorService
- flow_test.py ChatRequest import path corrected
- flow_test.py: ChatResponse has no sources attribute
- database migration: created add_chat_message_fields.py to add the missing fields
- frontend/backend field mismatch: order → step_no, wait_for_input → wait_input
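Flow definitions persisted under the old field names would need the same rename; a hypothetical normalizer (FIELD_RENAMES and normalize_step are illustrative names, not part of the codebase) might look like:

```python
# Map legacy step field names to the ones the backend now expects.
# This helper is illustrative; the project may handle the rename elsewhere.
FIELD_RENAMES = {"order": "step_no", "wait_for_input": "wait_input"}

def normalize_step(step: dict) -> dict:
    """Return a copy of a flow step dict with legacy keys renamed."""
    return {FIELD_RENAMES.get(key, key): value for key, value in step.items()}
```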
## 7. Next Steps
**All tasks completed**

.gitignore vendored

@@ -165,3 +165,5 @@ ai-service/uploads/
ai-service/config/
*.local
/.trae/
/.claude/


@@ -6,10 +6,15 @@
- `spec/<module>/requirements.md` (requirements of the current module)
- `spec/<module>/openapi.provider.yaml` (provided by this module)
- `spec/<module>/openapi.deps.yaml` (dependencies of this module, if present)
- **Versioned iteration rules** (CRITICAL):
- Read the frontmatter of `requirements.md` and identify the `active_version` field (e.g. `"0.6.0-0.7.0"`)
- **Focus only on the ACs of the active version**: ACs of historical versions (collapsed inside `<details>`) may be skipped and do not affect the current implementation
- AC numbers referenced in code comments, test cases, and commit messages must fall within the active version range
- To add new requirements, follow section 4 "Versioned iteration rules" of `docs/spec-product-zh.md`
- **Long-session / complex-task continuation**: if the current task meets the trigger conditions in `docs/session-handoff-protocol.md`, you **must** first read and then continuously update `docs/progress/{module}-{feature}-progress.md`
- If any of the documents above is missing, conflicting, or unclear:
- **Do not start implementation**
- Record a "pending clarification" entry in `spec/<module>/tasks.md` and stop
## 1. Commits and Sync (Git Cadence, mandatory)
- **Commit granularity**


@@ -48,7 +48,8 @@ export interface PromptTemplateCreate {
name: string
scene: string
description?: string
content: string
system_instruction: string
variables?: PromptVariable[]
is_default?: boolean
}
@@ -56,7 +57,8 @@ export interface PromptTemplateUpdate {
name?: string
scene?: string
description?: string
content: string
system_instruction?: string
variables?: PromptVariable[]
}
export interface PromptTemplateListResponse {


@@ -21,9 +21,9 @@ export interface ScriptFlowDetail {
export interface FlowStep {
step_id: string
order: number
step_no: number
content: string
wait_for_input: boolean
wait_input: boolean
timeout_seconds?: number
timeout_action?: 'repeat' | 'skip' | 'transfer'
next_conditions?: NextCondition[]


@@ -14,7 +14,7 @@
<h4>变量设置</h4>
<el-form label-width="100px" size="small">
<el-form-item
v-for="variable in variables"
v-for="variable in allVariables"
:key="variable.name"
:label="variable.name"
>
@@ -102,9 +102,9 @@
<script setup lang="ts">
import { ref, watch, computed } from 'vue'
import { ElMessage } from 'element-plus'
import { previewPromptTemplate, type PromptPreviewResponse } from '@/api/monitoring'
import type { PromptVariable } from '@/types/prompt-template'
import { BUILTIN_VARIABLES } from '@/types/prompt-template'
const props = defineProps<{
visible: boolean
@@ -116,11 +116,26 @@ const emit = defineEmits<{
'update:visible': [value: boolean]
}>()
const allVariables = computed(() => {
const builtin = BUILTIN_VARIABLES.map(v => ({
name: v.name,
description: v.description,
default_value: v.default_value || ''
}))
const custom = (props.templateVariables || []).map(v => ({
name: v.name,
description: v.description,
default_value: v.default_value || ''
}))
return [...builtin, ...custom]
})
const variables = computed(() => props.templateVariables || [])
const variableValues = ref<Record<string, string>>({})
const sampleMessage = ref('')
const previewing = ref(false)
const previewResult = ref<PromptPreviewResponse | null>(null)
let debounceTimer: ReturnType<typeof setTimeout> | null = null
watch(() => props.visible, (val) => {
if (val) {
@@ -128,13 +143,19 @@ watch(() => props.visible, (val) => {
sampleMessage.value = ''
previewResult.value = null
variables.value.forEach(v => {
allVariables.value.forEach(v => {
variableValues.value[v.name] = v.default_value || ''
})
}
})
const handleVariableChange = () => {
if (debounceTimer) {
clearTimeout(debounceTimer)
}
debounceTimer = setTimeout(() => {
handlePreview()
}, 500)
}
const handlePreview = async () => {
@@ -144,9 +165,7 @@ const handlePreview = async () => {
variables: variableValues.value,
sampleMessage: sampleMessage.value || undefined
})
ElMessage.success('预览生成成功')
} catch (error) {
ElMessage.error('预览生成失败')
previewResult.value = null
} finally {
previewing.value = false


@@ -34,7 +34,7 @@
<el-table :data="template.variables" size="small" border>
<el-table-column prop="name" label="变量名" width="180">
<template #default="{ row }">
<code class="var-code">{{ '{{' + row.name + '}}' }}</code>
<code class="var-code">{{ getVarSyntax(row.name) }}</code>
</template>
</el-table-column>
<el-table-column prop="description" label="描述" />
@@ -109,6 +109,10 @@ const getSceneLabel = (scene: string) => {
return opt?.label || scene
}
const getVarSyntax = (name: string) => {
return `{{${name}}}`
}
const formatDate = (dateStr: string) => {
if (!dateStr) return ''
const date = new Date(dateStr)


@@ -0,0 +1,283 @@
<template>
<div class="variable-manager">
<div class="manager-header">
<span class="header-title">自定义变量</span>
<el-button type="primary" size="small" @click="handleAdd">
<el-icon><Plus /></el-icon>
添加变量
</el-button>
</div>
<div class="variable-list" v-if="localVariables.length > 0">
<div
v-for="(variable, index) in localVariables"
:key="index"
class="variable-item"
>
<div class="variable-content">
<div class="variable-name">
<code>{{ getVarSyntax(variable.name) }}</code>
</div>
<div class="variable-info">
<span class="info-label">默认值:</span>
<span class="info-value">{{ variable.default_value || '-' }}</span>
</div>
<div class="variable-info" v-if="variable.description">
<span class="info-label">描述:</span>
<span class="info-value">{{ variable.description }}</span>
</div>
</div>
<div class="variable-actions">
<el-button type="primary" link size="small" @click="handleEdit(index)">
<el-icon><Edit /></el-icon>
</el-button>
<el-button type="danger" link size="small" @click="handleDelete(index)">
<el-icon><Delete /></el-icon>
</el-button>
</div>
</div>
</div>
<el-empty v-else description="暂无自定义变量" :image-size="80" />
<el-dialog
v-model="dialogVisible"
:title="isEdit ? '编辑变量' : '添加变量'"
width="500px"
:close-on-click-modal="false"
>
<el-form :model="formData" :rules="formRules" ref="formRef" label-width="80px">
<el-form-item label="变量名" prop="name">
<el-input
v-model="formData.name"
placeholder="例如: company_name"
:disabled="isEdit"
>
<template #prepend><span v-text="'{{'"></span></template>
<template #append><span v-text="'}}'"></span></template>
</el-input>
<div class="form-tip">只能包含字母数字和下划线</div>
</el-form-item>
<el-form-item label="默认值" prop="default_value">
<el-input
v-model="formData.default_value"
placeholder="请输入默认值"
type="textarea"
:rows="2"
/>
</el-form-item>
<el-form-item label="描述">
<el-input
v-model="formData.description"
placeholder="请输入变量描述(可选)"
type="textarea"
:rows="2"
/>
</el-form-item>
</el-form>
<template #footer>
<el-button @click="dialogVisible = false">取消</el-button>
<el-button type="primary" @click="handleSubmit">确定</el-button>
</template>
</el-dialog>
</div>
</template>
<script setup lang="ts">
import { ref, watch } from 'vue'
import { ElMessage } from 'element-plus'
import { Plus, Edit, Delete } from '@element-plus/icons-vue'
import type { PromptVariable } from '@/types/prompt-template'
const props = defineProps<{
modelValue: PromptVariable[]
}>()
const emit = defineEmits<{
(e: 'update:modelValue', value: PromptVariable[]): void
(e: 'insert', varName: string): void
}>()
const localVariables = ref<PromptVariable[]>([...props.modelValue])
const dialogVisible = ref(false)
const isEdit = ref(false)
const currentIndex = ref(-1)
const formRef = ref()
const formData = ref<PromptVariable>({
name: '',
default_value: '',
description: ''
})
const validateVarName = (rule: any, value: string, callback: any) => {
if (!value) {
callback(new Error('请输入变量名'))
return
}
if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(value)) {
callback(new Error('变量名只能包含字母、数字和下划线,且不能以数字开头'))
return
}
const exists = localVariables.value.some(
(v, idx) => v.name === value && idx !== currentIndex.value
)
if (exists) {
callback(new Error('变量名已存在'))
return
}
callback()
}
const formRules = {
name: [{ required: true, validator: validateVarName, trigger: 'blur' }],
default_value: [{ required: true, message: '请输入默认值', trigger: 'blur' }]
}
const getVarSyntax = (name: string) => {
return `{{${name}}}`
}
const handleAdd = () => {
isEdit.value = false
currentIndex.value = -1
formData.value = {
name: '',
default_value: '',
description: ''
}
dialogVisible.value = true
}
const handleEdit = (index: number) => {
isEdit.value = true
currentIndex.value = index
formData.value = { ...localVariables.value[index] }
dialogVisible.value = true
}
const handleDelete = (index: number) => {
localVariables.value.splice(index, 1)
emit('update:modelValue', localVariables.value)
ElMessage.success('删除成功')
}
const handleSubmit = async () => {
try {
await formRef.value.validate()
} catch {
return
}
if (isEdit.value) {
localVariables.value[currentIndex.value] = { ...formData.value }
ElMessage.success('修改成功')
} else {
localVariables.value.push({ ...formData.value })
ElMessage.success('添加成功')
}
emit('update:modelValue', localVariables.value)
dialogVisible.value = false
}
watch(
() => props.modelValue,
(newVal) => {
localVariables.value = [...newVal]
},
{ deep: true }
)
</script>
<style scoped>
.variable-manager {
border: 1px solid var(--el-border-color-light);
border-radius: 6px;
overflow: hidden;
}
.manager-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 16px;
background-color: var(--el-fill-color-light);
border-bottom: 1px solid var(--el-border-color-light);
}
.header-title {
font-size: 14px;
font-weight: 600;
color: var(--el-text-color-primary);
}
.variable-list {
max-height: 300px;
overflow-y: auto;
}
.variable-item {
display: flex;
justify-content: space-between;
align-items: flex-start;
padding: 12px 16px;
border-bottom: 1px solid var(--el-border-color-extra-light);
transition: background-color 0.2s;
}
.variable-item:last-child {
border-bottom: none;
}
.variable-item:hover {
background-color: var(--el-fill-color-extra-light);
}
.variable-content {
flex: 1;
min-width: 0;
}
.variable-name {
margin-bottom: 6px;
}
.variable-name code {
font-family: monospace;
font-size: 13px;
color: var(--el-color-primary);
background-color: var(--el-color-primary-light-9);
padding: 2px 6px;
border-radius: 3px;
}
.variable-info {
display: flex;
gap: 8px;
margin-bottom: 4px;
font-size: 12px;
}
.info-label {
color: var(--el-text-color-secondary);
flex-shrink: 0;
}
.info-value {
color: var(--el-text-color-regular);
word-break: break-all;
}
.variable-actions {
display: flex;
gap: 4px;
margin-left: 12px;
}
.form-tip {
font-size: 12px;
color: var(--el-text-color-placeholder);
margin-top: 4px;
}
</style>


@@ -100,11 +100,11 @@
<el-form-item label="描述">
<el-input v-model="formData.description" type="textarea" :rows="2" placeholder="请输入模板描述(可选)" />
</el-form-item>
<el-form-item label="系统指令" prop="content">
<el-form-item label="系统指令" prop="system_instruction">
<div class="content-editor">
<div class="editor-main">
<el-input
v-model="formData.content"
v-model="formData.system_instruction"
type="textarea"
:rows="12"
placeholder="请输入系统指令内容,支持 {{variable}} 变量语法"
@@ -207,14 +207,14 @@ const formData = ref<PromptTemplateCreate>({
name: '',
scene: '',
description: '',
content: '',
system_instruction: '',
variables: []
})
const formRules = {
name: [{ required: true, message: '请输入模板名称', trigger: 'blur' }],
scene: [{ required: true, message: '请选择场景', trigger: 'change' }],
content: [{ required: true, message: '请输入系统指令内容', trigger: 'blur' }]
system_instruction: [{ required: true, message: '请输入系统指令内容', trigger: 'blur' }]
}
const getSceneLabel = (scene: string) => {
@@ -269,7 +269,7 @@ const handleCreate = () => {
name: '',
scene: '',
description: '',
content: '',
system_instruction: '',
variables: []
}
dialogVisible.value = true
@@ -284,7 +284,7 @@ const handleEdit = async (row: PromptTemplate) => {
name: detail.name,
scene: detail.scene,
description: detail.description || '',
content: detail.current_content || '',
system_instruction: detail.current_content || '',
variables: detail.variables || []
}
dialogVisible.value = true
@@ -365,7 +365,7 @@ const handleSubmit = async () => {
name: formData.value.name,
scene: formData.value.scene,
description: formData.value.description,
content: formData.value.content,
system_instruction: formData.value.system_instruction,
variables: formData.value.variables
}
await updatePromptTemplate(currentEditId.value, updateData)
@@ -388,15 +388,15 @@ const insertVariable = (varName: string) => {
if (textarea) {
const start = textarea.selectionStart
const end = textarea.selectionEnd
const text = formData.value.content || ''
const text = formData.value.system_instruction || ''
const insertText = `{{${varName}}}`
formData.value.content = text.substring(0, start) + insertText + text.substring(end)
formData.value.system_instruction = text.substring(0, start) + insertText + text.substring(end)
setTimeout(() => {
textarea.focus()
textarea.setSelectionRange(start + insertText.length, start + insertText.length)
}, 0)
} else {
formData.value.content = (formData.value.content || '') + `{{${varName}}}`
formData.value.system_instruction = (formData.value.system_instruction || '') + `{{${varName}}}`
}
}


@@ -16,12 +16,12 @@
<div class="step-card" :class="{ active: index === currentStep }">
<div class="step-header">
<span class="step-number">步骤 {{ index + 1 }}</span>
<el-tag v-if="step.wait_for_input" size="small" type="info">等待输入</el-tag>
<el-tag v-if="step.wait_input" size="small" type="info">等待输入</el-tag>
</div>
<div class="step-content">
<pre>{{ step.content }}</pre>
</div>
<div class="step-config" v-if="step.wait_for_input">
<div class="step-config" v-if="step.wait_input">
<div class="config-item">
<span class="label">超时时间:</span>
<span class="value">{{ step.timeout_seconds }}</span>


@@ -118,15 +118,15 @@
<el-row :gutter="16">
<el-col :span="8">
<el-form-item label="等待输入">
<el-switch v-model="element.wait_for_input" />
<el-switch v-model="element.wait_input" />
</el-form-item>
</el-col>
<el-col :span="8" v-if="element.wait_for_input">
<el-col :span="8" v-if="element.wait_input">
<el-form-item label="超时(秒)">
<el-input-number v-model="element.timeout_seconds" :min="5" :max="300" />
</el-form-item>
</el-col>
<el-col :span="8" v-if="element.wait_for_input">
<el-col :span="8" v-if="element.wait_input">
<el-form-item label="超时动作">
<el-select v-model="element.timeout_action" style="width: 100%;">
<el-option
@@ -307,9 +307,9 @@ const handleSimulate = (row: ScriptFlow) => {
const addStep = () => {
formData.value.steps.push({
step_id: generateStepId(),
order: formData.value.steps.length + 1,
step_no: formData.value.steps.length + 1,
content: '',
wait_for_input: true,
wait_input: true,
timeout_seconds: 30,
timeout_action: 'repeat',
next_conditions: []
@@ -319,7 +319,7 @@ const addStep = () => {
const removeStep = (index: number) => {
formData.value.steps.splice(index, 1)
formData.value.steps.forEach((step, i) => {
step.order = i + 1
step.step_no = i + 1
})
}
@@ -341,7 +341,7 @@ const handleSubmit = async () => {
...formData.value,
steps: formData.value.steps.map((step, index) => ({
...step,
order: index + 1
step_no: index + 1
}))
}


@@ -71,7 +71,7 @@ async def list_api_keys(
"""
service = get_api_key_service()
keys = await service.list_keys(session)
return ApiKeyListResponse(
keys=[api_key_to_response(k) for k in keys],
total=len(keys),
@@ -87,18 +87,18 @@ async def create_api_key(
[AC-AISVC-50] Create a new API key.
"""
service = get_api_key_service()
key_value = request.key or service.generate_key()
key_create = ApiKeyCreate(
key=key_value,
name=request.name,
is_active=True,
)
api_key = await service.create_key(session, key_create)
logger.info(f"[AC-AISVC-50] Created API key: {api_key.name}")
return api_key_to_response(api_key)
@@ -111,9 +111,9 @@ async def delete_api_key(
[AC-AISVC-50] Delete an API key.
"""
service = get_api_key_service()
deleted = await service.delete_key(session, key_id)
if not deleted:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
@@ -131,15 +131,15 @@ async def toggle_api_key(
[AC-AISVC-50] Toggle API key active status.
"""
service = get_api_key_service()
api_key = await service.toggle_key(session, key_id, request.is_active)
if not api_key:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="API key not found",
)
return api_key_to_response(api_key)


@@ -1,14 +1,16 @@
"""
Dashboard statistics endpoints.
Provides overview statistics for the admin dashboard.
[AC-AISVC-91, AC-AISVC-92] Provides overview statistics for the admin dashboard.
Enhanced with monitoring metrics for intent rules, templates, flows, and guardrails.
"""
import logging
from typing import Annotated
from datetime import datetime, timedelta
from typing import Annotated, Any
from fastapi import APIRouter, Depends, Query
from fastapi.responses import JSONResponse
from sqlalchemy import select, func, desc
from sqlalchemy import desc, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.database import get_session
@@ -16,6 +18,8 @@ from app.core.exceptions import MissingTenantIdException
from app.core.tenant import get_tenant_id
from app.models import ErrorResponse
from app.models.entities import ChatMessage, ChatSession, Document, KnowledgeBase
from app.services.monitoring.cache import get_monitoring_cache
from app.services.monitoring.dashboard_service import DashboardService
logger = logging.getLogger(__name__)
@@ -32,11 +36,21 @@ def get_current_tenant_id() -> str:
return tenant_id
def parse_date_param(date_str: str | None) -> datetime | None:
"""Parse ISO 8601 date string to datetime."""
if not date_str:
return None
try:
return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
@router.get(
"/stats",
operation_id="getDashboardStats",
summary="Get dashboard statistics",
description="Get overview statistics for the admin dashboard.",
description="[AC-AISVC-91, AC-AISVC-92] Get overview statistics for the admin dashboard with enhanced monitoring metrics.",
responses={
200: {"description": "Dashboard statistics"},
401: {"description": "Unauthorized", "model": ErrorResponse},
@@ -47,11 +61,21 @@ async def get_dashboard_stats(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
latency_threshold: int = Query(default=LATENCY_THRESHOLD_MS, description="Latency threshold in ms"),
start_date: str | None = Query(default=None, description="Start date filter (ISO 8601)"),
end_date: str | None = Query(default=None, description="End date filter (ISO 8601)"),
include_enhanced: bool = Query(default=True, description="Include enhanced monitoring stats"),
) -> JSONResponse:
"""
Get dashboard statistics including knowledge bases, messages, and activity.
[AC-AISVC-91, AC-AISVC-92] Get dashboard statistics including:
- Basic stats: knowledge bases, messages, documents, sessions
- Token statistics
- Latency statistics
- Enhanced stats (v0.7.0): intent rules, templates, flows, guardrails
"""
logger.info(f"Getting dashboard stats: tenant={tenant_id}")
logger.info(f"Getting dashboard stats: tenant={tenant_id}, start={start_date}, end={end_date}")
start_dt = parse_date_param(start_date)
end_dt = parse_date_param(end_date)
kb_count_stmt = select(func.count()).select_from(KnowledgeBase).where(
KnowledgeBase.tenant_id == tenant_id
@@ -62,6 +86,10 @@ async def get_dashboard_stats(
msg_count_stmt = select(func.count()).select_from(ChatMessage).where(
ChatMessage.tenant_id == tenant_id
)
if start_dt:
msg_count_stmt = msg_count_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
msg_count_stmt = msg_count_stmt.where(ChatMessage.created_at <= end_dt)
msg_result = await session.execute(msg_count_stmt)
msg_count = msg_result.scalar() or 0
@@ -74,24 +102,40 @@
session_count_stmt = select(func.count()).select_from(ChatSession).where(
ChatSession.tenant_id == tenant_id
)
if start_dt:
session_count_stmt = session_count_stmt.where(ChatSession.created_at >= start_dt)
if end_dt:
session_count_stmt = session_count_stmt.where(ChatSession.created_at <= end_dt)
session_result = await session.execute(session_count_stmt)
session_count = session_result.scalar() or 0
total_tokens_stmt = select(func.coalesce(func.sum(ChatMessage.total_tokens), 0)).select_from(
ChatMessage
).where(ChatMessage.tenant_id == tenant_id)
if start_dt:
total_tokens_stmt = total_tokens_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
total_tokens_stmt = total_tokens_stmt.where(ChatMessage.created_at <= end_dt)
total_tokens_result = await session.execute(total_tokens_stmt)
total_tokens = total_tokens_result.scalar() or 0
prompt_tokens_stmt = select(func.coalesce(func.sum(ChatMessage.prompt_tokens), 0)).select_from(
ChatMessage
).where(ChatMessage.tenant_id == tenant_id)
if start_dt:
prompt_tokens_stmt = prompt_tokens_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
prompt_tokens_stmt = prompt_tokens_stmt.where(ChatMessage.created_at <= end_dt)
prompt_tokens_result = await session.execute(prompt_tokens_stmt)
prompt_tokens = prompt_tokens_result.scalar() or 0
completion_tokens_stmt = select(func.coalesce(func.sum(ChatMessage.completion_tokens), 0)).select_from(
ChatMessage
).where(ChatMessage.tenant_id == tenant_id)
if start_dt:
completion_tokens_stmt = completion_tokens_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
completion_tokens_stmt = completion_tokens_stmt.where(ChatMessage.created_at <= end_dt)
completion_tokens_result = await session.execute(completion_tokens_stmt)
completion_tokens = completion_tokens_result.scalar() or 0
@@ -99,6 +143,10 @@
ChatMessage.tenant_id == tenant_id,
ChatMessage.role == "assistant"
)
if start_dt:
ai_requests_stmt = ai_requests_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
ai_requests_stmt = ai_requests_stmt.where(ChatMessage.created_at <= end_dt)
ai_requests_result = await session.execute(ai_requests_stmt)
ai_requests_count = ai_requests_result.scalar() or 0
@@ -109,6 +157,10 @@
ChatMessage.role == "assistant",
ChatMessage.latency_ms.isnot(None)
)
if start_dt:
avg_latency_stmt = avg_latency_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
avg_latency_stmt = avg_latency_stmt.where(ChatMessage.created_at <= end_dt)
avg_latency_result = await session.execute(avg_latency_stmt)
avg_latency_ms = float(avg_latency_result.scalar() or 0)
@@ -127,6 +179,10 @@
ChatMessage.latency_ms.isnot(None),
ChatMessage.latency_ms >= latency_threshold
)
if start_dt:
slow_requests_stmt = slow_requests_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
slow_requests_stmt = slow_requests_stmt.where(ChatMessage.created_at <= end_dt)
slow_requests_result = await session.execute(slow_requests_stmt)
slow_requests_count = slow_requests_result.scalar() or 0
@@ -135,6 +191,10 @@
ChatMessage.role == "assistant",
ChatMessage.is_error == True
)
if start_dt:
error_requests_stmt = error_requests_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
error_requests_stmt = error_requests_stmt.where(ChatMessage.created_at <= end_dt)
error_requests_result = await session.execute(error_requests_stmt)
error_requests_count = error_requests_result.scalar() or 0
@@ -145,6 +205,10 @@
ChatMessage.role == "assistant",
ChatMessage.latency_ms.isnot(None)
)
if start_dt:
p95_latency_stmt = p95_latency_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
p95_latency_stmt = p95_latency_stmt.where(ChatMessage.created_at <= end_dt)
p95_latency_result = await session.execute(p95_latency_stmt)
p95_latency_ms = float(p95_latency_result.scalar() or 0)
@@ -155,6 +219,10 @@
ChatMessage.role == "assistant",
ChatMessage.latency_ms.isnot(None)
)
if start_dt:
p99_latency_stmt = p99_latency_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
p99_latency_stmt = p99_latency_stmt.where(ChatMessage.created_at <= end_dt)
p99_latency_result = await session.execute(p99_latency_stmt)
p99_latency_ms = float(p99_latency_result.scalar() or 0)
@@ -165,6 +233,10 @@
ChatMessage.role == "assistant",
ChatMessage.latency_ms.isnot(None)
)
if start_dt:
min_latency_stmt = min_latency_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
min_latency_stmt = min_latency_stmt.where(ChatMessage.created_at <= end_dt)
min_latency_result = await session.execute(min_latency_stmt)
min_latency_ms = float(min_latency_result.scalar() or 0)
@@ -175,28 +247,58 @@
ChatMessage.role == "assistant",
ChatMessage.latency_ms.isnot(None)
)
if start_dt:
max_latency_stmt = max_latency_stmt.where(ChatMessage.created_at >= start_dt)
if end_dt:
max_latency_stmt = max_latency_stmt.where(ChatMessage.created_at <= end_dt)
max_latency_result = await session.execute(max_latency_stmt)
max_latency_ms = float(max_latency_result.scalar() or 0)
return JSONResponse(
content={
"knowledgeBases": kb_count,
"totalMessages": msg_count,
"totalDocuments": doc_count,
"totalSessions": session_count,
"totalTokens": total_tokens,
"promptTokens": prompt_tokens,
"completionTokens": completion_tokens,
"aiRequestsCount": ai_requests_count,
"avgLatencyMs": round(avg_latency_ms, 2),
"lastLatencyMs": last_latency_ms,
"lastRequestTime": last_request_time,
"slowRequestsCount": slow_requests_count,
"errorRequestsCount": error_requests_count,
"p95LatencyMs": round(p95_latency_ms, 2),
"p99LatencyMs": round(p99_latency_ms, 2),
"minLatencyMs": round(min_latency_ms, 2),
"maxLatencyMs": round(max_latency_ms, 2),
"latencyThresholdMs": latency_threshold,
}
)
response_data: dict[str, Any] = {
"knowledgeBases": kb_count,
"totalMessages": msg_count,
"totalDocuments": doc_count,
"totalSessions": session_count,
"totalTokens": total_tokens,
"promptTokens": prompt_tokens,
"completionTokens": completion_tokens,
"aiRequestsCount": ai_requests_count,
"avgLatencyMs": round(avg_latency_ms, 2),
"lastLatencyMs": last_latency_ms,
"lastRequestTime": last_request_time,
"slowRequestsCount": slow_requests_count,
"errorRequestsCount": error_requests_count,
"p95LatencyMs": round(p95_latency_ms, 2),
"p99LatencyMs": round(p99_latency_ms, 2),
"minLatencyMs": round(min_latency_ms, 2),
"maxLatencyMs": round(max_latency_ms, 2),
"latencyThresholdMs": latency_threshold,
}
if include_enhanced:
try:
cache = get_monitoring_cache()
dashboard_service = DashboardService(session, cache)
enhanced_stats = await dashboard_service.get_enhanced_stats(
tenant_id=tenant_id,
start_date=start_dt,
end_date=end_dt,
)
response_data.update(enhanced_stats.to_dict())
except Exception as e:
logger.warning(f"Failed to get enhanced stats: {e}")
response_data.update({
"intentRuleHitRate": 0.0,
"intentRuleHitCount": 0,
"topIntentRules": [],
"promptTemplateUsageCount": 0,
"topPromptTemplates": [],
"scriptFlowActivationCount": 0,
"scriptFlowCompletionRate": 0.0,
"topScriptFlows": [],
"guardrailBlockCount": 0,
"guardrailBlockRate": 0.0,
"topGuardrailWords": [],
})
return JSONResponse(content=response_data)


@@ -39,7 +39,7 @@ async def list_embedding_providers(
for name in EmbeddingProviderFactory.get_available_providers():
info = EmbeddingProviderFactory.get_provider_info(name)
providers.append(info)
return {"providers": providers}
@@ -66,32 +66,32 @@ async def update_embedding_config(
"""
provider = request.get("provider")
config = request.get("config", {})
if not provider:
raise InvalidRequestException("provider is required")
if provider not in EmbeddingProviderFactory.get_available_providers():
raise InvalidRequestException(
f"Unknown provider: {provider}. "
f"Available: {EmbeddingProviderFactory.get_available_providers()}"
)
manager = get_embedding_config_manager()
old_config = manager.get_full_config()
old_provider = old_config.get("provider")
old_model = old_config.get("config", {}).get("model", "")
new_model = config.get("model", "")
try:
await manager.update_config(provider, config)
response = {
"success": True,
"message": f"Configuration updated to use {provider}",
}
if old_provider != provider or old_model != new_model:
response["warning"] = (
"嵌入模型已更改。由于不同模型生成的向量不兼容,"
@@ -102,7 +102,7 @@ async def update_embedding_config(
f"[EMBEDDING] Model changed from {old_provider}/{old_model} to {provider}/{new_model}. "
f"Documents need to be re-uploaded."
)
return response
except EmbeddingException as e:
raise InvalidRequestException(str(e))
@@ -121,15 +121,15 @@ async def test_embedding(
test_text = request.get("test_text", "这是一个测试文本")
config = request.get("config")
provider = request.get("provider")
manager = get_embedding_config_manager()
result = await manager.test_connection(
test_text=test_text,
provider=provider,
config=config,
)
return result
@@ -141,11 +141,11 @@ async def get_supported_document_formats(
Get supported document formats for embedding.
Returns list of supported file extensions.
"""
from app.services.document import get_supported_document_formats, DocumentParserFactory
from app.services.document import DocumentParserFactory, get_supported_document_formats
formats = get_supported_document_formats()
parser_info = DocumentParserFactory.get_parser_info()
return {
"formats": formats,
"parsers": parser_info,


@@ -0,0 +1,402 @@
"""
Flow test API for AI Service Admin.
[AC-AISVC-93~AC-AISVC-95] Complete 12-step flow execution testing.
"""
import logging
import uuid
from datetime import datetime
from typing import Any
from fastapi import APIRouter, Depends, Header, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import desc, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.database import get_session
from app.models.entities import FlowTestRecord, FlowTestRecordStatus
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/admin/test", tags=["Flow Test"])
def get_tenant_id(x_tenant_id: str = Header(..., alias="X-Tenant-Id")) -> str:
"""Extract tenant ID from header."""
if not x_tenant_id:
raise HTTPException(status_code=400, detail="X-Tenant-Id header is required")
return x_tenant_id
class FlowExecutionRequest(BaseModel):
"""Request for flow execution test."""
message: str
session_id: str | None = None
user_id: str | None = None
enable_flow: bool = True
enable_intent: bool = True
enable_rag: bool = True
enable_guardrail: bool = True
enable_memory: bool = True
compare_mode: bool = False
class FlowExecutionResponse(BaseModel):
"""Response for flow execution test."""
test_id: str
session_id: str
status: str
steps: list[dict[str, Any]]
final_response: dict[str, Any] | None
total_duration_ms: int
created_at: str
@router.post(
"/flow-execution",
operation_id="executeFlowTest",
summary="Execute complete 12-step flow",
description="[AC-AISVC-93] Execute complete 12-step generation flow with detailed step logging.",
)
async def execute_flow_test(
request: FlowExecutionRequest,
tenant_id: str = Depends(get_tenant_id),
session: AsyncSession = Depends(get_session),
) -> FlowExecutionResponse:
"""
[AC-AISVC-93] Execute complete 12-step flow for testing.
Steps:
1. InputScanner - Scan input for forbidden words
2. FlowEngine - Check if flow is active
3. IntentRouter - Match intent rules
4. QueryRewriter - Rewrite query for better retrieval
5. MultiKBRetrieval - Retrieve from multiple knowledge bases
6. ResultRanker - Rank and filter results
7. PromptBuilder - Build prompt from template
8. LLMGenerate - Generate response via LLM
9. OutputFilter - Filter output for forbidden words
10. Confidence - Calculate confidence score
11. Memory - Store conversation in memory
12. Response - Return final response
"""
import time
from app.models import ChatRequest, ChannelType
from app.services.llm.factory import get_llm_config_manager
from app.services.memory import MemoryService
from app.services.orchestrator import OrchestratorService
from app.services.retrieval.optimized_retriever import get_optimized_retriever
logger.info(
f"[AC-AISVC-93] Executing flow test for tenant={tenant_id}, "
f"message={request.message[:50]}..."
)
test_session_id = request.session_id or f"test_{uuid.uuid4().hex[:8]}"
start_time = time.time()
memory_service = MemoryService(session)
llm_config_manager = get_llm_config_manager()
llm_client = llm_config_manager.get_client()
retriever = await get_optimized_retriever()
orchestrator = OrchestratorService(
llm_client=llm_client,
memory_service=memory_service,
retriever=retriever,
)
try:
chat_request = ChatRequest(
session_id=test_session_id,
current_message=request.message,
channel_type=ChannelType.WECHAT,
history=[],
)
result = await orchestrator.generate(
tenant_id=tenant_id,
request=chat_request,
)
steps = result.metadata.get("execution_steps", []) if result.metadata else []
total_duration_ms = int((time.time() - start_time) * 1000)
has_failure = any(s.get("status") == "failed" for s in steps)
has_partial = any(s.get("status") == "skipped" for s in steps)
if has_failure:
status = FlowTestRecordStatus.FAILED.value
elif has_partial:
status = FlowTestRecordStatus.PARTIAL.value
else:
status = FlowTestRecordStatus.SUCCESS.value
test_record = FlowTestRecord(
tenant_id=tenant_id,
session_id=test_session_id,
status=status,
steps=steps,
final_response={
"reply": result.reply,
"confidence": result.confidence,
"should_transfer": result.should_transfer,
},
total_duration_ms=total_duration_ms,
)
saved = True
try:
session.add(test_record)
await session.commit()
await session.refresh(test_record)
except Exception as db_error:
# Keep the test response usable even if persisting the record fails:
# after rollback, test_record.id and created_at are not populated.
logger.warning(f"Failed to save test record: {db_error}")
await session.rollback()
saved = False
logger.info(
f"[AC-AISVC-93] Flow test completed: id={test_record.id if saved else 'unsaved'}, "
f"status={status}, duration={total_duration_ms}ms"
)
return FlowExecutionResponse(
test_id=str(test_record.id) if saved else "",
session_id=test_session_id,
status=status,
steps=steps,
final_response=test_record.final_response,
total_duration_ms=total_duration_ms,
created_at=test_record.created_at.isoformat() if saved else datetime.utcnow().isoformat(),
)
except Exception as e:
logger.error(f"[AC-AISVC-93] Flow test failed: {e}")
total_duration_ms = int((time.time() - start_time) * 1000)
await session.rollback()
test_record = FlowTestRecord(
tenant_id=tenant_id,
session_id=test_session_id,
status=FlowTestRecordStatus.FAILED.value,
steps=[{
"step": 0,
"name": "Error",
"status": "failed",
"error": str(e),
}],
final_response=None,
total_duration_ms=total_duration_ms,
)
try:
session.add(test_record)
await session.commit()
except Exception as db_error:
# Don't let a failed save mask the original error.
logger.warning(f"Failed to save failed test record: {db_error}")
await session.rollback()
raise HTTPException(status_code=500, detail=str(e)) from e
@router.get(
"/flow-execution/{test_id}",
operation_id="getFlowTestResult",
summary="Get flow test result",
description="[AC-AISVC-94] Get detailed result of a flow execution test.",
)
async def get_flow_test_result(
test_id: uuid.UUID,
tenant_id: str = Depends(get_tenant_id),
session: AsyncSession = Depends(get_session),
) -> dict[str, Any]:
"""
[AC-AISVC-94] Get detailed result of a flow execution test.
Returns step-by-step execution details for debugging.
"""
logger.info(
f"[AC-AISVC-94] Getting flow test result for tenant={tenant_id}, "
f"test_id={test_id}"
)
stmt = select(FlowTestRecord).where(
FlowTestRecord.id == test_id,
FlowTestRecord.tenant_id == tenant_id,
)
result = await session.execute(stmt)
record = result.scalar_one_or_none()
if not record:
raise HTTPException(status_code=404, detail="Test record not found")
return {
"testId": str(record.id),
"sessionId": record.session_id,
"status": record.status,
"steps": record.steps,
"finalResponse": record.final_response,
"totalDurationMs": record.total_duration_ms,
"createdAt": record.created_at.isoformat(),
}
@router.get(
"/flow-executions",
operation_id="listFlowTests",
summary="List flow test records",
description="[AC-AISVC-95] List flow test execution records.",
)
async def list_flow_tests(
tenant_id: str = Depends(get_tenant_id),
session_id: str | None = Query(None, description="Filter by session ID"),
status: str | None = Query(None, description="Filter by status"),
page: int = Query(1, ge=1, description="Page number"),
page_size: int = Query(20, ge=1, le=100, description="Page size"),
session: AsyncSession = Depends(get_session),
) -> dict[str, Any]:
"""
[AC-AISVC-95] List flow test execution records.
Records are retained for 7 days.
"""
logger.info(
f"[AC-AISVC-95] Listing flow tests for tenant={tenant_id}, "
f"session={session_id}, page={page}"
)
stmt = select(FlowTestRecord).where(
FlowTestRecord.tenant_id == tenant_id,
)
if session_id:
stmt = stmt.where(FlowTestRecord.session_id == session_id)
if status:
stmt = stmt.where(FlowTestRecord.status == status)
count_stmt = select(func.count()).select_from(stmt.subquery())
total_result = await session.execute(count_stmt)
total = total_result.scalar() or 0
stmt = stmt.order_by(desc(FlowTestRecord.created_at))
stmt = stmt.offset((page - 1) * page_size).limit(page_size)
result = await session.execute(stmt)
records = result.scalars().all()
return {
"data": [
{
"testId": str(r.id),
"sessionId": r.session_id,
"status": r.status,
"stepCount": len(r.steps),
"totalDurationMs": r.total_duration_ms,
"createdAt": r.created_at.isoformat(),
}
for r in records
],
"page": page,
"pageSize": page_size,
"total": total,
}
class CompareRequest(BaseModel):
"""Request for comparison test."""
message: str
baseline_config: dict[str, Any] | None = None
test_config: dict[str, Any] | None = None
@router.post(
"/compare",
operation_id="compareFlowTest",
summary="Compare two flow executions",
description="[AC-AISVC-95] Compare baseline and test configurations.",
)
async def compare_flow_test(
request: CompareRequest,
tenant_id: str = Depends(get_tenant_id),
session: AsyncSession = Depends(get_session),
) -> dict[str, Any]:
"""
[AC-AISVC-95] Compare two flow executions with different configurations.
Useful for:
- A/B testing prompt templates
- Comparing RAG retrieval strategies
- Testing guardrail effectiveness
"""
import time
from app.models import ChatRequest, ChannelType
from app.services.llm.factory import get_llm_config_manager
from app.services.memory import MemoryService
from app.services.orchestrator import OrchestratorService
from app.services.retrieval.optimized_retriever import get_optimized_retriever
logger.info(
f"[AC-AISVC-95] Running comparison test for tenant={tenant_id}"
)
baseline_session_id = f"compare_baseline_{uuid.uuid4().hex[:8]}"
test_session_id = f"compare_test_{uuid.uuid4().hex[:8]}"
memory_service = MemoryService(session)
llm_config_manager = get_llm_config_manager()
llm_client = llm_config_manager.get_client()
retriever = await get_optimized_retriever()
orchestrator = OrchestratorService(
llm_client=llm_client,
memory_service=memory_service,
retriever=retriever,
)
baseline_chat_request = ChatRequest(
session_id=baseline_session_id,
current_message=request.message,
channel_type=ChannelType.WECHAT,
history=[],
)
baseline_start = time.time()
baseline_result = await orchestrator.generate(
tenant_id=tenant_id,
request=baseline_chat_request,
)
baseline_duration = int((time.time() - baseline_start) * 1000)
test_chat_request = ChatRequest(
session_id=test_session_id,
current_message=request.message,
channel_type=ChannelType.WECHAT,
history=[],
)
test_start = time.time()
test_result = await orchestrator.generate(
tenant_id=tenant_id,
request=test_chat_request,
)
test_duration = int((time.time() - test_start) * 1000)
return {
"baseline": {
"sessionId": baseline_session_id,
"reply": baseline_result.reply,
"confidence": baseline_result.confidence,
"durationMs": baseline_duration,
"steps": baseline_result.metadata.get("execution_steps", []) if baseline_result.metadata else [],
},
"test": {
"sessionId": test_session_id,
"reply": test_result.reply,
"confidence": test_result.confidence,
"durationMs": test_duration,
"steps": test_result.metadata.get("execution_steps", []) if test_result.metadata else [],
},
"comparison": {
"durationDiffMs": test_duration - baseline_duration,
"confidenceDiff": (test_result.confidence or 0) - (baseline_result.confidence or 0),
},
}
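The status aggregation shared by these test endpoints (a run is failed if any step failed, partial if any step was skipped, success otherwise) can be sketched standalone; the literal status strings below are assumptions mirroring `FlowTestRecordStatus`:

```python
# Minimal sketch of the flow-test status aggregation: "failed" wins over
# "partial" (skipped steps), which wins over "success". The string values
# are assumptions mirroring FlowTestRecordStatus.
def aggregate_status(steps: list[dict]) -> str:
    if any(s.get("status") == "failed" for s in steps):
        return "failed"
    if any(s.get("status") == "skipped" for s in steps):
        return "partial"
    return "success"

print(aggregate_status([{"status": "ok"}, {"status": "skipped"}]))  # partial
```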

View File

@ -1,16 +1,16 @@
"""
Knowledge Base management endpoints.
[AC-ASA-01, AC-ASA-02, AC-ASA-08] Document upload, list, and index job status.
[AC-AISVC-59~AC-AISVC-64] Multi-knowledge-base management.
"""
import logging
import os
import uuid
from dataclasses import dataclass
from typing import Annotated, Optional
import tiktoken
from fastapi import APIRouter, BackgroundTasks, Depends, File, Form, Query, UploadFile
from fastapi.responses import JSONResponse
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
@ -19,8 +19,15 @@ from app.core.database import get_session
from app.core.exceptions import MissingTenantIdException
from app.core.tenant import get_tenant_id
from app.models import ErrorResponse
from app.models.entities import (
IndexJob,
IndexJobStatus,
KBType,
KnowledgeBaseCreate,
KnowledgeBaseUpdate,
)
from app.services.kb import KBService
from app.services.knowledge_base_service import KnowledgeBaseService
logger = logging.getLogger(__name__)
@ -44,24 +51,24 @@ def chunk_text_by_lines(
) -> list[TextChunk]:
"""
Chunk text by lines: each line becomes an independent retrieval unit.
Args:
text: Text to chunk.
min_line_length: Minimum line length; shorter lines are skipped.
source: Source file path (optional).
Returns:
List of chunks, one chunk per line of text.
"""
lines = text.split('\n')
chunks: list[TextChunk] = []
for i, line in enumerate(lines):
line = line.strip()
if len(line) < min_line_length:
continue
chunks.append(TextChunk(
text=line,
start_token=i,
@ -69,7 +76,7 @@ def chunk_text_by_lines(
page=None,
source=source,
))
return chunks
@ -82,14 +89,14 @@ def chunk_text_with_tiktoken(
) -> list[TextChunk]:
"""
Chunk text by token count using tiktoken, with overlapping chunks.
Args:
text: Text to chunk.
chunk_size: Maximum tokens per chunk.
overlap: Overlapping tokens between adjacent chunks.
page: Page number (optional).
source: Source file path (optional).
Returns:
List of chunks, each with text and start/end token positions.
"""
@ -97,7 +104,7 @@ def chunk_text_with_tiktoken(
tokens = encoding.encode(text)
chunks: list[TextChunk] = []
start = 0
while start < len(tokens):
end = min(start + chunk_size, len(tokens))
chunk_tokens = tokens[start:end]
@ -112,7 +119,7 @@ def chunk_text_with_tiktoken(
if end == len(tokens):
break
start += chunk_size - overlap
return chunks
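The sliding-window loop above can be exercised without tiktoken; a minimal sketch over plain token-id lists, assuming `overlap < chunk_size`:

```python
# Sketch of the overlap chunking loop from chunk_text_with_tiktoken:
# windows of chunk_size tokens advance by chunk_size - overlap each step.
def window_tokens(tokens: list[int], chunk_size: int, overlap: int) -> list[list[int]]:
    chunks: list[list[int]] = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break
        start += chunk_size - overlap
    return chunks

print(window_tokens(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```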
@ -128,7 +135,7 @@ def get_current_tenant_id() -> str:
"/knowledge-bases",
operation_id="listKnowledgeBases",
summary="Query knowledge base list",
description="Get list of knowledge bases for the current tenant.",
description="[AC-AISVC-60] Get list of knowledge bases for the current tenant with type and status filters.",
responses={
200: {"description": "Knowledge base list"},
401: {"description": "Unauthorized", "model": ErrorResponse},
@ -138,41 +145,264 @@ def get_current_tenant_id() -> str:
async def list_knowledge_bases(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
kb_type: Annotated[Optional[str], Query()] = None,
is_enabled: Annotated[Optional[bool], Query()] = None,
) -> JSONResponse:
"""
[AC-AISVC-60] List all knowledge bases for the current tenant.
Supports filtering by kb_type and is_enabled status.
"""
logger.info(f"Listing knowledge bases: tenant={tenant_id}")
try:
logger.info(f"[AC-AISVC-60] Listing knowledge bases: tenant={tenant_id}, kb_type={kb_type}, is_enabled={is_enabled}")
kb_service = KBService(session)
knowledge_bases = await kb_service.list_knowledge_bases(tenant_id)
kb_ids = [str(kb.id) for kb in knowledge_bases]
doc_counts = {}
if kb_ids:
from sqlalchemy import func
from app.models.entities import Document
count_stmt = (
select(Document.kb_id, func.count(Document.id).label("count"))
.where(Document.tenant_id == tenant_id, Document.kb_id.in_(kb_ids))
.group_by(Document.kb_id)
kb_service = KnowledgeBaseService(session)
logger.info(f"[AC-AISVC-60] KnowledgeBaseService created, calling list_knowledge_bases...")
knowledge_bases = await kb_service.list_knowledge_bases(
tenant_id=tenant_id,
kb_type=kb_type,
is_enabled=is_enabled,
)
count_result = await session.execute(count_stmt)
for row in count_result:
doc_counts[row.kb_id] = row.count
logger.info(f"[AC-AISVC-60] Found {len(knowledge_bases)} knowledge bases")
data = []
for kb in knowledge_bases:
data.append({
"id": str(kb.id),
"name": kb.name,
"kbType": kb.kb_type,
"description": kb.description,
"priority": kb.priority,
"isEnabled": kb.is_enabled,
"docCount": kb.doc_count,
"createdAt": kb.created_at.isoformat() + "Z",
"updatedAt": kb.updated_at.isoformat() + "Z",
})
logger.info(f"[AC-AISVC-60] Returning {len(data)} knowledge bases")
return JSONResponse(content={"data": data})
except Exception as e:
import traceback
logger.error(f"[AC-AISVC-60] Error listing knowledge bases: {type(e).__name__}: {e}\n{traceback.format_exc()}")
raise
@router.post(
"/knowledge-bases",
operation_id="createKnowledgeBase",
summary="Create knowledge base",
description="[AC-AISVC-59] Create a new knowledge base with specified type and priority.",
responses={
201: {"description": "Knowledge base created"},
400: {"description": "Bad Request - invalid kb_type"},
401: {"description": "Unauthorized", "model": ErrorResponse},
403: {"description": "Forbidden", "model": ErrorResponse},
},
status_code=201,
)
async def create_knowledge_base(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
kb_create: KnowledgeBaseCreate,
) -> JSONResponse:
"""
[AC-AISVC-59] Create a new knowledge base.
Initializes corresponding Qdrant Collection.
"""
valid_types = [t.value for t in KBType]
if kb_create.kb_type not in valid_types:
return JSONResponse(
status_code=400,
content={
"code": "INVALID_KB_TYPE",
"message": f"Invalid kb_type: {kb_create.kb_type}",
"details": {"valid_types": valid_types},
},
)
logger.info(
f"[AC-AISVC-59] Creating knowledge base: tenant={tenant_id}, "
f"name={kb_create.name}, type={kb_create.kb_type}"
)
kb_service = KnowledgeBaseService(session)
kb = await kb_service.create_knowledge_base(tenant_id, kb_create)
await session.commit()
return JSONResponse(
status_code=201,
content={
"id": str(kb.id),
"name": kb.name,
"kbType": kb.kb_type,
"description": kb.description,
"priority": kb.priority,
"isEnabled": kb.is_enabled,
"docCount": kb.doc_count,
"createdAt": kb.created_at.isoformat() + "Z",
"updatedAt": kb.updated_at.isoformat() + "Z",
},
)
@router.get(
"/knowledge-bases/{kb_id}",
operation_id="getKnowledgeBase",
summary="Get knowledge base details",
description="Get detailed information about a specific knowledge base.",
responses={
200: {"description": "Knowledge base details"},
404: {"description": "Knowledge base not found"},
401: {"description": "Unauthorized", "model": ErrorResponse},
403: {"description": "Forbidden", "model": ErrorResponse},
},
)
async def get_knowledge_base(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
kb_id: str,
) -> JSONResponse:
"""
Get a specific knowledge base by ID.
"""
logger.info(f"Getting knowledge base: tenant={tenant_id}, kb_id={kb_id}")
kb_service = KnowledgeBaseService(session)
kb = await kb_service.get_knowledge_base(tenant_id, kb_id)
if not kb:
return JSONResponse(
status_code=404,
content={
"code": "KB_NOT_FOUND",
"message": f"Knowledge base {kb_id} not found",
},
)
return JSONResponse(
content={
"id": str(kb.id),
"name": kb.name,
"kbType": kb.kb_type,
"description": kb.description,
"priority": kb.priority,
"isEnabled": kb.is_enabled,
"docCount": kb.doc_count,
"createdAt": kb.created_at.isoformat() + "Z",
"updatedAt": kb.updated_at.isoformat() + "Z",
}
)
@router.put(
"/knowledge-bases/{kb_id}",
operation_id="updateKnowledgeBase",
summary="Update knowledge base",
description="[AC-AISVC-61] Update knowledge base name, type, description, priority, or enabled status.",
responses={
200: {"description": "Knowledge base updated"},
400: {"description": "Bad Request - invalid kb_type"},
404: {"description": "Knowledge base not found"},
401: {"description": "Unauthorized", "model": ErrorResponse},
403: {"description": "Forbidden", "model": ErrorResponse},
},
)
async def update_knowledge_base(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
kb_id: str,
kb_update: KnowledgeBaseUpdate,
) -> JSONResponse:
"""
[AC-AISVC-61] Update a knowledge base.
"""
if kb_update.kb_type is not None:
valid_types = [t.value for t in KBType]
if kb_update.kb_type not in valid_types:
return JSONResponse(
status_code=400,
content={
"code": "INVALID_KB_TYPE",
"message": f"Invalid kb_type: {kb_update.kb_type}",
"details": {"valid_types": valid_types},
},
)
logger.info(
f"[AC-AISVC-61] Updating knowledge base: tenant={tenant_id}, kb_id={kb_id}"
)
kb_service = KnowledgeBaseService(session)
kb = await kb_service.update_knowledge_base(tenant_id, kb_id, kb_update)
if not kb:
return JSONResponse(
status_code=404,
content={
"code": "KB_NOT_FOUND",
"message": f"Knowledge base {kb_id} not found",
},
)
await session.commit()
return JSONResponse(
content={
"id": str(kb.id),
"name": kb.name,
"kbType": kb.kb_type,
"description": kb.description,
"priority": kb.priority,
"isEnabled": kb.is_enabled,
"docCount": kb.doc_count,
"createdAt": kb.created_at.isoformat() + "Z",
"updatedAt": kb.updated_at.isoformat() + "Z",
}
)
@router.delete(
"/knowledge-bases/{kb_id}",
operation_id="deleteKnowledgeBase",
summary="Delete knowledge base",
description="[AC-AISVC-62] Delete a knowledge base and its associated documents and Qdrant Collection.",
responses={
204: {"description": "Knowledge base deleted"},
404: {"description": "Knowledge base not found"},
401: {"description": "Unauthorized", "model": ErrorResponse},
403: {"description": "Forbidden", "model": ErrorResponse},
},
)
async def delete_knowledge_base(
tenant_id: Annotated[str, Depends(get_current_tenant_id)],
session: Annotated[AsyncSession, Depends(get_session)],
kb_id: str,
) -> JSONResponse:
"""
[AC-AISVC-62] Delete a knowledge base.
Also deletes associated documents and Qdrant Collection.
"""
logger.info(
f"[AC-AISVC-62] Deleting knowledge base: tenant={tenant_id}, kb_id={kb_id}"
)
kb_service = KnowledgeBaseService(session)
deleted = await kb_service.delete_knowledge_base(tenant_id, kb_id)
if not deleted:
return JSONResponse(
status_code=404,
content={
"code": "KB_NOT_FOUND",
"message": f"Knowledge base {kb_id} not found",
},
)
await session.commit()
# A 204 response must not carry a body; JSONResponse(content=None) would
# still serialize a "null" payload.
from fastapi import Response
return Response(status_code=204)
@router.get(
@ -221,7 +451,7 @@ async def list_documents(
).order_by(IndexJob.created_at.desc())
job_result = await session.execute(job_stmt)
latest_job = job_result.scalar_one_or_none()
data.append({
"docId": str(doc.id),
"kbId": doc.kb_id,
@ -249,10 +479,10 @@ async def list_documents(
"/documents",
operation_id="uploadDocument",
summary="Upload/import document",
description="[AC-ASA-01] Upload document and trigger indexing job.",
description="[AC-ASA-01, AC-AISVC-63] Upload document to specified knowledge base and trigger indexing job.",
responses={
202: {"description": "Accepted - async indexing job started"},
400: {"description": "Bad Request - unsupported format"},
400: {"description": "Bad Request - unsupported format or invalid kb_id"},
401: {"description": "Unauthorized", "model": ErrorResponse},
403: {"description": "Forbidden", "model": ErrorResponse},
},
@ -265,20 +495,21 @@ async def upload_document(
kb_id: str = Form(...),
) -> JSONResponse:
"""
[AC-ASA-01] Upload document and create indexing job.
[AC-AISVC-33, AC-AISVC-34, AC-AISVC-35, AC-AISVC-37] Support multiple document formats.
[AC-ASA-01, AC-AISVC-63] Upload document to specified knowledge base.
Creates KB if not exists, indexes to corresponding Qdrant Collection.
"""
from app.services.document import get_supported_document_formats, UnsupportedFormatError
from pathlib import Path
from app.services.document import get_supported_document_formats
logger.info(
f"[AC-ASA-01] Uploading document: tenant={tenant_id}, "
f"[AC-AISVC-63] Uploading document: tenant={tenant_id}, "
f"kb_id={kb_id}, filename={file.filename}"
)
file_ext = Path(file.filename or "").suffix.lower()
supported_formats = get_supported_document_formats()
if file_ext and file_ext not in supported_formats:
return JSONResponse(
status_code=400,
@ -291,23 +522,35 @@ async def upload_document(
},
)
kb_service = KnowledgeBaseService(session)
try:
kb = await kb_service.get_knowledge_base(tenant_id, kb_id)
if not kb:
kb = await kb_service.get_or_create_default_kb(tenant_id)
kb_id = str(kb.id)
logger.info(f"[AC-AISVC-63] KB not found, using default: {kb_id}")
else:
kb_id = str(kb.id)
except Exception:
kb = await kb_service.get_or_create_default_kb(tenant_id)
kb_id = str(kb.id)
doc_kb_service = KBService(session)
file_content = await file.read()
document, job = await doc_kb_service.upload_document(
tenant_id=tenant_id,
kb_id=kb_id,
file_name=file.filename or "unknown",
file_content=file_content,
file_type=file.content_type,
)
await kb_service.update_doc_count(tenant_id, kb_id, delta=1)
await session.commit()
background_tasks.add_task(
_index_document, tenant_id, kb_id, str(job.id), str(document.id), file_content, file.filename
)
return JSONResponse(
@ -315,27 +558,38 @@ async def upload_document(
content={
"jobId": str(job.id),
"docId": str(document.id),
"kbId": kb_id,
"status": job.status,
},
)
async def _index_document(
tenant_id: str,
kb_id: str,
job_id: str,
doc_id: str,
content: bytes,
filename: str | None = None,
):
"""
Background indexing task.
[AC-AISVC-33, AC-AISVC-34, AC-AISVC-35] Uses document parsing and pluggable embedding.
[AC-AISVC-33, AC-AISVC-34, AC-AISVC-35, AC-AISVC-63] Uses document parsing and pluggable embedding.
Indexes to the specified knowledge base's Qdrant Collection.
"""
import asyncio
import tempfile
from pathlib import Path
from qdrant_client.models import PointStruct
from app.core.database import async_session_maker
from app.core.qdrant_client import get_qdrant_client
from app.services.document import DocumentParseException, UnsupportedFormatError, parse_document
from app.services.embedding import get_embedding_provider
from app.services.kb import KBService
logger.info(f"[INDEX] Starting indexing: tenant={tenant_id}, kb_id={kb_id}, job_id={job_id}, doc_id={doc_id}, filename={filename}")
await asyncio.sleep(1)
async with async_session_maker() as session:
@ -350,11 +604,11 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
text = None
file_ext = Path(filename or "").suffix.lower()
logger.info(f"[INDEX] File extension: {file_ext}, content size: {len(content)} bytes")
text_extensions = {".txt", ".md", ".markdown", ".rst", ".log", ".json", ".xml", ".yaml", ".yml"}
if file_ext in text_extensions or not file_ext:
logger.info(f"[INDEX] Treating as text file, trying multiple encodings")
logger.info("[INDEX] Treating as text file, trying multiple encodings")
text = None
for encoding in ["utf-8", "gbk", "gb2312", "gb18030", "big5", "utf-16", "latin-1"]:
try:
@ -363,23 +617,23 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
break
except (UnicodeDecodeError, LookupError):
continue
if text is None:
text = content.decode("utf-8", errors="replace")
logger.warning(f"[INDEX] Failed to decode with known encodings, using utf-8 with replacement")
logger.warning("[INDEX] Failed to decode with known encodings, using utf-8 with replacement")
else:
logger.info(f"[INDEX] Binary file detected, will parse with document parser")
logger.info("[INDEX] Binary file detected, will parse with document parser")
await kb_service.update_job_status(
tenant_id, job_id, IndexJobStatus.PROCESSING.value, progress=15
)
await session.commit()
with tempfile.NamedTemporaryFile(delete=False, suffix=file_ext) as tmp_file:
tmp_file.write(content)
tmp_path = tmp_file.name
logger.info(f"[INDEX] Temp file created: {tmp_path}")
try:
logger.info(f"[INDEX] Starting document parsing for {file_ext}...")
parse_result = parse_document(tmp_path)
@ -403,23 +657,23 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
text = content.decode("utf-8", errors="ignore")
finally:
Path(tmp_path).unlink(missing_ok=True)
logger.info(f"[INDEX] Temp file cleaned up")
logger.info("[INDEX] Temp file cleaned up")
logger.info(f"[INDEX] Final text length: {len(text)} chars")
if len(text) < 50:
logger.warning(f"[INDEX] Text too short, preview: {repr(text[:200])}")
await kb_service.update_job_status(
tenant_id, job_id, IndexJobStatus.PROCESSING.value, progress=20
)
await session.commit()
logger.info(f"[INDEX] Getting embedding provider...")
logger.info("[INDEX] Getting embedding provider...")
embedding_provider = await get_embedding_provider()
logger.info(f"[INDEX] Embedding provider: {type(embedding_provider).__name__}")
all_chunks: list[TextChunk] = []
if parse_result and parse_result.pages:
logger.info(f"[INDEX] PDF with {len(parse_result.pages)} pages, using line-based chunking with page metadata")
for page in parse_result.pages:
@ -433,7 +687,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
all_chunks.extend(page_chunks)
logger.info(f"[INDEX] Total chunks from PDF: {len(all_chunks)}")
else:
logger.info(f"[INDEX] Using line-based chunking")
logger.info("[INDEX] Using line-based chunking")
all_chunks = chunk_text_by_lines(
text,
min_line_length=10,
@ -442,7 +696,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
logger.info(f"[INDEX] Total chunks: {len(all_chunks)}")
qdrant = await get_qdrant_client()
await qdrant.ensure_kb_collection_exists(tenant_id, kb_id, use_multi_vector=True)
from app.services.embedding.nomic_provider import NomicEmbeddingProvider
use_multi_vector = isinstance(embedding_provider, NomicEmbeddingProvider)
@ -454,6 +708,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
payload = {
"text": chunk.text,
"source": doc_id,
"kb_id": kb_id,
"chunk_index": i,
"start_token": chunk.start_token,
"end_token": chunk.end_token,
@ -462,7 +717,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
payload["page"] = chunk.page
if chunk.source:
payload["filename"] = chunk.source
if use_multi_vector:
embedding_result = await embedding_provider.embed_document(chunk.text)
points.append({
@ -483,7 +738,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
payload=payload,
)
)
progress = 20 + int((i + 1) / total_chunks * 70)
if i % 10 == 0 or i == total_chunks - 1:
await kb_service.update_job_status(
@ -492,11 +747,11 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
await session.commit()
if points:
logger.info(f"[INDEX] Upserting {len(points)} vectors to Qdrant...")
logger.info(f"[INDEX] Upserting {len(points)} vectors to Qdrant for kb_id={kb_id}...")
if use_multi_vector:
await qdrant.upsert_multi_vector(tenant_id, points, kb_id=kb_id)
else:
await qdrant.upsert_vectors(tenant_id, points, kb_id=kb_id)
await kb_service.update_job_status(
tenant_id, job_id, IndexJobStatus.COMPLETED.value, progress=100
@ -504,7 +759,7 @@ async def _index_document(tenant_id: str, job_id: str, doc_id: str, content: byt
await session.commit()
logger.info(
f"[INDEX] COMPLETED: tenant={tenant_id}, "
f"[INDEX] COMPLETED: tenant={tenant_id}, kb_id={kb_id}, "
f"job_id={job_id}, chunks={len(all_chunks)}, text_len={len(text)}"
)
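The line-based chunking used during indexing can be summarized as a standalone sketch: every stripped line at or above the minimum length becomes its own retrieval unit.

```python
# Sketch of the line-based chunking strategy used by _index_document:
# lines shorter than min_line_length are dropped, each survivor is one chunk.
def chunk_by_lines(text: str, min_line_length: int = 10) -> list[str]:
    chunks: list[str] = []
    for line in text.split("\n"):
        line = line.strip()
        if len(line) >= min_line_length:
            chunks.append(line)
    return chunks

print(chunk_by_lines("short\nthis line is long enough\n  padded but also long  "))
# ['this line is long enough', 'padded but also long']
```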

View File

@ -4,7 +4,6 @@ Reference: rag-optimization/spec.md Section 4.2
"""
import logging
from datetime import date
from typing import Any
from fastapi import APIRouter, Depends, HTTPException, status
@ -16,9 +15,6 @@ from app.core.database import get_session
from app.services.retrieval import (
ChunkMetadata,
ChunkMetadataModel,
MetadataFilter,
RetrievalStrategy,
get_knowledge_indexer,
@ -100,7 +96,7 @@ async def index_document(
):
"""
Index a document with optimized embedding.
Features:
- Task prefixes (search_document:) for document embedding
- Multi-dimensional vectors (256/512/768)
@ -108,7 +104,7 @@ async def index_document(
"""
try:
index = get_knowledge_indexer()
chunk_metadata = None
if request.metadata:
chunk_metadata = ChunkMetadata(
@ -121,14 +117,14 @@ async def index_document(
priority=request.metadata.priority,
keywords=request.metadata.keywords,
)
result = await index.index_document(
tenant_id=request.tenant_id,
document_id=request.document_id,
text=request.text,
metadata=chunk_metadata,
)
return IndexDocumentResponse(
success=result.success,
total_chunks=result.total_chunks,
@ -137,7 +133,7 @@ async def index_document(
elapsed_seconds=result.elapsed_seconds,
error_message=result.error_message,
)
except Exception as e:
logger.error(f"[KB-API] Failed to index document: {e}")
raise HTTPException(
@ -152,10 +148,10 @@ async def get_indexing_progress():
try:
index = get_knowledge_indexer()
progress = index.get_progress()
if progress is None:
return None
return IndexingProgressResponse(
total_chunks=progress.total_chunks,
processed_chunks=progress.processed_chunks,
@ -164,7 +160,7 @@ async def get_indexing_progress():
elapsed_seconds=progress.elapsed_seconds,
current_document=progress.current_document,
)
except Exception as e:
logger.error(f"[KB-API] Failed to get progress: {e}")
raise HTTPException(
@ -177,7 +173,7 @@ async def get_indexing_progress():
async def retrieve_knowledge(request: RetrieveRequest):
"""
Retrieve knowledge using optimized RAG.
Strategies:
- vector: Simple vector search
- bm25: BM25 keyword search
@ -185,26 +181,26 @@ async def retrieve_knowledge(request: RetrieveRequest):
- two_stage: Two-stage retrieval with Matryoshka dimensions
"""
try:
from app.services.retrieval.base import RetrievalContext
from app.services.retrieval.optimized_retriever import get_optimized_retriever
retriever = await get_optimized_retriever()
metadata_filter = None
if request.filters:
filter_dict = request.filters.model_dump(exclude_none=True)
metadata_filter = MetadataFilter(**filter_dict)
ctx = RetrievalContext(
tenant_id=request.tenant_id,
query=request.query,
)
if metadata_filter:
ctx.metadata = {"filter": metadata_filter.to_qdrant_filter()}
result = await retriever.retrieve(ctx)
return RetrieveResponse(
hits=[
{
@ -220,7 +216,7 @@ async def retrieve_knowledge(request: RetrieveRequest):
is_insufficient=result.diagnostics.get("is_insufficient", False),
diagnostics=result.diagnostics or {},
)
except Exception as e:
logger.error(f"[KB-API] Failed to retrieve: {e}")
raise HTTPException(
@ -266,7 +262,7 @@ async def get_metadata_options():
],
priorities=list(range(1, 11)),
)
except Exception as e:
logger.error(f"[KB-API] Failed to get metadata options: {e}")
raise HTTPException(
@ -286,42 +282,42 @@ async def reindex_all(
"""
try:
from app.models.entities import Document, DocumentStatus
stmt = select(Document).where(
Document.tenant_id == tenant_id,
Document.status == DocumentStatus.COMPLETED.value,
)
result = await session.execute(stmt)
documents = result.scalars().all()
index = get_knowledge_indexer()
total_indexed = 0
total_failed = 0
for doc in documents:
if doc.file_path:
import os
if os.path.exists(doc.file_path):
with open(doc.file_path, 'r', encoding='utf-8') as f:
with open(doc.file_path, encoding='utf-8') as f:
text = f.read()
result = await index.index_document(
tenant_id=tenant_id,
document_id=str(doc.id),
text=text,
)
total_indexed += result.indexed_chunks
total_failed += result.failed_chunks
return {
"success": True,
"total_documents": len(documents),
"total_indexed": total_indexed,
"total_failed": total_failed,
}
except Exception as e:
logger.error(f"[KB-API] Failed to reindex: {e}")
raise HTTPException(
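The `two_stage` strategy listed in the retrieve endpoint's docstring pairs with the `rag_two_stage_expand_factor` setting elsewhere in this commit. A standalone sketch of the idea, with toy dimensions and a hypothetical `two_stage_search` helper (not the project's actual retriever):

```python
# Hypothetical sketch of the two_stage Matryoshka strategy: coarse-search on
# truncated vector dimensions over an expanded candidate pool, then rerank
# the survivors with full-dimension cosine similarity.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_search(query_vec, docs, top_k=5, expand_factor=10, coarse_dim=256):
    # Stage 1: cheap search on the truncated prefix, keep top_k * expand_factor.
    coarse = sorted(
        docs,
        key=lambda d: cosine(query_vec[:coarse_dim], d["vector"][:coarse_dim]),
        reverse=True,
    )[: top_k * expand_factor]
    # Stage 2: rerank the candidate pool with the full-dimension vector.
    reranked = sorted(
        coarse,
        key=lambda d: cosine(query_vec, d["vector"]),
        reverse=True,
    )
    return reranked[:top_k]
```

With the production defaults this would coarse-rank on 256 dimensions and rerank the top `top_k * 10` candidates on 768 dimensions; the toy dimensions below are only for illustration.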

View File

@ -9,7 +9,6 @@ from typing import Any
from fastapi import APIRouter, Depends, Header, HTTPException
from app.services.llm.factory import (
LLMConfigManager,
LLMProviderFactory,
get_llm_config_manager,
)

View File

@ -33,6 +33,7 @@ from app.models.entities import (
from app.services.monitoring.flow_monitor import FlowMonitor
from app.services.monitoring.guardrail_monitor import GuardrailMonitor
from app.services.monitoring.intent_monitor import IntentMonitor
from app.services.monitoring.prompt_monitor import PromptMonitor
logger = logging.getLogger(__name__)
@ -190,6 +191,28 @@ async def get_guardrail_blocks(
}
@router.get("/prompt-templates")
async def get_prompt_template_stats(
tenant_id: str = Depends(get_tenant_id),
start_date: datetime | None = Query(None, description="Start date filter"),
end_date: datetime | None = Query(None, description="End date filter"),
scene: str | None = Query(None, description="Scene filter"),
session: AsyncSession = Depends(get_session),
) -> dict[str, Any]:
"""
[AC-AISVC-100] Get aggregated statistics for all prompt templates.
"""
logger.info(
f"[AC-AISVC-100] Getting prompt template stats for tenant={tenant_id}, "
f"start={start_date}, end={end_date}, scene={scene}"
)
monitor = PromptMonitor(session)
result = await monitor.get_template_stats(tenant_id, scene, start_date, end_date)
return result.to_dict()
@router.get("/conversations")
async def list_conversations(
tenant_id: str = Depends(get_tenant_id),

View File

@ -6,21 +6,20 @@ RAG Lab endpoints for debugging and experimentation.
import json
import logging
import time
from typing import Annotated, Any, List
from typing import Annotated, Any
from fastapi import APIRouter, Depends, Body
from fastapi import APIRouter, Body, Depends
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, Field
from app.core.config import get_settings
from app.core.exceptions import MissingTenantIdException
from app.core.prompts import format_evidence_for_prompt, build_user_prompt_with_evidence
from app.core.prompts import build_user_prompt_with_evidence, format_evidence_for_prompt
from app.core.tenant import get_tenant_id
from app.models import ErrorResponse
from app.services.retrieval.vector_retriever import get_vector_retriever
from app.services.retrieval.optimized_retriever import get_optimized_retriever
from app.services.retrieval.base import RetrievalContext
from app.services.llm.factory import get_llm_config_manager
from app.services.retrieval.base import RetrievalContext
from app.services.retrieval.optimized_retriever import get_optimized_retriever
logger = logging.getLogger(__name__)
@ -37,7 +36,7 @@ def get_current_tenant_id() -> str:
class RAGExperimentRequest(BaseModel):
query: str = Field(..., description="Query text for retrieval")
kb_ids: List[str] | None = Field(default=None, description="Knowledge base IDs to search")
kb_ids: list[str] | None = Field(default=None, description="Knowledge base IDs to search")
top_k: int = Field(default=5, description="Number of results to retrieve")
score_threshold: float = Field(default=0.5, description="Minimum similarity score")
generate_response: bool = Field(default=True, description="Whether to generate AI response")
@ -55,7 +54,7 @@ class AIResponse(BaseModel):
class RAGExperimentResult(BaseModel):
query: str
retrieval_results: List[dict] = []
retrieval_results: list[dict] = []
final_prompt: str = ""
ai_response: AIResponse | None = None
total_latency_ms: float = 0
@ -227,10 +226,10 @@ async def run_rag_experiment_stream(
final_prompt = _build_final_prompt(request.query, retrieval_results)
logger.info(f"[AC-ASA-20] ========== RAG LAB STREAM FULL PROMPT ==========")
logger.info("[AC-ASA-20] ========== RAG LAB STREAM FULL PROMPT ==========")
logger.info(f"[AC-ASA-20] Prompt length: {len(final_prompt)}")
logger.info(f"[AC-ASA-20] Prompt content:\n{final_prompt}")
logger.info(f"[AC-ASA-20] ==============================================")
logger.info("[AC-ASA-20] ==============================================")
yield f"event: retrieval\ndata: {json.dumps({'results': retrieval_results, 'count': len(retrieval_results)})}\n\n"
@ -276,10 +275,10 @@ async def _generate_ai_response(
"""
import time
logger.info(f"[AC-ASA-19] ========== RAG LAB FULL PROMPT ==========")
logger.info("[AC-ASA-19] ========== RAG LAB FULL PROMPT ==========")
logger.info(f"[AC-ASA-19] Prompt length: {len(prompt)}")
logger.info(f"[AC-ASA-19] Prompt content:\n{prompt}")
logger.info(f"[AC-ASA-19] ==========================================")
logger.info("[AC-ASA-19] ==========================================")
try:
manager = get_llm_config_manager()

View File

@ -4,12 +4,13 @@ Session monitoring and management endpoints.
"""
import logging
from typing import Annotated, Optional, Sequence
from collections.abc import Sequence
from datetime import datetime
from typing import Annotated, Optional
from fastapi import APIRouter, Depends, Query
from fastapi.responses import JSONResponse
from sqlalchemy import select, func
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col
@ -17,7 +18,7 @@ from app.core.database import get_session
from app.core.exceptions import MissingTenantIdException
from app.core.tenant import get_tenant_id
from app.models import ErrorResponse
from app.models.entities import ChatSession, ChatMessage, SessionStatus
from app.models.entities import ChatMessage, ChatSession, SessionStatus
logger = logging.getLogger(__name__)

View File

@ -8,14 +8,14 @@ from typing import Annotated, Any
from fastapi import APIRouter, Depends, Header, Request
from fastapi.responses import JSONResponse
from sse_starlette.sse import EventSourceResponse
from sqlalchemy.ext.asyncio import AsyncSession
from sse_starlette.sse import EventSourceResponse
from app.core.database import get_session
from app.core.middleware import get_response_mode, is_sse_request
from app.core.sse import SSEStateMachine, create_error_event
from app.core.tenant import get_tenant_id
from app.models import ChatRequest, ChatResponse, ErrorResponse
from app.models import ChatRequest, ErrorResponse
from app.services.memory import MemoryService
from app.services.orchestrator import OrchestratorService
@ -33,12 +33,12 @@ async def get_orchestrator_service_with_memory(
"""
from app.services.llm.factory import get_llm_config_manager
from app.services.retrieval.optimized_retriever import get_optimized_retriever
memory_service = MemoryService(session)
llm_config_manager = get_llm_config_manager()
llm_client = llm_config_manager.get_client()
retriever = await get_optimized_retriever()
return OrchestratorService(
llm_client=llm_client,
memory_service=memory_service,
@ -52,7 +52,7 @@ async def get_orchestrator_service_with_memory(
summary="Generate AI reply",
description="""
[AC-AISVC-01, AC-AISVC-02, AC-AISVC-06] Generate AI reply based on user message.
Response mode is determined by Accept header:
- Accept: text/event-stream -> SSE streaming response
- Other -> JSON response
@ -78,7 +78,7 @@ async def generate_reply(
) -> Any:
"""
[AC-AISVC-06] Generate AI reply with automatic response mode switching.
Based on Accept header:
- text/event-stream: Returns SSE stream with message/final/error events
- Other: Returns JSON ChatResponse
@ -134,11 +134,11 @@ async def _handle_streaming_request(
) -> EventSourceResponse:
"""
[AC-AISVC-06, AC-AISVC-07, AC-AISVC-08, AC-AISVC-09] Handle SSE streaming request.
SSE Event Sequence (per design.md Section 6.2):
- message* (0 or more) -> final (exactly 1) -> close
- OR message* (0 or more) -> error (exactly 1) -> close
State machine ensures:
- No events after final/error
- Only one final OR one error event

View File

@ -4,7 +4,7 @@ Core module - Configuration, dependencies, and utilities.
"""
from app.core.config import Settings, get_settings
from app.core.database import async_session_maker, get_session, init_db, close_db
from app.core.database import async_session_maker, close_db, get_session, init_db
from app.core.qdrant_client import QdrantClient, get_qdrant_client
__all__ = [

View File

@ -47,7 +47,7 @@ class Settings(BaseSettings):
rag_score_threshold: float = 0.01
rag_min_hits: int = 1
rag_max_evidence_tokens: int = 2000
rag_two_stage_enabled: bool = True
rag_two_stage_expand_factor: int = 10
rag_hybrid_enabled: bool = True
@ -60,6 +60,11 @@ class Settings(BaseSettings):
confidence_insufficient_penalty: float = 0.3
max_history_tokens: int = 4000
redis_url: str = "redis://localhost:6379/0"
redis_enabled: bool = True
dashboard_cache_ttl: int = 60
stats_counter_ttl: int = 7776000
@lru_cache
def get_settings() -> Settings:

View File

@ -4,13 +4,13 @@ Database client for AI Service.
"""
import logging
from typing import AsyncGenerator
from collections.abc import AsyncGenerator
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.pool import NullPool
from sqlmodel import SQLModel
from app.core.config import get_settings
from app.models import entities # noqa: F401 - Import to register all SQLModel tables
logger = logging.getLogger(__name__)

View File

@ -90,6 +90,9 @@ async def http_exception_handler(request: Request, exc: HTTPException) -> JSONRe
async def generic_exception_handler(request: Request, exc: Exception) -> JSONResponse:
import logging
logger = logging.getLogger(__name__)
logger.error(f"Unhandled exception: {type(exc).__name__}: {exc}", exc_info=True)
return JSONResponse(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
content=ErrorResponse(

View File

@ -5,13 +5,13 @@ Middleware for AI Service.
import logging
import re
from typing import Callable
from collections.abc import Callable
from fastapi import Request, Response, status
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware
from app.core.exceptions import ErrorCode, ErrorResponse, MissingTenantIdException
from app.core.exceptions import ErrorCode, ErrorResponse
from app.core.tenant import clear_tenant_context, set_tenant_context
logger = logging.getLogger(__name__)
@ -52,7 +52,7 @@ def parse_tenant_id(tenant_id: str) -> tuple[str, str]:
class ApiKeyMiddleware(BaseHTTPMiddleware):
"""
[AC-AISVC-50] Middleware to validate API Key for all requests.
Features:
- Validates X-API-Key header against in-memory cache
- Skips validation for health/docs endpoints
@ -80,6 +80,16 @@ class ApiKeyMiddleware(BaseHTTPMiddleware):
from app.services.api_key import get_api_key_service
service = get_api_key_service()
if not service._initialized:
logger.warning("[AC-AISVC-50] API key service not initialized, attempting lazy initialization...")
try:
from app.core.database import async_session_maker
async with async_session_maker() as session:
await service.initialize(session)
logger.info(f"[AC-AISVC-50] API key service lazy initialized with {len(service._keys_cache)} keys")
except Exception as e:
logger.error(f"[AC-AISVC-50] Failed to initialize API key service: {e}")
if not service.validate_key(api_key):
logger.warning(f"[AC-AISVC-50] Invalid API key for {request.url.path}")
return JSONResponse(
@ -148,10 +158,16 @@ class TenantContextMiddleware(BaseHTTPMiddleware):
set_tenant_context(tenant_id)
request.state.tenant_id = tenant_id
logger.info(f"[AC-AISVC-10] Tenant context set: tenant_id={tenant_id}")
logger.info(f"[AC-AISVC-10] Tenant context set: tenant_id={tenant_id}, path={request.url.path}")
try:
logger.info(f"[MIDDLEWARE] Calling next handler for path={request.url.path}")
response = await call_next(request)
logger.info(f"[MIDDLEWARE] Response received for path={request.url.path}, status={response.status_code}")
except Exception as e:
import traceback
logger.error(f"[MIDDLEWARE] Exception in call_next for path={request.url.path}: {type(e).__name__}: {e}\n{traceback.format_exc()}")
raise
finally:
clear_tenant_context()
@ -162,7 +178,6 @@ class TenantContextMiddleware(BaseHTTPMiddleware):
[AC-AISVC-10] Ensure tenant exists in database, create if not.
"""
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.database import async_session_maker
from app.models.entities import Tenant

View File

@ -15,26 +15,26 @@ SYSTEM_PROMPT = """你是一名经验丰富的客服专员,名字叫"小N"。
def format_evidence_for_prompt(
retrieval_results: list,
max_results: int = 5,
retrieval_results: list,
max_results: int = 5,
max_content_length: int = 500
) -> str:
"""
Format retrieval results as evidence text for prompts.
Args:
retrieval_results: List of retrieval hits. Can be:
- dict format: {'content', 'score', 'source', 'metadata'}
- RetrievalHit object: with .text, .score, .source, .metadata attributes
max_results: Maximum number of results to include
max_content_length: Maximum length of each content snippet
Returns:
Formatted evidence text
"""
if not retrieval_results:
return ""
evidence_parts = []
for i, hit in enumerate(retrieval_results[:max_results]):
if hasattr(hit, 'text'):
@ -47,15 +47,15 @@ def format_evidence_for_prompt(
score = hit.get('score', 0)
source = hit.get('source', '知识库')
metadata = hit.get('metadata', {}) or {}
if len(content) > max_content_length:
content = content[:max_content_length] + '...'
nested_meta = metadata.get('metadata', {})
source_doc = nested_meta.get('source_doc', source) if nested_meta else source
category = nested_meta.get('category', '') if nested_meta else ''
department = nested_meta.get('department', '') if nested_meta else ''
header = f"[文档{i+1}]"
if source_doc and source_doc != "知识库":
header += f" 来源:{source_doc}"
@ -63,25 +63,25 @@ def format_evidence_for_prompt(
header += f" | 类别:{category}"
if department:
header += f" | 部门:{department}"
evidence_parts.append(f"{header}\n相关度:{score:.2f}\n内容:{content}")
return "\n\n".join(evidence_parts)
def build_system_prompt_with_evidence(evidence_text: str) -> str:
"""
Build system prompt with knowledge base evidence.
Args:
evidence_text: Formatted evidence from retrieval results
Returns:
Complete system prompt
"""
if not evidence_text:
return SYSTEM_PROMPT
return f"""{SYSTEM_PROMPT}
知识库参考内容
@ -91,11 +91,11 @@ def build_system_prompt_with_evidence(evidence_text: str) -> str:
def build_user_prompt_with_evidence(query: str, evidence_text: str) -> str:
"""
Build user prompt with knowledge base evidence (for single-message format).
Args:
query: User's question
evidence_text: Formatted evidence from retrieval results
Returns:
Complete user prompt
"""
@ -103,7 +103,7 @@ def build_user_prompt_with_evidence(query: str, evidence_text: str) -> str:
return f"""用户问题:{query}
未找到相关检索结果请基于通用知识回答用户问题"""
return f"""【系统指令】
{SYSTEM_PROMPT}
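The dict hit shape named in `format_evidence_for_prompt`'s docstring (`{'content', 'score', 'source', 'metadata'}`) can be exercised with a simplified re-implementation. This sketch mirrors the truncation and header logic shown above but is not the module itself:

```python
# Simplified sketch of the evidence-formatting contract: dict hits are
# truncated to max_len characters and rendered with a [文档N] header,
# relevance score, and content snippet.
def format_evidence(hits: list[dict], max_results: int = 5, max_len: int = 500) -> str:
    parts = []
    for i, hit in enumerate(hits[:max_results]):
        content = hit.get("content", "")
        if len(content) > max_len:
            content = content[:max_len] + "..."
        header = f"[文档{i + 1}]"
        source = hit.get("source", "知识库")
        if source and source != "知识库":
            header += f" 来源:{source}"
        parts.append(f"{header}\n相关度:{hit.get('score', 0):.2f}\n内容:{content}")
    return "\n\n".join(parts)
```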

View File

@ -8,7 +8,7 @@ import logging
from typing import Any
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams, QueryRequest
from qdrant_client.models import Distance, PointStruct, VectorParams
from app.core.config import get_settings
@ -19,8 +19,12 @@ settings = get_settings()
class QdrantClient:
"""
[AC-AISVC-10] Qdrant client with tenant-isolated collection management.
Collection naming: kb_{tenantId} for tenant isolation.
[AC-AISVC-10, AC-AISVC-59] Qdrant client with tenant-isolated collection management.
Collection naming conventions:
- Legacy (single KB): kb_{tenantId}
- Multi-KB: kb_{tenantId}_{kbId}
Supports multi-dimensional vectors (256/512/768) for Matryoshka retrieval.
"""
@ -45,16 +49,42 @@ class QdrantClient:
def get_collection_name(self, tenant_id: str) -> str:
"""
[AC-AISVC-10] Get collection name for a tenant.
[AC-AISVC-10] Get legacy collection name for a tenant.
Naming convention: kb_{tenantId}
Replaces @ with _ to ensure valid collection names.
Note: This is kept for backward compatibility.
For multi-KB, use get_kb_collection_name() instead.
"""
safe_tenant_id = tenant_id.replace('@', '_')
return f"{self._collection_prefix}{safe_tenant_id}"
def get_kb_collection_name(self, tenant_id: str, kb_id: str | None = None) -> str:
"""
[AC-AISVC-59, AC-AISVC-63] Get collection name for a specific knowledge base.
Naming convention:
- If kb_id is None or "default": kb_{tenantId} (legacy format for backward compatibility)
- Otherwise: kb_{tenantId}_{kbId}
Args:
tenant_id: Tenant identifier
kb_id: Knowledge base ID (optional, defaults to legacy naming)
Returns:
Collection name for the knowledge base
"""
safe_tenant_id = tenant_id.replace('@', '_')
if kb_id is None or kb_id == "default" or kb_id == "":
return f"{self._collection_prefix}{safe_tenant_id}"
safe_kb_id = kb_id.replace('-', '_')[:8]
return f"{self._collection_prefix}{safe_tenant_id}_{safe_kb_id}"
async def ensure_collection_exists(self, tenant_id: str, use_multi_vector: bool = True) -> bool:
"""
[AC-AISVC-10] Ensure collection exists for tenant.
[AC-AISVC-10] Ensure collection exists for tenant (legacy single-KB mode).
Supports multi-dimensional vectors for Matryoshka retrieval.
"""
client = await self.get_client()
@ -84,7 +114,7 @@ class QdrantClient:
size=self._vector_size,
distance=Distance.COSINE,
)
await client.create_collection(
collection_name=collection_name,
vectors_config=vectors_config,
@ -98,16 +128,80 @@ class QdrantClient:
logger.error(f"[AC-AISVC-10] Error ensuring collection: {e}")
return False
async def ensure_kb_collection_exists(
self,
tenant_id: str,
kb_id: str | None = None,
use_multi_vector: bool = True,
) -> bool:
"""
[AC-AISVC-59] Ensure collection exists for a specific knowledge base.
Args:
tenant_id: Tenant identifier
kb_id: Knowledge base ID (optional, defaults to legacy naming)
use_multi_vector: Whether to use multi-dimensional vectors
Returns:
True if collection exists or was created successfully
"""
client = await self.get_client()
collection_name = self.get_kb_collection_name(tenant_id, kb_id)
try:
exists = await client.collection_exists(collection_name)
if not exists:
if use_multi_vector:
vectors_config = {
"full": VectorParams(
size=768,
distance=Distance.COSINE,
),
"dim_256": VectorParams(
size=256,
distance=Distance.COSINE,
),
"dim_512": VectorParams(
size=512,
distance=Distance.COSINE,
),
}
else:
vectors_config = VectorParams(
size=self._vector_size,
distance=Distance.COSINE,
)
await client.create_collection(
collection_name=collection_name,
vectors_config=vectors_config,
)
logger.info(
f"[AC-AISVC-59] Created KB collection: {collection_name} for tenant={tenant_id}, kb_id={kb_id} "
f"with multi_vector={use_multi_vector}"
)
return True
except Exception as e:
logger.error(f"[AC-AISVC-59] Error ensuring KB collection: {e}")
return False
async def upsert_vectors(
self,
tenant_id: str,
points: list[PointStruct],
kb_id: str | None = None,
) -> bool:
"""
[AC-AISVC-10] Upsert vectors into tenant's collection.
[AC-AISVC-10, AC-AISVC-63] Upsert vectors into tenant's collection.
Args:
tenant_id: Tenant identifier
points: List of PointStruct to upsert
kb_id: Knowledge base ID (optional, uses legacy naming if not provided)
"""
client = await self.get_client()
collection_name = self.get_collection_name(tenant_id)
collection_name = self.get_kb_collection_name(tenant_id, kb_id)
try:
await client.upsert(
@ -115,7 +209,7 @@ class QdrantClient:
points=points,
)
logger.info(
f"[AC-AISVC-10] Upserted {len(points)} vectors for tenant={tenant_id}"
f"[AC-AISVC-10] Upserted {len(points)} vectors for tenant={tenant_id}, kb_id={kb_id}"
)
return True
except Exception as e:
@ -126,10 +220,11 @@ class QdrantClient:
self,
tenant_id: str,
points: list[dict[str, Any]],
kb_id: str | None = None,
) -> bool:
"""
Upsert points with multi-dimensional vectors.
Args:
tenant_id: Tenant identifier
points: List of points with format:
@ -142,9 +237,10 @@ class QdrantClient:
},
"payload": dict
}
kb_id: Knowledge base ID (optional, uses legacy naming if not provided)
"""
client = await self.get_client()
collection_name = self.get_collection_name(tenant_id)
collection_name = self.get_kb_collection_name(tenant_id, kb_id)
try:
qdrant_points = []
@ -155,13 +251,13 @@ class QdrantClient:
payload=p.get("payload", {}),
)
qdrant_points.append(point)
await client.upsert(
collection_name=collection_name,
points=qdrant_points,
)
logger.info(
f"[RAG-OPT] Upserted {len(points)} multi-vector points for tenant={tenant_id}"
f"[RAG-OPT] Upserted {len(points)} multi-vector points for tenant={tenant_id}, kb_id={kb_id}"
)
return True
except Exception as e:
@ -181,7 +277,7 @@ class QdrantClient:
[AC-AISVC-10] Search vectors in tenant's collection.
Returns results with score >= score_threshold if specified.
Searches both old format (with @) and new format (with _) for backward compatibility.
Args:
tenant_id: Tenant identifier
query_vector: Query vector for similarity search
@ -192,31 +288,31 @@ class QdrantClient:
with_vectors: Whether to return vectors in results (for two-stage reranking)
"""
client = await self.get_client()
logger.info(
f"[AC-AISVC-10] Starting search: tenant_id={tenant_id}, "
f"limit={limit}, score_threshold={score_threshold}, vector_dim={len(query_vector)}, vector_name={vector_name}"
)
collection_names = [self.get_collection_name(tenant_id)]
if '@' in tenant_id:
old_format = f"{self._collection_prefix}{tenant_id}"
new_format = f"{self._collection_prefix}{tenant_id.replace('@', '_')}"
collection_names = [new_format, old_format]
logger.info(f"[AC-AISVC-10] Will search in collections: {collection_names}")
all_hits = []
for collection_name in collection_names:
try:
logger.info(f"[AC-AISVC-10] Searching in collection: {collection_name}")
exists = await client.collection_exists(collection_name)
if not exists:
logger.warning(f"[AC-AISVC-10] Collection {collection_name} does not exist")
continue
try:
results = await client.query_points(
collection_name=collection_name,
@ -241,7 +337,7 @@ class QdrantClient:
)
else:
raise
logger.info(
f"[AC-AISVC-10] Collection {collection_name} returned {len(results.points)} raw results"
)
@ -257,7 +353,7 @@ class QdrantClient:
hit["vector"] = result.vector
hits.append(hit)
all_hits.extend(hits)
if hits:
logger.info(
f"[AC-AISVC-10] Search in collection {collection_name}: {len(hits)} results for tenant={tenant_id}"
@ -277,17 +373,17 @@ class QdrantClient:
continue
all_hits = sorted(all_hits, key=lambda x: x["score"], reverse=True)[:limit]
logger.info(
f"[AC-AISVC-10] Search returned {len(all_hits)} total results for tenant={tenant_id}"
)
if len(all_hits) == 0:
logger.warning(
f"[AC-AISVC-10] No results found! tenant={tenant_id}, "
f"collections_tried={collection_names}, limit={limit}"
)
return all_hits
async def delete_collection(self, tenant_id: str) -> bool:
@ -306,6 +402,132 @@ class QdrantClient:
logger.error(f"[AC-AISVC-10] Error deleting collection: {e}")
return False
async def delete_kb_collection(self, tenant_id: str, kb_id: str) -> bool:
"""
[AC-AISVC-62] Delete a specific knowledge base's collection.
Args:
tenant_id: Tenant identifier
kb_id: Knowledge base ID
Returns:
True if collection was deleted successfully
"""
client = await self.get_client()
collection_name = self.get_kb_collection_name(tenant_id, kb_id)
try:
exists = await client.collection_exists(collection_name)
if exists:
await client.delete_collection(collection_name=collection_name)
logger.info(f"[AC-AISVC-62] Deleted KB collection: {collection_name} for kb_id={kb_id}")
else:
logger.info(f"[AC-AISVC-62] KB collection {collection_name} does not exist, nothing to delete")
return True
except Exception as e:
logger.error(f"[AC-AISVC-62] Error deleting KB collection: {e}")
return False
async def search_kb(
self,
tenant_id: str,
query_vector: list[float],
kb_ids: list[str] | None = None,
limit: int = 5,
score_threshold: float | None = None,
vector_name: str = "full",
with_vectors: bool = False,
) -> list[dict[str, Any]]:
"""
[AC-AISVC-64] Search vectors across multiple knowledge base collections.
Args:
tenant_id: Tenant identifier
query_vector: Query vector for similarity search
kb_ids: List of knowledge base IDs to search. If None, searches legacy collection.
limit: Maximum number of results per collection
score_threshold: Minimum score threshold for results
vector_name: Name of the vector to search
with_vectors: Whether to return vectors in results
Returns:
Combined and sorted results from all collections
"""
client = await self.get_client()
if kb_ids is None or len(kb_ids) == 0:
return await self.search(
tenant_id=tenant_id,
query_vector=query_vector,
limit=limit,
score_threshold=score_threshold,
vector_name=vector_name,
with_vectors=with_vectors,
)
logger.info(
f"[AC-AISVC-64] Starting multi-KB search: tenant_id={tenant_id}, "
f"kb_ids={kb_ids}, limit={limit}, score_threshold={score_threshold}"
)
all_hits = []
for kb_id in kb_ids:
collection_name = self.get_kb_collection_name(tenant_id, kb_id)
try:
exists = await client.collection_exists(collection_name)
if not exists:
logger.warning(f"[AC-AISVC-64] Collection {collection_name} does not exist")
continue
try:
results = await client.query_points(
collection_name=collection_name,
query=query_vector,
using=vector_name,
limit=limit,
with_vectors=with_vectors,
score_threshold=score_threshold,
)
except Exception as e:
if "vector name" in str(e).lower() or "Not existing vector" in str(e) or "using" in str(e).lower():
results = await client.query_points(
collection_name=collection_name,
query=query_vector,
limit=limit,
with_vectors=with_vectors,
score_threshold=score_threshold,
)
else:
raise
for result in results.points:
hit = {
"id": str(result.id),
"score": result.score,
"payload": result.payload or {},
"kb_id": kb_id,
}
if with_vectors and result.vector:
hit["vector"] = result.vector
all_hits.append(hit)
logger.info(
f"[AC-AISVC-64] Collection {collection_name} returned {len(results.points)} results"
)
except Exception as e:
logger.warning(f"[AC-AISVC-64] Error searching collection {collection_name}: {e}")
continue
all_hits = sorted(all_hits, key=lambda x: x["score"], reverse=True)[:limit]
logger.info(
f"[AC-AISVC-64] Multi-KB search returned {len(all_hits)} total results"
)
return all_hits
_qdrant_client: QdrantClient | None = None

View File

@ -6,8 +6,9 @@ SSE utilities for AI Service.
import asyncio
import json
import logging
from collections.abc import AsyncGenerator
from enum import Enum
from typing import Any, AsyncGenerator
from typing import Any
from sse_starlette.sse import EventSourceResponse, ServerSentEvent
@ -43,7 +44,7 @@ class SSEStateMachine:
async with self._lock:
if self._state == SSEState.INIT:
self._state = SSEState.STREAMING
logger.debug(f"[AC-AISVC-07] SSE state transition: INIT -> STREAMING")
logger.debug("[AC-AISVC-07] SSE state transition: INIT -> STREAMING")
return True
return False
@ -51,7 +52,7 @@ class SSEStateMachine:
async with self._lock:
if self._state == SSEState.STREAMING:
self._state = SSEState.FINAL_SENT
logger.debug(f"[AC-AISVC-08] SSE state transition: STREAMING -> FINAL_SENT")
logger.debug("[AC-AISVC-08] SSE state transition: STREAMING -> FINAL_SENT")
return True
return False
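The two transitions shown in this hunk (INIT -> STREAMING, STREAMING -> FINAL_SENT) enforce the `message* -> final | error` sequence from the chat endpoint's docstring. A single-threaded sketch of the same invariant (the real `SSEStateMachine` guards transitions with an asyncio lock, omitted here):

```python
# Minimal sketch of the SSE event-sequence invariant: message events are
# allowed only before a terminal event, and exactly one terminal event
# (final OR error) may be emitted.
class MiniSSEStateMachine:
    def __init__(self) -> None:
        self.state = "INIT"

    def emit(self, event: str) -> bool:
        if event == "message":
            if self.state in ("INIT", "STREAMING"):
                self.state = "STREAMING"
                return True
            return False  # no message after final/error
        if event in ("final", "error"):
            if self.state in ("INIT", "STREAMING"):
                self.state = "TERMINAL"
                return True
            return False  # only one terminal event allowed
        return False
```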

View File

@ -21,7 +21,6 @@ class ChatSession(SQLModel, table=True):
__tablename__ = "chat_sessions"
__table_args__ = (
Index("ix_chat_sessions_tenant_session", "tenant_id", "session_id", unique=True),
Index("ix_chat_sessions_tenant_id", "tenant_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -41,12 +40,16 @@ class ChatMessage(SQLModel, table=True):
"""
[AC-AISVC-13] Chat message entity with tenant isolation.
Messages are scoped by (tenant_id, session_id) for multi-tenant security.
[v0.7.0] Extended with monitoring fields for Dashboard statistics.
"""
__tablename__ = "chat_messages"
__table_args__ = (
Index("ix_chat_messages_tenant_session", "tenant_id", "session_id"),
Index("ix_chat_messages_tenant_session_created", "tenant_id", "session_id", "created_at"),
Index("ix_chat_messages_tenant_template", "tenant_id", "prompt_template_id"),
Index("ix_chat_messages_tenant_intent", "tenant_id", "intent_rule_id"),
Index("ix_chat_messages_tenant_flow", "tenant_id", "flow_instance_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -63,6 +66,31 @@ class ChatMessage(SQLModel, table=True):
error_message: str | None = Field(default=None, description="Error message if any")
created_at: datetime = Field(default_factory=datetime.utcnow, description="Message creation time")
prompt_template_id: uuid.UUID | None = Field(
default=None,
description="[v0.7.0] ID of the Prompt template used",
foreign_key="prompt_templates.id",
)
intent_rule_id: uuid.UUID | None = Field(
default=None,
description="[v0.7.0] ID of the Intent rule that matched",
foreign_key="intent_rules.id",
)
flow_instance_id: uuid.UUID | None = Field(
default=None,
description="[v0.7.0] ID of the Flow instance if flow was active",
foreign_key="flow_instances.id",
)
guardrail_triggered: bool = Field(
default=False,
description="[v0.7.0] Whether output guardrail was triggered"
)
guardrail_words: dict[str, Any] | None = Field(
default=None,
sa_column=Column("guardrail_words", JSON, nullable=True),
description="[v0.7.0] Guardrail trigger details: words, categories, strategy"
)
class ChatSessionCreate(SQLModel):
"""Schema for creating a new chat session."""
@ -134,7 +162,6 @@ class KnowledgeBase(SQLModel, table=True):
__tablename__ = "knowledge_bases"
__table_args__ = (
Index("ix_knowledge_bases_tenant_id", "tenant_id"),
Index("ix_knowledge_bases_tenant_kb_type", "tenant_id", "kb_type"),
)
@ -267,7 +294,6 @@ class PromptTemplate(SQLModel, table=True):
__tablename__ = "prompt_templates"
__table_args__ = (
Index("ix_prompt_templates_tenant_scene", "tenant_id", "scene"),
Index("ix_prompt_templates_tenant_id", "tenant_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -354,7 +380,6 @@ class IntentRule(SQLModel, table=True):
__tablename__ = "intent_rules"
__table_args__ = (
Index("ix_intent_rules_tenant_enabled_priority", "tenant_id", "is_enabled"),
Index("ix_intent_rules_tenant_id", "tenant_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -465,7 +490,6 @@ class ForbiddenWord(SQLModel, table=True):
__tablename__ = "forbidden_words"
__table_args__ = (
Index("ix_forbidden_words_tenant_enabled", "tenant_id", "is_enabled"),
Index("ix_forbidden_words_tenant_id", "tenant_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -519,7 +543,6 @@ class BehaviorRule(SQLModel, table=True):
__tablename__ = "behavior_rules"
__table_args__ = (
Index("ix_behavior_rules_tenant_enabled", "tenant_id", "is_enabled"),
Index("ix_behavior_rules_tenant_id", "tenant_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
@ -624,7 +647,6 @@ class ScriptFlow(SQLModel, table=True):
__tablename__ = "script_flows"
__table_args__ = (
Index("ix_script_flows_tenant_id", "tenant_id"),
Index("ix_script_flows_tenant_enabled", "tenant_id", "is_enabled"),
)
@ -746,3 +768,121 @@ class FlowAdvanceResult:
if self.timeout_action is not None:
result["timeout_action"] = self.timeout_action
return result
class FlowTestRecordStatus(str, Enum):
"""[AC-AISVC-93] Flow test record status."""
SUCCESS = "success"
PARTIAL = "partial"
FAILED = "failed"
class FlowTestRecord(SQLModel, table=True):
"""
[AC-AISVC-93] Flow test record entity for complete 12-step execution logging.
Records are retained for 7 days (TTL cleanup via background task).
"""
__tablename__ = "flow_test_records"
__table_args__ = (
Index("ix_flow_test_records_tenant_created", "tenant_id", "created_at"),
Index("ix_flow_test_records_session", "session_id"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
tenant_id: str = Field(..., description="Tenant ID for multi-tenant isolation", index=True)
session_id: str = Field(..., description="Session ID for test session")
status: str = Field(
default=FlowTestRecordStatus.SUCCESS.value,
description="Overall status: success/partial/failed"
)
steps: list[dict[str, Any]] = Field(
default_factory=list,
sa_column=Column("steps", JSON, nullable=False),
description="12-step execution logs with step, name, status, duration_ms, input, output, error, metadata"
)
final_response: dict[str, Any] | None = Field(
default=None,
sa_column=Column("final_response", JSON, nullable=True),
description="Final ChatResponse with reply, confidence, should_transfer"
)
total_duration_ms: int | None = Field(default=None, description="Total execution time in milliseconds")
created_at: datetime = Field(default_factory=datetime.utcnow, description="Record creation time", index=True)
class FlowTestStepResult(SQLModel):
"""[AC-AISVC-93] Schema for a single step result in flow test."""
step: int = Field(..., description="Step number (1-12)")
name: str = Field(..., description="Step name")
status: str = Field(..., description="Execution status: success/failed/skipped")
duration_ms: int | None = Field(default=None, description="Execution time in milliseconds")
input: dict[str, Any] | None = Field(default=None, description="Step input data")
output: dict[str, Any] | None = Field(default=None, description="Step output data")
error: str | None = Field(default=None, description="Error message if failed")
step_metadata: dict[str, Any] | None = Field(default=None, description="Step metadata (matched rule, template, etc.)")
class ExportTaskStatus(str, Enum):
"""[AC-AISVC-110] Export task status."""
PROCESSING = "processing"
COMPLETED = "completed"
FAILED = "failed"
class ExportTask(SQLModel, table=True):
"""
[AC-AISVC-110] Export task entity for conversation export.
Supports async export with file download.
"""
__tablename__ = "export_tasks"
__table_args__ = (
Index("ix_export_tasks_tenant_status", "tenant_id", "status"),
Index("ix_export_tasks_tenant_created", "tenant_id", "created_at"),
)
id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True)
tenant_id: str = Field(..., description="Tenant ID for multi-tenant isolation", index=True)
status: str = Field(
default=ExportTaskStatus.PROCESSING.value,
description="Task status: processing/completed/failed"
)
file_path: str | None = Field(default=None, description="Path to exported file")
file_name: str | None = Field(default=None, description="Generated file name")
file_size: int | None = Field(default=None, description="File size in bytes")
total_rows: int | None = Field(default=None, description="Total rows exported")
format: str = Field(default="json", description="Export format: json/csv")
filters: dict[str, Any] | None = Field(
default=None,
sa_column=Column("filters", JSON, nullable=True),
description="Export filters applied"
)
error_message: str | None = Field(default=None, description="Error message if failed")
expires_at: datetime | None = Field(default=None, description="File expiration time (for cleanup)")
created_at: datetime = Field(default_factory=datetime.utcnow, description="Task creation time")
completed_at: datetime | None = Field(default=None, description="Completion time")
class ExportTaskCreate(SQLModel):
"""[AC-AISVC-110] Schema for creating an export task."""
format: str = "json"
filters: dict[str, Any] | None = None
class ConversationDetail(SQLModel):
"""[AC-AISVC-109] Schema for conversation detail with execution chain."""
conversation_id: uuid.UUID
session_id: str
user_message: str
ai_reply: str | None = None
triggered_rules: list[dict[str, Any]] | None = None
used_template: dict[str, Any] | None = None
used_flow: dict[str, Any] | None = None
execution_time_ms: int | None = None
confidence: float | None = None
should_transfer: bool = False
execution_steps: list[dict[str, Any]] | None = None
created_at: datetime
@ -19,7 +19,7 @@ logger = logging.getLogger(__name__)
class ApiKeyService:
"""
[AC-AISVC-50] API Key management service.
Features:
- In-memory cache for fast validation
- Database persistence
@ -39,49 +39,49 @@ class ApiKeyService:
select(ApiKey).where(ApiKey.is_active == True)
)
keys = result.scalars().all()
self._keys_cache = {key.key for key in keys}
self._initialized = True
logger.info(f"[AC-AISVC-50] Loaded {len(self._keys_cache)} API keys into memory")
def validate_key(self, key: str) -> bool:
"""
Validate an API key against the in-memory cache.
Args:
key: The API key to validate
Returns:
True if the key is valid, False otherwise
"""
if not self._initialized:
logger.warning("[AC-AISVC-50] API key service not initialized")
return False
return key in self._keys_cache
def generate_key(self) -> str:
"""
Generate a new secure API key.
Returns:
A URL-safe random string
"""
return secrets.token_urlsafe(32)
async def create_key(
self,
session: AsyncSession,
key_create: ApiKeyCreate
) -> ApiKey:
"""
Create a new API key.
Args:
session: Database session
key_create: Key creation data
Returns:
The created ApiKey entity
"""
@ -90,139 +90,139 @@ class ApiKeyService:
name=key_create.name,
is_active=key_create.is_active,
)
session.add(api_key)
await session.commit()
await session.refresh(api_key)
if api_key.is_active:
self._keys_cache.add(api_key.key)
logger.info(f"[AC-AISVC-50] Created API key: {api_key.name}")
return api_key
async def create_default_key(self, session: AsyncSession) -> Optional[ApiKey]:
"""
Create a default API key if none exists.
Returns:
The created ApiKey or None if keys already exist
"""
result = await session.execute(select(ApiKey).limit(1))
existing = result.scalar_one_or_none()
if existing:
return None
default_key = secrets.token_urlsafe(32)
api_key = ApiKey(
key=default_key,
name="Default API Key",
is_active=True,
)
session.add(api_key)
await session.commit()
await session.refresh(api_key)
self._keys_cache.add(api_key.key)
logger.info(f"[AC-AISVC-50] Created default API key: {api_key.key}")
return api_key
async def delete_key(
self,
session: AsyncSession,
key_id: str
) -> bool:
"""
Delete an API key.
Args:
session: Database session
key_id: The key ID to delete
Returns:
True if deleted, False if not found
"""
import uuid
try:
key_uuid = uuid.UUID(key_id)
except ValueError:
return False
result = await session.execute(
select(ApiKey).where(ApiKey.id == key_uuid)
)
api_key = result.scalar_one_or_none()
if not api_key:
return False
key_value = api_key.key
await session.delete(api_key)
await session.commit()
self._keys_cache.discard(key_value)
logger.info(f"[AC-AISVC-50] Deleted API key: {api_key.name}")
return True
async def toggle_key(
self,
session: AsyncSession,
key_id: str,
is_active: bool
) -> Optional[ApiKey]:
"""
Toggle API key active status.
Args:
session: Database session
key_id: The key ID to toggle
is_active: New active status
Returns:
The updated ApiKey or None if not found
"""
import uuid
try:
key_uuid = uuid.UUID(key_id)
except ValueError:
return None
result = await session.execute(
select(ApiKey).where(ApiKey.id == key_uuid)
)
api_key = result.scalar_one_or_none()
if not api_key:
return None
api_key.is_active = is_active
api_key.updated_at = datetime.utcnow()
session.add(api_key)
await session.commit()
await session.refresh(api_key)
if is_active:
self._keys_cache.add(api_key.key)
else:
self._keys_cache.discard(api_key.key)
logger.info(f"[AC-AISVC-50] Toggled API key {api_key.name}: active={is_active}")
return api_key
async def list_keys(self, session: AsyncSession) -> list[ApiKey]:
"""
List all API keys.
Args:
session: Database session
Returns:
List of all ApiKey entities
"""
@ -17,7 +17,7 @@ from typing import Any
import tiktoken
from app.core.config import get_settings
from app.models import ChatMessage
logger = logging.getLogger(__name__)
@ -89,7 +89,7 @@ class DocumentParser(ABC):
class DocumentParseException(Exception):
"""Exception raised when document parsing fails."""
def __init__(
self,
message: str,
@ -105,7 +105,7 @@ class DocumentParseException(Exception):
class UnsupportedFormatError(DocumentParseException):
"""Exception raised when file format is not supported."""
def __init__(self, extension: str, supported: list[str]):
super().__init__(
f"Unsupported file format: {extension}. "
@ -58,23 +58,23 @@ class ExcelParser(DocumentParser):
"""
records = []
rows = list(sheet.iter_rows(max_row=self._max_rows_per_sheet, values_only=True))
if not rows:
return records
headers = rows[0]
header_list = [str(h) if h is not None else f"column_{i}" for i, h in enumerate(headers)]
for row in rows[1:]:
record = {"_sheet": sheet_name}
has_content = False
for i, value in enumerate(row):
if i < len(header_list):
key = header_list[i]
else:
key = f"column_{i}"
if value is not None:
has_content = True
if isinstance(value, (int, float, bool)):
@ -83,10 +83,10 @@ class ExcelParser(DocumentParser):
record[key] = str(value)
elif self._include_empty_cells:
record[key] = None
if has_content or self._include_empty_cells:
records.append(record)
return records
def parse(self, file_path: str | Path) -> ParseResult:
@ -95,46 +95,46 @@ class ExcelParser(DocumentParser):
[AC-AISVC-35] Converts spreadsheet data to JSON format.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="excel"
)
if not self.supports_extension(path.suffix):
raise DocumentParseException(
f"Unsupported file extension: {path.suffix}",
file_path=str(path),
parser="excel"
)
openpyxl = self._get_openpyxl()
try:
workbook = openpyxl.load_workbook(path, read_only=True, data_only=True)
all_records: list[dict[str, Any]] = []
sheet_count = len(workbook.sheetnames)
total_rows = 0
for sheet_name in workbook.sheetnames:
sheet = workbook[sheet_name]
records = self._sheet_to_records(sheet, sheet_name)
all_records.extend(records)
total_rows += len(records)
workbook.close()
json_str = json.dumps(all_records, ensure_ascii=False, indent=2)
file_size = path.stat().st_size
logger.info(
f"Parsed Excel (JSON): {path.name}, sheets={sheet_count}, "
f"rows={total_rows}, chars={len(json_str)}, size={file_size}"
)
return ParseResult(
text=json_str,
source_path=str(path),
@ -146,7 +146,7 @@ class ExcelParser(DocumentParser):
"total_rows": total_rows,
}
)
except DocumentParseException:
raise
except Exception as e:
@ -177,36 +177,36 @@ class CSVParser(DocumentParser):
def _parse_csv_to_records(self, path: Path, encoding: str) -> list[dict[str, Any]]:
"""Parse CSV file and return list of record dictionaries."""
import csv
records = []
with open(path, "r", encoding=encoding, newline="") as f:
with open(path, encoding=encoding, newline="") as f:
reader = csv.reader(f, delimiter=self._delimiter)
rows = list(reader)
if not rows:
return records
headers = rows[0]
header_list = [str(h) if h else f"column_{i}" for i, h in enumerate(headers)]
for row in rows[1:]:
record = {}
has_content = False
for i, value in enumerate(row):
if i < len(header_list):
key = header_list[i]
else:
key = f"column_{i}"
if value:
has_content = True
record[key] = value
if has_content:
records.append(record)
return records
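The header-mapping loop above, distilled into a minimal sketch (rows in, list-of-dicts out):

```python
# Rows in, list-of-dicts out: the first row supplies keys, empty cells
# are dropped, and rows with no content are skipped.
rows = [["name", "qty"], ["apple", "3"], ["", ""]]
headers = [h if h else f"column_{i}" for i, h in enumerate(rows[0])]
records = []
for row in rows[1:]:
    record = {headers[i]: v for i, v in enumerate(row) if v}
    if record:  # skip rows with no content
        records.append(record)
print(records)  # [{'name': 'apple', 'qty': '3'}]
```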
def parse(self, file_path: str | Path) -> ParseResult:
@ -215,14 +215,14 @@ class CSVParser(DocumentParser):
[AC-AISVC-35] Converts CSV data to JSON format.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="csv"
)
try:
records = self._parse_csv_to_records(path, self._encoding)
row_count = len(records)
@ -246,15 +246,15 @@ class CSVParser(DocumentParser):
parser="csv",
details={"error": str(e)}
)
json_str = json.dumps(records, ensure_ascii=False, indent=2)
file_size = path.stat().st_size
logger.info(
f"Parsed CSV (JSON): {path.name}, rows={row_count}, "
f"chars={len(json_str)}, size={file_size}"
)
return ParseResult(
text=json_str,
source_path=str(path),
@ -7,11 +7,11 @@ Design reference: progress.md Section 7.2 - DocumentParserFactory
import logging
from pathlib import Path
from typing import Any
from app.services.document.base import (
DocumentParseException,
DocumentParser,
ParseResult,
UnsupportedFormatError,
)
@ -29,7 +29,7 @@ class DocumentParserFactory:
[AC-AISVC-33, AC-AISVC-34, AC-AISVC-35] Auto-selects parser based on file extension.
"""
_parsers: dict[str, type[DocumentParser]] = {}
_extension_map: dict[str, str] = {}
@classmethod
@ -37,7 +37,7 @@ class DocumentParserFactory:
"""Initialize default parsers."""
if cls._parsers:
return
cls._parsers = {
"pdf": PDFParser,
"pdfplumber": PDFPlumberParser,
@ -46,7 +46,7 @@ class DocumentParserFactory:
"csv": CSVParser,
"text": TextParser,
}
cls._extension_map = {
".pdf": "pdf",
".docx": "word",
@ -68,7 +68,7 @@ class DocumentParserFactory:
def register_parser(
cls,
name: str,
parser_class: type[DocumentParser],
extensions: list[str],
) -> None:
"""
@ -97,17 +97,17 @@ class DocumentParserFactory:
[AC-AISVC-33] Creates appropriate parser based on extension.
"""
cls._initialize()
normalized = extension.lower()
if not normalized.startswith("."):
normalized = f".{normalized}"
if normalized not in cls._extension_map:
raise UnsupportedFormatError(normalized, cls.get_supported_extensions())
parser_name = cls._extension_map[normalized]
parser_class = cls._parsers[parser_name]
return parser_class()
@classmethod
@ -120,24 +120,24 @@ class DocumentParserFactory:
"""
Parse a document file.
[AC-AISVC-33, AC-AISVC-34, AC-AISVC-35] Main entry point for parsing.
Args:
file_path: Path to the document file
parser_name: Optional specific parser to use
parser_config: Optional configuration for the parser
Returns:
ParseResult with extracted text and metadata
Raises:
UnsupportedFormatError: If file format is not supported
DocumentParseException: If parsing fails
"""
cls._initialize()
path = Path(file_path)
extension = path.suffix.lower()
if parser_name:
if parser_name not in cls._parsers:
raise DocumentParseException(
@ -151,7 +151,7 @@ class DocumentParserFactory:
parser = cls.get_parser_for_extension(extension)
if parser_config:
parser = type(parser)(**parser_config)
return parser.parse(path)
@classmethod
@ -161,12 +161,12 @@ class DocumentParserFactory:
[AC-AISVC-37] Returns parser metadata.
"""
cls._initialize()
info = []
for name, parser_class in cls._parsers.items():
temp_instance = parser_class.__new__(parser_class)
extensions = temp_instance.get_supported_extensions()
display_names = {
"pdf": "PDF 文档",
"pdfplumber": "PDF 文档 (pdfplumber)",
@ -175,7 +175,7 @@ class DocumentParserFactory:
"csv": "CSV 文件",
"text": "文本文件",
}
descriptions = {
"pdf": "使用 PyMuPDF 解析 PDF 文档,速度快",
"pdfplumber": "使用 pdfplumber 解析 PDF 文档,表格提取效果更好",
@ -184,14 +184,14 @@ class DocumentParserFactory:
"csv": "解析 CSV 文件,自动检测编码",
"text": "解析纯文本文件,支持多种编码",
}
info.append({
"name": name,
"display_name": display_names.get(name, name),
"description": descriptions.get(name, ""),
"extensions": extensions,
})
return info
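Extension resolution in `get_parser_for_extension` boils down to normalize-then-lookup; a minimal standalone sketch of the same logic:

```python
# Normalize-then-lookup, mirroring get_parser_for_extension: lowercase,
# ensure a leading dot, then map extension -> parser name.
extension_map = {".pdf": "pdf", ".docx": "word", ".csv": "csv"}

def resolve(extension: str) -> str:
    normalized = extension.lower()
    if not normalized.startswith("."):
        normalized = f".{normalized}"
    if normalized not in extension_map:
        raise ValueError(f"Unsupported file format: {normalized}")
    return extension_map[normalized]

print(resolve("PDF"))   # pdf
print(resolve(".csv"))  # csv
```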
@ -49,47 +49,47 @@ class PDFParser(DocumentParser):
[AC-AISVC-33] Extracts text from all pages.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="pdf"
)
if not self.supports_extension(path.suffix):
raise DocumentParseException(
f"Unsupported file extension: {path.suffix}",
file_path=str(path),
parser="pdf"
)
fitz = self._get_fitz()
try:
doc = fitz.open(path)
pages: list[PageText] = []
text_parts = []
page_count = len(doc)
for page_num in range(page_count):
page = doc[page_num]
text = page.get_text().strip()
if text:
pages.append(PageText(page=page_num + 1, text=text))
text_parts.append(f"[Page {page_num + 1}]\n{text}")
doc.close()
full_text = "\n\n".join(text_parts)
file_size = path.stat().st_size
logger.info(
f"Parsed PDF: {path.name}, pages={page_count}, "
f"chars={len(full_text)}, size={file_size}"
)
return ParseResult(
text=full_text,
source_path=str(path),
@ -101,7 +101,7 @@ class PDFParser(DocumentParser):
},
pages=pages,
)
except DocumentParseException:
raise
except Exception as e:
@ -121,7 +121,7 @@ class PDFPlumberParser(DocumentParser):
"""
Alternative PDF parser using pdfplumber.
[AC-AISVC-33] Uses pdfplumber for text extraction.
pdfplumber is better for table extraction but slower than PyMuPDF.
"""
@ -149,46 +149,46 @@ class PDFPlumberParser(DocumentParser):
[AC-AISVC-33] Extracts text and optionally tables.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="pdfplumber"
)
pdfplumber = self._get_pdfplumber()
try:
pages: list[PageText] = []
text_parts = []
page_count = 0
with pdfplumber.open(path) as pdf:
page_count = len(pdf.pages)
for page_num, page in enumerate(pdf.pages):
text = page.extract_text() or ""
if self._extract_tables:
tables = page.extract_tables()
for table in tables:
table_text = self._format_table(table)
text += f"\n\n{table_text}"
text = text.strip()
if text:
pages.append(PageText(page=page_num + 1, text=text))
text_parts.append(f"[Page {page_num + 1}]\n{text}")
full_text = "\n\n".join(text_parts)
file_size = path.stat().st_size
logger.info(
f"Parsed PDF (pdfplumber): {path.name}, pages={page_count}, "
f"chars={len(full_text)}, size={file_size}"
)
return ParseResult(
text=full_text,
source_path=str(path),
@ -201,7 +201,7 @@ class PDFPlumberParser(DocumentParser):
},
pages=pages,
)
except DocumentParseException:
raise
except Exception as e:
@ -216,12 +216,12 @@ class PDFPlumberParser(DocumentParser):
"""Format a table as text."""
if not table:
return ""
lines = []
for row in table:
cells = [str(cell) if cell else "" for cell in row]
lines.append(" | ".join(cells))
return "\n".join(lines)
def get_supported_extensions(self) -> list[str]:
@ -35,15 +35,15 @@ class TextParser(DocumentParser):
"""
for enc in ENCODINGS_TO_TRY:
try:
with open(path, "r", encoding=enc) as f:
with open(path, encoding=enc) as f:
text = f.read()
logger.info(f"Successfully parsed with encoding: {enc}")
return text, enc
except (UnicodeDecodeError, LookupError):
continue
raise DocumentParseException(
f"Failed to decode file with any known encoding",
"Failed to decode file with any known encoding",
file_path=str(path),
parser="text"
)
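The encoding-fallback loop can be exercised on raw bytes; a sketch assuming an illustrative `ENCODINGS_TO_TRY` order (the real list is defined elsewhere in this module):

```python
ENCODINGS_TO_TRY = ["utf-8", "gbk", "latin-1"]  # illustrative order

def decode_bytes(data: bytes) -> tuple[str, str]:
    # Try each encoding in turn, returning text plus the encoding used.
    for enc in ENCODINGS_TO_TRY:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("Failed to decode file with any known encoding")

text, used = decode_bytes("话术流程".encode("gbk"))
print(used)  # gbk
```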
@ -54,25 +54,25 @@ class TextParser(DocumentParser):
[AC-AISVC-33] Direct file reading.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="text"
)
try:
text, encoding_used = self._try_encodings(path)
file_size = path.stat().st_size
line_count = text.count("\n") + 1
logger.info(
f"Parsed text: {path.name}, lines={line_count}, "
f"chars={len(text)}, size={file_size}, encoding={encoding_used}"
)
return ParseResult(
text=text,
source_path=str(path),
@ -83,7 +83,7 @@ class TextParser(DocumentParser):
"encoding": encoding_used,
}
)
except DocumentParseException:
raise
except Exception as e:
@ -49,28 +49,28 @@ class WordParser(DocumentParser):
[AC-AISVC-34] Extracts text while preserving paragraph structure.
"""
path = Path(file_path)
if not path.exists():
raise DocumentParseException(
f"File not found: {path}",
file_path=str(path),
parser="word"
)
if not self.supports_extension(path.suffix):
raise DocumentParseException(
f"Unsupported file extension: {path.suffix}",
file_path=str(path),
parser="word"
)
Document = self._get_docx()
try:
doc = Document(path)
text_parts = []
if self._include_headers:
for section in doc.sections:
header = section.header
@ -78,7 +78,7 @@ class WordParser(DocumentParser):
header_text = "\n".join(p.text for p in header.paragraphs if p.text.strip())
if header_text:
text_parts.append(f"[Header]\n{header_text}")
for para in doc.paragraphs:
if para.text.strip():
style_name = para.style.name if para.style else ""
@ -86,12 +86,12 @@ class WordParser(DocumentParser):
text_parts.append(f"\n## {para.text}")
else:
text_parts.append(para.text)
for table in doc.tables:
table_text = self._format_table(table)
if table_text.strip():
text_parts.append(f"\n[Table]\n{table_text}")
if self._include_footers:
for section in doc.sections:
footer = section.footer
@ -99,18 +99,18 @@ class WordParser(DocumentParser):
footer_text = "\n".join(p.text for p in footer.paragraphs if p.text.strip())
if footer_text:
text_parts.append(f"[Footer]\n{footer_text}")
full_text = "\n\n".join(text_parts)
file_size = path.stat().st_size
paragraph_count = len(doc.paragraphs)
table_count = len(doc.tables)
logger.info(
f"Parsed Word: {path.name}, paragraphs={paragraph_count}, "
f"tables={table_count}, chars={len(full_text)}, size={file_size}"
)
return ParseResult(
text=full_text,
source_path=str(path),
@ -121,7 +121,7 @@ class WordParser(DocumentParser):
"table_count": table_count,
}
)
except DocumentParseException:
raise
except Exception as e:
@ -15,13 +15,13 @@ from app.services.embedding.factory import (
get_embedding_config_manager,
get_embedding_provider,
)
from app.services.embedding.nomic_provider import (
EmbeddingTask,
NomicEmbeddingProvider,
NomicEmbeddingResult,
)
from app.services.embedding.ollama_provider import OllamaEmbeddingProvider
from app.services.embedding.openai_provider import OpenAIEmbeddingProvider
__all__ = [
"EmbeddingConfig",
@ -123,7 +123,7 @@ class EmbeddingProvider(ABC):
class EmbeddingException(Exception):
"""Exception raised when embedding generation fails."""
def __init__(self, message: str, provider: str = "", details: dict[str, Any] | None = None):
self.provider = provider
self.details = details or {}
@ -10,12 +10,12 @@ Design reference: progress.md Section 7.1 - Architecture
import json
import logging
from pathlib import Path
from typing import Any
from app.services.embedding.base import EmbeddingException, EmbeddingProvider
from app.services.embedding.nomic_provider import NomicEmbeddingProvider
from app.services.embedding.ollama_provider import OllamaEmbeddingProvider
from app.services.embedding.openai_provider import OpenAIEmbeddingProvider
logger = logging.getLogger(__name__)
@ -28,14 +28,14 @@ class EmbeddingProviderFactory:
[AC-AISVC-30] Supports dynamic loading based on configuration.
"""
_providers: dict[str, type[EmbeddingProvider]] = {
"ollama": OllamaEmbeddingProvider,
"openai": OpenAIEmbeddingProvider,
"nomic": NomicEmbeddingProvider,
}
@classmethod
def register_provider(cls, name: str, provider_class: type[EmbeddingProvider]) -> None:
"""
Register a new embedding provider.
[AC-AISVC-30] Allows runtime registration of providers.
@ -62,24 +62,24 @@ class EmbeddingProviderFactory:
f"Unknown provider: {name}",
provider="factory"
)
provider_class = cls._providers[name]
temp_instance = provider_class.__new__(provider_class)
display_names = {
"ollama": "Ollama 本地模型",
"openai": "OpenAI Embedding",
"nomic": "Nomic Embed (优化版)",
}
descriptions = {
"ollama": "使用 Ollama 运行的本地嵌入模型,支持 nomic-embed-text 等开源模型",
"openai": "使用 OpenAI 官方 Embedding API支持 text-embedding-3 系列模型",
"nomic": "Nomic-embed-text v1.5 优化版,支持任务前缀和 Matryoshka 维度截断专为RAG优化",
}
raw_schema = temp_instance.get_config_schema()
properties = {}
required = []
for key, field in raw_schema.items():
@ -97,14 +97,14 @@ class EmbeddingProviderFactory:
properties[key]["maximum"] = field.get("maximum")
if field.get("required"):
required.append(key)
config_schema = {
"type": "object",
"properties": properties,
}
if required:
config_schema["required"] = required
return {
"name": name,
"display_name": display_names.get(name, name),
@ -121,14 +121,14 @@ class EmbeddingProviderFactory:
"""
Create an embedding provider instance.
[AC-AISVC-30] Creates provider based on configuration.
Args:
name: Provider identifier (e.g., "ollama", "openai")
config: Provider-specific configuration
Returns:
Configured EmbeddingProvider instance
Raises:
EmbeddingException: If provider is unknown or configuration is invalid
"""
@ -138,9 +138,9 @@ class EmbeddingProviderFactory:
f"Available: {cls.get_available_providers()}",
provider="factory"
)
provider_class = cls._providers[name]
try:
instance = provider_class(**config)
logger.info(f"Created embedding provider: {name}")
@ -169,14 +169,14 @@ class EmbeddingConfigManager:
self._provider_name = default_provider
self._config = self._default_config.copy()
self._provider: EmbeddingProvider | None = None
self._load_from_file()
def _load_from_file(self) -> None:
"""Load configuration from file if exists."""
try:
if EMBEDDING_CONFIG_FILE.exists():
with open(EMBEDDING_CONFIG_FILE, encoding='utf-8') as f:
saved = json.load(f)
self._provider_name = saved.get("provider", self._default_provider)
self._config = saved.get("config", self._default_config.copy())
@ -235,38 +235,38 @@ class EmbeddingConfigManager:
"""
Update embedding configuration.
[AC-AISVC-31, AC-AISVC-40] Supports hot-reload with persistence.
Args:
provider: New provider name
config: New provider configuration
Returns:
True if update was successful
Raises:
EmbeddingException: If configuration is invalid
"""
old_provider = self._provider_name
old_config = self._config.copy()
try:
new_provider_instance = EmbeddingProviderFactory.create_provider(
provider,
config
)
if self._provider:
await self._provider.close()
self._provider_name = provider
self._config = config
self._provider = new_provider_instance
self._save_to_file()
logger.info(f"Updated embedding config: provider={provider}")
return True
except Exception as e:
self._provider_name = old_provider
self._config = old_config
@ -285,39 +285,39 @@ class EmbeddingConfigManager:
"""
Test embedding connection.
[AC-AISVC-41] Tests provider connectivity.
Args:
test_text: Text to embed for testing
provider: Provider to test (uses current if None)
config: Config to test (uses current if None)
Returns:
Dict with test results including success, dimension, latency
"""
import time
test_provider_name = provider or self._provider_name
test_config = config or self._config
try:
test_provider = EmbeddingProviderFactory.create_provider(
test_provider_name,
test_config
)
start_time = time.perf_counter()
embedding = await test_provider.embed(test_text)
latency_ms = (time.perf_counter() - start_time) * 1000
await test_provider.close()
return {
"success": True,
"dimension": len(embedding),
"latency_ms": latency_ms,
"message": f"连接成功,向量维度: {len(embedding)}",
}
except Exception as e:
return {
"success": False,
@ -346,7 +346,7 @@ def get_embedding_config_manager() -> EmbeddingConfigManager:
if _embedding_config_manager is None:
from app.core.config import get_settings
settings = get_settings()
_embedding_config_manager = EmbeddingConfigManager(
default_provider="nomic",
default_config={
@ -15,7 +15,6 @@ import httpx
import numpy as np
from app.services.embedding.base import (
EmbeddingConfig,
EmbeddingException,
EmbeddingProvider,
)
@ -45,12 +44,12 @@ class NomicEmbeddingResult:
class NomicEmbeddingProvider(EmbeddingProvider):
"""
Nomic-embed-text v1.5 embedding provider with task prefixes.
Key features:
- Task prefixes: search_document: for documents, search_query: for queries
- Matryoshka dimension truncation: 256/512/768 dimensions
- Automatic normalization after truncation
Reference: rag-optimization/spec.md Section 2.1, 2.3
"""
@ -87,7 +86,7 @@ class NomicEmbeddingProvider(EmbeddingProvider):
prefix = self.DOCUMENT_PREFIX
else:
prefix = self.QUERY_PREFIX
if text.startswith(prefix):
return text
return f"{prefix}{text}"
@ -98,12 +97,12 @@ class NomicEmbeddingProvider(EmbeddingProvider):
Matryoshka representation learning allows dimension truncation.
"""
truncated = embedding[:target_dim]
arr = np.array(truncated, dtype=np.float32)
norm = np.linalg.norm(arr)
if norm > 0:
arr = arr / norm
return arr.tolist()
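The truncate-then-renormalize step is the core of Matryoshka dimension truncation; the same arithmetic in dependency-free Python (no numpy):

```python
import math

# Truncate to target_dim, then L2-renormalize so the shorter vector is
# still unit length (Matryoshka truncation, as in _truncate_and_normalize).
def truncate_and_normalize(embedding: list[float], target_dim: int) -> list[float]:
    truncated = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated] if norm > 0 else truncated

vec = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(vec)  # [0.6, 0.8] (unit length after truncation)
```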
async def embed_with_task(
@ -113,18 +112,18 @@ class NomicEmbeddingProvider(EmbeddingProvider):
) -> NomicEmbeddingResult:
"""
Generate embedding with specified task prefix.
Args:
text: Input text to embed
task: DOCUMENT for indexing, QUERY for retrieval
Returns:
NomicEmbeddingResult with all dimension variants
"""
start_time = time.perf_counter()
prefixed_text = self._add_prefix(text, task)
try:
client = await self._get_client()
response = await client.post(
@ -137,25 +136,25 @@ class NomicEmbeddingProvider(EmbeddingProvider):
response.raise_for_status()
data = response.json()
embedding = data.get("embedding", [])
if not embedding:
raise EmbeddingException(
"Empty embedding returned",
provider=self.PROVIDER_NAME,
details={"text_length": len(text), "task": task.value}
)
latency_ms = (time.perf_counter() - start_time) * 1000
embedding_256 = self._truncate_and_normalize(embedding, 256)
embedding_512 = self._truncate_and_normalize(embedding, 512)
embedding_full = self._truncate_and_normalize(embedding, len(embedding))
logger.debug(
f"Generated Nomic embedding: task={task.value}, "
f"dim={len(embedding)}, latency={latency_ms:.2f}ms"
)
return NomicEmbeddingResult(
embedding_full=embedding_full,
embedding_256=embedding_256,
@ -165,7 +164,7 @@ class NomicEmbeddingProvider(EmbeddingProvider):
task=task,
latency_ms=latency_ms,
)
except httpx.HTTPStatusError as e:
raise EmbeddingException(
f"Ollama API error: {e.response.status_code}",
@ -4,7 +4,9 @@ Uses nomic-embed-text model via Ollama API.
"""
import logging
import httpx
from app.core.config import get_settings
logger = logging.getLogger(__name__)
@ -15,7 +17,7 @@ async def get_embedding(text: str) -> list[float]:
Generate embedding vector for text using Ollama nomic-embed-text model.
"""
settings = get_settings()
async with httpx.AsyncClient(timeout=60.0) as client:
try:
response = await client.post(
@ -28,14 +30,14 @@ async def get_embedding(text: str) -> list[float]:
response.raise_for_status()
data = response.json()
embedding = data.get("embedding", [])
if not embedding:
logger.warning(f"Empty embedding returned for text length={len(text)}")
return [0.0] * settings.qdrant_vector_size
logger.debug(f"Generated embedding: dim={len(embedding)}")
return embedding
except httpx.HTTPStatusError as e:
logger.error(f"Ollama API error: {e.response.status_code} - {e.response.text}")
raise
@ -12,7 +12,6 @@ from typing import Any
import httpx
from app.services.embedding.base import (
EmbeddingConfig,
EmbeddingException,
EmbeddingProvider,
)
@ -54,7 +53,7 @@ class OllamaEmbeddingProvider(EmbeddingProvider):
[AC-AISVC-29] Returns embedding vector.
"""
start_time = time.perf_counter()
try:
client = await self._get_client()
response = await client.post(
@ -67,22 +66,22 @@ class OllamaEmbeddingProvider(EmbeddingProvider):
response.raise_for_status()
data = response.json()
embedding = data.get("embedding", [])
if not embedding:
raise EmbeddingException(
"Empty embedding returned",
provider=self.PROVIDER_NAME,
details={"text_length": len(text)}
)
latency_ms = (time.perf_counter() - start_time) * 1000
logger.debug(
f"Generated embedding via Ollama: dim={len(embedding)}, "
f"latency={latency_ms:.2f}ms"
)
return embedding
except httpx.HTTPStatusError as e:
raise EmbeddingException(
f"Ollama API error: {e.response.status_code}",

View File

@ -48,7 +48,7 @@ class OpenAIEmbeddingProvider(EmbeddingProvider):
self._timeout = timeout_seconds
self._client: httpx.AsyncClient | None = None
self._extra_config = kwargs
if dimension:
self._dimension = dimension
elif model in self.MODEL_DIMENSIONS:
@ -75,17 +75,17 @@ class OpenAIEmbeddingProvider(EmbeddingProvider):
[AC-AISVC-29] Supports batch embedding for efficiency.
"""
start_time = time.perf_counter()
try:
client = await self._get_client()
request_body: dict[str, Any] = {
"model": self._model,
"input": texts,
}
if self._dimension and self._model.startswith("text-embedding-3"):
request_body["dimensions"] = self._dimension
response = await client.post(
f"{self._base_url}/embeddings",
headers={
@ -96,7 +96,7 @@ class OpenAIEmbeddingProvider(EmbeddingProvider):
)
response.raise_for_status()
data = response.json()
embeddings = []
for item in data.get("data", []):
embedding = item.get("embedding", [])
@ -107,22 +107,22 @@ class OpenAIEmbeddingProvider(EmbeddingProvider):
details={"index": item.get("index", 0)}
)
embeddings.append(embedding)
if len(embeddings) != len(texts):
raise EmbeddingException(
f"Embedding count mismatch: expected {len(texts)}, got {len(embeddings)}",
provider=self.PROVIDER_NAME
)
latency_ms = (time.perf_counter() - start_time) * 1000
logger.debug(
f"Generated {len(embeddings)} embeddings via OpenAI: "
f"dim={len(embeddings[0]) if embeddings else 0}, "
f"latency={latency_ms:.2f}ms"
)
return embeddings
except httpx.HTTPStatusError as e:
raise EmbeddingException(
f"OpenAI API error: {e.response.status_code}",

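The batch path above enforces two invariants: every item carries an embedding, and the count matches the request. Since the OpenAI embeddings API may return items keyed by `index`, a defensive sketch (hypothetical helper, not the literal method) also restores request order:

```python
def collect_embeddings(data: dict, expected: int) -> list[list[float]]:
    """Collect batch embeddings from an OpenAI-style response, restoring
    request order via the `index` field and checking the count invariant."""
    items = sorted(data.get("data", []), key=lambda item: item.get("index", 0))
    embeddings = [item["embedding"] for item in items]
    if len(embeddings) != expected:
        raise ValueError(
            f"Embedding count mismatch: expected {expected}, got {len(embeddings)}"
        )
    return embeddings
```

The count check matters because a silent mismatch would misalign texts and vectors for every downstream upsert.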
View File

@ -3,7 +3,7 @@ Flow services for AI Service.
[AC-AISVC-71~AC-AISVC-77] Script flow management and execution engine.
"""
from app.services.flow.flow_service import ScriptFlowService
from app.services.flow.engine import FlowEngine
from app.services.flow.flow_service import ScriptFlowService
__all__ = ["ScriptFlowService", "FlowEngine"]

View File

@ -14,10 +14,10 @@ from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col
from app.models.entities import (
ScriptFlow,
FlowAdvanceResult,
FlowInstance,
FlowInstanceStatus,
FlowAdvanceResult,
ScriptFlow,
TimeoutAction,
)
@ -27,14 +27,14 @@ logger = logging.getLogger(__name__)
class FlowEngine:
"""
[AC-AISVC-74~AC-AISVC-77] State machine engine for script flow execution.
State Machine:
- IDLE: No active flow
- ACTIVE: Flow is being executed
- COMPLETED: Flow finished successfully
- TIMEOUT: Flow timed out
- CANCELLED: Flow was cancelled
Core Methods:
- check_active_flow: Check if session has active flow
- start: Start a new flow instance
@ -52,11 +52,11 @@ class FlowEngine:
) -> FlowInstance | None:
"""
[AC-AISVC-75] Check if session has an active flow instance.
Args:
tenant_id: Tenant ID for isolation
session_id: Session ID to check
Returns:
Active FlowInstance or None
"""
@ -76,12 +76,12 @@ class FlowEngine:
) -> tuple[FlowInstance | None, str | None]:
"""
[AC-AISVC-74] Start a new flow instance and return first step content.
Args:
tenant_id: Tenant ID for isolation
session_id: Session ID for the conversation
flow_id: ID of the flow to start
Returns:
Tuple of (FlowInstance, first_step_content) or (None, error_message)
"""
@ -141,12 +141,12 @@ class FlowEngine:
) -> FlowAdvanceResult:
"""
[AC-AISVC-75, AC-AISVC-76] Advance flow based on user input.
Args:
tenant_id: Tenant ID for isolation
session_id: Session ID for the conversation
user_input: User's input message
Returns:
FlowAdvanceResult with completion status and next message
"""
@ -168,11 +168,11 @@ class FlowEngine:
return FlowAdvanceResult(completed=True, message=None)
current_step = flow.steps[current_step_idx]
self._record_input(instance, user_input)
next_step_no = self._match_next_step(current_step, user_input)
if next_step_no is None:
default_next = current_step.get("default_next")
if default_next:
@ -223,11 +223,11 @@ class FlowEngine:
) -> FlowAdvanceResult:
"""
[AC-AISVC-77] Handle timeout for current step.
Args:
tenant_id: Tenant ID for isolation
session_id: Session ID for the conversation
Returns:
FlowAdvanceResult based on timeout_action configuration
"""
@ -268,7 +268,7 @@ class FlowEngine:
instance.current_step = default_next
instance.updated_at = datetime.utcnow()
await self._session.flush()
next_step = flow.steps[default_next - 1]
return FlowAdvanceResult(
completed=False,
@ -286,7 +286,7 @@ class FlowEngine:
instance.completed_at = datetime.utcnow()
instance.updated_at = datetime.utcnow()
await self._session.flush()
return FlowAdvanceResult(
completed=True,
message="抱歉,等待超时,正在为您转接人工客服...",
@ -330,7 +330,7 @@ class FlowEngine:
return None
flow = await self._get_flow_by_id(instance.flow_id)
return {
"instance_id": str(instance.id),
"flow_id": str(instance.flow_id),
@ -372,11 +372,11 @@ class FlowEngine:
) -> int | None:
"""
Match user input against next_conditions.
Args:
step: Current step definition
user_input: User's input message
Returns:
goto_step number if matched, None otherwise
"""
@ -410,7 +410,7 @@ class FlowEngine:
"""Record user input in flow context."""
if instance.context is None:
instance.context = {"inputs": []}
inputs = instance.context.get("inputs", [])
inputs.append({
"step": instance.current_step,

View File

@ -5,10 +5,11 @@ Script Flow Service for AI Service.
import logging
import uuid
from collections.abc import Sequence
from datetime import datetime
from typing import Any, Sequence
from typing import Any
from sqlalchemy import select, func
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col
@ -16,7 +17,6 @@ from app.models.entities import (
ScriptFlow,
ScriptFlowCreate,
ScriptFlowUpdate,
FlowInstanceStatus,
)
logger = logging.getLogger(__name__)
@ -25,7 +25,7 @@ logger = logging.getLogger(__name__)
class ScriptFlowService:
"""
[AC-AISVC-71~AC-AISVC-73] Service for managing script flow definitions.
Features:
- Flow CRUD with tenant isolation
- Step validation
@ -44,7 +44,7 @@ class ScriptFlowService:
[AC-AISVC-71] Create a new script flow with steps.
"""
self._validate_steps(create_data.steps)
flow = ScriptFlow(
tenant_id=tenant_id,
name=create_data.name,
@ -72,10 +72,10 @@ class ScriptFlowService:
stmt = select(ScriptFlow).where(
ScriptFlow.tenant_id == tenant_id
)
if is_enabled is not None:
stmt = stmt.where(ScriptFlow.is_enabled == is_enabled)
stmt = stmt.order_by(col(ScriptFlow.created_at).desc())
result = await self._session.execute(stmt)
return result.scalars().all()
@ -146,7 +146,7 @@ class ScriptFlowService:
flow.updated_at = datetime.utcnow()
await self._session.flush()
logger.info(
f"[AC-AISVC-73] Updated script flow: tenant={tenant_id}, id={flow_id}"
)
@ -164,7 +164,7 @@ class ScriptFlowService:
await self._session.delete(flow)
await self._session.flush()
logger.info(
f"Deleted script flow: tenant={tenant_id}, id={flow_id}"
)
@ -186,7 +186,7 @@ class ScriptFlowService:
) -> int:
"""Get count of intent rules linked to this flow."""
from app.models.entities import IntentRule
stmt = select(func.count()).select_from(IntentRule).where(
IntentRule.tenant_id == tenant_id,
IntentRule.flow_id == flow_id,
@ -198,7 +198,7 @@ class ScriptFlowService:
"""Validate step definitions."""
if not steps:
raise ValueError("Flow must have at least one step")
step_nos = set()
for step in steps:
step_no = step.get("step_no")
@ -207,10 +207,10 @@ class ScriptFlowService:
if step_no in step_nos:
raise ValueError(f"Duplicate step_no: {step_no}")
step_nos.add(step_no)
if not step.get("content"):
raise ValueError(f"Step {step_no} must have content")
next_conditions = step.get("next_conditions", [])
for cond in next_conditions:
if cond.get("goto_step") is None:

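The `_validate_steps` checks shown above amount to four invariants — non-empty flow, unique `step_no`, non-empty `content`, and every condition carrying a `goto_step`. A standalone sketch of the same checks:

```python
def validate_steps(steps: list[dict]) -> None:
    """Reject empty flows, missing/duplicate step_no, empty content,
    and next_conditions without a goto_step target."""
    if not steps:
        raise ValueError("Flow must have at least one step")
    seen: set[int] = set()
    for step in steps:
        step_no = step.get("step_no")
        if step_no is None:
            raise ValueError("Every step must have a step_no")
        if step_no in seen:
            raise ValueError(f"Duplicate step_no: {step_no}")
        seen.add(step_no)
        if not step.get("content"):
            raise ValueError(f"Step {step_no} must have content")
        for cond in step.get("next_conditions", []):
            if cond.get("goto_step") is None:
                raise ValueError(f"Step {step_no} has a condition without goto_step")
```

Validating at create/update time keeps the engine simpler: `FlowEngine.advance` can assume every referenced `goto_step` exists in a well-formed flow.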
View File

@ -3,8 +3,8 @@ Intent recognition and routing services.
[AC-AISVC-65~AC-AISVC-70] Intent rule management and matching engine.
"""
from app.services.intent.router import IntentMatchResult, IntentRouter
from app.services.intent.rule_service import IntentRuleService
from app.services.intent.router import IntentRouter, IntentMatchResult
__all__ = [
"IntentRuleService",

View File

@ -41,7 +41,7 @@ class IntentMatchResult:
class IntentRouter:
"""
[AC-AISVC-69] Intent matching engine.
Matching algorithm:
1. Load rules ordered by priority DESC
2. For each rule, try keyword matching first
@ -60,11 +60,11 @@ class IntentRouter:
) -> IntentMatchResult | None:
"""
[AC-AISVC-69] Match user message against intent rules.
Args:
message: User input message
rules: List of enabled rules ordered by priority DESC
Returns:
IntentMatchResult if matched, None otherwise
"""
@ -93,7 +93,7 @@ class IntentRouter:
)
return regex_result
logger.debug(f"[AC-AISVC-70] No intent matched, will fallback to default RAG")
logger.debug("[AC-AISVC-70] No intent matched, will fallback to default RAG")
return None
def _match_keywords(
@ -160,7 +160,7 @@ class IntentRouter:
) -> tuple[IntentMatchResult | None, str | None]:
"""
[AC-AISVC-69] Match and return rule_id for statistics update.
Returns:
Tuple of (match_result, rule_id_for_stats)
"""

View File

@ -6,8 +6,9 @@ Intent rule service for AI Service.
import logging
import time
import uuid
from collections.abc import Sequence
from datetime import datetime
from typing import Any, Sequence
from typing import Any
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
@ -63,7 +64,7 @@ _rule_cache = RuleCache()
class IntentRuleService:
"""
[AC-AISVC-65~AC-AISVC-68] Service for managing intent rules.
Features:
- Rule CRUD with tenant isolation
- Hit count statistics

View File

@ -6,10 +6,10 @@ Knowledge Base service for AI Service.
import logging
import os
import uuid
from collections.abc import Sequence
from datetime import datetime
from typing import Sequence
from sqlalchemy import select, func
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col

View File

@ -8,8 +8,9 @@ Design reference: design.md Section 8.1 - LLMClient interface
"""
from abc import ABC, abstractmethod
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from typing import Any, AsyncGenerator
from typing import Any
@dataclass

View File

@ -7,7 +7,7 @@ Design pattern: Factory pattern for pluggable LLM providers.
import json
import logging
from dataclasses import dataclass, field
from dataclasses import dataclass
from pathlib import Path
from typing import Any
@ -266,9 +266,9 @@ class LLMConfigManager:
def __init__(self):
from app.core.config import get_settings
settings = get_settings()
self._current_provider: str = settings.llm_provider
self._current_config: dict[str, Any] = {
"api_key": settings.llm_api_key,
@ -278,14 +278,14 @@ class LLMConfigManager:
"temperature": settings.llm_temperature,
}
self._client: LLMClient | None = None
self._load_from_file()
def _load_from_file(self) -> None:
"""Load configuration from file if exists."""
try:
if LLM_CONFIG_FILE.exists():
with open(LLM_CONFIG_FILE, 'r', encoding='utf-8') as f:
with open(LLM_CONFIG_FILE, encoding='utf-8') as f:
saved = json.load(f)
self._current_provider = saved.get("provider", self._current_provider)
saved_config = saved.get("config", {})
@ -343,7 +343,7 @@ class LLMConfigManager:
self._current_provider = provider
self._current_config = validated_config
self._save_to_file()
logger.info(f"[AC-ASA-16] LLM config updated: provider={provider}")
@ -357,7 +357,7 @@ class LLMConfigManager:
"""Validate configuration against provider schema."""
schema_props = provider_info.config_schema.get("properties", {})
required_fields = provider_info.config_schema.get("required", [])
validated = {}
for key, prop_schema in schema_props.items():
if key in config:
@ -399,7 +399,7 @@ class LLMConfigManager:
test_provider = provider or self._current_provider
test_config = config if config else self._current_config
logger.info(f"[AC-ASA-17] Test connection: provider={test_provider}, model={test_config.get('model')}")
if test_provider not in LLM_PROVIDERS:

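The `_validate_config` step above filters the incoming config against the provider's declared schema and then enforces required fields. A minimal sketch of that shape (hypothetical schema content — the real per-provider schemas live in `LLM_PROVIDERS`):

```python
def validate_config(config: dict, schema: dict) -> dict:
    """Keep only keys declared in the provider schema, then enforce `required`."""
    props = schema.get("properties", {})
    required = schema.get("required", [])
    validated = {key: config[key] for key in props if key in config}
    missing = [key for key in required if key not in validated]
    if missing:
        raise ValueError(f"Missing required LLM config fields: {missing}")
    return validated

schema = {
    "properties": {"api_key": {}, "model": {}, "temperature": {}},
    "required": ["api_key", "model"],
}
print(validate_config({"api_key": "sk-x", "model": "gpt-4o-mini", "extra": 1}, schema))
# -> {'api_key': 'sk-x', 'model': 'gpt-4o-mini'}
```

Dropping undeclared keys (like `extra` above) before persisting to `LLM_CONFIG_FILE` prevents stale or provider-specific fields from leaking across provider switches.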
View File

@ -9,7 +9,8 @@ Design reference: design.md Section 8.1 - LLMClient interface
import json
import logging
from typing import Any, AsyncGenerator
from collections.abc import AsyncGenerator
from typing import Any
import httpx
from tenacity import (
@ -20,7 +21,7 @@ from tenacity import (
)
from app.core.config import get_settings
from app.core.exceptions import AIServiceException, ErrorCode, ServiceUnavailableException, TimeoutException
from app.core.exceptions import AIServiceException, ErrorCode, TimeoutException
from app.services.llm.base import LLMClient, LLMConfig, LLMResponse, LLMStreamChunk
logger = logging.getLogger(__name__)
@ -133,13 +134,13 @@ class OpenAIClient(LLMClient):
body = self._build_request_body(messages, effective_config, stream=False, **kwargs)
logger.info(f"[AC-AISVC-02] Generating response with model={effective_config.model}")
logger.info(f"[AC-AISVC-02] ========== FULL PROMPT TO AI ==========")
logger.info("[AC-AISVC-02] ========== FULL PROMPT TO AI ==========")
for i, msg in enumerate(messages):
role = msg.get("role", "unknown")
content = msg.get("content", "")
logger.info(f"[AC-AISVC-02] [{i}] role={role}, content_length={len(content)}")
logger.info(f"[AC-AISVC-02] [{i}] content:\n{content}")
logger.info(f"[AC-AISVC-02] ======================================")
logger.info("[AC-AISVC-02] ======================================")
try:
response = await client.post(
@ -220,13 +221,13 @@ class OpenAIClient(LLMClient):
body = self._build_request_body(messages, effective_config, stream=True, **kwargs)
logger.info(f"[AC-AISVC-06] Starting streaming generation with model={effective_config.model}")
logger.info(f"[AC-AISVC-06] ========== FULL PROMPT TO AI (STREAMING) ==========")
logger.info("[AC-AISVC-06] ========== FULL PROMPT TO AI (STREAMING) ==========")
for i, msg in enumerate(messages):
role = msg.get("role", "unknown")
content = msg.get("content", "")
logger.info(f"[AC-AISVC-06] [{i}] role={role}, content_length={len(content)}")
logger.info(f"[AC-AISVC-06] [{i}] content:\n{content}")
logger.info(f"[AC-AISVC-06] ======================================")
logger.info("[AC-AISVC-06] ======================================")
try:
async with client.stream(
@ -263,7 +264,7 @@ class OpenAIClient(LLMClient):
details=[{"status_code": e.response.status_code, "response": error_detail}],
)
logger.info(f"[AC-AISVC-06] Streaming generation completed")
logger.info("[AC-AISVC-06] Streaming generation completed")
def _parse_stream_chunk(
self,

View File

@ -4,13 +4,13 @@ Memory service for AI Service.
"""
import logging
from typing import Sequence
from collections.abc import Sequence
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col
from app.models.entities import ChatMessage, ChatMessageCreate, ChatSession, ChatSessionCreate
from app.models.entities import ChatMessage, ChatSession
logger = logging.getLogger(__name__)

View File

@ -165,24 +165,24 @@ class PromptMonitor:
select(PromptTemplateVersion)
.where(
PromptTemplateVersion.template_id == template_id,
PromptTemplateVersion.status == TemplateVersionStatus.PUBLISHED.value,
)
.order_by(PromptTemplateVersion.version.desc())
)
version_result = await self._session.execute(version_stmt)
published_version = version_result.scalar_one_or_none()
latest_version = version_result.scalar_one_or_none()
if not published_version:
if not latest_version:
return None
resolver = VariableResolver()
merged_variables = self._merge_variables(
published_version.variables or [],
latest_version.variables or [],
variables or {},
)
rendered_content = resolver.resolve(
published_version.system_instruction,
published_version.variables,
latest_version.system_instruction,
latest_version.variables,
merged_variables,
)
@ -214,8 +214,8 @@ class PromptMonitor:
return PromptPreviewResult(
template_id=str(template.id),
template_name=template.name,
version=published_version.version,
raw_content=published_version.system_instruction,
version=latest_version.version,
raw_content=latest_version.system_instruction,
variables=variable_list,
rendered_content=rendered_content,
estimated_tokens=token_count.total,

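The preview path above merges the version's declared variables with caller-supplied overrides and renders via `VariableResolver`. A sketch of that merge-then-substitute behavior, assuming `{{name}}`-style placeholders (the actual resolver syntax is not shown in this diff):

```python
import re

def resolve_template(template: str, declared: list[dict], overrides: dict) -> str:
    """Merge declared defaults with request-supplied overrides, then substitute
    {{name}} placeholders; unknown placeholders are left untouched."""
    values = {d["name"]: d.get("default", "") for d in declared}
    values.update(overrides)
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )

declared = [{"name": "agent_name", "default": "小助手"}]
print(resolve_template("我是{{agent_name}},很高兴为您服务", declared, {}))
# -> 我是小助手,很高兴为您服务
```

Leaving unknown placeholders intact (rather than rendering an empty string) makes missing-variable bugs visible in the preview instead of silently producing a truncated prompt.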
View File

@ -2,15 +2,19 @@
Orchestrator service for AI Service.
[AC-AISVC-01, AC-AISVC-02, AC-AISVC-06, AC-AISVC-07] Core orchestration logic for chat generation.
Design reference: design.md Section 2.2 - 关键数据流
1. Memory.load(tenantId, sessionId)
2. merge_context(local_history, external_history)
3. Retrieval.retrieve(query, tenantId, channelType, metadata)
4. build_prompt(merged_history, retrieved_docs, currentMessage)
5. LLM.generate(...) (non-streaming) or LLM.stream_generate(...) (streaming)
6. compute_confidence(...)
7. Memory.append(tenantId, sessionId, user/assistant messages)
8. Return ChatResponse (or output via SSE)
Design reference: design.md Section 10 - Orchestrator升级为12步pipeline
1. InputScanner: Scan user input for forbidden words (logging only)
2. FlowEngine: Check if session has active script flow
3. IntentRouter: Match intent rules and route to appropriate handler
4. QueryRewriter: (Optional, skipped in MVP) Rewrite query for better retrieval
5. Multi-KB Retrieval: Retrieve from target knowledge bases
6. ResultRanker: Rank results by KB type priority
7. PromptBuilder: Load template + inject behavior rules
8. LLM.generate: Generate response
9. OutputFilter: Filter forbidden words in output
10. Confidence: Calculate confidence score
11. Memory: Save messages
12. Return: Build and return ChatResponse
RAG Optimization (rag-optimization/spec.md):
- Two-stage retrieval with Matryoshka dimensions
@ -19,24 +23,36 @@ RAG Optimization (rag-optimization/spec.md):
"""
import logging
import uuid
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from typing import Any, AsyncGenerator
from typing import Any
from sse_starlette.sse import ServerSentEvent
from app.core.config import get_settings
from app.core.database import get_session
from app.core.prompts import SYSTEM_PROMPT, format_evidence_for_prompt
from app.core.sse import (
SSEStateMachine,
create_error_event,
create_final_event,
create_message_event,
SSEStateMachine,
)
from app.models import ChatRequest, ChatResponse
from app.services.confidence import ConfidenceCalculator, ConfidenceResult
from app.services.context import ContextMerger, MergedContext
from app.services.flow.engine import FlowEngine
from app.services.guardrail.behavior_service import BehaviorRuleService
from app.services.guardrail.input_scanner import InputScanner
from app.services.guardrail.output_filter import OutputFilter
from app.services.guardrail.word_service import ForbiddenWordService
from app.services.intent.router import IntentRouter
from app.services.intent.rule_service import IntentRuleService
from app.services.llm.base import LLMClient, LLMConfig, LLMResponse
from app.services.memory import MemoryService
from app.services.prompt.template_service import PromptTemplateService
from app.services.prompt.variable_resolver import VariableResolver
from app.services.retrieval.base import BaseRetriever, RetrievalContext, RetrievalResult
logger = logging.getLogger(__name__)
@ -60,6 +76,20 @@ class GenerationContext:
"""
[AC-AISVC-01, AC-AISVC-02] Context accumulated during generation pipeline.
Contains all intermediate results for diagnostics and response building.
12-Step Pipeline tracking:
1. input_scan_result: InputScanner result
2. active_flow: Active FlowInstance if exists
3. intent_match: IntentMatchResult if matched
4. query_rewritten: Rewritten query (optional)
5. retrieval_result: Multi-KB retrieval result
6. ranked_results: Ranked retrieval results
7. system_prompt: Built system prompt with template + behavior rules
8. llm_response: LLM generation result
9. filtered_reply: Output after forbidden word filtering
10. confidence_result: Confidence calculation result
11. messages_saved: Whether messages were saved
12. final_response: Final ChatResponse
"""
tenant_id: str
session_id: str
@ -67,19 +97,45 @@ class GenerationContext:
channel_type: str
request_metadata: dict[str, Any] | None = None
# Original pipeline fields
local_history: list[dict[str, str]] = field(default_factory=list)
merged_context: MergedContext | None = None
retrieval_result: RetrievalResult | None = None
llm_response: LLMResponse | None = None
confidence_result: ConfidenceResult | None = None
# Phase 10-14 new pipeline fields
input_scan_result: Any = None # InputScanResult
active_flow: Any = None # FlowInstance
intent_match: Any = None # IntentMatchResult
query_rewritten: str | None = None
ranked_results: list[Any] = field(default_factory=list)
system_prompt: str | None = None
filtered_reply: str | None = None
target_kb_ids: list[str] | None = None
behavior_rules: list[str] = field(default_factory=list)
diagnostics: dict[str, Any] = field(default_factory=dict)
class OrchestratorService:
"""
[AC-AISVC-01, AC-AISVC-02, AC-AISVC-06, AC-AISVC-07] Orchestrator for chat generation.
Coordinates memory, retrieval, and LLM components.
Coordinates memory, retrieval, LLM, and guardrail components.
12-Step Pipeline (design.md Section 10):
1. InputScanner: Scan user input for forbidden words
2. FlowEngine: Check if session has active script flow
3. IntentRouter: Match intent rules and route
4. QueryRewriter: (Optional, skipped in MVP)
5. Multi-KB Retrieval: Retrieve from target knowledge bases
6. ResultRanker: Rank results by KB type priority
7. PromptBuilder: Load template + inject behavior rules
8. LLM.generate: Generate response
9. OutputFilter: Filter forbidden words in output
10. Confidence: Calculate confidence score
11. Memory: Save messages
12. Return: Build and return ChatResponse
SSE Event Flow (per design.md Section 6.2):
- message* (0 or more) -> final (exactly 1) -> close
@ -94,6 +150,15 @@ class OrchestratorService:
context_merger: ContextMerger | None = None,
confidence_calculator: ConfidenceCalculator | None = None,
config: OrchestratorConfig | None = None,
# Phase 10-14 new services
input_scanner: InputScanner | None = None,
intent_router: IntentRouter | None = None,
intent_rule_service: IntentRuleService | None = None,
flow_engine: FlowEngine | None = None,
prompt_template_service: PromptTemplateService | None = None,
variable_resolver: VariableResolver | None = None,
behavior_rule_service: BehaviorRuleService | None = None,
output_filter: OutputFilter | None = None,
):
"""
Initialize orchestrator with optional dependencies for DI.
@ -105,6 +170,14 @@ class OrchestratorService:
context_merger: Context merger for history deduplication
confidence_calculator: Confidence calculator for response scoring
config: Orchestrator configuration
input_scanner: Input scanner for forbidden word detection
intent_router: Intent router for rule matching
intent_rule_service: Intent rule service for loading rules
flow_engine: Flow engine for script flow execution
prompt_template_service: Prompt template service for template loading
variable_resolver: Variable resolver for template variable substitution
behavior_rule_service: Behavior rule service for loading behavior rules
output_filter: Output filter for forbidden word filtering
"""
settings = get_settings()
self._llm_client = llm_client
@ -121,6 +194,16 @@ class OrchestratorService:
)
self._llm_config: LLMConfig | None = None
# Phase 10-14 services
self._input_scanner = input_scanner
self._intent_router = intent_router or IntentRouter()
self._intent_rule_service = intent_rule_service
self._flow_engine = flow_engine
self._prompt_template_service = prompt_template_service
self._variable_resolver = variable_resolver or VariableResolver()
self._behavior_rule_service = behavior_rule_service
self._output_filter = output_filter
async def generate(
self,
tenant_id: str,
@ -128,29 +211,27 @@ class OrchestratorService:
) -> ChatResponse:
"""
Generate a non-streaming response.
[AC-AISVC-01, AC-AISVC-02] Complete generation pipeline.
[AC-AISVC-01, AC-AISVC-02] Complete 12-step generation pipeline.
Pipeline (per design.md Section 2.2):
1. Load local history from Memory
2. Merge with external history (dedup + truncate)
3. RAG retrieval (optional)
4. Build prompt with context and evidence
5. LLM generation
6. Calculate confidence
7. Save messages to Memory
8. Return ChatResponse
12-Step Pipeline (design.md Section 10):
1. InputScanner: Scan user input for forbidden words
2. FlowEngine: Check if session has active script flow
3. IntentRouter: Match intent rules and route
4. QueryRewriter: (Optional, skipped in MVP)
5. Multi-KB Retrieval: Retrieve from target knowledge bases
6. ResultRanker: Rank results by KB type priority
7. PromptBuilder: Load template + inject behavior rules
8. LLM.generate: Generate response
9. OutputFilter: Filter forbidden words in output
10. Confidence: Calculate confidence score
11. Memory: Save messages
12. Return: Build and return ChatResponse
"""
logger.info(
f"[AC-AISVC-01] Starting generation for tenant={tenant_id}, "
f"[AC-AISVC-01] Starting 12-step generation for tenant={tenant_id}, "
f"session={request.session_id}, channel_type={request.channel_type}, "
f"current_message={request.current_message[:100]}..."
)
logger.info(
f"[AC-AISVC-01] Config: enable_rag={self._config.enable_rag}, "
f"use_optimized_retriever={self._config.use_optimized_retriever}, "
f"llm_client={'configured' if self._llm_client else 'NOT configured'}, "
f"retriever={'configured' if self._retriever else 'NOT configured'}"
)
ctx = GenerationContext(
tenant_id=tenant_id,
@ -161,19 +242,42 @@ class OrchestratorService:
)
try:
await self._load_local_history(ctx)
# Step 1: InputScanner - Scan user input for forbidden words
await self._scan_input(ctx)
# Load local history and merge context (original pipeline)
await self._load_local_history(ctx)
await self._merge_context(ctx, request.history)
# Step 2: FlowEngine - Check if session has active script flow
await self._check_active_flow(ctx)
# Step 3: IntentRouter - Match intent rules and route
await self._match_intent(ctx)
# Step 4: QueryRewriter - (Optional, skipped in MVP)
# ctx.query_rewritten = ctx.current_message
# Step 5-6: Multi-KB Retrieval + ResultRanker
if self._config.enable_rag and self._retriever:
await self._retrieve_evidence(ctx)
# Step 7: PromptBuilder - Load template + inject behavior rules
await self._build_system_prompt(ctx)
# Step 8: LLM.generate - Generate response
await self._generate_response(ctx)
# Step 9: OutputFilter - Filter forbidden words in output
await self._filter_output(ctx)
# Step 10: Confidence - Calculate confidence score
self._calculate_confidence(ctx)
# Step 11: Memory - Save messages
await self._save_messages(ctx)
# Step 12: Return - Build and return ChatResponse
return self._build_response(ctx)
except Exception as e:
@ -219,6 +323,200 @@ class OrchestratorService:
logger.warning(f"[AC-AISVC-13] Failed to load history: {e}")
ctx.diagnostics["memory_error"] = str(e)
async def _scan_input(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-83] Step 1: Scan user input for forbidden words (logging only).
"""
if not self._input_scanner:
logger.debug("[AC-AISVC-83] No input scanner configured, skipping")
ctx.diagnostics["input_scan_enabled"] = False
return
try:
ctx.input_scan_result = await self._input_scanner.scan(
text=ctx.current_message,
tenant_id=ctx.tenant_id,
)
ctx.diagnostics["input_scan"] = {
"flagged": ctx.input_scan_result.flagged,
"matched_words": ctx.input_scan_result.matched_words,
"matched_categories": ctx.input_scan_result.matched_categories,
}
if ctx.input_scan_result.flagged:
logger.info(
f"[AC-AISVC-83] Input flagged: words={ctx.input_scan_result.matched_words}, "
f"categories={ctx.input_scan_result.matched_categories}"
)
except Exception as e:
logger.warning(f"[AC-AISVC-83] Input scan failed: {e}")
ctx.diagnostics["input_scan_error"] = str(e)
async def _check_active_flow(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-75] Step 2: Check if session has active script flow.
If active flow exists, advance it based on user input.
"""
if not self._flow_engine:
logger.debug("[AC-AISVC-75] No flow engine configured, skipping")
ctx.diagnostics["flow_check_enabled"] = False
return
try:
ctx.active_flow = await self._flow_engine.check_active_flow(
tenant_id=ctx.tenant_id,
session_id=ctx.session_id,
)
if ctx.active_flow:
logger.info(
f"[AC-AISVC-75] Active flow found: flow_id={ctx.active_flow.flow_id}, "
f"current_step={ctx.active_flow.current_step}"
)
# Advance the flow based on user input
advance_result = await self._flow_engine.advance(
instance=ctx.active_flow,
user_input=ctx.current_message,
)
ctx.diagnostics["flow_advance"] = {
"completed": advance_result.completed,
"has_message": advance_result.message is not None,
}
# If flow provides a message, use it as the reply and skip LLM
if advance_result.message:
ctx.llm_response = LLMResponse(
content=advance_result.message,
model="script_flow",
usage={},
finish_reason="flow_step",
)
ctx.diagnostics["flow_handled"] = True
logger.info(f"[AC-AISVC-75] Flow provided reply, skipping LLM")
logger.info("[AC-AISVC-75] Flow provided reply, skipping LLM")
else:
ctx.diagnostics["flow_check_enabled"] = True
ctx.diagnostics["active_flow"] = False
except Exception as e:
logger.warning(f"[AC-AISVC-75] Flow check failed: {e}")
ctx.diagnostics["flow_check_error"] = str(e)
async def _match_intent(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-69, AC-AISVC-70] Step 3: Match intent rules and route.
Routes to: fixed reply, RAG with target KBs, flow start, or transfer.
"""
# Skip if flow already handled the request
if ctx.diagnostics.get("flow_handled"):
logger.info("[AC-AISVC-69] Flow already handled, skipping intent matching")
return
if not self._intent_rule_service:
logger.debug("[AC-AISVC-69] No intent rule service configured, skipping")
ctx.diagnostics["intent_match_enabled"] = False
return
try:
# Load enabled rules ordered by priority
async with get_session() as session:
from app.services.intent.rule_service import IntentRuleService
rule_service = IntentRuleService(session)
rules = await rule_service.get_enabled_rules_for_matching(ctx.tenant_id)
if not rules:
ctx.diagnostics["intent_match_enabled"] = True
ctx.diagnostics["intent_matched"] = False
return
# Match intent
ctx.intent_match = self._intent_router.match(
message=ctx.current_message,
rules=rules,
)
if ctx.intent_match:
logger.info(
f"[AC-AISVC-69] Intent matched: rule={ctx.intent_match.rule.name}, "
f"response_type={ctx.intent_match.rule.response_type}"
)
ctx.diagnostics["intent_match"] = ctx.intent_match.to_dict()
# Increment hit count
async with get_session() as session:
rule_service = IntentRuleService(session)
await rule_service.increment_hit_count(
tenant_id=ctx.tenant_id,
rule_id=ctx.intent_match.rule.id,
)
# Route based on response_type
if ctx.intent_match.rule.response_type == "fixed":
# Fixed reply - skip LLM
ctx.llm_response = LLMResponse(
content=ctx.intent_match.rule.fixed_reply or "收到您的消息。",
model="intent_fixed",
usage={},
finish_reason="intent_fixed",
)
ctx.diagnostics["intent_handled"] = True
logger.info("[AC-AISVC-70] Intent fixed reply, skipping LLM")
elif ctx.intent_match.rule.response_type == "rag":
# RAG with target KBs
ctx.target_kb_ids = ctx.intent_match.rule.target_kb_ids or []
logger.info(f"[AC-AISVC-70] Intent RAG, target_kb_ids={ctx.target_kb_ids}")
elif ctx.intent_match.rule.response_type == "flow":
# Start script flow
if ctx.intent_match.rule.flow_id and self._flow_engine:
async with get_session() as session:
flow_engine = FlowEngine(session)
instance, first_step = await flow_engine.start(
tenant_id=ctx.tenant_id,
session_id=ctx.session_id,
flow_id=ctx.intent_match.rule.flow_id,
)
if first_step:
ctx.llm_response = LLMResponse(
content=first_step,
model="script_flow",
usage={},
finish_reason="flow_start",
)
ctx.diagnostics["intent_handled"] = True
logger.info("[AC-AISVC-70] Intent flow started, skipping LLM")
elif ctx.intent_match.rule.response_type == "transfer":
# Transfer to human
ctx.llm_response = LLMResponse(
content=ctx.intent_match.rule.transfer_message or "正在为您转接人工客服...",
model="intent_transfer",
usage={},
finish_reason="intent_transfer",
)
ctx.confidence_result = ConfidenceResult(
confidence=0.0,
should_transfer=True,
transfer_reason="intent_rule_transfer",
is_retrieval_insufficient=False,
)
ctx.diagnostics["intent_handled"] = True
logger.info("[AC-AISVC-70] Intent transfer, skipping LLM")
else:
ctx.diagnostics["intent_match_enabled"] = True
ctx.diagnostics["intent_matched"] = False
except Exception as e:
logger.warning(f"[AC-AISVC-69] Intent matching failed: {e}")
ctx.diagnostics["intent_match_error"] = str(e)
async def _merge_context(
self,
ctx: GenerationContext,
@ -263,8 +561,13 @@ class OrchestratorService:
async def _retrieve_evidence(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-16, AC-AISVC-17] RAG retrieval for evidence.
Step 3 of the generation pipeline.
Step 5-6: Multi-KB retrieval with target KBs from intent matching.
"""
# Skip if flow or intent already handled
if ctx.diagnostics.get("flow_handled") or ctx.diagnostics.get("intent_handled"):
logger.info("[AC-AISVC-16] Request already handled, skipping retrieval")
return
logger.info(
f"[AC-AISVC-16] Starting retrieval: tenant={ctx.tenant_id}, "
f"query={ctx.current_message[:100]}..., retriever={type(self._retriever).__name__ if self._retriever else 'None'}"
@ -278,6 +581,12 @@ class OrchestratorService:
metadata=ctx.request_metadata,
)
# If intent matched with target KBs, pass them to retriever
if ctx.target_kb_ids:
retrieval_ctx.metadata = retrieval_ctx.metadata or {}
retrieval_ctx.metadata["target_kb_ids"] = ctx.target_kb_ids
logger.info(f"[AC-AISVC-16] Using target_kb_ids from intent: {ctx.target_kb_ids}")
ctx.retrieval_result = await self._retriever.retrieve(retrieval_ctx)
ctx.diagnostics["retrieval"] = {
@ -292,7 +601,7 @@ class OrchestratorService:
f"max_score={ctx.retrieval_result.max_score:.3f}, "
f"is_empty={ctx.retrieval_result.is_empty}"
)
if ctx.retrieval_result.hit_count > 0:
for i, hit in enumerate(ctx.retrieval_result.hits[:3]):
logger.info(
@ -308,11 +617,76 @@ class OrchestratorService:
)
ctx.diagnostics["retrieval_error"] = str(e)
async def _build_system_prompt(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-56, AC-AISVC-84] Step 7: Build system prompt with template + behavior rules.
"""
# Skip if flow or intent already handled
if ctx.diagnostics.get("flow_handled") or ctx.diagnostics.get("intent_handled"):
logger.info("[AC-AISVC-56] Request already handled, using default prompt")
ctx.system_prompt = self._config.system_prompt
return
try:
# Try to load template from service
if self._prompt_template_service:
async with get_session() as session:
template_service = PromptTemplateService(session)
template_version = await template_service.get_published_template(
tenant_id=ctx.tenant_id,
scene="default", # TODO: Make scene configurable
)
if template_version:
# Resolve variables
variables = {
"persona_name": "AI助手",
"current_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
"channel_type": ctx.channel_type,
}
ctx.system_prompt = self._variable_resolver.resolve(
template=template_version.system_instruction,
variables=variables,
)
logger.info(f"[AC-AISVC-56] Loaded template: scene=default, version={template_version.version}")
else:
ctx.system_prompt = self._config.system_prompt
logger.info("[AC-AISVC-56] No published template found, using default")
else:
ctx.system_prompt = self._config.system_prompt
# Load and inject behavior rules
if self._behavior_rule_service:
async with get_session() as session:
behavior_service = BehaviorRuleService(session)
rules = await behavior_service.get_enabled_rules(ctx.tenant_id)
if rules:
ctx.behavior_rules = [rule.rule_text for rule in rules]
behavior_text = "\n".join([f"- {rule}" for rule in ctx.behavior_rules])
ctx.system_prompt += f"\n\n行为约束:\n{behavior_text}"
logger.info(f"[AC-AISVC-84] Injected {len(rules)} behavior rules")
ctx.diagnostics["prompt_template"] = {
"source": "template" if self._prompt_template_service else "default",
"behavior_rules_count": len(ctx.behavior_rules),
}
except Exception as e:
logger.warning(f"[AC-AISVC-56] Failed to build system prompt: {e}")
ctx.system_prompt = self._config.system_prompt
ctx.diagnostics["prompt_build_error"] = str(e)
async def _generate_response(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-02] Generate response using LLM.
Step 4-5 of the generation pipeline.
Step 8 of the 12-step pipeline.
"""
# Skip if flow or intent already handled
if ctx.diagnostics.get("flow_handled") or ctx.diagnostics.get("intent_handled"):
logger.info("[AC-AISVC-02] Request already handled, skipping LLM generation")
return
messages = self._build_llm_messages(ctx)
logger.info(
f"[AC-AISVC-02] Building LLM messages: count={len(messages)}, "
@ -368,13 +742,62 @@ class OrchestratorService:
ctx.diagnostics["llm_mode"] = "fallback"
ctx.diagnostics["fallback_reason"] = f"llm_error: {str(e)}"
async def _filter_output(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-82] Step 9: Filter forbidden words in output.
"""
if not ctx.llm_response:
logger.debug("[AC-AISVC-82] No LLM response to filter")
return
if not self._output_filter:
logger.debug("[AC-AISVC-82] No output filter configured, skipping")
ctx.filtered_reply = ctx.llm_response.content
ctx.diagnostics["output_filter_enabled"] = False
return
try:
filter_result = await self._output_filter.filter(
reply=ctx.llm_response.content,
tenant_id=ctx.tenant_id,
)
ctx.filtered_reply = filter_result.filtered_text
ctx.diagnostics["output_filter"] = {
"triggered": filter_result.triggered,
"matched_words": filter_result.matched_words,
"strategy_applied": filter_result.strategy_applied,
}
if filter_result.triggered:
logger.info(
f"[AC-AISVC-82] Output filtered: words={filter_result.matched_words}, "
f"strategy={filter_result.strategy_applied}"
)
# If blocked, override confidence
if filter_result.strategy_applied == "block":
ctx.confidence_result = ConfidenceResult(
confidence=0.0,
should_transfer=True,
transfer_reason="output_blocked_by_guardrail",
is_retrieval_insufficient=False,
)
except Exception as e:
logger.warning(f"[AC-AISVC-82] Output filtering failed: {e}")
ctx.filtered_reply = ctx.llm_response.content
ctx.diagnostics["output_filter_error"] = str(e)
def _build_llm_messages(self, ctx: GenerationContext) -> list[dict[str, str]]:
"""
[AC-AISVC-02] Build messages for LLM including system prompt and evidence.
Uses ctx.system_prompt from Step 7 (template + behavior rules).
"""
messages = []
system_content = self._config.system_prompt
# Use system prompt from Step 7 (template + behavior rules)
system_content = ctx.system_prompt or self._config.system_prompt
if ctx.retrieval_result and not ctx.retrieval_result.is_empty:
evidence_text = self._format_evidence(ctx.retrieval_result)
@ -386,20 +809,12 @@ class OrchestratorService:
messages.extend(ctx.merged_context.messages)
messages.append({"role": "user", "content": ctx.current_message})
logger.info(
f"[AC-AISVC-02] Built {len(messages)} messages for LLM: "
f"system_len={len(system_content)}, history_count={len(ctx.merged_context.messages) if ctx.merged_context else 0}"
)
logger.debug(f"[AC-AISVC-02] System prompt preview: {system_content[:500]}...")
logger.info(f"[AC-AISVC-02] ========== ORCHESTRATOR FULL PROMPT ==========")
for i, msg in enumerate(messages):
role = msg.get("role", "unknown")
content = msg.get("content", "")
logger.info(f"[AC-AISVC-02] [{i}] role={role}, content_length={len(content)}")
logger.info(f"[AC-AISVC-02] [{i}] content:\n{content}")
logger.info(f"[AC-AISVC-02] ==============================================")
return messages
@ -459,7 +874,8 @@ class OrchestratorService:
async def _save_messages(self, ctx: GenerationContext) -> None:
"""
[AC-AISVC-13] Save user and assistant messages to Memory.
Step 7 of the generation pipeline.
Step 11 of the 12-step pipeline.
Uses filtered_reply from Step 9.
"""
if not self._memory_service:
logger.info("[AC-AISVC-13] No memory service configured, skipping save")
@ -477,7 +893,13 @@ class OrchestratorService:
{"role": "user", "content": ctx.current_message},
]
if ctx.llm_response:
# Use filtered_reply if available, otherwise use llm_response.content
if ctx.filtered_reply:
messages_to_save.append({
"role": "assistant",
"content": ctx.filtered_reply,
})
elif ctx.llm_response:
messages_to_save.append({
"role": "assistant",
"content": ctx.llm_response.content,
@ -503,9 +925,16 @@ class OrchestratorService:
def _build_response(self, ctx: GenerationContext) -> ChatResponse:
"""
[AC-AISVC-02] Build final ChatResponse from generation context.
Step 8 of the generation pipeline.
Step 12 of the 12-step pipeline.
Uses filtered_reply from Step 9.
"""
reply = ctx.llm_response.content if ctx.llm_response else self._fallback_response(ctx)
# Use filtered_reply if available, otherwise use llm_response.content
if ctx.filtered_reply:
reply = ctx.filtered_reply
elif ctx.llm_response:
reply = ctx.llm_response.content
else:
reply = self._fallback_response(ctx)
confidence = ctx.confidence_result.confidence if ctx.confidence_result else 0.5
should_transfer = ctx.confidence_result.should_transfer if ctx.confidence_result else True
@ -534,6 +963,12 @@ class OrchestratorService:
Generate a streaming response.
[AC-AISVC-06, AC-AISVC-07, AC-AISVC-08] Yields SSE events in proper sequence.
12-Step Pipeline (same as generate, but with streaming LLM output):
1-7: Same as generate() up to PromptBuilder
8: LLM.stream_generate (streaming)
9: OutputFilter with streaming support
10-12: Confidence, Memory, Return
SSE Event Sequence (per design.md Section 6.2):
1. message events (multiple) - each with incremental delta
2. final event (exactly 1) - with complete response
@ -545,7 +980,7 @@ class OrchestratorService:
3. connection close
"""
logger.info(
f"[AC-AISVC-06] Starting streaming generation for tenant={tenant_id}, "
f"[AC-AISVC-06] Starting 12-step streaming generation for tenant={tenant_id}, "
f"session={request.session_id}"
)
@ -561,15 +996,33 @@ class OrchestratorService:
)
try:
# Steps 1-7: Same as generate()
await self._scan_input(ctx)
await self._load_local_history(ctx)
await self._merge_context(ctx, request.history)
await self._check_active_flow(ctx)
await self._match_intent(ctx)
if self._config.enable_rag and self._retriever:
await self._retrieve_evidence(ctx)
await self._build_system_prompt(ctx)
# Step 8: LLM streaming generation
full_reply = ""
if self._llm_client:
# If flow or intent already handled, stream the pre-determined response
if ctx.diagnostics.get("flow_handled") or ctx.diagnostics.get("intent_handled"):
if ctx.llm_response:
# Stream the pre-determined response character by character
import asyncio
for char in ctx.llm_response.content:
if not state_machine.can_send_message():
break
yield create_message_event(delta=char)
full_reply += char
await asyncio.sleep(0.01)
elif self._llm_client:
async for event in self._stream_from_llm(ctx, state_machine):
if event.event == "message":
full_reply += self._extract_delta_from_event(event)
@ -588,13 +1041,20 @@ class OrchestratorService:
finish_reason="stop",
)
# Step 9: OutputFilter (on complete reply)
await self._filter_output(ctx)
# Step 10: Confidence
self._calculate_confidence(ctx)
# Step 11: Memory
await self._save_messages(ctx)
# Step 12: Return final event
if await state_machine.transition_to_final():
final_reply = ctx.filtered_reply or full_reply
yield create_final_event(
reply=full_reply,
reply=final_reply,
confidence=ctx.confidence_result.confidence if ctx.confidence_result else 0.5,
should_transfer=ctx.confidence_result.should_transfer if ctx.confidence_result else False,
transfer_reason=ctx.confidence_result.transfer_reason if ctx.confidence_result else None,
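The event sequence the state machine enforces (N `message` events, exactly one `final`, then close) can be sketched as a plain async generator; `sse` here is a simplified stand-in for the real `create_message_event`/`create_final_event` helpers:

```python
import asyncio
import json
from collections.abc import AsyncIterator

def sse(event: str, data: dict) -> str:
    # Serialize one SSE frame: event name plus JSON payload
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"

async def stream_reply(deltas: list[str]) -> AsyncIterator[str]:
    """Yield one message event per delta, then exactly one final event."""
    full = ""
    for delta in deltas:
        full += delta
        yield sse("message", {"delta": delta})
    yield sse("final", {"reply": full, "confidence": 0.5})

async def collect(deltas: list[str]) -> list[str]:
    return [e async for e in stream_reply(deltas)]
```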

View File

@ -6,22 +6,23 @@ Prompt template service for AI Service.
import logging
import time
import uuid
from collections.abc import Sequence
from datetime import datetime
from typing import Any, Sequence
from typing import Any
from sqlalchemy import select, func
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlmodel import col
from app.core.prompts import SYSTEM_PROMPT
from app.models.entities import (
PromptTemplate,
PromptTemplateVersion,
PromptTemplateCreate,
PromptTemplateUpdate,
PromptTemplateVersion,
TemplateVersionStatus,
)
from app.services.prompt.variable_resolver import VariableResolver
from app.core.prompts import SYSTEM_PROMPT
logger = logging.getLogger(__name__)
@ -74,7 +75,7 @@ _template_cache = TemplateCache()
class PromptTemplateService:
"""
[AC-AISVC-52~AC-AISVC-58] Service for managing prompt templates.
Features:
- Template CRUD with tenant isolation
- Version management (auto-create new version on update)
@ -132,10 +133,10 @@ class PromptTemplateService:
stmt = select(PromptTemplate).where(
PromptTemplate.tenant_id == tenant_id
)
if scene:
stmt = stmt.where(PromptTemplate.scene == scene)
stmt = stmt.order_by(col(PromptTemplate.created_at).desc())
result = await self._session.execute(stmt)
return result.scalars().all()
@ -168,13 +169,13 @@ class PromptTemplateService:
return None
versions = await self._get_versions(template_id)
current_version = None
for v in versions:
if v.status == TemplateVersionStatus.PUBLISHED.value:
current_version = v
break
return {
"id": str(template.id),
"name": template.name,
@ -225,7 +226,7 @@ class PromptTemplateService:
if update_data.system_instruction is not None:
latest_version = await self._get_latest_version(template_id)
new_version_num = (latest_version.version + 1) if latest_version else 1
new_version = PromptTemplateVersion(
template_id=template_id,
version=new_version_num,
@ -236,9 +237,9 @@ class PromptTemplateService:
self._session.add(new_version)
await self._session.flush()
self._cache.invalidate(tenant_id, template.scene)
logger.info(
f"[AC-AISVC-53] Updated prompt template: tenant={tenant_id}, id={template_id}"
)
@ -259,26 +260,26 @@ class PromptTemplateService:
return False
versions = await self._get_versions(template_id)
for v in versions:
if v.status == TemplateVersionStatus.PUBLISHED.value:
v.status = TemplateVersionStatus.ARCHIVED.value
target_version = None
for v in versions:
if v.version == version:
target_version = v
break
if not target_version:
return False
target_version.status = TemplateVersionStatus.PUBLISHED.value
await self._session.flush()
self._cache.invalidate(tenant_id, template.scene)
self._cache.set(tenant_id, template.scene, target_version)
logger.info(
f"[AC-AISVC-54] Published template version: tenant={tenant_id}, "
f"template_id={template_id}, version={version}"
@ -304,7 +305,7 @@ class PromptTemplateService:
) -> str:
"""
[AC-AISVC-51, AC-AISVC-56] Get the published template for a scene.
Resolution order:
1. Check in-memory cache
2. Query database for published version
@ -402,12 +403,12 @@ class PromptTemplateService:
versions = await self._get_versions(template_id)
for v in versions:
await self._session.delete(v)
await self._session.delete(template)
await self._session.flush()
self._cache.invalidate(tenant_id, template.scene)
logger.info(
f"Deleted prompt template: tenant={tenant_id}, id={template_id}"
)

View File

@ -24,7 +24,7 @@ BUILTIN_VARIABLES = {
class VariableResolver:
"""
[AC-AISVC-56] Variable replacement engine for prompt templates.
Supports:
- Built-in variables: persona_name, current_time, channel_type, tenant_name, session_id
- Custom variables: defined in template with defaults
@ -50,17 +50,17 @@ class VariableResolver:
) -> str:
"""
Resolve all {{variable}} placeholders in the template.
Args:
template: Template string with {{variable}} placeholders
variables: Custom variable definitions from template
extra_context: Additional context for resolution
Returns:
Template with all variables replaced
"""
context = self._build_context(variables, extra_context)
def replace_var(match: re.Match) -> str:
var_name = match.group(1)
if var_name in context:
@ -70,7 +70,7 @@ class VariableResolver:
return str(value)
logger.warning(f"Unknown variable in template: {var_name}")
return match.group(0)
resolved = VARIABLE_PATTERN.sub(replace_var, template)
return resolved
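The `{{variable}}` substitution above reduces to a regex sub over a merged context; a minimal standalone sketch (the real resolver also merges built-in variables and logs unknown names):

```python
import re

VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def resolve(template: str, context: dict[str, str]) -> str:
    """Replace {{name}} placeholders; unknown names are left intact."""
    def repl(m: re.Match) -> str:
        return str(context.get(m.group(1), m.group(0)))
    return VARIABLE_PATTERN.sub(repl, template)
```

Leaving unknown placeholders intact (rather than emitting an empty string) matches the `return match.group(0)` fallback above.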
@ -81,32 +81,32 @@ class VariableResolver:
) -> dict[str, Any]:
"""Build the complete context for variable resolution."""
context = {}
for key, value in BUILTIN_VARIABLES.items():
if key in self._context:
context[key] = self._context[key]
else:
context[key] = value
if variables:
for var in variables:
name = var.get("name")
default = var.get("default", "")
if name:
context[name] = default
if extra_context:
context.update(extra_context)
return context
def extract_variables(self, template: str) -> list[str]:
"""
Extract all variable names from a template.
Args:
template: Template string
Returns:
List of variable names found in the template
"""
@ -119,24 +119,24 @@ class VariableResolver:
) -> dict[str, Any]:
"""
Validate that all variables in template are defined.
Args:
template: Template string
defined_variables: Variables defined in template metadata
Returns:
Dict with 'valid' boolean and 'missing' list
"""
used_vars = set(self.extract_variables(template))
builtin_vars = set(BUILTIN_VARIABLES.keys())
defined_names = set()
if defined_variables:
defined_names = {v.get("name") for v in defined_variables if v.get("name")}
available_vars = builtin_vars | defined_names
missing = used_vars - available_vars
return {
"valid": len(missing) == 0,
"missing": list(missing),

View File

@ -10,28 +10,28 @@ from app.services.retrieval.base import (
RetrievalHit,
RetrievalResult,
)
from app.services.retrieval.vector_retriever import VectorRetriever, get_vector_retriever
from app.services.retrieval.indexer import (
IndexingProgress,
IndexingResult,
KnowledgeIndexer,
get_knowledge_indexer,
)
from app.services.retrieval.metadata import (
ChunkMetadata,
ChunkMetadataModel,
MetadataFilter,
KnowledgeChunk,
MetadataFilter,
RetrievalStrategy,
RetrieveRequest,
RetrieveResult,
RetrievalStrategy,
)
from app.services.retrieval.optimized_retriever import (
OptimizedRetriever,
get_optimized_retriever,
TwoStageResult,
RRFCombiner,
TwoStageResult,
get_optimized_retriever,
)
from app.services.retrieval.indexer import (
KnowledgeIndexer,
get_knowledge_indexer,
IndexingProgress,
IndexingResult,
)
from app.services.retrieval.vector_retriever import VectorRetriever, get_vector_retriever
__all__ = [
"BaseRetriever",
@ -41,6 +41,7 @@ __all__ = [
"VectorRetriever",
"get_vector_retriever",
"ChunkMetadata",
"ChunkMetadataModel",
"MetadataFilter",
"KnowledgeChunk",
"RetrieveRequest",

View File

@ -3,7 +3,6 @@ Knowledge base indexing service with optimized embedding.
Reference: rag-optimization/spec.md Section 5.1
"""
import asyncio
import logging
import uuid
from dataclasses import dataclass, field
@ -12,7 +11,7 @@ from typing import Any
from app.core.config import get_settings
from app.core.qdrant_client import QdrantClient, get_qdrant_client
from app.services.embedding.nomic_provider import NomicEmbeddingProvider, NomicEmbeddingResult
from app.services.embedding.nomic_provider import NomicEmbeddingProvider
from app.services.retrieval.metadata import ChunkMetadata, KnowledgeChunk
logger = logging.getLogger(__name__)
@ -27,13 +26,13 @@ class IndexingProgress:
failed_chunks: int = 0
current_document: str = ""
started_at: datetime = field(default_factory=datetime.utcnow)
@property
def progress_percent(self) -> int:
if self.total_chunks == 0:
return 0
return int((self.processed_chunks / self.total_chunks) * 100)
@property
def elapsed_seconds(self) -> float:
return (datetime.utcnow() - self.started_at).total_seconds()
@ -53,14 +52,14 @@ class IndexingResult:
class KnowledgeIndexer:
"""
Knowledge base indexer with optimized embedding.
Features:
- Task prefixes (search_document:) for document embedding
- Multi-dimensional vectors (256/512/768)
- Metadata support
- Batch processing
"""
def __init__(
self,
qdrant_client: QdrantClient | None = None,
@ -75,12 +74,12 @@ class KnowledgeIndexer:
self._chunk_overlap = chunk_overlap
self._batch_size = batch_size
self._progress: IndexingProgress | None = None
async def _get_client(self) -> QdrantClient:
if self._qdrant_client is None:
self._qdrant_client = await get_qdrant_client()
return self._qdrant_client
async def _get_embedding_provider(self) -> NomicEmbeddingProvider:
if self._embedding_provider is None:
self._embedding_provider = NomicEmbeddingProvider(
@ -89,30 +88,30 @@ class KnowledgeIndexer:
dimension=settings.qdrant_vector_size,
)
return self._embedding_provider
def chunk_text(self, text: str, metadata: ChunkMetadata | None = None) -> list[KnowledgeChunk]:
"""
Split text into chunks for indexing.
Each line becomes a separate chunk for better retrieval granularity.
Args:
text: Full text to chunk
metadata: Metadata to attach to each chunk
Returns:
List of KnowledgeChunk objects
"""
chunks = []
doc_id = str(uuid.uuid4())
lines = text.split('\n')
for i, line in enumerate(lines):
line = line.strip()
if len(line) < 10:
continue
chunk = KnowledgeChunk(
chunk_id=f"{doc_id}_{i}",
document_id=doc_id,
@ -120,9 +119,9 @@ class KnowledgeIndexer:
metadata=metadata or ChunkMetadata(),
)
chunks.append(chunk)
return chunks
def chunk_text_by_lines(
self,
text: str,
@ -132,25 +131,25 @@ class KnowledgeIndexer:
) -> list[KnowledgeChunk]:
"""
Split text by lines, each line is a separate chunk.
Args:
text: Full text to chunk
metadata: Metadata to attach to each chunk
min_line_length: Minimum line length to be indexed
merge_short_lines: Whether to merge consecutive short lines
Returns:
List of KnowledgeChunk objects
"""
chunks = []
doc_id = str(uuid.uuid4())
lines = text.split('\n')
if merge_short_lines:
merged_lines = []
current_line = ""
for line in lines:
line = line.strip()
if not line:
@ -158,27 +157,27 @@ class KnowledgeIndexer:
merged_lines.append(current_line)
current_line = ""
continue
if current_line:
current_line += " " + line
else:
current_line = line
if len(current_line) >= min_line_length * 2:
merged_lines.append(current_line)
current_line = ""
if current_line:
merged_lines.append(current_line)
lines = merged_lines
for i, line in enumerate(lines):
line = line.strip()
if len(line) < min_line_length:
continue
chunk = KnowledgeChunk(
chunk_id=f"{doc_id}_{i}",
document_id=doc_id,
@ -186,9 +185,9 @@ class KnowledgeIndexer:
metadata=metadata or ChunkMetadata(),
)
chunks.append(chunk)
return chunks
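The merge-short-lines behaviour in `chunk_text_by_lines` can be shown with a reduced sketch that only handles the merging itself (chunk IDs and metadata omitted):

```python
def merge_short_lines(text: str, min_len: int = 10) -> list[str]:
    """Merge consecutive short lines until roughly 2*min_len; drop blanks and leftovers below min_len."""
    merged: list[str] = []
    current = ""
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            if current:
                merged.append(current)
                current = ""
            continue
        current = f"{current} {line}" if current else line
        if len(current) >= min_len * 2:
            merged.append(current)
            current = ""
    if current:
        merged.append(current)
    return [l for l in merged if len(l) >= min_len]
```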
async def index_document(
self,
tenant_id: str,
@ -198,40 +197,40 @@ class KnowledgeIndexer:
) -> IndexingResult:
"""
Index a single document with optimized embedding.
Args:
tenant_id: Tenant identifier
document_id: Document identifier
text: Document text content
metadata: Optional metadata for the document
Returns:
IndexingResult with status and statistics
"""
start_time = datetime.utcnow()
try:
client = await self._get_client()
provider = await self._get_embedding_provider()
await client.ensure_collection_exists(tenant_id, use_multi_vector=True)
chunks = self.chunk_text(text, metadata)
self._progress = IndexingProgress(
total_chunks=len(chunks),
current_document=document_id,
)
points = []
for i, chunk in enumerate(chunks):
try:
embedding_result = await provider.embed_document(chunk.content)
chunk.embedding_full = embedding_result.embedding_full
chunk.embedding_256 = embedding_result.embedding_256
chunk.embedding_512 = embedding_result.embedding_512
point = {
"id": str(uuid.uuid4()), # Generate a valid UUID for Qdrant
"vector": {
@ -248,27 +247,27 @@ class KnowledgeIndexer:
}
}
points.append(point)
self._progress.processed_chunks += 1
logger.debug(
f"[RAG-OPT] Indexed chunk {i+1}/{len(chunks)} for doc={document_id}"
)
except Exception as e:
logger.warning(f"[RAG-OPT] Failed to index chunk {i}: {e}")
self._progress.failed_chunks += 1
if points:
await client.upsert_multi_vector(tenant_id, points)
elapsed = (datetime.utcnow() - start_time).total_seconds()
logger.info(
f"[RAG-OPT] Indexed document {document_id}: "
f"{len(points)} chunks in {elapsed:.2f}s"
)
return IndexingResult(
success=True,
total_chunks=len(chunks),
@ -276,11 +275,11 @@ class KnowledgeIndexer:
failed_chunks=self._progress.failed_chunks,
elapsed_seconds=elapsed,
)
except Exception as e:
elapsed = (datetime.utcnow() - start_time).total_seconds()
logger.error(f"[RAG-OPT] Failed to index document {document_id}: {e}")
return IndexingResult(
success=False,
total_chunks=0,
@ -289,7 +288,7 @@ class KnowledgeIndexer:
elapsed_seconds=elapsed,
error_message=str(e),
)
async def index_documents_batch(
self,
tenant_id: str,
@ -297,7 +296,7 @@ class KnowledgeIndexer:
) -> list[IndexingResult]:
"""
Index multiple documents in batch.
Args:
tenant_id: Tenant identifier
documents: List of documents with format:
@ -306,12 +305,12 @@ class KnowledgeIndexer:
"text": str,
"metadata": ChunkMetadata (optional)
}
Returns:
List of IndexingResult for each document
"""
results = []
for doc in documents:
result = await self.index_document(
tenant_id=tenant_id,
@ -320,9 +319,9 @@ class KnowledgeIndexer:
metadata=doc.get("metadata"),
)
results.append(result)
return results
def get_progress(self) -> IndexingProgress | None:
"""Get current indexing progress."""
return self._progress

View File

@ -50,7 +50,7 @@ class ChunkMetadata:
valid_until: date | None = None
priority: int = 5
keywords: list[str] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for storage."""
return {
@ -65,7 +65,7 @@ class ChunkMetadata:
"priority": self.priority,
"keywords": self.keywords,
}
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ChunkMetadata":
"""Create from dictionary."""
@ -95,29 +95,29 @@ class MetadataFilter:
valid_only: bool = True
min_priority: int | None = None
keywords: list[str] | None = None
def to_qdrant_filter(self) -> dict[str, Any] | None:
"""Convert to Qdrant filter format."""
conditions = []
if self.categories:
conditions.append({
"key": "metadata.category",
"match": {"any": self.categories}
})
if self.departments:
conditions.append({
"key": "metadata.department",
"match": {"any": self.departments}
})
if self.target_audiences:
conditions.append({
"key": "metadata.target_audience",
"match": {"any": self.target_audiences}
})
if self.valid_only:
today = date.today().isoformat()
conditions.append({
@ -126,19 +126,19 @@ class MetadataFilter:
{"key": "metadata.valid_until", "range": {"gte": today}}
]
})
if self.min_priority is not None:
conditions.append({
"key": "metadata.priority",
"range": {"lte": self.min_priority}
})
if not conditions:
return None
if len(conditions) == 1:
return {"must": conditions}
return {"must": conditions}
@ -157,7 +157,7 @@ class KnowledgeChunk:
metadata: ChunkMetadata = field(default_factory=ChunkMetadata)
created_at: datetime = field(default_factory=datetime.utcnow)
updated_at: datetime = field(default_factory=datetime.utcnow)
def to_qdrant_point(self, point_id: int | str) -> dict[str, Any]:
"""Convert to Qdrant point format."""
return {
@ -189,7 +189,7 @@ class RetrieveRequest:
top_k: int = 10
filters: MetadataFilter | None = None
strategy: RetrievalStrategy = RetrievalStrategy.HYBRID
def __post_init__(self):
if not self.query_with_prefix:
self.query_with_prefix = f"search_query:{self.query}"

View File

@ -6,7 +6,7 @@ Reference: rag-optimization/spec.md Section 2.2, 2.4, 2.5
import asyncio
import logging
import re
from dataclasses import dataclass, field
from dataclasses import dataclass
from typing import Any
from app.core.config import get_settings
@ -19,10 +19,7 @@ from app.services.retrieval.base import (
RetrievalResult,
)
from app.services.retrieval.metadata import (
ChunkMetadata,
MetadataFilter,
RetrieveResult,
RetrievalStrategy,
)
logger = logging.getLogger(__name__)
@ -42,14 +39,14 @@ class RRFCombiner:
"""
Reciprocal Rank Fusion for combining multiple retrieval results.
Reference: rag-optimization/spec.md Section 2.5
Formula: score = Σ(1 / (k + rank_i))
Default k = 60
"""
def __init__(self, k: int = 60):
self._k = k
def combine(
self,
vector_results: list[dict[str, Any]],
@ -59,22 +56,22 @@ class RRFCombiner:
) -> list[dict[str, Any]]:
"""
Combine vector and BM25 results using RRF.
Args:
vector_results: Results from vector search
bm25_results: Results from BM25 search
vector_weight: Weight for vector results
bm25_weight: Weight for BM25 results
Returns:
Combined and sorted results
"""
combined_scores: dict[str, dict[str, Any]] = {}
for rank, result in enumerate(vector_results):
chunk_id = result.get("chunk_id") or result.get("id", str(rank))
rrf_score = vector_weight / (self._k + rank + 1)
if chunk_id not in combined_scores:
combined_scores[chunk_id] = {
"score": 0.0,
@ -91,13 +88,13 @@ class RRFCombiner:
combined_scores[chunk_id]["vector_rank"] = rank
if result.get("vector"):
combined_scores[chunk_id]["vector"] = result.get("vector")
combined_scores[chunk_id]["score"] += rrf_score
for rank, result in enumerate(bm25_results):
chunk_id = result.get("chunk_id") or result.get("id", str(rank))
rrf_score = bm25_weight / (self._k + rank + 1)
if chunk_id not in combined_scores:
combined_scores[chunk_id] = {
"score": 0.0,
@ -112,15 +109,15 @@ class RRFCombiner:
else:
combined_scores[chunk_id]["bm25_score"] = result.get("score", 0.0)
combined_scores[chunk_id]["bm25_rank"] = rank
combined_scores[chunk_id]["score"] += rrf_score
sorted_results = sorted(
combined_scores.values(),
key=lambda x: x["score"],
reverse=True
)
return sorted_results
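The RRF arithmetic above (0-based ranks, so each term is `weight / (k + rank + 1)`) can be checked by hand with a score-only sketch:

```python
def rrf_scores(vector_ids: list[str], bm25_ids: list[str],
               k: int = 60, vw: float = 1.0, bw: float = 1.0) -> dict[str, float]:
    """score(id) = vw/(k + vector_rank + 1) + bw/(k + bm25_rank + 1)."""
    scores: dict[str, float] = {}
    for rank, cid in enumerate(vector_ids):
        scores[cid] = scores.get(cid, 0.0) + vw / (k + rank + 1)
    for rank, cid in enumerate(bm25_ids):
        scores[cid] = scores.get(cid, 0.0) + bw / (k + rank + 1)
    return scores
```

With the default `k = 60`, a document ranked first in one list and second in the other scores `1/61 + 1/62`.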
@ -131,10 +128,10 @@ class OptimizedRetriever(BaseRetriever):
- Two-stage retrieval (256 dim -> 768 dim)
- RRF hybrid ranking (vector + BM25)
- Metadata filtering
Reference: rag-optimization/spec.md Section 2, 3, 4
"""
def __init__(
self,
qdrant_client: QdrantClient | None = None,
@ -155,12 +152,12 @@ class OptimizedRetriever(BaseRetriever):
self._hybrid_enabled = hybrid_enabled if hybrid_enabled is not None else settings.rag_hybrid_enabled
self._rrf_k = rrf_k or settings.rag_rrf_k
self._rrf_combiner = RRFCombiner(k=self._rrf_k)
async def _get_client(self) -> QdrantClient:
if self._qdrant_client is None:
self._qdrant_client = await get_qdrant_client()
return self._qdrant_client
async def _get_embedding_provider(self) -> NomicEmbeddingProvider:
from app.services.embedding.factory import get_embedding_config_manager
manager = get_embedding_config_manager()
@ -173,11 +170,11 @@ class OptimizedRetriever(BaseRetriever):
model=settings.ollama_embedding_model,
dimension=settings.qdrant_vector_size,
)
async def retrieve(self, ctx: RetrievalContext) -> RetrievalResult:
"""
Retrieve documents using optimized strategy.
Strategy selection:
1. If two_stage_enabled: use two-stage retrieval
2. If hybrid_enabled: use RRF hybrid ranking
@ -191,17 +188,17 @@ class OptimizedRetriever(BaseRetriever):
f"[RAG-OPT] Retrieval config: top_k={self._top_k}, "
f"score_threshold={self._score_threshold}, min_hits={self._min_hits}"
)
try:
provider = await self._get_embedding_provider()
logger.info(f"[RAG-OPT] Using embedding provider: {type(provider).__name__}")
embedding_result = await provider.embed_query(ctx.query)
logger.info(
f"[RAG-OPT] Embedding generated: full_dim={len(embedding_result.embedding_full)}, "
f"dim_256={'available' if embedding_result.embedding_256 else 'not available'}"
)
if self._two_stage_enabled and self._hybrid_enabled:
logger.info("[RAG-OPT] Using two-stage + hybrid retrieval strategy")
results = await self._two_stage_hybrid_retrieve(
@ -232,9 +229,9 @@ class OptimizedRetriever(BaseRetriever):
embedding_result.embedding_full,
self._top_k,
)
logger.info(f"[RAG-OPT] Raw results count: {len(results)}")
retrieval_hits = [
RetrievalHit(
text=result.get("payload", {}).get("text", ""),
@ -245,15 +242,15 @@ class OptimizedRetriever(BaseRetriever):
for result in results
if result.get("score", 0.0) >= self._score_threshold
]
filtered_count = len(results) - len(retrieval_hits)
if filtered_count > 0:
logger.info(
f"[RAG-OPT] Filtered out {filtered_count} results below threshold {self._score_threshold}"
)
is_insufficient = len(retrieval_hits) < self._min_hits
diagnostics = {
"query_length": len(ctx.query),
"top_k": self._top_k,
@ -266,30 +263,30 @@ class OptimizedRetriever(BaseRetriever):
"raw_results_count": len(results),
"filtered_below_threshold": filtered_count,
}
logger.info(
f"[RAG-OPT] Retrieval complete: {len(retrieval_hits)} hits, "
f"insufficient={is_insufficient}, max_score={diagnostics['max_score']:.3f}"
)
if len(retrieval_hits) == 0:
logger.warning(
f"[RAG-OPT] No hits found! tenant={ctx.tenant_id}, query={ctx.query[:50]}..., "
f"raw_results={len(results)}, threshold={self._score_threshold}"
)
return RetrievalResult(
hits=retrieval_hits,
diagnostics=diagnostics,
)
except Exception as e:
logger.error(f"[RAG-OPT] Retrieval error: {e}", exc_info=True)
return RetrievalResult(
hits=[],
diagnostics={"error": str(e), "is_insufficient": True},
)
async def _two_stage_retrieve(
self,
tenant_id: str,
@ -298,16 +295,16 @@ class OptimizedRetriever(BaseRetriever):
) -> list[dict[str, Any]]:
"""
Two-stage retrieval using Matryoshka dimensions.
Stage 1: Fast retrieval with 256-dim vectors
Stage 2: Precise reranking with 768-dim vectors
Reference: rag-optimization/spec.md Section 2.4
"""
import time
client = await self._get_client()
stage1_start = time.perf_counter()
candidates = await self._search_with_dimension(
client, tenant_id, embedding_result.embedding_256, "dim_256",
@ -315,22 +312,22 @@ class OptimizedRetriever(BaseRetriever):
with_vectors=True,
)
stage1_latency = (time.perf_counter() - stage1_start) * 1000
logger.info(
f"[RAG-OPT] Stage 1: {len(candidates)} candidates in {stage1_latency:.2f}ms"
)
stage2_start = time.perf_counter()
reranked = []
for candidate in candidates:
vector_data = candidate.get("vector", {})
stored_full_embedding = None
if isinstance(vector_data, dict):
stored_full_embedding = vector_data.get("full", [])
elif isinstance(vector_data, list):
stored_full_embedding = vector_data
if stored_full_embedding and len(stored_full_embedding) > 0:
similarity = self._cosine_similarity(
embedding_result.embedding_full,
@ -339,17 +336,17 @@ class OptimizedRetriever(BaseRetriever):
candidate["score"] = similarity
candidate["stage"] = "reranked"
reranked.append(candidate)
reranked.sort(key=lambda x: x.get("score", 0), reverse=True)
results = reranked[:top_k]
stage2_latency = (time.perf_counter() - stage2_start) * 1000
logger.info(
f"[RAG-OPT] Stage 2: {len(results)} final results in {stage2_latency:.2f}ms"
)
return results
async def _hybrid_retrieve(
self,
tenant_id: str,
@ -359,39 +356,39 @@ class OptimizedRetriever(BaseRetriever):
) -> list[dict[str, Any]]:
"""
Hybrid retrieval using RRF to combine vector and BM25 results.
Reference: rag-optimization/spec.md Section 2.5
"""
client = await self._get_client()
vector_task = self._search_with_dimension(
client, tenant_id, embedding_result.embedding_full, "full",
top_k * 2
)
bm25_task = self._bm25_search(client, tenant_id, query, top_k * 2)
vector_results, bm25_results = await asyncio.gather(
vector_task, bm25_task, return_exceptions=True
)
if isinstance(vector_results, Exception):
logger.warning(f"[RAG-OPT] Vector search failed: {vector_results}")
vector_results = []
if isinstance(bm25_results, Exception):
logger.warning(f"[RAG-OPT] BM25 search failed: {bm25_results}")
bm25_results = []
combined = self._rrf_combiner.combine(
vector_results,
bm25_results,
vector_weight=settings.rag_vector_weight,
bm25_weight=settings.rag_bm25_weight,
)
return combined[:top_k]
async def _two_stage_hybrid_retrieve(
self,
tenant_id: str,
@ -401,64 +398,64 @@ class OptimizedRetriever(BaseRetriever):
) -> list[dict[str, Any]]:
"""
Two-stage + Hybrid retrieval strategy.
Stage 1: Fast retrieval with 256-dim vectors + BM25 in parallel
Stage 2: RRF fusion + Precise reranking with 768-dim vectors
This combines the best of both worlds:
- Two-stage: Speed from 256-dim, precision from 768-dim reranking
- Hybrid: Semantic matching from vectors, keyword matching from BM25
"""
import time
client = await self._get_client()
stage1_start = time.perf_counter()
vector_task = self._search_with_dimension(
client, tenant_id, embedding_result.embedding_256, "dim_256",
top_k * self._two_stage_expand_factor,
with_vectors=True,
)
bm25_task = self._bm25_search(client, tenant_id, query, top_k * self._two_stage_expand_factor)
vector_results, bm25_results = await asyncio.gather(
vector_task, bm25_task, return_exceptions=True
)
if isinstance(vector_results, Exception):
logger.warning(f"[RAG-OPT] Vector search failed: {vector_results}")
vector_results = []
if isinstance(bm25_results, Exception):
logger.warning(f"[RAG-OPT] BM25 search failed: {bm25_results}")
bm25_results = []
stage1_latency = (time.perf_counter() - stage1_start) * 1000
logger.info(
f"[RAG-OPT] Two-stage Hybrid Stage 1: vector={len(vector_results)}, bm25={len(bm25_results)}, latency={stage1_latency:.2f}ms"
)
stage2_start = time.perf_counter()
combined = self._rrf_combiner.combine(
vector_results,
bm25_results,
vector_weight=settings.rag_vector_weight,
bm25_weight=settings.rag_bm25_weight,
)
reranked = []
for candidate in combined[:top_k * 2]:
vector_data = candidate.get("vector", {})
stored_full_embedding = None
if isinstance(vector_data, dict):
stored_full_embedding = vector_data.get("full", [])
elif isinstance(vector_data, list):
stored_full_embedding = vector_data
if stored_full_embedding and len(stored_full_embedding) > 0:
similarity = self._cosine_similarity(
embedding_result.embedding_full,
@ -467,17 +464,17 @@ class OptimizedRetriever(BaseRetriever):
candidate["score"] = similarity
candidate["stage"] = "two_stage_hybrid_reranked"
reranked.append(candidate)
reranked.sort(key=lambda x: x.get("score", 0), reverse=True)
results = reranked[:top_k]
stage2_latency = (time.perf_counter() - stage2_start) * 1000
logger.info(
f"[RAG-OPT] Two-stage Hybrid Stage 2 (reranking): {len(results)} final results in {stage2_latency:.2f}ms"
)
return results
async def _vector_retrieve(
self,
tenant_id: str,
@ -489,7 +486,7 @@ class OptimizedRetriever(BaseRetriever):
return await self._search_with_dimension(
client, tenant_id, embedding, "full", top_k
)
async def _search_with_dimension(
self,
client: QdrantClient,
@ -505,7 +502,7 @@ class OptimizedRetriever(BaseRetriever):
f"[RAG-OPT] Searching with vector_name={vector_name}, "
f"limit={limit}, vector_dim={len(query_vector)}, with_vectors={with_vectors}"
)
results = await client.search(
tenant_id=tenant_id,
query_vector=query_vector,
@ -513,17 +510,17 @@ class OptimizedRetriever(BaseRetriever):
vector_name=vector_name,
with_vectors=with_vectors,
)
logger.info(
f"[RAG-OPT] Search returned {len(results)} results"
)
if len(results) > 0:
for i, r in enumerate(results[:3]):
logger.debug(
f"[RAG-OPT] Result {i+1}: id={r['id']}, score={r['score']:.4f}"
)
return results
except Exception as e:
logger.error(
@ -531,7 +528,7 @@ class OptimizedRetriever(BaseRetriever):
exc_info=True
)
return []
async def _bm25_search(
self,
client: QdrantClient,
@ -546,15 +543,15 @@ class OptimizedRetriever(BaseRetriever):
try:
qdrant = await client.get_client()
collection_name = client.get_collection_name(tenant_id)
query_terms = set(re.findall(r'\w+', query.lower()))
results = await qdrant.scroll(
collection_name=collection_name,
limit=limit * 3,
with_payload=True,
)
scored_results = []
for point in results[0]:
text = point.payload.get("text", "").lower()
@ -567,21 +564,21 @@ class OptimizedRetriever(BaseRetriever):
"score": score,
"payload": point.payload or {},
})
scored_results.sort(key=lambda x: x["score"], reverse=True)
return scored_results[:limit]
except Exception as e:
logger.debug(f"[RAG-OPT] BM25 search failed: {e}")
return []
def _cosine_similarity(self, vec1: list[float], vec2: list[float]) -> float:
"""Calculate cosine similarity between two vectors."""
import numpy as np
a = np.array(vec1)
b = np.array(vec2)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
async def health_check(self) -> bool:
"""Check if retriever is healthy."""
try:

View File

@ -4,7 +4,6 @@ Vector retriever for AI Service.
"""
import logging
from typing import Any
from app.core.config import get_settings
from app.core.qdrant_client import QdrantClient, get_qdrant_client
@ -84,7 +83,7 @@ class VectorRetriever(BaseRetriever):
limit=self._top_k,
score_threshold=self._score_threshold,
)
logger.info(f"[AC-AISVC-16] Search returned {len(hits)} raw hits")
retrieval_hits = [
@ -115,7 +114,7 @@ class VectorRetriever(BaseRetriever):
f"[AC-AISVC-17] Retrieval complete: {len(retrieval_hits)} hits, "
f"insufficient={is_insufficient}, max_score={diagnostics['max_score']:.3f}"
)
if len(retrieval_hits) == 0:
logger.warning(
f"[AC-AISVC-17] No hits found! tenant={ctx.tenant_id}, "
@ -140,7 +139,7 @@ class VectorRetriever(BaseRetriever):
[AC-AISVC-29] Uses configured embedding provider.
"""
from app.services.embedding import get_embedding_provider
provider = await get_embedding_provider()
return await provider.embed(text)

View File

@ -0,0 +1,178 @@
# Phase 11 多知识库管理 - 实现进度报告
**日期**: 2026-02-27
**版本**: v0.6.0
**状态**: T11.1~T11.5 已完成
---
## 1. 任务完成情况
| 任务ID | 描述 | 状态 | 验收标准 |
|--------|------|------|----------|
| T11.1 | 扩展 KnowledgeBase 实体 | ✅ 完成 | AC-AISVC-59 |
| T11.2 | 实现知识库 CRUD 服务 | ✅ 完成 | AC-AISVC-59, AC-AISVC-61, AC-AISVC-62 |
| T11.3 | 实现知识库管理 API | ✅ 完成 | AC-AISVC-59, AC-AISVC-60, AC-AISVC-61, AC-AISVC-62 |
| T11.4 | 升级 Qdrant Collection 命名 | ✅ 完成 | AC-AISVC-63 |
| T11.5 | 修改文档上传流程 | ✅ 完成 | AC-AISVC-63 |
| T11.6 | 修改 OptimizedRetriever | ⏳ 待处理 | AC-AISVC-64 |
| T11.7 | kb_default 迁移 | ⏳ 待处理 | AC-AISVC-59 |
| T11.8 | 单元测试 | ⏳ 待处理 | AC-AISVC-59~AC-AISVC-64 |
---
## 2. 实现详情
### 2.1 T11.1 - 扩展 KnowledgeBase 实体
**文件**: `app/models/entities.py`
新增字段:
- `kb_type`: 知识库类型 (product/faq/script/policy/general)
- `priority`: 优先级权重 (数值越大越优先)
- `is_enabled`: 是否启用
- `doc_count`: 文档数量 (冗余统计)
新增枚举:
```python
class KBType(str, Enum):
PRODUCT = "product"
FAQ = "faq"
SCRIPT = "script"
POLICY = "policy"
GENERAL = "general"
```
新增 Schema:
- `KnowledgeBaseCreate`: 创建知识库请求
- `KnowledgeBaseUpdate`: 更新知识库请求
---
### 2.2 T11.2 - 知识库 CRUD 服务
**文件**: `app/services/knowledge_base_service.py` (新建)
核心方法:
- `create_knowledge_base()`: 创建知识库并初始化 Qdrant Collection
- `get_knowledge_base()`: 获取单个知识库
- `list_knowledge_bases()`: 列表查询 (支持按类型和状态过滤)
- `update_knowledge_base()`: 更新知识库
- `delete_knowledge_base()`: 删除知识库及关联文档和 Collection
- `update_doc_count()`: 更新文档计数
- `get_or_create_default_kb()`: 获取或创建默认知识库
---
### 2.3 T11.3 - 知识库管理 API
**文件**: `app/api/admin/kb.py`
新增端点:
| 方法 | 路径 | 描述 |
|------|------|------|
| GET | /admin/kb/knowledge-bases | 查询知识库列表 |
| POST | /admin/kb/knowledge-bases | 创建知识库 |
| GET | /admin/kb/knowledge-bases/{kb_id} | 获取知识库详情 |
| PUT | /admin/kb/knowledge-bases/{kb_id} | 更新知识库 |
| DELETE | /admin/kb/knowledge-bases/{kb_id} | 删除知识库 |
---
### 2.4 T11.4 - Qdrant Collection 命名升级
**文件**: `app/core/qdrant_client.py`
命名规则:
- 旧格式 (兼容): `kb_{tenantId}`
- 新格式: `kb_{tenantId}_{kbId}`
新增方法:
- `get_kb_collection_name()`: 获取知识库 Collection 名称
- `ensure_kb_collection_exists()`: 确保知识库 Collection 存在
- `delete_kb_collection()`: 删除知识库 Collection
- `search_kb()`: 多知识库检索
更新方法:
- `upsert_vectors()`: 新增 `kb_id` 参数
- `upsert_multi_vector()`: 新增 `kb_id` 参数
---
### 2.5 T11.5 - 文档上传流程改造
**文件**: `app/api/admin/kb.py`
变更:
1. `upload_document()` 端点支持指定 `kb_id` 参数
2. 如果知识库不存在,自动使用默认知识库
3. 上传后更新知识库文档计数
4. `_index_document()` 函数新增 `kb_id` 参数
5. 索引时使用 `ensure_kb_collection_exists()` 创建 Collection
6. 向量 payload 中新增 `kb_id` 字段
---
## 3. 数据库迁移
需要执行以下 SQL 来更新现有表结构:
```sql
-- 添加新字段到 knowledge_bases 表
ALTER TABLE knowledge_bases ADD COLUMN IF NOT EXISTS kb_type VARCHAR DEFAULT 'general';
ALTER TABLE knowledge_bases ADD COLUMN IF NOT EXISTS priority INTEGER DEFAULT 0;
ALTER TABLE knowledge_bases ADD COLUMN IF NOT EXISTS is_enabled BOOLEAN DEFAULT TRUE;
ALTER TABLE knowledge_bases ADD COLUMN IF NOT EXISTS doc_count INTEGER DEFAULT 0;
-- 创建索引
CREATE INDEX IF NOT EXISTS ix_knowledge_bases_tenant_kb_type ON knowledge_bases (tenant_id, kb_type);
```
---
## 4. 待完成任务
### T11.6 - 修改 OptimizedRetriever
- 支持 `target_kb_ids` 参数
- 实现多 Collection 并行检索
- 按知识库优先级排序结果
### T11.7 - kb_default 迁移
- 首次启动时为现有数据创建默认知识库记录
- 将现有 `kb_{tenantId}` Collection 映射为默认知识库
### T11.8 - 单元测试
- 知识库 CRUD 服务测试
- 多知识库检索测试
- Collection 命名兼容性测试
---
## 5. 验证清单
- [x] KnowledgeBase 实体包含新字段
- [x] 知识库创建时自动创建 Qdrant Collection
- [x] 知识库删除时自动删除 Qdrant Collection
- [x] 文档上传支持指定 kb_id
- [x] 向量索引到正确的 Collection
- [x] Collection 命名兼容旧格式
- [ ] 多知识库并行检索 (T11.6)
- [ ] kb_default 自动迁移 (T11.7)
- [ ] 单元测试通过 (T11.8)
---
## 6. 相关文件
### 新建文件
- `app/services/knowledge_base_service.py`
### 修改文件
- `app/models/entities.py`
- `app/core/qdrant_client.py`
- `app/api/admin/kb.py`
---
**下一步**: 执行 T11.6~T11.8 或进入 Phase 12 (意图识别与规则引擎)

View File

@ -21,6 +21,7 @@ dependencies = [
"pymupdf>=1.23.0",
"pdfplumber>=0.10.0",
"python-multipart>=0.0.6",
"redis>=5.0.0",
]
[project.optional-dependencies]

View File

@ -0,0 +1,74 @@
-- Migration: Add monitoring fields to chat_messages and create new monitoring tables
-- Date: 2026-02-27
-- Issue: [AC-AISVC-91~AC-AISVC-95, AC-AISVC-108~AC-AISVC-110] Dashboard and monitoring enhancement
-- Add monitoring fields to chat_messages
ALTER TABLE chat_messages
ADD COLUMN IF NOT EXISTS prompt_template_id UUID REFERENCES prompt_templates(id);
ALTER TABLE chat_messages
ADD COLUMN IF NOT EXISTS intent_rule_id UUID REFERENCES intent_rules(id);
ALTER TABLE chat_messages
ADD COLUMN IF NOT EXISTS flow_instance_id UUID REFERENCES flow_instances(id);
ALTER TABLE chat_messages
ADD COLUMN IF NOT EXISTS guardrail_triggered BOOLEAN DEFAULT FALSE;
ALTER TABLE chat_messages
ADD COLUMN IF NOT EXISTS guardrail_words JSONB;
-- Add indexes for monitoring queries
CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_template
ON chat_messages (tenant_id, prompt_template_id);
CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_intent
ON chat_messages (tenant_id, intent_rule_id);
CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_flow
ON chat_messages (tenant_id, flow_instance_id);
CREATE INDEX IF NOT EXISTS ix_chat_messages_guardrail
ON chat_messages (tenant_id, guardrail_triggered)
WHERE guardrail_triggered = TRUE;
-- Create flow_test_records table
CREATE TABLE IF NOT EXISTS flow_test_records (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id VARCHAR NOT NULL,
session_id VARCHAR NOT NULL,
status VARCHAR DEFAULT 'success',
steps JSONB NOT NULL DEFAULT '[]',
final_response JSONB,
total_duration_ms INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS ix_flow_test_records_tenant_created
ON flow_test_records (tenant_id, created_at);
CREATE INDEX IF NOT EXISTS ix_flow_test_records_session
ON flow_test_records (session_id);
-- Create export_tasks table
CREATE TABLE IF NOT EXISTS export_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id VARCHAR NOT NULL,
status VARCHAR DEFAULT 'processing',
file_path VARCHAR,
file_name VARCHAR,
file_size INTEGER,
total_rows INTEGER,
format VARCHAR DEFAULT 'json',
filters JSONB,
error_message TEXT,
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP
);
CREATE INDEX IF NOT EXISTS ix_export_tasks_tenant_status
ON export_tasks (tenant_id, status);
CREATE INDEX IF NOT EXISTS ix_export_tasks_tenant_created
ON export_tasks (tenant_id, created_at);

View File

@ -0,0 +1,57 @@
"""
Migration script to add monitoring fields to chat_messages table.
Run: python scripts/migrations/add_chat_message_fields.py
"""
import asyncio
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.orm import sessionmaker
from sqlmodel.ext.asyncio.session import AsyncSession
from app.core.config import get_settings
async def run_migration():
"""Run the migration to add new columns to chat_messages table."""
settings = get_settings()
engine = create_async_engine(settings.database_url, echo=True)
async_session_maker = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
statements = [
"ALTER TABLE chat_messages ADD COLUMN IF NOT EXISTS prompt_template_id UUID",
"ALTER TABLE chat_messages ADD COLUMN IF NOT EXISTS intent_rule_id UUID",
"ALTER TABLE chat_messages ADD COLUMN IF NOT EXISTS flow_instance_id UUID",
"ALTER TABLE chat_messages ADD COLUMN IF NOT EXISTS guardrail_triggered BOOLEAN DEFAULT FALSE",
"ALTER TABLE chat_messages ADD COLUMN IF NOT EXISTS guardrail_words JSONB",
"CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_template ON chat_messages(tenant_id, prompt_template_id)",
"CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_intent ON chat_messages(tenant_id, intent_rule_id)",
"CREATE INDEX IF NOT EXISTS ix_chat_messages_tenant_flow ON chat_messages(tenant_id, flow_instance_id)",
]
async with async_session_maker() as session:
for statement in statements:
try:
await session.execute(text(statement))
print(f"Executed: {statement[:60]}...")
except Exception as e:
error_str = str(e).lower()
if "already exists" in error_str or "duplicate" in error_str:
await session.rollback()  # 恢复被中止的事务,否则 Postgres 中后续语句会全部失败IF NOT EXISTS 语句幂等,可安全重跑
print(f"Skipped (already exists): {statement[:60]}...")
else:
print(f"Error: {e}")
raise
await session.commit()
print("\nMigration completed successfully!")
await engine.dispose()
if __name__ == "__main__":
asyncio.run(run_migration())

View File

@ -0,0 +1,115 @@
## 继续 - 意图规则使用指南补充内容
### 4.2 创建意图规则 - 响应示例
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "转人工意图",
"keywords": ["人工", "客服", "投诉", "找人"],
"patterns": ["转.*人工", "找.*客服", "我要.*投诉"],
"priority": 200,
"response_type": "transfer",
"target_kb_ids": [],
"flow_id": null,
"fixed_reply": null,
"transfer_message": "正在为您转接人工客服,请稍候...",
"is_enabled": true,
"hit_count": 0,
"created_at": "2026-02-27T12:00:00Z",
"updated_at": "2026-02-27T12:00:00Z"
}
```
### 4.3 查询意图规则列表
**接口**: `GET /admin/intent-rules`
**查询参数**:
- `response_type` (可选): 按响应类型筛选
- `is_enabled` (可选): 按启用状态筛选
**请求示例**:
```bash
curl -X GET http://ai-service:8080/admin/intent-rules \
-H "X-API-Key: your_api_key" \
-H "X-Tenant-Id: szmp@ash@2026"
```
### 4.4 更新意图规则
**接口**: `PUT /admin/intent-rules/{rule_id}`
### 4.5 删除意图规则
**接口**: `DELETE /admin/intent-rules/{rule_id}`
---
## 5. 实际使用场景
### 5.1 常见问题快速回复
使用 fixed 类型直接返回预设回复,跳过 LLM 调用,响应速度快且成本低。
### 5.2 专业领域定向检索
使用 rag 类型将问题路由到特定知识库,提高检索精准度。
### 5.3 复杂流程引导
使用 flow 类型启动多轮对话流程,自动收集用户信息。
### 5.4 敏感问题转人工
使用 transfer 类型直接转接人工客服。
---
## 6. 缓存机制
- **缓存位置**: 内存缓存
- **缓存键**: tenant_id
- **TTL**: 60 秒
- **失效时机**: 创建/更新/删除规则时立即清除
---
## 7. 最佳实践
### 7.1 关键词设计
- 长度: 2-6 个字
- 使用完整词组
- 避免太短或太长
### 7.2 优先级设计
- 200+: 紧急/敏感问题
- 100-199: 重要业务流程
- 50-99: 常规咨询
- 0-49: 通用兜底规则
---
## 8. 故障排查
### 8.1 规则未生效
检查: 规则是否启用、缓存是否过期、优先级是否正确
### 8.2 误匹配问题
优化关键词和正则表达式,调整优先级
---
## 9. 总结
意图规则提供智能路由功能,支持四种响应类型,具有高性能缓存和租户隔离特性。
---
**文档版本**: v1.0
**生成时间**: 2026-02-27
**维护状态**: ✅ 活跃维护

View File

@ -0,0 +1,38 @@
---
## 5. 总结
意图规则Intent Rule是 AI 中台的智能路由系统,在 12 步生成流程的第 3 步执行。
### 5.1 核心流程
创建规则 → 设置关键词/正则 → 配置响应类型 → 启用规则 → 用户对话 → 意图匹配 → 路由处理
### 5.2 关键特性
- **智能路由**: 根据用户意图自动选择最佳处理方式
- **优先级控制**: 灵活的优先级机制避免冲突
- **四种响应**: 固定回复、RAG 检索、话术流程、转人工
- **高性能**: 60 秒缓存 + 优化的匹配算法
- **租户隔离**: 多租户数据完全独立
- **命中统计**: 自动记录规则使用情况
### 5.3 最佳实践
1. **关键词设计**: 2-6 个字,使用完整词组
2. **正则表达式**: 简单明了,避免过于复杂
3. **优先级分配**: 200+ 紧急、100-199 重要、50-99 常规、0-49 兜底
4. **响应类型**: 根据场景选择最合适的类型
5. **测试验证**: 先低优先级测试,再调整为正式优先级
6. **监控优化**: 定期检查命中率,优化关键词
---
**文档版本**: v1.0
**生成时间**: 2026-02-27
**维护状态**: ✅ 活跃维护
**相关文档**:
- [AI 中台对接文档](../AI中台对接文档.md)
- [Prompt 模板管理分析](./prompt-template-analysis.md)

docs/intent-rule-usage.md
View File

@ -0,0 +1,322 @@
# 意图规则使用指南
## 1. 概述
意图规则Intent Rule是 AI 中台的智能路由系统,用于识别用户意图并自动路由到最合适的处理方式。它在 12 步生成流程的第 3 步执行,优先级高于默认的 RAG 检索。
### 1.1 核心特性
- ✅ **关键词匹配**:支持多个关键词的模糊匹配(不区分大小写)
- ✅ **正则表达式匹配**:支持复杂的模式匹配
- ✅ **优先级排序**:按优先级从高到低匹配,命中第一个即停止
- ✅ **四种响应类型**固定回复、RAG 检索、话术流程、转人工
- ✅ **租户隔离**:不同租户的规则完全独立
- ✅ **缓存优化**60 秒 TTL 内存缓存,减少数据库查询
- ✅ **命中统计**:自动记录规则命中次数
### 1.2 在生成流程中的位置
```
用户消息 → Step 1: 输入扫描 → Step 2: 流程检查 → Step 3: 意图匹配 → Step 4-12: 后续处理
```
**意图匹配的作用**
- 如果匹配成功 → 根据 response_type 路由到对应处理方式
- 如果匹配失败 → 继续执行默认的 RAG 检索流程
---
## 2. 数据模型
### 2.1 IntentRule 实体
| 字段 | 类型 | 必填 | 说明 |
|-----|------|------|------|
| `id` | UUID | 自动生成 | 规则唯一标识 |
| `tenant_id` | string | ✅ | 租户 ID格式name@ash@year |
| `name` | string | ✅ | 规则名称(如"退货意图") |
| `keywords` | string[] | ❌ | 关键词列表(如 ["退货", "退款"] |
| `patterns` | string[] | ❌ | 正则表达式列表(如 ["退.*货", "如何退货"] |
| `priority` | int | ❌ | 优先级(默认 0数值越大优先级越高 |
| `response_type` | string | ✅ | 响应类型fixed/rag/flow/transfer |
| `target_kb_ids` | string[] | ❌ | 目标知识库 ID 列表rag 类型必填) |
| `flow_id` | UUID | ❌ | 话术流程 IDflow 类型必填) |
| `fixed_reply` | string | ❌ | 固定回复内容fixed 类型必填) |
| `transfer_message` | string | ❌ | 转人工提示语transfer 类型必填) |
| `is_enabled` | bool | ✅ | 是否启用(默认 true |
| `hit_count` | int | 自动 | 命中次数统计 |
| `created_at` | datetime | 自动 | 创建时间 |
| `updated_at` | datetime | 自动 | 更新时间 |
### 2.2 响应类型详解
#### 2.2.1 fixed - 固定回复
**用途**:对于明确的问题,直接返回预设的固定回复,跳过 LLM 生成。
**适用场景**
- 常见问候语("你好"、"在吗")
- 简单查询("营业时间"、"联系方式")
- 标准流程说明("如何下单"、"支付方式")
**必填字段**`fixed_reply`
**示例**
```json
{
"name": "营业时间查询",
"keywords": ["营业时间", "几点开门", "几点关门"],
"response_type": "fixed",
"fixed_reply": "我们的营业时间是周一至周日 9:00-21:00节假日正常营业。"
}
```
#### 2.2.2 rag - 定向知识库检索
**用途**:将用户问题路由到特定的知识库进行检索,而不是搜索所有知识库。
**适用场景**
- 产品咨询 → 路由到产品知识库
- 售后问题 → 路由到售后知识库
- 政策查询 → 路由到政策知识库
**必填字段**`target_kb_ids`(知识库 ID 列表)
**示例**
```json
{
"name": "产品咨询意图",
"keywords": ["产品", "功能", "参数", "配置"],
"patterns": [".*产品.*", "有什么功能"],
"response_type": "rag",
"target_kb_ids": ["kb_product_001", "kb_product_002"]
}
```
#### 2.2.3 flow - 启动话术流程
**用途**:触发预定义的多轮对话流程(话术脚本)。
**适用场景**
- 订单处理流程(收集地址、确认信息)
- 问题诊断流程(逐步排查问题)
- 信息采集流程(收集用户需求)
**必填字段**`flow_id`(话术流程 ID
**示例**
```json
{
"name": "退货流程意图",
"keywords": ["退货", "退款", "不想要了"],
"response_type": "flow",
"flow_id": "flow_return_process_001"
}
```
#### 2.2.4 transfer - 转人工
**用途**:直接转接到人工客服。
**适用场景**
- 投诉建议
- 复杂问题
- 明确要求人工服务
**必填字段**`transfer_message`(转人工提示语)
**示例**
```json
{
"name": "转人工意图",
"keywords": ["人工", "客服", "投诉"],
"patterns": ["转.*人工", "找.*客服"],
"response_type": "transfer",
"transfer_message": "正在为您转接人工客服,请稍候..."
}
```
---
## 3. 匹配算法
### 3.1 匹配流程
```
1. 加载启用的规则is_enabled=true按 priority DESC 排序
2. 遍历规则列表(从高优先级到低优先级)
3. 对每个规则:
   a. 先尝试关键词匹配keywords
   b. 如果关键词未匹配尝试正则表达式匹配patterns
4. 命中第一个规则后立即返回,不再继续匹配
5. 如果所有规则都未匹配,返回 None继续默认 RAG 流程)
```
### 3.2 关键词匹配规则
- **不区分大小写**:用户输入和关键词都转为小写后匹配
- **子串匹配**:只要用户消息包含关键词即可(如 "我想退货" 包含 "退货")
- **任意匹配**keywords 列表中任意一个关键词匹配即成功
**示例**
```python
keywords = ["退货", "退款", "不想要"]
user_message = "我想退货" # ✅ 匹配成功(包含"退货"
user_message = "能退款吗" # ✅ 匹配成功(包含"退款"
user_message = "发货了吗" # ❌ 匹配失败
```
### 3.3 正则表达式匹配规则
- **不区分大小写**:使用 `re.IGNORECASE` 标志
- **全文匹配**:使用 `re.search()`,匹配消息中的任意位置
- **任意匹配**patterns 列表中任意一个模式匹配即成功
- **错误处理**:如果正则表达式语法错误,记录警告并跳过该模式
**示例**
```python
patterns = ["退.*货", "如何.*退货", "^退货.*"]
user_message = "我想退货" # ✅ 匹配 "退.*货"
user_message = "如何申请退货" # ✅ 匹配 "如何.*退货"
user_message = "退货流程" # ✅ 匹配 "^退货.*"
```
### 3.4 优先级机制
**规则排序**
```sql
ORDER BY priority DESC, created_at DESC
```
**优先级策略**
- 高优先级规则优先匹配
- 相同优先级按创建时间倒序(新规则优先)
- 建议为重要规则设置更高的优先级(如 100、200
**示例场景**
```
规则 A: priority=100, keywords=["退货"] → 精确退货流程
规则 B: priority=50, keywords=["退", "货"] → 通用退货咨询
规则 C: priority=0, keywords=["产品"] → 产品咨询
用户输入 "我想退货"
1. 先匹配规则 Apriority=100→ 命中,返回
2. 不再匹配规则 B 和 C
```
---
## 4. API 使用
### 4.1 认证与租户隔离
所有接口必须携带以下 HTTP Headers
```http
X-API-Key: <your_api_key>
X-Tenant-Id: <tenant_id>
```
### 4.2 创建意图规则
**接口**`POST /admin/intent-rules`
**请求示例 1固定回复**
```bash
curl -X POST http://ai-service:8080/admin/intent-rules \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key" \
-H "X-Tenant-Id: szmp@ash@2026" \
-d '{
"name": "营业时间查询",
"keywords": ["营业时间", "几点开门", "几点关门", "什么时候营业"],
"priority": 50,
"response_type": "fixed",
"fixed_reply": "我们的营业时间是周一至周日 9:00-21:00节假日正常营业。如有特殊情况会提前通知。"
}'
```
**请求示例 2定向知识库检索**
```bash
curl -X POST http://ai-service:8080/admin/intent-rules \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key" \
-H "X-Tenant-Id: szmp@ash@2026" \
-d '{
"name": "产品咨询意图",
"keywords": ["产品", "功能", "参数", "配置", "型号"],
"patterns": [".*产品.*", "有什么功能", "支持.*吗"],
"priority": 80,
"response_type": "rag",
"target_kb_ids": ["kb_product_001", "kb_product_faq_002"]
}'
```
**请求示例 3启动话术流程**
```bash
curl -X POST http://ai-service:8080/admin/intent-rules \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key" \
-H "X-Tenant-Id: szmp@ash@2026" \
-d '{
"name": "退货流程意图",
"keywords": ["退货", "退款", "不想要了", "申请退货"],
"patterns": ["退.*货", "如何.*退", "想.*退"],
"priority": 100,
"response_type": "flow",
"flow_id": "550e8400-e29b-41d4-a716-446655440000"
}'
```
**请求示例 4转人工**
```bash
curl -X POST http://ai-service:8080/admin/intent-rules \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key" \
-H "X-Tenant-Id: szmp@ash@2026" \
-d '{
"name": "转人工意图",
"keywords": ["人工", "客服", "投诉", "找人"],
"patterns": ["转.*人工", "找.*客服", "我要.*投诉"],
"priority": 200,
"response_type": "transfer",
"transfer_message": "正在为您转接人工客服,请稍候..."
}'
```
---
## 5. 总结
意图规则Intent Rule是 AI 中台的智能路由系统,在 12 步生成流程的第 3 步执行。
### 5.1 核心流程
创建规则 → 设置关键词/正则 → 配置响应类型 → 启用规则 → 用户对话 → 意图匹配 → 路由处理
### 5.2 关键特性
- **智能路由**: 根据用户意图自动选择最佳处理方式
- **优先级控制**: 灵活的优先级机制避免冲突
- **四种响应**: 固定回复、RAG 检索、话术流程、转人工
- **高性能**: 60 秒缓存 + 优化的匹配算法
- **租户隔离**: 多租户数据完全独立
- **命中统计**: 自动记录规则使用情况
### 5.3 最佳实践
1. **关键词设计**: 2-6 个字,使用完整词组
2. **正则表达式**: 简单明了,避免过于复杂
3. **优先级分配**: 200+ 紧急、100-199 重要、50-99 常规、0-49 兜底
4. **响应类型**: 根据场景选择最合适的类型
5. **测试验证**: 先低优先级测试,再调整为正式优先级
6. **监控优化**: 定期检查命中率,优化关键词
---
**文档版本**: v1.0
**生成时间**: 2026-02-27
**维护状态**: ✅ 活跃维护
**相关文档**:
- [AI 中台对接文档](../AI中台对接文档.md)
- [Prompt 模板管理分析](./prompt-template-analysis.md)

View File

@ -0,0 +1,626 @@
# Prompt 模板管理生效机制与占位符使用分析
## 1. 核心架构
Prompt 模板管理系统由以下核心组件构成:
### 1.1 数据模型
**PromptTemplate模板实体**
- `id`: UUID模板唯一标识
- `tenant_id`: 租户 ID实现多租户隔离
- `name`: 模板名称
- `scene`: 场景标识(如 "default"、"customer_service"
- `description`: 模板描述
- `is_default`: 是否为默认模板
**PromptTemplateVersion模板版本**
- `template_id`: 关联的模板 ID
- `version`: 版本号(整数,自增)
- `status`: 版本状态draft/published/archived
- `system_instruction`: 系统指令内容(包含占位符)
- `variables`: 自定义变量定义列表
### 1.2 核心服务
**PromptTemplateService** - 模板管理服务
- 位置:`ai-service/app/services/prompt/template_service.py`
- 功能:模板 CRUD、版本管理、发布/回滚、缓存
**VariableResolver** - 变量解析器
- 位置:`ai-service/app/services/prompt/variable_resolver.py`
- 功能:占位符替换、变量验证
**OrchestratorService** - 编排服务
- 位置:`ai-service/app/services/orchestrator.py`
- 功能:在对话生成流程中加载和应用模板
---
## 2. 生效机制详解
### 2.1 模板加载流程12 步 Pipeline 中的第 7 步)
```
用户请求 → Orchestrator._build_system_prompt() → 加载模板 → 解析变量 → 注入行为规则 → 传递给 LLM
```
**详细步骤**
1. **触发时机**:每次对话请求到达时,在 Step 7PromptBuilder执行
2. **加载逻辑**
```python
# orchestrator.py:632-638
template_service = PromptTemplateService(session)
template_version = await template_service.get_published_template(
tenant_id=ctx.tenant_id,
scene="default", # 场景可配置
)
```
3. **缓存机制**
- 首次加载:从数据库查询 `status=published` 的版本
- 后续请求从内存缓存读取TTL 300 秒)
- 缓存失效:发布/回滚操作会自动清除缓存
4. **降级策略**
- 如果没有已发布的模板 → 使用硬编码的 `SYSTEM_PROMPT`
- 如果数据库查询失败 → 使用硬编码的 `SYSTEM_PROMPT`
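
上述"缓存 → 数据库 → 硬编码兜底"的加载顺序可以草拟如下(示意代码,`SYSTEM_PROMPT` 内容与缓存接口均为假设,实际实现见 orchestrator.py 与 template_service.py

```python
SYSTEM_PROMPT = "你是一位专业的客服助手。"  # 硬编码兜底(内容为假设)


async def load_system_instruction(service, cache, tenant_id: str, scene: str = "default") -> str:
    cached = cache.get(tenant_id, scene)
    if cached is not None:
        return cached.system_instruction  # 命中缓存,不查数据库
    try:
        version = await service.get_published_template(tenant_id=tenant_id, scene=scene)
    except Exception:
        return SYSTEM_PROMPT  # 数据库查询失败 → 降级
    if version is None:
        return SYSTEM_PROMPT  # 没有已发布的模板 → 降级
    cache.set(tenant_id, scene, version)
    return version.system_instruction
```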
### 2.2 版本管理机制
**版本状态流转**
```
draft草稿→ published已发布→ archived已归档
```
**发布流程**
```python
# template_service.py:248-287
async def publish_version(tenant_id, template_id, version):
1. 查询模板是否存在(租户隔离)
2. 将当前 published 版本改为 archived
3. 将目标版本改为 published
4. 清除缓存并预热新版本
5. 记录日志
```
**回滚流程**
```python
# template_service.py:289-298
async def rollback_version(tenant_id, template_id, version):
# 实际上就是调用 publish_version
# 将历史版本重新标记为 published
```
**热更新保证**
- 发布/回滚后立即清除缓存:`self._cache.invalidate(tenant_id, scene)`
- 下次请求会从数据库加载最新版本
- 无需重启服务
### 2.3 租户隔离机制
所有操作都强制进行租户隔离:
```python
# 查询时必须带 tenant_id
stmt = select(PromptTemplate).where(
PromptTemplate.tenant_id == tenant_id,
PromptTemplate.scene == scene,
)
```
不同租户的模板完全独立,互不影响。
---
## 3. 占位符使用详解
### 3.1 占位符语法
**格式**`{{variable_name}}`
**示例**
```
你是 {{persona_name}},当前时间是 {{current_time}}。
你正在为 {{tenant_name}} 提供 {{channel_type}} 渠道的客服服务。
```
### 3.2 内置变量
**VariableResolver** 提供以下内置变量:
| 变量名 | 类型 | 默认值 | 说明 |
|--------|------|--------|------|
| `persona_name` | string | "小N" | AI 人设名称 |
| `current_time` | function | 动态生成 | 当前时间格式YYYY-MM-DD HH:MM |
| `channel_type` | string | "default" | 渠道类型wechat/douyin/jd |
| `tenant_name` | string | "平台" | 租户名称 |
| `session_id` | string | "" | 会话 ID |
**动态变量示例**
```python
# variable_resolver.py:15-21
BUILTIN_VARIABLES = {
"persona_name": "小N",
"current_time": lambda: datetime.now().strftime("%Y-%m-%d %H:%M"),
"channel_type": "default",
"tenant_name": "平台",
"session_id": "",
}
```
### 3.3 自定义变量
**定义方式**:在模板的 `variables` 字段中定义
```json
{
"variables": [
{
"name": "company_name",
"default": "XX科技有限公司",
"description": "公司名称"
},
{
"name": "service_hours",
"default": "9:00-18:00",
"description": "服务时间"
}
]
}
```
**使用示例**
```
欢迎咨询 {{company_name}},我们的服务时间是 {{service_hours}}。
```
### 3.4 变量解析流程
```python
# variable_resolver.py:45-75
def resolve(template, variables, extra_context):
1. 构建上下文:内置变量 + 自定义变量 + 额外上下文
2. 正则匹配:找到所有 {{variable}} 占位符
3. 替换逻辑:
- 如果变量存在 → 替换为值(函数则调用)
- 如果变量不存在 → 保留原占位符 + 记录警告
4. 返回解析后的字符串
```
**正则表达式**
```python
VARIABLE_PATTERN = re.compile(r"\{\{(\w+)\}\}")
```
### 3.5 变量优先级
变量解析的优先级(从高到低):
1. **extra_context**(运行时传入的额外上下文)
2. **自定义变量**(模板定义的 variables
3. **实例化时的上下文**VariableResolver 构造函数传入)
4. **内置变量**BUILTIN_VARIABLES
```python
# variable_resolver.py:77-101
def _build_context(variables, extra_context):
context = {}
# 1. 加载内置变量
for key, value in BUILTIN_VARIABLES.items():
if key in self._context:
context[key] = self._context[key] # 实例化时的上下文
else:
context[key] = value # 内置默认值
# 2. 加载自定义变量
if variables:
for var in variables:
context[var["name"]] = var.get("default", "")
# 3. 加载额外上下文(优先级最高)
if extra_context:
context.update(extra_context)
return context
```
---
## 4. 实际使用示例
### 4.1 创建模板(通过 API
```bash
POST /admin/prompt-templates
X-Tenant-Id: szmp@ash@2026
X-API-Key: your_api_key
{
"name": "客服模板 v1",
"scene": "default",
"description": "标准客服对话模板",
"system_instruction": "你是 {{persona_name}},一位专业的客服助手。\n当前时间:{{current_time}}\n渠道:{{channel_type}}\n\n你需要遵循以下原则:\n- 礼貌、专业、耐心\n- 优先使用知识库内容回答\n- 无法回答时建议转人工\n\n公司信息:{{company_name}}\n服务时间:{{service_hours}}",
"variables": [
{
"name": "company_name",
"default": "XX科技",
"description": "公司名称"
},
{
"name": "service_hours",
"default": "9:00-21:00",
"description": "服务时间"
}
],
"is_default": true
}
```
**响应**
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "客服模板 v1",
"scene": "default",
"description": "标准客服对话模板",
"is_default": true,
"created_at": "2026-02-27T12:00:00Z",
"updated_at": "2026-02-27T12:00:00Z"
}
```
此时模板已创建,但版本状态为 `draft`,尚未生效。
### 4.2 发布模板
```bash
POST /admin/prompt-templates/{tpl_id}/publish
X-Tenant-Id: szmp@ash@2026
X-API-Key: your_api_key
{
"version": 1
}
```
**响应**
```json
{
"success": true,
"message": "Version 1 published successfully"
}
```
**生效时间**:立即生效(缓存已清除)
### 4.3 模板生效后的实际效果
**用户请求**
```json
POST /ai/chat
X-Tenant-Id: szmp@ash@2026
X-API-Key: your_api_key
{
"sessionId": "kf_001_wx123_1708765432000",
"currentMessage": "你好",
"channelType": "wechat"
}
```
**Orchestrator 内部处理**
1. **加载模板**Step 7
```python
# 从缓存或数据库加载已发布的模板
template_version = await template_service.get_published_template(
tenant_id="szmp@ash@2026",
scene="default"
)
# 返回system_instruction + variables
```
2. **解析变量**
```python
resolver = VariableResolver(
channel_type="wechat",
tenant_name="深圳某项目",
session_id="kf_001_wx123_1708765432000"
)
system_prompt = resolver.resolve(
template=template_version.system_instruction,
variables=template_version.variables,
extra_context={"persona_name": "AI助手"}
)
```
3. **解析结果**
```
你是 AI助手一位专业的客服助手。
当前时间2026-02-27 20:18
渠道wechat
你需要遵循以下原则:
- 礼貌、专业、耐心
- 优先使用知识库内容回答
- 无法回答时建议转人工
公司信息XX科技
服务时间9:00-21:00
```
4. **注入行为规则**(如果有):
```python
# 从数据库加载行为规则
rules = await behavior_service.get_enabled_rules(tenant_id)
# 拼接到 system_prompt
behavior_text = "\n".join([f"- {rule}" for rule in rules])
system_prompt += f"\n\n行为约束\n{behavior_text}"
```
5. **传递给 LLM**
```python
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "你好"}
]
response = await llm_client.generate(messages)
```
### 4.4 更新模板
```bash
PUT /admin/prompt-templates/{tpl_id}
X-Tenant-Id: szmp@ash@2026
X-API-Key: your_api_key
{
"system_instruction": "你是 {{persona_name}},一位专业且友好的客服助手。\n...",
"variables": [
{
"name": "company_name",
"default": "XX科技有限公司", # 修改了默认值
"description": "公司全称"
}
]
}
```
**效果**
- 创建新版本version=2status=draft
- 旧版本version=1仍然是 published 状态
- **模板不会立即生效**,需要发布 version 2
### 4.5 回滚模板
```bash
POST /admin/prompt-templates/{tpl_id}/rollback
X-Tenant-Id: szmp@ash@2026
X-API-Key: your_api_key
{
"version": 1
}
```
**效果**
- version 2 变为 archived
- version 1 重新变为 published
- 缓存清除,立即生效
---
## 5. 高级特性
### 5.1 变量验证
```python
# variable_resolver.py:115-144
def validate_variables(template, defined_variables):
"""
验证模板中的所有变量是否已定义
返回:
{
"valid": True/False,
"missing": ["未定义的变量列表"],
"used_variables": ["模板中使用的所有变量"]
}
"""
```
**使用场景**
- 前端编辑模板时实时验证
- 发布前检查是否有未定义的变量
### 5.2 变量提取
```python
# variable_resolver.py:103-113
def extract_variables(template):
"""
从模板中提取所有变量名
返回:["persona_name", "current_time", "company_name"]
"""
```
**使用场景**
- 前端显示模板使用的变量列表
- 自动生成变量定义表单
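基于上文的两个函数签名,下面给出一份最小实现示意(假设占位符语法固定为 `{{name}}`,且内置变量集合如注释所示;实际逻辑以 `variable_resolver.py` 为准):

```python
import re

# 占位符语法:{{variable_name}},变量名由字母、数字、下划线组成
_VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# 假设的内置变量集合(无需在模板 variables 中定义)
BUILTIN_VARIABLES = {"current_time", "channel_type", "session_id"}

def extract_variables(template: str) -> list:
    """从模板中提取所有变量名(去重,保持首次出现顺序)。"""
    seen = []
    for name in _VAR_PATTERN.findall(template):
        if name not in seen:
            seen.append(name)
    return seen

def validate_variables(template: str, defined_variables: list) -> dict:
    """验证模板中的所有变量是否已定义(内置变量视为已定义)。"""
    used = extract_variables(template)
    defined = {v["name"] for v in defined_variables} | BUILTIN_VARIABLES
    missing = [name for name in used if name not in defined]
    return {"valid": not missing, "missing": missing, "used_variables": used}
```

前端编辑器可在保存前调用 `validate_variables`,将 `missing` 列表直接渲染为告警提示。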
### 5.3 缓存策略
**TemplateCache** 实现:
```python
# template_service.py:32-72简化示意
import time

class TemplateCache:
    def __init__(self, ttl_seconds=300):
        self._cache = {}  # key: (tenant_id, scene) -> (version, cached_at)
        self._ttl = ttl_seconds  # 默认 300 秒5 分钟)

    def set(self, tenant_id, scene, version):
        self._cache[(tenant_id, scene)] = (version, time.time())

    def get(self, tenant_id, scene):
        key = (tenant_id, scene)
        if key not in self._cache:
            return None
        version, cached_at = self._cache[key]
        if time.time() - cached_at < self._ttl:
            return version  # 缓存命中
        del self._cache[key]  # 自动清理过期缓存
        return None

    def invalidate(self, tenant_id, scene=None):
        # 发布/回滚时清除缓存scene=None 时清除该租户的全部场景
        for key in [k for k in self._cache if k[0] == tenant_id and (scene is None or k[1] == scene)]:
            del self._cache[key]
```
**缓存失效时机**
- 发布新版本
- 回滚到旧版本
- 更新模板(创建新版本时不清除,因为新版本是 draft
- TTL 过期5 分钟)
---
## 6. 最佳实践
### 6.1 模板设计建议
1. **使用语义化的变量名**
```
✅ {{company_name}}、{{service_hours}}
❌ {{var1}}、{{x}}
```
2. **为所有自定义变量提供默认值**
```json
{
"name": "company_name",
"default": "XX公司", // 必须提供
"description": "公司名称"
}
```
3. **避免在模板中硬编码业务数据**
```
❌ 你是小明,为 XX 公司提供服务
✅ 你是 {{persona_name}},为 {{company_name}} 提供服务
```
4. **合理使用内置变量**
- `current_time`:适用于时间敏感的场景
- `channel_type`:适用于多渠道差异化话术
- `session_id`:适用于调试和追踪
### 6.2 版本管理建议
1. **小步迭代**
- 每次修改创建新版本
- 在测试环境验证后再发布
- 保留历史版本以便回滚
2. **版本命名规范**(在 description 中):
```
v1.0 - 初始版本
v1.1 - 优化语气,增加公司信息变量
v1.2 - 修复变量引用错误
```
3. **灰度发布**(未来扩展):
- 可以为不同租户发布不同版本
- 可以按百分比逐步切换版本
### 6.3 性能优化建议
1. **利用缓存**
- 模板内容很少变化,缓存命中率高
- 5 分钟 TTL 平衡了实时性和性能
2. **避免频繁发布**
- 发布操作会清除缓存
- 建议批量修改后统一发布
3. **监控缓存命中率**
```python
logger.debug(f"Cache hit for template: tenant={tenant_id}, scene={scene}")
```
---
## 7. 故障排查
### 7.1 模板未生效
**症状**:发布后仍使用旧模板或硬编码 SYSTEM_PROMPT
**排查步骤**
1. 检查版本状态:`GET /admin/prompt-templates/{tpl_id}`
- 确认目标版本的 status 是否为 `published`
2. 检查缓存:等待 5 分钟或重启服务
3. 检查日志:
```
[AC-AISVC-51] Cache hit for template: tenant=xxx, scene=default
[AC-AISVC-51] Loaded published template from DB: tenant=xxx, scene=default
[AC-AISVC-51] No published template found, using fallback
```
### 7.2 变量未替换
**症状**:生成的 system_prompt 中仍有 `{{variable}}` 占位符
**排查步骤**
1. 检查变量定义:确认变量在 `variables` 中定义
2. 检查变量名拼写:必须完全匹配(区分大小写)
3. 检查日志:
```
WARNING: Unknown variable in template: xxx
```
### 7.3 租户隔离问题
**症状**:租户 A 看到了租户 B 的模板
**排查步骤**
1. 检查请求头:确认 `X-Tenant-Id` 正确
2. 检查数据库:
```sql
SELECT * FROM prompt_templates WHERE tenant_id = 'xxx';
```
3. 检查代码:所有查询必须带 `tenant_id` 过滤
---
## 8. 总结
### 8.1 核心流程
```
创建模板 → 编辑内容 → 发布版本 → 缓存加载 → 变量解析 → 传递给 LLM
↓ ↓ ↓ ↓ ↓ ↓
draft draft published 内存缓存 占位符替换 生成回复
```
### 8.2 关键特性
- ✅ **版本管理**:支持多版本、发布/回滚、历史追溯
- ✅ **热更新**:发布后立即生效,无需重启
- ✅ **租户隔离**:多租户数据完全隔离
- ✅ **缓存优化**5 分钟 TTL减少数据库查询
- ✅ **变量系统**:内置变量 + 自定义变量,支持动态值
- ✅ **降级策略**:模板不可用时自动回退到硬编码
### 8.3 扩展方向
- 🔄 **场景路由**:根据 intent 或 channel 自动选择不同 scene
- 🔄 **A/B 测试**:同一场景支持多个模板并行测试
- 🔄 **模板继承**:子模板继承父模板并覆盖部分内容
- 🔄 **变量类型**:支持 string/number/boolean/array 等类型
- 🔄 **条件渲染**:支持 `{{#if}}...{{/if}}` 等逻辑控制
---
**文档生成时间**2026-02-27 20:18
**相关代码版本**v0.6.0
**维护状态**:✅ 活跃维护

docs/script-flow-usage.md Normal file

@ -0,0 +1,663 @@
# 话术流程管理使用指南
## 1. 概述
话术流程Script Flow是 AI 中台的多轮对话引导系统,用于按照预定义的步骤引导用户完成信息收集、问题诊断或业务流程。它通过状态机机制实现结构化的对话流程控制。
### 1.1 核心特性
- ✅ **多步骤编排**:支持创建包含多个步骤的对话流程
- ✅ **拖拽排序**:可视化拖拽调整步骤顺序
- ✅ **等待输入控制**:每个步骤可配置是否等待用户回复
- ✅ **超时处理**:支持超时重复、跳过或转人工
- ✅ **变量占位符**:话术内容支持动态变量替换
- ✅ **流程预览**:可视化预览流程执行效果
- ✅ **意图触发**:通过意图规则自动触发流程
- ✅ **状态管理**:自动跟踪流程执行进度
### 1.2 典型应用场景
| 场景 | 说明 | 示例 |
|------|------|------|
| **订单处理** | 收集订单信息并确认 | 收集收货地址 → 确认商品 → 选择支付方式 → 确认下单 |
| **退货流程** | 引导用户完成退货申请 | 确认订单号 → 选择退货原因 → 上传凭证 → 提交申请 |
| **问题诊断** | 逐步排查用户问题 | 确认设备型号 → 检查网络连接 → 重启设备 → 测试结果 |
| **信息采集** | 收集用户需求或反馈 | 询问需求类型 → 收集详细描述 → 确认联系方式 → 提交工单 |
| **预约服务** | 引导用户完成预约 | 选择服务类型 → 选择时间 → 填写联系方式 → 确认预约 |
---
## 2. 功能介绍
### 2.1 流程列表页面
**访问路径**AI 中台管理后台 → 智能路由 → 话术流程管理
**页面功能**
- 查看所有话术流程列表
- 显示流程名称、描述、步骤数、关联规则数、状态、更新时间
- 快速启用/禁用流程
- 创建、编辑、预览、删除流程
**列表字段说明**
| 字段 | 说明 |
|------|------|
| 流程名称 | 流程的唯一标识名称 |
| 描述 | 流程的用途说明(可选) |
| 步骤数 | 流程包含的步骤总数 |
| 关联规则 | 触发该流程的意图规则数量 |
| 状态 | 启用/禁用开关 |
| 更新时间 | 最后修改时间 |
### 2.2 流程编辑器
**核心组件**
1. **基本信息区**:配置流程名称、描述、启用状态
2. **步骤编辑区**:可视化编辑流程步骤
3. **拖拽排序**:通过拖拽图标调整步骤顺序
4. **步骤配置**:为每个步骤配置详细参数
### 2.3 流程预览器
**功能**
- 时间轴可视化展示流程步骤
- 模拟流程执行过程
- 高亮显示当前步骤
- 查看步骤配置详情
- 支持上一步/下一步/重置操作
---
## 3. 使用步骤
### 3.1 创建话术流程
#### 步骤 1进入创建页面
1. 点击页面右上角的 **"新建流程"** 按钮
2. 弹出流程编辑对话框
#### 步骤 2填写基本信息
```
流程名称:退货流程引导 (必填,建议简洁明确)
描述:引导用户完成退货申请 (可选,说明流程用途)
启用状态:✓ 启用 (默认启用)
```
#### 步骤 3添加流程步骤
点击 **"添加步骤"** 按钮,配置每个步骤:
**步骤 1 配置示例**
```
话术内容:您好,我来帮您处理退货申请。请提供您的订单号。
等待输入:✓ 开启
超时时间60 秒
超时动作:重复当前步骤
```
**步骤 2 配置示例**
```
话术内容:收到订单号 {{order_id}},请问您的退货原因是什么?
1. 商品质量问题
2. 不符合预期
3. 其他原因
等待输入:✓ 开启
超时时间120 秒
超时动作:转人工
```
**步骤 3 配置示例**
```
话术内容:好的,已记录您的退货原因。请上传商品照片或相关凭证。
等待输入:✓ 开启
超时时间180 秒
超时动作:跳过进入下一步
```
**步骤 4 配置示例**
```
话术内容:感谢您的配合!退货申请已提交,工单号为 {{ticket_id}}。我们将在 1-3 个工作日内处理,请保持手机畅通。
等待输入:✗ 关闭
```
#### 步骤 4调整步骤顺序
- 鼠标悬停在步骤左侧的 **拖拽图标**
- 按住鼠标左键拖动步骤到目标位置
- 释放鼠标完成排序
#### 步骤 5保存流程
- 点击 **"创建"** 按钮保存流程
- 系统自动分配流程 ID
- 返回流程列表页面
### 3.2 编辑话术流程
1. 在流程列表中找到目标流程
2. 点击 **"编辑"** 按钮
3. 修改流程信息或步骤配置
4. 点击 **"保存"** 按钮
**注意事项**
- 修改后立即生效,影响正在执行的流程
- 建议在低峰期修改重要流程
- 修改前可先禁用流程,测试无误后再启用
### 3.3 预览话术流程
1. 在流程列表中找到目标流程
2. 点击 **"预览"** 按钮
3. 在右侧抽屉中查看流程时间轴
4. 使用 **上一步/下一步** 按钮模拟流程执行
5. 查看每个步骤的配置详情
**预览功能**
- 时间轴可视化展示所有步骤
- 当前步骤高亮显示(蓝色边框)
- 已完成步骤显示为绿色
- 未执行步骤显示为灰色
- 显示步骤的等待输入、超时配置
### 3.4 删除话术流程
1. 在流程列表中找到目标流程
2. 点击 **"删除"** 按钮
3. 确认删除操作
**警告**
- 删除操作不可恢复
- 如果有意图规则关联该流程,删除后规则将失效
- 建议先禁用流程观察一段时间,确认无影响后再删除
### 3.5 启用/禁用流程
**方式 1列表页快速切换**
- 直接点击流程列表中的 **状态开关**
- 立即生效
**方式 2编辑页面修改**
- 进入流程编辑页面
- 修改 **启用状态** 开关
- 保存后生效
**禁用效果**
- 禁用后,意图规则无法触发该流程
- 正在执行的流程实例不受影响,继续执行完成
- 流程数据保留,可随时重新启用
---
## 4. 步骤配置详解
### 4.1 话术内容
**功能**:定义该步骤向用户展示的话术文本。
**支持变量占位符**
```
语法:{{variable_name}}
示例:
- 您的订单号是 {{order_id}}
- 尊敬的 {{customer_name}},您好!
- 当前时间:{{current_time}}
- 渠道:{{channel_type}}
```
**变量来源**
- 会话上下文变量
- 用户输入提取的信息
- 系统内置变量(时间、渠道等)
**最佳实践**
- 话术简洁明了,避免过长
- 使用礼貌用语,保持专业
- 明确告知用户需要做什么
- 提供选项时使用编号列表
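上述占位符的替换逻辑可以用如下示意代码表达(最小示意,假设变量保存在会话上下文字典中;未定义的变量保留原样以便排查,函数名为假设):

```python
import re

# 占位符语法:{{variable_name}}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_step_content(content: str, context: dict) -> str:
    """将话术中的 {{variable}} 占位符替换为会话上下文中的值,未知变量保留原样。"""
    def _sub(match):
        name = match.group(1)
        return str(context.get(name, match.group(0)))  # 缺失时原样保留
    return _PLACEHOLDER.sub(_sub, content)
```

例如 `render_step_content("收到订单号 {{order_id}}", {"order_id": "20240227001"})` 会返回 `收到订单号 20240227001`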
### 4.2 等待用户输入
**功能**:控制该步骤是否需要等待用户回复。
**开启(✓)**
- 系统发送话术后,等待用户回复
- 收到用户回复后,根据条件推进到下一步
- 如果超时未回复,执行超时动作
**关闭(✗)**
- 系统发送话术后,立即推进到下一步
- 适用于纯信息告知的步骤(如确认信息、结束语)
**使用场景**
- ✓ 需要收集信息:订单号、退货原因、联系方式
- ✓ 需要用户确认:是/否、选择选项
- ✗ 纯信息展示:流程说明、结果通知、感谢语
### 4.3 超时时间
**功能**:设置等待用户回复的最长时间(单位:秒)。
**取值范围**5 - 300 秒
**推荐配置**
- **简单问题**(如是/否30-60 秒
- **需要查找信息**如订单号60-120 秒
- **需要操作**如上传图片120-180 秒
- **复杂问题**如详细描述180-300 秒
**注意事项**
- 超时时间过短:用户体验差,容易误触发超时
- 超时时间过长:流程卡住时间长,影响效率
- 根据实际业务场景调整
### 4.4 超时动作
**功能**:定义超时后的处理策略。
#### 选项 1重复当前步骤
**行为**:重新发送当前步骤的话术,再次等待用户回复。
**适用场景**
- 用户可能暂时离开,需要提醒
- 问题简单,用户可能忘记回复
- 重要信息收集,不能跳过
**示例话术**
```
首次:请提供您的订单号。
重复:您还在吗?请提供您的订单号,以便我帮您处理。
```
#### 选项 2跳过进入下一步
**行为**:跳过当前步骤,直接进入下一步。
**适用场景**
- 可选信息收集(如备注、补充说明)
- 有默认值或兜底方案
- 不影响流程继续执行
**注意事项**
- 确保下一步不依赖当前步骤的输入
- 在后续步骤中处理缺失信息的情况
#### 选项 3转人工
**行为**:结束流程,将会话转接到人工客服。
**适用场景**
- 关键信息收集失败
- 用户长时间无响应
- 流程无法继续执行
**效果**
- 设置 `shouldTransfer=true`
- 返回转人工提示语
- 流程实例标记为"已完成"
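三种超时动作的分发逻辑可概括为如下示意代码(简化的假设数据结构,`transfer` 分支对应上文的 `shouldTransfer=true`;兜底话术文案为示例,实际以后端实现为准):

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    reply: str
    should_transfer: bool = False  # 是否转人工
    flow_completed: bool = False   # 流程是否结束

def handle_timeout(steps: list, current_index: int) -> StepResult:
    """根据当前步骤的 timeout_action 处理超时repeat / skip / transfer。"""
    step = steps[current_index]
    action = step.get("timeout_action", "repeat")
    if action == "repeat":
        return StepResult(reply=step["content"])  # 重发当前步骤话术
    if action == "skip":
        if current_index + 1 < len(steps):
            return StepResult(reply=steps[current_index + 1]["content"])
        return StepResult(reply="流程已结束,感谢配合。", flow_completed=True)
    # transfer结束流程并转人工
    return StepResult(reply="正在为您转接人工客服,请稍候。",
                      should_transfer=True, flow_completed=True)
```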
---
## 5. 与意图规则关联
### 5.1 创建触发规则
话术流程需要通过 **意图规则** 触发。
**步骤**
1. 进入 **意图规则管理** 页面
2. 点击 **"新建规则"** 按钮
3. 配置规则信息:
```
规则名称:退货意图
关键词:退货、退款、不想要了、申请退货
正则表达式:退.*货、如何.*退、想.*退
优先级100
响应类型:启动话术流程
关联流程:选择 "退货流程引导"
启用状态:✓ 启用
```
4. 保存规则
### 5.2 触发流程
**触发条件**
- 用户消息匹配意图规则的关键词或正则表达式
- 规则的响应类型为 `flow`
- 规则和流程均为启用状态
**触发流程**
```
用户输入:我想退货
意图识别:命中 "退货意图" 规则
启动流程:创建 "退货流程引导" 实例
执行步骤 1发送第一步话术
等待用户输入:收集订单号
执行步骤 2发送第二步话术
...
流程完成:恢复正常对话
```
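上述触发过程中的规则匹配可以用如下示意代码表达(最小示意:按优先级降序遍历启用的规则,先做关键词包含匹配,再做正则匹配;字段名为假设,实际以后端意图引擎为准):

```python
import re

def match_intent(message: str, rules: list):
    """按优先级匹配意图规则,返回第一条命中的规则,未命中返回 None。"""
    enabled = [r for r in rules if r.get("enabled", True)]
    for rule in sorted(enabled, key=lambda r: r.get("priority", 0), reverse=True):
        # 关键词匹配:任一关键词出现在用户消息中即命中
        if any(kw in message for kw in rule.get("keywords", [])):
            return rule
        # 正则匹配:任一模式搜索到即命中
        if any(re.search(p, message) for p in rule.get("patterns", [])):
            return rule
    return None
```

例如上文的"退货意图"规则配置下,`match_intent("我想退货", rules)` 会命中该规则,而与任何关键词和正则都不匹配的消息返回 `None`,进入后续的 RAG 流程。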
### 5.3 流程执行优先级
**优先级规则**
1. **进行中的流程** > 意图识别 > RAG 检索
2. 如果会话存在进行中的流程实例,优先处理流程逻辑
3. 流程完成后,恢复正常的意图识别和 RAG 流程
**示例**
```
会话状态:退货流程进行中(步骤 2
用户输入:营业时间是什么?
系统行为:忽略 "营业时间" 意图,继续处理流程步骤 2
```
---
## 6. 最佳实践
### 6.1 流程设计原则
#### 1. 步骤数量适中
- **推荐**3-7 个步骤
- **避免**:步骤过多(>10 步)导致用户疲劳
- **避免**:步骤过少(<3 步)无法体现流程价值
#### 2. 话术简洁明确
- 每个步骤的话术控制在 50-100 字
- 明确告知用户需要做什么
- 提供示例或选项帮助用户理解
#### 3. 合理设置超时
- 根据任务复杂度设置超时时间
- 重要步骤使用 "重复" 或 "转人工"
- 可选步骤使用 "跳过"
#### 4. 提供退出机制
- 在话术中告知用户如何退出流程
- 示例:"如需退出流程,请回复 '退出' 或 '转人工'"
### 6.2 话术编写技巧
#### 1. 使用礼貌用语
```
✓ 好的:您好,我来帮您处理退货申请。
✗ 不好:提供订单号。
```
#### 2. 明确指令
```
✓ 好的请提供您的订单号格式20240227001
✗ 不好:请提供信息。
```
#### 3. 提供选项
```
✓ 好的:请选择退货原因:
1. 商品质量问题
2. 不符合预期
3. 其他原因
✗ 不好:为什么要退货?
```
#### 4. 确认信息
```
✓ 好的:您的订单号是 {{order_id}},退货原因是 {{reason}},确认无误吗?(是/否)
✗ 不好:信息已记录。
```
### 6.3 测试与优化
#### 1. 测试流程
- 创建流程后,先设置为 "禁用" 状态
- 使用测试账号模拟完整流程
- 测试各种用户输入场景(正常、异常、超时)
- 确认无误后再启用
#### 2. 监控数据
- 定期查看流程执行统计
- 关注步骤完成率、超时率、转人工率
- 识别用户卡住的步骤
#### 3. 持续优化
- 根据数据调整话术和超时配置
- 优化步骤顺序和数量
- 收集用户反馈改进流程
### 6.4 常见问题处理
#### 问题 1用户中途退出流程
**解决方案**
- 在话术中明确告知退出方式
- 设置关键词触发退出(如 “退出”、“转人工”
- 超时后自动转人工
#### 问题 2用户输入不符合预期
**解决方案**
- 在话术中提供示例和格式说明
- 使用选项列表限制用户输入
- 设置输入验证和错误提示
#### 问题 3流程执行时间过长
**解决方案**
- 减少步骤数量,合并相似步骤
- 缩短超时时间
- 将复杂流程拆分为多个子流程
#### 问题 4流程与意图规则冲突
**解决方案**
- 提高流程触发规则的优先级
- 在流程执行期间,系统自动优先处理流程逻辑
- 避免创建过于宽泛的意图规则
---
## 7. API 参考
### 7.1 列表查询
**接口**`GET /admin/script-flows`
**请求参数**
```json
{
"is_enabled": true // 可选,筛选启用/禁用的流程
}
```
**响应示例**
```json
{
"data": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "退货流程引导",
"description": "引导用户完成退货申请",
"step_count": 4,
"is_enabled": true,
"linked_rule_count": 2,
"created_at": "2026-02-27T10:00:00",
"updated_at": "2026-02-27T15:30:00"
}
]
}
```
### 7.2 创建流程
**接口**`POST /admin/script-flows`
**请求示例**
```json
{
"name": "退货流程引导",
"description": "引导用户完成退货申请",
"is_enabled": true,
"steps": [
{
"step_id": "step_001",
"order": 1,
"content": "您好,我来帮您处理退货申请。请提供您的订单号。",
"wait_for_input": true,
"timeout_seconds": 60,
"timeout_action": "repeat",
"next_conditions": []
},
{
"step_id": "step_002",
"order": 2,
"content": "收到订单号 {{order_id}},请问您的退货原因是什么?",
"wait_for_input": true,
"timeout_seconds": 120,
"timeout_action": "transfer",
"next_conditions": []
}
]
}
```
**响应示例**
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "退货流程引导",
"description": "引导用户完成退货申请",
"step_count": 2,
"is_enabled": true,
"created_at": "2026-02-27T16:00:00",
"updated_at": "2026-02-27T16:00:00"
}
```
### 7.3 查询详情
**接口**`GET /admin/script-flows/{flow_id}`
**响应示例**
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "退货流程引导",
"description": "引导用户完成退货申请",
"is_enabled": true,
"steps": [
{
"step_id": "step_001",
"order": 1,
"content": "您好,我来帮您处理退货申请。请提供您的订单号。",
"wait_for_input": true,
"timeout_seconds": 60,
"timeout_action": "repeat",
"next_conditions": []
}
],
"created_at": "2026-02-27T16:00:00",
"updated_at": "2026-02-27T16:00:00"
}
```
### 7.4 更新流程
**接口**`PUT /admin/script-flows/{flow_id}`
**请求示例**
```json
{
"name": "退货流程引导(优化版)",
"is_enabled": true,
"steps": [
// 更新后的步骤列表
]
}
```
### 7.5 删除流程
**接口**`DELETE /admin/script-flows/{flow_id}`
**响应**204 No Content
---
## 8. 数据模型
### 8.1 ScriptFlow流程实体
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| `id` | UUID | 自动生成 | 流程唯一标识 |
| `tenant_id` | string | ✅ | 租户 ID |
| `name` | string | ✅ | 流程名称 |
| `description` | string | ❌ | 流程描述 |
| `steps` | FlowStep[] | ✅ | 步骤列表 |
| `is_enabled` | boolean | ✅ | 是否启用(默认 true |
| `created_at` | datetime | 自动 | 创建时间 |
| `updated_at` | datetime | 自动 | 更新时间 |
### 8.2 FlowStep步骤实体
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| `step_id` | string | ✅ | 步骤唯一标识 |
| `order` | int | ✅ | 步骤顺序(从 1 开始) |
| `content` | string | ✅ | 话术内容 |
| `wait_for_input` | boolean | ✅ | 是否等待用户输入 |
| `timeout_seconds` | int | ❌ | 超时时间5-300 秒) |
| `timeout_action` | string | ❌ | 超时动作repeat/skip/transfer |
| `next_conditions` | NextCondition[] | ❌ | 下一步条件(预留) |
### 8.3 超时动作枚举
| 值 | 说明 |
|------|------|
| `repeat` | 重复当前步骤 |
| `skip` | 跳过进入下一步 |
| `transfer` | 转人工 |
---
## 9. 总结
话术流程管理是 AI 中台的核心功能之一,通过可视化的流程编排,实现结构化的多轮对话引导。
### 9.1 核心价值
- **提升效率**:自动化处理标准流程,减少人工介入
- **保证质量**:标准化话术,确保服务一致性
- **优化体验**:引导式对话,降低用户操作难度
- **数据收集**:结构化收集信息,便于后续处理
### 9.2 使用流程
创建流程 → 配置步骤 → 关联意图规则 → 测试验证 → 启用上线 → 监控优化
### 9.3 注意事项
1. 流程设计要简洁,避免步骤过多
2. 话术要清晰明确,提供必要的示例
3. 合理设置超时时间和超时动作
4. 测试充分后再上线
5. 定期监控数据并优化流程
---
**文档版本**v1.0
**生成时间**2026-02-27
**维护状态**:✅ 活跃维护
**相关文档**
- [意图规则使用指南](./intent-rule-usage.md)
- [AI 中台对接文档](../AI中台对接文档.md)
- [Prompt 模板管理分析](./prompt-template-analysis.md)


@ -34,6 +34,8 @@ context:
module: "{module_name}" # 对应 spec/<module>/ 目录名
feature: "{feature_id}" # 对应 requirements.md 中的 feature_id
status: [🔄进行中 | ⏳待开始 | ✅已完成]
version: "0.3.0" # 当前迭代版本号
active_ac_range: "AC-MOD-21~30" # 当前活跃的 AC 编号范围
spec_references:
# 必须引用模块 Spec 目录下的 SSOT 文档
@ -43,6 +45,9 @@ spec_references:
design: "spec/{module}/design.md"
tasks: "spec/{module}/tasks.md"
# 版本化迭代信息(从 requirements.md frontmatter 读取)
active_version: "0.2.0-0.3.0" # 活跃版本范围(仅关注这些版本的 AC
overall_progress:
format: "- [ ] Phase X: 名称 (进度%) [关联 Tasks.md ID]"
min_phases: 3
@ -87,7 +92,11 @@ startup_guide:
### 1. 启动模式
- **继续模式**:检查 `docs/progress/` 下是否存在对应模块的进度文档。若存在Read 文档 → 引用 Spec 目录 → 简短汇报状态 → 直接开始。
- **新建模式**:若满足“触发条件(硬门禁阈值)”任一项(或涉及跨模块并行),必须先创建进度文档,再开始工作。
- **新建模式**:若满足”触发条件(硬门禁阈值)”任一项(或涉及跨模块并行),必须先创建进度文档,再开始工作。
- **版本化迭代模式**
- 读取 `requirements.md` 的 frontmatter识别 `active_version``version`
- 在进度文档中记录 `active_ac_range`(如 `AC-AISVC-91~110`
- 仅关注活跃版本的 AC历史版本折叠在 `<details>` 中)可跳过
### 2. 下一步行动The Gold Rule
- 禁止输出模糊的下一步(如“继续开发”)。
@ -100,6 +109,13 @@ startup_guide:
### 4. 禁止事项
- 禁止编造或假设需求;信息不足必须询问用户,并在 Progress 中记录澄清结果。
- 禁止使用不存在的工具接口名;所有操作应基于当前环境可用工具。
- 禁止引用历史版本(已折叠)的 AC 编号;所有代码/测试/提交必须引用活跃版本的 AC。
### 5. 版本化迭代支持
- **读取版本信息**:启动时从 `requirements.md` frontmatter 读取 `active_version``version_history`
- **聚焦活跃版本**:仅关注活跃版本范围内的 AC`0.6.0-0.7.0`),历史版本可跳过
- **AC 编号连续性**:新迭代的 AC 编号延续上一版本(如 `AC-AISVC-90``AC-AISVC-91`
- **进度文档同步**:在 `context.active_ac_range` 中记录当前迭代的 AC 范围,便于快速定位
---


@ -225,10 +225,153 @@ source:
---
## 4. 如何执行
## 4. 版本化迭代规则Version Iteration Protocol
### 4.1 问题背景
在多次迭代后,规范文档(尤其是 `requirements.md`)会因 AC 不断累积导致文档膨胀,进而引发:
- AI 上下文快速占满(单个文档可能 500+ 行)
- 历史需求干扰当前迭代的理解
- 难以快速定位当前活跃的 AC 范围
### 4.2 解决方案:单文档内版本分区
采用 **单文档内版本分区 + 历史折叠** 策略:
- 保留最近 1-2 个版本为“活跃版本”(展开)
- 将更早的版本折叠为 `<details>` 标签AI 默认跳过)
- 在 frontmatter 中标记 `active_version``version_history`
### 4.3 requirements.md 版本化模板
```markdown
---
feature_id: "MOD"              # 模块短 ID
title: "模块需求规范"
status: "in-progress"
version: "0.3.0"               # 当前最新版本
active_version: "0.2.0-0.3.0"  # 活跃版本范围(展开显示)
version_history:               # 版本历史索引
  - version: "0.3.0"
    ac_range: "AC-MOD-21~30"
    description: "功能增强 C"
  - version: "0.2.0"
    ac_range: "AC-MOD-11~20"
    description: "功能扩展 B"
  - version: "0.1.0"
    ac_range: "AC-MOD-01~10"
    description: "基础功能 A已折叠"
last_updated: "2026-02-27"
---
# 模块需求规范MOD
## 1. 背景与目标
[保持不变,描述整体背景]
## 2. 模块边界Scope
[保持不变]
## 3. 依赖盘点Dependencies
[保持不变]
## 4. 用户故事User Stories
[仅保留活跃版本的 US历史 US 折叠]
## 5. 验收标准Acceptance Criteria, EARS
### 📌 当前活跃版本v0.2.0 - v0.3.0
#### 5.2 功能扩展 Bv0.2.0
- [AC-MOD-11] WHEN 用户执行操作 X THEN 系统 SHALL 返回结果 Y
- [AC-MOD-12] WHEN 参数无效 THEN 系统 SHALL 返回 400 错误
[展开显示 v0.2.0 的所有 AC]
#### 5.3 功能增强 Cv0.3.0
- [AC-MOD-21] WHEN 用户触发事件 Z THEN 系统 SHALL 执行流程 W
- [AC-MOD-22] WHEN 条件满足 THEN 系统 SHALL 更新状态
[展开显示 v0.3.0 的所有 AC]
---
### 📦 历史版本(已归档)
<details>
<summary>v0.1.0:基础功能 AAC-MOD-01~10</summary>
#### 5.1 基础功能 Av0.1.0
- [AC-MOD-01] WHEN 用户提交请求 THEN 系统 SHALL 处理并返回
- [AC-MOD-02] WHEN 请求格式错误 THEN 系统 SHALL 返回错误信息
[折叠的历史 AC]
</details>
## 6. 追踪映射Traceability
[仅保留活跃版本的映射表]
| AC ID | Endpoint | 方法 | operationId | 备注 |
|------|----------|------|-------------|------|
| AC-MOD-11 | /api/resource | GET | getResource | v0.2.0 |
| AC-MOD-21 | /api/resource | POST | createResource | v0.3.0 |
```
### 4.4 AI 执行规则(新增迭代需求时)
当用户提出新迭代需求时AI 必须:
1. **读取 frontmatter**:识别 `version``active_version`
2. **确定新版本号**:按语义化版本递增(如 `0.6.0``0.7.0`
3. **判断是否需要折叠**
   - 若活跃版本已有 2 个(如 `0.5.0-0.6.0`),则将最旧的版本(`0.5.0`)折叠
   - 若活跃版本仅 1 个,则保留并追加新版本
4. **追加新需求**
   - 在“当前活跃版本”区域末尾追加新章节(如 `#### 5.7 新功能v0.7.0`
   - 新 AC 编号延续上一版本(如上一版本最大为 `AC-AISVC-90`,则新版本从 `AC-AISVC-91` 开始)
5. **更新 frontmatter**
   - `version: "0.7.0"`
   - `active_version: "0.6.0-0.7.0"`
- 在 `version_history` 中追加新版本记录
6. **更新追踪映射表**:仅保留活跃版本的映射
### 4.5 折叠策略
- **保留活跃版本数**:最多 2 个版本展开(当前版本 + 上一版本)
- **折叠粒度**:按大版本折叠(如 `v0.1.0-0.4.0` 合并为一个折叠块)
- **折叠标记**:使用 `<details><summary>` 标签AI 读取时会自动跳过折叠内容
- **AC 编号连续性**:折叠不影响 AC 编号的全局唯一性,新 AC 始终递增
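折叠判断与版本递增可以落地为一个小工具(示意代码,假设 frontmatter 已解析为字典、版本号形如 `x.y.0`,仅演示"最多保留 2 个活跃版本"的判定逻辑):

```python
def next_iteration(frontmatter: dict) -> dict:
    """计算下一个次版本号,并判断是否需要折叠最旧的活跃版本。"""
    major, minor, patch = (int(x) for x in frontmatter["version"].split("."))
    new_version = f"{major}.{minor + 1}.{patch}"
    # active_version 形如 "0.5.0-0.6.0"(两个活跃版本)或 "0.1.0"(单个)
    active = frontmatter["active_version"].split("-")
    fold_oldest = len(active) >= 2  # 已有 2 个活跃版本则折叠最旧的
    return {
        "version": new_version,
        "active_version": f"{active[-1]}-{new_version}",
        "fold": active[0] if fold_oldest else None,  # 需要折叠的版本号
    }
```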
### 4.6 示例:从 v0.2.0 迭代到 v0.3.0
**用户需求**:新增“功能增强 C”
**AI 执行步骤**
1. 读取 `requirements.md`,识别当前 `version: "0.2.0"``active_version: "0.1.0-0.2.0"`
2. 确定新版本号为 `0.3.0`
3.`v0.1.0` 的内容移入 `<details>` 折叠块
4. 在“当前活跃版本”区域追加 `#### 5.3 功能增强 Cv0.3.0`
5. 新 AC 从 `AC-MOD-21` 开始编号(上一版本最大为 `AC-MOD-20`
6. 更新 frontmatter
```yaml
version: "0.3.0"
active_version: "0.2.0-0.3.0"
version_history:
  - version: "0.3.0"
    ac_range: "AC-MOD-21~30"
    description: "功能增强 C"
  - version: "0.2.0"
    ac_range: "AC-MOD-11~20"
    description: "功能扩展 B"
  - version: "0.1.0"
    ac_range: "AC-MOD-01~10"
    description: "基础功能 A已折叠"
```
---
## 5. 如何执行
1. **前置步骤**:完成模块拆分与依赖接口盘点(第 0 节)。
2. **发起需求**:仅针对“一个模块”,生成初始 `requirements.md`
2. **发起需求**:仅针对“一个模块”,生成初始 `requirements.md`v0.1.0。
3. **定义契约**:输出 `openapi.provider.yaml``openapi.deps.yaml`,并进行接口走查。
4. **架构设计**:生成 `design.md`,明确模块内边界、数据流与依赖策略。
5. **任务执行**:生成并执行 `tasks.md`;调用方优先基于 deps 契约 Mock 并行推进。
6. **迭代需求**:按“版本化迭代规则”(第 4 节)追加新版本需求,保持文档精简。


@ -1,13 +1,13 @@
---
feature_id: "ASA"
title: "AI 中台管理界面ai-service-admin需求规范"
status: "draft"
version: "0.3.0"
status: "in-progress"
version: "0.6.0"
owners:
- "product"
- "frontend"
- "backend"
last_updated: "2026-02-24"
last_updated: "2026-02-27"
source:
type: "conversation"
ref: "Scoping Result Confirmed"
@ -159,3 +159,252 @@ source:
| AC-ASA-20 | /admin/rag/experiments/stream | POST | RAG 实验流式输出SSE |
| AC-ASA-21 | /admin/rag/experiments/run | POST | Token 统计与耗时 |
| AC-ASA-22 | /admin/rag/experiments/run | POST | 支持指定 LLM 配置 |
---
## 9. 迭代需求智能客服增强管理界面v0.6.0
> 说明:本节为 v0.6.0 迭代新增,为后端智能客服增强功能提供完整的管理界面,包括 Prompt 模板管理、多知识库管理、意图规则管理、话术流程管理、输出护栏管理。
### 9.1 Prompt 模板管理
- [AC-ASA-23] WHEN 用户访问 Prompt 模板管理页面 THEN 系统 SHALL 展示当前租户下所有模板列表(含模板名称、场景标签、当前发布版本号、更新时间),支持按场景筛选。
- [AC-ASA-24] WHEN 用户点击"新建模板"按钮 THEN 系统 SHALL 展示模板创建表单(名称、场景标签、系统指令编辑区),系统指令编辑区支持 `{{variable}}` 语法高亮提示和内置变量列表参考。
- [AC-ASA-25] WHEN 用户编辑模板并保存 THEN 系统 SHALL 调用后端创建新版本,展示版本号变更提示,并在模板详情中展示版本历史列表。
- [AC-ASA-26] WHEN 用户点击"发布"按钮 THEN 系统 SHALL 弹出确认对话框,确认后调用发布接口,展示发布成功提示,并更新列表中的发布版本号。
- [AC-ASA-27] WHEN 用户在版本历史中点击"回滚到此版本" THEN 系统 SHALL 弹出确认对话框,确认后调用回滚接口,展示回滚成功提示。
- [AC-ASA-28] WHEN 用户查看模板详情 THEN 系统 SHALL 展示当前发布版本的完整内容、变量定义列表、版本历史时间线(含版本号、状态、创建时间)。
### 9.2 多知识库管理
- [AC-ASA-29] WHEN 用户访问知识库管理页面 THEN 系统 SHALL 展示知识库列表(含名称、类型标签、文档数量、优先级、启用状态),替代原有的单一文档列表视图。
- [AC-ASA-30] WHEN 用户点击"新建知识库"按钮 THEN 系统 SHALL 展示创建表单名称、类型选择 product/faq/script/policy/general、描述、优先级权重
- [AC-ASA-31] WHEN 用户点击知识库卡片/行 THEN 系统 SHALL 进入该知识库的文档管理视图,展示该知识库下的文档列表,上传文档时自动关联到当前知识库。
- [AC-ASA-32] WHEN 用户编辑知识库信息 THEN 系统 SHALL 支持修改名称、描述、类型、优先级、启用/禁用状态。
- [AC-ASA-33] WHEN 用户删除知识库 THEN 系统 SHALL 弹出二次确认对话框(提示将删除所有关联文档和索引数据),确认后调用删除接口。
### 9.3 意图规则管理
- [AC-ASA-34] WHEN 用户访问意图规则管理页面 THEN 系统 SHALL 展示规则列表(含意图名称、关键词摘要、响应类型标签、优先级、命中次数、启用状态),支持按响应类型筛选。
- [AC-ASA-35] WHEN 用户点击"新建规则"按钮 THEN 系统 SHALL 展示规则创建表单:
- 基础信息:意图名称、优先级
- 匹配条件:关键词标签输入(支持多个)、正则表达式输入(支持多个,带语法校验提示)
- 响应配置响应类型选择fixed/rag/flow/transfer根据类型动态展示
- `fixed`:固定回复文本编辑区
- `rag`:知识库多选下拉
- `flow`:话术流程选择下拉
- `transfer`:转人工话术编辑区
- [AC-ASA-36] WHEN 用户编辑或删除规则 THEN 系统 SHALL 支持修改所有字段,删除时弹出确认对话框。
### 9.4 话术流程管理
- [AC-ASA-37] WHEN 用户访问话术流程管理页面 THEN 系统 SHALL 展示流程列表(含流程名称、步骤数、启用状态、关联意图规则数)。
- [AC-ASA-38] WHEN 用户点击"新建流程"按钮 THEN 系统 SHALL 展示流程编辑器:
- 基础信息:流程名称、描述
- 步骤编辑区:支持添加/删除/拖拽排序步骤,每个步骤包含:
- 话术内容编辑(支持 `{{variable}}` 占位符)
- 等待用户输入开关
- 超时设置(秒数 + 超时动作选择:重复/跳过/转人工)
- 下一步条件配置(关键词匹配列表 + 跳转目标步骤)
- [AC-ASA-39] WHEN 用户查看流程详情 THEN 系统 SHALL 以步骤流程图或时间线形式展示完整流程,直观呈现步骤间的跳转关系。
### 9.5 输出护栏管理
- [AC-ASA-40] WHEN 用户访问输出护栏管理页面 THEN 系统 SHALL 展示两个标签页:「禁词管理」和「行为规则」。
- [AC-ASA-41] WHEN 用户在禁词管理标签页 THEN 系统 SHALL 展示禁词列表含词语、类别标签、替换策略、命中次数、启用状态支持按类别筛选支持批量导入CSV/文本,每行一个词)。
- [AC-ASA-42] WHEN 用户点击"添加禁词"按钮 THEN 系统 SHALL 展示添加表单词语、类别选择competitor/sensitive/political/custom、替换策略选择mask/replace/block根据策略动态展示
- `replace`:替换文本输入框
- `block`:兜底话术输入框
- [AC-ASA-43] WHEN 用户在行为规则标签页 THEN 系统 SHALL 展示行为规则列表(含规则描述、类别、启用状态),支持添加/编辑/删除/启用/禁用。
- [AC-ASA-44] WHEN 用户添加行为规则 THEN 系统 SHALL 展示添加表单规则描述文本区、类别选择compliance/tone/boundary/custom
### 9.6 用户故事v0.6.0 迭代追加)
- [US-ASA-11] 作为 Prompt 工程师,我希望在界面上创建和管理不同场景的 Prompt 模板,支持版本管理和一键回滚,以便快速迭代 AI 人设和话术风格。
- [US-ASA-12] 作为运营人员,我希望创建多个知识库并按类型分类(产品/FAQ/话术/政策),以便让 AI 在不同场景下检索最相关的知识。
- [US-ASA-13] 作为运营人员,我希望配置意图识别规则,让特定问题走固定回复或话术流程,以便保证关键场景的回复一致性。
- [US-ASA-14] 作为运营人员,我希望编排多步骤的话术流程,引导用户按固定步骤完成信息收集,以便提升客服效率。
- [US-ASA-15] 作为合规人员,我希望配置禁词和行为规则,确保 AI 不会输出竞品名称、敏感信息或违规承诺,以便满足合规要求。
### 9.7 追踪映射v0.6.0 迭代追加)
| AC ID | Endpoint | 方法 | 备注 |
|------|----------|------|-----|
| AC-ASA-23 | /admin/prompt-templates | GET | Prompt 模板列表 |
| AC-ASA-24 | /admin/prompt-templates | POST | 创建模板 |
| AC-ASA-25 | /admin/prompt-templates/{tplId} | PUT | 更新模板 |
| AC-ASA-26 | /admin/prompt-templates/{tplId}/publish | POST | 发布模板 |
| AC-ASA-27 | /admin/prompt-templates/{tplId}/rollback | POST | 回滚模板 |
| AC-ASA-28 | /admin/prompt-templates/{tplId} | GET | 模板详情 |
| AC-ASA-29 | /admin/kb/knowledge-bases | GET | 知识库列表 |
| AC-ASA-30 | /admin/kb/knowledge-bases | POST | 创建知识库 |
| AC-ASA-31 | /admin/kb/documents | GET | 知识库文档列表 |
| AC-ASA-32 | /admin/kb/knowledge-bases/{kbId} | PUT | 更新知识库 |
| AC-ASA-33 | /admin/kb/knowledge-bases/{kbId} | DELETE | 删除知识库 |
| AC-ASA-34 | /admin/intent-rules | GET | 意图规则列表 |
| AC-ASA-35 | /admin/intent-rules | POST | 创建意图规则 |
| AC-ASA-36 | /admin/intent-rules/{ruleId} | PUT/DELETE | 编辑/删除规则 |
| AC-ASA-37 | /admin/script-flows | GET | 话术流程列表 |
| AC-ASA-38 | /admin/script-flows | POST | 创建话术流程 |
| AC-ASA-39 | /admin/script-flows/{flowId} | GET | 流程详情 |
| AC-ASA-40 | - | - | 护栏管理页面布局 |
| AC-ASA-41 | /admin/guardrails/forbidden-words | GET | 禁词列表 |
| AC-ASA-42 | /admin/guardrails/forbidden-words | POST | 添加禁词 |
| AC-ASA-43 | /admin/guardrails/behavior-rules | GET | 行为规则列表 |
| AC-ASA-44 | /admin/guardrails/behavior-rules | POST | 添加行为规则 |
---
## 10. 迭代需求对话流程测试与监控v0.7.0
> 说明:本节为 v0.7.0 迭代新增,为 v0.6.0 新增的四个对话流程控制功能Prompt 模板、意图规则、话术流程、输出护栏)提供完整的测试和监控能力。
### 10.1 Dashboard 统计增强
- [AC-ASA-45] WHEN 用户访问 Dashboard 页面 THEN 系统 SHALL 在现有统计卡片基础上新增四个统计卡片:
- 意图规则命中率(命中次数/总对话次数)
- Prompt 模板使用次数(按模板分组统计)
- 话术流程激活次数(按流程分组统计)
- 护栏拦截次数(按类别分组统计)
- [AC-ASA-46] WHEN 用户点击统计卡片 THEN 系统 SHALL 跳转到对应的详细监控页面(如点击"意图规则命中率"跳转到意图规则监控页面)。
- [AC-ASA-47] WHEN Dashboard 加载统计数据 THEN 系统 SHALL 支持按时间范围筛选(今日/本周/本月/自定义),并展示趋势图表。
### 10.2 对话流程测试台RAG 实验室扩展)
- [AC-ASA-48] WHEN 用户访问 RAG 实验室页面 THEN 系统 SHALL 在原有检索测试基础上新增"完整流程测试"模式切换开关。
- [AC-ASA-49] WHEN 用户切换到"完整流程测试"模式并输入测试消息 THEN 系统 SHALL 调用后端完整的 12 步生成流程,并以可视化时间线形式展示每一步的执行结果:
- Step 1: 输入扫描(敏感词检测结果)
- Step 2: 流程检查(是否存在进行中的话术流程)
- Step 3: 意图匹配(命中的规则名称、匹配的关键词/正则)
- Step 4: Prompt 模板(使用的模板名称、版本号、渲染后的系统指令)
  - Step 5-7: RAG 检索检索的知识库、Top-K 结果、相似度分数)
- Step 8: 话术流程(流程状态、当前步骤、下一步动作)
  - Step 9: LLM 生成使用的模型、Token 消耗、生成耗时)
- Step 10: 上下文管理(会话历史长度、记忆摘要)
- Step 11: 输出护栏(命中的禁词、替换策略、是否拦截)
- Step 12: 最终输出(完整回复内容、置信度、是否转人工)
- [AC-ASA-50] WHEN 某一步执行失败或跳过 THEN 系统 SHALL 在时间线中标记该步骤状态(成功/失败/跳过),并展示失败原因或跳过原因。
- [AC-ASA-51] WHEN 用户点击时间线中的某一步 THEN 系统 SHALL 展开该步骤的详细数据JSON 格式),支持复制和导出。
- [AC-ASA-52] WHEN 用户在测试台中修改配置(如切换 Prompt 模板、指定知识库、调整检索参数THEN 系统 SHALL 支持对比测试,并行展示不同配置下的执行结果差异。
### 10.3 意图规则测试与监控
- [AC-ASA-53] WHEN 用户在意图规则管理页面点击"测试"按钮 THEN 系统 SHALL 弹出测试对话框,输入测试消息后展示:
- 是否命中该规则
- 匹配的关键词或正则表达式
- 规则优先级排序中的位置
- 如果未命中,展示原因(关键词不匹配/正则不匹配/优先级被其他规则抢占)
- [AC-ASA-54] WHEN 用户访问意图规则监控页面 THEN 系统 SHALL 展示规则命中统计表格(含规则名称、命中次数、命中率、最近命中时间、平均响应时间),支持按时间范围筛选和导出。
- [AC-ASA-55] WHEN 用户点击规则命中统计表格中的某一行 THEN 系统 SHALL 展示该规则的详细命中记录列表(含用户消息、命中时间、匹配的关键词、响应类型、执行结果)。
### 10.4 Prompt 模板测试与监控
- [AC-ASA-56] WHEN 用户在 Prompt 模板管理页面点击"预览"按钮 THEN 系统 SHALL 弹出预览对话框,展示:
- 模板原始内容(含变量占位符)
- 变量列表及当前值(支持手动输入测试值)
  - 渲染后的完整 Prompt变量替换后的最终内容
- [AC-ASA-57] WHEN 用户在预览对话框中修改变量值 THEN 系统 SHALL 实时更新渲染后的 Prompt 内容。
- [AC-ASA-58] WHEN 用户访问 Prompt 模板监控页面 THEN 系统 SHALL 展示模板使用统计表格(含模板名称、使用次数、平均 Token 消耗、最近使用时间),支持按场景标签筛选。
### 10.5 话术流程测试与监控
- [AC-ASA-59] WHEN 用户在话术流程管理页面点击"模拟执行"按钮 THEN 系统 SHALL 弹出模拟对话框,按步骤展示话术内容,用户可输入模拟回复,系统根据下一步条件推进流程,直到流程结束。
- [AC-ASA-60] WHEN 用户访问话术流程监控页面 THEN 系统 SHALL 展示流程激活统计表格(含流程名称、激活次数、完成率、平均完成时长、中断率),支持按时间范围筛选。
- [AC-ASA-61] WHEN 用户点击流程激活统计表格中的某一行 THEN 系统 SHALL 展示该流程的详细执行记录列表(含会话 ID、激活时间、当前步骤、执行状态、中断原因
### 10.6 输出护栏测试与监控
- [AC-ASA-62] WHEN 用户在输出护栏管理页面点击"测试"按钮 THEN 系统 SHALL 弹出测试对话框,输入测试文本后展示:
- 命中的禁词列表(含词语、类别、位置)
  - 应用的替换策略mask/replace/block
- 处理后的文本内容
- [AC-ASA-63] WHEN 用户访问输出护栏监控页面 THEN 系统 SHALL 展示护栏拦截统计表格(含禁词、拦截次数、类别、最近拦截时间),支持按类别筛选和导出。
- [AC-ASA-64] WHEN 用户点击护栏拦截统计表格中的某一行 THEN 系统 SHALL 展示该禁词的详细拦截记录列表(含原始文本、处理后文本、拦截时间、会话 ID
### 10.7 对话追踪页面
- [AC-ASA-65] WHEN 用户访问对话追踪页面 THEN 系统 SHALL 展示对话记录列表(含会话 ID、用户消息、AI 回复、触发的规则、使用的模板、执行耗时、创建时间),支持按时间范围、规则类型、模板、流程筛选。
- [AC-ASA-66] WHEN 用户点击对话记录列表中的某一行 THEN 系统 SHALL 展开该对话的完整执行链路12 步流程详情),以折叠面板形式展示每一步的输入输出和执行状态。
- [AC-ASA-67] WHEN 用户在对话追踪页面点击"回放"按钮 THEN 系统 SHALL 以动画形式回放该对话的执行流程,逐步展示每一步的执行过程和数据流转。
- [AC-ASA-68] WHEN 用户在对话追踪页面点击"导出"按钮 THEN 系统 SHALL 支持导出对话记录为 JSON/CSV 格式,包含完整的执行链路数据。
### 10.8 用户故事v0.7.0 迭代追加)
- [US-ASA-16] 作为 AI 开发者,我希望在测试台中输入测试消息并查看完整的 12 步执行流程,以便快速定位问题和优化配置。
- [US-ASA-17] 作为运营人员,我希望查看意图规则的命中统计和详细记录,以便评估规则效果并优化关键词配置。
- [US-ASA-18] 作为 Prompt 工程师,我希望预览 Prompt 模板的渲染结果并查看使用统计,以便评估模板效果并迭代优化。
- [US-ASA-19] 作为运营人员,我希望模拟执行话术流程并查看激活统计,以便验证流程设计并优化步骤配置。
- [US-ASA-20] 作为合规人员,我希望测试输出护栏并查看拦截统计,以便确保禁词配置有效并满足合规要求。
- [US-ASA-21] 作为系统管理员,我希望查看对话追踪记录并回放执行流程,以便排查问题和分析系统行为。
### 10.9 追踪映射v0.7.0 迭代追加)
| AC ID | Endpoint | 方法 | 备注 |
|------|----------|------|-----|
| AC-ASA-45 | /admin/dashboard/stats | GET | Dashboard 统计增强 |
| AC-ASA-46 | - | - | 前端路由跳转 |
| AC-ASA-47 | /admin/dashboard/stats | GET | 时间范围筛选参数 |
| AC-ASA-48 | - | - | 前端模式切换 |
| AC-ASA-49 | /admin/test/flow-execution | POST | 完整流程测试 |
| AC-ASA-50 | /admin/test/flow-execution | POST | 步骤状态标记 |
| AC-ASA-51 | - | - | 前端详情展开 |
| AC-ASA-52 | /admin/test/flow-execution | POST | 对比测试参数 |
| AC-ASA-53 | /admin/intent-rules/{ruleId}/test | POST | 意图规则测试 |
| AC-ASA-54 | /admin/monitoring/intent-rules | GET | 意图规则监控统计 |
| AC-ASA-55 | /admin/monitoring/intent-rules/{ruleId}/hits | GET | 规则命中记录 |
| AC-ASA-56 | /admin/prompt-templates/{tplId}/preview | POST | Prompt 模板预览 |
| AC-ASA-57 | - | - | 前端实时渲染 |
| AC-ASA-58 | /admin/monitoring/prompt-templates | GET | Prompt 模板监控统计 |
| AC-ASA-59 | /admin/script-flows/{flowId}/simulate | POST | 话术流程模拟执行 |
| AC-ASA-60 | /admin/monitoring/script-flows | GET | 话术流程监控统计 |
| AC-ASA-61 | /admin/monitoring/script-flows/{flowId}/executions | GET | 流程执行记录 |
| AC-ASA-62 | /admin/guardrails/test | POST | 输出护栏测试 |
| AC-ASA-63 | /admin/monitoring/guardrails | GET | 输出护栏监控统计 |
| AC-ASA-64 | /admin/monitoring/guardrails/{wordId}/blocks | GET | 禁词拦截记录 |
| AC-ASA-65 | /admin/monitoring/conversations | GET | 对话追踪列表 |
| AC-ASA-66 | /admin/monitoring/conversations/{conversationId} | GET | 对话执行链路详情 |
| AC-ASA-67 | - | - | 前端回放动画 |
| AC-ASA-68 | /admin/monitoring/conversations/export | POST | 对话记录导出 |


@ -2,7 +2,7 @@
module: ai-service-admin
title: "AI 中台管理界面ai-service-admin任务清单"
status: "in-progress"
version: "0.6.0"
version: "0.7.0"
owners:
- "frontend"
- "backend"
@ -363,3 +363,133 @@ principles:
- [x] (P12-07) 路由注册与导航菜单:将新增的 5 个页面Prompt 模板、知识库、意图规则、话术流程、输出护栏)注册到路由和侧边导航菜单
- AC: [AC-ASA-23~AC-ASA-44]
---
## Phase 13: 测试与监控系统v0.7.0
> 页面导向:为 v0.6.0 新增的四大功能提供完整的测试和监控能力。
### 13.1 Dashboard 统计增强
- [ ] (P13-01) API 服务层:创建 src/api/monitoring.ts 和 src/types/monitoring.ts
- AC: [AC-ASA-45, AC-ASA-46, AC-ASA-47]
- [ ] (P13-02) Dashboard 统计卡片组件实现四个新增统计卡片意图规则命中率、Prompt 模板使用次数、话术流程激活次数、护栏拦截次数)
- AC: [AC-ASA-45]
- [ ] (P13-03) 统计卡片跳转:实现点击卡片跳转到对应监控页面的路由逻辑
- AC: [AC-ASA-46]
- [ ] (P13-04) 时间范围筛选:实现时间范围选择器(今日/本周/本月/自定义)并集成趋势图表
- AC: [AC-ASA-47]
### 13.2 完整流程测试台RAG 实验室扩展)
- [ ] (P13-05) 测试模式切换:在 RAG 实验室页面添加"完整流程测试"模式开关
- AC: [AC-ASA-48]
- [ ] (P13-06) 流程时间线组件:实现 FlowExecutionTimeline 组件,展示 12 步执行流程
- AC: [AC-ASA-49, AC-ASA-50]
- [ ] (P13-07) 步骤详情展开实现步骤卡片点击展开详细数据JSON 格式),支持复制和导出
- AC: [AC-ASA-51]
- [ ] (P13-08) 对比测试功能:实现并行展示不同配置下的执行结果差异
- AC: [AC-ASA-52]
### 13.3 意图规则测试与监控
- [ ] (P13-09) 规则测试对话框:在意图规则管理页面实现测试对话框,展示匹配结果和冲突检测
- AC: [AC-ASA-53]
- [ ] (P13-10) 意图规则监控页面:创建 /admin/monitoring/intent-rules 页面,展示规则命中统计表格
- AC: [AC-ASA-54]
- [ ] (P13-11) 规则命中记录详情:实现命中记录列表弹窗,展示详细命中记录
- AC: [AC-ASA-55]
### 13.4 Prompt 模板测试与监控
- [ ] (P13-12) 模板预览对话框:在 Prompt 模板管理页面实现预览对话框,支持变量测试和实时渲染
- AC: [AC-ASA-56, AC-ASA-57]
- [ ] (P13-13) Prompt 模板监控页面:创建 /admin/monitoring/prompt-templates 页面,展示模板使用统计
- AC: [AC-ASA-58]
### 13.5 话术流程测试与监控
- [ ] (P13-14) 流程模拟对话框:在话术流程管理页面实现模拟执行对话框,支持步骤推进
- AC: [AC-ASA-59]
- [ ] (P13-15) 话术流程监控页面:创建 /admin/monitoring/script-flows 页面,展示流程激活统计
- AC: [AC-ASA-60]
- [ ] (P13-16) 流程执行记录详情:实现执行记录列表弹窗,展示详细执行记录
- AC: [AC-ASA-61]
### 13.6 输出护栏测试与监控
- [ ] (P13-17) 护栏测试对话框:在输出护栏管理页面实现测试对话框,展示禁词检测结果
- AC: [AC-ASA-62]
- [ ] (P13-18) 输出护栏监控页面:创建 /admin/monitoring/guardrails 页面,展示护栏拦截统计
- AC: [AC-ASA-63]
- [ ] (P13-19) 护栏拦截记录详情:实现拦截记录列表弹窗,展示详细拦截记录
- AC: [AC-ASA-64]
### 13.7 对话追踪页面
- [ ] (P13-20) 对话追踪列表页:创建 /admin/monitoring/conversations 页面,展示对话记录列表
- AC: [AC-ASA-65]
- [ ] (P13-21) 对话执行链路详情:实现对话详情弹窗,展示完整 12 步执行链路
- AC: [AC-ASA-66]
- [ ] (P13-22) 对话回放功能:实现动画回放对话执行流程
- AC: [AC-ASA-67]
- [ ] (P13-23) 对话记录导出:实现导出功能,支持 JSON/CSV 格式
- AC: [AC-ASA-68]
### 13.8 路由与导航整合
- [ ] (P13-24) 监控页面路由注册:将新增的 6 个监控页面注册到路由系统
- AC: [AC-ASA-45~AC-ASA-68]
- [ ] (P13-25) 导航菜单更新:在侧边导航菜单中添加"监控中心"分组,包含所有监控页面入口
- AC: [AC-ASA-45~AC-ASA-68]
---
## Phase 13 任务进度追踪
| 任务 | 描述 | 状态 |
|------|------|------|
| P13-01 | API 服务层 | ⏳ 待处理 |
| P13-02 | Dashboard 统计卡片 | ⏳ 待处理 |
| P13-03 | 统计卡片跳转 | ⏳ 待处理 |
| P13-04 | 时间范围筛选 | ⏳ 待处理 |
| P13-05 | 测试模式切换 | ⏳ 待处理 |
| P13-06 | 流程时间线组件 | ⏳ 待处理 |
| P13-07 | 步骤详情展开 | ⏳ 待处理 |
| P13-08 | 对比测试功能 | ⏳ 待处理 |
| P13-09 | 规则测试对话框 | ⏳ 待处理 |
| P13-10 | 意图规则监控页面 | ⏳ 待处理 |
| P13-11 | 规则命中记录详情 | ⏳ 待处理 |
| P13-12 | 模板预览对话框 | ⏳ 待处理 |
| P13-13 | Prompt 模板监控页面 | ⏳ 待处理 |
| P13-14 | 流程模拟对话框 | ⏳ 待处理 |
| P13-15 | 话术流程监控页面 | ⏳ 待处理 |
| P13-16 | 流程执行记录详情 | ⏳ 待处理 |
| P13-17 | 护栏测试对话框 | ⏳ 待处理 |
| P13-18 | 输出护栏监控页面 | ⏳ 待处理 |
| P13-19 | 护栏拦截记录详情 | ⏳ 待处理 |
| P13-20 | 对话追踪列表页 | ⏳ 待处理 |
| P13-21 | 对话执行链路详情 | ⏳ 待处理 |
| P13-22 | 对话回放功能 | ⏳ 待处理 |
| P13-23 | 对话记录导出 | ⏳ 待处理 |
| P13-24 | 监控页面路由注册 | ⏳ 待处理 |
| P13-25 | 导航菜单更新 | ⏳ 待处理 |

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,12 +1,12 @@
---
feature_id: "AISVC"
title: "Python AI 中台ai-service需求规范"
status: "completed"
version: "0.4.0"
status: "in-progress"
version: "0.6.0"
owners:
- "product"
- "backend"
last_updated: "2026-02-24"
last_updated: "2026-02-25"
source:
type: "conversation"
ref: ""
@ -306,3 +306,344 @@ source:
| AC-AISVC-48 | /admin/rag/experiments/stream | POST | runRagExperimentStream | RAG 实验流式输出 |
| AC-AISVC-49 | /admin/rag/experiments/run | POST | runRagExperiment | Token 统计 |
| AC-AISVC-50 | /admin/rag/experiments/run | POST | runRagExperiment | 指定 LLM 提供者 |
---
## 12. 迭代需求:智能客服增强 — Prompt 模板化 + 多知识库 + 规则引擎 + 输出护栏v0.6.0
> 说明:本节为 v0.6.0 迭代新增。目标是将 AI 中台从"单知识库 + 硬编码 Prompt"升级为可配置的拟人客服中台,支持:
> - Prompt 模板数据库驱动(按租户配置人设、语气、禁忌词)
> - 多知识库分类管理与智能路由
> - 意图识别 + 话术流程引擎(固定步骤引导)
> - 输出护栏(禁词检测、敏感内容过滤)
> - 智能 RAG 增强Query 改写、上下文感知检索、分层知识优先级)
### 12.1 Prompt 模板化(数据库驱动)
- [AC-AISVC-51] WHEN 系统启动或租户首次请求 THEN 系统 SHALL 从数据库加载该租户的 Prompt 模板配置(人设名称、语气风格、角色描述、系统指令),替代硬编码的 `SYSTEM_PROMPT`
- [AC-AISVC-52] WHEN 管理员通过 `POST /admin/prompt-templates` 创建 Prompt 模板 THEN 系统 SHALL 存储模板内容(含模板名称、场景标签、系统指令、变量占位符列表),并返回模板 ID。
- [AC-AISVC-53] WHEN 管理员通过 `PUT /admin/prompt-templates/{tplId}` 更新模板 THEN 系统 SHALL 创建新版本(版本号自增),保留历史版本,且同一时间仅一个版本为"已发布"状态。
- [AC-AISVC-54] WHEN 管理员通过 `POST /admin/prompt-templates/{tplId}/publish` 发布指定版本 THEN 系统 SHALL 将该版本标记为"已发布",旧版本自动降级为"历史版本",发布后立即生效(热更新)。
- [AC-AISVC-55] WHEN 管理员通过 `POST /admin/prompt-templates/{tplId}/rollback` 回滚 THEN 系统 SHALL 将指定历史版本重新标记为"已发布"。
- [AC-AISVC-56] WHEN Orchestrator 构建 Prompt 时 THEN 系统 SHALL 从已发布的模板中读取系统指令,并替换内置变量(如 `{{persona_name}}`、`{{current_time}}`、`{{channel_type}}`),最终拼接为 LLM 的 system message。
- [AC-AISVC-57] WHEN 管理员通过 `GET /admin/prompt-templates` 查询模板列表 THEN 系统 SHALL 返回该租户下所有模板(含场景标签、当前发布版本号、更新时间),支持按场景筛选。
- [AC-AISVC-58] WHEN 管理员通过 `GET /admin/prompt-templates/{tplId}` 查询模板详情 THEN 系统 SHALL 返回模板所有版本列表及当前发布版本的完整内容。
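AC-AISVC-56 描述的内置变量替换逻辑可以用如下最小草图示意(`render_template` 为假设的函数名,并非仓库内实际实现;未提供的变量按原样保留,便于排查漏配):

```python
import re

def render_template(template: str, variables: dict) -> str:
    """将模板中的 {{var}} 占位符替换为变量值;未提供的变量原样保留。"""
    def repl(m):
        return str(variables.get(m.group(1), m.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", repl, template)

tpl = "你好,我是{{persona_name}},当前时间 {{current_time}},渠道:{{channel_type}}。"
rendered = render_template(tpl, {
    "persona_name": "小艾",
    "current_time": "2026-02-28 12:00",
    "channel_type": "web",
})
```

渲染结果随后拼接为 LLM 的 system message;保留未知占位符而非替换为空串,是为了让配置缺失在日志中可见。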
### 12.2 多知识库分类管理
- [AC-AISVC-59] WHEN 管理员通过 `POST /admin/kb/knowledge-bases` 创建知识库 THEN 系统 SHALL 创建独立的知识库实体,包含名称、类型(`product` 产品知识 / `faq` 常见问题 / `script` 话术模板 / `policy` 政策规范 / `general` 通用)、描述、优先级权重,并在 Qdrant 中初始化对应的 Collection。
- [AC-AISVC-60] WHEN 管理员通过 `GET /admin/kb/knowledge-bases` 查询知识库列表 THEN 系统 SHALL 返回该租户下所有知识库(含文档数量、索引状态统计、类型、优先级)。
- [AC-AISVC-61] WHEN 管理员通过 `PUT /admin/kb/knowledge-bases/{kbId}` 更新知识库 THEN 系统 SHALL 支持修改名称、描述、类型、优先级权重、启用/禁用状态。
- [AC-AISVC-62] WHEN 管理员通过 `DELETE /admin/kb/knowledge-bases/{kbId}` 删除知识库 THEN 系统 SHALL 删除知识库实体、关联文档记录,并清理 Qdrant 中对应的 Collection 数据。
- [AC-AISVC-63] WHEN 文档上传时指定 `kbId` THEN 系统 SHALL 将文档索引到指定知识库对应的 Qdrant Collection(替代原有的 `kb_default` 硬编码)。
- [AC-AISVC-64] WHEN Orchestrator 执行 RAG 检索 THEN 系统 SHALL 根据意图路由结果选择目标知识库集合进行检索,而非全库搜索。若未命中意图规则,则按知识库优先级权重依次检索。
### 12.3 意图识别与规则引擎
- [AC-AISVC-65] WHEN 管理员通过 `POST /admin/intent-rules` 创建意图规则 THEN 系统 SHALL 存储规则(含意图名称、关键词列表、正则模式列表、优先级、响应类型 `flow` / `rag` / `fixed` / `transfer`、关联的知识库 ID 列表或话术流程 ID 或固定回复内容)。
- [AC-AISVC-66] WHEN 管理员通过 `GET /admin/intent-rules` 查询意图规则列表 THEN 系统 SHALL 返回该租户下所有规则(含命中统计、启用状态),支持按意图名称和响应类型筛选。
- [AC-AISVC-67] WHEN 管理员通过 `PUT /admin/intent-rules/{ruleId}` 更新规则 THEN 系统 SHALL 支持修改关键词、正则、优先级、响应类型、关联资源,更新后立即生效。
- [AC-AISVC-68] WHEN 管理员通过 `DELETE /admin/intent-rules/{ruleId}` 删除规则 THEN 系统 SHALL 删除规则并立即生效。
- [AC-AISVC-69] WHEN 用户消息进入 Orchestrator THEN 系统 SHALL 在 RAG 检索之前执行意图识别:按优先级遍历规则,依次进行关键词匹配和正则匹配。命中规则后根据 `response_type` 路由:
- `fixed`:直接返回固定回复,跳过 LLM 调用。
- `flow`:进入话术流程引擎,按步骤推进。
- `rag`:使用规则关联的知识库集合进行定向检索。
- `transfer`:直接设置 `shouldTransfer=true` 并返回转人工话术。
- [AC-AISVC-70] WHEN 所有规则均未命中 THEN 系统 SHALL 回退到默认 RAG pipeline按知识库优先级检索行为与现有逻辑一致。
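AC-AISVC-69/70 的匹配与回退逻辑可概括为如下草图(规则结构为示意,此处假设 priority 数值越小优先级越高,实际约定以实现为准):

```python
import re
from dataclasses import dataclass, field

@dataclass
class IntentRule:
    name: str
    priority: int          # 假设:数值越小优先级越高
    response_type: str     # flow / rag / fixed / transfer
    keywords: list = field(default_factory=list)
    patterns: list = field(default_factory=list)

def match_intent(message: str, rules: list):
    """按优先级遍历规则,先关键词后正则,命中即返回;全部未命中返回 None(回退默认 RAG)。"""
    for rule in sorted(rules, key=lambda r: r.priority):
        if any(kw in message for kw in rule.keywords):
            return rule
        if any(re.search(p, message) for p in rule.patterns):
            return rule
    return None

rules = [
    IntentRule("退款咨询", 1, "flow", keywords=["退款", "退钱"]),
    IntentRule("转人工", 2, "transfer", patterns=[r"人工|客服"]),
]
```

命中后由调用方根据 `response_type` 分发到固定回复、流程引擎、定向 RAG 或转人工分支。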
### 12.4 话术流程引擎(状态机)
- [AC-AISVC-71] WHEN 管理员通过 `POST /admin/script-flows` 创建话术流程 THEN 系统 SHALL 存储流程定义(含流程名称、触发条件描述、步骤列表),每个步骤包含:步骤序号、话术内容(支持变量占位符)、等待用户输入标志、超时秒数、超时后动作(重复/跳过/转人工)、下一步条件(关键词匹配或无条件推进)。
- [AC-AISVC-72] WHEN 管理员通过 `GET /admin/script-flows` 查询流程列表 THEN 系统 SHALL 返回该租户下所有流程(含步骤数、启用状态、关联意图规则数)。
- [AC-AISVC-73] WHEN 管理员通过 `PUT /admin/script-flows/{flowId}` 更新流程 THEN 系统 SHALL 支持修改流程步骤、触发条件、启用/禁用状态。
- [AC-AISVC-74] WHEN 意图识别命中 `flow` 类型规则 THEN 系统 SHALL 为当前会话创建流程执行实例(关联 `session_id` + `flow_id`),从第一步开始执行,返回第一步的话术内容。
- [AC-AISVC-75] WHEN 会话存在进行中的流程实例 THEN 系统 SHALL 优先处理流程逻辑:根据用户输入匹配当前步骤的下一步条件,推进到下一步骤并返回对应话术。若用户输入不匹配任何条件,则重复当前步骤话术或按配置处理。
- [AC-AISVC-76] WHEN 流程执行到最后一步或用户触发退出条件 THEN 系统 SHALL 标记流程实例为"已完成",后续消息恢复正常 RAG pipeline 处理。
- [AC-AISVC-77] WHEN 流程步骤超时(用户在指定时间内未回复) THEN 系统 SHALL 按步骤配置执行超时动作(重复当前话术 / 跳到下一步 / 转人工)。注:超时检测由调用方(Java 侧)触发,AI 中台提供查询流程状态的接口。
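步骤推进的状态机(AC-AISVC-75/76)可以用如下草图示意;字段名沿用 `step_no`、`wait_input`(与本次提交统一后的前后端命名一致),其余结构为假设:

```python
from dataclasses import dataclass

@dataclass
class FlowStep:
    step_no: int
    content: str
    wait_input: bool = True
    next_keywords: dict = None   # 关键词 -> 下一步序号;None 表示无条件推进

def advance(steps: list, current_no: int, user_input: str):
    """推进流程:返回 (下一步对象,是否重复当前步);下一步为 None 表示流程完成。"""
    step = next(s for s in steps if s.step_no == current_no)
    if step.next_keywords is None:
        nxt_no = current_no + 1
    else:
        matched = [no for kw, no in step.next_keywords.items() if kw in user_input]
        if not matched:
            return step, True    # 输入不匹配任何条件:重复当前步骤话术
        nxt_no = matched[0]
    return next((s for s in steps if s.step_no == nxt_no), None), False

steps = [
    FlowStep(1, "请问您要查询订单还是售后?", next_keywords={"订单": 2, "售后": 3}),
    FlowStep(2, "请提供订单号。"),
    FlowStep(3, "已为您转接售后专员。"),
]
```

流程实例只需持久化 `session_id`、`flow_id` 与当前步骤号,即可在多轮请求间恢复状态。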
### 12.5 输出护栏(禁词检测与内容过滤)
- [AC-AISVC-78] WHEN 管理员通过 `POST /admin/guardrails/forbidden-words` 添加禁词 THEN 系统 SHALL 存储禁词(含词语、类别 `competitor` / `sensitive` / `political` / `custom`、替换策略 `mask` 星号替换 / `replace` 替换为指定文本 / `block` 拦截整条回复并返回兜底话术)。
- [AC-AISVC-79] WHEN 管理员通过 `GET /admin/guardrails/forbidden-words` 查询禁词列表 THEN 系统 SHALL 返回该租户下所有禁词(支持按类别筛选),含命中统计。
- [AC-AISVC-80] WHEN 管理员通过 `PUT /admin/guardrails/forbidden-words/{wordId}` 更新禁词 THEN 系统 SHALL 支持修改词语、类别、替换策略、启用/禁用状态。
- [AC-AISVC-81] WHEN 管理员通过 `DELETE /admin/guardrails/forbidden-words/{wordId}` 删除禁词 THEN 系统 SHALL 删除禁词并立即生效。
- [AC-AISVC-82] WHEN LLM 生成回复后non-streaming 和 streaming 均适用) THEN 系统 SHALL 执行后置过滤:扫描回复内容中的禁词,按替换策略处理:
- `mask`:将禁词替换为等长星号(如"竞品A" → "***")。
- `replace`:将禁词替换为配置的替换文本。
- `block`:丢弃整条回复,返回预配置的兜底话术(如"抱歉,让我换个方式回答您"),并在 metadata 中标记 `guardrail_triggered=true`。
- [AC-AISVC-83] WHEN 用户输入包含禁词(前置检测) THEN 系统 SHALL 在 metadata 中记录 `input_flagged=true` 及命中的禁词类别,但不阻断请求处理(仅记录,不拦截用户输入)。
- [AC-AISVC-84] WHEN 管理员通过 `POST /admin/guardrails/behavior-rules` 添加行为规则 THEN 系统 SHALL 存储行为约束(如"不允许承诺具体赔偿金额"、"不允许透露内部流程"),这些规则将被注入到 Prompt 模板的系统指令中作为 LLM 的行为约束。
- [AC-AISVC-85] WHEN 管理员通过 `GET /admin/guardrails/behavior-rules` 查询行为规则列表 THEN 系统 SHALL 返回该租户下所有行为规则。
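AC-AISVC-82 的三种替换策略可以用如下后置过滤草图示意(禁词结构与兜底话术为示意配置,非仓库实际实现):

```python
from dataclasses import dataclass

@dataclass
class ForbiddenWord:
    word: str
    strategy: str        # mask / replace / block
    replacement: str = ""

FALLBACK = "抱歉,让我换个方式回答您"   # 兜底话术,按租户可配置

def apply_guardrails(reply: str, words: list):
    """后置过滤:扫描禁词并按策略处理,返回 (处理后文本, guardrail_triggered)。"""
    triggered = False
    for fw in words:
        if fw.word not in reply:
            continue
        triggered = True
        if fw.strategy == "block":
            return FALLBACK, True                     # 丢弃整条回复
        if fw.strategy == "mask":
            reply = reply.replace(fw.word, "*" * len(fw.word))
        elif fw.strategy == "replace":
            reply = reply.replace(fw.word, fw.replacement)
    return reply, triggered

words = [ForbiddenWord("竞品A", "mask"), ForbiddenWord("赔偿", "block")]
```

流式场景下同样的函数可作用于缓冲后的分片,`block` 命中时丢弃已缓冲内容并改发兜底话术。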
### 12.6 智能 RAG 增强
- [AC-AISVC-86] WHEN Orchestrator 执行 RAG 检索前 THEN 系统 SHALL 对用户 Query 进行改写增强:结合对话历史解析指代词(如"它"、"这个"指代上文提到的产品),补全查询语义,生成优化后的检索 Query。Query 改写通过 LLM 调用实现(使用简短的改写 Prompt)。
- [AC-AISVC-87] WHEN 多知识库检索返回结果后 THEN 系统 SHALL 按知识库优先级对结果进行分层排序:`script`(话术模板)> `faq`(常见问题)> `product`(产品知识)> `policy`(政策规范)> `general`(通用),同层内按相似度分数排序。高优先级知识库的命中结果优先进入 LLM 上下文。
- [AC-AISVC-88] WHEN 检索结果中存在 `script` 类型知识库的高分命中score > 配置阈值) THEN 系统 SHALL 优先使用话术模板内容作为回复参考,引导 LLM 生成贴近标准话术的回复。
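AC-AISVC-87 的分层排序等价于一个复合排序键:先按知识库类型优先级,同层内按相似度分数降序。下面是一个最小草图(命中结构为示意):

```python
# script > faq > product > policy > general,数值越小越先进入 LLM 上下文
KB_PRIORITY = {"script": 0, "faq": 1, "product": 2, "policy": 3, "general": 4}

def rerank(hits: list) -> list:
    """分层排序:先按知识库类型优先级,同层内按相似度分数降序。"""
    return sorted(hits, key=lambda h: (KB_PRIORITY.get(h["kb_type"], 99), -h["score"]))

hits = [
    {"kb_type": "product", "score": 0.92},
    {"kb_type": "script", "score": 0.75},
    {"kb_type": "faq", "score": 0.88},
    {"kb_type": "faq", "score": 0.95},
]
order = [h["kb_type"] for h in rerank(hits)]
```

注意 `script` 层即使分数较低也排在 `product` 高分命中之前,这正是"高优先级知识库的命中结果优先进入 LLM 上下文"的含义。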
### 12.7 数据预处理扩展点(架构预留)
- [AC-AISVC-89] WHEN 系统设计文档解析服务时 THEN 系统 SHALL 在 `DocumentParser` 抽象接口中预留 `AudioParser`(语音转文字)和 `VideoParser`(视频提取音轨转文字)的扩展点,但 v0.6.0 不实现具体解析逻辑。
- [AC-AISVC-90] WHEN 系统设计知识库索引服务时 THEN 系统 SHALL 在 `KnowledgeIndexer` 中预留"对话记录结构化导入"接口(接受结构化 JSON 格式的对话记录,按轮次分块索引),但 v0.6.0 不实现具体导入逻辑。
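AC-AISVC-89 预留的扩展点形态大致如下(类名取自本节,方法签名为假设,与仓库内实际接口可能不同):

```python
from abc import ABC, abstractmethod

class DocumentParser(ABC):
    """文档解析抽象接口草图,仅示意扩展点形态。"""

    @abstractmethod
    def parse(self, file_path: str) -> str:
        """返回从文件中提取的纯文本。"""

class AudioParser(DocumentParser):
    def parse(self, file_path: str) -> str:
        raise NotImplementedError("v0.6.0 仅预留扩展点,不实现语音转文字")

class VideoParser(DocumentParser):
    def parse(self, file_path: str) -> str:
        raise NotImplementedError("v0.6.0 仅预留扩展点,不实现视频音轨提取")
```

后续版本实现语音/视频解析时,只需替换 `NotImplementedError` 为实际逻辑,上游索引链路无需改动。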
### 12.8 追踪映射(v0.6.0 迭代)
| AC ID | Endpoint | 方法 | Operation | 描述 |
|-------|----------|------|-----------|------|
| AC-AISVC-51 | - | - | - | Prompt 模板数据库驱动加载 |
| AC-AISVC-52 | /admin/prompt-templates | POST | createPromptTemplate | 创建 Prompt 模板 |
| AC-AISVC-53 | /admin/prompt-templates/{tplId} | PUT | updatePromptTemplate | 更新模板(版本化) |
| AC-AISVC-54 | /admin/prompt-templates/{tplId}/publish | POST | publishPromptTemplate | 发布模板版本 |
| AC-AISVC-55 | /admin/prompt-templates/{tplId}/rollback | POST | rollbackPromptTemplate | 回滚模板版本 |
| AC-AISVC-56 | - | - | - | Prompt 变量替换与拼接 |
| AC-AISVC-57 | /admin/prompt-templates | GET | listPromptTemplates | 模板列表查询 |
| AC-AISVC-58 | /admin/prompt-templates/{tplId} | GET | getPromptTemplate | 模板详情查询 |
| AC-AISVC-59 | /admin/kb/knowledge-bases | POST | createKnowledgeBase | 创建知识库 |
| AC-AISVC-60 | /admin/kb/knowledge-bases | GET | listKnowledgeBases | 知识库列表 |
| AC-AISVC-61 | /admin/kb/knowledge-bases/{kbId} | PUT | updateKnowledgeBase | 更新知识库 |
| AC-AISVC-62 | /admin/kb/knowledge-bases/{kbId} | DELETE | deleteKnowledgeBase | 删除知识库 |
| AC-AISVC-63 | /admin/kb/documents | POST | uploadDocument | 文档指定知识库上传 |
| AC-AISVC-64 | - | - | - | 多知识库智能路由检索 |
| AC-AISVC-65 | /admin/intent-rules | POST | createIntentRule | 创建意图规则 |
| AC-AISVC-66 | /admin/intent-rules | GET | listIntentRules | 意图规则列表 |
| AC-AISVC-67 | /admin/intent-rules/{ruleId} | PUT | updateIntentRule | 更新意图规则 |
| AC-AISVC-68 | /admin/intent-rules/{ruleId} | DELETE | deleteIntentRule | 删除意图规则 |
| AC-AISVC-69 | - | - | - | 意图识别与路由执行 |
| AC-AISVC-70 | - | - | - | 未命中规则回退默认 RAG |
| AC-AISVC-71 | /admin/script-flows | POST | createScriptFlow | 创建话术流程 |
| AC-AISVC-72 | /admin/script-flows | GET | listScriptFlows | 话术流程列表 |
| AC-AISVC-73 | /admin/script-flows/{flowId} | PUT | updateScriptFlow | 更新话术流程 |
| AC-AISVC-74 | - | - | - | 流程实例创建与首步执行 |
| AC-AISVC-75 | - | - | - | 流程步骤推进 |
| AC-AISVC-76 | - | - | - | 流程完成与恢复 |
| AC-AISVC-77 | - | - | - | 流程超时处理 |
| AC-AISVC-78 | /admin/guardrails/forbidden-words | POST | addForbiddenWord | 添加禁词 |
| AC-AISVC-79 | /admin/guardrails/forbidden-words | GET | listForbiddenWords | 禁词列表 |
| AC-AISVC-80 | /admin/guardrails/forbidden-words/{wordId} | PUT | updateForbiddenWord | 更新禁词 |
| AC-AISVC-81 | /admin/guardrails/forbidden-words/{wordId} | DELETE | deleteForbiddenWord | 删除禁词 |
| AC-AISVC-82 | - | - | - | LLM 输出后置禁词过滤 |
| AC-AISVC-83 | - | - | - | 用户输入前置禁词检测 |
| AC-AISVC-84 | /admin/guardrails/behavior-rules | POST | addBehaviorRule | 添加行为规则 |
| AC-AISVC-85 | /admin/guardrails/behavior-rules | GET | listBehaviorRules | 行为规则列表 |
| AC-AISVC-86 | - | - | - | Query 改写增强 |
| AC-AISVC-87 | - | - | - | 多知识库分层排序 |
| AC-AISVC-88 | - | - | - | 话术模板优先匹配 |
| AC-AISVC-89 | - | - | - | 音视频解析扩展点预留 |
| AC-AISVC-90 | - | - | - | 对话记录导入扩展点预留 |
---
## 13. 迭代需求:对话流程测试与监控 API(v0.7.0)
> 说明:本节为 v0.7.0 迭代新增,为前端测试与监控功能提供后端 API 支持。
### 13.1 Dashboard 统计增强 API
- [AC-AISVC-91] WHEN 前端通过 `GET /admin/dashboard/stats` 请求 Dashboard 统计数据 THEN 系统 SHALL 在现有统计基础上新增以下字段:
- `intentRuleHitRate`: 意图规则命中率(命中次数/总对话次数)
- `intentRuleHitCount`: 意图规则总命中次数
- `promptTemplateUsageCount`: Prompt 模板使用次数
- `scriptFlowActivationCount`: 话术流程激活次数
- `guardrailBlockCount`: 护栏拦截次数
- [AC-AISVC-92] WHEN 前端通过 `GET /admin/dashboard/stats` 请求并指定时间范围参数(`start_time`, `end_time`)THEN 系统 SHALL 返回该时间范围内的统计数据。
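上述增强字段可由原始计数器派生,草图如下(计数器键名为假设命名,返回字段与 AC-AISVC-91 对齐;注意命中率需防除零):

```python
def compute_enhanced_stats(counters: dict) -> dict:
    """由原始计数器派生 Dashboard 增强统计字段。"""
    total = counters.get("conversation_total", 0)
    hits = counters.get("intent_rule_hit", 0)
    return {
        "intentRuleHitCount": hits,
        "intentRuleHitRate": round(hits / total, 4) if total else 0.0,
        "promptTemplateUsageCount": counters.get("prompt_template_usage", 0),
        "scriptFlowActivationCount": counters.get("script_flow_activation", 0),
        "guardrailBlockCount": counters.get("guardrail_block", 0),
    }

stats = compute_enhanced_stats({
    "conversation_total": 200,
    "intent_rule_hit": 50,
    "prompt_template_usage": 180,
    "script_flow_activation": 30,
    "guardrail_block": 5,
})
```

指定时间范围时,计数器改为按时间窗口聚合后再套用同一派生逻辑。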
### 13.2 完整流程测试 API
- [AC-AISVC-93] WHEN 前端通过 `POST /admin/test/flow-execution` 提交测试请求 THEN 系统 SHALL 执行完整的 12 步生成流程,并返回每一步的详细执行结果,包含:
- `step`: 步骤编号(1-12)
- `name`: 步骤名称
- `status`: 执行状态(success/failed/skipped)
- `duration_ms`: 执行耗时(毫秒)
- `input`: 步骤输入数据
- `output`: 步骤输出数据
- `error`: 错误信息(如果失败)
- `metadata`: 步骤元数据(如命中的规则、使用的模板等)
- [AC-AISVC-94] WHEN 测试请求包含对比配置参数(如不同的 Prompt 模板 ID、知识库 ID 列表)THEN 系统 SHALL 并行执行多个配置的测试,并返回对比结果数组。
- [AC-AISVC-95] WHEN 测试执行过程中某一步失败 THEN 系统 SHALL 记录失败原因,并继续执行后续步骤(尽力而为模式),最终返回完整的执行链路。
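AC-AISVC-95 要求的"尽力而为模式"可以用如下执行器草图示意(步骤以 `(名称, 可调用)` 二元组表示,为示意结构):

```python
import time

def run_pipeline(steps: list) -> list:
    """逐步执行并记录详情;某步抛异常时记录错误并继续后续步骤。"""
    results = []
    for no, (name, fn) in enumerate(steps, start=1):
        record = {"step": no, "name": name, "status": "success",
                  "duration_ms": 0.0, "output": None, "error": None}
        start = time.perf_counter()
        try:
            record["output"] = fn()
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = str(exc)
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        results.append(record)
    return results

def broken_retrieval():
    raise RuntimeError("向量库不可用")

timeline = run_pipeline([
    ("意图识别", lambda: {"rule": None}),
    ("RAG 检索", broken_retrieval),
    ("LLM 生成", lambda: "你好,请问有什么可以帮您?"),
])
```

对比测试(AC-AISVC-94)即对不同配置各跑一遍该执行器,并行收集多条时间线后并排返回。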
### 13.3 意图规则测试与监控 API
- [AC-AISVC-96] WHEN 前端通过 `POST /admin/intent-rules/{ruleId}/test` 提交测试消息 THEN 系统 SHALL 返回测试结果:
- `matched`: 是否命中该规则(boolean)
- `matchedKeywords`: 匹配的关键词列表
- `matchedPatterns`: 匹配的正则表达式列表
- `priority`: 规则优先级
- `priorityRank`: 在所有规则中的优先级排名
- `conflictRules`: 同时命中的其他规则列表(优先级冲突检测)
- `reason`: 未命中原因(如果未命中)
- [AC-AISVC-97] WHEN 前端通过 `GET /admin/monitoring/intent-rules` 查询意图规则监控统计 THEN 系统 SHALL 返回规则统计列表:
- `ruleId`: 规则 ID
- `ruleName`: 规则名称
- `hitCount`: 命中次数
- `hitRate`: 命中率(命中次数/总对话次数)
- `avgResponseTime`: 平均响应时间(毫秒)
- `lastHitTime`: 最近命中时间
- `responseType`: 响应类型
- [AC-AISVC-98] WHEN 前端通过 `GET /admin/monitoring/intent-rules/{ruleId}/hits` 查询规则命中记录 THEN 系统 SHALL 返回该规则的详细命中记录列表(支持分页):
- `conversationId`: 对话 ID
- `sessionId`: 会话 ID
- `userMessage`: 用户消息
- `matchedKeywords`: 匹配的关键词
- `matchedPatterns`: 匹配的正则表达式
- `responseType`: 响应类型
- `executionResult`: 执行结果(成功/失败)
- `hitTime`: 命中时间
### 13.4 Prompt 模板测试与监控 API
- [AC-AISVC-99] WHEN 前端通过 `POST /admin/prompt-templates/{tplId}/preview` 提交预览请求(含变量测试值)THEN 系统 SHALL 返回预览结果:
- `templateId`: 模板 ID
- `templateName`: 模板名称
- `version`: 版本号
- `rawContent`: 原始模板内容(含变量占位符)
- `variables`: 变量列表及当前值
- `renderedContent`: 渲染后的完整 Prompt 内容
- `estimatedTokens`: 预估 Token 数量
- [AC-AISVC-100] WHEN 前端通过 `GET /admin/monitoring/prompt-templates` 查询 Prompt 模板监控统计 THEN 系统 SHALL 返回模板统计列表:
- `templateId`: 模板 ID
- `templateName`: 模板名称
- `scene`: 场景标签
- `usageCount`: 使用次数
- `avgTokens`: 平均 Token 消耗
- `avgPromptTokens`: 平均 Prompt Token 消耗
- `avgCompletionTokens`: 平均 Completion Token 消耗
- `lastUsedTime`: 最近使用时间
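`estimatedTokens` 的粗略估算可以按"中文逐字、英文与数字串逐项"计数,草图如下(仅用于演示量级;生产中应改用 tiktoken 等分词器精确计数):

```python
import re

def estimate_tokens(text: str) -> int:
    """粗略 Token 估算:CJK 按字、ASCII 单词与数字串按项各计 1。"""
    cjk_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    ascii_words = len(re.findall(r"[A-Za-z0-9]+", text))
    return cjk_chars + ascii_words

n = estimate_tokens("你好 world 123")
```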
### 13.5 话术流程测试与监控 API
- [AC-AISVC-101] WHEN 前端通过 `POST /admin/script-flows/{flowId}/simulate` 提交模拟执行请求 THEN 系统 SHALL 创建模拟会话并返回首步话术:
- `simulationId`: 模拟会话 ID
- `flowId`: 流程 ID
- `flowName`: 流程名称
- `currentStep`: 当前步骤编号
- `stepContent`: 当前步骤话术内容
- `waitForInput`: 是否等待用户输入
- `nextConditions`: 下一步条件列表
- [AC-AISVC-102] WHEN 前端通过 `POST /admin/script-flows/{flowId}/simulate/{simulationId}/next` 提交用户模拟输入 THEN 系统 SHALL 根据下一步条件推进流程,并返回下一步话术或流程结束标志。
- [AC-AISVC-103] WHEN 前端通过 `GET /admin/monitoring/script-flows` 查询话术流程监控统计 THEN 系统 SHALL 返回流程统计列表:
- `flowId`: 流程 ID
- `flowName`: 流程名称
- `activationCount`: 激活次数
- `completionCount`: 完成次数
- `completionRate`: 完成率(完成次数/激活次数)
- `avgCompletionTime`: 平均完成时长(秒)
- `interruptionCount`: 中断次数
- `interruptionRate`: 中断率
- [AC-AISVC-104] WHEN 前端通过 `GET /admin/monitoring/script-flows/{flowId}/executions` 查询流程执行记录 THEN 系统 SHALL 返回该流程的详细执行记录列表(支持分页):
- `executionId`: 执行实例 ID
- `sessionId`: 会话 ID
- `activationTime`: 激活时间
- `currentStep`: 当前步骤
- `status`: 执行状态(in_progress/completed/interrupted)
- `interruptionReason`: 中断原因(如果中断)
- `completionTime`: 完成时间
### 13.6 输出护栏测试与监控 API
- [AC-AISVC-105] WHEN 前端通过 `POST /admin/guardrails/test` 提交测试文本 THEN 系统 SHALL 返回护栏检测结果:
- `originalText`: 原始文本
- `processedText`: 处理后文本
- `detectedWords`: 检测到的禁词列表(含词语、类别、位置、策略)
- `isBlocked`: 是否被拦截(boolean)
- `blockReason`: 拦截原因(如果被拦截)
- [AC-AISVC-106] WHEN 前端通过 `GET /admin/monitoring/guardrails` 查询输出护栏监控统计 THEN 系统 SHALL 返回护栏统计列表:
- `wordId`: 禁词 ID
- `word`: 禁词内容
- `category`: 类别
- `blockCount`: 拦截次数
- `replaceCount`: 替换次数
- `maskCount`: 掩码次数
- `lastBlockTime`: 最近拦截时间
- [AC-AISVC-107] WHEN 前端通过 `GET /admin/monitoring/guardrails/{wordId}/blocks` 查询禁词拦截记录 THEN 系统 SHALL 返回该禁词的详细拦截记录列表(支持分页):
- `blockId`: 拦截记录 ID
- `sessionId`: 会话 ID
- `originalText`: 原始文本
- `processedText`: 处理后文本
- `strategy`: 应用的策略(mask/replace/block)
- `blockTime`: 拦截时间
### 13.7 对话追踪 API
- [AC-AISVC-108] WHEN 前端通过 `GET /admin/monitoring/conversations` 查询对话追踪列表 THEN 系统 SHALL 返回对话记录列表(支持分页和筛选):
- `conversationId`: 对话 ID
- `sessionId`: 会话 ID
- `userMessage`: 用户消息
- `aiReply`: AI 回复
- `triggeredRules`: 触发的意图规则列表
- `usedTemplate`: 使用的 Prompt 模板
- `usedFlow`: 使用的话术流程
- `executionTime`: 执行耗时(毫秒)
- `createdAt`: 创建时间
- [AC-AISVC-109] WHEN 前端通过 `GET /admin/monitoring/conversations/{conversationId}` 查询对话执行链路详情 THEN 系统 SHALL 返回该对话的完整 12 步执行链路数据(与测试 API 返回格式一致)。
- [AC-AISVC-110] WHEN 前端通过 `POST /admin/monitoring/conversations/export` 提交导出请求(含筛选条件)THEN 系统 SHALL 生成导出文件(JSON/CSV 格式),并返回下载链接或直接返回文件流。
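导出的序列化部分可以用标准库草拟如下(字段取自 AC-AISVC-108 的对话记录结构;实际服务中应写入文件并生成带过期时间的下载链接,而非直接返回字符串):

```python
import csv
import io
import json

def export_conversations(rows: list, fmt: str = "csv") -> str:
    """将筛选后的对话记录序列化为 CSV 或 JSON 文本。"""
    if fmt == "json":
        return json.dumps(rows, ensure_ascii=False, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [{"conversationId": "c1", "userMessage": "退款多久到账?",
         "aiReply": "一般 3 个工作日内。"}]
csv_text = export_conversations(rows)
```

JSON 导出使用 `ensure_ascii=False` 保留中文原文,避免导出文件里出现 `\uXXXX` 转义。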
### 13.8 追踪映射(v0.7.0 迭代)
| AC ID | Endpoint | 方法 | Operation | 描述 |
|-------|----------|------|-----------|------|
| AC-AISVC-91 | /admin/dashboard/stats | GET | getDashboardStats | Dashboard 统计增强 |
| AC-AISVC-92 | /admin/dashboard/stats | GET | getDashboardStats | 时间范围筛选 |
| AC-AISVC-93 | /admin/test/flow-execution | POST | testFlowExecution | 完整流程测试 |
| AC-AISVC-94 | /admin/test/flow-execution | POST | testFlowExecution | 对比测试 |
| AC-AISVC-95 | /admin/test/flow-execution | POST | testFlowExecution | 失败容错处理 |
| AC-AISVC-96 | /admin/intent-rules/{ruleId}/test | POST | testIntentRule | 意图规则测试 |
| AC-AISVC-97 | /admin/monitoring/intent-rules | GET | getIntentRuleStats | 意图规则监控统计 |
| AC-AISVC-98 | /admin/monitoring/intent-rules/{ruleId}/hits | GET | getIntentRuleHits | 规则命中记录 |
| AC-AISVC-99 | /admin/prompt-templates/{tplId}/preview | POST | previewPromptTemplate | Prompt 模板预览 |
| AC-AISVC-100 | /admin/monitoring/prompt-templates | GET | getPromptTemplateStats | Prompt 模板监控统计 |
| AC-AISVC-101 | /admin/script-flows/{flowId}/simulate | POST | simulateScriptFlow | 话术流程模拟执行 |
| AC-AISVC-102 | /admin/script-flows/{flowId}/simulate/{simulationId}/next | POST | simulateScriptFlowNext | 流程模拟推进 |
| AC-AISVC-103 | /admin/monitoring/script-flows | GET | getScriptFlowStats | 话术流程监控统计 |
| AC-AISVC-104 | /admin/monitoring/script-flows/{flowId}/executions | GET | getScriptFlowExecutions | 流程执行记录 |
| AC-AISVC-105 | /admin/guardrails/test | POST | testGuardrail | 输出护栏测试 |
| AC-AISVC-106 | /admin/monitoring/guardrails | GET | getGuardrailStats | 输出护栏监控统计 |
| AC-AISVC-107 | /admin/monitoring/guardrails/{wordId}/blocks | GET | getGuardrailBlocks | 禁词拦截记录 |
| AC-AISVC-108 | /admin/monitoring/conversations | GET | getConversations | 对话追踪列表 |
| AC-AISVC-109 | /admin/monitoring/conversations/{conversationId} | GET | getConversationDetail | 对话执行链路详情 |
| AC-AISVC-110 | /admin/monitoring/conversations/export | POST | exportConversations | 对话记录导出 |


@@ -2,7 +2,7 @@
feature_id: "AISVC"
title: "Python AI 中台(ai-service)任务清单"
status: "in-progress"
version: "0.6.0"
version: "0.7.0"
last_updated: "2026-02-27"
---
@@ -247,3 +247,170 @@ last_updated: "2026-02-27"
- [ ] T15.6 预留 `AudioParser``VideoParser` 扩展点(仅接口定义,不实现) `[AC-AISVC-89]`
- [ ] T15.7 预留对话记录结构化导入接口(仅接口定义,不实现) `[AC-AISVC-90]`
- [ ] T15.8 编写 Orchestrator 升级集成测试:验证意图路由 → 流程引擎 → 多知识库检索 → 护栏过滤的完整链路 `[AC-AISVC-51~AC-AISVC-90]`
---
### Phase 16: 测试与监控系统(v0.7.0 迭代)
> 目标:为 v0.6.0 新增的四大功能提供完整的测试和监控 API,支持前端测试台和监控页面。
#### 16.1 监控数据模型与基础设施
- [ ] T16.1 定义监控数据实体:扩展 `ChatMessage` 实体,新增 `prompt_template_id`、`intent_rule_id`、`flow_instance_id`、`guardrail_triggered`、`guardrail_words` 字段 `[AC-AISVC-91, AC-AISVC-92]`
- [ ] T16.2 实现 Redis 统计缓存层:创建 `MonitoringCache` 服务,支持实时计数器和 Top N 排行榜缓存 `[AC-AISVC-91, AC-AISVC-92]`
- [ ] T16.3 实现 `FlowTestRecord` 实体:用于存储完整流程测试的详细日志(保留 7 天) `[AC-AISVC-93, AC-AISVC-94, AC-AISVC-95]`
- [ ] T16.4 实现 `ExportTask` 实体:用于管理对话导出任务的异步处理 `[AC-AISVC-110]`
- [ ] T16.5 创建监控数据库索引:优化查询性能(按 tenant_id、created_at、intent_rule_id、flow_instance_id 等) `[AC-AISVC-91~AC-AISVC-110]`
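T16.2 的 `MonitoringCache` 可以先用内存 Counter 草拟接口形态(分别对应 Redis 的 INCR 计数器与 Sorted Set 排行榜;类与方法名为假设,便于单测后再替换为 Redis 客户端):

```python
from collections import Counter

class MonitoringCache:
    """统计缓存层草图:内存实现,接口形态与 Redis 版保持一致。"""

    def __init__(self):
        self.counters = Counter()   # 实时计数器,对应 Redis String + INCR
        self.boards = {}            # Top N 排行榜,对应 Redis Sorted Set + ZINCRBY

    def incr(self, key: str, n: int = 1) -> None:
        self.counters[key] += n

    def board_incr(self, board: str, member: str, n: int = 1) -> None:
        self.boards.setdefault(board, Counter())[member] += n

    def top_n(self, board: str, n: int = 5) -> list:
        return self.boards.get(board, Counter()).most_common(n)

cache = MonitoringCache()
cache.incr("guardrail_block")
cache.board_incr("intent_rule_hits", "退款咨询", 3)
cache.board_incr("intent_rule_hits", "转人工", 1)
```

按此接口编写 T16.43 的单元测试后,替换为 Redis 实现时测试用例可直接复用。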
#### 16.2 Dashboard 统计增强
- [ ] T16.6 实现 `DashboardService.get_enhanced_stats()`聚合意图规则、Prompt 模板、话术流程、护栏的统计数据 `[AC-AISVC-91]`
- [ ] T16.7 实现统计数据缓存策略:Dashboard 统计结果缓存 60 秒,Top N 排行榜每 5 分钟预计算 `[AC-AISVC-91]`
- [ ] T16.8 实现 `GET /admin/dashboard/stats` API 增强:支持时间范围筛选(startDate/endDate 参数) `[AC-AISVC-91, AC-AISVC-92]`
#### 16.3 完整流程测试台
- [ ] T16.9 增强 `Orchestrator.generate_with_monitoring()`:支持可选的详细日志记录(enable_detailed_log 参数) `[AC-AISVC-93]`
- [ ] T16.10 实现 12 步流程埋点:在 Orchestrator 的每一步中记录输入/输出/耗时/错误(仅在测试模式下) `[AC-AISVC-93, AC-AISVC-94, AC-AISVC-95]`
- [ ] T16.11 实现 `FlowTestService.test_flow_execution()`:调用增强版 Orchestrator 并保存测试记录 `[AC-AISVC-93]`
- [ ] T16.12 实现 `POST /admin/test/flow-execution` API:支持完整流程测试,返回 12 步执行详情 `[AC-AISVC-93, AC-AISVC-94, AC-AISVC-95]`
#### 16.4 意图规则测试与监控
- [ ] T16.13 实现 `IntentRuleTester.test_rule()`:测试单个规则并检测优先级冲突 `[AC-AISVC-96]`
- [ ] T16.14 实现 `POST /admin/intent-rules/{ruleId}/test` API返回测试结果和冲突检测 `[AC-AISVC-96]`
- [ ] T16.15 实现 `IntentMonitor.get_rule_stats()`:聚合规则命中统计(命中次数、命中率、平均响应时间) `[AC-AISVC-97]`
- [ ] T16.16 实现 `GET /admin/monitoring/intent-rules` API:返回规则统计列表,支持时间范围筛选 `[AC-AISVC-97]`
- [ ] T16.17 实现 `GET /admin/monitoring/intent-rules/{ruleId}/hits` API:返回规则命中记录详情,支持分页 `[AC-AISVC-98]`
#### 16.5 Prompt 模板测试与监控
- [ ] T16.18 实现 `PromptTemplateMonitor.preview_template()`:渲染模板并计算 Token 数量(使用 tiktoken) `[AC-AISVC-99]`
- [ ] T16.19 实现 `POST /admin/prompt-templates/{tplId}/preview` API返回渲染结果和 Token 统计 `[AC-AISVC-99]`
- [ ] T16.20 实现 `PromptMonitor.get_template_stats()`:聚合模板使用统计(使用次数、平均 Token 消耗) `[AC-AISVC-100]`
- [ ] T16.21 实现 `GET /admin/monitoring/prompt-templates` API:返回模板统计列表,支持按场景筛选 `[AC-AISVC-100]`
#### 16.6 话术流程测试与监控
- [ ] T16.22 实现 `ScriptFlowTester.simulate_flow()`:模拟流程执行并分析覆盖率 `[AC-AISVC-101]`
- [ ] T16.23 实现 `POST /admin/script-flows/{flowId}/simulate` API创建模拟会话并返回首步话术 `[AC-AISVC-101]`
- [ ] T16.24 实现 `POST /admin/script-flows/{flowId}/simulate/{simulationId}/next` API推进模拟流程 `[AC-AISVC-102]`
- [ ] T16.25 实现 `FlowMonitor.get_flow_stats()`:聚合流程激活统计(激活次数、完成率、平均完成时长) `[AC-AISVC-103]`
- [ ] T16.26 实现 `GET /admin/monitoring/script-flows` API返回流程统计列表 `[AC-AISVC-103]`
- [ ] T16.27 实现 `GET /admin/monitoring/script-flows/{flowId}/executions` API:返回流程执行记录详情,支持分页 `[AC-AISVC-104]`
#### 16.7 输出护栏测试与监控
- [ ] T16.28 实现 `GuardrailTester.test_guardrail()`:测试禁词检测和过滤策略 `[AC-AISVC-105]`
- [ ] T16.29 实现 `POST /admin/guardrails/test` API返回护栏检测结果 `[AC-AISVC-105]`
- [ ] T16.30 实现 `GuardrailMonitor.get_guardrail_stats()`:聚合护栏拦截统计(拦截次数、替换次数、掩码次数) `[AC-AISVC-106]`
- [ ] T16.31 实现 `GET /admin/monitoring/guardrails` API:返回护栏统计列表,支持按类别筛选 `[AC-AISVC-106]`
- [ ] T16.32 实现 `GET /admin/monitoring/guardrails/{wordId}/blocks` API:返回禁词拦截记录详情,支持分页 `[AC-AISVC-107]`
#### 16.8 对话追踪与导出
- [ ] T16.33 实现 `ConversationTracker.get_conversations()`:查询对话记录列表,支持多条件筛选和分页 `[AC-AISVC-108]`
- [ ] T16.34 实现 `GET /admin/monitoring/conversations` API返回对话记录列表 `[AC-AISVC-108]`
- [ ] T16.35 实现 `ConversationTracker.get_conversation_detail()`:查询对话的完整 12 步执行链路 `[AC-AISVC-109]`
- [ ] T16.36 实现 `GET /admin/monitoring/conversations/{conversationId}` API返回对话执行链路详情 `[AC-AISVC-109]`
- [ ] T16.37 实现 `ConversationExporter.export_conversations()`:异步导出对话数据(CSV/JSON 格式) `[AC-AISVC-110]`
- [ ] T16.38 实现 `POST /admin/monitoring/conversations/export` API创建导出任务并返回任务 ID `[AC-AISVC-110]`
- [ ] T16.39 实现 `GET /admin/monitoring/conversations/export/{exportId}` API查询导出任务状态和下载链接 `[AC-AISVC-110]`
#### 16.9 性能优化与测试
- [ ] T16.40 实现监控数据异步更新:对话结束时异步更新统计计数器,不阻塞响应 `[AC-AISVC-91, AC-AISVC-92]`
- [ ] T16.41 实现测试日志自动清理:定时任务清理 7 天前的 `FlowTestRecord` 数据 `[AC-AISVC-93]`
- [ ] T16.42 实现导出文件自动清理:定时任务清理 24 小时过期的导出文件 `[AC-AISVC-110]`
- [ ] T16.43 编写监控服务单元测试:覆盖统计聚合、缓存策略、数据导出等核心逻辑 `[AC-AISVC-91~AC-AISVC-110]`
- [ ] T16.44 编写测试 API 集成测试:验证完整流程测试、规则测试、模板预览等功能 `[AC-AISVC-93~AC-AISVC-105]`
- [ ] T16.45 性能测试:验证监控埋点对生产环境性能影响 <5%,Dashboard 统计查询 <500ms `[AC-AISVC-91, AC-AISVC-92]`
---
## Phase 16 任务进度追踪
| 任务 | 描述 | 状态 |
|------|------|------|
| T16.1 | 监控数据实体扩展 | ⏳ 待处理 |
| T16.2 | Redis 统计缓存层 | ⏳ 待处理 |
| T16.3 | FlowTestRecord 实体 | ⏳ 待处理 |
| T16.4 | ExportTask 实体 | ⏳ 待处理 |
| T16.5 | 监控数据库索引 | ⏳ 待处理 |
| T16.6 | Dashboard 统计聚合 | ⏳ 待处理 |
| T16.7 | 统计缓存策略 | ⏳ 待处理 |
| T16.8 | Dashboard API 增强 | ⏳ 待处理 |
| T16.9 | Orchestrator 监控增强 | ⏳ 待处理 |
| T16.10 | 12 步流程埋点 | ⏳ 待处理 |
| T16.11 | FlowTestService | ⏳ 待处理 |
| T16.12 | 完整流程测试 API | ⏳ 待处理 |
| T16.13 | IntentRuleTester | ⏳ 待处理 |
| T16.14 | 意图规则测试 API | ⏳ 待处理 |
| T16.15 | IntentMonitor | ⏳ 待处理 |
| T16.16 | 意图规则监控 API | ⏳ 待处理 |
| T16.17 | 规则命中记录 API | ⏳ 待处理 |
| T16.18 | PromptTemplateMonitor | ⏳ 待处理 |
| T16.19 | 模板预览 API | ⏳ 待处理 |
| T16.20 | PromptMonitor | ⏳ 待处理 |
| T16.21 | 模板监控 API | ⏳ 待处理 |
| T16.22 | ScriptFlowTester | ⏳ 待处理 |
| T16.23 | 流程模拟 API | ⏳ 待处理 |
| T16.24 | 流程推进 API | ⏳ 待处理 |
| T16.25 | FlowMonitor | ⏳ 待处理 |
| T16.26 | 流程监控 API | ⏳ 待处理 |
| T16.27 | 流程执行记录 API | ⏳ 待处理 |
| T16.28 | GuardrailTester | ⏳ 待处理 |
| T16.29 | 护栏测试 API | ⏳ 待处理 |
| T16.30 | GuardrailMonitor | ⏳ 待处理 |
| T16.31 | 护栏监控 API | ⏳ 待处理 |
| T16.32 | 护栏拦截记录 API | ⏳ 待处理 |
| T16.33 | ConversationTracker | ⏳ 待处理 |
| T16.34 | 对话追踪列表 API | ⏳ 待处理 |
| T16.35 | 对话详情查询 | ⏳ 待处理 |
| T16.36 | 对话详情 API | ⏳ 待处理 |
| T16.37 | ConversationExporter | ⏳ 待处理 |
| T16.38 | 对话导出 API | ⏳ 待处理 |
| T16.39 | 导出状态查询 API | ⏳ 待处理 |
| T16.40 | 异步统计更新 | ⏳ 待处理 |
| T16.41 | 测试日志清理 | ⏳ 待处理 |
| T16.42 | 导出文件清理 | ⏳ 待处理 |
| T16.43 | 监控服务单元测试 | ⏳ 待处理 |
| T16.44 | 测试 API 集成测试 | ⏳ 待处理 |
| T16.45 | 性能测试 | ⏳ 待处理 |