dify基于多模态模型的发票识别

susu1083018911

1235人浏览 · 2025-05-21 17:56:14

susu1083018911 · 2025-05-21 17:56:14 发布

创建chatflow

开始

开始节点点开后我们需要添加一个文件上传输入参数。点击开始节点输入字段，点击右边的“+”

选择单个文件，输入变量名称、支持的文件类型我们这里就选择图片。其他都可以默认，输入完成后，点击保存按钮

设置节点

文档提取器

选择文档提取器和开始节点连接，去掉llm和开始节点连接

我们在文档提取器，输入变量中选中变量

llm

将文档提取器的连接线和llm大语言模型连接。然后按照以下几个步骤设置

模型选择，模型我们在模型下拉列表中选择qwen2.5vl:3b
上下文，这里设置开始节点file 属性值
SYSTEM 提示词我们输入如下内容

请提取这张照片的内容，其中内容格式‘机器编号’、‘发票代码’、‘发票号码’、‘开票日期’、‘校验码’、‘购买方名称’、‘购买方纳税人识别号’、‘购买方地址、电话’、‘开户行及账号’、‘货物或应税劳务、服务名称’、‘规格型号’、‘单位’、‘数量’、‘单价’、‘金额’、‘税率’、‘税额’、‘价税合计（大写）’、‘价税合计（小写）’、‘销售方名称’、‘销售方纳税人识别号’、‘销售方地址、电话’、‘销售方地址、电话’、‘开户行及账号’、‘备注’、‘收款人’、‘复核’、‘开票人’ 字段返回信息，返回的结果信息以json格式返回

视觉点击右边按钮开启多模态
视觉输入变量选择节点file 变量

直接回复

接下来我们将LLM模型连接到直接回复的输出节点。

这个地方设置比较简单，在回复设置一下llm text文本输出以及开始节点file 输出，这样设置后。就会将发票提取的票面信息以json格式的文本信息返回，并将上传的发票图片信息一并返回给用户

完整的dsl 如下 a.yml

app:
description: ''
icon: 🤖
icon_background: '#FFEAD5'
mode: workflow
name: 增值税发票提取小工具chatflow
use_icon_as_answer_icon: false
dependencies:
- current_identifier: null
type: marketplace
value:
marketplace_plugin_unique_identifier: langgenius/ollama:0.0.6@7d66a960a68cafdcdf5589fdf5d01a995533f956853c69c54eddcf797006fa37
kind: app
version: 0.3.0
workflow:
conversation_variables: []
environment_variables: []
features:
file_upload:
allowed_file_extensions:
- .JPG
- .JPEG
- .PNG
- .GIF
- .WEBP
- .SVG
allowed_file_types:
- image
allowed_file_upload_methods:
- local_file
- remote_url
enabled: false
fileUploadConfig:
audio_file_size_limit: 50
batch_count_limit: 5
file_size_limit: 15
image_file_size_limit: 10
video_file_size_limit: 100
workflow_file_upload_limit: 10
image:
enabled: false
number_limits: 3
transfer_methods:
- local_file
- remote_url
number_limits: 3
opening_statement: ''
retriever_resource:
enabled: true
sensitive_word_avoidance:
enabled: false
speech_to_text:
enabled: false
suggested_questions: []
suggested_questions_after_answer:
enabled: false
text_to_speech:
enabled: false
language: ''
voice: ''
graph:
edges:
- data:
sourceType: llm
targetType: answer
id: llm-answer
source: llm
sourceHandle: source
target: answer
targetHandle: target
type: custom
- data:
isInIteration: false
sourceType: start
targetType: document-extractor
id: 1729851066338-source-1729851603790-target
source: '1729851066338'
sourceHandle: source
target: '1729851603790'
targetHandle: target
type: custom
zIndex: 0
- data:
isInIteration: false
sourceType: document-extractor
targetType: llm
id: 1729851603790-source-llm-target
source: '1729851603790'
sourceHandle: source
target: llm
targetHandle: target
type: custom
zIndex: 0
nodes:
- data:
desc: ''
selected: false
title: 开始
type: start
variables:
- allowed_file_extensions: []
allowed_file_types:
- image
allowed_file_upload_methods:
- local_file
- remote_url
label: file
max_length: 48
options: []
required: true
type: file
variable: file
height: 90
id: '1729851066338'
position:
x: 0
y: 277
positionAbsolute:
x: 0
y: 277
selected: false
sourcePosition: right
targetPosition: left
type: custom
width: 244
- data:
context:
enabled: true
variable_selector:
- '1729851066338'
- file
desc: ''
memory:
query_prompt_template: ''
role_prefix:
assistant: ''
user: ''
window:
enabled: false
size: 10
model:
completion_params: {}
mode: chat
name: qwen2.5vl:3b
provider: langgenius/ollama/ollama
prompt_template:
- id: 994d57b8-32bc-45cd-b30a-4a1481553627
role: system
text: 请提取这张照片的内容，其中内容格式‘机器编号’、‘发票代码’、‘发票号码’、‘开票日期’、‘校验码’、‘购买方名称’、‘购买方纳税人识别号’、‘购买方地
址、电话’、‘开户行及账号’、‘货物或应税劳务、服务名称’、‘规格型号’、‘单位’、‘数量’、‘单价’、‘金额’、‘税率’、‘税
额’、‘价税合计（大写）’、‘价税合计（小写）’、‘销售方名称’、‘销售方纳税人识别号’、‘销售方地址、电话’、‘销售方地址、电话’、‘开户行及账号’、‘备注’、‘收款人’、‘复核’、‘开票人’
字段返回信息，返回的结果信息以json格式返回
selected: false
title: LLM
type: llm
variables: []
vision:
configs:
detail: high
variable_selector:
- '1729851066338'
- file
enabled: true
height: 90
id: llm
position:
x: 589
y: 309
positionAbsolute:
x: 589
y: 309
selected: false
sourcePosition: right
targetPosition: left
type: custom
width: 244
- data:
answer: '{{#llm.text#}}

{{#1729851066338.file#}}'
desc: ''
selected: true
title: 直接回复
type: answer
variables: []
height: 105
id: answer
position:
x: 902.9145447602609
y: 298.71745887492585
positionAbsolute:
x: 902.9145447602609
y: 298.71745887492585
selected: true
sourcePosition: right
targetPosition: left
type: custom
width: 244
- data:
desc: ''
is_array_file: true
selected: false
title: 文档提取器
type: document-extractor
variable_selector:
- sys
- files
height: 92
id: '1729851603790'
position:
x: 304
y: 285
positionAbsolute:
x: 304
y: 285
selected: false
sourcePosition: right
targetPosition: left
type: custom
width: 244
viewport:
x: 21.760065728564086
y: 107.74744333699994
zoom: 0.8705505632961243