deepseek 识别图像的疑惑

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

• 请不要在回答技术问题时复制粘贴 AI 生成的内容

DeepSeek R1 模型本身是不支持视觉（图像识别）的，DeepSeek 官方客户端的图像识别功能，虽然好像是只是 OCR 文字，但还挺好用的。但有一个问题是，官方客户端调用起来经常会失败，而第三方客户端或者是自己本地部署，又普遍不支持图像识别功能有没有好的第三方客户端能去很好的识别图像然后再去喂给 DeepSeek R1 ？

图像识别

DeepSeek

第三方客户端

9 条回复 • 2025-02-06 15:22:58 +08:00

Charon2050

1 天前

一个奇招：让另一个有视觉的模型事无巨细的描述图片内容，然后交给 R1 去推理

Charon2050

1 天前

这种自带识别的客户端肯定是没有的，估计要自己开发

reDesign

1 天前

@Charon2050 牛逼

sunnysab

1 天前

在用 ChatGLM 的免费图片描述 api ，不错。但是那个免费的对话 api 有点智商不足……

Darley

1 天前

估计还没有专门的，需要专门封装

Charon2050

1 天前

@sunnysab 特别可惜的是 GLM-4V-Flash 不支持 base64 编码图片，必须要上传到图床再发它 URL

sunnysab

1 天前

@Charon2050 可以的，你仔细看官方给的例子。

```python
async def describe_image(self, prompt: str, image: bytes | str) -> Optional[str]:
""" 图像描述 """
encoded_image = base64.b64encode(image).decode('utf-8')

response = await self.client.chat.completions.create(
model='glm-4v-flash', # TODO: 支持修改.
temperature=0.95,
top_p=0.70,
messages=[{'role': 'user', 'content': [
{'type': 'image_url', 'image_url': {'url': encoded_image}},
{'type': 'text', 'text': prompt},
]}],
)

completion_message = response.choices[0].message
response_text: str = completion_message.content
logger.debug(f'ChatGLM image description. response: {repr(response_text)}')

response_text = re.sub(r'\s\S\n', '', response_text)
return response_text
```

Charon2050

1 天前

@sunnysab #7 我测试下来是不行的哦，官网也有写 https://open.bigmodel.cn/dev/api/normal-model/glm-4v 注意同步调用 - Messages 格式 - url 那一行，「说明：GLM-4V-Flash 不支持 base64 编码」

sunnysab

12 小时 21 分钟前

@Charon2050 你试下这段代码呢？我从我的项目里复制出来的。

https://gist.github.com/sunnysab/3123fd55c2ba2a2441a11c7494800a1b

我这边可以跑，正常识别，也确实是 4v-flash ，账号也是前几天创建的普通账号。虽然文档中 flash 模型提到不能用，但我没注意到...也一直这么用着的。好神奇啊！