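Anyone with API credit can now call OpenAI's multimodal models, such as GPT-4o and GPT-4 Turbo. I wrote up some notes on using the OpenAI API over a year ago, and the interface has changed slightly since then. This post is mainly based on the vision guide: https://platform.openai.com/docs/guides/vision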

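It is fairly simple: a local image has to be Base64-encoded first and is then sent as part of the request. Here is an example that should make it clear (the images are in a folder named Processed):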

from openai import OpenAI
import os
import base64

client = OpenAI(
    api_key="Your_API_Key"
)

# Read a local image file and return its contents as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Folder containing the images to describe
fig_path = "Processed"

for filename in os.listdir(fig_path):
    # Only PNG files are sent, since the data URL below declares image/png
    if not filename.endswith(".png"):
        continue
    image_path = os.path.join(fig_path, filename)
    print(image_path)
    base64_image = encode_image(image_path)
    # One user message carrying the text prompt and the image as a data URL
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ]
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    answer = completion.choices[0].message.content
    print(f"ChatGPT: {answer}")
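The example above hard-codes image/png in the data URL, so it only works for PNG files. If your folder also contains JPEG or WebP images, a small helper along these lines could pick the MIME type from the file extension. This is only a sketch: image_to_data_url and build_image_message are names I made up for illustration, not part of the OpenAI library, and the message layout is simply copied from the example above.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a data URL for a local image, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(image_path, prompt="What's in this image?"):
    """Build a messages list with one user turn: a text prompt plus one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
            ],
        }
    ]
```

The returned list can be passed as the messages argument of client.chat.completions.create, in place of the hand-written dictionary in the loop.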

Of course, keep an eye on the cost when you use this. For now it still feels a bit expensive.
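To get a rough idea of what an image costs before sending it, you can estimate its input tokens. The sketch below follows the tile-based rule as I understand it from the vision guide for GPT-4o-class models: 85 base tokens, plus 170 tokens per 512-pixel tile after the image is scaled to fit within 2048 x 2048 and its shortest side is scaled down to 768 pixels. Treat these constants as assumptions that may change or differ between models; estimate_image_tokens is a hypothetical helper, not an OpenAI function.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Rough input-token estimate for one image (constants are assumptions, see above)."""
    if detail == "low":
        # Low-detail mode is billed at a flat rate regardless of image size
        return 85
    # Step 1: shrink the image so it fits inside a 2048 x 2048 square
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink again so the shortest side is at most 768 pixels
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count the 512-pixel tiles needed to cover the resized image
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(estimate_image_tokens(1024, 1024))         # 765
print(estimate_image_tokens(2048, 4096))         # 1105
print(estimate_image_tokens(4000, 3000, "low"))  # 85
```

If a coarse description is enough, adding "detail": "low" next to "url" inside the image_url dictionary requests the flat-rate mode, and downsizing images before encoding them reduces the tile count in high-detail mode.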
