Documentation - Chat
Overview
OpenAI offers several ways to use its chat models in software. The main consideration is whether your project should use a “bulk response” or a “streaming response”.
The most obvious difference is that with a “bulk response” there is a short delay before your user sees anything from the model, and then the entire response appears all at once. With a “streaming response”, the model returns each token (roughly 3–4 characters, including spaces and punctuation) as it is generated. Many users prefer a “streaming response” because it feels both faster and more natural than watching a paragraph or more suddenly appear after a pause.
Side Note: Tool Usage is “Bulk Response” Only
Tool Calls (functions) are currently supported by OpenAI only with the “bulk response” method. OpenAI has said Tool Calls will be added to “streaming responses” in the future, and the “ChatCrafters AI Suite” will be updated as soon as that change ships.
Basic Chat Model Usage
Using a “bulk response” is the easiest way to use the OpenAI Chat models.
Basic Chat Conversation
Using a “bulk response” to create a simple chat bot using OpenAI Chat models.
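A simple bulk-response chat loop can be sketched as below. This is a minimal sketch, not the suite’s own code: the model name `gpt-4o-mini`, the `ask` helper, and passing the client in as a parameter are all assumptions made for illustration. In real use you would create the client with `from openai import OpenAI; client = OpenAI()` (which reads `OPENAI_API_KEY` from the environment).

```python
def ask(client, history, user_text, model="gpt-4o-mini"):
    """Append the user's turn, request one full (bulk) completion,
    and record the assistant's reply in the running history.

    `client` is an openai.OpenAI instance; `model` is an assumed model name.
    """
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


# Hypothetical usage (requires the openai package and an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   history = [{"role": "system", "content": "You are a helpful assistant."}]
#   print(ask(client, history, "Hello!"))  # whole reply appears at once
```

Because the whole reply arrives in one object, there is nothing to assemble: you read `choices[0].message.content` and you are done, which is why this is the easiest method to start with.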
Streaming Chat Conversation
Using a “streaming response” is slightly more involved than a “bulk response”, mostly because the output needs more handling.
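The extra handling looks roughly like the sketch below: instead of one response object, `stream=True` yields a series of chunks, and each chunk’s `delta.content` must be printed and accumulated. The `stream_reply` helper and the model name are assumptions for illustration, not part of the suite’s API.

```python
def stream_reply(client, history, user_text, model="gpt-4o-mini"):
    """Stream a completion token-by-token, echoing each piece as it
    arrives, then store the assembled reply in the history.

    `client` is an openai.OpenAI instance; `model` is an assumed model name.
    """
    history.append({"role": "user", "content": user_text})
    stream = client.chat.completions.create(
        model=model, messages=history, stream=True
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (role markers, finish signals) carry no text
            print(delta, end="", flush=True)
            parts.append(delta)
    reply = "".join(parts)
    history.append({"role": "assistant", "content": reply})
    return reply
```

The accumulation step matters: the chunks are only fragments, so you must join them yourself before saving the assistant’s turn back into the conversation history.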
Chat Conversation with Tool Calls
Currently, OpenAI only offers Tool Calls (functions) with the “bulk response” method.
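A bulk-response tool-call round trip can be sketched as follows. Everything here except the OpenAI SDK calls is hypothetical: the `get_weather` tool, its schema, the `run_with_tools` helper, and the model name are illustrative assumptions. The shape of the flow, though, is what the API requires: send the tool schema, detect `tool_calls` on the reply, run the tool locally, append a `"tool"` message with the matching `tool_call_id`, and ask the model again.

```python
import json


def get_weather(city):
    """Hypothetical local tool the model is allowed to call."""
    return json.dumps({"city": city, "forecast": "sunny"})


# JSON-schema description of the tool, passed to the API on every request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get a short weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def run_with_tools(client, messages, model="gpt-4o-mini"):
    """One bulk request; if the model asks for a tool, run it and
    send the result back for a final answer."""
    response = client.chat.completions.create(
        model=model, messages=messages, tools=TOOLS
    )
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)  # keep the assistant's tool request in context
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = get_weather(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
        response = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS
        )
    return response.choices[0].message.content
```

Note that the model never runs your function; it only returns the name and JSON arguments, and your code is responsible for executing the tool and reporting the result back.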