How does Answerly consume tokens?

Fatos · December 8, 2023, 1:04pm

Hello everyone! Thank you for taking the time to learn more about how Answerly uses tokens.

My goal is to give you a clear understanding of how we consume tokens, so you can better manage your usage.

What uses the most tokens?

Firstly, it’s important to point out that the biggest consumer of tokens is maintaining the conversation history with the agent.

As your conversation grows, every message you send to an agent uses more and more tokens, because the history is sent along with it, until it reaches the maximum number of tokens allowed by your selected model.
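To make that growth concrete, here is a rough Python sketch. It assumes the full history is resent with every message and that each prompt is simply capped at the model's context limit. The overhead and limit values are hypothetical, not Answerly's actual settings, and real systems may trim or summarize history differently.

```python
def prompt_tokens_per_turn(message_tokens, fixed_overhead, context_limit):
    """Estimate the prompt size of each request when the whole history is resent.

    message_tokens: token counts of successive messages (visitor and agent turns).
    fixed_overhead: tokens sent with every request (instructions, business info, ...).
    context_limit: the model's maximum context; assumed to cap each prompt.
    """
    history = 0
    per_turn = []
    for n in message_tokens:
        history += n  # every earlier message travels again with the new one
        per_turn.append(min(fixed_overhead + history, context_limit))
    return per_turn


# Hypothetical numbers: ten 150-token messages, 2,000 tokens of fixed context,
# and an 8,192-token context limit.
turns = prompt_tokens_per_turn([150] * 10, fixed_overhead=2000, context_limit=8192)
print(turns)       # each request is 150 tokens larger than the last
print(sum(turns))  # the running total grows roughly quadratically
```

Once the cap is reached, every further request costs about the full context limit, which is why long conversations are the biggest token consumer.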

Token categories

There are three categories of token usage:

Conversational
Functional
Embeddings

What are Conversational tokens?

This category encompasses the tokens that are consumed when you chat with an Agent.

For each message you send to an Agent, approximately 6000 tokens are allocated for datasets, personality, your business info, and other features such as language and behavior.

However, don’t worry – ‘allocated’ does not mean ‘spent’. The actual amount you spend is entirely dependent on the attributes of your specific conversation.

For instance, it depends on the size of your business info (max 1024 tokens), your personality (max 1024 tokens), and the diversity of your dataset.

What are Functional tokens?

Functional tokens power Answerly features like Human Takeover and Quality Control. They always use a low-cost model (such as GPT-3.5), regardless of the model you choose in the Large Language Model (LLM) options.

Human Takeover spends around 256 tokens per visitor query.
Quality Control options, like the hallucination and unrelated-conversation checks, each spend a total of 1024 tokens per request when activated.
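A quick way to see how these features add up is to total them per visitor query. This Python sketch uses only the approximate figures from this post and assumes one Quality Control request per visitor query; the function name is hypothetical. Keep in mind these tokens run on the low-cost model, so they are cheaper per token than conversational tokens on a premium model.

```python
# Approximate figures quoted in this post; actual usage may vary.
HUMAN_TAKEOVER_TOKENS = 256     # per visitor query
QUALITY_CONTROL_TOKENS = 1024   # per request, for each enabled Quality Control option


def functional_tokens_per_query(human_takeover: bool, quality_options: int) -> int:
    """Rough functional-token overhead for one visitor query.

    Assumes one Quality Control request per visitor query.
    """
    total = HUMAN_TAKEOVER_TOKENS if human_takeover else 0
    total += QUALITY_CONTROL_TOKENS * quality_options
    return total


# Human Takeover plus both the hallucination and unrelated-conversation checks:
print(functional_tokens_per_query(True, 2))  # 256 + 2 * 1024 = 2304
```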

What are Embedding tokens?

Embedding tokens are used to train your agent, and they are very cost-effective. For example, training an Agent on a standard knowledge base (around 30 pages or 100,000 words) would cost roughly 0.71 cents.
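The exact embedding cost depends on the embedding model's per-token rate, which isn't stated here. This Python sketch shows the arithmetic under two assumptions: the common rule of thumb that English text runs about 0.75 words per token, and a per-million-token rate you supply yourself (the parameter is a placeholder, not Answerly's pricing).

```python
# Rule of thumb for English text: about 0.75 words per token.
TOKENS_PER_WORD = 4 / 3


def embedding_cost_usd(words: int, usd_per_million_tokens: float) -> float:
    """Rough cost of embedding a knowledge base of `words` words once.

    usd_per_million_tokens is a placeholder: use your embedding model's current rate.
    """
    tokens = words * TOKENS_PER_WORD
    return tokens / 1_000_000 * usd_per_million_tokens


# A 100,000-word knowledge base is on the order of 133,000 tokens.
print(round(100_000 * TOKENS_PER_WORD))
```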

I hope this gives you a better understanding of how Answerly uses tokens.

A.J · December 10, 2023, 12:52am

@Fatos This is very helpful. I'm still not sure how best to configure our chatbots to avoid excessive token usage, but you've provided some good information in your post.

Can you provide some additional tips or a list of steps you would take to minimize token usage? I’m wondering about the following:

Would it be best to keep the Identity and Business Info very short because they're passed with every conversation, or are they uploaded once during training?
For the Knowledge Hub, is it better to have multiple short Summaries or everything in one Word document? Any tips to minimize tokens here?
Is it a bad idea to train on a webpage, since it pulls in a lot of useless data? How often is that trained data sent (and consuming tokens): just once during training, or during every conversation?
What about the Chatbot Quick Replies: does having too many of them consume a lot of tokens?
How many additional tokens do the Quality Control options use? Only a few more, or do they double (or more) the token count?
What about Human Takeover: is it a slight bump in token usage or a doubling?
Do Custom Prompt settings use a lot of tokens? The standard prompt is pretty long, so would an extremely short custom prompt use fewer tokens, or would it not make much difference?

Any tips would be much appreciated. I'm in the process of setting our standards for chatbot creation going forward and don't want to accidentally configure our system to use far more tokens than necessary. Thank you!

Santofer · December 10, 2023, 8:43am

Interesting point. I'd love to know more about this, because it would let us price our offering to clients accordingly.

Marvin · December 11, 2023, 7:00pm

Absolutely great questions. Also would love to know.

maddie · December 19, 2023, 5:44pm

Would love an answer on this one!

Simone · January 8, 2024, 11:36am

Identity and Business Info: The tokens used for Identity and Business Info scale with the amount of text entered in those fields, so it is generally recommended to keep this information short and concise.
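Token counts track text length but are not one per character; for English, a common rule of thumb is about four characters per token. This Python sketch uses that heuristic to check a field against the 1024-token maximum Fatos mentioned for business info and personality. The helper names and sample text are hypothetical; for exact counts you would use the tokenizer that matches your model (for OpenAI models, the `tiktoken` library).

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def check_field(name: str, text: str, max_tokens: int = 1024) -> str:
    """Compare a field's estimated size against a token limit."""
    estimate = estimate_tokens(text)
    status = "ok" if estimate <= max_tokens else "over limit"
    return f"{name}: ~{estimate} tokens ({status}, limit {max_tokens})"


business_info = "Acme Bikes sells and repairs city bikes in Lisbon. Open Mon-Sat, 9-18."
print(check_field("Business Info", business_info))
```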

Knowledge Hub: Summaries in the Knowledge Hub are limited to 2048 characters. When you upload a single long document, the system scans it and divides it into smaller fragments. It is important to write the document with precise information so that the agent can recognize it and provide accurate responses; if the document is too long, some information may not be captured. Neither option is better here: if the information is well distributed, it will work in both cases, with one document or with multiple Summaries.
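Answerly's fragmenting algorithm isn't described in this thread, so the following is only a generic Python sketch of paragraph-based chunking, borrowing the 2048-character Summary limit as an assumed fragment size. It illustrates why "well distributed" information matters: a topic written as one self-contained paragraph tends to land intact in a single fragment, while a very long paragraph gets cut mid-thought.

```python
def split_into_chunks(document: str, max_chars: int = 2048) -> list[str]:
    """Split a document into chunks of at most max_chars, breaking on blank lines.

    Paragraphs are packed together while they fit; a paragraph longer than
    max_chars is hard-split into max_chars-sized pieces.
    """
    chunks: list[str] = []
    current = ""
    for para in document.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep packing this chunk
            else:
                chunks.append(current)  # close the full chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


doc = "Shipping: orders ship within 2 days.\n\nReturns: 30-day refund window."
for chunk in split_into_chunks(doc, max_chars=40):
    print(repr(chunk))
```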

Training on a webpage: Training on a webpage is effective and can certainly be used, but its effectiveness depends on the website's content and structure. If the webpage has a lot of relevant information, it can be beneficial. However, if there is a significant amount of irrelevant or extraneous data on the page, it may be better to create your own well-structured documentation to train the model effectively.
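If you decide to build your own documentation from an existing page, one rough starting point is to pull out the visible text while skipping elements that usually hold navigation and boilerplate. This Python sketch uses the standard library's `html.parser`; the set of skipped tags is an assumption about typical page layout, and this is not how Answerly's own crawler works.

```python
from html.parser import HTMLParser

# Tags that usually contain page chrome rather than content (an assumption).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav>Home | About</nav><p>Our refund window is 30 days.</p>"
    "<footer>All rights reserved</footer></body></html>"
)
print(extract_main_text(page))
```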

Chatbot Quick Replies, Quality Control, Human Takeover, and Custom Prompts: Enabling these features may increase token usage, but it is not expected to double or significantly increase the token count. The additional token consumption for these features is generally minimal.

drkitesurf · March 9, 2024, 1:15pm

Is there a viable solution to

reset the conversation on the backend at a certain time so it doesn’t maintain a costly conversation history?
use a metering solution to track usage per conversation (per question), e.g. by limiting a conversation to a set number of questions?
put the chatbot behind a time or usage metered paywall?
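One possible approach to the first two points, assuming your own backend relays messages to the chatbot and controls which history is forwarded (this thread does not say whether Answerly supports that), is a small per-conversation meter. The Python class below is hypothetical: it caps the number of questions and clears the stored history after a time window so an old, long history is not resent.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeteredConversation:
    """Hypothetical per-conversation meter kept on your own backend."""

    max_questions: int = 10
    reset_after_seconds: float = 30 * 60
    questions: int = 0
    history: list[str] = field(default_factory=list)
    started_at: float = field(default_factory=time.monotonic)

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if another question may be sent, counting it if so."""
        now = time.monotonic() if now is None else now
        if now - self.started_at >= self.reset_after_seconds:
            # Window expired: start fresh so the old history is not resent.
            self.history.clear()
            self.questions = 0
            self.started_at = now
        if self.questions >= self.max_questions:
            return False  # cap reached: block or route to a paywall instead
        self.questions += 1
        return True


convo = MeteredConversation(max_questions=2, reset_after_seconds=100, started_at=0.0)
print([convo.allow(now=t) for t in (1, 2, 3)])  # third question is refused
```

The same counter could feed a usage-metered paywall: instead of returning False, your backend would ask the visitor to pay or sign in before continuing.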