May 9, 2023

How to get the benefits of ChatGPT without sending your commercial data to OpenAI

Lots of businesses we speak to want to harness the power of Generative AI, but integrating Large Language Models (like the one ChatGPT is based on) into their workflows presents at least as many problems as solutions. 

Where does your data go?

OpenAI claims it “will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose”. Notably absent from this statement, however, is data entered into the ChatGPT web interface, which is believed to be retained for at least 30 days. 

For most businesses, this level of assurance is unlikely to be sufficient to justify handing over commercially valuable or proprietary data: the risk of it being shared further is too high. Where customer data is concerned, sharing it with OpenAI may breach a business’s own safeguarding obligations. For particularly sensitive data, such as recorded voice or financial or health records, sharing it via ChatGPT could represent a serious confidentiality breach. 

A key concern for the managers we have spoken to is understanding what their teams are currently doing, and how to set policies that determine what can and can’t be put into ChatGPT. 

That said, GenAI systems, including ChatGPT, offer such profound productivity improvements, for everything from customer support to interrogating internal policies and historical decisions, that companies are keen to ask: how can we have a version of ChatGPT that works for us, without exposing information we shouldn’t? 

Why might you build a proprietary Large Language Model yourself? 

Whilst it still requires the efforts of a contractor with ML engineering experience, building a model, or adapting one in house, offers a number of key benefits:

  1. Keep your data private 

Whether the data needs to be protected for legal, trust or purely commercial reasons, a bespoke language model can give you assurance that any prompts you enter stay firmly within your business. This is crucial if you are building LLM-powered applications that respond to clients, and it gives managers confidence that employees can use GenAI tools for productivity gains without risking a data leak. 

  2. Make it learn about your business

GPT-4 has impressive general intelligence, but knows nothing of the specifics of your work (unless these details are readily found on Wikipedia or Stack Overflow!). It is like a highly trained graduate turning up for their first day of work - every single day. This dramatically reduces its utility compared to a language model that understands “how we do things around here”. Whether that means knowing the org chart well enough to direct customer queries correctly, or appreciating that a similar project was tried before when writing a strategy presentation, local, situational knowledge is key to extracting competitive advantage from these tools. 

  3. Spend less - significantly less

For the first time in many years, running software has appreciable marginal costs. Whilst not noticeable for casual ChatGPT use, as soon as you build an application on top of the OpenAI API you will start receiving bills: every token generated adds to the invoice at the end of the month. If you are automating a high-volume service, such as an AI customer service agent (or an assistant to a human agent), these costs will be significant. 
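As a rough illustration of how per-token charges compound at scale, here is a back-of-envelope estimate. The per-1,000-token prices are OpenAI’s published GPT-4 (8k context) rates as of mid-2023 and may have changed; the traffic figures are illustrative assumptions, not measurements:

```python
# Back-of-envelope API cost estimate for a high-volume support workload.
# Prices are OpenAI's published GPT-4 (8k) rates as of mid-2023;
# the volume figures below are illustrative assumptions.

GPT4_PROMPT_PRICE = 0.03      # USD per 1,000 prompt tokens
GPT4_COMPLETION_PRICE = 0.06  # USD per 1,000 completion tokens

conversations_per_day = 10_000
prompt_tokens = 1_000         # context + customer query, per conversation
completion_tokens = 500       # generated reply, per conversation

daily_cost = conversations_per_day * (
    prompt_tokens / 1_000 * GPT4_PROMPT_PRICE
    + completion_tokens / 1_000 * GPT4_COMPLETION_PRICE
)
print(f"Daily: ${daily_cost:,.0f}, monthly: ${daily_cost * 30:,.0f}")
# At these assumed volumes: $600 per day, or $18,000 per 30-day month.
```

At those volumes the bill reaches five figures a month - which is why routing a narrow, well-defined task to a small local model, rather than a frontier API, can pay for itself quickly.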

This arises because each response is generated by running inference across all of the model’s parameters - 175 billion for GPT-3, and whilst GPT-4’s size is undisclosed, it is believed to be substantially larger. Models this big are necessary for flexible, general AI assistants, but for specific tasks within a business a much smaller model can give just as good results at a fraction of the computing cost. 

Fortunately, with technical advances of the past few months, building a solution specifically for you is now within reach of many businesses. There are a range of options, depending on your needs and budget, including:

  1. Downloading a compressed LLM (e.g. GPT4All) and running it locally - fully offline (or “on-prem”) 
  2. Fine tuning an existing open source model to your own business context and data 
  3. Training a new model from scratch

We will explore the pros, cons and costs of each of these approaches in the next article in this series.

Meanwhile, if you’d like to discuss how any of these topics affect your business specifically, get in touch with james@paradigmjunction.com. 
