One of the reasons I love the AI community is the openness to share research and build on top of each other’s work. It is great to see the community continue to publish state-of-the-art results on arXiv and GitHub. This helps the community grow and in theory allows anyone to reproduce the results.
There is also an awesome trend of releasing open source models that can run in as few lines of code as possible. You can generate an image from one line of text. You can run a complex pose estimation model in 10 lines of code. This seems like a great starting point for many products to integrate AI, but I argue that it is a dangerous road to go down. We are optimizing for inference when instead we should be optimizing for learning.
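For example, a few lines with an off-the-shelf library are all it takes to get predictions out of a pretrained model. The sketch below is just an illustration, assuming the Hugging Face `transformers` library and a placeholder image path:

```python
# A minimal, inference-only sketch: no training loop, no way to learn from mistakes.
from transformers import pipeline

detector = pipeline("object-detection")     # downloads a default pretrained detector
predictions = detector("office_photo.jpg")  # hypothetical local image path
print(predictions)                          # list of {"label", "score", "box"} dicts
```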
Inference is the process of taking an input, analyzing it, and making a prediction. Learning is the process of improving the inference when it fails.
Inference is taking an image and predicting a bounding box around the person. Learning is taking an incorrect prediction and having the model correct its errors.
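To make the distinction concrete, here is a rough sketch in PyTorch. The tiny linear "model", random "image", and human-corrected bounding box are stand-ins, not a real detection pipeline:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for a real detector and data: a tiny linear "model", a random
# "image", and a human-corrected bounding box (x1, y1, x2, y2).
model = nn.Linear(3 * 32 * 32, 4)
image = torch.randn(1, 3 * 32 * 32)
corrected_box = torch.tensor([[4.0, 4.0, 28.0, 28.0]])

# Inference: input -> prediction, the weights never change
with torch.no_grad():
    predicted_box = model(image)

# Learning: compare the prediction to the corrected label and update the weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = F.smooth_l1_loss(model(image), corrected_box)
loss.backward()
optimizer.step()
```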
Sure, you can try to prompt engineer your way out of a problem, but if the base model doesn't know the facts or domain, prompting can't be the full solution. Being able to iterate on the core knowledge a model contains means you must fine-tune the actual weights.
Prompting will only get you so far when your model falls flat on its face.
If your product falls flat on its face too many times, guess what? You go out of business. Even if you manage to stay in business, mistakes from a model can be costly in many other ways if not corrected.
only 13% of data science projects… actually make it into production.[1]
for too many companies, A.I. doesn’t deliver the financial returns company officials expect. That’s borne out in a slew of recent surveys, where business leaders have put the failure rate of A.I. projects at between 83% and 92%. “As an industry, we’re worse than gambling in terms of producing financial returns,” Arijit Sengupta says.[2]
If this is the case, why are so many “machine learning” models released with only the ability to predict?
One common practice is releasing models behind HTTP interfaces. Putting a model behind an HTTP API lets other organizations interact with it. It is a good way to put a service behind a paywall, and it can simplify integrations between codebases. A model can also be too computationally expensive to run locally, so it can be advantageous for one company to focus on the challenges of hosting it.
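As a sketch of what that looks like in practice, here is a minimal HTTP wrapper around a model using Flask. The dummy `model` function and the request/response shapes are assumptions for illustration:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder for a real trained model; swap in your own loading and inference code.
def model(features):
    return sum(features)  # dummy "prediction" so the sketch runs end to end

@app.route("/predict", methods=["POST"])
def predict():
    features = request.json["features"]  # e.g. {"features": [1.0, 2.0, 3.0]}
    return jsonify({"prediction": model(features)})

if __name__ == "__main__":
    # curl -X POST http://localhost:8080/predict \
    #      -H "Content-Type: application/json" -d '{"features": [1, 2, 3]}'
    app.run(host="0.0.0.0", port=8080)
```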
Another reason is that sometimes you need to export an inference graph for performance reasons. Your model may need to run on mobile. Many of the operations required for training are not required for inference, and you can distill model weights down to a more performant subset.
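For instance, a trained PyTorch model might be stripped down to an inference-only artifact along these lines. The placeholder network, input shape, and file names are assumptions:

```python
import torch
import torch.nn as nn

# Placeholder network; in practice this would be your trained model.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4),
)
model.eval()                                 # disable dropout / batch-norm updates

example_input = torch.randn(1, 3, 224, 224)  # dummy input matching the expected shape

# TorchScript trace: an inference-only artifact for mobile or C++ runtimes
traced = torch.jit.trace(model, example_input)
traced.save("model_inference.pt")

# Or export to ONNX for other inference runtimes
torch.onnx.export(model, example_input, "model.onnx")
```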
At the end of the day, running a model in your actual product often does not need the training loop, so why spend too much time on it?
The reason is that once your model is out in the wild, you are inevitably going to want to iterate on it. After all, this is the power of machine learning. In theory you no longer have to write new code; just add more data and the system will improve.
Once you have a model artifact, it is tempting to get the MVP of your product up and running. You can (and should) get a solid baseline on accuracy, throughput, and latency. These are all key to your product’s success. If the model is too slow, not accurate enough, or can’t handle the load, there is no use in going further. The first version of your product could even run off of rules or human validation. The key is to get something working end to end; you can always improve it later.
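A baseline does not need to be fancy. Something like the sketch below, with a dummy model and randomly generated test set standing in for your own, is enough to put first numbers on accuracy, latency, and throughput:

```python
import random
import time

# Stand-ins for a real model and a labeled test set.
def model(x):
    return int(x > 0.5)

test_samples = [(random.random(), random.randint(0, 1)) for _ in range(1000)]

correct = 0
start = time.perf_counter()
for inputs, label in test_samples:
    correct += int(model(inputs) == label)
elapsed = time.perf_counter() - start

accuracy = correct / len(test_samples)
latency_ms = 1000 * elapsed / len(test_samples)
throughput = len(test_samples) / elapsed
print(f"accuracy={accuracy:.3f}  latency={latency_ms:.3f}ms  throughput={throughput:.0f} samples/s")
```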
Then comes the fateful day you start to test your ML product outside your own four walls, deploying the model into the wild. Your computer vision model can track your face around the office. You even tested your pose estimation model on every single one of your coworkers. Your model saw tens of thousands of images during training. It should be robust to any environment, right? Wrong. Deep learning models are huge black boxes of millions of parameters. It is quite hard to know the long tail of data they will fail on. It is even harder to know how users will interact with the system in their natural environment.
📸 Computer vision applications may fail because of:
- Different camera parameters
- Diverse lighting
- Weather patterns
- Variety in human body shape, size, gender, race, pose
✍️ Natural language processing will run into:
- Out of vocabulary words
- Phrases it’s never seen before
- Users playing the “Turing Test”
🔊 Audio inputs can throw a model off based on:
- Background noise
- Different user accents
- Slang
- Unexpected pauses
- Audio quality differences
Some of these variables are easy to control for, but this is by no means an exhaustive list. The real world is dynamic and always changing. Customers will start to lose faith in your product with every failure case they run into.
You need a clear path to tracking failure cases and improving the data as well as the model. If you only run inference, it could cost your company days, weeks, even months to catch back up to where you need to be. Grabbing an off-the-shelf model can sound amazing at first glance, but can be a glimmer of false hope in the long run.
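Tracking failure cases can start as simply as appending every flagged prediction and its human correction to a dataset file that feeds the next round of training. The CSV layout below is one possible shape, not a prescribed format:

```python
import csv
from datetime import datetime, timezone

def log_failure(input_path, prediction, correction, dataset_path="failure_cases.csv"):
    """Append a failed prediction and its human correction to a dataset file."""
    with open(dataset_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            input_path,
            prediction,
            correction,
        ])

# Called whenever a user flags or corrects a bad prediction, for example:
log_failure("images/frame_0042.jpg", "no_person", "person")
```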
To enable a model to learn, you must have a robust training data pipeline established. Iterative data practices are essential for taking your weekend hack to production. We need our models to have the ability to learn from their mistakes. This is a key component many companies overlook when shipping v1 of their product. This is why it’s called machine learning and not machine inference.
For computational reasons, it is unlikely that you will be re-training foundation models from scratch. A more likely scenario is fine-tuning on top of a foundation model for your specific use case. Tracking and managing the data that goes into that fine-tuning will differentiate your product from the others.
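As a sketch of what that fine-tuning step might look like, here is a hedged example using the Hugging Face `transformers` and `datasets` libraries. The checkpoint, the `train.csv` file with `text` and `label` columns, and the two-class setup are all assumptions for illustration:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # example foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Your curated, versioned training data -- the part worth managing carefully.
# Assumes a train.csv with "text" and integer "label" columns.
dataset = load_dataset("csv", data_files="train.csv")["train"]
dataset = dataset.map(lambda row: tokenizer(row["text"], truncation=True, padding="max_length"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("finetuned")
```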
At Oxen.ai we are putting the learning back in machine learning. We build tools to help manage the data that goes into your model. Start with the training data. Don’t take the shortcut of grabbing a model that can only do inference. You will avoid countless hours of headache in the future when your “state of the art” model starts to fail in the wild.
To learn more about Oxen and how we help manage the data that goes into your model, contact us at https://oxen.ai and follow us at http://twitter.com/oxendrove.
At OxenAI we want to see what you are building! Reach out at hello@oxen.ai, follow us on Twitter @oxendrove, dive deeper into the documentation, or sign up for Oxen today at http://oxen.ai/register.
Sources:
[1] “Why do 87% of data science projects never make it into production?”. VentureBeat. July 19, 2019. https://venturebeat.com/ai/why-do-87-of-data-science-projects-never-make-it-into-production/
[2] “Want your company’s A.I. project to succeed? Don’t hand it to the data scientists, says this CEO”. Fortune. Jeremy Kahn. July 26, 2022. https://fortune.com/2022/07/26/a-i-success-business-sense-aible-sengupta/
[3] “AI project failure rates near 50%, but it doesn’t have to be that way, say experts”. WSJ Pro. John McCormick. Aug 7, 2020. https://www.wsj.com/articles/ai-project-failure-rates-near-50-but-it-doesnt-have-to-be-that-way-say-experts-11596810601