If 2023 was the biggest year yet in AI history, 2024 looks set to be even bigger. According to McKinsey, GenAI is the #1 priority for CEOs in 2024, whilst Forbes has predicted 2024 ‘will trigger a metamorphosis across the global economic landscape’. It is our job at TitanML to ensure our clients can move fast with confidence when building AI applications, allowing them to maintain their competitive advantage. That is why we spend so much time researching and planning for what is coming next. So without further ado, here are our leadership team’s predictions for what 2024 holds for enterprise AI:
GPT-4-quality, but with an open source model:
Historically, proprietary large language models such as GPT-4 have been considered the gold standard, whilst open source models were seen as significantly cheaper but ultimately poor-quality substitutes. Yet 2023 brought significant improvements in the quality of open source models: in December, Mistral AI’s Mixtral demonstrated significantly better performance than GPT-3.5. As major players, including Middle Eastern nations and Meta, continue to invest heavily in this space, we expect Llama 3 (or an equivalent) to be as good as, if not better than, GPT-4. The point at which open source models match proprietary ones will mark a significant turning point for the industry. The choice between API-based and self-hosted models will no longer be made solely on the basis of model quality; it will become a more complex decision that weighs privacy, control, ease of use and cost. We therefore expect a significant number of enterprises to move from API-based models to self-hosted ones. Many of our clients have already planned for this eventuality and are now using the Titan Takeoff Inference Server to make self-hosting models as pain-free as possible.
Mixture of experts (MoE) models dominating open source leaderboards:
As referenced above, Mistral AI’s Mixtral release in December was significant. It is a mixture of experts model, which Mistral says makes it more powerful and efficient than its predecessor, Mistral 7B. GPT-4 is also widely assumed to be a mixture of experts model. Following Mixtral’s release, we anticipate an increasing number of these models ranking highly on Hugging Face’s model leaderboard.
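To make the idea concrete, here is a toy sketch of MoE routing (sizes and weights are illustrative, not Mixtral’s real dimensions): a small router scores all experts per token, but only the top-k experts actually run, which is why an MoE model can hold many experts’ worth of parameters while spending only a fraction of the compute per token. Mixtral, for example, routes each token to 2 of its 8 experts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 "experts" (here, plain linear maps)
# and a router that activates only the top-2 experts per token.
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    logits = x @ router                    # router score for each expert
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Only top_k of the n_experts run: roughly 2/8 of the per-token FLOPs,
    # even though the layer stores all 8 experts' parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # → (16,)
```

The design choice is the key point: capacity (total parameters) and per-token compute (active parameters) are decoupled, which is how Mixtral outperforms much denser models at similar inference cost.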
Multi-modal applications increasing in popularity:
Increasingly, our clients are building multi-modal applications - combining models such as Whisper (for transcription) and LLaVA (for vision-to-text reasoning). We expect this trend to continue into 2024.
As a result of this increasing popularity, the Titan Takeoff Inference Server will support multi-modal models from Q1 2024 so our clients can work within a single serving and deployment framework for all their GenAI applications.
Enterprise applications will be dominated by non-chat applications:
In 2023, most enterprise GenAI applications were chatbots, and most of them struggled to reach production. That is partly because chatbots are very hard to do well (Chevrolet learned this the hard way) and partly because most enterprise applications do not require chat-style interaction. We are seeing much greater utility in document processing and summarization use cases, and we expect many enterprise teams to focus on these less flashy but high-value applications throughout 2024.
Decreased focus on latency and increased focus on throughput and scalability:
As a result of this increased focus on non-chat applications, throughput (the total volume of tokens a deployment can process in a given time) will become more important than latency (the delay before a response, typically measured to the first token). This is because many non-chat workloads are batched and run at scale: the speed at which the first token is returned does not matter; what matters is how quickly millions of tokens are processed.
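A back-of-envelope calculation shows why this trade matters. All numbers below are illustrative assumptions, not measurements: batching raises per-request latency but multiplies aggregate throughput, which is exactly the right trade for an offline document-summarization pipeline.

```python
# Illustrative workload: summarize a large document corpus offline.
docs = 1_000_000          # documents to summarize (assumed)
tokens_per_doc = 500      # generated tokens per summary (assumed)

# Interactive config: batch size 1, fast first token, low total throughput.
interactive_tps = 60      # tokens/sec at batch size 1 (assumed)
# Batched config: slower per request, far higher aggregate throughput.
batched_tps = 1200        # aggregate tokens/sec at batch size 32 (assumed)

def hours(total_tokens, tokens_per_sec):
    """Wall-clock hours to generate total_tokens at a given rate."""
    return total_tokens / tokens_per_sec / 3600

total = docs * tokens_per_doc
print(f"batch=1 : {hours(total, interactive_tps):,.0f} h")  # → batch=1 : 2,315 h
print(f"batch=32: {hours(total, batched_tps):,.0f} h")      # → batch=32: 116 h
```

With these assumed rates, the batched configuration finishes the same job roughly 20x faster in wall-clock time, even though any individual request waits longer for its first token.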
Our clients have already begun to switch their use of the Titan Takeoff Inference Server from optimizing for latency to optimizing for throughput as they move from demos to production scale. We built the Titan Takeoff Inference Server for these enterprise-scale applications, meaning deploying to millions of users is no harder than deploying to just one or two.
More inference is done on non-NVIDIA accelerators like AMD, Inferentia, and Intel:
We are seeing an increasingly competitive AI accelerator landscape, with AMD, AWS, and Intel all offering competitively priced options for AI inference. NVIDIA’s grip on the AI industry has relied largely on CUDA; however, with AMD and Intel hardware expected to gain Triton support, this gap is narrowing. And with alternative accelerators becoming much more widely available and cost-effective, we expect to see many more applications running inference on non-NVIDIA accelerators.
As a result, from Q1 2024, the Titan Takeoff Inference Server will support AMD and Intel GPUs as well as we support NVIDIA, so clients aren’t limited in where they choose to deploy their applications.
Enterprises build applications with interoperability and portability in mind:
Because 2023 was all about delivering POCs and quick value with AI, less attention was paid to security and stability, meaning many applications were built with single points of failure (for example, relying on a single model provider such as OpenAI). In 2024, as more and more applications hit production scale, we will see clients adopting engineering best practices such as interoperability and portability of models.
Many of our clients use the Titan Takeoff Inference Server because it offers a single, locally hosted API through which to deploy all of their open source language models. This makes switching between models simple. In 2024, we will make our API compatible with OpenAI’s API so our clients can seamlessly switch between API-based models and locally hosted models deployed via the Titan Takeoff Inference Server.
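A short sketch of what OpenAI compatibility makes possible: the same request-building code targets either the hosted API or a self-hosted server, with only the base URL and model name changing. The local address and model name below are placeholders for illustration, not documented Takeoff endpoints.

```python
import json
from urllib import request

OPENAI_BASE = "https://api.openai.com/v1"
LOCAL_BASE = "http://localhost:3000/v1"   # placeholder self-hosted address

def chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completion request (constructed, not sent)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Identical call shape, different backend:
hosted = chat_request(OPENAI_BASE, "gpt-4", "Summarize this contract.")
local = chat_request(LOCAL_BASE, "my-local-model", "Summarize this contract.")
print(hosted.full_url)  # → https://api.openai.com/v1/chat/completions
print(local.full_url)   # → http://localhost:3000/v1/chat/completions
```

Because only configuration differs, an application can fail over from one backend to the other without touching its request logic, which is precisely the single-point-of-failure risk this section describes.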
Are you ready for AI in 2024? Please reach out if you would like to discuss your enterprise AI strategy and learn about best practices.