The Good, the Bad and the Problematic Uses of LLMs

By Albert Mao

Feb 5, 2024

This article surveys the experience of using LLM-powered solutions across applications, highlighting successful and less successful use cases and making the case for applying the technology selectively.

Today, large language models are applied in many contexts: working with text and imagery, coding, software development, scientific research and more. With AI a buzzword in practically every industry, many organizations adopt a maximalist approach, reaching for LLMs in virtually every application.

Indeed, when organizations like Goldman Sachs predict that generative AI could raise global GDP by 7% over the next 10 years, harnessing the power of LLMs sounds like a go-to move for every business. Yet there is already a track record of cases where large language models have proven ineffective or even counterproductive. To steer clear of these pitfalls, let's look at the good and the not-so-good uses of LLMs and learn how, and when, to apply the technology to its fullest potential.

Good Use Cases 

There is no shortage of examples where LLMs enhance creativity, automate routine tasks, improve problem-solving or power automated chatbots. Here are a few of the most prominent use cases as of this writing.

Productivity and Collaboration Tools

Tools like GitHub Copilot and Microsoft 365 Copilot draw on an LLM's foundational capability: generating the most probable next token given the sequence of preceding tokens. By extending this ability to "autocomplete" user prompts, GitHub Copilot speeds up coding, suggesting whole blocks of code that improve efficiency and reduce errors. In a similar vein, Microsoft 365 Copilot, embedded in the Microsoft 365 apps, streamlines document work by summarizing user data (long email threads, for example) and offering action items, suggestions and derivative content such as email replies, presentations or visualizations.
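
To make this mechanism concrete, here is a minimal sketch of the "autocomplete" loop using the small open GPT-2 model via the Hugging Face transformers library. Copilot itself runs on much larger proprietary models, so this illustrates the principle rather than the product.

```python
# Minimal next-token "autocomplete" sketch with an open model (GPT-2).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: at each step, append the single most probable next token.
output = model.generate(
    **inputs,
    max_new_tokens=24,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```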

Content Generation

Best known for human-like text output, LLMs are widely used in frameworks for text and image generation. Tools such as Jasper, CopyAI and ChatGPT help marketers, writers and creators generate copy and content, with prompt engineering making the output more relevant and accurate. Meanwhile, new tools keep appearing: the experimental Dramatron system, for example, uses prompt chaining to produce theatre scripts and movie screenplays from loglines (one-sentence briefs), taking LLM storytelling to a completely new level.
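
As an illustration, here is a rough sketch of prompt chaining in the spirit of Dramatron, where each step's output becomes the next step's input. The call_llm helper is a hypothetical stand-in for any chat-completion API, and the prompts are invented for illustration, not Dramatron's actual templates.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("Wire up your LLM provider here.")

def draft_screenplay(logline: str) -> str:
    # Step 1: expand the one-sentence logline into a plot outline.
    outline = call_llm(f"Expand this logline into a three-act plot outline:\n{logline}")
    # Step 2: derive characters from the outline, not from the logline alone.
    characters = call_llm(f"List the main characters with one-line descriptions:\n{outline}")
    # Step 3: write dialogue conditioned on everything generated so far.
    return call_llm(
        "Write the opening scene in screenplay format.\n"
        f"Outline:\n{outline}\nCharacters:\n{characters}"
    )
```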

Customer Support Automation

LLMs' ability to generate human-like responses, together with the accessibility of AI technology, has made LLM-powered chatbots an industry standard in customer support software. Companies of all sizes use them to build automated assistants that handle customer service responses and personalize interactions. One example is UltimateGPT, which integrates ChatGPT to produce human-like responses to customer queries and is used by companies such as TransferGo for their customer helpdesk.
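
For a sense of how such an integration looks in code, here is a minimal sketch of a support bot built on the OpenAI Python client. The system prompt, model name and knowledge-base snippet are placeholders; production tools like UltimateGPT layer retrieval, escalation and guardrails on top of a loop like this.

```python
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a support agent for ExampleCo. Answer only from the provided "
    "context. If you are unsure, offer to escalate to a human agent."
)

def answer_ticket(question: str, kb_context: str) -> str:
    # Ground the model in company knowledge to constrain and personalize replies.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{kb_context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```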

Bad and Problematic Use Cases

As businesses push AI into new applications, the technology's limitations and its reliance on prompts call for applying LLMs selectively, depending on the task, the model used and other factors. Conversely, when LLMs are used without regard to their limitations, the results can be frustrating or even harmful. Here are some real-world incidents where using LLMs led to unintended or unexpected results.

Generating misleading information

The plausibility of LLM output presents significant risks: a response can look accurate on its face yet be substantively wrong. Coupled with LLMs' propensity to hallucinate, this calls for gate-keeping LLM responses with human experts, especially in contexts involving financial, medical or legal advice, as well as educational content.
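
One common mitigation is a human-in-the-loop gate: the model drafts a reply, but anything touching a high-stakes domain is routed to an expert before release. The sketch below is a hypothetical pattern, with the topic detector and review queue stubbed out.

```python
HIGH_STAKES = ("medical", "legal", "financial", "educational")

def detect_topics(question: str) -> set:
    # Stub: in practice a classifier or keyword rules would go here.
    return {t for t in HIGH_STAKES if t in question.lower()}

def queue_for_human_review(draft: str) -> str:
    # Stub: push the draft to an expert review queue instead of the user.
    return f"[PENDING EXPERT REVIEW] {draft}"

def release_answer(question: str, llm_draft: str) -> str:
    # Gate-keep: high-stakes answers never go out without human sign-off.
    if detect_topics(question):
        return queue_for_human_review(llm_draft)
    return llm_draft
```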

In legal practice, for example, there have already been several cases where lawyers over-relied on ChatGPT to prepare court filings, resulting in fictitious citations and non-existent judicial opinions. In one instance, a federal court fined a New York law firm for submitting made-up authorities generated by ChatGPT. Notably, the judge observed that the fake judicial decisions produced by the chatbot "have some traits that are superficially consistent with actual judicial decisions" while other portions were "nonsensical gibberish."

Biased output

Since large language models are trained on data published on the Internet, they inherit its biases. Studies and practical experience show that although LLMs reject biases when explicitly asked, deep-rooted biases against various demographics persist "under a veneer of fairness." In a study by Shashank Gupta et al. [2024], for example, researchers identified biases based on race, gender, religion, disability and political affiliation in all four LLMs used in the experiment, including two versions of ChatGPT-3.5, GPT-4-Turbo and Llama-2-70b-chat. These findings call for special care when deploying LLMs in applications involving human interactions, such as recruitment technology or matching software.
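
A lightweight way to check for this in your own application is a persona probe, loosely inspired by the persona-assignment setup of Gupta et al. (not their exact protocol): ask the model the same question under different assigned personas and compare the answers. Again, call_llm is a hypothetical stand-in for any chat-completion API.

```python
PERSONAS = ["a man", "a woman", "a wheelchair user", "a devout Muslim", "an atheist"]
QUESTION = "What careers would you recommend I pursue?"

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("Wire up your LLM provider here.")

def probe_personas() -> dict:
    # Only the persona varies; any systematic divergence in the answers
    # is a bias signal worth investigating before shipping the feature.
    return {
        persona: call_llm(f"Adopt the identity of {persona}. {QUESTION}")
        for persona in PERSONAS
    }
```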

Sometimes less could be more

While developers push LLMs ever larger, expanding context windows and parameter counts, in some situations less is more: smaller purpose-built models can outperform LLMs on specific, non-generative tasks. In practice, deploying large models in real-world applications is challenging because of their size and their GPU memory and compute requirements. Meanwhile, studies such as Hsieh et al. [2023] demonstrate that training small task-specific models can achieve better performance with considerably smaller model sizes and far less training data.
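
To ground the point, here is a compressed sketch of fine-tuning a small task-specific model (DistilBERT on sentiment classification) with Hugging Face transformers. Note that Hsieh et al.'s "distilling step-by-step" method additionally trains on LLM-generated rationales, which this plain fine-tuning sketch omits.

```python
# Fine-tuning a ~66M-parameter model for one task, versus prompting a
# multi-billion-parameter LLM to do the same thing.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sst2-distilbert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```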

Key Takeaways

While the range of applications for large language models keeps expanding, researchers, scientists and innovators acknowledge that our understanding of how LLMs work internally is still very limited. Researchers from OpenAI put it even more bluntly: "It might be difficult to detect from their [LLM] output whether they use biased heuristics or engage in deception."

That said, learning from LLMs' successful and not-so-successful use cases can help you leverage the power of AI for practical applications with more certainty, especially when aided by platforms like VectorShift, which offer intuitive SDK interfaces and no-code functionality. To learn more about using AI and LLMs, stay tuned to our blog or get in touch with our team for a free consultation and demo.

© 2023 VectorShift, Inc. All Rights Reserved.
