As a specific category of AI, generative artificial intelligence (GenAI) creates new content that resembles what humans produce. The rapid development of GenAI systems has generated a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Today, GenAI services rely on the traditional cloud computing framework because they demand large amounts of computational resources. However, such services suffer high latency due to long-distance data transmission and high request volumes. Edge-cloud computing, by contrast, can provide adequate computational power and low latency at the same time through collaboration between edge servers and the cloud. Building GenAI systems at scale by leveraging the edge-cloud computing paradigm is therefore attractive.

There are three basic paradigms for implementing large-scale computing systems: 1) cloud computing, 2) multi-access edge computing (MEC, formerly known as mobile edge computing), and 3) edge-cloud computing. Among the three, cloud computing carries out computationally demanding tasks on large fleets of remote servers that serve many users.

The cloud has much larger computing resources than a local site, so moving compute-intensive tasks to the cloud has long been an efficient way of processing data. The concept of cloud computing was introduced in the early 1960s. It has progressed rapidly over the last several decades and has matured into an established business service model. Examples include Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, and Salesforce.

Technical challenges to keep in mind 

There are technical challenges in the deployment of GenAI services. The major ones include: 

  • Growth in model sizes, 
  • Power consumption,
  • Latency, and 
  • Infrastructure reliability.

GenAI systems adopt ever-larger models, with more parameters and more computation, to achieve better performance across applications; model sizes have grown roughly exponentially over time. Power consumption is another major concern in cloud computing: the centralized computation infrastructure consumes significant electricity to serve user requests and train large models.
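To make the growth claim concrete, the trend can be sketched as a doubling process. The 1B-parameter starting point and 6-month doubling period below are hypothetical assumptions chosen only for illustration, not measured values:

```python
# Hypothetical sketch of exponential model-size growth.
# The starting size and doubling period are assumptions, not measurements.

def params_after(p0_billion: float, months: float,
                 doubling_months: float = 6.0) -> float:
    """Parameter count (in billions) after `months` of exponential growth."""
    return p0_billion * 2.0 ** (months / doubling_months)

print(params_after(1.0, 24.0))  # four doublings of a 1B model -> 16.0
```

Under any doubling assumption, a few years of growth multiplies resource needs by an order of magnitude, which is why infrastructure planning cannot treat model size as fixed.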

For real-time GenAI applications such as VR and gaming, reducing latency is of utmost importance. End-to-end latency differs across the three computing frameworks, since each splits data transmission and computation between the user, the edge, and the cloud differently.
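A minimal latency sketch can illustrate the comparison. The model here is deliberately simple, latency = round-trip propagation + data transfer + computation, and every number (link rates, GPU throughputs, request size) is a hypothetical assumption, not a benchmark:

```python
# Illustrative latency model for the three paradigms.
# All parameter values below are hypothetical assumptions.

def transfer_s(size_mb: float, rate_mbps: float) -> float:
    """Seconds to move size_mb megabytes over a rate_mbps link."""
    return size_mb * 8.0 / rate_mbps

def compute_s(work_gflop: float, speed_gflops: float) -> float:
    """Seconds to run work_gflop of computation at speed_gflops."""
    return work_gflop / speed_gflops

REQ_MB, RESP_MB, WORK = 2.0, 2.0, 500.0   # one hypothetical request

# 1) Cloud only: slow WAN link and long RTT, but a fast GPU.
cloud = 0.1 + transfer_s(REQ_MB, 25) + compute_s(WORK, 2000) + transfer_s(RESP_MB, 100)

# 2) Edge only: fast local link and short RTT, but a weak accelerator.
edge = 0.01 + transfer_s(REQ_MB, 200) + compute_s(WORK, 250) + transfer_s(RESP_MB, 400)

# 3) Edge-cloud: the edge preprocesses the request into compact features,
#    the cloud runs the heavy stage, and the edge relays the result back.
edge_cloud = (0.01 + transfer_s(REQ_MB, 200)   # user -> edge
              + compute_s(100, 250)            # edge preprocessing
              + 0.1 + transfer_s(0.2, 25)      # compact features -> cloud
              + compute_s(WORK - 100, 2000)    # heavy stage in the cloud
              + transfer_s(0.2, 100)           # result -> edge
              + transfer_s(RESP_MB, 400))      # edge -> user

for name, t in (("cloud", cloud), ("edge", edge), ("edge-cloud", edge_cloud)):
    print(f"{name:10s} {t * 1000:7.0f} ms")
```

With these assumed numbers, the edge alone is compute-bound, the cloud alone is transfer-bound, and the collaborative split comes out ahead; the actual winner in practice depends entirely on the real link and hardware parameters.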

Cloud servers need many GPUs to handle user requests at scale. Meta, for example, has launched a supercomputing centre with 16,000 Nvidia A100 GPUs to support its GenAI services. Setting up such powerful but costly infrastructure at many sites worldwide is unrealistic. Furthermore, a huge single-site infrastructure is vulnerable to both physical and cyber attacks. Distributed computing with multiple lightweight cloud servers and many more edge servers will offer a more robust AI computational infrastructure in the future.

Design considerations

Training and deployment of GenAI services should be considered separately. Training GenAI models requires large amounts of computational resources and training data; key considerations include computation offloading, personalization, privacy, and information recency. After models are trained, deploying them on user devices is desirable for lower latency and power consumption.
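Computation offloading ultimately reduces to a time comparison: send work to a remote machine only when the round trip plus remote compute beats running it locally. A minimal sketch of that rule, with made-up numbers for every parameter:

```python
# Minimal computation-offloading decision sketch.
# Offload when round-trip + transfer + remote compute beats local compute.
# All numbers in the usage examples are hypothetical.

def should_offload(work_gflop: float, local_gflops: float,
                   data_mb: float, link_mbps: float,
                   remote_gflops: float, rtt_s: float) -> bool:
    local_time = work_gflop / local_gflops
    offload_time = rtt_s + data_mb * 8.0 / link_mbps + work_gflop / remote_gflops
    return offload_time < local_time

# A heavy task on a weak device favours offloading:
print(should_offload(500, 50, 2, 100, 2000, 0.05))
# A light task finishes faster locally:
print(should_offload(5, 50, 2, 100, 2000, 0.05))
```

Real offloading schedulers also weigh energy, privacy, and server load, but the same break-even structure underlies them.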

Deployment raises two main considerations: lightweight inference models and multimedia content. Lightweight models are essential because edge servers and user devices have limited resources. As for multimedia content, it will become the primary medium through which humans acquire information, as the popularity of online video already shows. Cross-domain content generation and interfaces at the edge should therefore be designed carefully.

Conclusion 

As one of the best-known GenAI services today, ChatGPT provides a generic GenAI model at the expense of a large model size and a high running cost. It may be advantageous to trade breadth for depth of generated content to lower the service cost and enhance the quality of service. For instance, the accuracy of generated content is the top priority in application domains such as healthcare and financial advice.

The deployment of GenAI services at scale poses a new challenge to the design of modern edge-cloud computational systems due to extremely large model sizes, heavy power consumption, and potential latency caused by a lack of computational and network resources.
