In a series of articles focusing on code optimisation, we have discussed how optimising the performance of AI applications and machine learning models can deliver a range of business benefits. Optimised model code reduces inefficiencies, boosts model speed, and enables teams to take models from conceptualisation to production in weeks, not months. These factors are inevitably tied to profitability. When developing and deploying machine learning models, another key element to factor into profitability is the computational cost of building models. In this article, we discuss in particular how code optimisation can help contain the demand for memory during model training and inference, potentially saving companies millions of dollars.
Machine learning systems have differing memory requirements, depending largely on the scale of the task at hand. Volatile memory, or primary storage, such as RAM or VRAM (Video RAM, the memory a GPU uses to render graphics), stores data that needs to be fetched quickly and frequently, and can be particularly expensive. Increasingly, data is also stored and processed in the cloud to enable synchronous work. To execute state-of-the-art machine learning models efficiently, the underlying model code needs to be both effective and compact, consuming as little memory as possible.
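As a minimal illustration of memory-conscious code, the Python sketch below streams records lazily instead of materialising an entire dataset in RAM; the file name is a placeholder:

```python
import csv

def load_all(path):
    # Eager: materialises every row in RAM at once.
    with open(path, newline="") as f:
        return list(csv.reader(f))

def stream_rows(path):
    # Lazy: yields one row at a time, so RAM usage stays roughly
    # constant regardless of file size.
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row

# Counting rows via the generator never holds the whole file in memory.
row_count = sum(1 for _ in stream_rows("training_data.csv"))
print(row_count)
```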
Here we discuss four areas where optimising models for memory can help companies secure the benefits of machine learning.
Deploying at the edge: The Internet of Things (IoT) and edge devices are seeing increased demand across multiple fields. Industries such as healthcare, travel, and retail take advantage of a range of edge devices, including wearable tech, sensors, and security equipment. For instance, stores such as Amazon Fresh utilise novel IoT solutions to provide a seamless shopping experience. Amazon's systems are powered by edge devices such as cameras, weight and pressure sensors, and infrared sensors, paired with machine learning solutions. In the case of IoT, machine learning models are often embedded on edge devices to generate analytics and insights in real time.
However, IoT devices are compact and capacity constrained, which makes memory and energy optimisation critical for deploying machine learning models efficiently at the edge. For instance, the Apple Watch Series 7 (the latest at the time of writing) has 1 GB of RAM, whereas at least 16 GB of RAM is generally recommended for running a deep learning model. Executing a machine learning model on an edge device therefore requires a high level of memory efficiency, especially if the device is to provide useful real-time insights such as health analytics or fall detection. By eliminating inefficient elements and streamlining layout, code optimisation refines machine learning models to best suit edge devices.
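One concrete technique for shrinking models to fit such devices is quantisation, which stores weights at lower numerical precision. Below is a minimal sketch using PyTorch's dynamic quantisation; the model and layer sizes are placeholders rather than a real edge workload:

```python
import torch
import torch.nn as nn

# A small stand-in for a model destined for an edge device.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Dynamic quantisation stores Linear weights as 8-bit integers
# instead of 32-bit floats, roughly quartering their memory footprint.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # inference still works, on smaller weights
```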
Reducing computational cost: Utilising an AI-based solution to enhance a business process involves two essential steps: (1) develop, train, and test the system, and (2) deploy and maintain the solution. Memory is a key resource for both: training locally requires substantial hardware investment, while the alternative, training in the cloud, reduces the need for memory on local machines yet can still require large financial outlays. Training bleeding-edge models can cost companies over a million dollars; small-scale businesses with tight budgets may even have to forego certain functionalities of AI solutions to make projects financially feasible.
Code optimisation is one way to reduce a model's memory consumption, minimising the need to invest in additional resources and thereby cutting costs. In particular, machine learning models used for computationally heavy tasks such as image recognition demand GPUs, whose volatile memory (VRAM) is even more limited. Optimising the model code used for such tasks can save businesses significant computational costs.
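One widely used way to cut GPU memory during training, for example, is mixed-precision arithmetic. The sketch below uses PyTorch's automatic mixed precision; the model and training loop are placeholders, and a CUDA-capable GPU is assumed:

```python
import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA-capable GPU is available
model = nn.Linear(512, 512).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    x = torch.randn(32, 512, device=device)
    target = torch.randn(32, 512, device=device)
    optimizer.zero_grad()
    # Eligible ops run in float16 inside autocast, roughly halving
    # the VRAM needed for activations compared with float32.
    with torch.cuda.amp.autocast():
        loss = nn.functional.mse_loss(model(x), target)
    # Scaling the loss guards against float16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```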
Boosting performance: Maintainability, reliability, clarity, and consistency are characteristics of high-quality code. Compromising code quality not only reduces the performance of a programme but can also lead to memory leaks. For instance, low-quality code may hold on to redundant variables, which, especially in iterative processes, can cause significant inefficiencies in volatile memory usage.
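A common example of this in Python-based training loops is holding references that silently retain large objects. The sketch below (with a stand-in model) contrasts a leaky pattern with its fix:

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
history = []

for step in range(1000):
    x = torch.randn(64, 100)
    loss = model(x).pow(2).mean()

    # Leaky: appending the loss tensor itself keeps each iteration's
    # entire computation graph alive, so memory grows every step.
    # history.append(loss)

    # Fixed: .item() extracts a plain float, letting the graph and
    # its intermediate tensors be freed immediately.
    history.append(loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```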
A report by the Consortium for Information and Software Quality (CISQ) estimates the total cost of poor software quality in the US in 2020 at a staggering $2.08 trillion¹, highlighting the value of code quality in software development. Code optimisation reduces redundancies in code bases, thereby curtailing memory leaks and boosting performance.
Minimising environmental impact: The rapid uptake of AI and machine learning we see today is predominantly due to their ability to generate faster insights with greater precision. One unfortunate downside of AI-based analytics is its environmental cost. To improve the predictive capacity of a machine learning model, a business may acquire higher-performance hardware, but this comes at the cost of higher energy consumption and, often, a larger carbon footprint.
Digital waste is another driver of AI's negative environmental impact. One instance is when data and algorithms developed as part of an AI solution lie idle on cloud servers once their purpose has been served. This AI debris still costs the environment: the cloud has a greater carbon footprint than the airline industry, and a single data centre can consume as much electricity as 50,000 homes.
By reducing memory usage, code optimisation can reduce the energy demands of training, testing, deploying, and storing machine learning models, ultimately leading to greener AI.
Build memory-efficient models with TurinTech
TurinTech has over 10 years of research expertise in code optimisation, and is specifically investigating ways to optimise machine learning models to reduce memory consumption. In one of our studies, we used optimisation techniques to cut the memory consumption of code bases by up to 45%. Our proprietary platform evoML allows users to build, optimise, and deploy efficient machine learning models with ease. Users simply upload a dataset and have evoML conduct feature engineering, model development, and model optimisation in a few clicks. Inspired by the Darwinian theory of evolution, evoML uses evolutionary algorithms, meta-learning, and search-based software engineering to automatically find and tune the version of model code best suited to lower memory and energy usage, lower latency, and higher throughput. Users are also provided with performance metrics, visualisations, and, most importantly, the model code itself, so that models can be easily customised and deployed.
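evoML's internals are proprietary, but the general idea behind evolutionary search can be sketched in a few lines. The toy below evolves model configurations against a wholly invented fitness function that trades an accuracy proxy against a memory proxy; it illustrates the selection-and-mutation loop, not evoML's actual implementation:

```python
import random

# Toy search space: each candidate is a (hidden size, batch size) pair.
def random_candidate():
    return {"hidden": random.choice([16, 32, 64, 128]),
            "batch": random.choice([8, 16, 32, 64])}

def fitness(c):
    # Invented stand-in: reward a capacity proxy, penalise a memory proxy.
    accuracy_proxy = c["hidden"] ** 0.5
    memory_proxy = c["hidden"] * c["batch"]
    return accuracy_proxy - 0.001 * memory_proxy

def mutate(c):
    # Randomly halve or double one dimension of the candidate.
    child = dict(c)
    key = random.choice(["hidden", "batch"])
    child[key] = max(8, child[key] // 2 if random.random() < 0.5 else child[key] * 2)
    return child

population = [random_candidate() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                                        # selection
    offspring = [mutate(random.choice(survivors)) for _ in range(10)]  # variation
    population = survivors + offspring

print("best candidate:", max(population, key=fitness))
```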
With evoML, businesses are able to ensure that their models perform optimally, while also minimising computational costs and environmental impact. These approaches enable companies to harness the true potential of AI in order to stay ahead of the competition.
About the Author
Malithi Alahapperuma | TurinTech Technical Writer
Researcher, writer and teacher. Curious about the things that happen at the intersection of technology and the humanities. Enjoys reading, cooking, and exploring new cities.
¹ Estimates are based on: (1) the cost of unsuccessful IT/software projects, (2) the cost of poor quality in legacy systems, (3) the cost of operational software failures, and (4) the cost of cybersecurity and technical debt. See section 3 of the report (https://www.synopsys.com/content/dam/synopsys/sig-assets/reports/CPSQ-2020-report-final.pdf) for the complete methodology.