Foundations for Scalable Quantum Machine Learning

May 5, 2025

In this post, we will walk through a recent paper that outlines a strategy for training large quantum models for classification. In this problem, we are given a labeled dataset: for example, images of handwritten digits, each paired with a label indicating which digit it represents. The task is to learn a model that, given a new image as input, accurately predicts the correct label as output.

There are three major criteria any scalable quantum model must satisfy:

  1. Universal Approximation:
    The architecture of the model should be such that, as the number of qubits and quantum gates increases, the function realized by the model systematically approaches the ideal function relating inputs to outputs.
  2. Trainability:
    We need a reliable method to start training and converge the model, ideally one that does not rely on fragile, hyperparameter-sensitive optimizers.
  3. Hardware and Simulator Utility:
    Ideally, current quantum simulators and early quantum hardware should already be useful stepping stones for this path.

The paper develops techniques that directly address all three criteria for classification, and demonstrates them on the MNIST dataset.

1. Bit-Bit Encoding: Maximizing Expressivity with Minimal Resources

Traditionally, quantum machine learning has struggled with the “data loading problem” — how to efficiently represent classical data with quantum states. Loading real-world datasets (like MNIST images) into quantum circuits without losing key information has seemed impractical.

This work introduces bit-bit encoding, a simple yet powerful idea:

  • Compress classical data down to the most predictive bits via efficient classical preprocessing.
  • Encode both inputs and outputs directly as binary strings.

Critically, this method ensures universal approximation: any function between the input bits and output bits can be represented by a unitary operation. Unlike amplitude or angle encoding — where expressivity is limited — bit-bit encoding allows the model to learn any relationship as quantum gate depth and qubit count increase.
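This claim has a simple concrete form: any function from input bits to output bits is realized exactly by a permutation unitary that writes f(x) into an output register. Here is a toy construction in Python (the helper `function_unitary` is our illustration, not code from the paper):

```python
import numpy as np

def function_unitary(f, n_in, n_out):
    """Permutation unitary U|x>|y> = |x>|y XOR f(x)>.
    Any function from input bits to output bits is realized exactly,
    which is what underlies the universal approximation claim."""
    dim = 2 ** (n_in + n_out)
    U = np.zeros((dim, dim))
    for x in range(2 ** n_in):
        for y in range(2 ** n_out):
            U[(x << n_out) | (y ^ f(x)), (x << n_out) | y] = 1.0
    return U

# Toy example: the parity of two input bits, written into one output bit.
U = function_unitary(lambda x: bin(x).count("1") % 2, n_in=2, n_out=1)
assert np.allclose(U @ U.T, np.eye(8))  # a permutation matrix is unitary
```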

This flips the usual intuition: sparse data loading, rather than dense loading, is what enables greater expressivity in quantum models.
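To make the classical preprocessing step concrete, here is a minimal sketch of bit-bit encoding. The paper's exact bit-selection criterion is not reproduced here; as an assumption, this version ranks binarized pixels by mutual information with the label, and all function names and defaults below are ours:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def bit_bit_encode(X, y, n_input_bits=12, n_label_bits=2, threshold=0.5):
    """Compress data to its most predictive bits and encode both
    inputs and labels as binary strings (computational basis states)."""
    # 1. Binarize the raw features (cheap classical preprocessing).
    X_bits = (X > threshold).astype(int)
    # 2. Keep only the bits most informative about the label
    #    (assumed criterion: mutual information with the label).
    mi = mutual_info_classif(X_bits, y, discrete_features=True)
    keep = np.argsort(mi)[-n_input_bits:]
    X_enc = X_bits[:, keep]                              # input bit strings
    # 3. Encode labels as bit strings too, e.g. digit 3 -> (1, 1).
    y_enc = (y[:, None] >> np.arange(n_label_bits)) & 1  # label bit strings
    return X_enc, y_enc

# Each bit string |b1 ... bn> is a computational basis state, so state
# preparation costs at most one X gate per qubit; no QRAM is required.
```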

2. Optimizer-Free Training: Guaranteed Convergence Without Gradients or Hyperparameters

Another major obstacle has been trainability. Classical machine learning relies heavily on gradient-based optimizers like Adam or SGD, which require careful tuning of learning rates and other hyperparameters. Worse, these optimizers can get stuck at saddle points, which occur frequently in high-dimensional optimization.

Here, we propose a radical shift: exact coordinate updates. Instead of guessing step sizes or tuning learning rates:

  • We analytically compute the exact minimum for each parameter update.
  • Only two or three measurements per update are needed, matching the sample complexity of gradient methods.
  • No hyperparameters like learning rate need to be set.
  • Convergence to a local minimum is guaranteed, something standard gradient-based optimizers can't promise!

Moreover, this method naturally avoids getting stuck at saddle points, an advantage that becomes more critical as models grow, since large models are far more likely to get stuck at saddle points than at bad local minima.
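To sketch what an exact coordinate update looks like: for circuits built from Pauli rotations, the loss restricted to a single parameter is a sinusoid, L(θ) = A sin(θ + B) + C, so three evaluations determine it and its minimizer in closed form. The snippet below uses the well-known Rotosolve-style formula, which matches the description above; whether the paper's rule coincides with it in every detail is our assumption:

```python
import numpy as np

def exact_coordinate_update(loss, params, d):
    """Analytically minimize the loss over the single parameter d.
    For a Pauli-rotation gate the restricted loss is a sinusoid
    L(t) = A*sin(t + B) + C, so three evaluations determine it exactly."""
    theta = params[d]

    def L(t):
        p = params.copy()
        p[d] = t
        return loss(p)

    l0, lp, lm = L(theta), L(theta + np.pi / 2), L(theta - np.pi / 2)
    # Closed-form argmin of the fitted sinusoid: no learning rate needed.
    params[d] = theta - np.pi / 2 - np.arctan2(2.0 * l0 - lp - lm, lp - lm)
    return params

def train(loss, params, sweeps=100):
    """Sweep over coordinates; each update can only lower the loss,
    so the loss sequence converges."""
    for _ in range(sweeps):
        for d in range(len(params)):
            params = exact_coordinate_update(loss, params, d)
    return params
```

Caching the current loss `l0` from the previous update reduces the cost to two new circuit evaluations per parameter, consistent with the "two or three measurements" figure above.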

3. Sub-Net Initialization: Overcoming Barren Plateaus

Large variational quantum circuits often suffer from barren plateaus on initialization — regions where gradients vanish and training grinds to a halt.

The solution here is sub-net initialization:

  • Train smaller quantum models first (with fewer qubits and parameters).
  • Use the trained parameters to initialize larger models incrementally.
  • Initialize newly added nodes connected to the trained sub-network as identity gates.

This avoids random initialization that causes barren plateaus and creates a “ladder” where each rung builds smoothly on the last. It’s like growing a deep neural network layer-by-layer — but with exact parameter transfer, something classical ML can’t easily replicate.

It also enables a powerful fail-fast strategy: If a small model gets stuck in a poor local minimum, retrain it (cheaply) before scaling up, rather than wasting time on huge models stuck in bad basins.
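A minimal sketch of the growth step, assuming the model's parameters are rotation angles so that angle 0 yields an identity gate. The parameter counts and the helpers `loss_for` and `n_params_for` are hypothetical illustrations (`train` is the coordinate-update loop sketched earlier):

```python
import numpy as np

def n_params_for(n_qubits):
    """Illustrative parameter count: one angle per ordered qubit pair
    (the paper's entanglement net will differ in detail)."""
    return n_qubits * (n_qubits - 1)

def subnet_initialize(trained_params, n_total):
    """Grow a trained sub-net into a larger model: newly added entangling
    nodes start at angle 0, i.e. as identity gates, so the larger circuit
    initially computes exactly the same function as the trained sub-net."""
    fresh = np.zeros(n_total - len(trained_params))
    return np.concatenate([trained_params, fresh])

# Ladder of models: train small, transfer exactly, grow, retrain.
# `loss_for` is a hypothetical factory returning the loss for an
# n-qubit model; `train` is the sweep loop from the previous section.
params = np.zeros(n_params_for(4))
for small, large in [(4, 8), (8, 16)]:
    params = train(loss_for(small), params)      # fail fast while cheap
    params = subnet_initialize(params, n_params_for(large))
params = train(loss_for(16), params)
```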

Putting It All Together: Training Large Quantum Models

Using these three techniques, we demonstrated scalable quantum learning on the MNIST dataset (digits 0–3 and 0–9), scaling from 4 to 16 qubits on quantum simulators. We designed an ‘entanglement net’ architecture, which consists of entangling nodes between all possible pairs of qubits, together with a funnel-like structure that systematically concentrates the relevant information until it is measured. This architecture can encode arbitrary correlations and can approximate any function between the inputs and outputs as the depth increases, thus satisfying the universal approximation property.
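As a rough sketch of how such a layout can be generated: the node placement and funnel geometry below are our illustrative guesses at the structure just described, with `entangling_node` and `funnel_node` standing in for parameterized two-qubit unitaries:

```python
from itertools import combinations

def entanglement_net(n_qubits, n_layers, n_output_qubits=2):
    """All-pairs entangling layers followed by a 'funnel' that
    concentrates information onto the measured output qubits."""
    circuit = []
    for _ in range(n_layers):
        # One entangling node for every pair of qubits.
        for i, j in combinations(range(n_qubits), 2):
            circuit.append(("entangling_node", i, j))
    # Funnel: repeatedly pair up active qubits and keep one per pair,
    # e.g. 16 -> 8 -> 4 -> 2, ending on the qubits that are measured.
    active = list(range(n_qubits))
    while len(active) > n_output_qubits:
        for i, j in zip(active[::2], active[1::2]):
            circuit.append(("funnel_node", i, j))
        active = active[::2]
    return circuit

layout = entanglement_net(n_qubits=16, n_layers=2)  # 2 output qubits: 4 classes
```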

The paper finds that:

  • Loss consistently decreased with the number of qubits.
  • Optimizer-free training successfully converged across different seeds.
  • Sub-net initialization outperformed random initialization in building larger models.

Most importantly, this framework gives a crucial role to near-term quantum computers, even ones that can't yet beat classical systems at machine learning but are already beyond classical simulation capabilities: training smaller sub-models today prepares better initializations for bigger quantum models tomorrow. Thus, every generation of quantum hardware becomes a stepping stone toward quantum advantage.

Final Thoughts

This paper is a major step toward building practical, scalable quantum machine learning models.

It offers:

  • An information-theoretically motivated encoding scheme (bit-bit).
  • A training method (exact coordinate updates) that beats classical methods in convergence guarantees.
  • A strategy (sub-net initialization) to systematically scale without falling prey to barren plateaus.

Unlike many approaches that mimic classical machine learning, these methods embrace quantum-native thinking: sparsity in data encoding, strongly entangling unitaries, a training strategy that eschews classical optimizers and instead exploits the functional form of the quantum loss, and the observation that entangling nodes can be added to the entanglement net as identity operators, which is what makes sub-net initialization possible. In contrast to earlier studies, the paper demonstrates that quantum models can be extremely expressive and trainable at the same time. And importantly, it shows that neither QRAM (quantum random access memory) nor multiple data re-uploadings are needed to obtain a model with the universal approximation property.

Coherent Computing is pioneering high-performance and user-friendly software platforms that unlock the full potential of quantum hardware.

Get in touch with us to learn more.
