What if you could examine a dataset and quickly calculate how many qubits you’d need to model it on a quantum computer? For quantum machine learning, that kind of clarity has been missing, and as a result researchers and companies have been left guessing about when quantum models will truly beat classical ones.
We know of only a few cases where quantum advantage is proven, Shor’s algorithm for factoring integers being the classic example. In most other cases, especially in machine learning, the situation is fuzzier: speedups are anticipated, but it is not clear where the boundary lies between problems best left to classical machines and problems worth running on quantum hardware. This matters most in areas like biology, where the data is huge, complex, and deeply interconnected, exactly the kind of setting where quantum resources should shine.
That’s the gap we set out to close. In our latest paper, “How many qubits does a machine learning problem require?”, we introduce the first framework that can tell you how many qubits are needed for a quantum model to reach a desired accuracy on a dataset of interest.
We find that medium-sized classical machine learning problems from OpenML require about 20 logical qubits to reach 100% training and testing accuracy, which is within the reach of today’s classical simulators and therefore suggests no quantum advantage. However, when the framework is applied to subsets of Tahoe-100M, a transcriptomic dataset with 100 million samples, the required number of qubits quickly exceeds the practical limit for classical simulation, while remaining well within the reach of the quantum computers expected to be viable in the next 5 years. These results suggest that some of the largest, most information-rich problems in biology and beyond may be prime candidates for quantum machine learning.
As a further technical result, we find that the number of qubits required to reach 100% training and testing accuracy does not necessarily increase with the number of features in the Tahoe-100M subsets. The key lies in the bit-bit encoding technique we introduced earlier this year, which efficiently compresses classical data into a fixed qubit budget while preserving the information most relevant for learning.
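The framework itself is described in the paper rather than here, but the underlying intuition, finding the smallest qubit budget whose encoding still distinguishes the classes in a dataset, can be sketched in toy form. The snippet below is purely illustrative: the threshold-based encoding, the collision criterion, and all function names are our assumptions for this sketch, not the paper’s bit-bit encoding or its qubit-counting procedure.

```python
# Toy sketch: the smallest number of "qubits" (here, plain bits) at
# which a threshold-based encoding stops mapping samples from
# different classes to the same code. Illustrative only; NOT the
# paper's bit-bit encoding or qubit-requirement framework.

def bit_encode(sample, thresholds):
    """Encode a feature vector as a tuple of bits, one per threshold."""
    return tuple(int(x > t) for x, t in zip(sample, thresholds))

def min_bits_to_separate(X, y, max_bits=16):
    """Return the smallest bit budget whose encoding never collides
    across classes, or None if max_bits is not enough."""
    for n in range(1, max_bits + 1):
        # Use the first n features with a fixed 0.5 threshold each
        thresholds = [0.5] * n
        codes = {}
        separated = True
        for sample, label in zip(X, y):
            code = bit_encode(sample[:n], thresholds)
            if codes.setdefault(code, label) != label:
                separated = False  # two classes share a code: budget too small
                break
        if separated:
            return n
    return None

X = [(0.9, 0.1, 0.8), (0.9, 0.7, 0.2), (0.1, 0.7, 0.9)]
y = [0, 1, 1]
print(min_bits_to_separate(X, y))  # prints 2
```

In this toy, one bit cannot tell the first two samples apart (both exceed 0.5 on the first feature), but two bits can, so the “qubit requirement” is 2 regardless of the total feature count, echoing the observation above that required qubits need not grow with the number of features.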
In conclusion, by offering a concrete way to connect datasets to quantum hardware requirements, this work marks a turning point for quantum machine learning. Instead of speculating about where quantum advantage may appear, researchers can now calculate it. For companies exploring AI and quantum technology, that means clearer roadmaps, smarter allocation of resources, and a sharper sense of when quantum computing will make a meaningful impact.
To learn more about this research, check out the paper: https://arxiv.org/pdf/2508.20992
Stay tuned for a convenient online tool to calculate whether your own datasets are likely to benefit from quantum advantage! If you would like alpha access to the tool or want to learn more about our other offerings, feel free to reach out to us at info@coherentcomputing.com.