### Nanoscale 2D NMR experiments

The details for the experiment are shown in Fig. 7.

#### Sample and setup

A single NV center in a CVD-grown diamond with a natural abundance (1.1%) of ^{13}C nuclear spins is used in the experiment. An external magnetic field of 1580 G is applied, so the Larmor frequency of the ^{13}C nuclear spins is *ω*_{L} = 1.69 MHz. A system consisting of the NV electron spin and a pair of ^{13}C nuclear spins is studied in this work.
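As a quick consistency check, the quoted Larmor frequency follows directly from the standard ^{13}C gyromagnetic ratio and the applied field:

```python
# Back-of-the-envelope check of the 13C Larmor frequency quoted above;
# GAMMA_C13 is the standard 13C gyromagnetic ratio (gamma / 2*pi).
GAMMA_C13 = 1.0705e3   # Hz per gauss
B0 = 1580.0            # applied field from the text, in gauss
larmor_hz = GAMMA_C13 * B0   # ~1.69e6 Hz, matching omega_L = 1.69 MHz
```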

#### Initialization

The initialization of the nuclear spin is shown in Fig. 7a. First, a 1.5 μs laser pulse is applied to initialize the electron spin to the *m*_{s} = 0 state. Second, a *π*/2 pulse rotates the electron spin into the *x*–*y* plane. Then a train of periodic *π* pulses, (*τ*/2 − *π* − *τ*/2)^{N}, is applied to the NV sensor, with the interpulse delay set to *τ* = 7/(2*ω*_{nuc}) to be resonant with the nuclear spin; in our experiment *τ* = 2045 ns and *N* = 40. The sequence ends with another *π*/2 pulse whose microwave phase is shifted by 90°. This procedure does not actually initialize the nuclear spin; rather, it correlates the nuclear-spin state with the electron-spin state^{31}.

#### *π*/2 pulse

The *π*/2 pulse on the nuclear spin is realized by a train of periodic *π* pulses on the NV sensor^{32}. The time interval is set to *τ* = 292 ns.

#### Correlation read

The final readout protocol is similar to the initialization process. The microwave control pulses are the same, but a laser pulse is appended at the end to read out the change in the correlation between the nuclear spin and the NV sensor.

#### 2D NMR procedure

A two-dimensional protocol is performed in analogy to COSY spectroscopy in conventional NMR. The Hamiltonian is $H=DS_{z}^{2}+\gamma_{e}\mathbf{B}\cdot\mathbf{S}+\gamma_{c}\mathbf{B}\cdot\left(\mathbf{I}_{1}+\mathbf{I}_{2}\right)+\mathbf{S}\cdot\left(\mathbf{A}_{1}\cdot\mathbf{I}_{1}+\mathbf{A}_{2}\cdot\mathbf{I}_{2}\right)+\mathbf{I}_{1}\cdot\mathbf{J}\cdot\mathbf{I}_{2}$, where **S** and *S*_{z} denote the NV electron spin and its *z* component, **I**_{α} denotes the *α*-th nuclear spin, **A**_{α} denotes the hyperfine coupling tensor between the NV and the *α*-th nuclear spin, and **J** denotes the dipolar coupling tensor between the two nuclear spins. The target nuclear spins then evolve under [initialization − free evolution *t*_{1} − *π*/2 pulse − free evolution *t*_{2} − correlation read], as shown in Fig. 5a. During the first interval, from 0 to *t*_{1}, the two nuclear spins **I**_{1} and **I**_{2} interact and a phase *ϕ*_{1} = (*ω*_{L} + *a*_{1,∥}*S*_{z} + *J*_{zz}*m*_{2})*m*_{1}*t*_{1} is accumulated by the first nuclear spin, where *a*_{1,∥} is the parallel component of the hyperfine interaction of the NV sensor with **I**_{1}, and *J*_{zz} is the *zz* component of the coupling between nuclear spins **I**_{1} and **I**_{2}. The second free evolution follows the *π*/2 pulse, during which another phase *ϕ*_{2} = (*ω*_{L} + *a*_{1,∥}*S*_{z} + *J*_{zz}*m*_{2})*m*_{1}*t*_{2} accumulates. Finally, the correlation-read pulse reads out the transverse component of the nuclear spin.
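The phase-accumulation picture above can be sketched numerically. This is a minimal toy model, not the full readout chain: the parameter values are illustrative, and the signal is simply taken as the product of the cosines of the two accumulated phases.

```python
import math

# Illustrative parameters only (not the experimental values):
OMEGA_L = 2 * math.pi * 1.69e6   # 13C Larmor angular frequency, rad/s
A_PAR   = 2 * math.pi * 2.0e4    # parallel hyperfine component a_{1,par}, rad/s
J_ZZ    = 2 * math.pi * 1.0e2    # zz nuclear-nuclear coupling, rad/s

def accumulated_phase(t, m1, m2, s_z):
    """phi = (omega_L + a_par * S_z + J_zz * m2) * m1 * t, as in the text."""
    return (OMEGA_L + A_PAR * s_z + J_ZZ * m2) * m1 * t

def cosy_signal(t1, t2, s_z=0.0, m1=0.5, m2=0.5):
    """Toy signal: product of the transverse components built up during the
    two free-evolution intervals separated by the pi/2 pulse."""
    return (math.cos(accumulated_phase(t1, m1, m2, s_z))
            * math.cos(accumulated_phase(t2, m1, m2, s_z)))
```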

#### Data acquisition

Sweeping the durations *t*_{1} and *t*_{2} from 4 μs to 0.9 ms with an 18 μs step, a 50 × 50 data matrix is collected. However, owing to the large experimental time cost, only 80% of the data points are collected. The data correspond to the nuclear-spin transverse component, measured by the NV sensor through the optically detected magnetic resonance technique. The time-domain correlation spectroscopy map is shown in Fig. 5c.
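A random sampling mask of the kind described above (a 50 × 50 grid with a fixed fraction of measured points) can be generated, for instance, as follows; the grid size and 80% coverage are taken from the text, while the seeding scheme is an assumption:

```python
import random

def sampling_mask(n=50, coverage=0.8, seed=0):
    """Boolean n x n mask marking which (t1, t2) grid points are measured."""
    rng = random.Random(seed)
    k = round(n * n * coverage)                # number of sampled points
    chosen = set(rng.sample(range(n * n), k))  # flat indices of sampled points
    return [[(i * n + j) in chosen for j in range(n)] for i in range(n)]
```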

### Data simulation

For supervised learning, a sufficient amount of ground-truth data must be provided. In our scenario, pairs of a partially sampled NV spectrum map and the corresponding full-resolution map are necessary. For 2D NV NMR, data acquisition is highly time-consuming, which makes training deep neural networks on real experimental data impractical. We therefore propose to use simulation to generate the training data. The simulation solves the Schrödinger equation for the NV–^{13}C system. The Hamiltonian is

$$H = S_z\sum_{m=1}^{2}\left(A_m^{zx}I_m^{x}+A_m^{zy}I_m^{y}+A_m^{zz}I_m^{z}\right) + J\left[3(\mathbf{I}_1\cdot\mathbf{r}_1)(\mathbf{I}_2\cdot\mathbf{r}_2)-\mathbf{I}_1\cdot\mathbf{I}_2\right] + \omega_L\left(I_{1,z}+I_{2,z}\right),$$

(1)

where *S*_{z} is the NV electron spin operator, *I*_{m}^{α} (*α* = *x*, *y*, *z*) are the spin operators of the *m*-th nuclear spin, *A*_{m}^{zα} are the hyperfine coupling components between the NV and the *m*-th nuclear spin, and *J* is the dipolar coupling strength between the two nuclear spins.

The initial state is

$$\left(\begin{array}{cc}1&0\\ 0&0\end{array}\right)\otimes\rho_{n},$$

(2)

where *ρ*_{n} is the maximally mixed state of the nuclear-spin system. The interacting NV–nuclear-spin system then evolves under the dynamical decoupling sequence *π*/2∣_{x} − (*τ*/2 − *π* − *τ* − *π* − *τ*/2)^{N/2} − *π*/2∣_{y} − *t*_{1} − *π*/2∣_{RF} − *t*_{2} − *π*/2∣_{y} − (*τ*/2 − *π* − *τ* − *π* − *τ*/2)^{N/2} − *π*/2∣_{x}. At the end we extract the population of the *m*_{s} = 0 state as the result.
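A minimal sketch of this simulation idea, reduced to a two-level NV subspace {*m*_{s} = 0, *m*_{s} = −1} coupled to a single ^{13}C spin under free evolution only (no microwave pulses), with illustrative coupling values:

```python
import numpy as np

# Spin-1/2 operators and identity
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
I2 = np.eye(2, dtype=complex)

A_ZX, A_ZZ, OMEGA_L = 0.3, 0.5, 1.0   # illustrative couplings (a.u.)

# Single-nuclear-spin version of Eq. (1): H = Sz (A_zx Ix + A_zz Iz) + wL Iz
H = np.kron(sz, A_ZX * sx + A_ZZ * sz) + OMEGA_L * np.kron(I2, sz)

def evolve(rho, t):
    """Free evolution rho -> U rho U^dagger, with U = exp(-i H t) built
    from the eigendecomposition of H."""
    w, v = np.linalg.eigh(H)
    U = v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T
    return U @ rho @ U.conj().T

# NV initialized to ms = 0, nuclear spin prepared along +x
plus_x = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
rho0 = np.kron(np.array([[1, 0], [0, 0]], dtype=complex), plus_x)

rho_t = evolve(rho0, 3.0)
ix = np.real(np.trace(rho_t @ np.kron(I2, sx)))  # nuclear transverse component
```

In the full simulation, the pulse sequence above is implemented by interleaving such free-evolution segments with instantaneous rotation unitaries.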

### Convolution layers, ReLU activation layers, and pooling layers

#### Convolution layers

Given a convolution kernel *κ* and an input matrix *f*, the output of a convolution layer is *g*_{m,n} ≡ (*f***κ*)[*m*, *n*] = ∑_{i}∑_{j}*κ*_{i,j}*f*_{m−i,n−j}, where *m* and *n* are the row and column indices of the matrix, respectively. Intuitively, CNNs attempt to learn a variety of kernels or “shape templates”. Each kernel is slid over the matrix *f* to every position [*m*, *n*], where a weighted sum of *f* around [*m*, *n*] is computed. The weighting is given by the kernel *κ*: if the values of *f* around the position [*m*, *n*] have a pattern similar to the kernel template *κ*, the convolutional response *f***κ* at [*m*, *n*] will be high; otherwise it will be low. By the design philosophy of CNNs (local connectivity), the kernel is small, i.e. the convolution response *f***κ* at position [*m*, *n*] depends only on a small region of the input matrix *f* around [*m*, *n*], called the receptive field (Fig. 8a).
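The definition of *g*_{m,n} above can be written directly as a (deliberately naive) full 2D convolution; library implementations are far faster, but this makes the index convention explicit:

```python
def conv2d(f, k):
    """Full 2D convolution: g[m][n] = sum_{i,j} k[i][j] * f[m-i][n-j],
    with out-of-range entries of f treated as zero."""
    fr, fc = len(f), len(f[0])
    kr, kc = len(k), len(k[0])
    g = [[0.0] * (fc + kc - 1) for _ in range(fr + kr - 1)]
    for m in range(fr + kr - 1):
        for n in range(fc + kc - 1):
            s = 0.0
            for i in range(kr):
                for j in range(kc):
                    if 0 <= m - i < fr and 0 <= n - j < fc:
                        s += k[i][j] * f[m - i][n - j]
            g[m][n] = s
    return g
```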

#### ReLU activation layers

In CNNs, it is common to apply a threshold to the convolutional response *f***κ* by means of an activation function and a bias term. A popular choice in many CNN architectures is the ReLU, defined as *σ*(*z*) = max(0, *z*). Combined with a bias term *b*, the ReLU activation layer extracts only the “meaningful” convolutional responses that exceed the threshold, via max(0, (*f***κ*)[*m*, *n*] + *b*). The filtered convolutional response *a*(*m*, *n*; *κ*, *b*) = max(0, (*f***κ*)[*m*, *n*] + *b*) forms an activation map *a*. In a convolution layer, multiple activation maps *a*_{1}, ⋯ , *a*_{d} corresponding to different kernels *κ*_{1}, ⋯ , *κ*_{d} and biases *b*_{1}, ⋯ , *b*_{d} constitute the “channels” of the output. In other words, a convolution layer uses different kernels (shape templates) to encode the input image into a stack of activation maps *a* = [*a*_{i}]. The stack of activation maps is essentially a 2D image with *d* channels, so convolutions can be applied back to back. This enables the construction of a pyramid of features. With consecutive convolutions, the receptive fields expand; hence, convolutions in later (deeper) layers capture patterns in a broader context, while convolutions in earlier (shallower) layers focus on smaller primitives (Fig. 8d).
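The thresholding step can be expressed in a few lines; here `conv_response` stands for a precomputed *f***κ* matrix:

```python
def relu(z):
    """ReLU activation: sigma(z) = max(0, z)."""
    return max(0.0, z)

def activation_map(conv_response, bias):
    """a[m][n] = max(0, (f*k)[m][n] + b): keep only responses above -b."""
    return [[relu(v + bias) for v in row] for row in conv_response]
```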

#### Pooling layers

The construction of the pyramid of features can be made more aggressive by introducing a down-sampling layer called pooling. It partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs either the maximum value or the values computed by convolving with learned kernels at a stride greater than 1. The latter is used in the proposed DLNet, as it allows the network to learn the pooling pattern. In short, pooling reduces the resolution of the activation maps so that the size of the receptive fields grows more aggressively (Fig. 8b).
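Non-overlapping max pooling, the simpler of the two variants mentioned, can be sketched as follows (the learned pooling in DLNet instead uses strided convolutions):

```python
def max_pool(a, size=2):
    """Non-overlapping max pooling: downsample by taking the maximum of each
    size x size block (assumes dimensions divisible by size)."""
    return [[max(a[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(a[0]), size)]
            for i in range(0, len(a), size)]
```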

### Robustness of different sampling matrices

We also conducted an analysis to test the robustness of the proposed DLMC method under different sampling schemes. We randomly generated five sampled spectrum matrices from the experimental ground-truth matrix, with a fixed sampling coverage of 10%. Each compared method was applied to each of the five spectrum matrices to reconstruct the spectrum at full resolution. The average RMSE and its standard deviation, and the average SNR and its standard deviation, were computed and recorded. The results are shown in Fig. 9a, b. Compared to the MC-only method, DLMC and DL-only show smaller variances of both RMSE and SNR in the frequency domain, which indicates that DLMC is more robust than the MC-only method.
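The two figures of merit can be computed as follows; the SNR convention used here (peak signal amplitude over the RMS of a residual) is an assumption, since the text does not spell out its exact definition:

```python
import math

def rmse(x, y):
    """Root-mean-square error between two flattened spectrum maps."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def snr(signal, residual):
    """Assumed SNR convention: peak amplitude over the RMS of the residual."""
    noise = math.sqrt(sum(v * v for v in residual) / len(residual))
    return max(abs(v) for v in signal) / noise
```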

### Evaluation for the effect of artifacts

In addition to SNR and RMSE, we further introduced the structural similarity index measure (SSIM)^{26} to quantify the performance of the spectrum reconstruction. Previous studies have shown that both SNR and RMSE can assess the quality of noisy images well. Peters et al. also demonstrated that a metric similar to RMSE can be used to quantify the artifacts in a 2D image^{33}. However, neither SNR nor RMSE may be able to measure the similarity of structural content between images. We therefore proposed to additionally use the SSIM metric for quality assessment based on the degradation and distortion of structural information. The SSIM metric is defined as

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)},$$

(3)

where *x* and *y* are the two spectra, *μ*_{x} and *μ*_{y} are their respective means, *σ*_{x}^{2} and *σ*_{y}^{2} are the variances, *σ*_{xy} is their covariance, and *c*_{1} and *c*_{2} are small constants that stabilize the division. A value of 0 indicates no structural similarity, while a value of 1 is reached only for two identical images and therefore indicates perfect structural similarity. As shown in Fig. 10, the proposed DLMC method improved the SSIM values (Eq. 3) for different sampling coverages compared to those of the original under-sampled spectrum maps. In the future, we plan to include the SSIM metric (Eq. 3) in the loss function to train the DLNet, using training data with simulated distortions, to further reduce distortions in the reconstructed spectrum maps.
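A direct, global implementation of Eq. 3 (computed over whole maps rather than the local windows of the original SSIM formulation; the constants *c*_{1} and *c*_{2} are illustrative):

```python
def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global SSIM of two equal-length flattened spectra, following Eq. 3."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n           # variance of x
    vy = sum((b - my) ** 2 for b in y) / n           # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```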

### Deep-learning algorithm on multiple nuclear spins

Calculations on multiple-nuclear-spin systems were carried out to demonstrate the speed-up our method provides on general multiple-nuclear-spin systems. The training data are generated by simulating the two-dimensional spectra of 5 ^{13}C nuclear spins of two isotopically labeled amino acids in a single avian pancreatic polypeptide molecule adjacent to the NV center, with a 1500 G external field along the NV axis. The position and orientation of the molecule are chosen randomly. A 0.85 MHz sampling rate and a 500-point scale are chosen considering the physical system and the limited computing resources.

The training data are a set of simulation outputs with random spatial parameters. The algorithm is tested with another set of simulations with random spatial parameters. The coordinates of two different nuclear-spin systems are shown in Fig. 11a, d. The simulated fully sampled spectra and the spectra reconstructed from 10% sampling are shown in Fig. 11b, e and Fig. 11c, f, respectively. The RMSE of the reconstructed spectra is 0.018 ± 0.002, much smaller than in the previous case because the simulated data contain no experimental error and the simulated spectral amplitudes are normalized.