NMR Analysis, Processing and Prediction: December 2013

Saturday 28 December 2013

Smaller NMR files

Background

One important issue we have noticed with Mnova NMR files is that they can be quite large, particularly when a document contains several 2D spectra. At first sight, file size should not be a big concern, especially considering the large storage capacities available today, either locally (e.g. hard disks with capacities on the order of terabytes) or in the cloud (Dropbox, Google Drive, SkyDrive, etc.).

On the other hand, the tremendous advances on both the technological and methodological fronts have made it possible to acquire enormous volumes of data. For example, IBM has estimated that 2.5 quintillion bytes of data are generated each day, and that more than 90 per cent of the data in existence was created in the last two years. Whilst it is difficult to translate these figures to analytical data (e.g. NMR spectra), it is quite likely that such data follow a similar growth.

At Mestrelab we have devoted major efforts to the development of new technologies which would allow Mnova to reduce the size of NMR spectra while preserving their informational content. This will be elaborated in the following section.

Lossless and lossy compression

Roughly speaking, there are two different classes of compression methods: lossless and lossy.

Lossless techniques allow the data to be compressed and then decompressed back to their original state without any loss of data. Well-known examples of this type of compression are the Zip and Rar formats. Compression ratios for lossless techniques vary but are typically around 2:1 to 3:1, e.g. in medical images. In the particular case of high resolution NMR spectra, there are some relevant characteristics that diminish the performance of this type of algorithm. NMR spectra consist mostly of a noisy background and hence appear as essentially random numbers to the algorithm, which makes lossless compression rather ineffective; in general, NMR spectra can be compressed by no more than 10-30% (on average) using lossless compression schemes.
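The effect of a noisy background on lossless compression can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the data are synthetic (Gaussian noise stored as 32-bit floats, compared with an all-zero trace), zlib stands in for Zip-style compression, and `compression_ratio` is a hypothetical helper, not part of Mnova.

```python
import random
import struct
import zlib

def compression_ratio(values):
    """Pack values as 32-bit floats and return raw size / zlib-compressed size."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(raw) / len(zlib.compress(raw, 9))

rng = random.Random(0)                                # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(16384)]   # noise-like background
flat = [0.0] * 16384                                  # perfectly regular data

noise_ratio = compression_ratio(noise)
flat_ratio = compression_ratio(flat)
print(f"noise: {noise_ratio:.2f}:1   flat: {flat_ratio:.0f}:1")
```

The noise trace should barely shrink, whereas the regular trace collapses to a tiny fraction of its size. A real spectrum sits between these two extremes, but its large noisy regions push it towards the first case.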

Lossy techniques do not allow the exact recovery of the original data once it has been compressed, but this loss of information can be modulated in such a way that it can be virtually negligible. In the particular case of NMR, we have applied several advanced compression techniques [1, 2] which afford extraordinarily high compression rates while preserving all the spectral information. In some cases, compression rates in the order of 800:1 can be achieved, although for practical uses and in order to avoid any potential loss of information, more moderate rates are recommended.

An example

In the figure below, the DQF-COSY of Taxol (Paclitaxel) is shown in its original uncompressed format (left) and after being compressed by a factor of 100 with the new built-in compression algorithm in Mnova and decompressed back (right). Both spectra are displayed with the same contour levels. Can you spot the differences?

Whilst we have done lots of numerical tests to make sure that all the spectral information is preserved at this high level of compression (see [1] and [2] for more details), a simple yet intuitive way to check whether the compression has been effective is to subtract the compressed (and then decompressed) spectrum from the uncompressed one. In this example, this is the residual spectrum:

Basically, all that remains is noise, and no structures (cross peaks) are visible in the residual.
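The residual check can be reproduced on synthetic data. The sketch below is a toy stand-in, not the algorithm used in Mnova: it applies a Haar wavelet transform to a made-up 1D trace (two Lorentzian lines plus noise), keeps only the largest coefficients, reconstructs the trace and subtracts it from the original. All function names and parameters are illustrative assumptions.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    out, n = list(x), len(x)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = approx + detail
        n = half
    return out

def haar_inverse(c):
    """Inverse of haar_forward."""
    out, n = list(c), 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients (the lossy step)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

def lorentzian(i, centre, width, height):
    """Lorentzian line evaluated at point index i."""
    return height * width ** 2 / ((i - centre) ** 2 + width ** 2)

rng = random.Random(1)
n = 1024
spectrum = [lorentzian(i, 300, 6, 100.0) + lorentzian(i, 700, 6, 60.0)
            + rng.gauss(0.0, 1.0) for i in range(n)]

coeffs = haar_forward(spectrum)
compressed = keep_largest(coeffs, n // 16)      # keep 1 coefficient in 16
restored = haar_inverse(compressed)
residual = [s - r for s, r in zip(spectrum, restored)]

print("largest peak:    ", round(max(spectrum), 1))
print("largest residual:", round(max(abs(v) for v in residual), 1))
```

Keeping one coefficient in 16 is not the same thing as a 16:1 file-size ratio, since a real codec also has to store which coefficients were kept and how they were quantised. The point of the sketch is only the workflow: compress, decompress, subtract and inspect what is left.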

A practical guide with Mnova 9.0

This is how compression works in Mnova NMR. First, all the compression options are available in the global Preferences of the software (command Edit / Preferences), in the NMR/Save page (see below):

At this point, there are two different compression mechanisms:

FID compression: The FID is the most important component of an NMR spectrum, as it is where all the actual recorded information is stored. We don’t want to lose even a single bit of this data and hence the FID is only compressed using a lossless algorithm. Of course, the compression ratio will be much more modest, but it is critical to preserve all this information.

FT spectrum compression: This is where the lossy compression algorithm can be applied, in the frequency domain spectrum. It is also possible to use a lossless algorithm here, but the lossy method should be selected in order to achieve high compression ratios. Whilst values of 100:1 or even higher should give good results, it would be more sensible to use more moderate values, in the range of 10:1 – 20:1.

Final notes

Mnova NMR documents keep both the original recorded FID (which can optionally be compressed using the lossless technique) and the processed NMR spectrum (which can optionally be compressed using the lossy technique). This explains why the resulting compressed document is not as small as one might expect after having compressed the data with high compression ratios: the FID might contribute significantly to the final file size. Of course, the savings will be more noticeable in 2D NMR spectra processed with Zero Filling or Linear Prediction, where the final data matrix becomes significantly larger than the time domain data.
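A back-of-the-envelope calculation makes this concrete. The numbers below are purely hypothetical (a 4 MB FID zero-filled to a 16 MB spectrum, a lossless ratio of 1.25:1 and a lossy ratio of 20:1), and `document_size_mb` is an illustrative helper rather than anything Mnova exposes.

```python
def document_size_mb(fid_mb, spectrum_mb, lossless_ratio, lossy_ratio):
    """Estimated size of a document holding a compressed FID and spectrum."""
    return fid_mb / lossless_ratio + spectrum_mb / lossy_ratio

# Hypothetical 2D data set: 4 MB FID, zero-filled to a 16 MB spectrum.
fid_mb, spectrum_mb = 4.0, 16.0
uncompressed = fid_mb + spectrum_mb
compressed = document_size_mb(fid_mb, spectrum_mb,
                              lossless_ratio=1.25, lossy_ratio=20.0)
overall_ratio = uncompressed / compressed
print(f"{uncompressed:.1f} MB -> {compressed:.1f} MB (overall {overall_ratio:.1f}:1)")
```

With these assumed figures the spectrum shrinks to 0.8 MB but the FID still takes 3.2 MB, so the document as a whole is compressed about 5:1 even though the spectrum itself was compressed 20:1.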

On the other hand, considering again that Mnova always keeps a copy of the original FID, why don’t we just save this FID plus the processing commands required to reconstruct the processed spectrum, as other NMR applications do? This is a nice approach (under some circumstances) and would yield the best compression ratio achievable. Unfortunately, it does not work well for many applications and introduces some additional difficulties. Just to give a simple example: you have processed a 2D spectrum which was acquired with a NUS scheme and you have applied some additional time-consuming analysis operations (e.g. 2D-GSD based peak picking). If only the FID and the processing commands were stored, opening this single spectrum would take several seconds (if not minutes). Being able to directly access the processed spectrum without having to reprocess it may be very handy.

References:

[1] C. Cobas, P. G. Tahoces, M. Martin-Pastor, M. Penedo, F. J. Sardina (2004), Wavelet-based ultra-high compression of multidimensional NMR data sets, J. Magn. Reson., 168, 288–295.
DOI: http://dx.doi.org/10.1016/j.jmr.2004.03.016

[2] C. Cobas, P. G. Tahoces, I. Iglesias Fernández (2008), Compression of high resolution 1D and 2D NMR data sets using JPEG2000, Chemometrics and Intelligent Laboratory Systems, 91, 141–150.
DOI: http://dx.doi.org/10.1016/j.chemolab.2007.10.009

Thursday 26 December 2013

NMR Baseline Correction - New method in Mnova 9

One of the most ubiquitous issues present in FT-NMR spectra is the existence of baseline artifacts which might adversely affect the identification and quantification of NMR resonances. Whilst modern NMR instruments are equipped with powerful digital filtering employing also oversampling techniques that produce high quality baselines, it is usually the case that some minor baseline corrections might be needed in order to get optimal results. Also, it should not be forgotten that there are thousands of old NMR instruments lacking those latest instrumental advances where the necessity of a post-processing baseline correction might be critical.

Many baseline correction algorithms have been published since the very early era of FT-NMR, ranging from manual to fully automatic methods. Some of them have been implemented first in MestReC and then in Mnova. Whilst the automatic methods give quite satisfactory results in most of the cases, there are spectra in which a manual procedure could be more convenient.

Former versions of Mnova included the so-called ‘Multipoint Baseline Correction’ in which the User had to identify the points corresponding to baseline regions (also known as control points) which are then used by the software to build a baseline model using different interpolation algorithms (linear segments, polynomials, splines, etc).

Unfortunately, this method was not as robust as we initially thought, and the control points had to be selected entirely by hand.

We thought that it would be very useful to implement a quick button to automatically detect these control points so that the User would only need to review them and if need be, edit or add a few more in order to get the optimal baseline.

This is exactly what is now available in version 9 of Mnova NMR: the new button runs a novel algorithm that splits the spectrum into different spectral windows and analyzes all the points in each of them. As a result of this process, a number of control points are automatically added to the spectrum.

Once all the control points are available, this module offers several possibilities to create the final baseline model: Whittaker, linear segments, smoothed linear segments, polynomials and splines. Of these, we recommend cubic splines, which usually give very good results provided there is a sufficient number of control points well spread across the spectral width.

Automating the new algorithm

After having implemented this algorithm, we found that it would make sense to fully automate it and add it to our set of automatic baseline correction algorithms, both for 1D and 2D. It works as simply as this: first, the algorithm automatically detects all the control points using the method just described. Next, the baseline distortion is modeled using splines that go through all those control points.
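The two-step idea (find control points, then interpolate a baseline through them) can be sketched in a few lines of Python. This is not the Mnova algorithm, whose details are not given here: the window-based detection heuristic is an assumption made up for this example, and linear segments are used instead of splines only to keep the sketch dependency-free. The trace is synthetic (one Lorentzian line on a sloping baseline with noise).

```python
import bisect
import random
import statistics

def find_control_points(y, window=32, tolerance=3.0):
    """Pick one (index, value) control point from each window that looks flat.

    Hypothetical heuristic: a window counts as baseline when its spread
    (max - min) is at most `tolerance` times the smallest spread of any window.
    """
    starts = range(0, len(y) - window + 1, window)
    spreads = [max(y[s:s + window]) - min(y[s:s + window]) for s in starts]
    limit = tolerance * min(spreads)
    return [(s + window // 2, statistics.median(y[s:s + window]))
            for s, spread in zip(starts, spreads) if spread <= limit]

def linear_baseline(points, n):
    """Piecewise-linear baseline through the control points (ends extrapolated)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    baseline = []
    for i in range(n):
        k = min(max(bisect.bisect_right(xs, i) - 1, 0), len(xs) - 2)
        t = (i - xs[k]) / (xs[k + 1] - xs[k])
        baseline.append(ys[k] + t * (ys[k + 1] - ys[k]))
    return baseline

def correct_baseline(y, window=32, tolerance=3.0):
    """Detect control points, model the baseline and subtract it."""
    points = find_control_points(y, window, tolerance)
    baseline = linear_baseline(points, len(y))
    return [v - b for v, b in zip(y, baseline)], points

# Synthetic trace: one line of height 100 on a sloping, offset, noisy baseline.
rng = random.Random(2)
n = 512
trace = [100.0 * 16.0 / ((i - 256) ** 2 + 16.0) + 0.01 * i + 5.0
         + rng.gauss(0.0, 1.0) for i in range(n)]
corrected, points = correct_baseline(trace)
print(len(points), "control points; peak height after correction:",
      round(corrected[256], 1))
```

Windows containing the line have a much larger spread than pure-noise windows and are therefore skipped, so the baseline under the peak is interpolated from its neighbours. Swapping `linear_baseline` for a cubic spline would follow the recommendation given earlier in this post.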

This new algorithm is available from the baseline correction command:

These are just some examples:

Monday 23 December 2013

Faster NMR Data Processing with Mnova 9

For nearly a decade, CPU makers have gradually adopted the use of multiple cores to increase performance. For instance, the computer on which I’m writing this entry has 4 cores. Roughly speaking, this makes it possible to run a different task on each core so that, ideally and depending on the specific application or algorithm, some operations can be made faster in proportion to the number of available cores.

However, Mnova NMR has not exploited this technological advantage until now, so the number of cores in your computer did not make any difference. It is also true that most of the algorithms in Mnova have been highly optimized and their computational performance is usually more than adequate to provide a sufficiently smooth experience. Nevertheless, it would not be sensible to let this technological opportunity pass, and so in the past few months we have been parallelizing a number of routines in Mnova in order to take full advantage of multi-core CPUs.

This is just a starting point and ultimately, we will parallelize ALL algorithms in Mnova but, for the moment, we have just selected a few of the most computationally expensive algorithms, namely:

2D Linear Prediction

Mnova NMR includes two procedures for Linear Prediction, the so-called Toeplitz and Zhu-Bax algorithms. Whilst the former is already extremely fast, it is mathematically less robust than Zhu-Bax, which, in our experience, gives much better results, especially in non-phase sensitive (i.e. magnitude-like) 2D spectra. However, because Zhu-Bax was quite slow, Mnova NMR used Toeplitz as its default method.

Now that we have parallelized the Zhu-Bax method, this has become the new default forward LP algorithm. In our tests, this algorithm performs nearly as fast as the (non-parallelized) Toeplitz counterpart, but with the additional advantage of its mathematical robustness.

Non Uniform Sampling (NUS)

This algorithm was initially developed in a single thread mode (in Beta versions of the software) but Mnova 9 comes with a highly optimized parallelized version.

Processing of multiple (stacked) spectra

Stacked or arrayed spectra are perfect candidates for parallelization, as each spectrum can be processed on a different core. Whilst this advantage might be negligible for basic processing, parallelization really makes a difference when all these spectra need to be analyzed using, for example, Global Spectral Deconvolution (GSD).
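The reason stacked spectra parallelize so well is that each trace can be processed without knowing anything about the others. The Python sketch below shows that pattern only; it says nothing about how Mnova is implemented. The "processing" is a naive DFT magnitude, the stack is synthetic, and `process_stack` is a hypothetical helper.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor

def magnitude_spectrum(fid):
    """Naive DFT magnitude of one FID (stand-in for real processing)."""
    n = len(fid)
    return [abs(sum(fid[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def process_stack(fids, workers=4, executor_cls=ThreadPoolExecutor):
    """Process every spectrum of a stack independently on a pool of workers."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(magnitude_spectrum, fids))   # order is preserved

# Hypothetical arrayed experiment: 8 FIDs whose single line moves one bin per trace.
n = 32
stack = [[cmath.exp(2j * math.pi * (3 + j) * t / n) * math.exp(-t / 16.0)
          for t in range(n)] for j in range(8)]
spectra = process_stack(stack)
peak_positions = [s.index(max(s)) for s in spectra]
print(peak_positions)   # [3, 4, 5, 6, 7, 8, 9, 10]
```

One caveat on the design choice: in CPython, threads running pure-Python arithmetic share the interpreter lock, so this default will not actually run faster. The same function accepts `concurrent.futures.ProcessPoolExecutor` as `executor_cls` (called from under an `if __name__ == "__main__":` guard), and compiled code can use native threads, which is where a speed-up proportional to the number of cores becomes possible.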

2D contour plots

Calculation of 2D contour lines has also been parallelized, resulting in a faster display of 2D spectra.

More to come

Again, our ultimate goal is to optimize / parallelize every single processing and analysis algorithm in Mnova. Nevertheless, I believe that these enhancements are already worth the upgrade to this new version of Mnova.

Sunday 22 December 2013

Mnova goes NUS

This is one example of a NUS spectrum (HMQC) acquired by Dr. Manuel Martín-Pastor at the University of Santiago de Compostela and processed with Mnova 9.0.

Friday 20 December 2013

Non Uniform Sampling (NUS) NMR Processing

Background

In the last few years, Non-Uniform Sampling (NUS) has emerged as a very powerful tool to significantly speed up the acquisition of multidimensional NMR experiments due to the fact that only a subset of the usual linearly sampled data in the Nyquist grid is measured.

Unfortunately, this fast acquisition modality introduces a new challenge as the normal Fourier Transform will fail and consequently, special processing techniques are required.

A number of sophisticated methods have been proposed for reconstructing sparsely sampled 2D and higher dimensionality NMR data, including Maximum Entropy, CLEAN, multidimensional decomposition method (MDD), Forward Maximum entropy (FM) and its fast version (FFM), SIFT and IST [1]. Most of these procedures are computationally very expensive and usually require the adjustment of some parameters.

NUS processing and Mnova 9.0: M.I.S.T

It has been the objective of Mestrelab to implement within Mnova 9.0 a new 2D NUS processing module that fulfills the following criteria:

It must be computationally very fast whilst reconstructing the data reliably.
It should work fully automatically without user intervention. A minimum set of adjustable parameters might be used for special cases.
It should be compatible with any 2D acquisition protocol and with any NMR instrument.

All these requirements have been met with the development of M.I.S.T, a Modified Iterative Soft Thresholding algorithm.

Proof of Concept: 1D NUS Processing:

Initial development of the MIST algorithm was done using synthetic, noise-free 1D-FIDs in which a number of points have been randomly set to zero using a Poisson gap sampling method. After having optimized the algorithm under these conditions, the same procedure was carried out using experimental 1D spectra.

Figure 1 shows the results obtained with the 1H NMR spectrum of Ondansetron in which 75% of the FID points have been set to zero using a random Poisson gap sampling method. A regular FFT of this FID gives a spectrum heavily corrupted with noise. Reconstruction of the FID using the MIST algorithm, in contrast, gives a spectrum that resembles the ideal FT spectrum very closely.

Figure 1: (a) Standard, regularly sampled 1H NMR spectrum of Ondansetron. (b) FFT spectrum of the same experimental FID where 75% of the original data points have been set to zero using a random Poisson gap sampling method. (c) Result of reconstructing previously ‘corrupted’ FID using the MIST algorithm
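The 1D proof of concept can be mimicked with a plain iterative soft thresholding (IST) loop in the spirit of [1]. To be clear about the assumptions: this is the basic textbook scheme, not MIST (whose modifications are not described here); the FID is a synthetic, noise-free two-line signal; half of the points are dropped at random rather than with a Poisson gap schedule; and a naive DFT is used so that the sketch needs nothing beyond the standard library.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    """Naive O(n^2) DFT; sign=+1 gives the unscaled inverse transform."""
    n = len(x)
    w = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    return [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]

def soft_threshold(z, thr):
    """Shrink the magnitude of z by thr, keeping its phase; zero if below thr."""
    mag = abs(z)
    return z * (1 - thr / mag) if mag > thr else 0j

def ist_reconstruct(measured, sampled, iterations=60, decay=0.9):
    """Fill the unsampled FID points by iterative soft thresholding."""
    n = len(measured)
    fid = [measured[t] if sampled[t] else 0j for t in range(n)]
    thr = 0.9 * max(abs(s) for s in dft(fid))        # start just below the top peak
    for _ in range(iterations):
        spectrum = [soft_threshold(s, thr) for s in dft(fid)]
        estimate = [v / n for v in dft(spectrum, sign=+1)]
        # Keep the measured points untouched, update only the missing ones.
        fid = [measured[t] if sampled[t] else estimate[t] for t in range(n)]
        thr *= decay                                  # lower the threshold gradually
    return fid

def rms_error(a, b):
    """Root-mean-square difference between two complex sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Synthetic noise-free FID: two decaying lines, 64 complex points.
n = 64
true_fid = [(cmath.exp(2j * math.pi * 10 * t / n)
             + 0.6 * cmath.exp(2j * math.pi * 25 * t / n)) * math.exp(-t / 40.0)
            for t in range(n)]

rng = random.Random(0)
keep = set(rng.sample(range(1, n), n // 2 - 1)) | {0}   # 50% sampling, first point kept
sampled = [t in keep for t in range(n)]
measured = [true_fid[t] if sampled[t] else 0j for t in range(n)]

recon = ist_reconstruct(measured, sampled)
err_zero = rms_error(measured, true_fid)   # error if the gaps are left as zeros
err_ist = rms_error(recon, true_fid)       # error after IST reconstruction
print(f"zero-filled error: {err_zero:.3f}   IST error: {err_ist:.3f}")
```

Transforming `measured` directly corresponds to panel (b) of the figure, and transforming `recon` corresponds to panel (c). The starting threshold, the decay factor and the number of iterations are arbitrary choices made for this example.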

The next step in our work consisted of extending the 1D MIST algorithm to operate with 2D spectra.

MIST in action: 2D NUS Processing:

The performance of the algorithm is demonstrated with the HSQC spectrum shown in Figure 2. On the left, the uniformly sampled spectrum acquired with 96 complex increments in the indirect t1 dimension is shown. On the right, the NUS spectrum acquired with 48 complex increments randomly sampled (50% NUS).

Figure 2: (a) Linearly sampled HSQC spectrum (96 complex increments) (b) MIST reconstruction of a NUS spectrum acquired with 48 complex increments randomly sampled. The two figures are shown using the same contour levels

Processing of the NUS spectrum was done fully automatically (just drag & drop into Mnova) and total processing time was less than 4 seconds (on my 4-core computer).

Supported NMR experiments

Presently, the NUS algorithm implemented in Mnova 9.0 supports HSQC and HMBC experiments, both magnitude and phase sensitive. We have also obtained good results with COSY spectra and have tried it successfully with some NOESY/ROESY experiments, although we have to warn that with a few of the latter the performance has not been so good.

CONCLUSIONS

Mnova 9.0 now supports NUS 2D spectra acquired on Bruker or Agilent instruments (more vendors will be included shortly).

Processing of these spectra is done via the new MIST algorithm. It has been shown that this algorithm is very fast, robust and can be executed in a fully unattended way. Furthermore, our method is not sensitive to phase distortions.

Note: Mnova 9 will be available on the Mestrelab Web site (www.mestrelab.com) very soon. In the meantime, this version can be downloaded directly from HERE (Windows only for now). This link will only work for a few days though.

Acknowledgments:

I thank Frank Delaglio, David Russell, Paul J Bowyer and Manolo Martin for kindly providing 2D NUS spectra

[1] S. G. Hyberts et al., “Application of iterative Soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling”, J. Biomol. NMR 52, 315–327 (2012) and references therein

Mnova 9.0

I’m very happy to announce that after a long period of very intensive work, version 9.0 of Mnova is finally ready! From our point of view, this version is probably the most ambitious release we have attempted since Mnova was created. Aside from many improvements and bug fixes, this new version comes with great new features, including support for Non Uniform Sampling (NUS), a powerful PCA module, Reference Deconvolution, Absolute Referencing and many, many more.

We are currently updating our Web site, from where this new version can be downloaded and where all these exciting new features of Mnova will be explained in more detail. Also, in the next few days, I will be putting together some blog entries to describe some of the new functionality.

If you don’t want to wait until the new version is available from our Web site, you can download it from HERE (Windows only for now).

I really hope that this new version will meet your expectations and we are looking forward to your feedback!