Systems & Products

  • Products

    • CD-TREAT Diet: A liquid-only diet (without any of a patient’s normal food or drink) for 8 weeks, called Exclusive Enteral Nutrition (EEN), is the best initial treatment for active Crohn’s Disease (CD). In the BINGO Group, we have previously shown that this liquid-only diet works by changing the bacteria (germs) in the gut. However, the liquid-only diet is very restrictive, and patients, particularly adults, can find it difficult to stick to for a long time. There is therefore a lot of interest in developing new diets that work as well as the liquid-only diet but do not involve stopping all solid food, since a solid food diet is more acceptable to most patients. Our main product from the BINGO Group is a recently developed solid food diet using everyday foods (called CD-TREAT), which we hope will work as well as the liquid-only diet. We have shown that CD-TREAT changes the gut bacteria of healthy people in a similar way to the liquid-only diet. The solid diet also improved gut inflammation in animal experiments. For further details, see the Project Page.


      Publication: [DOI:10.1053/j.gastro.2018.12.002]

  • Systems

    • NanoAmpli-Seq: Amplicon sequencing, particularly sequencing of the small subunit rRNA (SSU rRNA) gene and internal transcribed spacer regions, is widely used for profiling microbial community structure and membership. The introduction of single-molecule sequencing platforms, such as Pacific Biosciences' (PacBio's) single-molecule real-time sequencing (SMRT) and single-molecule sensing technologies on the Oxford Nanopore Technologies (ONT) MinION platform, has opened the possibility of obtaining ultra-long reads. MinION is a hand-held DNA sequencer, and has been heralded as revolutionary in bringing real-time sequencing closer to fruition. We have developed both the sample processing and sequencing library preparation workflow and the accompanying software tools.

      Publication: [DOI:10.1093/gigascience/giy140]
    • Full-scale Hybrid EGSB Reactor (to operate at low temperatures): We demonstrated, for the first time, direct, high-rate, low-temperature anaerobic digestion (AD) of low-strength municipal wastewater at full scale. An 88 m³ hybrid reactor was installed at the municipal wastewater treatment plant in Builth Wells, UK and operated for 290 days. Ambient temperatures ranged from 2 to 18 °C, but remained below 15 °C for > 100 days. Influent BOD fluctuated between 2 and 200 mg L⁻¹. However, BOD removal often reached > 85%. 16S rRNA amplicon sequencing of DNA from the biomass revealed a highly adaptable core microbiome. The combination of data and analysis from this study provides strong evidence in favour of treating dilute municipal wastewater directly and at low ambient temperatures using AD technology. Given the immense quantities of municipal wastewater produced daily around the world, and the energy and land requirement for conventional treatment, a low-temperature anaerobic solution could have an enormous positive environmental impact - especially for temperate climates. Our data suggest that the biologically-mediated processes underpinning AD are possible under these conditions. Moreover, the microbial community is adaptable and resilient to the nutrient limitations and low temperatures that have been generally considered unfavourable with respect to municipal wastewater treatment. This full-scale modular demonstration provides game-changing possibilities for the treatment of municipal wastewater.

      Impact case study arising from publications (DOI:10.1016/j.biortech.2021.125786; DOI:10.1093/femsec/fiy095): NUI Galway - Transforming Sustainable Water Management: Creating Jobs and Saving Energy
    • Solar Septic Tank: The Solar Septic Tank (SST) is a novel septic tank design that uses passive heat from the sun to raise in-tank temperatures and improve solids degradation, resulting in a cleaner effluent. Treatment has been shown to exceed that of conventional systems; however, the underlying biology driving treatment in the system is poorly understood. With Professor William T. Sloan (through EPSRC EP/P029329/1 grant), Dr Stephanie Connelly, and colleagues at the Asian Institute of Technology, Thailand (technology inventors), we have used next generation sequencing (Illumina MiSeq (San Diego, CA, USA), V4 region 16S DNA) to monitor the microbiology in the sludge and effluent of two mature systems, a conventional septic tank and an SST, during four months of routine operation in Bangkok, Thailand, and evaluated the ecology against a suite of operating and performance data collected during the same time period. We have shown that there are significant differences in the microbial communities between conventional and solar septic tanks, and that the key species driving these differences can be used as indicators of enhanced waste degradation and hence will inform management strategies. The solar septic tank technology that we are working on, with colleagues in Thailand, has been installed in a school in Bangalore, India, as part of a Scottish Government initiative.

      Publication: [DOI:10.3390/w11122660]
    • Smart Raman Activated Cell Sorting System: With MPhil student Yuchen Fu and Professor Huabing Yin (through NERC NE/P003826/1 grant), we have developed a smart Raman activated cell sorting system (thesis: Smart Raman Activated Cell Sorting System) which uses microfluidics, stable isotopes, and a machine learning algorithm (probabilistic LDA) to sort microbes based on function.
    • microbiomeSeq (designed with Alfred Ssekagiri): An R package for multivariate statistical analysis of microbial communities in an environmental context.
    • RvLab (with collaborators at HCMR, Greece): This website makes use of R, a statistical processing environment widely used by scientists working in many biodiversity-related disciplines. It supports an integrated online R environment that is optimized with respect to computational speed-up and data manipulation. This vLab tackles common problems faced by R users, such as a severe lack of computational power. Many of the routines operating under the R environment, such as the calculation of several biodiversity indices and the running of multivariate analyses, are often computationally demanding and cannot deliver a result when the respective datasets are in the form of large matrices.

      Publications: [DOI:10.3897/BDJ.4.e8357]
    • seqenv: Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. The software carries out precisely such a task by performing similarity searches of short sequences against a publicly available online repository (NCBI) and, for every hit, extracting the textual metadata field if it is available. After collecting all the relevant fields, a text mining algorithm is run to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples.

      Publications: [DOI:10.7717/peerj.2690]
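      The weighted summation step described for seqenv can be illustrated with a minimal sketch. This is not the seqenv code: the function name, the input layout (per-sequence EnvO term frequencies plus per-sequence abundances) and the example values are all assumptions made for illustration.

```python
from collections import Counter

def summarise_sample(seq_terms, seq_abundance):
    """Weighted sum of per-sequence environment term frequencies.

    seq_terms:     sequence ID -> {term: relative frequency among its hits}
    seq_abundance: sequence ID -> abundance of that sequence in the sample
    (both layouts are hypothetical, chosen for this sketch)
    """
    total = sum(seq_abundance.values())
    if total == 0:
        return {}
    profile = Counter()
    for seq_id, terms in seq_terms.items():
        # Each sequence contributes in proportion to its abundance.
        weight = seq_abundance.get(seq_id, 0) / total
        for term, freq in terms.items():
            profile[term] += weight * freq
    return dict(profile)

# Made-up example: two sequences with invented term frequencies.
seq_terms = {
    "seq1": {"soil": 0.5, "sediment": 0.5},
    "seq2": {"sea water": 1.0},
}
seq_abundance = {"seq1": 30, "seq2": 10}
profile = summarise_sample(seq_terms, seq_abundance)
```

      Because each sequence's term frequencies sum to one and the weights sum to one, the resulting sample profile also sums to one.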
    • SeqEnv-Ext (designed by Ali Z. Ijaz): A taxa-centric extension to the seqenv pipeline consisting of two parts, each providing environmental annotations in a different context: the first part provides taxon abundance on a per-term basis, while the second lists environmental term abundance on a per-taxon basis. Developed as a separate program that requires the original seqenv pipeline, it offers two different ways of viewing environmental annotations, which significantly augments the analysis capability of the pipeline.
      Code: SEQenv-Ext, TaxaSE System

      Publication: [DOI:10.7717/peerj.3827]
    • CONCOCT: A software for binning metagenomic contigs with coverage and composition.

      Publication: [DOI:10.1038/nmeth.3103]
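      To make "coverage and composition" concrete, the sketch below builds one feature vector per contig from its k-mer frequencies and its per-sample coverage. It is an illustrative assumption about what such features can look like, not CONCOCT's implementation; the function names, the log transform and the default k are hypothetical choices.

```python
from itertools import product
import math

def kmer_composition(seq, k=4):
    """Relative frequency of every DNA k-mer in seq.

    k-mers containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]

def contig_features(seq, coverages, k=4):
    """Concatenate log-transformed per-sample coverage with k-mer composition."""
    return [math.log(c + 1.0) for c in coverages] + kmer_composition(seq, k)
```

      Contigs from the same genome are expected to have similar vectors, so any standard clustering method can then be applied to these features.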
    • CViewer (designed with Orges Koci): The past few years have seen increased use of shotgun metagenomics for microbial community surveys over traditional amplicon sequencing. This has been made possible by advances in methods that now enable us to assemble short sequence reads into longer contiguous regions that can be binned together to identify the species they are part of, e.g., through CONCOCT. The advantage of shotgun metagenomics is that coding regions of these contigs can further be annotated against public databases to give an assessment of the functional diversity. With integrated solutions gaining importance by complementing metagenomics with other meta’omics technologies (e.g., metabolomics), there is a need to have a single platform to consolidate all these realisations on the same sample space. Thus, we have developed CViewer, a Java-based statistical framework that integrates all levels of gene products (mRNA, proteins, and metabolites) for microbial communities and allows exploration of their response to environmental factors through multivariate statistical analysis.
    • pyTag (designed with Orges Koci): With an unprecedented growth in the biomedical literature, keeping up to date with the new developments presents an immense challenge. Publications are often studied in isolation of the established literature, with interpretation being subjective and often introducing human bias. With ontology-driven annotation of biomedical data gaining popularity in recent years and online databases offering metatags with rich textual information, through our pyTag workflow, it is now possible to automatically text-mine ontological terms and complement the laborious task of manual management, interpretation, and analysis of the accumulated literature with downstream statistical analysis.

      Publication: [DOI:10.7287/10.7717/peerj.5047]
    • NMGS: Neutral models which assume ecological equivalence between microbial species provide null models for community assembly. In Hubbell's Unified Neutral Theory of Biodiversity (UNTB), the microbial communities are modelled as many local communities connected to a single metacommunity through differing immigration rates. The software is an efficient Bayesian fitting strategy for the multi-site UNTB.

      Publication: [DOI:10.1109/JPROC.2015.2428213]
    • GlobalView: At University of Oxford (2011-2012), I worked on a project that investigated methods to infer time-varying networks from multiple time signals (slides). The time signals pertain to Google trends, Twitter feeds, stock prices, exchange rates, commodity prices, weather statistics and transport statistics. My responsibilities were: established a template of usage requirements from relevant stakeholders in UK government and other beneficiaries; worked on housing datasets in collaboration with Institute of Public Policy Research and Rightmove; identification of unusual behaviour in single signals, in particular weak signal changes that are distributed across many variables but causing global changes in network topology; detrending the data to remove seasonal or periodic components, and irregular fluctuation; forecasting of future values of individual signals using Gaussian Processes based regression; constructed a hierarchy of increasingly sophisticated methods for network inference; it is particularly important that we establish which of the wide range of available methods is most appropriate for the data we have and that we infer plausible networks of relations. I considered simple correlations with sliding windows; a range of causal methods such as Dynamic Bayesian Networks and Granger Causality; those that are based on Markov Random Field (Eric Xing's work at Carnegie Mellon), and some that are based on State-Space Models (Zoubin Ghahramani's work at Cambridge); and identification of high trending keywords from Google Trends Datasets using residual time series obtained from the difference between the linearly interpolated values and original values and by using various outlier detection methods; and developed a cross-platform prototype software tool GlobalView in C++ for dynamic network inference [Code; Project Page]
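      The simplest method in the hierarchy above, correlation over sliding windows, can be sketched as follows. This is a stdlib-only illustration rather than the GlobalView C++ code; the function names and the choice to report 0 for a constant window are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant window: correlation is undefined, report 0
    return sxy / math.sqrt(sxx * syy)

def sliding_correlation(x, y, window):
    """Correlation of x and y inside each window of the given length."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

# Two toy signals that move together at first and in opposition later.
x = [1, 2, 3, 4, 3, 2, 1]
y = [2, 4, 6, 8, 9, 10, 11]
r = sliding_correlation(x, y, window=3)
```

      Thresholding each window's correlation for every pair of signals gives one network per time step, which is the baseline against which the causal and state-space methods can be compared.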
    • Hybrid 3D Ultrasonic Imaging System: At University of Cambridge (2008-2011), I developed a Hybrid 3D Ultrasonic Imaging System. The project focused on: tracking the trajectory of a 3D ultrasound probe based on the image-based registration of acquired data and the output of an inertial position sensor; calibration of the hybrid system; correction of artifacts in the data caused by variations of the pressure from the probe during the scan; differentiation of backscatter into diffuse and directional components using the overlap data from multiple scans; and development and evaluation of software tools to enable the system to be used effectively in a hospital environment. The system was implemented in Stradwin software (written in C++ and using wxWidgets to provide cross-platform compatibility and OpenGL for 3D visualisation). The software was then modified to run on a mobile ultrasound machine, the Ultrasonix Sonix RP, and to communicate with the inertial sensor through its serial port. The software was also modified to provide a calibration protocol to compensate for the orientation in which the sensor, an Intersense Inertia Cube 3, was mounted on the ultrasound probe. Additionally, the keypad controls for the ultrasound machine were fully integrated with the developed software.

      Publication: [DOI:10.1259/bjr/46007369]
    • Dynamic Electrical Impedance Tomography System: During my PhD (2004-2008), I focused primarily on the development of static and dynamic algorithms for inverse problems that arise in a variety of engineering applications including but not limited to electrical impedance tomography (EIT). I developed novel tomographic imaging methods using EIT to manipulate measurement data from electrodes attached to the surface of a pipeline in order to estimate the multidimensional distribution of physical parameters inside. As compared to the traditional EIT, those scenarios were considered in which the object to be imaged is changing very rapidly during the data acquisition; necessitating a desire for reasonable spatio-temporal resolution. Rather than considering the inverse problem as a traditional tomography reconstruction problem, the problem was formulated as a state estimation problem utilising different kinematic evolution models for the physical parameters along with an observation model based on finite element analysis (FEM). In particular, the Kalman-type inverse algorithms were developed for estimation of time-varying interfacial boundary in stratified flows of immiscible liquids (targeting liquid hydrocarbon transportation in pipelines that often contain free water).

      Publication: [DOI:10.1016/j.jcp.2007.12.025]
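      The state-estimation formulation can be illustrated with a deliberately reduced example: a scalar Kalman filter with a random-walk evolution model and a direct, linear observation. The real problem uses a nonlinear, FEM-based observation model and multidimensional states, so this is only a sketch of the predict/update cycle; the function name and the noise variances q and r are assumptions.

```python
def kalman_random_walk(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter.

    Evolution model:   x_t = x_{t-1} + w_t,  Var(w_t) = q  (random walk)
    Observation model: z_t = x_t + v_t,      Var(v_t) = r
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

      A larger q lets the estimate follow rapid changes more closely at the cost of passing more measurement noise through, which is the spatio-temporal trade-off mentioned above.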
    • 2D Electrophoresis Gel Image Processor for Matlab: During my PhD (2004-2008), I worked on a joint project with the Systems Biology Group, JNU and developed a software package, 2D Electrophoresis Gel Image Processor for Matlab. This software is useful for the analysis of bio-markers by quantifying individual proteins, showing the separation between one or more protein "spots" on a scanned image of a 2D gel, and measuring running differences between gels.


      Publication: [DOI:10.1109/FBIT.2007.95]
    • SalmoSim: A device with a series of bioreactors inoculated with microbes and supplied with biofluids originating from the salmon's gastro-intestinal tract. It is aimed at understanding the relationship between aquaproducts, nutrient bioaccessibility and gut microbes in order to improve the productivity and sustainability of farmed salmon. Aquafeed is fermented in our system, physiological parameters are carefully regulated and bioaccessible nutrients are dialysed. Samples are taken throughout to analyse microbial life, nutrients and biochemical properties.

      Publications: [DOI:10.1101/2020.10.07.328427; DOI:10.1101/2020.10.06.327858]
    • AmpliPyth (designed by David Meltzer): Python-based pipeline that processes metagenomic amplicons (16S rRNA/18S rRNA and Fungal ITS) and generates an HTML report.
    • SNPCallPHYLO (designed by Cosmika Goswami): Python-based SNPs calling pipeline that processes whole genome shotgun sequencing data from single genome isolates and generates an HTML report.
    • AMPLImock: A Python-based pipeline for analyzing 16S rRNA amplicons generated from mock communities [Code, Usage]
      To collate statistics and frequencies from multiple samples, use collateResults.pl
      To generate transition probabilities from alignment files, use usearch_aln_transition_prob.py
    • AMPLICONprocessing: A bash-based pipeline for generating taxonomic profiles for Illumina paired-end reads using CREST and RDP classifiers [Usage]
    • METAmock: A Python-based pipeline for analyzing Whole-Genome Shotgun sequences generated from mock communities [Code, Usage]
    • TAXAenv: A website useful for multivariate statistical analysis of microbial community structure (abundance tables) in an environmental context (metadata).
    • TAXAassign: A bash-based pipeline for generating taxonomic profiles using NCBI's Taxonomy.
    • CLUSThack: A Python package with an embarrassingly parallel (multithreaded + utilises streaming SIMD extensions) implementation of q-gram-based edit distance measurement, useful for hierarchical clustering of 16S rRNA sequences.
      Package: CLUSThack_v0.2.tar.gz
      Test files: hclust.py, test.fasta, test.pdf.
      Usage: $ time ./hclust.py -f test.fasta -t 32
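      As background, a q-gram distance compares two sequences by the counts of their length-q substrings, which is much cheaper than a full alignment. The sketch below is a plain, single-threaded illustration of that idea and not the CLUSThack implementation; the function names and the default q are assumptions.

```python
from collections import Counter

def qgram_profile(seq, q=3):
    """Count every overlapping substring of length q in seq."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))

def qgram_distance(a, b, q=3):
    """Sum of absolute differences between the two q-gram count profiles."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())

def distance_matrix(seqs, q=3):
    """Symmetric pairwise distance matrix, e.g. as input to hierarchical clustering."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = qgram_distance(seqs[i], seqs[j], q)
    return d
```

      Every pair in the matrix is independent of the others, which is what makes the computation embarrassingly parallel.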
    • Interactive tools for visualising abundance tables from metagenomic surveys:
      • PHYLOmap: A software for drawing heatmaps with phylogenetic trees from metagenomic surveys based on Interactive Tree of Life (ITOL) API.
      • HEATcloud: Web-based interactive heatmap viewer (programmed using JavaScript and jQuery) for abundance tables.
      • SUMMARIZEplot: Web-based interactive stacked barplot viewer (programmed using D3.js) for abundance tables.
      • PHYLObar: Web-based interactive viewer (programmed using D3.js) for trees in Newick format.
    • clust_validity.R: This script takes a CSV file of N D-dimensional features, performs K-means or dp-means clustering and chooses the optimum number of clusters based on either of the implemented internal clustering validation indices. Additionally, if the csv file contains a column titled "True_Clusters" containing true clusters membership for each object, you can use it to validate clustering performance using several external clustering indices.
      Code: clust_validity.R (Tutorial, Reference slides (maths), Example datasets)
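      One example of an external clustering index is the Rand index, which scores how often two labelings agree on whether a pair of objects belongs together. Which indices clust_validity.R actually implements is not stated here, so the choice of the Rand index, and the Python rendering of it, are assumptions for illustration.

```python
from itertools import combinations

def rand_index(true_labels, predicted_labels):
    """Fraction of object pairs on which the two labelings agree.

    A pair agrees if it is placed together in both labelings or apart in both.
    Requires at least two objects.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

      The index depends only on co-membership, so relabeling the clusters (for example swapping cluster 0 and cluster 1) does not change the score.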
    • GraphicalLasso.tar.gz: Generates network of associations between OTUs as a DOT file which can then be visualised in GraphViz.
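      For readers unfamiliar with the DOT format, the sketch below writes an undirected graph from a symmetric association matrix. It only illustrates the output format that GraphViz reads; it does not estimate the associations, and the function name, threshold and edge colours are assumptions.

```python
def associations_to_dot(names, matrix, threshold=0.0):
    """Return DOT text with one edge per association above the threshold."""
    lines = ["graph otu_network {"]
    for name in names:
        lines.append(f'    "{name}";')
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            w = matrix[i][j]
            if abs(w) > threshold:
                # Colour encodes the sign of the association (an arbitrary choice).
                colour = "blue" if w > 0 else "red"
                lines.append(
                    f'    "{names[i]}" -- "{names[j]}" '
                    f'[label="{w:.2f}", color={colour}];'
                )
    lines.append("}")
    return "\n".join(lines)

# Made-up association values between three OTUs.
names = ["OTU1", "OTU2", "OTU3"]
matrix = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, -0.4],
    [0.0, -0.4, 1.0],
]
dot_text = associations_to_dot(names, matrix, threshold=0.1)
```

      Saving dot_text to a file gives an input that GraphViz layout programs such as dot or neato can render.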
    • Faster blastn searches using GNU Parallel: For an Illumina dataset with 6 million reads, blastn_parallel.sh took 2.5 minutes on 45 cores, compared with 86 minutes for blastn.sh.
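      The timings above correspond to roughly a 34-fold speedup. The sketch below works through that arithmetic and shows one generic way to split a FASTA file into independent chunks; how blastn_parallel.sh itself distributes the reads is not described here, so the round-robin splitting and the function name are assumptions.

```python
def split_fasta(text, n_chunks):
    """Distribute FASTA records round-robin over n_chunks strings.

    Assumes '>' appears only at the start of header lines.
    """
    records = [">" + r for r in text.split(">") if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return ["".join(c) for c in chunks]

# Speedup and parallel efficiency from the reported timings.
serial_minutes, parallel_minutes, cores = 86, 2.5, 45
speedup = serial_minutes / parallel_minutes   # about 34x
efficiency = speedup / cores                  # about 0.76

fasta = ">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n"
chunks = split_fasta(fasta, 2)
```

      Each chunk can be searched independently and the tabular outputs concatenated afterwards, since the hits for one read do not depend on any other read.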
    • Extracting representative sequences from OTU clusters generated in AmpliconNoise. extract_clust_seqs_fasta.pl is an extension of AmpliconNoise's Typical.pl and can give the most abundant sequence, consensus sequence, first sequence, and the longest sequence of each cluster as representative OTU.
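      The four choices of representative can be sketched as below. This is a Python illustration and not a port of extract_clust_seqs_fasta.pl: the function name and input layout are hypothetical, and the consensus branch assumes pre-aligned, equal-length sequences with abundance-weighted voting.

```python
from collections import Counter

def representative(cluster, method="abundant"):
    """Pick a representative from a list of (sequence, abundance) tuples."""
    if method == "abundant":
        return max(cluster, key=lambda sa: sa[1])[0]
    if method == "first":
        return cluster[0][0]
    if method == "longest":
        return max(cluster, key=lambda sa: len(sa[0]))[0]
    if method == "consensus":
        # Assumes aligned, equal-length sequences; each column is decided
        # by an abundance-weighted majority vote.
        consensus = []
        for pos in range(len(cluster[0][0])):
            votes = Counter()
            for seq, abundance in cluster:
                votes[seq[pos]] += abundance
            consensus.append(votes.most_common(1)[0][0])
        return "".join(consensus)
    raise ValueError(f"unknown method: {method}")

# Made-up cluster of three equal-length sequences with abundances.
cluster = [("ACGT", 2), ("ACGA", 5), ("TCGA", 1)]
```

      Note that the consensus sequence need not occur in the cluster at all, whereas the other three options always return an observed sequence.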
    • remove_colinear_terms.R: An R script to iteratively remove colinear variables by calculating step-wise Variance Inflation Factor (VIF) of terms (columns) in a CSV file.
    • convIDs.pl : This script takes a (TAB/COMMA) delimited file and converts words in a particular column to those provided by a COMMA delimited IDs list.
    • collate_CSV.R: This R script takes two frequency tables in CSV format and collates them together by either taking union or intersection of columns. Furthermore it inserts "F1_" and "F2_" as prefixes to rownames of both CSV files, respectively.
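      The collation logic can be sketched in Python as follows. It is not the R script: the function names are hypothetical, and filling columns missing from one table with 0 in union mode is an assumption.

```python
import csv
import io

def read_table(text):
    """Parse CSV text whose first column holds row names."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0][1:]
    table = {r[0]: dict(zip(header, map(float, r[1:]))) for r in rows[1:]}
    return header, table

def collate(text1, text2, mode="union"):
    """Stack two frequency tables, prefixing row names with F1_ and F2_."""
    cols1, t1 = read_table(text1)
    cols2, t2 = read_table(text2)
    if mode == "union":
        cols = cols1 + [c for c in cols2 if c not in cols1]
    else:  # intersection
        cols = [c for c in cols1 if c in cols2]
    merged = {}
    for prefix, table in (("F1_", t1), ("F2_", t2)):
        for row, values in table.items():
            # Assumption: a column absent from one table counts as 0.
            merged[prefix + row] = [values.get(c, 0.0) for c in cols]
    return cols, merged

table1 = "Sample,A,B\ns1,1,2\n"
table2 = "Sample,B,C\ns1,3,4\n"
union_cols, union_rows = collate(table1, table2, "union")
inter_cols, inter_rows = collate(table1, table2, "intersection")
```

      The prefixes keep rows distinguishable when both tables use the same sample names, as in the example where both contain a row called s1.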
    • collateResults.pl: This script is useful for combining CSV files generated by TAXAassign, AMPLICONprocessing or any software that produces two-column CSV files without header information. It takes as an input the path to a main folder where subfolders contain CSV files, each matching a particular pattern in their names.
    • collateGCMSResults.pl: This script is useful for collating data generated from GC-MS machines.
    • Google Colaboratory Workflows: Google Colaboratory allows one to write and execute Python in a web browser (whether running on a smartphone or a laptop) and requires just a valid Google account to run Python notebooks (which have .ipynb extensions). With ~30GB of disk and 12GB of RAM freely available on Colaboratory for each Google account, the following workflows serve as modular streams for microbial informatics in the absence of any dedicated computing cluster. They are highly reusable and customisable, and serve to communicate research findings to funders/external collaborators as well as software outputs for grants (Research Fish etc.). The run time for these workflows is ~3 to 4 hours on moderately sized datasets.
      • conda_qiime2.ipynb: A proof-of-concept that QIIME2 can be run on Google Colab, and can process a proper 16S rRNA amplicons study
        Inputs: Paired-end Illumina FASTQ files (successfully tested with 97 samples comprising 7.7M reads).
        Outputs: Abundance Table of Amplicon Sequence Variants (ASVs) with Taxonomy (feature_w_tax.biom); Phylogenetic Tree (tree.nwk).
        Usefulness: Can use the workflow for current experiments as well as existing samples (meta-analysis studies).
      • conda_pangenome.ipynb: Prokaryotic pan genome analysis with Prokka and Roary
        Inputs: Prokaryotic Strains/Draft Assemblies in FASTA format (successfully tested on 12 strains of Pectobacterium atrosepticum).
        Outputs: Fully Annotated Genomes (*.GFF/*.GBK files); Phylogenetic Tree of Strains; Core/Accessory Genes.
        Usefulness: Can analyse functional gain/loss in prokaryotic strains of interest in intervention/case-control studies.