Grant from the Bill & Melinda Gates Foundation to support a three-year project to research and develop new concepts for on-site sanitation in developing countries. This work is done in collaboration with the London School of Hygiene and Tropical Medicine to study pit latrines.
2011-2012: Senior Postdoctoral Research Assistant, "Network Inference from Signals", Oxford Complex Systems Group, University of Oxford, UK; Advisor: Nick Jones; Supported by EPSRC grant number EP/I005986/1; Project Page
2008-2009: Teaching Associate, Queen's College, University of Cambridge, UK
2004-2008: PhD Electrical and Electronics Engineering, "Dynamic Phase Boundary Estimation in Electrical Impedance Tomography", Control Engineering Lab, Inverse Problems in Engineering Research Group, Jeju National University, Korea; Advisor: Professor Kyung Youn Kim
Contribution to the following grants/projects:
The 2nd Phase Brain Korea 21 (BK21) Project
Korea Science and Engineering Foundation Grant No. R01-2007-000-20155-0, Research Grant of Jeju National University
Korea Research Foundation Grant No. KRF-2005-013-D00075
Hyocheon Research Fund of the Cheju National University Development Foundation
Korea Science and Engineering Foundation Grant No. R01-2004-000-0040-0
Ministry of Commerce, Industry and Energy, Korea Grant No. S1005503
2001-2003: Graduate Assistant, Faculty of Computer Science and Engineering, GIK Institute of Science and Technology, Pakistan
2001-2003: MS Computer Systems Engineering (Major: Artificial Intelligence), GIK Institute of Science and Technology, Pakistan; Advisor: Professor Anwar Majeed Mirza
2000-2001: Junior Software Engineer, Askari Information Systems, Pakistan
1996-2000: BS Computer Systems Engineering, GIK Institute of Science and Technology, Pakistan
Final Year Project: Virtual Shopping Agent, sponsored by Virtual Shopping Inc., California, USA
Excellence in Research Award - Dean, Graduate School, Jeju National University (2008)
Best Researcher of the Year. Cash Prize of 1,500,000 Korean won given by the 2nd phase BK-21 project (2008)
Best Researcher of the Year. Cash Prize of 1,000,000 Korean won given by the 2nd phase BK-21 project (2007)
Excellent Achievement Award, The Second Prize - LG Electronics (2006)
Excellent Paper Award - ITISF 2005
Scholarship (300,000 Korean won per month) from 2nd Phase BK-21 Project (2007-2008)
Scholarship (350,000 Korean won per month) from Ministry of Commerce, Industry and Energy, Korea Grant No. S1005503 (2005-2006)
Korean Government IT Scholarship (56,000,000 Korean won) from the Institute of Information Technology Advancement, Korea (2004-2008)
Jeju National University Category-A Scholarship. Full tuition fee waiver during Ph.D. coursework (2004-2006)
President, Foreign Students Association, Jeju National University (2005-2006)
Certificate of Excellence for earning first position in MS Batch 2001 with CGPA 4.0 - Dean of Faculty of Computer Science and Engineering, GIK Institute of Science and Technology (2003)
Certificate of Appreciation for aggressive learning exercises - Dr Anwar Majeed Mirza (Member, Faculty of Computer Science and Engineering, GIK Institute of Science and Technology, 2003)
Certificate of Appreciation for aggressive learning exercises - Dr Syed Afaq Hussain (Member, Faculty of Computer Science and Engineering, GIK Institute of Science and Technology, 2003)
Merit Certificate (Fall Semester 2001) - Pro-Rector Academics, GIK Institute of Science and Technology (2002)
GIK Institute of Science and Technology Merit Scholarship. Full tuition fee waiver during M.S. coursework (2001-2003)
President, Graduate Students Association, GIK Institute of Science and Technology (2001-2003)
President, Entertainment Club and Badminton Club, Askari Information Systems (2000-2001)
Employee of the Month Award, Askari Information Systems (2001)
Major Research Projects
Environmental Bioinformatics (Metagenomics)
I joined the University of Glasgow in April 2012 as a Research Fellow to work on a joint grant between TSB and Unilever to develop metagenomics for industrial applications. Metagenomics research is based on sequencing data from either 16S rRNA amplicons or large-scale shotgun (whole-genome) metagenomic sequencing. The primary goal of metagenomics is to assess the taxonomic and functional diversity of microbial communities. Two forms of interaction in biogeochemical cycles hold my interest: a) microbiological and b) chemical interdependence. In metagenomics, communities of uncultured microorganisms are studied. My interest lies in the wiring diagrams of life, i.e. an integrated picture of how a microbial cell or community operates, especially with respect to its phylogenetic and genomic complexity. I want to identify the major players in order to understand and predict ecosystem functioning and the microbial impact on the ecosystem, with emphasis on interactions (from obligate to minimal) and presence (ubiquitous or endemic).
Some of the research questions I am interested in answering are: a) How diverse are the metabolic pathways and networks? b) How do organisms and protein-coding genes interact with each other to produce the overall function of the system? c) How do environmental stimuli affect ecosystem functioning and long-term stability as a whole?
Research Activities
Development of software pipelines for both metagenomic and taxonomic data analysis
TAXAenv website: A web service for multivariate analysis of microbial community structure in an environmental context. Microbial diversity is measured by sequencing homologous genes, typically the 16S rRNA gene, on next-generation sequencing platforms. After extracting the abundances of the observed taxa by classifying the sequences, you can use this tool to investigate correlations between diversity patterns and environmental parameters, leading to a better understanding and prediction of ecosystem functioning and of the microbial impact on the ecosystem.
TAXAenv Website Tutorial
NOTE: This website will be released into the public domain after publication/completion.
SEQenv pipeline: A pipeline capable of annotating genetic sequences based on environment-descriptive terms occurring within their records and/or in relevant literature. Given a set of sequence files (in FASTA format), SEQenv retrieves highly similar sequences from public repositories (such as SILVA and GenBank). From each of those records, text fields carrying environmental context information (such as the reference title and the isolation source) are then extracted. Existing links to PubMed abstracts are also followed and the relevant abstracts collected. Once the relevant pieces of text for each matching sequence have been gathered, they are processed by a text-mining module capable of identifying any Environment Ontology (EnvO) environment-descriptive terms mentioned in them. The identified EnvO terms, along with their mention frequencies, are then subjected to clustering analysis and multivariate statistics. As a result, tag clouds and heatmaps of the environment-descriptive terms characterising different sets of sequences (e.g. originating from different samples) are generated.
Project Page
NOTE: This software will be released into the public domain after publication/completion.
TAXAassign: This software annotates unknown sequences at a chosen hierarchical level of the NCBI Taxonomy.
Tutorial: Resolving unknown sequences using TAXAassign
NOTE: This software will be released into the public domain after publication/completion.
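As a minimal illustration of the kind of analysis TAXAenv supports (not TAXAenv's own code), the Python sketch below computes the Shannon diversity index of each sample in a taxa-abundance table and its Spearman rank correlation with one environmental parameter. The table layout (samples as rows, taxa as columns), the example values and the function names are assumptions.

```python
import numpy as np

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical abundance table: rows are samples, columns are taxa.
abundances = np.array([[10, 0, 0, 0],
                       [5, 5, 0, 0],
                       [4, 3, 3, 0],
                       [1, 1, 1, 1]])
temperature = [4.0, 8.0, 12.0, 16.0]  # hypothetical environmental parameter

diversity = [shannon_index(row) for row in abundances]
rho = spearman(diversity, temperature)
```

Rank correlation is used here because diversity-environment relationships are often monotonic but not linear; with many parameters one would also correct for multiple testing.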
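The text-mining step of SEQenv, which finds EnvO terms in the gathered text and counts their mentions, can be sketched as simple dictionary matching. This Python example is not SEQenv's module: the three-term vocabulary, the per-sample grouping and the function names are hypothetical, and real EnvO tagging (synonyms, term IDs, disambiguation) is considerably more involved.

```python
import re
from collections import Counter

# Hypothetical subset of EnvO term labels; a real run would load the full ontology.
ENVO_TERMS = ["hot spring", "sediment", "soil"]

# One case-insensitive, whole-word pattern per term (multi-word terms allowed).
PATTERNS = {term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for term in ENVO_TERMS}

def count_terms(texts):
    """Count EnvO term mentions across the text snippets gathered for a set of hits."""
    counts = Counter()
    for text in texts:
        for term, pattern in PATTERNS.items():
            counts[term] += len(pattern.findall(text))
    return counts

def term_matrix(samples):
    """samples: dict sample name -> list of texts. Returns (terms, count rows)."""
    rows = {}
    for name, texts in samples.items():
        counts = count_terms(texts)
        rows[name] = [counts[term] for term in ENVO_TERMS]
    return ENVO_TERMS, rows
```

For example, `term_matrix({"A": ["Isolated from marine sediment", "Sediment core"]})` yields the row `[0, 2, 0]` for sample A. A samples-by-terms matrix like this is the kind of input on which clustering, heatmaps and tag clouds can then be built.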
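One common way to annotate a sequence at a chosen taxonomic level is a consensus over the lineages of its top database hits: a rank is assigned only if enough hits agree. The Python sketch below shows that idea under stated assumptions. It is not TAXAassign's actual rule, and the 0.6 agreement threshold, the rank list, the lineage dictionaries and the function name are all hypothetical.

```python
from collections import Counter

RANKS = ["phylum", "class", "order", "family", "genus", "species"]

def consensus_assignment(hit_lineages, threshold=0.6):
    """Assign a query from its hits' lineages (dicts mapping rank -> name).

    Walks from the top rank downwards and stops at the first rank where the
    most common name is shared by less than `threshold` of the hits.
    """
    if not hit_lineages:
        return {}
    assignment = {}
    for rank in RANKS:
        names = [lineage.get(rank) for lineage in hit_lineages]
        name, votes = Counter(names).most_common(1)[0]
        if name is None or votes / len(hit_lineages) < threshold:
            break
        assignment[rank] = name
    return assignment
```

Raising the threshold trades resolution for reliability: the query is then annotated at a higher (coarser) level when its hits disagree.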
Some useful bioinformatics scripts
Graphical Lasso algorithm for finding relationships between OTUs in a species abundance file. The MATLAB function generates the network as a DOT file, which can be visualised in Graphviz. The species abundance file can be obtained by running sequences through noise-removal software such as AmpliconNoise. Graphical Lasso - code [tar.gz]
Faster blastn searches using GNU parallel. For an Illumina dataset with 6 million reads, blastn_parallel.sh took 2.5 minutes on 45 cores, compared with 86 minutes for blastn.sh.
Data extraction from UniProt databases: extract_fasta_swissprot.py. This script extracts a FASTA file from a compressed Swiss-Prot file based on a supplied search pattern (searchable fields include protein name, organism name, keywords, comments, gene name, accession number, and organism classification).
Searching for motifs within proteins that are likely to be phosphorylated by specific protein kinases, using Scansite, given a FASTA file: extract_scansite_motifs_fasta.pl.
Extracting representative sequences from clusters generated by AmpliconNoise. extract_clust_seqs_fasta.pl is an extension of AmpliconNoise's Typical.pl and reports the most abundant sequence, the consensus sequence, the first sequence, and the longest sequence of each cluster.
Extracting organism names and NCBI taxonomy IDs for blast output files produced with the -outfmt 6 switch in blastn: blast_concat_name_taxa.py (runs on a single core) and blast_concat_name_taxa.sh (a multicore version that uses GNU parallel to run blast_concat_name_taxa.py on each record of the blast output file). The organism name and taxonomy ID are appended to the end of each record.
An R script to iteratively remove collinear variables by calculating the step-wise Variance Inflation Factor (VIF) of each term (column) in a CSV file against the other columns: remove_colinear_terms.R
convIDs.pl: This script takes a tab- or comma-delimited file and converts the words in a particular column to those provided by a comma-delimited ID map.
A short tutorial on how to use BioSQL to resolve the taxonomic path, along with taxonomic ranks, for a given NCBI taxon ID. The tutorial requires the stored procedure path_to_root_node.sql.
A short tutorial on how you can use BWA, bedtools, and samtools to identify low-coverage genomes in metagenome samples.
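The MATLAB code for the Graphical Lasso is in the archive above. To show the steps, the Python/NumPy sketch below implements the block coordinate-descent graphical lasso of Friedman, Hastie and Tibshirani and writes the non-zero entries of the estimated precision matrix as an undirected Graphviz DOT graph. It is an independent illustration, not a port of the MATLAB function; the penalty `rho`, the tolerances, the simulated data and the function names are assumptions, and real OTU abundances would typically be transformed (e.g. log or centred log-ratio) before estimating the covariance.

```python
import numpy as np

def soft_threshold(x, t):
    """Lasso shrinkage operator: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def graphical_lasso(S, rho, max_iter=100, tol=1e-6):
    """Sparse precision matrix from covariance S via block coordinate descent."""
    p = S.shape[0]
    W = S + rho * np.eye(p)          # working covariance estimate
    B = np.zeros((p, p))             # column j holds the lasso coefficients for j
    for _ in range(max_iter):
        W_prev = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11, s12 = W[np.ix_(idx, idx)], S[idx, j]
            beta = B[idx, j].copy()
            for _ in range(200):     # coordinate descent for the inner lasso
                beta_prev = beta.copy()
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, rho) / W11[k, k]
                if np.max(np.abs(beta - beta_prev)) < tol:
                    break
            B[idx, j] = beta
            W[idx, j] = W[j, idx] = W11 @ beta
        if np.max(np.abs(W - W_prev)) < tol:
            break
    Theta = np.zeros((p, p))         # recover the precision matrix from W and B
    for j in range(p):
        idx = np.arange(p) != j
        Theta[j, j] = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[idx, j] = -B[idx, j] * Theta[j, j]
    return (Theta + Theta.T) / 2

def to_dot(Theta, names, eps=1e-8):
    """Undirected DOT graph with an edge wherever the precision entry is non-zero."""
    lines = ["graph otus {"]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(Theta[i, j]) > eps:
                lines.append(f'  "{names[i]}" -- "{names[j]}";')
    lines.append("}")
    return "\n".join(lines)

# Simulated example: OTU1 and OTU2 co-vary, OTU3 is independent.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
X = np.column_stack([z, z + 0.3 * rng.normal(size=1000), rng.normal(size=1000)])
dot = to_dot(graphical_lasso(np.corrcoef(X, rowvar=False), rho=0.2),
             ["OTU1", "OTU2", "OTU3"])
```

Writing `dot` to a file and running `dot -Tpng` on it draws the network; larger `rho` values give sparser graphs.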
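blastn_parallel.sh uses GNU parallel to split the query into chunks of whole FASTA records and run one blastn job per chunk. The Python sketch below mimics that split-and-run pattern with the standard library only; it is not the original script. It assumes BLAST+ `blastn` is on the PATH and reads the query from standard input when `-query` is omitted (the BLAST+ default); the database name and function names are placeholders.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

def split_fasta(text, n_chunks):
    """Split FASTA text into at most n_chunks contiguous blocks of whole records."""
    records = [">" + r for r in re.split(r"(?m)^>", text) if r.strip()]
    if not records:
        return []
    size = -(-len(records) // n_chunks)          # ceiling division
    return ["".join(records[i:i + size]) for i in range(0, len(records), size)]

def run_blastn(chunk, db):
    """Run one blastn job on a chunk passed through stdin (tabular -outfmt 6)."""
    result = subprocess.run(["blastn", "-db", db, "-outfmt", "6"],
                            input=chunk, capture_output=True, text=True, check=True)
    return result.stdout

def parallel_blastn(fasta_text, db, n_jobs):
    """Run the chunks concurrently; results are concatenated in chunk order."""
    chunks = split_fasta(fasta_text, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return "".join(pool.map(run_blastn, chunks, [db] * len(chunks)))
```

Threads suffice here because the work happens in the blastn subprocesses. Splitting only at lines that start with `>` is what keeps records intact, as GNU parallel's record-aware `--pipe` mode does.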
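A minimal sketch of pattern-based extraction from a Swiss-Prot flat file, in Python. It assumes the UniProtKB/Swiss-Prot flat-file layout: each line starts with a two-letter code (ID, AC, DE, GN, OS, OC, KW, CC, SQ), sequence lines start with blanks, and `//` ends an entry. This is not extract_fasta_swissprot.py itself; the field names, the FASTA header layout and the function names are assumptions.

```python
import gzip
import re

# Flat-file line codes searched for each (assumed) field name.
FIELD_CODES = {"protein_name": "DE", "organism": "OS", "keywords": "KW",
               "comments": "CC", "gene_name": "GN", "accession": "AC",
               "classification": "OC"}

def parse_swissprot(lines):
    """Yield one dict per entry: {'fields': {code: [text, ...]}, 'seq': [chunks]}."""
    entry = {"fields": {}, "seq": []}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("//"):               # end of entry
            yield entry
            entry = {"fields": {}, "seq": []}
        elif line.startswith("  "):             # sequence line (blank line code)
            entry["seq"].append(line.replace(" ", ""))
        else:
            entry["fields"].setdefault(line[:2], []).append(line[5:].strip())

def extract_fasta(lines, field, pattern):
    """Return FASTA records for entries whose chosen field matches the regex."""
    code = FIELD_CODES[field]
    regex = re.compile(pattern, re.IGNORECASE)
    records = []
    for entry in parse_swissprot(lines):
        if regex.search(" ".join(entry["fields"].get(code, []))):
            accession = entry["fields"]["AC"][0].split(";")[0]
            name = entry["fields"]["ID"][0].split()[0]
            records.append(f">{accession}|{name}\n{''.join(entry['seq'])}")
    return "\n".join(records)

def extract_fasta_gz(path, field, pattern):
    """Same as extract_fasta, reading a gzip-compressed flat file."""
    with gzip.open(path, "rt") as handle:
        return extract_fasta(handle, field, pattern)
```

For instance, `extract_fasta_gz("uniprot_sprot.dat.gz", "organism", "Homo sapiens")` would select human entries, assuming the downloaded file has that name.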
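The four kinds of cluster representative can be illustrated with a short Python sketch. It assumes each cluster is available as a list of (id, aligned sequence, abundance) tuples in file order, with gaps written as `-`, and that the consensus is an abundance-weighted majority vote per column. The input layout, the voting rule and the function name are hypothetical; AmpliconNoise's own output would need parsing first.

```python
from collections import Counter

def cluster_representatives(members):
    """members: list of (seq_id, aligned_seq, abundance) tuples in file order."""
    first_id = members[0][0]
    abundant_id = max(members, key=lambda m: m[2])[0]
    longest_id = max(members, key=lambda m: len(m[1].replace("-", "")))[0]
    consensus = []
    for column in range(len(members[0][1])):
        votes = Counter()
        for _, seq, abundance in members:
            votes[seq[column]] += abundance      # abundance-weighted vote
        base = votes.most_common(1)[0][0]
        if base != "-":                          # drop gap-majority columns
            consensus.append(base)
    return {"first": first_id, "abundant": abundant_id,
            "longest": longest_id, "consensus": "".join(consensus)}
```

Weighting votes by abundance lets the dominant sequence variant set the consensus even when the cluster contains many rare variants.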
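The step-wise VIF procedure can be sketched in Python with NumPy. VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other columns plus an intercept. The column with the largest VIF is dropped while it exceeds a threshold, and the VIFs are recomputed after each removal. This is an illustration rather than remove_colinear_terms.R; the threshold of 5 is an assumed default (5 and 10 are common choices).

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor of each column of X against the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def remove_colinear(X, names, threshold=5.0):
    """Iteratively drop the column with the highest VIF until all are <= threshold."""
    X, names = np.asarray(X, dtype=float), list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names
```

Removing one column at a time matters: dropping every column whose initial VIF is high would discard all members of a collinear group, whereas recomputing after each removal usually keeps all but one of them.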
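The idea behind the path-to-root stored procedure can be sketched in Python: walk the parent links from a taxon up to the root, collecting each node's rank and name. The sketch does not use BioSQL. It works on an in-memory dictionary shaped like NCBI nodes.dmp/names.dmp data, in which the root (taxon 1) is its own parent. The dictionary layout and the toy taxon IDs below are hypothetical.

```python
def path_to_root(taxon_id, nodes):
    """nodes: dict taxid -> (parent_taxid, rank, name). Returns a root-first path."""
    path, seen = [], set()
    current = taxon_id
    while current not in seen:            # guard against cycles in malformed data
        seen.add(current)
        parent, rank, name = nodes[current]
        path.append((current, rank, name))
        if parent == current:             # NCBI root (taxid 1) is its own parent
            break
        current = parent
    return path[::-1]

# Toy tree with made-up IDs (not real NCBI assignments apart from the root).
nodes = {1: (1, "no rank", "root"),
         10: (1, "superkingdom", "ToyKingdom"),
         20: (10, "phylum", "ToyPhylum"),
         30: (20, "genus", "ToyGenus")}
```

For example, `path_to_root(30, nodes)` lists the root, ToyKingdom, ToyPhylum and ToyGenus with their ranks, which is the same information the SQL procedure returns from the BioSQL taxon tables.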
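After reads are mapped with BWA and sorted with samtools, per-genome coverage can be summarised from the output of `samtools depth -a`. Each output line holds a reference name, a position and a depth, separated by tabs, and `-a` includes zero-depth positions. The Python sketch below computes mean depth and breadth of coverage per reference and flags genomes below thresholds. The thresholds and function names are assumptions, not values from the tutorial; only the first sample's depth column is read.

```python
from collections import defaultdict

def coverage_summary(depth_lines):
    """Mean depth and breadth (fraction of positions with depth > 0) per reference."""
    totals = defaultdict(lambda: [0, 0, 0])  # positions, summed depth, covered positions
    for line in depth_lines:
        ref, _pos, depth = line.rstrip("\n").split("\t")[:3]
        depth = int(depth)
        stats = totals[ref]
        stats[0] += 1
        stats[1] += depth
        stats[2] += depth > 0
    return {ref: (s[1] / s[0], s[2] / s[0]) for ref, s in totals.items()}

def low_coverage(summary, min_depth=1.0, min_breadth=0.5):
    """References whose mean depth or breadth falls below the (assumed) thresholds."""
    return sorted(ref for ref, (mean, breadth) in summary.items()
                  if mean < min_depth or breadth < min_breadth)
```

Checking breadth as well as mean depth catches genomes where a few highly covered regions (e.g. shared rRNA operons) inflate the mean while most of the genome has no reads.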
Network Inference from Signals
As part of EPSRC Grant No. EP/I005986/1 "Global View" at the University of Oxford, I explored the scope for a dashboard that gives policy makers an integrated view of the state of the UK, both at the current time and in the past. With a better view of the UK, we can help ensure that it is more resilient to shocks. The project investigates methods to infer time-varying networks from multiple time signals. The time signals pertain to Google Trends, Twitter feeds, stock prices, exchange rates, commodity prices, weather statistics and transport statistics. We also investigated the feasibility of mining text data obtained from the web to help classify and enrich the information that can be obtained from the signal data. This will help us identify events that are likely to have affected the signal datasets and will allow users to track public and expert opinion.
Activities:
Establishing a template of usage requirements from relevant stakeholders in the UK government and other beneficiaries: currently working on housing datasets in collaboration with the Institute for Public Policy Research (IPPR) and RightMove.
Identification of unusual behaviour in single signals, in particular weak signal changes that are distributed across many variables but cause global changes in network topology
Detrending the data to remove seasonal or periodic components, and irregular fluctuations
Forecasting future values of individual signals using Gaussian Process regression
Constructing a hierarchy of increasingly sophisticated methods for network inference: I have considered simple correlations with sliding windows; a range of causal methods such as Dynamic Bayesian Networks and Granger Causality; methods based on Markov Random Fields (Eric Xing’s work at Carnegie Mellon); and methods based on State-Space Models (Zoubin Ghahramani’s work at Cambridge).
Identification of high-trending keywords in Google Trends datasets, using residual time series (the difference between the linearly interpolated values and the original values) together with outlier detection methods
Developing a cross-platform prototype software tool, GlobalView, in C++ for dynamic network inference. Version 0.2 has the following features:
Parsers
Google Trends Datasets from http://www.google.com/trends
UK Climate Impact Program Indicators (UKCP09) from http://www.data.gov.uk
Meteorological Gridded Land Daily Temperatures from http://hadobs.metoffice.com/hadghcnd/download.html
Network Inference Algorithms
Graphical Lasso Method for static networks
Dynamic Bayesian Network for time-varying causal networks
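One simple way to remove a trend and a periodic component is classical additive decomposition, sketched here in Python/NumPy and not necessarily the method used in GlobalView. Assuming a linear trend and a known period (e.g. 7 for daily data with a weekly cycle), subtract a least-squares line, then subtract the mean of each phase of the cycle. What remains are the irregular fluctuations.

```python
import numpy as np

def detrend_deseasonalise(y, period):
    """Return (trend, seasonal, residual) with y = trend + seasonal + residual.

    Assumes a linear trend, a known period and at least one full period of data.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)         # least-squares linear trend
    trend = slope * t + intercept
    detrended = y - trend
    phase = t % period
    phase_means = np.array([detrended[phase == k].mean() for k in range(period)])
    seasonal = phase_means[phase]                  # repeat the mean seasonal profile
    residual = detrended - seasonal
    return trend, seasonal, residual
```

The residual series is what later steps such as outlier detection or network inference would operate on, so that shared seasonality does not show up as spurious correlation between signals.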
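A minimal Gaussian Process regression sketch in Python/NumPy, using a squared-exponential kernel with fixed, assumed hyperparameters (length scale, signal variance and noise variance). It returns the posterior mean and variance at new time points via a Cholesky factorisation. Hyperparameter fitting (e.g. by maximising the marginal likelihood) and the kernels actually used in this project are not shown.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = np.subtract.outer(a, b)
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the points x_test."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))    # K^{-1} y
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)                      # diagonal of posterior cov
    return mean, var
```

When forecasting beyond the last observation, the predictive variance grows back towards the prior variance, which gives a built-in measure of how far ahead a forecast can be trusted.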
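The simplest rung of the network-inference hierarchy, correlation networks over sliding windows, can be sketched in Python/NumPy. For each window, compute the Pearson correlation matrix of the signals and keep an edge wherever |r| exceeds a threshold. The window length, step and threshold are assumptions, and no significance testing is done.

```python
import numpy as np

def sliding_correlation_networks(X, window, step, threshold=0.7):
    """X: array (time, signals). Returns a list of (start index, set of edges (i, j))."""
    X = np.asarray(X, dtype=float)
    networks = []
    for start in range(0, X.shape[0] - window + 1, step):
        R = np.corrcoef(X[start:start + window], rowvar=False)
        n = R.shape[0]
        edges = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if abs(R[i, j]) > threshold}
        networks.append((start, edges))
    return networks
```

Comparing the edge sets of successive windows shows when the network topology changes. The causal and state-space methods higher up the hierarchy address the main weakness of this approach, which is that correlation alone cannot separate direct from indirect dependence.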
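A hedged sketch of the trending-keyword idea in Python/NumPy. Each point is compared with the value linearly interpolated from its two neighbours, and the residuals are screened with a robust (median/MAD-based) modified z-score, so that a sudden upward burst in search interest stands out. Interpolating from immediate neighbours and the 3.5 cut-off (a commonly used value for modified z-scores) are assumptions, not the settings used on the Google Trends data.

```python
import numpy as np

def trending_points(y, cutoff=3.5):
    """Indices where y jumps above the line through its two neighbours."""
    y = np.asarray(y, dtype=float)
    residual = np.zeros_like(y)
    residual[1:-1] = y[1:-1] - 0.5 * (y[:-2] + y[2:])   # original minus interpolated
    core = residual[1:-1]
    median = np.median(core)
    mad = np.median(np.abs(core - median))
    if mad == 0:
        return []
    z = 0.6745 * (residual - median) / mad               # modified z-score
    z[[0, -1]] = 0.0                                      # endpoints have no residual
    return [int(i) for i in np.flatnonzero(z > cutoff)]
```

Only positive deviations are flagged. A spike also pulls its neighbours' residuals negative, so flagging |z| would wrongly mark the points on either side of the burst.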
As part of EPSRC Grant No. EP/F016476/1 "Hybrid 3D Ultrasonic Imaging" at the University of Cambridge, I developed a hybrid ultrasound imaging system for the creation of 3D panoramas. The project focuses on:
tracking the trajectory of a 3D ultrasound probe based on image-based registration of the acquired data and the output of an inertial position sensor;
calibration of the hybrid system;
correction of artifacts in the data caused by variations in probe pressure during the scan;
differentiation of backscatter into diffuse and directional components using the overlap data from multiple scans;
and development and evaluation of software tools to enable the system to be used effectively in a hospital environment.
The system was implemented in the Stradwin software (written in C++, using wxWidgets for cross-platform compatibility and OpenGL for 3D visualisation). The software was then modified to run on a mobile ultrasound machine, the Ultrasonix Sonix RP, and to communicate with the inertial sensor through its serial port. The software was also modified to provide a calibration protocol that compensates for the orientation in which the sensor, an Intersense Inertia Cube 3, was mounted on the ultrasound probe. Additionally, the keypad controls of the ultrasound machine were fully integrated with the developed software. This makes data acquisition easy for clinicians, who can hold the 3D probe in one hand and use the keypad with the other to acquire volumetric data. The system was then shipped to Addenbrooke's Hospital, Cambridge, where it successfully passed review by the ethics committee. It was then used by a clinician in a pilot clinical study on pregnant women attending routine gynaecological examinations. The system has recently shown promising results on datasets from this pilot clinical study; these results will appear in the British Journal of Radiology.
Project Page
RW Prager, UZ Ijaz, AH Gee, and GM Treece. Three-dimensional ultrasound imaging. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, 224(2):193-223, 2010.
Electrical Impedance and Capacitance Tomography
This work is relevant to chemical engineers. During my Ph.D. (2004-2008) at JNU, South Korea, I focused primarily on developing static and dynamic algorithms for inverse problems that arise in a wide range of engineering areas, e.g. electrical impedance tomography (EIT), electrical capacitance tomography (ECT), mobile ad hoc networks, global positioning systems, and inverse heat conduction problems. I developed novel tomographic imaging methods that use EIT and ECT measurement data from electrodes attached to the surface of a pipeline to estimate the multidimensional distribution of physical parameters inside it. In contrast to traditional EIT and ECT, I considered scenarios in which the object to be imaged changes very rapidly during data acquisition, which demands reasonable spatio-temporal resolution. Rather than treating the inverse problem as a traditional tomographic reconstruction problem, I formulated it as a state estimation problem, using different kinematic evolution models for the physical parameters together with an observation model based on the finite element method (FEM). In particular, I developed Kalman-type inverse algorithms for:
estimation of the concentration distribution using the convection-diffusion equation, allowing for approximation of the velocity field;
estimation of the time-varying interfacial boundary in stratified flows of immiscible liquids (targeting liquid hydrocarbon transportation in pipelines that often contain free water);
imaging of a stirred vessel for detection of the air distribution and of air bubbles;
estimation of settling curves and velocities of different layers in the sedimentation process under the influence of gravity (targeting industrial applications such as mining, wastewater treatment, and the pulp and paper industry);
and visualisation of two-phase flow through rod bundles.
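The state-estimation formulation can be illustrated with a minimal linear Kalman filter in Python/NumPy. It uses a random-walk evolution model x_k = x_{k-1} + w_k and a linear observation model z_k = H x_k + v_k. In EIT/ECT the observation model is nonlinear and comes from FEM, which would typically be handled with an extended Kalman filter that linearises it with a Jacobian at each step. Here H is a fixed, made-up matrix standing in for that Jacobian, and the noise covariances Q and R are assumed.

```python
import numpy as np

def kalman_filter(zs, H, Q, R, x0, P0):
    """Filter measurements zs for the model x_k = x_{k-1} + w_k, z_k = H x_k + v_k."""
    x = np.array(x0, dtype=float)
    P = np.array(P0, dtype=float)
    I = np.eye(len(x))
    estimates = []
    for z in zs:
        P = P + Q                              # predict: random walk keeps x, grows P
        S = H @ P @ H.T + R                    # innovation covariance
        K = np.linalg.solve(S, H @ P).T        # gain P H^T S^{-1} (S, P symmetric)
        x = x + K @ (np.asarray(z) - H @ x)    # correct with the innovation
        P = (I - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

The process-noise covariance Q encodes how fast the imaged object is expected to change. A larger Q lets the estimate follow rapid changes at the cost of more noise, which is the spatio-temporal trade-off described above.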
I implemented an adaptive estimator based on a semi-Markovian process for estimating the time-dependent boundary heat flux in a two-dimensional heat conduction domain with heated and insulated walls.
For the estimation, the algorithm requires only the temperatures measured at the insulated walls. The estimator consists of a bank of parallel, adaptively weighted Kalman filters and, in addition to the input heat fluxes, also predicts the bias in the measurements.
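The idea of a bank of adaptively weighted Kalman filters can be sketched in Python/NumPy for a scalar random-walk state. Each filter assumes a different process-noise variance. After each measurement, a filter's weight is multiplied by the Gaussian likelihood of its innovation and the weights are renormalised, and the output is the weighted average of the filter estimates. This is a generic multiple-model adaptive estimator with fixed hypotheses. The semi-Markovian switching, the heat-conduction model and the bias estimation of the original work are not reproduced, and all parameter values are assumptions.

```python
import numpy as np

def adaptive_filter_bank(zs, qs, r, x0=0.0, p0=1.0, floor=1e-12):
    """Weighted bank of scalar Kalman filters, one per process-noise variance in qs."""
    qs = np.asarray(qs, dtype=float)
    x = np.full(len(qs), x0)
    P = np.full(len(qs), p0)
    w = np.full(len(qs), 1.0 / len(qs))
    combined = []
    for z in zs:
        P = P + qs                                   # predict with filter-specific Q
        S = P + r                                    # innovation variance per filter
        innovation = z - x
        likelihood = np.exp(-0.5 * innovation ** 2 / S) / np.sqrt(2 * np.pi * S)
        K = P / S
        x = x + K * innovation
        P = (1 - K) * P
        w = np.maximum(w * likelihood, floor)        # floor keeps every model alive
        w = w / w.sum()
        combined.append(float(w @ x))
    return np.array(combined), w
```

The weight floor stops any hypothesis from being permanently switched off, so the bank can re-adapt if the underlying dynamics change later.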
From 2005 to 2006, I worked on a joint project with the Systems Biology group at JNU and developed software, the 2D electrophoresis Gel Image Processor for MATLAB. This software is useful for the analysis of biomarkers: it quantifies individual proteins, shows the separation between one or more protein "spots" on a scanned image of a 2D gel, and measures running differences between gels. Salient features of the software are as follows:
2D gel pre-processing (RGB, grayscale and binary filtering, including anisotropic diffusion and homomorphic filtering)
Shape-based clustering (K-means, Fuzzy C-means)
Color thresholding (Otsu's method, local, relative and joint entropy methods)
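Otsu's method picks the grey level that maximises the between-class variance of the two resulting pixel classes. The MATLAB tool is not reproduced here. The Python/NumPy sketch below shows the computation for an 8-bit grayscale image and assumes that pixels at or below the returned level form the background class.

```python
import numpy as np

def otsu_threshold(image):
    """Otsu threshold for an 8-bit grayscale image (pixels <= threshold = class 0)."""
    img = np.asarray(image, dtype=np.uint8).ravel()
    prob = np.bincount(img, minlength=256).astype(float)
    prob /= prob.sum()                                # grey-level histogram as probabilities
    levels = np.arange(256)
    omega = np.cumsum(prob)                           # class-0 probability for each threshold
    mu = np.cumsum(prob * levels)                     # cumulative mean up to each threshold
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0              # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```

Because it only needs the histogram, the method is fast. It assumes a roughly bimodal intensity distribution, which is why local and entropy-based alternatives are also useful for unevenly stained gels.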