This is the web resource for NTU's Bio-Data Science and Education Laboratory


Wilson Wen Bin Goh

I graduated with a PhD in Bioinformatics and Computational Systems Biology from Imperial College London in 2014, where I was jointly supervised by Marek Sergot (Imperial College London) and Limsoon Wong (National University of Singapore). My work was on network theory: I demonstrated how networks could be used to resolve coverage and consistency issues in high-dimensional biological data, particularly proteomics data. You can check out my thesis here.

It is said that we never end up using most of what we learnt in our degrees. As it turns out, this is very true. Since graduating, I have worked on a wide variety of problems, often venturing quite far from bioinformatics. For example, I never quite fancied myself a statistician, but I was inspired by odd observations reported in recent literature. These include the entire debacle about the nature and future of the p-value, reports that most biological research is irreproducible, and findings that many reported multi-gene biomarkers do worse than random sets of genes. This led us (myself and my research collaborators) down a path where we became interested in understanding how confounding issues, experimental design issues and ill-conceived choices of data transformation method generate false results while masking true effects. It turns out that, in actual practice, generating false results is really very easy, as statistical tests are not as foolproof or as objective as imagined.

This led us to a new area of applied statistical research, where we now try to better understand the nature of batch effects, the impact of various normalization procedures on statistical feature selection and data modelling, how to ensure that a feature set is not merely statistically valid but also informative, and finally, how to deal with weak validation practices across various areas of data science research (not just machine learning and AI). Along the way, we (the lab) realized that as we veered further into this area, we were doing more data science than bioinformatics. And so we decided to relabel ourselves as a data science lab with a strong interest in biological and healthcare applications.

As it turns out, data science has become really popular in recent times. It is considered transformative, and we wholly welcome this change. We were glad to see it get people interested in bioinformatics again. But as we revised existing bioinformatics training materials and curricula, we noticed that data science applied to biological problems is not necessarily bioinformatics. It is a different, emerging discipline. This is where we became interested in researching and promoting education-related themes. We started by developing courses and curricula for a biomedical data science programme, but soon realized that it is much more interesting (and fun) to apply data science skills to the analysis of education-related data. We noted that as technologies and AI become more advanced, people need to get smarter. The way forward is to get people out of rote learning, so that they become more self-reliant in their learning habits, ask better questions, and think about how concepts across different disciplines can be joined together creatively and synergistically. We became interested in understanding how cognitive load and problem-solving competencies can be honed and developed more rigorously via high-impact pedagogical practices. More importantly, we want to know how data analytics and machine learning can be applied to automatically identify the manifestations of such deep learning traits. We believe that this is a more useful endeavour than programming chatbots to remind you to submit your tutorial assignment.

Of course, the lab will always devote a portion of its activities to biological and biomedical science research. We believe that bioinformatics and bio-data science will play an essential role in developing new treatments, identifying novel drug targets, and deepening our understanding of our bodies.

More about me

For more information on my latest work, do visit my ResearchGate site. You may also connect with me on LinkedIn.