
Complimentary R&D workshop

What you’ll learn: Practical skills for data preparation & visualisation

A hands-on session designed for researchers and HDR students. This workshop provides a practical introduction to data wrangling and visualisation for statistical and econometric analysis. Participants will learn how to wrangle, summarise, and visualise data using Python and other open-source tools; no prior coding experience is required.

Recommended open-source tools/software

Participants are encouraged to install the following before the workshop.

Local installation of Jupyter Notebook and Python is encouraged. For beginners, it is recommended to install them via the Anaconda Distribution, which simplifies setup and includes commonly used data science packages. However, local installation is not mandatory. All workshop materials can also be accessed and run online via Kaggle, a free platform for data science that supports notebook execution without any software installation.

Workshop description

In today’s data-driven research environment, the ability to prepare, clean, and structure data is just as important as the analysis itself. This hands-on workshop, led by Professor Felix Chan, will introduce participants to the core concepts and practical skills of data wrangling and visualisation using freely available open-source tools.

The increasing availability of large structured and unstructured data presents a challenge, as these data often do not come in a form suitable for analysis. In fact, it is widely acknowledged that only 20% of the data analytics life cycle is dedicated to data analysis, while the remaining effort is spent on cleaning, managing, and wrangling data into a form ready for analysis.

The objective of this workshop is to explore fundamental techniques that facilitate data wrangling for further statistical or econometric analysis. These techniques also enable the generation of summaries of data from various perspectives, which are valuable in determining the most efficient analytic methods to extract deeper insights.
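As a rough illustration of what "summaries from various perspectives" can look like in practice, the minimal sketch below uses the pandas library (an assumption; the workshop does not specify its package choices) and an invented toy dataset to view the same variable both overall and broken down by group:

```python
import pandas as pd

# Hypothetical survey data, invented purely for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "year": [2022, 2022, 2023, 2023, 2023],
    "income": [54000, 48000, 56500, 49500, 58000],
})

# One perspective: an overall numeric summary (count, mean, spread, quartiles).
print(df["income"].describe())

# Another perspective: mean income cross-tabulated by region and year.
print(df.pivot_table(values="income", index="region", columns="year", aggfunc="mean"))
```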

The workshop will introduce the concept of split-apply-combine, a fundamental philosophy in data wrangling, and demonstrate these techniques using Jupyter Notebook and Python. Python is a widely adopted open-source tool in data science, freely accessible to all researchers.
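For readers unfamiliar with the pattern, the sketch below shows the general idea of split-apply-combine in Python, again assuming pandas and a made-up dataset rather than the workshop's actual materials:

```python
import pandas as pd

# Hypothetical dataset used only to illustrate the pattern.
df = pd.DataFrame({
    "state": ["WA", "WA", "NSW", "NSW", "VIC"],
    "sector": ["Mining", "Retail", "Mining", "Retail", "Retail"],
    "employment": [120, 95, 80, 210, 175],
})

# Split the rows by state, apply aggregations to each group,
# then combine the results into a single summary table.
summary = (
    df.groupby("state")["employment"]
      .agg(total="sum", average="mean")
      .reset_index()
)
print(summary)
```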

The workshop will also introduce and explore the FAIR data principles (Findable, Accessible, Interoperable, Reusable), discussing how Jupyter Notebook and other open-source tools can help ensure the reproducibility of research findings.