Python Programming for Data Science: Introduction

Overview

Data science is a discipline that uses scientific methods, processes, and algorithms to extract meaningful information, knowledge, and insights from structured and unstructured data.

The aim of this course is to provide an introduction to programming for data science, using the Python programming language. The course introduces the basics of the data science process, from collecting data and pre-processing it (cleaning/correcting it) to performing exploratory data analyses, visualising data, and sharing analysis results.

The course will rely on Jupyter Notebooks for interactive Python programming as they are widely used in Data Science.

Before attending this course, students will need to know:

  • the fundamentals of linear algebra: what a matrix is and how matrix addition and multiplication are performed;
  • the following fundamental concepts of statistics: mean, median, variance and standard deviation, interquartile range;
  • the fundamentals of algebra: real and complex numbers, exponentials and logarithms, and trigonometric functions.

If you choose to register for accreditation and assessment, to complete the assignment you will need access to a computer capable of running the open-source software used in the course and access to the Internet. A limited amount of class time will be allocated to working on the class assignment, so you should ensure that you also have access to a computer outside of class.

If you are unable to attend this course in Oxford, an online version is also available.

Programme details

Course starts Tuesday 27 January 2026

This is an in-person course which requires your attendance at the weekly meetings in Oxford on Tuesdays, 7-9pm.

Week 1: Introduction to Data Science. Course set-up. Intro to Python: numbers

Week 2: Python basics: built-in types, functions, and methods, if statement

Week 3: Python data structures: lists, tuples, dictionaries, and sets; for loops

Week 4: NumPy

Week 5: Pandas for data science I

Week 6: Pandas for data science II

Week 7: Data visualisation: matplotlib and seaborn

Week 8: Object-oriented programming: classes, inheritance, and applications 

Week 9: Data gathering and cleaning. Text pre-processing for Natural Language Processing (NLP)

Week 10: Time Series Analysis

Certification

Credit Accumulation Transfer Scheme (CATS) Points

Only those who have registered for assessment and accreditation will be awarded CATS points for completing work to the required standard. Please note that assignments are not graded but are marked either pass or fail. Please follow this link for more information on Credit Accumulation Transfer Scheme (CATS) points

Digital Certificate of Completion 

Students who are registered for assessment and accreditation and pass their final assignment will also be eligible for a digital Certificate of Completion. Information on how to access the digital certificate will be emailed to you after the end of the course. The certificate will show your name, the course title and the dates of the course attended. You will be able to download the certificate and share it on social media if you choose to do so.

Please note students who do not register for assessment and accreditation during the enrolment process will not be able to do so after the course has begun.

Fees

Description                          Costs
Course fee (with no assessment)      £300.00
Assessment and Accreditation fee     £60.00

Funding

If you are in receipt of a UK state benefit, you are a full-time student in the UK or a student on a low income, you may be eligible for a reduction of 50% of tuition fees. See details of our concessionary fees for short courses

Tutor

Ms Nazneen Ali

Nazneen holds a Master of Technology in Data Science and Engineering, a Master’s degree in Electronics, and a Bachelor’s in Computer Science (Hons). With over 15 years of teaching experience, she has consistently supported students in achieving academic and professional success. Her expertise includes mentoring Data Science projects with a strong emphasis on Machine Learning, Deep Learning, and Python programming. She possesses in-depth knowledge of the complete machine learning pipeline, a broad range of machine learning and deep learning algorithms, and advanced statistical analysis techniques. Known for her clarity and fluency in explaining complex technical concepts, Nazneen excels at making challenging topics accessible and engaging for learners.

Course aims

The course aims to give students the opportunity to learn the basic aspects of Python programming for data science and to gain an appreciation for the end-to-end process of obtaining data, processing it, and presenting results.

Course objective: to be able to build a simple data processing pipeline by the end of the course.

Teaching methods

Each weekly session will consist of lectures, hands-on programming exercises, class discussions, and interactive programming demonstrations by the tutor.

Learning outcomes

By the end of the course, students will have had the opportunity to learn how to write procedural code using the Python language and tools to:

  • import data from local and/or remote sources and preprocess it;
  • extract significant information from the gathered data;
  • visualise the relevant features extracted from the data.

After attending this course, students will know:

  • how to perform fundamental Python operations such as variable creation, numerical operations on scalars, vectors and matrices, iteration through a collection, and manipulation of elements in a collection;
  • how to use NumPy and pandas to import a dataset and extract important statistics from it using techniques such as split-apply-combine (for example, finding the mean, median or max of a quantitative variable for each category in a categorical variable);
  • given a dataset, how to select the appropriate type of graph depending on the information to be conveyed, and use the matplotlib and seaborn libraries to draw it and add a title, captions and figure legends;
  • how to create and add state and behaviour to a class in Python;
  • how to use nltk or spaCy to preprocess a text and convert it to a numerical representation that can be manipulated by information retrieval algorithms (eg for sentiment analysis, semantic search or machine learning algorithms);
  • how to perform time series analysis.
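
As an illustration of the split-apply-combine technique mentioned above, here is a minimal sketch in plain Python. It is not taken from the course materials: the dataset, the function name summarise_by_category and the column meanings are all invented for this example, and only the standard library is used so that it runs without installing anything.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy dataset (invented for illustration):
# each record is (category, quantitative value).
records = [
    ("apple", 120.0),
    ("apple", 150.0),
    ("banana", 110.0),
    ("banana", 130.0),
    ("banana", 90.0),
    ("cherry", 8.0),
]


def summarise_by_category(rows):
    """Return {category: {"mean": ..., "median": ..., "max": ...}}."""
    # Split: collect the values that belong to each category.
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    # Apply and combine: compute the statistics for each group and
    # gather the per-group results into a single dictionary.
    return {
        category: {
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
        }
        for category, values in groups.items()
    }


summary = summarise_by_category(records)
for category, stats in summary.items():
    print(category, stats)
```

With pandas, the same summary would typically be obtained from a single groupby call, for example df.groupby("fruit")["weight_g"].agg(["mean", "median", "max"]), where df, "fruit" and "weight_g" are again hypothetical names.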

Assessment methods

Only those students who have registered for assessment and accreditation, in advance of the course start date, can submit coursework for assessment.

Students will be asked to submit a portfolio of two exercises for their coursework assignment. The first exercise will be given in week 5 for early submission, and the second in week 9.

Application

How to enrol

Please use the 'Book now' button on this page. Alternatively, please complete an enrolment form.

How to register for accreditation and assessment

To be able to submit coursework and to earn credit (CATS points) for this course, if you wish to do so, you will need to register and pay an additional £60 fee. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online. 

Students who do not register for CATS points during the enrolment process will not be able to do so after the course has begun.

If you are enrolled on the Certificate of Higher Education at the Department you need to indicate this on the enrolment form but there is no additional registration fee.

Level and demands

The Department's Weekly Classes are taught at FHEQ Level 4, ie first year undergraduate level.

Experience in using a programming or scripting language is beneficial. The basic elements of programming using the Python programming language will be introduced throughout the course. However, students should be aware that this course requires a certain amount of homework (2–3 hours per week) to become familiar with the concepts explained during the class. This is especially true for students who are not familiar with programming. This is a course on data science, so some mathematical concepts will be discussed, but this will be kept to a minimum. Expect some exposure to (1) linear algebra (eg matrix operations), (2) statistics, and (3) calculus.

IT requirements

The course will rely on Jupyter Notebooks for interactive Python programming as they are widely used in Data Science.

In order to complete the assignment (and in order to get the full benefit from the course), students will need access to a computer capable of running the open-source software used in the course and access to the Internet. Only a limited amount of class time will be allocated to working on the assignment, so students should ensure that they have access to a computer outside of class.