Python Programming for Data Science: Introduction

Overview

Data science is a discipline that uses scientific methods, processes and algorithms to extract meaningful information, knowledge and insights from structured and unstructured data.

The aim of this course is to provide an introduction to programming for data science using the Python programming language. The course introduces the basics of the data science process: collecting data, pre-processing it (cleaning and correcting it), performing exploratory data analyses, visualising data, and sharing analysis results.

In order to complete the assignment (and in order to get the full benefit from the course), students will need access to a computer capable of running the open-source software used in the course and access to the Internet. A limited amount of class time will be allocated to working on the class assignment, so students should ensure that they have access to a computer outside of class.

The course will rely on Jupyter Notebooks for interactive Python programming, as they are widely used in data science.


This course combines online study with a weekly 1-hour live webinar led by your tutor. Find out more about how our short online courses are taught.


Programme details

This course begins on 16 Sep 2025, when course materials are made available to students. Students should study these materials before the first live meeting, which will be held on 23 Sep 2025, 6:30-7:30pm (UK time).

Week 1: Introduction to Data Science. Introduction to Git and the Anaconda environment

Week 2: Python basics: built-in types, functions and methods, if statement

Week 3: Python data structures: lists, dictionaries, tuples; for...in loops

Week 4: NumPy

Week 5: Pandas for data science I 

Week 6: Pandas for data science II

Week 7: Matplotlib for data visualisation

Week 8: Object-oriented programming: classes, inheritance, and applications 

Week 9: Data gathering and cleaning. Text pre-processing for Natural Language Processing (NLP)

Week 10: Introduction to experimental design

Certification

Credit Accumulation and Transfer Scheme (CATS) points

Coursework is an integral part of all online courses and everyone enrolled will be expected to do coursework. All those enrolled on an online course are registered for credit and will be awarded CATS points for completing work at the required standard.

See more information on CATS points

Digital credentials

All students who pass their final assignment will be eligible for a digital Certificate of Completion. Upon successful completion, you will receive a link to download a University of Oxford digital certificate. Information on how to access this digital certificate will be emailed to you after the end of the course. The certificate will show your name, the course title and the dates of the course you attended. You will be able to download your certificate or share it on social media if you choose to do so. 

Please note that assignments are not graded but are marked either pass or fail. 

Fees

Description Costs
Course Fee £360.00

Funding

If you are in receipt of a UK state benefit, are a full-time student in the UK, or are a student on a low income, you may be eligible for a 50% reduction in tuition fees. Please see the link below for full details:

Concessionary fees for short courses

Tutor

Dr Nick Day

Dr Nicholas (Nick) Day is a Departmental Lecturer in Lifelong Learning for Data Science and Computing at OUDCE. He has taught at the department since 2016 on a range of programming, software engineering, artificial intelligence and data science courses. He completed his PhD in Computer Science Education (CSEd) in 2020 and now applies his pedagogical research to the development of courses and contributes to the department’s AI Steering Group.

Nick is also a Senior Lecturer and Programme Leader for Buckinghamshire New University’s (BNU) undergraduate Computing course. He has been a Fellow of the Higher Education Academy (FHEA) since 2015 and is now preparing an application for Senior Fellowship (SFHEA). Nick is a Member of the British Computer Society (MBCS) and an Advance HE certified External Examiner, presently reviewing Cardiff University’s postgraduate Computing degrees.

Course aims

  • To learn the basic aspects of Python programming for data science.
  • To gain an appreciation for the end-to-end process of obtaining data, processing it, through to presenting results.
  • To be able to build a simple data processing pipeline by the end of the course.

Teaching methods

Learning takes place on a weekly schedule. At the start of each weekly unit, students are provided with learning materials on our online platform, including one hour of pre-recorded video, often supplemented by guided readings and educational resources. These materials prepare students for a one-hour live webinar with an expert tutor at the end of each weekly unit, which they attend in small groups. Webinars are held on Microsoft Teams and provide the opportunity for students to respond to discussion prompts and ask questions. The blend of weekly learning materials that can be worked through flexibly, together with a live meeting with a tutor and peers, maximises learning and engagement through interaction in a friendly, supportive environment.

Learning outcomes

At the end of the course, the student will be able to write procedural code using the Python language and tools to:

  • import data from local and/or remote sources and preprocess it;
  • extract significant information from the gathered data; and
  • visualise the relevant features extracted from the data.

After attending this course, students will know

  • how to perform fundamental Python operations such as variable creation, numerical operations on scalars, vectors and matrices, iteration through a collection, and manipulation of elements in a collection;
  • how to use NumPy and pandas to import a dataset and extract important statistics from it using techniques such as split-apply-combine (for example, finding the mean, median or max of a quantitative variable for each category in a categorical variable);
  • given a dataset, how to select the appropriate type of graph for the information to be conveyed, and how to use the matplotlib library to draw it and add a title, captions and legends;
  • how to create a class in Python and add state and behaviour to it;
  • how to use nltk to preprocess a text and convert it to a numerical representation that can be manipulated by information retrieval algorithms (e.g. for sentiment analysis, semantic search or machine learning).
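
As an illustration of the split-apply-combine idea mentioned above, here is a minimal sketch using only the Python standard library. The records, category names and the helper `split_apply_combine` are hypothetical and chosen purely for illustration. In pandas the same pattern is usually written with `DataFrame.groupby(...)` followed by an aggregation such as `.mean()`.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (category, quantitative value), e.g. fruit and price in pence.
records = [
    ("apple", 120),
    ("pear", 80),
    ("apple", 100),
    ("pear", 110),
    ("apple", 140),
]


def split_apply_combine(rows, func):
    """Group values by category, apply func to each group, collect the results."""
    groups = defaultdict(list)
    for category, value in rows:  # split: one list of values per category
        groups[category].append(value)
    # apply func to each group and combine the results into one dictionary
    return {category: func(values) for category, values in groups.items()}


mean_by_category = split_apply_combine(records, mean)
median_by_category = split_apply_combine(records, median)
max_by_category = split_apply_combine(records, max)

print(mean_by_category)
print(median_by_category)
print(max_by_category)
```

Passing the aggregation as a function keeps the grouping step separate from the statistic being computed, which is the same separation that a pandas `groupby` provides.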

Assessment methods

Students will be asked to submit a portfolio of exercises for their coursework assignment. I will set the first exercise midway through the course for early submission; the second is to be completed at the end of the course.

You will be set independent formative and summative work for this course. Formative work will be submitted for informal assessment and feedback from your tutor, but has no impact on your final grade. The summative work will be formally assessed as pass or fail.

Application

Please use the 'Book' or 'Apply' button on this page. Alternatively, please complete an enrolment form for short courses.

Level and demands

Experience in using a programming or scripting language is beneficial. The basic elements of programming using the Python programming language will be introduced throughout the course. However, each student should consider that this course requires a certain amount of homework (2–3 hours per week) to become familiar with the concepts explained during the class. This is especially true for students who are not familiar with programming. This is a course on data science, so I will discuss some mathematical concepts, even though I will try to keep these to a minimum. Expect some exposure to (1) linear algebra (e.g. matrix operations), (2) statistics, and (3) calculus.

The Department's short online courses are taught at FHEQ Level 4, i.e. first-year undergraduate level. FHEQ Level 4 courses require approximately 10 hours of study per week, for a total of about 100 study hours.

English Language Requirements

We do not insist that applicants hold an English language certification, but warn that they may be at a disadvantage if their language skills are not of a comparable level to those qualifications listed on our website. If you are confident in your proficiency, please feel free to enrol. For more information regarding English language requirements please follow this link: https://www.conted.ox.ac.uk/about/english-language-requirements

Selection criteria

Before attending this course, prospective students should know:

  • The fundamentals of linear algebra: what a matrix is and how matrix addition and multiplication are performed.
  • The following fundamental concepts of statistics: mean, median, variance, standard deviation and interquartile range.
  • The fundamentals of algebra: real and complex numbers, exponential and logarithmic functions, and trigonometric functions.
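
For prospective students who want to check their familiarity with the statistical concepts listed above, the following is a small sketch using Python's built-in `statistics` module. The data values are made up for illustration. Two assumptions are worth stating: the variance and standard deviation shown are the population versions (`pvariance`, `pstdev`), and quartiles are computed with the module's default method, so other tools may report slightly different quartile values.

```python
import statistics

# Hypothetical sample of eight measurements (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)      # arithmetic mean
median_value = statistics.median(data)  # middle value of the sorted data
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                           # interquartile range

print("mean:", mean_value)
print("median:", median_value)
print("variance:", variance)
print("standard deviation:", std_dev)
print("interquartile range:", iqr)
```

The sample versions of these statistics, which divide by n - 1 rather than n, are available as `statistics.variance` and `statistics.stdev`.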