Big Data for Development
Introduction
As part of the capacity building pillar of the Big Data for Development project, AIMS-NEI designed a Big Data for Development (BD4D-SCP) training program that is taught across the AIMS-NEI network: first in Rwanda, now in Senegal, and soon in Cameroon.
The course is aimed at people who are passionate about data science in general, and about the analysis and processing of big data in particular. Applicants should have at least four years of undergraduate study, or at least two to three years of professional experience in statistics or another field related to data science.
A number of short-term trainings are underway to achieve the BD4D project goals of increasing the number of users of scientific data in Africa and providing a platform for practitioners to interact.
Also as part of capacity development, AIMS-NEI will organize the first training workshop for senior executives, titled: Harnessing the Power of Big Data (LPBD). The aim of this workshop is to introduce executives to the era of Big Data, demonstrating how this phenomenon disrupts traditional businesses and opens the door to new products and services.
Course Overview
Datasets are getting bigger and bigger as the world’s population grows and things get more and more connected. Traditional data processing software and techniques cannot handle these large scale datasets. This course teaches the essentials of processing large-scale datasets using Python.
The course also teaches how to perform common computing tasks, such as managing data and building machine learning models, with Python. It emphasizes practice-related learning and therefore includes many exercises, giving participants enough time to practice.
Approach
This course takes a hands-on approach to equip participants with the most essential tools in a timely manner. Classes start with the fundamentals of Python and focus primarily on data structures, then move quickly to the major Python libraries for data science.
Next, the course moves on to big-data processing, first presenting brief theoretical concepts on the subject and then teaching Apache Spark, an advanced tool for processing large data sets. Afterwards, it offers introductory machine learning lectures before moving on to a detailed explanation of how to implement these algorithms in Python.
Course Objectives
- Understand the advanced concepts of the Python language: data structures, functions, classes, etc.
- Perform computational tasks on data using Python: data ingestion, processing, visualization, web scraping, etc.
- Process a large-scale (20GB+) data set on a personal computer using Apache Spark, and use cloud computing platforms.
- Familiarize yourself with the theoretical bases of common machine learning algorithms.
- Be able to build and evaluate machine learning models using the ‘scikit-learn’ library.
Course Schedule
Day 1: Advanced Concepts in Python: On this first day, the course will focus on the Python programming language to build a solid foundation for the rest of the course materials. Participants will be introduced to practical techniques from intermediate to advanced level, such as writing functions and classes, error handling, packaging of Python code, and more.
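As an illustration of the kind of Day 1 material described above (functions, classes, and error handling), here is a minimal sketch. It is not taken from the course materials; the names `Dataset`, `EmptyDatasetError`, and `safe_mean`, as well as the sample values, are hypothetical.

```python
class EmptyDatasetError(ValueError):
    """Raised when a statistic is requested for an empty dataset."""


class Dataset:
    """A small named container for numeric observations (hypothetical)."""

    def __init__(self, name, values=None):
        self.name = name
        self.values = list(values or [])

    def add(self, value):
        # Convert early so that bad input fails at insertion time.
        self.values.append(float(value))

    def mean(self):
        if not self.values:
            raise EmptyDatasetError(f"dataset {self.name!r} has no values")
        return sum(self.values) / len(self.values)


def safe_mean(dataset, default=None):
    """Return the dataset mean, or `default` if the dataset is empty."""
    try:
        return dataset.mean()
    except EmptyDatasetError:
        return default


# Made-up sample values, for illustration only.
rainfall = Dataset("rainfall_mm", [12.5, 0.0, 7.5])
print(safe_mean(rainfall))
print(safe_mean(Dataset("empty"), default=0.0))
```

Defining a dedicated exception class lets the caller catch only the "empty dataset" case, while other errors (for example, a non-numeric value passed to `add`) still surface normally.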
Day 2: Python for Data Science: Day 2 focuses on performing common data science tasks using Python. We will explain how to ingest, process, analyze, and visualize data, perform web scraping, and more, while introducing the essential packages (pandas, GeoPandas, NumPy, Matplotlib, etc.) for these tasks.
Day 3: Big Data Handling: On the third day, the course covers handling large data sets using Python.
The following topics will be covered: an introduction to Big Data, multiprocessing in Python, Apache Spark, use of common cloud platforms, etc.
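To give a flavor of the multiprocessing topic listed for Day 3, the following is a minimal sketch using Python's standard `multiprocessing.Pool`. It is an assumption about how such an exercise might look, not course material; the function names and the sum-of-squares workload are hypothetical stand-ins for real per-chunk processing.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """CPU-bound work applied to one chunk of the data."""
    return sum(x * x for x in chunk)


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    """Process the chunks in separate worker processes, then combine."""
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (for example Windows and macOS).
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

The same split, map, and combine pattern is what distributed tools such as Apache Spark apply across many machines rather than across the cores of a single one.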
Day 4: Machine Learning (ML) in Python: On the fourth day, the course will begin with an introductory lecture on machine learning. The remainder of the day will be spent completing various ML tasks (e.g., data preparation, model building, evaluation, and interpretation) using the scikit-learn package in Python.
Day 5: Putting it all together: On the last day, we will apply the skills learned in this course to real-world data science problems by examining case studies.
Potential case studies include: how to process nighttime satellite images (geospatial data), how to process large call records from cellphones (mobile data), and how to create ML models to impute missing sensor data (sensor data).
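For the call-records case study, one common idea is to stream the file one row at a time instead of loading it all into memory. The sketch below shows that idea with the standard library only. The record layout (`caller_id`, `timestamp`, `duration_s`) and the sample rows are invented for illustration and are not the data used in the course.

```python
import csv
import io
from collections import Counter


def calls_per_user(lines):
    """Count records per caller while reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["caller_id"]] += 1
    return counts


# A tiny in-memory sample standing in for a multi-gigabyte file.
# The column names and values are hypothetical.
sample = io.StringIO(
    "caller_id,timestamp,duration_s\n"
    "A,2019-07-01T08:00:00,35\n"
    "B,2019-07-01T08:01:10,120\n"
    "A,2019-07-01T09:15:42,8\n"
)
counts = calls_per_user(sample)
print(counts.most_common(1))
```

With a real file, the same function can be called as `calls_per_user(open(path, newline=""))`, where `path` is a placeholder for the file location. Memory use then grows with the number of distinct callers rather than with the number of records.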
Prerequisites
Programming: ability to write a simple program in Python (basic Python level).
Maths and Statistics: training in statistics, data science, or quantitative sciences.
This training will take place from July 1 to 5, 2019 in Dakar, Senegal and will be held in English. Participation in the course is limited to 40 people and is free. Lunches and coffee breaks are available on site at the time of registration. AIMS-NEI does not provide any financial assistance to successful applicants for this short training and encourages each successful applicant to make their own provisions to cover all costs associated with their participation in this program, including transportation.
Instructor Profile
Dr. Dunstan Matekenya is a senior scientist with over 10 years of experience in the fields of traditional statistics and modern machine learning methods. He is currently working as a Data Scientist at the headquarters of the World Bank Group (WBG) in Washington DC. Prior to joining the WBG, Dr. Matekenya completed his PhD at the University of Tokyo in 2016. His PhD research focused on the use of machine learning methods to explore information gleaned from mobile phone data. Before reorienting his career towards data science, Dr. Matekenya worked as a statistician at the National Bureau of Statistics in Malawi from 2007 to 2017, where he actively contributed to flagship projects such as the 2008 Census of Population and Housing, as leader of the GIS unit. His passion is to contribute to the modernization of official statistics in developing countries through the use of alternative data sources such as mobile phone data, as well as to the improvement of capacities in the field of data science.
All candidates interested in applying for this short intensive training in data processing with Python should use the online link to complete and submit their application, with all supporting documents, by the deadline indicated on the AIMS Senegal website. We will ask shortlisted applicants to provide additional information to finalize their application. Shortlisted candidates will be assessed to confirm their Python skills. One week after the deadline, we will notify the successful applicants. Application deadline: June 7, 2019 – 11:59 p.m. (UT). Any inquiries regarding this short training should be sent to: aii@nexteinstein.org.