DelftX: Data Creation and Collection for Artificial Intelligence via Crowdsourcing

A one-stop shop for getting started with the key considerations about data for AI! Learn how crowdsourcing offers a viable means of leveraging human intelligence at scale for data creation, enrichment and interpretation, with great potential to improve both the performance and the trustworthiness of AI systems and to increase the adoption of AI in general.

6 weeks

4–5 hours per week

Self-paced

Progress at your own speed

This course is archived

About this course

Advances in Artificial Intelligence and Machine Learning have led to technological revolutions. Yet AI systems at the forefront of such innovations have been at the center of growing concerns. These include reports of system failure when conditions differ only slightly from those seen during training, as well as ethical and societal issues that arise from the use of such systems.

Machine learning models have been criticized for lacking robustness, fairness and transparency. Such model-related problems can, to a large extent, be attributed to issues with data. In order to learn comprehensive, fine-grained and unbiased patterns, models have to be trained on a large number of high-quality data instances whose distribution accurately represents real application scenarios. Creating such data is not only a long, laborious and expensive process, but sometimes even impossible when the data is extremely imbalanced or its distribution constantly evolves over time.

This course introduces crowdsourcing, an important method for gathering data to train machine learning models and build AI systems. Crowdsourcing offers a viable means of leveraging human intelligence at scale for data creation, enrichment and interpretation, with great potential to improve the performance of AI systems and increase the wider adoption of AI in general.

By the end of this course you will be able to understand and apply crowdsourcing methods to elicit human input as a means of gathering high-quality data for machine learning. You will be able to identify biases that enter datasets through the way they are gathered or created, and to choose among task design options that can optimize data quality. These skills are essential for careers in Data Science, Machine Learning, and the broader field of Artificial Intelligence.

At a glance

Institution: DelftX
Subject: Data Analysis & Statistics
Level: Intermediate
Prerequisites:
Some prior experience with a programming language (e.g. Python, Java) is recommended but not required.

Language: English
Video Transcripts: اَلْعَرَبِيَّةُ, Deutsch, English, Español, Français, हिन्दी, Bahasa Indonesia, Português, Kiswahili, తెలుగు, Türkçe
Associated programs:
- Professional Certificate in Data Skills for Artificial Intelligence
Associated skills: Data Science, Data Quality, Machine Learning, Artificial Intelligence

What you'll learn

At the end of this course you will be able to:

Examine the use of crowdsourcing for gathering data
Explain how cognitive biases and other human factors influence data quality
Describe the use of active learning in the creation of crowdsourced training data
Demonstrate the design of crowdsourcing tasks with quality control mechanisms
Discuss the evaluation of ML models with humans in the loop

Syllabus

Week 1: Crowdsourcing for High-quality Data Collection and The ImageNet Story

Artificial Intelligence is at the center of many recent advancements across areas such as transportation and finance. One of the reasons for this is that in the past decade we have designed methods to harness human intelligence at scale.

We will introduce and discuss the crowdsourcing paradigm and the importance of high-quality data.