Obtaining data

How long should you spend obtaining data?

Data inventory

Source                   Amount     Cost   Time
Owned                    100 hrs    $0     0
Crowdsourced - Reading   1000 hrs   $10K   14 days
Pay for labels           100 hrs    $6K    7 days
Purchase data            1000 hrs   $10K   1 day
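
One way to compare the options in the inventory is cost per hour of data. The sketch below does only that arithmetic on the figures from the table; the names `inventory` and `cost_per_hour` are illustrative, and the comparison ignores data quality and how well each source matches the target domain.

```python
# Rows copied from the data inventory table:
# (source, hours of data, cost in USD, days to obtain)
inventory = [
    ("Owned", 100, 0, 0),
    ("Crowdsourced - Reading", 1000, 10_000, 14),
    ("Pay for labels", 100, 6_000, 7),
    ("Purchase data", 1000, 10_000, 1),
]


def cost_per_hour(hours, cost_usd):
    """Return the cost in USD of one hour of data from a source."""
    return cost_usd / hours


# Map each source to its cost per hour of data.
rates = {source: cost_per_hour(hours, cost) for source, hours, cost, _ in inventory}

for source, rate in rates.items():
    print(f"{source}: ${rate:.0f} per hour")
```

On these figures, paying for labels is the most expensive route per hour of data ($60), while crowdsourcing and purchasing both come to $10 per hour and differ mainly in turnaround time (14 days versus 1 day).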

Labeling data

Data pipelines

[Figure: data pipeline example]

Metadata, data provenance, and lineage

[Figure: data pipeline example]

Balanced train/dev/test splits
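
A minimal sketch of one way to keep splits balanced, assuming "balanced" means that each group in the data (for example accent or speaker category) appears in train, dev, and test in roughly the same proportions. The function `stratified_split`, the 80/10/10 fractions, and the `accent` field are all hypothetical choices for illustration, not taken from the source.

```python
import random
from collections import defaultdict


def stratified_split(items, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split items into train/dev/test, splitting each group separately.

    key(item) returns the group an item belongs to. Every group is
    shuffled and divided using the same fractions, so no group ends up
    concentrated in a single split.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev, test = [], [], []
    for label in sorted(groups):
        members = groups[label]
        rng.shuffle(members)
        n_train = int(len(members) * fractions[0])
        n_dev = int(len(members) * fractions[1])
        train.extend(members[:n_train])
        dev.extend(members[n_train:n_train + n_dev])
        # Whatever remains after train and dev goes to test.
        test.extend(members[n_train + n_dev:])
    return train, dev, test


# Hypothetical example: 20 clips with accent "A" and 10 with accent "B".
clips = [{"id": i, "accent": "A" if i < 20 else "B"} for i in range(30)]
train, dev, test = stratified_split(clips, key=lambda clip: clip["accent"])
print(len(train), len(dev), len(test))
```

For speech data it is often also desirable to keep all recordings from one speaker in the same split; that would mean grouping by speaker before splitting, which this sketch does not do.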