Addressing data mismatch

If your training set comes from a different distribution than your dev and test sets, and error analysis shows that you have a data mismatch problem, what can you do?

Artificial data synthesis

Example 1

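Let’s say you have 10,000 hours of audio data and 1 hour of car noise. You could synthesize in-car audio by adding the car noise to the audio data, but if that single hour of noise is reused across all 10,000 hours, the model may overfit to the 1 hour of car noise.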
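The kind of synthesis this example points to, overlaying car noise on existing audio, can be sketched as below. This is a minimal illustration under stated assumptions, not a production pipeline: the function name mix_with_noise, the snr_db parameter, and the representation of audio as plain lists of float samples are all hypothetical choices made for this sketch.

```python
import math
import random


def mix_with_noise(clean, noise, snr_db, rng):
    """Overlay a random segment of `noise` onto `clean` at `snr_db` decibels.

    `clean` and `noise` are sequences of float samples (an assumption of
    this sketch); `rng` is a random.Random instance used to pick the segment.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Repeat the noise recording if it is shorter than the clean clip.
    reps = math.ceil(len(clean) / len(noise))
    noise = list(noise) * reps

    # Pick a random window of noise with the same length as the clean clip.
    start = rng.randrange(len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in segment) / len(segment)
    if noise_power == 0:
        raise ValueError("selected noise segment is silent")

    # Scale the noise so that clean_power / scaled_noise_power hits the target SNR.
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, segment)]


# Usage: a 1-second 440 Hz tone mixed with stand-in "car noise" at 10 dB SNR.
rng = random.Random(0)
clean_clip = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
car_noise = [rng.gauss(0.0, 1.0) for _ in range(48000)]
noisy_clip = mix_with_noise(clean_clip, car_noise, snr_db=10.0, rng=rng)
```

Drawing a different noise segment and SNR for each clip adds some variety, but it cannot add variety that the noise recording itself lacks, so the risk of overfitting to a small amount of noise remains.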

Example 2

You could use computer graphics to synthesize car pictures, but again this can result in the model overfitting to the set of computer-generated images.