A common misconception about Big Data is that it is a black box: you load data and magically gain insight. This is not the case. As this New York Times article “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” describes, loading a big data platform with quality data with enough structure to deliver value is a lot of work. Data scientist spend a comparatively large amount of time in the data preparation phase of a project. Whether you call it data wrangling, data munging, or data janitor work, the Times article estimates 50%-80% of a data scientists’ time is spent on data preparation. We agree.
Big data requires lots of prep!