βMicrosoft on Team DS Lifecycle - "The Team Data Science Process (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning by suggesting how team roles work best together. TDSP includes best practices and structures from Microsoft and other industry leaders to help toward successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program.
This article provides an overview of TDSP and its main components. We provide a generic description of the process here that can be implemented with different kinds of tools. A more detailed description of the project tasks and roles involved in the lifecycle of the process is provided in additional linked topics. Guidance on how to implement the TDSP using a specific set of Microsoft tools and infrastructure that we use to implement the TDSP in our teams is also provided."
"When I used to do consulting, Iβd always seek to understand an organizationβs context for developing data projects, based on these considerations:
Strategy: What is the organization trying to do (objective) and what can it change to do it better (levers)?
Data: Is the organization capturing necessary data and making it available?
Analytics: What kinds of insights would be useful to the organization?
Implementation: What organizational capabilities does it have?
Maintenance: What systems are in place to track changes in the operational environment?
Constraints: What constraints need to be considered in each of the above areas?"
(really good) Practical advice for analysis of large, complex data sets - distributions, outliers, examples, slices, metric significance, consistency over time, validation, description, evaluation, robustness in measurement, reproducibility, etc.