Data Analytics Lifecycle
Value
Time-focused
Easy transition of the project
Repeatable and verifiable
Lifecycle
1. Discovery:
Learn the business domain to determine the general problem type, and learn from relevant past experience
Assess the available technology, raw data, right people, and time scope
Formulate initial hypotheses
2. Data Preparation
Prepare the workspace (the analytic sandbox)
Perform ELT (Extract, Load, Transform)
Understand the data: compare what you need vs. what you have
Clean and normalize the data
Use descriptive statistics and visualization to get an overview of the data quality
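The cleaning, normalizing, and descriptive-statistics steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the readings in raw are made-up values, missing entries are assumed to be marked with None, and min-max scaling is just one possible normalization choice.

```python
import statistics

# Hypothetical raw readings; None marks a missing value (an assumption).
raw = [12.0, 15.5, None, 14.0, 13.5, None, 18.0]

# Clean: drop the missing values.
clean = [x for x in raw if x is not None]

# Normalize: min-max scaling into the range [0, 1].
lo, hi = min(clean), max(clean)
normalized = [(x - lo) / (hi - lo) for x in clean]

# Describe: a small summary that hints at the data quality.
summary = {
    "count": len(clean),
    "missing": len(raw) - len(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": statistics.stdev(clean),
}
print(summary)
```

A high share of missing values or a standard deviation far from what the domain suggests is a signal to revisit the data source before modeling.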
3. Model Planning
Select methods based on data volume and structure, the hypotheses, and the business objectives
Determine workflow of candidate tests
Identify modeling assumptions
Explore the dataset and select significant variables using a dimensionality reduction method
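The notes do not name a specific reduction method, so the sketch below swaps in a simple correlation filter as a stand-in: it keeps the candidate variables whose absolute Pearson correlation with the target passes a threshold. The column names, the data, and the 0.5 threshold are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def select_variables(columns, target, threshold=0.5):
    """Return variable names with |r| >= threshold, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name]))

# Hypothetical candidate variables and target.
target = [1, 2, 3, 4, 5]
columns = {
    "spend": [2, 4, 6, 8, 10],
    "visits": [5, 3, 4, 1, 2],
    "noise": [3, 1, 4, 1, 3],
}
selected = select_variables(columns, target)
print(selected)
```

A filter like this only captures linear, one-variable-at-a-time relationships; a method such as PCA would be a closer match if the goal is to combine correlated variables rather than drop them.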
4. Model Building
Split the available data into training data and test data
Set up the best environment to run the model (fast hardware, parallel processing)
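As a minimal sketch of the training/test split, assuming a simple random holdout is acceptable for the data at hand: the function name, the 20% test fraction, and the seed are illustrative choices, not something the notes prescribe.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of ten records.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed supports the "repeatable" goal of the lifecycle. For time-ordered data, a chronological split would usually be a better fit than a random shuffle.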
5. Communicate Results
Interpret the results to identify key findings
Quantify the business value from the customers' perspective
6. Operationalize
Run a pilot to assess the benefits
Deliver the final result and put it into operation
Define a process to improve the model as needed
Key roles for an Analytics Project
Business User
Project Sponsor
Project Manager
Business Intelligence Analyst: usually comes from the customer's company and brings domain expertise, with a deep understanding of the data and APIs
Data Engineer: has deep technical skills, such as writing SQL queries and extracting data for analysis
DBA: configures the database environment to support analytic needs
Data Scientist: conducts data modeling and valid analysis to meet the overall analytic objectives
Deliverables to meet stakeholders' needs
Presentation for Sponsors:
Big picture takeaways and key messages aiding decision-making
Clean, easy-to-understand visualizations
Presentation for Analysts:
Business process changes
Reporting changes
More technical graphs (ROC curves, density plots, histograms)
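For the analyst-facing graphs, the points of an ROC curve can be computed directly from labels and model scores. The sketch below is a plain-Python illustration with made-up labels and scores; it assumes binary labels (1 = positive, 0 = negative), at least one example of each class, and no tied scores.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of an ROC curve."""
    # Walk through the examples from the highest score to the lowest,
    # treating each score in turn as the decision threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```

The resulting points can be handed to any plotting tool. An area close to 1 indicates that the model ranks positives above negatives almost everywhere, while 0.5 corresponds to random ranking.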