Data Scientist's Toolbox

2017-07-28  猪努力

[W1-01] Specialization Motivation

About this course

The key word in data science is "science", not "data"

1 [W1-01] Specialization Motivation

1.1 Why do data science

Credit belongs to the person who is actually trying to ==get things done==, even when there are obstacles in the way.

It's important to strive valiantly to do these sorts of things, even if you're going to take some criticism.

1.2 The key challenge of Data Science

The heart of the data science philosophy is ==answering questions with data==. The question should come first, and the data should follow.

Answer the question you are interested in, using the data you have.

1.3 Why data science

1.4 Why statistical data science

1.5 Why now

1.6 Why R

1.7 Who is a data scientist

1.8 Goal

Data science Venn diagram

2 [W1-02] The Toolbox

2.1 What data scientists do

2.2 Main workhorse

3 [W1-03] Getting Help and Finding Answers

3.1 Asking questions

3.1.1 How to ask an R question

3.1.2 How to ask a data analysis question

3.2 Find the answer for yourself

3.3 Getting help with R (see Evernote)

3.4 Key characteristics of hackers

3.5 How to search

4 Types of Data Science Questions

4.1 Descriptive analysis

Goal: Describe a set of data
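As a minimal sketch of what "describing a set of data" can look like in practice, the snippet below summarizes one variable with a few standard statistics. The `heights_cm` values are invented for illustration and do not come from the course.

```python
import statistics

# Hypothetical measurements (heights in cm); invented for illustration only.
heights_cm = [158.0, 162.5, 171.0, 171.0, 180.5, 169.0]

# A descriptive analysis only summarizes the data at hand;
# it makes no claim about anything beyond this set.
summary = {
    "n": len(heights_cm),
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "stdev": statistics.stdev(heights_cm),  # sample standard deviation
    "min": min(heights_cm),
    "max": max(heights_cm),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```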

4.2 Exploratory analysis

Goal: Find new relationships but not necessarily confirm them

4.3 Inferential Analysis

Goal: Generalize from a small sample of data to a larger population
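One common way to make this concrete is a confidence interval for a population mean computed from a small sample. The sketch below assumes a simulated population and a large-sample normal approximation; the population parameters, sample size, and seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated values (illustrative only).
population = [random.gauss(170.0, 10.0) for _ in range(100_000)]

# Only a small random sample is actually observed.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval using the normal quantile (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"sample mean: {sample_mean:.2f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is a statement about the population, not just the 200 observed values, which is what separates inference from description.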

4.4 Predictive Analysis

Goal: Use the data on some objects to predict values for another object
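A small sketch of the idea, assuming a simple least-squares line as the predictive model (the notes do not prescribe any particular model). The `fit_line` helper and the observed values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical objects with a known measurement (x) and known outcome (y).
observed_x = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(observed_x, observed_y)

# Predict the outcome for a new object whose outcome was never observed.
new_x = 6.0
predicted_y = slope * new_x + intercept
print(f"predicted value for x = {new_x}: {predicted_y:.2f}")
```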

4.5 Causal Analysis

Goal: Find out what happens to one variable when you change another variable

4.6 Mechanistic Analysis

Goal: Understand the exact changes in variables that lead to changes in other variables for individual objects

5 What is Data

5.1 Definition of Data

Data are values of qualitative or quantitative variables, belonging to a set of items

* set of items: sometimes called the population; the set of objects you are interested in; the set of things you make measurements on
* variables: a measurement or characteristic of an item
* qualitative: not necessarily ordered and not necessarily measured on a scale
* quantitative: usually measured on a continuous scale, with an ordering on that scale
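The definition above can be sketched in code. In this illustrative example, each `Person` is one item in the set, and each field is a variable. The class name, field names, and values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Person:
    country: str      # qualitative: no ordering, no scale
    treatment: str    # qualitative
    height_cm: float  # quantitative: continuous scale with an ordering
    weight_kg: float  # quantitative


# The set of items (hypothetical values for illustration).
people = [
    Person("Japan", "placebo", 165.0, 58.5),
    Person("Brazil", "drug", 172.5, 70.0),
    Person("Japan", "drug", 158.0, 52.0),
    Person("Kenya", "placebo", 180.5, 77.5),
]

# Qualitative variables are typically summarized by counting categories.
country_counts = Counter(p.country for p in people)

# Quantitative variables support arithmetic such as averaging.
mean_height = mean(p.height_cm for p in people)

print(country_counts)
print(mean_height)
```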

5.2 Data is the Second Most Important Thing in Data Science

6 What about Big Data

7 Experimental Design

7.1 Why should we care

An exciting result can lead you astray if you are not very careful about experimental design and analysis

Things to be aware of when designing an experiment or carrying out a data science project:

7.2 Formulate your question in advance

7.3 Statistical inference

7.3.1


7.3.2 Confounding and spurious correlation

7.3.3 Deal with potential confounders: Randomization and Blocking
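A minimal sketch of how randomization and blocking might be combined, assuming subjects are first grouped (blocked) by a suspected confounder and then randomly split into treatment and control within each block. The `blocked_assignment` helper, the `age_group` field, and the subject records are hypothetical.

```python
import random
from collections import defaultdict


def blocked_assignment(subjects, block_key, rng):
    """Randomly assign treatment/control within each block of subjects."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_key(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomization happens inside the block
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject["id"]] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject["id"]] = "control"
    return assignment


# Hypothetical subjects; age group is the suspected confounder.
subjects = [
    {"id": i, "age_group": "young" if i < 4 else "old"} for i in range(8)
]

rng = random.Random(7)  # seeded for reproducibility
assignment = blocked_assignment(subjects, lambda s: s["age_group"], rng)

for subject in subjects:
    print(subject["id"], subject["age_group"], assignment[subject["id"]])
```

Because each age group contributes equally to both arms, age cannot explain a difference between treatment and control in this design.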

7.4 Prediction

7.4.1


7.4.2 Prediction versus inference

7.4.3 Prediction key quantities


7.5 Data dredging

Data dredging (also data fishing, data snooping, and p-hacking) is the use of data mining to uncover patterns in data that can be presented as statistically significant, without first devising a specific hypothesis as to the underlying causality.
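A small simulation can illustrate why this is a problem. The sketch below runs many comparisons on pure noise, where no real effect exists, and counts how many cross a conventional 5% significance threshold anyway. The number of tests, group sizes, seed, and the large-sample z approximation are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible


def z_statistic(a, b):
    """Two-sample z statistic for a difference in means (large-sample approximation)."""
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / se


n_tests = 200
critical = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% threshold

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the same distribution, so there is no real effect.
    group_a = [random.gauss(0.0, 1.0) for _ in range(50)]
    group_b = [random.gauss(0.0, 1.0) for _ in range(50)]
    if abs(z_statistic(group_a, group_b)) > critical:
        false_positives += 1

print(f"{false_positives} of {n_tests} comparisons look 'significant' by chance alone")
```

With a 5% threshold, roughly one in twenty such comparisons is expected to look significant by chance, so reporting only the "hits" from an unplanned search is misleading.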
