
Living Life / Statistics (6)

Extrapolation and Effects of Outliers on Regression Line Extrapolation (외삽법): making predictions outside the range of the observed data by using a statistical model (usually a regression line). Extrapolation should be avoided where possible. Effect of outliers in regression: different types of outliers have different effects on the regression line. - If the outlier's x value is outside the cluster's x value range, it is called an outlier in the x direction. - If the .. 2022. 8. 2.
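As a rough illustration of the two ideas in this excerpt, the sketch below fits a least-squares line to a small made-up data set, refits it after adding one hypothetical outlier in the x direction, and flags a prediction as an extrapolation. The data values and the helper name `fit_line` are assumptions made for the example, not taken from the original post.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up cluster lying exactly on y = 2x, with x between 1 and 5.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
b0_clean, b1_clean = fit_line(xs, ys)

# Same data plus one hypothetical outlier in the x direction (x = 20).
b0_out, b1_out = fit_line(xs + [20], ys + [5])

print(f"slope without outlier: {b1_clean:.3f}")
print(f"slope with outlier:    {b1_out:.3f}")

# Extrapolation: x = 100 is far outside the observed range 1..5,
# so this prediction rests entirely on the line continuing to hold.
x_new = 100
is_extrapolation = not (min(xs) <= x_new <= max(xs))
prediction = b0_clean + b1_clean * x_new
print(f"prediction at x={x_new}: {prediction:.1f} (extrapolation: {is_extrapolation})")
```

In this toy data set a single point far out in x pulls the slope from 2 to nearly 0, which is why such points deserve a closer look before the line is used for prediction.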
Residuals Residual (잔차): shows how far the predicted value is from the actual value, that is, the error in the prediction (residual = actual value - predicted value). If a residual is positive (positive residual), the predicted value is smaller than the actual value. If a residual is negative (negative residual), the predicted value is larger than the actual value. 2022. 8. 2.
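A minimal sketch of the sign convention described above, using made-up actual and predicted values (the numbers are assumptions for illustration only):

```python
# Hypothetical actual values and the values some model predicted for them.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.5, 7.0]

# residual = actual - predicted
residuals = [a - p for a, p in zip(actual, predicted)]

for a, p, r in zip(actual, predicted, residuals):
    if r > 0:
        note = "positive: prediction is smaller than the actual value"
    elif r < 0:
        note = "negative: prediction is larger than the actual value"
    else:
        note = "zero: prediction matches the actual value"
    print(f"actual={a}, predicted={p}, residual={r} ({note})")
```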
Regression and R-squared Regression (회귀) line: a line drawn to represent a pattern in the data, y hat = b0 + b1 * x. Its slope gives the predicted change in y when x increases by one unit. b0: intercept. b1: slope. y hat: predicted value of y. R-squared: literally the square of r (r: correlation coefficient). Takes a value between 0 and 1. Meaning: a measure of how closely the data points fit the regression line. Tells us how well the regression .. 2022. 8. 2.
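The following sketch, on a small made-up data set, computes b0 and b1 by least squares and then checks that R-squared computed from the residuals matches the square of r. This equality holds for simple linear regression with an intercept; the data values are assumptions for the example.

```python
import math

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b1 = sxy / sxx            # slope: predicted change in y per unit of x
b0 = mean_y - b1 * mean_x  # intercept
y_hat = [b0 + b1 * x for x in xs]

# R-squared as the share of variation in y explained by the line.
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
r_squared = 1 - sse / syy

# The same quantity as the square of the correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)

print(f"b0={b0:.2f}, b1={b1:.2f}, R^2={r_squared:.2f}, r^2={r * r:.2f}")
```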
Correlation Correlation (상관): the linear relationship between two quantitative variables. The correlation coefficient r shows the direction and strength of the relationship. - Direction: a positive r means that as one variable increases, the other tends to increase too. A negative r means that as one variable increases, the other tends to decrease. - Strength: If the data points all lie on one line, r is equal to .. 2022. 8. 2.
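As a small sketch of how the sign of r tracks direction, the hypothetical helper `pearson_r` below computes the Pearson correlation coefficient from its definition and applies it to made-up data with an increasing, a decreasing, and a noisier relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2, 4, 6, 8])     # y rises with x along a line
r_down = pearson_r(xs, [8, 6, 4, 2])   # y falls as x rises along a line
r_noisy = pearson_r(xs, [2, 5, 4, 7])  # rising overall, but not on a line

print(f"increasing: {r_up:.3f}, decreasing: {r_down:.3f}, noisy: {r_noisy:.3f}")
```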
Symmetry and Skewness Symmetry (대칭): the shape of a distribution is considered symmetrical if it can be divided into two parts that look like mirror images of each other. Skewness (비대칭도, 왜도): the shape of a distribution is considered skewed if it is asymmetrical. Skewed to the left means that most of the values in the data are clustered on the larger side, with a tail stretching toward smaller values. In this case, the mean is typically less than the median. Skewed to the right means that most of the values in .. 2022. 8. 2.
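A quick way to see the mean and median relationship is to compare them on small made-up samples. The values below are assumptions chosen so that one sample has a tail toward small values and the other is its mirror image with a tail toward large values.

```python
import statistics

# Most values sit on the larger side, with one small value forming a left tail.
left_skewed = [1, 7, 8, 8, 9, 9, 10]
# Mirror image: most values on the smaller side, with a right tail.
right_skewed = [1, 2, 2, 3, 3, 4, 10]

left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)
right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)

print(f"left-skewed:  mean={left_mean:.2f}, median={left_median}")
print(f"right-skewed: mean={right_mean:.2f}, median={right_median}")
```

The mean is pulled toward the tail in both samples, while the median stays near the bulk of the data.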
Five number summary It is, literally, a summary of the data using five numbers: Minimum, 1st quartile (Q1), Median, 3rd quartile (Q3), and Maximum. Box Plot A box plot is drawn using these five numbers. Outliers in Box Plot In a box plot, outliers are defined as values < Q1 - 1.5 * IQR or values > Q3 + 1.5 * IQR, where IQR = Q3 - Q1. 2022. 8. 2.
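The summary and the 1.5 * IQR rule can be sketched with Python's standard `statistics` module. The data are made up, and the choice of `method="inclusive"` is an assumption: textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations.

```python
import statistics

# Made-up data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print("five number summary:", summary)
print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("outliers:", outliers)
```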