Main 8 Descriptive Statistics
This article, "Main 8 Descriptive Statistics", will help you understand the main descriptive statistics and what each of the eight tells you about your data.


Introduction

Descriptive statistics are important for the analysis and interpretation of data. Their main objective is to summarize the characteristics of a sample, such as its center, spread, and shape, which then serve as the basis for any inferential statistics computed later.

Descriptive statistics are also used for comparing or contrasting groups of samples taken for testing, such as comparing the proportions of income levels among individuals.

Sample sizes and distributions are commonly summarized by proportions (p), the mean (m), the standard deviation (sd), the variance (v), and the range (r), together with the minimum and maximum. The eight main descriptive statistics are:

Mean

Standard deviation

Variance

Skewness

Std. Error of Skewness

Kurtosis

Std. Error of Kurtosis

Range (maximum minus minimum)
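As a quick illustration, the eight statistics listed above can be computed in Python with NumPy and SciPy. This is a minimal sketch on a made-up sample; the standard-error formulas are the usual normal-approximation ones.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up values for illustration)
x = np.array([32.0, 41.5, 38.2, 55.9, 47.3, 60.1, 29.8, 44.6])
n = len(x)

# Normal-approximation standard errors for skewness and kurtosis
se_skew = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2 * se_skew * np.sqrt((n**2 - 1) / ((n - 3) * (n + 5)))

summary = {
    "mean": x.mean(),
    "std_dev": x.std(ddof=1),       # sample standard deviation
    "variance": x.var(ddof=1),      # sample variance
    "skewness": stats.skew(x),
    "se_skewness": se_skew,
    "kurtosis": stats.kurtosis(x),  # excess kurtosis
    "se_kurtosis": se_kurt,
    "range": np.ptp(x),             # max - min
}
print(summary)
```

Note the `ddof=1` arguments, which give the sample (rather than population) standard deviation and variance.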

The range is always expressed relative to the extremes of the data: it is the maximum minus the minimum, so it is always a non-negative value. A large range indicates that the values are spread widely, so the mean alone does not necessarily represent them well. The normal distribution is one of the most common types of distributions. It is usually assumed that the data set contains normally distributed variables, but data can be non-normally distributed; in that case a variable needs to be converted (transformed) in order to make it approximately normal. An example of this conversion is shown later in the article.

There are two major kinds of ranges:

Raw (normal) range: the spread between the smallest and largest observed values, where each value carries its own probability of occurring at a particular point. On its own it is usually not considered very informative, since it does not affect the results of most tests; it is mainly used when we cannot observe all the points.

Normalized range: used to summarize all the values by rescaling them relative to the sample, typically by subtracting the minimum and dividing by the range, so that all values from the same range fall into comparable intervals.

This can be very helpful when estimating or performing linear regression and other statistical modeling. Normalized ranges are useful when you work with multiple ranges on the same dataset: checking your model against them is faster, less error-prone, and allows better comparisons between models than looking up raw ranges. For large datasets, using normalized ranges can save you hours of work.
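As a sketch of the normalization idea, assuming the common min-max form where each value is shifted by the minimum and divided by the range:

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values to the interval [0, 1] using the observed range."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:                      # constant input: avoid division by zero
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

scaled = min_max_normalize([10, 15, 20, 40])
# the smallest value maps to 0.0, the largest to 1.0
```

Once two variables are on the same [0, 1] scale, their spreads can be compared directly even if their original units differ.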

Scatter plots are generally used to visualize relationships: they help us see trends or differences between two or more variables. To understand what kind of relationship we are trying to measure, let's take a look at some scatter plots. Say we are measuring how much people earn and want to know whether there are significant correlations: what kind of relationships do the lines on these plots represent? As soon as the variables are identified with their corresponding axes, scatter plots can be constructed from the correlation matrix by selecting different combinations of correlated input/output values.

A correlation matrix allows us to see the correlations between the various variables and, in this case, our dependent variable: earnings.

Correlation Matrix

Each scatter plot can be read together with its correlation value. For a scatter plot of the independent variable x against the outcome, a fitted line drawn on the graph has a specific direction, and for a single x that direction can only go one of two ways: up (a positive relationship) or down (a negative one).

For example, suppose our independent variable x is one of three predictors. A two-dimensional scatter plot of x against the outcome can only show the up-or-down direction for x itself; any further direction has to come from another variable.

The variables plotted on a scatter plot can therefore have different meanings. Adding another variable gives the graph a new direction, and multiplying one variable by another (an interaction term) leads to yet another direction. Since these directions can be reversed depending on how the variables are coded, it is better to identify them with the help of coefficients.

Scatterplot with coefficients

What are the parameters in a scatter plot with coefficients? Coefficients represent the weights of each factor in our hypothesis; in simple terms, they are the weights. Our hypotheses are usually made up of many factors, which is why this is called a multi-factor test, and each output statistic is known as a coefficient.

For example, the number of students who are going to graduate from college is usually represented by a graduation coefficient, measured as the number of graduating students per 100. Or, in the case of a medical exam, we usually measure the level of performance, while our evaluation of students is done through marks. Another type of variable is the time since a patient was diagnosed with a certain disease; in such cases we measure several factors, such as age, profession of the respondent, etc.

So, we have learned quite a bit about statistics by now; what remains is learning how to use the data and interpret it in further analyses. One more thing: statistical significance is concerned with the effects of changes in samples and variables on the outputs (coefficients). Is it possible to detect a statistically significant effect by comparing the control group's outcome variable against the outcome under the independent variable? Here I would also like to tell you about the significance test. There are some steps you should follow in order to run a reliable and valid significance test.

Defining a null hypothesis

H0: β = 0

In general, we cannot observe the true value of a parameter β. To test H0 in an experiment, one must measure the effect of a change in the parameter: to show a statistically significant effect of an increase in a parameter, one has to report the effect size, the p-value, and the significance level. We first define the null hypothesis, H0: β = 0, which can be rejected by running a test such as ANOVA. The null hypothesis is paired with an alternative, H1: β ≠ 0, meaning the value of the parameter can go up or down (a two-sided test). But what are the types of tests for that purpose? I would like to discuss them with the help of a few examples. First, let's give an example of a linear regression. Here we have two parameters (x and y) and our target variable (z).
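For instance, H0: β = 0 against H1: β ≠ 0 can be tested for a regression slope with `scipy.stats.linregress`, which reports a two-sided p-value. The data below are simulated purely for illustration, with a true slope of 3:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)            # hypothetical predictor
y = 3.0 * x + rng.normal(0, 2, 100)    # true slope is 3, so H0 should be rejected

result = stats.linregress(x, y)
print(f"slope={result.slope:.3f}, p-value={result.pvalue:.2e}")

alpha = 0.05                           # significance level
if result.pvalue < alpha:
    print("Reject H0: the slope is statistically significant.")
```

The p-value is the probability of seeing a slope this far from zero if H0 were true; comparing it against the chosen significance level alpha gives the accept/reject decision.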

Example of Linear Regression:

In this example we are dealing with a linear regression case. What we want is the best estimate of the parameters of our model. In our example, we want to know how well our model can predict a person's income from previous years' income, with the help of our independent variables (x and y).

Our goal is to find the optimal parameters, those that maximize the revenue while minimizing the costs. The problem is formulated as a straight-line regression, and we solve it with the least squares method: we choose our model parameters and find the values of our beta coefficients. By doing so we calculate the slope of our line, with which the model can make predictions in the future. Our graph shows data for both the independent variable (x) and our target variable.

Now it’s time to calculate our estimates of the factors that affect our model’s prediction. For a simple regression y = β0 + β1·x, the least squares estimates are:

β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

β0 = ȳ − β1·x̄

To sum up: β1 is the slope of the line and β0 is the intercept. The parameters of our model are calculated with the help of the formulas above.
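The two least squares formulas translate directly into code; a minimal sketch:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form simple linear regression: y ≈ b0 + b1 * x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    # Slope: covariance of x and y over variance of x
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    # Intercept: the line passes through the point of means
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on y = 1 + 2x recover b0 = 1, b1 = 2
b0, b1 = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the formulas are closed-form, no iterative optimization is needed for the single-predictor case.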

Now you can compute the goodness of fit of the model with the help of the R-squared statistic, the F-statistic, the standard deviation of the residuals, the root mean squared error (RMSE), etc.
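For example, R-squared and RMSE can be computed directly from the residuals; the numbers below are made up for illustration:

```python
import numpy as np

def goodness_of_fit(y_true, y_pred):
    """Return (R-squared, RMSE) for a fitted model's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(resid ** 2))
    return r2, rmse

r2, rmse = goodness_of_fit([1, 3, 5, 7], [1.1, 2.9, 5.2, 6.8])
```

R-squared is the share of variance in the target explained by the model, while RMSE reports the typical prediction error in the target's own units.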

For further discussion on the importance of good statistical analysis make sure to visit my blog. Hope you enjoyed reading!