AnalyzeMyData: Z-Score Normalization (StandardScaler)

Step one: Select a .csv format file.

Step two: Download the file after the process is completed.

Z-score normalization (standardization) is a statistical technique that transforms data so that it has a mean of 0 and a standard deviation of 1. The technique is also available in the scikit-learn library under the name StandardScaler. It is achieved by subtracting the mean of the dataset from each data point and then dividing the result by the standard deviation of the dataset. The formula for Z-score normalization can be written as:
\begin{equation} z = \frac{x-\mu}{\sigma} \end{equation}
where \(z\) is the Z-score, \(x\) is the data point, \(\mu\) is the mean of the dataset feature/variable, and \(\sigma\) is the standard deviation of the dataset feature/variable.

Steps for performing Z-score normalization

The first step is to calculate the mean value of the dataset feature/variable:
\begin{equation} \mu = \frac{1}{N}\sum_{i=1}^N x_i \end{equation}
where \(N\) is the number of data points and \(x_i\) represents each data point.

The second step is to calculate the standard deviation \(\sigma\):
\begin{equation} \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (x_i-\mu)^2} \end{equation}
This is the population standard deviation (the sum is divided by \(N\), not \(N-1\)), which is also the form StandardScaler uses.

The third and final step is to standardize all the values of the feature/variable:
\begin{equation} z_i = \frac{x_i-\mu}{\sigma} \end{equation}
where \(z_i\) is the Z-score of the \(i\)-th data sample.
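The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `z_score_normalize` and the sample values are hypothetical, and raising an error when the standard deviation is zero is an assumption made here because the formula would otherwise divide by zero.

```python
import math

def z_score_normalize(values):
    """Standardize a list of numbers using the population standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")

    # Step 1: mean of the feature
    mu = sum(values) / n

    # Step 2: population standard deviation (divide by N, not N - 1)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        # Assumption: treat a constant feature as an error, since z is undefined
        raise ValueError("standard deviation is zero; Z-scores are undefined")

    # Step 3: standardize every value
    return [(x - mu) / sigma for x in values]

# Illustrative data with mean 5 and standard deviation 2
scores = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```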

Example - Z-score normalization

Perform the Z-score normalization for the following array:
\begin{equation} x = [10,20,30,40,50] \end{equation}
The first step is to calculate the mean of the array:
\begin{equation} \mu = \frac{10+20+30+40+50}{5} = \frac{150}{5} = 30. \end{equation}
The second step is to calculate the standard deviation (\(\sigma\)):
\begin{eqnarray} \sigma &=& \sqrt{\frac{(10-30)^2 + (20-30)^2 + (30-30)^2 + (40-30)^2 + (50-30)^2}{5}}\\ \sigma &=& \sqrt{\frac{400+100+0+100+400}{5}}\\ \sigma &=& \sqrt{\frac{1000}{5}} = \sqrt{200} \approx 14.14. \end{eqnarray}
Once the mean and standard deviation of the array have been calculated, the Z-score normalization can be performed:
\begin{eqnarray} z_1 &=& \frac{10-30}{14.14} = -1.41\\ z_2 &=& \frac{20-30}{14.14} = -0.71\\ z_3 &=& \frac{30-30}{14.14} = 0\\ z_4 &=& \frac{40-30}{14.14} = 0.71\\ z_5 &=& \frac{50-30}{14.14} = 1.41 \end{eqnarray}
So the standardized dataset (the Z-scores, rounded to two decimal places) is:
\begin{equation} z = [-1.41, -0.71, 0, 0.71, 1.41] \end{equation}
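The worked example can be checked with Python's standard `statistics` module. This is only a quick verification sketch: `statistics.pstdev` is used because it computes the population standard deviation (division by \(N\)), matching the formula above, and the variable names are chosen here for illustration.

```python
import statistics

x = [10, 20, 30, 40, 50]

mu = statistics.fmean(x)       # mean of the array: 30.0
sigma = statistics.pstdev(x)   # population standard deviation: sqrt(200)

# Standardize each value, then round to two decimals as in the example
z = [(value - mu) / sigma for value in x]
z_rounded = [round(value, 2) for value in z]

print(z_rounded)  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

As a sanity check, the standardized values themselves have a mean of 0 and a population standard deviation of 1, up to floating-point rounding.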

Uses and Importance

Z-score normalization is very useful for machine learning algorithms that assume, or perform better when, the features are centered around 0 with a standard deviation of 1, such as k-nearest neighbors and principal component analysis. Note that standardization only shifts and rescales the data; it does not change the shape of its distribution. Z-scores can also be used to identify outliers in the data, since values far from 0 indicate unusual data points.
Z-score normalization also puts the different features of a dataset on a comparable scale, preventing features with larger ranges from dominating those with smaller ranges.
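As a rough sketch of the outlier use case, the snippet below flags values whose absolute Z-score exceeds a threshold. The function name `flag_outliers`, the sample readings, and the threshold values are all assumptions made for illustration; the right cutoff depends on the data and on the sample size.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [x for x in values if abs((x - mu) / sigma) > threshold]

# Hypothetical readings with one value far from the rest
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]

# A lower threshold is used here because the sample is small
outliers = flag_outliers(readings, threshold=2.0)
print(outliers)  # [50]
```

With the population standard deviation, the largest possible absolute Z-score in a sample of \(N\) points is \(\sqrt{N-1}\), so a cutoff of 3 can never be exceeded when there are 10 points or fewer.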


Monday, 29 July 2024
