AnalyzeMyData: Normalization to a unit norm

On this web page you can apply the unit vector transformation, a.k.a. normalization to a unit norm, to all dataset variables. To upload the dataset, click the Choose File button. The Open pop-up window will appear; locate the file on your computer and click Open. The web application will then automatically normalize each of the dataset variables.
Important: At the moment the dataset must be uploaded in .csv format only.

Step one: Select a .csv format file.

Step two: Download the file after the process is completed.
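For readers who want to reproduce the processing offline, the following is a minimal sketch of what such a per-variable normalization could look like. It is not the web application's actual code: the function name `normalize_csv_columns`, the assumption that the first row is a header, the assumption that every cell is numeric, and the choice of the L2 norm are all illustrative assumptions.

```python
import csv
import io
import math


def normalize_csv_columns(csv_text):
    """Scale every numeric column of a CSV string to unit L2 norm.

    Hypothetical helper: assumes a header row and purely numeric cells.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns of floats.
    columns = [[float(cell) for cell in col] for col in zip(*data)]
    normalized = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        # An all-zero column has norm 0; leave it unchanged to avoid dividing by zero.
        normalized.append([v / norm if norm > 0 else v for v in col])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # Transpose back to rows before writing.
    writer.writerows(zip(*normalized))
    return out.getvalue()


sample = "a,b\n3,0\n4,2\n"
print(normalize_csv_columns(sample))
```

In the sample, column `a` holds the values 3 and 4 (norm 5) and column `b` holds 0 and 2 (norm 2), so each column of the output has a norm of 1.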

The unit vector transformation, a.k.a. normalization to a unit norm, is a technique used to scale data so that each data sample (or feature vector) has a unit norm. This is useful when the direction of the data samples is more important than their magnitude, such as in text classification, clustering, or any machine learning algorithm that is sensitive to the scale of the features.

Description of the normalization to a unit norm

The goal of the unit vector normalization is to adjust the length of each vector to be 1, without changing its direction. This transformation ensures that the vector lies on the surface of a unit hypersphere centered at the origin.
For a given vector (array) \(\textbf{x}\) in n-dimensional space, the unit vector \(\textbf{u}\) can be computed using the following formula: \begin{equation} \textbf{u} = \frac{\textbf{x}}{||\textbf{x}||} \end{equation} where \(||\textbf{x}||\) is the norm/length of the vector \(\textbf{x}\). The most commonly used norm is the L2 norm, which is defined as: \begin{equation} ||\textbf{x}||_2 = \sqrt{\sum_{i=1}^{n} x_i^2} \end{equation} where \(n\) is the number of components (the dimension) of the vector/array.
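As a short supporting derivation (added for illustration, and assuming \(\textbf{x}\) is not the zero vector), the result always has length 1 because dividing a vector by a positive scalar divides its norm by the same scalar: \begin{equation} ||\textbf{u}||_2 = \left|\left| \frac{\textbf{x}}{||\textbf{x}||_2} \right|\right|_2 = \frac{||\textbf{x}||_2}{||\textbf{x}||_2} = 1. \end{equation}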

Steps to apply unit vector normalization

Two steps are required to apply the unit vector normalization: compute the norm, then divide by the norm. The first step is to calculate the norm of the vector \(\textbf{x}\). For L2 normalization this means calculating \(||\textbf{x}||_2\). In the second step, divide each component of the vector by the computed norm to get the normalized vector.
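The two steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `l2_norm` and `to_unit_vector` are made up for this example, and the explicit check for a zero norm is an added assumption about how to handle a vector that has no direction.

```python
import math


def l2_norm(x):
    """Step 1: compute the L2 (Euclidean) norm of the vector x."""
    return math.sqrt(sum(v * v for v in x))


def to_unit_vector(x):
    """Step 2: divide each component by the norm."""
    norm = l2_norm(x)
    if norm == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a vector with zero norm")
    return [v / norm for v in x]


u = to_unit_vector([3, 4, 0])
print(u)
print(l2_norm(u))
```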

The advantages and disadvantages of unit vector normalization

The advantages of the unit vector normalization are:

Magnitude independence - ensures that the magnitude of the data points does not affect the results of algorithms that are sensitive to scale.
Improved convergence - can help to improve the convergence rates of optimization algorithms by reducing the influence of feature scaling on the model training process.
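Magnitude independence can be illustrated with a small sketch. The two vectors below are arbitrary values chosen for this example; they point in the same direction but differ in length by a factor of 10, and the helper name `unit` is hypothetical.

```python
import math


def unit(x):
    """Return x scaled to unit L2 norm (assumes x is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


short = [1, 2, 2]      # norm 3
long = [10, 20, 20]    # same direction, norm 30

# After normalization both vectors coincide, so only direction is retained.
same_direction = all(
    math.isclose(a, b) for a, b in zip(unit(short), unit(long))
)
print(same_direction)  # True
```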

The disadvantages or limitations of the unit vector normalization are:

Information loss - normalizing to unit norm might not be appropriate if the magnitude of the data is important for the analysis or model.
Sparse data - dividing by the norm keeps zero entries at zero, so sparsity is preserved, but a vector made up entirely of zeros has a norm of 0 and cannot be normalized, so such samples need separate handling.

Example 1: Unit vector normalization

In this example let's transform the 3-dimensional vector \(\textbf{x}\) using unit vector normalization. The 3-dimensional vector is: \begin{equation} \textbf{x} = [3,4,0] \end{equation}

Step 1: Compute the norm

In this step let's calculate the L2 norm (Euclidean norm) of the vector \(\textbf{x}\). The L2 norm can be calculated using the following formula: \begin{equation} ||\textbf{x}||_2 = \sqrt{x_1^2 + x_2^2 + x_3^2} \end{equation} Substituting the values from the 3-dimensional vector into the previous equation we can calculate the L2 norm. \begin{equation} ||\textbf{x}||_2 =\sqrt{3^2 + 4^2 + 0^2} = \sqrt{9+16+0} = \sqrt{25} = 5. \end{equation}

Step 2: Normalize the vector

Divide each component of the vector \(\textbf{x}\) by the computed norm to get the unit vector \(\textbf{u}\). \begin{equation} \textbf{u} = \frac{\textbf{x}}{||\textbf{x}||_2} \end{equation} \begin{equation} u_1 = \frac{3}{5} = 0.6, \end{equation} \begin{equation} u_2 = \frac{4}{5} = 0.8, \end{equation} \begin{equation} u_3 = \frac{0}{5} = 0. \end{equation} The normalized vector \(\textbf{u}\) is equal to: \begin{equation} \textbf{u} = [0.6,0.8,0] \end{equation}

Verification of results

To verify that the vector is normalized, check if the norm is equal to 1. \begin{equation} ||\textbf{u}||_2 = \sqrt{0.6^2 + 0.8^2 + 0^2} =\sqrt{0.36 + 0.64 + 0} = \sqrt{1} = 1 \end{equation} From the verification of the obtained results it can be seen that the normalization was done correctly, and the vector [0.6,0.8,0] is indeed a unit vector with a norm of 1.
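The same check can be run in code. The following is a minimal sketch using only the Python standard library; the variable names are chosen for this illustration. The comparison uses a tolerance because floating-point arithmetic may not return exactly 1.

```python
import math

x = [3, 4, 0]

# Step 1: compute the L2 norm.
norm = math.sqrt(sum(v * v for v in x))

# Step 2: divide each component by the norm.
u = [v / norm for v in x]

# Verification: the norm of the result should be 1 (up to rounding error).
u_norm = math.sqrt(sum(v * v for v in u))
print(norm, u)
print(math.isclose(u_norm, 1.0))  # True
```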

Example 2: Unit vector normalization

In this example let's transform the 5-dimensional vector \(\textbf{x}\) using unit vector normalization. The 5-dimensional vector can be written as: \begin{equation} \textbf{x} = [7,-2,5,1,3] \end{equation}

Step 1: Compute the norm

In this step let's calculate the L2 norm (Euclidean norm) of the vector \(\textbf{x}\). The L2 norm can be calculated using the following formula: \begin{equation} ||\textbf{x}||_2 = \sqrt{x_1^2 + x_2^2 + x_3^2+x_4^2 + x_5^2} \end{equation} Substituting the values from the 5-dimensional vector into the previous equation we can calculate the L2 norm. \begin{equation} ||\textbf{x}||_2 =\sqrt{7^2 + (-2)^2 + 5^2 + 1^2 + 3^2} = \sqrt{49+4+25+1+9} = \sqrt{88} \approx 9.38. \end{equation}

Step 2: Normalize the vector

Divide each component of the vector \(\textbf{x}\) by the computed norm to get the unit vector \(\textbf{u}\). \begin{equation} \textbf{u} = \frac{\textbf{x}}{||\textbf{x}||_2} \end{equation} \begin{equation} u_1 = \frac{7}{9.38} \approx 0.746, \end{equation} \begin{equation} u_2 = \frac{-2}{9.38} \approx -0.213, \end{equation} \begin{equation} u_3 = \frac{5}{9.38} \approx 0.533, \end{equation} \begin{equation} u_4 = \frac{1}{9.38} \approx 0.107, \end{equation} \begin{equation} u_5 = \frac{3}{9.38} \approx 0.320. \end{equation} The normalized vector \(\textbf{u}\) is equal to: \begin{equation} \textbf{u} = [0.746,-0.213,0.533,0.107,0.320] \end{equation}

Verification of results

To verify that the vector is normalized, check if the norm is equal to 1. \begin{equation} ||\textbf{u}||_2 = \sqrt{0.746^2 + (-0.213)^2 + 0.533^2 + 0.107^2 + 0.320^2} =\sqrt{0.9998} \approx 1. \end{equation} The small deviation from 1 comes from rounding the components to three decimal places. The verification shows that the normalization was done correctly, and the vector \([0.746,-0.213,0.533,0.107,0.320]\) is indeed a unit vector with a norm very close to 1.
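The effect of rounding can be seen by repeating the calculation in code. This is a minimal sketch using only the Python standard library, with variable names chosen for this illustration: at full floating-point precision the norm of the result is 1 within rounding error, while the vector rounded to three decimals has a norm slightly below 1.

```python
import math

x = [7, -2, 5, 1, 3]

# Step 1: compute the L2 norm (the square root of 88).
norm = math.sqrt(sum(v * v for v in x))

# Step 2: divide each component by the norm, keeping full precision.
u = [v / norm for v in x]

# Round to three decimals, as in the hand calculation.
u_rounded = [round(v, 3) for v in u]

exact_norm = math.sqrt(sum(v * v for v in u))
rounded_norm = math.sqrt(sum(v * v for v in u_rounded))

print(u_rounded)
print(math.isclose(exact_norm, 1.0))  # True
print(rounded_norm)                   # slightly below 1 because of rounding
```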

Thursday, August 1, 2024
