On this web page you can perform the unit vector transformation, also known as normalization to unit norm, on all dataset variables. To upload the dataset, click the Choose File button. The Open pop-up window will appear. Locate the file on your computer and click Open. The web application will then automatically normalize each of the dataset variables.
Important: At the moment the dataset must be uploaded in .csv format only.
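As a rough illustration of what such a tool might do with an uploaded file, the sketch below reads CSV text and rescales every numeric column (variable) to unit L2 norm. It is not the web application's actual code: the function name `normalize_csv_columns`, the assumption of a header row followed by purely numeric cells, and the choice to leave all-zero columns unchanged are illustrative only.

```python
import csv
import io
import math


def normalize_csv_columns(csv_text):
    """Scale each numeric column of a CSV (with a header row) to unit L2 norm.

    Hypothetical helper for illustration; assumes every data cell is numeric.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], [[float(v) for v in row] for row in rows[1:]]

    # Transpose so that each inner list holds one variable (column).
    columns = [list(col) for col in zip(*data)]

    normalized = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        # Assumption: an all-zero column has no direction, so leave it as is.
        normalized.append(col if norm == 0 else [v / norm for v in col])

    # Transpose back to rows.
    return header, [list(row) for row in zip(*normalized)]


sample = "a,b\n3,0\n4,0\n"
header, rows = normalize_csv_columns(sample)
print(header)  # ['a', 'b']
print(rows)    # [[0.6, 0.0], [0.8, 0.0]]
```

Here column `a` holds the values 3 and 4, so its norm is 5 and it becomes 0.6 and 0.8, while the all-zero column `b` is returned unchanged.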
The unit vector transformation, also known as normalization to unit norm, is a technique used to scale data so that each data sample (or feature vector) has a norm of 1. This is useful when the direction of the data samples is more important than their magnitude, such as in text classification, clustering, or other machine learning algorithms that are sensitive to the scale of the features.
In this example let's transform the 3-dimensional vector \(\textbf{x}\) using unit vector normalization.
The 3-dimensional vector is:
\begin{equation}
\textbf{x} = [3,4,0]
\end{equation}
In this step let's calculate the L2 norm (Euclidean norm) of the vector \(\textbf{x}\). The L2 norm can be calculated using the following formula:
\begin{equation}
||\textbf{x}||_2 = \sqrt{x_1^2 + x_2^2 + x_3^2}
\end{equation}
Substituting the values from the 3-dimensional vector into the previous equation we can calculate the L2 norm.
\begin{equation}
||\textbf{x}||_2 =\sqrt{3^2 + 4^2 + 0^2} = \sqrt{9+16+0} = \sqrt{25} = 5.
\end{equation}
Divide each component of the vector \(\textbf{x}\) by the computed norm to get the unit vector \(\textbf{u}\).
\begin{equation}
\textbf{u} = \frac{\textbf{x}}{||\textbf{x}||_2}
\end{equation}
\begin{equation}
u_1 = \frac{3}{5} = 0.6,
\end{equation}
\begin{equation}
u_2 = \frac{4}{5} = 0.8,
\end{equation}
\begin{equation}
u_3 = \frac{0}{5} = 0.
\end{equation}
The normalized vector \(\textbf{u}\) is equal to:
\begin{equation}
\textbf{u} = [0.6,0.8,0]
\end{equation}
To verify that the vector is normalized, check if the norm is equal to 1.
\begin{equation}
||\textbf{u}||_2 = \sqrt{0.6^2 + 0.8^2 + 0^2} =\sqrt{0.36 + 0.64 + 0} = \sqrt{1} = 1
\end{equation}
The verification shows that the normalization was done correctly: the vector \([0.6,0.8,0]\) is indeed a unit vector with a norm of 1.
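The same calculation can be checked numerically. The following is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
import math

x = [3, 4, 0]

# Step 1: L2 norm of x.
norm = math.sqrt(sum(v * v for v in x))

# Step 2: divide every component by the norm.
u = [v / norm for v in x]

print(norm)  # 5.0
print(u)     # [0.6, 0.8, 0.0]

# Verification: the norm of u should be 1 (up to floating-point error).
u_norm = math.sqrt(sum(v * v for v in u))
print(math.isclose(u_norm, 1.0))  # True
```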
In this example let's transform the 5-dimensional vector \(\textbf{x}\) using unit vector normalization.
The 5-dimensional vector can be written as:
\begin{equation}
\textbf{x} = [7,-2,5,1,3]
\end{equation}
In this step let's calculate the L2 norm (Euclidean norm) of the vector \(\textbf{x}\). The L2 norm can be calculated using the following formula:
\begin{equation}
||\textbf{x}||_2 = \sqrt{x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2}
\end{equation}
Substituting the values from the 5-dimensional vector into the previous equation we can calculate the L2 norm.
\begin{equation}
||\textbf{x}||_2 =\sqrt{7^2 + (-2)^2 + 5^2 + 1^2 + 3^2} = \sqrt{49+4+25+1+9} = \sqrt{88} \approx 9.38.
\end{equation}
Divide each component of the vector \(\textbf{x}\) by the computed norm to get the unit vector \(\textbf{u}\).
\begin{equation}
\textbf{u} = \frac{\textbf{x}}{||\textbf{x}||_2}
\end{equation}
\begin{equation}
u_1 = \frac{7}{9.38} \approx 0.746,
\end{equation}
\begin{equation}
u_2 = \frac{-2}{9.38} \approx -0.213,
\end{equation}
\begin{equation}
u_3 = \frac{5}{9.38} \approx 0.533,
\end{equation}
\begin{equation}
u_4 = \frac{1}{9.38} \approx 0.107,
\end{equation}
\begin{equation}
u_5 = \frac{3}{9.38} \approx 0.320.
\end{equation}
The normalized vector \(\textbf{u}\) is equal to:
\begin{equation}
\textbf{u} = [0.746,-0.213,0.533,0.107,0.320]
\end{equation}
To verify that the vector is normalized, check if the norm is equal to 1.
\begin{equation}
||\textbf{u}||_2 = \sqrt{0.746^2 + (-0.213)^2 + 0.533^2 + 0.107^2 + 0.320^2} \approx \sqrt{0.9998} \approx 1.
\end{equation}
The verification shows that the normalization was done correctly: the vector \([0.746,-0.213,0.533,0.107,0.320]\) is indeed a unit vector with a norm very close to 1. The small deviation from 1 comes from rounding the components to three decimal places.
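A short Python sketch (standard library only, illustrative variable names) reproduces this example and shows the effect of rounding: the full-precision components give a norm of 1 up to floating-point error, while the components rounded to three decimals give a value slightly below 1.

```python
import math

x = [7, -2, 5, 1, 3]

norm = math.sqrt(sum(v * v for v in x))   # sqrt(88), about 9.38
u = [v / norm for v in x]                 # full-precision unit vector
u_rounded = [round(v, 3) for v in u]      # components rounded to 3 decimals

print(round(norm, 2))  # 9.38
print(u_rounded)       # [0.746, -0.213, 0.533, 0.107, 0.32]

# Compare the norm of the full-precision and of the rounded components.
norm_full = math.sqrt(sum(v * v for v in u))
norm_rounded = math.sqrt(sum(v * v for v in u_rounded))
print(math.isclose(norm_full, 1.0))  # True
print(norm_rounded < 1.0)            # True
```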
Description of the normalization to a unit norm
The goal of the unit vector normalization is to adjust the length of each vector to be 1, without changing its direction. This transformation ensures that the vector lies on the surface of a unit hypersphere centered at the origin.
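To make the idea of "same direction, length 1" concrete, the sketch below normalizes a vector and a scaled copy of it. It uses only the Python standard library, and `unit_vector` is an illustrative helper rather than a function from any library. Both inputs map to the same point on the unit sphere, because a positive scale factor changes only the magnitude.

```python
import math


def unit_vector(x):
    """Return x divided by its L2 norm (assumes x is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


a = [3.0, 4.0, 0.0]
b = [10 * v for v in a]  # same direction, ten times the length

ua, ub = unit_vector(a), unit_vector(b)

# The two unit vectors agree component by component (up to rounding).
same_direction = all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(ua, ub))
print(same_direction)  # True
```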
For a given vector (array) \(\textbf{x}\) in \(n\)-dimensional space, the unit vector \(\textbf{u}\) can be computed using the following formula: \begin{equation} \textbf{u} = \frac{\textbf{x}}{||\textbf{x}||} \end{equation} where \(||\textbf{x}||\) is the norm (length) of the vector \(\textbf{x}\). The most commonly used norm is the L2 norm, which is defined as: \begin{equation} ||\textbf{x}||_2 = \sqrt{\sum_{i=1}^{n} x_i^2} \end{equation} where \(n\) is the number of components of the vector (array) and \(x_i\) is its \(i\)-th component.
Steps to apply unit vector normalization
Two steps are required to apply the unit vector normalization: compute the norm, then divide by it. The first step is to calculate the norm of the vector \(\textbf{x}\); for L2 normalization this is \(||\textbf{x}||_2\). The second step is to divide each component of the vector by the computed norm to obtain the normalized vector.
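The two steps translate almost directly into code. The following is a minimal sketch in standard-library Python. The function name `l2_normalize` and the decision to raise an error for the zero vector (whose norm is 0, so the division is undefined) are assumptions made for illustration.

```python
import math


def l2_normalize(x):
    """Normalize a sequence of numbers to unit L2 norm."""
    # Step 1: compute the L2 norm.
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        # Assumption: the zero vector has no direction, so refuse to normalize it.
        raise ValueError("cannot normalize the zero vector")
    # Step 2: divide each component by the norm.
    return [v / norm for v in x]


print(l2_normalize([1.0, 1.0, 1.0, 1.0]))  # [0.5, 0.5, 0.5, 0.5]
```

Other ways of handling a zero vector, such as returning it unchanged, are equally possible; which one is appropriate depends on the application.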
The advantages and disadvantages of unit vector normalization
The advantages of the unit vector normalization are:
- Magnitude independence - ensures that the magnitude of the data points does not affect the results of algorithms that are sensitive to scale.
- Improved convergence - can help to improve the convergence rate of optimization algorithms by reducing the influence of feature scaling on the model training process.
The disadvantages of the unit vector normalization are:
- Information loss - normalizing to unit norm might not be appropriate if the magnitude of the data is important for the analysis or model.
- Sparse data - for sparse data, normalization might lead to dense vectors, which could affect performance depending on the algorithm used.