some of the most important mathematical formulas commonly used in data science (DS)
Date: 2024-09-17 03:29:47
Here’s a list of some of the most important mathematical formulas commonly used in data science (DS), along with a brief explanation of each:
1. Gradient Descent:
$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) $
An iterative method for minimizing a function by repeatedly stepping in the direction of steepest descent, with the step size controlled by the learning rate $\alpha$.
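As a minimal sketch (the objective here is a hypothetical toy example, not from the text): minimizing $J(\theta) = \theta^2$, whose derivative is $2\theta$, with the update rule above.

```python
# Toy example (assumed): minimize J(theta) = theta^2, whose derivative
# is 2*theta, using the update theta <- theta - alpha * dJ/dtheta.
def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

theta_min = gradient_descent(lambda t: 2 * t, theta0=5.0)  # approaches 0
```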
2. Normal Distribution:
$ f(x | \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $
Describes how the values of a variable are distributed, commonly used in statistics.
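The density can be computed directly from the formula; a small sketch using only the standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x | mu, sigma^2) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
```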
3. Z-score:
$ z = \frac{x - \mu}{\sigma} $
Measures the number of standard deviations a data point is from the mean.
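A quick sketch on hypothetical sample data (the numbers are illustrative, not from the text), using the population standard deviation:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu = statistics.mean(data)       # 5.0
sigma = statistics.pstdev(data)  # population standard deviation: 2.0
z = (9.0 - mu) / sigma           # 9.0 lies 2 standard deviations above the mean
```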
4. Sigmoid Function:
$ \sigma(x) = \frac{1}{1 + e^{-x}} $
A function that maps any real-valued number into a value between 0 and 1, often used in logistic regression.
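A direct translation of the formula:

```python
import math

def sigmoid(x):
    # maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```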
5. Correlation:
$ \text{Correlation} = \frac{\text{Cov}(X, Y)}{\text{Std}(X) \cdot \text{Std}(Y)} $
A statistical measure that describes the extent to which two variables change together.
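A sketch of the Pearson correlation computed from scratch (population covariance and standard deviations; the common normalization choice):

```python
import math

def pearson_correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```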
6. Cosine Similarity:
$ \text{similarity} = \frac{A \cdot B}{||A|| ||B||} $
A metric used to measure how similar two vectors are, ranging from -1 to 1.
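The formula translates directly; parallel vectors score 1, orthogonal vectors score 0:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```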
7. Naive Bayes:
$ P(y|x_1,...,x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i|y)}{P(x_1,...,x_n)} $
A classification technique based on Bayes' theorem, assuming independence among predictors.
8. Maximum Likelihood Estimation (MLE):
$ \hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} P(x_i|\theta) $
A method for estimating the parameters of a statistical model.
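As a concrete instance (hypothetical coin-flip data, not from the text): for a Bernoulli model, the likelihood $\prod_i p^{x_i}(1-p)^{1-x_i}$ is maximized at the sample mean, a standard closed-form MLE result.

```python
# MLE for a Bernoulli parameter p: the likelihood is maximized
# at the sample mean (closed-form solution).
samples = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical observations
p_hat = sum(samples) / len(samples)
```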
9. Ordinary Least Squares (OLS):
$ \beta = (X^TX)^{-1}X^Ty $
A method for estimating the coefficients of a linear regression model.
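A sketch of the normal equation on hypothetical data generated by $y = 1 + 2x$ (the column of ones models the intercept):

```python
import numpy as np

# Hypothetical data: y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
beta = np.linalg.inv(X.T @ X) @ X.T @ y  # normal equation; [intercept, slope]
```

In practice `np.linalg.lstsq` is preferred over forming the explicit inverse, for numerical stability.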
10. F1 Score:
$ \text{F1} = 2 \cdot \frac{P \cdot R}{P + R} $
A measure of a test's accuracy that considers both the precision (P) and the recall (R).
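The harmonic mean of precision and recall, directly from the formula:

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```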
11. ReLU (Rectified Linear Unit):
$ \text{ReLU}(x) = \max(0, x) $
An activation function widely used in neural networks.
12. Softmax Function:
$ P(y = j|x) = \frac{e^{z_j}}{\sum_{k} e^{z_k}} $
Converts logits to probabilities that sum to 1 for multi-class classification.
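A sketch that also subtracts the maximum logit first, the standard trick to avoid overflow in the exponentials (it leaves the result unchanged):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```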
13. R² Score:
$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $
Indicates how well the regression model approximates the real data points.
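One minus the ratio of residual to total sum of squares, as in the formula above:

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))  # residual
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)               # total
    return 1.0 - ss_res / ss_tot
```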
14. Mean Squared Error (MSE):
$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
Measures the average squared difference between actual and predicted values.
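A one-line implementation:

```python
def mse(y_true, y_pred):
    # average squared difference between actual and predicted values
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)
```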
15. MSE with L2 Regularization:
$ \text{MSE}_{\text{regularized}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $
A regularized version of MSE to prevent overfitting.
16. Eigenvectors and Eigenvalues:
$ Av = \lambda v $
Describes properties of linear transformations in vector spaces.
17. Entropy:
$ \text{Entropy} = -\sum_{i} p_i \log_2(p_i) $
A measure of the uncertainty or randomness in a probability distribution.
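In bits (base-2 logarithm); a fair coin has exactly one bit of entropy:

```python
import math

def entropy(probs):
    # terms with p = 0 contribute 0 by convention (the limit of p*log p)
    return -sum(p * math.log2(p) for p in probs if p > 0)
```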
18. K-Means Clustering:
$ \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} ||x - \mu_i||^2 $
A method used to partition a dataset into k clusters.
19. Kullback-Leibler Divergence (KL Divergence):
$ D_{KL}(P||Q) = \sum_{x \in X} P(x) \log\left(\frac{P(x)}{Q(x)}\right) $
A measure of how one probability distribution diverges from a second reference probability distribution.
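A sketch for discrete distributions over the same support; KL divergence is zero only when the distributions are identical:

```python
import math

def kl_divergence(p, q):
    # assumes q(x) > 0 wherever p(x) > 0; zero-probability terms of p drop out
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```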
20. Log Loss:
$ \text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) $
A measure of performance for a classification model where the prediction is a probability value between 0 and 1.
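For binary labels, the formula translates directly (predicted probabilities must lie strictly between 0 and 1, or the logarithm blows up):

```python
import math

def log_loss(y_true, y_prob):
    # binary cross-entropy; y_prob values must be in the open interval (0, 1)
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / n
```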
21. Support Vector Machine (SVM):
$ \min_{w, b} \frac{1}{2} ||w||^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i(w \cdot x_i - b)) $
A supervised learning model used for classification and regression analysis.
22. Linear Regression:
$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n + \epsilon $
A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
23. Singular Value Decomposition (SVD):
$ A = UDV^T $
A factorization of a real or complex matrix, used in linear algebra and data compression.
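The factorization can be checked numerically on a small hypothetical matrix:

```python
import numpy as np

# Verify A = U D V^T on a small symmetric matrix (illustrative example).
A = np.array([[3.0, 1.0], [1.0, 3.0]])
U, d, Vt = np.linalg.svd(A)          # d holds the singular values
reconstructed = U @ np.diag(d) @ Vt  # should reproduce A
```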
24. Lagrange Multiplier:
$ \max_x f(x) \quad \text{subject to} \quad g(x) = 0 $
A strategy for finding the local maxima and minima of a function subject to equality constraints.
25. Lagrange Function:
$ L(x, \lambda) = f(x) - \lambda g(x) $
A function used in optimization to find the extremes of one function given constraints from another.
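As a small worked example (illustrative, not from the text): maximize $f(x, y) = xy$ subject to $g(x, y) = x + y - 1 = 0$. The Lagrange function is
$ L(x, y, \lambda) = xy - \lambda(x + y - 1) $
Setting the partial derivatives to zero gives $\frac{\partial L}{\partial x} = y - \lambda = 0$ and $\frac{\partial L}{\partial y} = x - \lambda = 0$, so $x = y = \lambda$; the constraint then forces $x = y = \frac{1}{2}$, and the constrained maximum is $f = \frac{1}{4}$.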
Feel free to dive deeper into any of these concepts for a better understanding!