some of the most important mathematical formulas commonly used in data science (DS)
Date: 2024-09-17 03:29:47
Here’s a list of some of the most important mathematical formulas commonly used in data science (DS), along with a brief explanation of each:
1. Gradient Descent:
$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) $
An iterative method for minimizing a function by repeatedly stepping in the direction of steepest descent, with the step size controlled by the learning rate $\alpha$.
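As a minimal sketch (the objective here is a hypothetical toy example, not from the text): minimizing $J(\theta) = \theta^2$, whose derivative is $2\theta$, with the update rule above.

```python
# Toy example (assumed): minimize J(theta) = theta^2, whose derivative
# is 2*theta, using the update theta <- theta - alpha * dJ/dtheta.
def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

theta_min = gradient_descent(lambda t: 2 * t, theta0=5.0)  # approaches 0
```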
2. Normal Distribution:
$ f(x | \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $
Describes how the values of a variable are distributed, commonly used in statistics.
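The density can be computed directly from the formula; a small sketch using only the standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x | mu, sigma^2) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
```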
3. Z-score:
$ z = \frac{x - \mu}{\sigma} $
Measures the number of standard deviations a data point is from the mean.
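A quick sketch on hypothetical sample data (the numbers are illustrative, not from the text), using the population standard deviation:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu = statistics.mean(data)       # 5.0
sigma = statistics.pstdev(data)  # population standard deviation: 2.0
z = (9.0 - mu) / sigma           # 9.0 lies 2 standard deviations above the mean
```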
4. Sigmoid Function:
$ \sigma(x) = \frac{1}{1 + e^{-x}} $
A function that maps any real-valued number into a value between 0 and 1, often used in logistic regression.
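A direct translation of the formula:

```python
import math

def sigmoid(x):
    # maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```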
5. Correlation:
$ \text{Correlation} = \frac{\text{Cov}(X, Y)}{\text{Std}(X) \cdot \text{Std}(Y)} $
A statistical measure that describes the extent to which two variables change together.
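A sketch of the Pearson correlation computed from scratch (population covariance and standard deviations; the common normalization choice):

```python
import math

def pearson_correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```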
6. Cosine Similarity:
$ \text{similarity} = \frac{A \cdot B}{||A|| ||B||} $
A metric used to measure how similar two vectors are, ranging from -1 to 1.
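The formula translates directly; parallel vectors score 1, orthogonal vectors score 0:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```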
7. Naive Bayes:
$ P(y|x_1,...,x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i|y)}{P(x_1,...,x_n)} $
A classification technique based on Bayes' theorem, assuming independence among predictors.
8. Maximum Likelihood Estimation (MLE):
$ \hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} P(x_i|\theta) $
A method for estimating the parameters of a statistical model.
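As a concrete instance (hypothetical coin-flip data, not from the text): for a Bernoulli model, the likelihood $\prod_i p^{x_i}(1-p)^{1-x_i}$ is maximized at the sample mean, a standard closed-form MLE result.

```python
# MLE for a Bernoulli parameter p: the likelihood is maximized
# at the sample mean (closed-form solution).
samples = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical observations
p_hat = sum(samples) / len(samples)
```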
9. Ordinary Least Squares (OLS):
$ \beta = (X^TX)^{-1}X^Ty $
A method for estimating the coefficients of a linear regression model.
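A sketch of the normal equation on hypothetical data generated by $y = 1 + 2x$ (the column of ones models the intercept):

```python
import numpy as np

# Hypothetical data: y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
beta = np.linalg.inv(X.T @ X) @ X.T @ y  # normal equation; [intercept, slope]
```

In practice `np.linalg.lstsq` is preferred over forming the explicit inverse, for numerical stability.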
10. F1 Score:
$ \text{F1} = 2 \cdot \frac{P \cdot R}{P + R} $
A measure of a test's accuracy that considers both the precision (P) and the recall (R).
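The harmonic mean of precision and recall, directly from the formula:

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```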
11. ReLU (Rectified Linear Unit):
$ \text{ReLU}(x) = \max(0, x) $
An activation function widely used in neural networks.
12. Softmax Function:
$ P(y = j|x) = \frac{e^{z_j}}{\sum_{k} e^{z_k}} $
Converts logits to probabilities that sum to 1 for multi-class classification.
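A sketch that also subtracts the maximum logit first, the standard trick to avoid overflow in the exponentials (it leaves the result unchanged):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```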
13. R² Score:
$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $
Indicates how well the regression model approximates the real data points.
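One minus the ratio of residual to total sum of squares, as in the formula above:

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))  # residual
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)               # total
    return 1.0 - ss_res / ss_tot
```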
14. Mean Squared Error (MSE):
$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
Measures the average squared difference between actual and predicted values.
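A one-line implementation:

```python
def mse(y_true, y_pred):
    # average squared difference between actual and predicted values
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)
```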
15. MSE with L2 Regularization:
$ \text{MSE}_{\text{regularized}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $
A regularized version of MSE to prevent overfitting.
16. Eigenvectors and Eigenvalues:
$ Av = \lambda v $
Describes properties of linear transformations in vector spaces.
17. Entropy:
$ \text{Entropy} = -\sum_{i} p_i \log_2(p_i) $
A measure of the uncertainty or randomness in a probability distribution.
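In bits (base-2 logarithm); a fair coin has exactly one bit of entropy:

```python
import math

def entropy(probs):
    # terms with p = 0 contribute 0 by convention (the limit of p*log p)
    return -sum(p * math.log2(p) for p in probs if p > 0)
```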
18. K-Means Clustering:
$ \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} ||x - \mu_i||^2 $
A method used to partition a dataset into k clusters.
19. Kullback-Leibler Divergence (KL Divergence):
$ D_{KL}(P||Q) = \sum_{x \in X} P(x) \log\left(\frac{P(x)}{Q(x)}\right) $
A measure of how one probability distribution diverges from a second reference probability distribution.
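A sketch for discrete distributions over the same support; KL divergence is zero only when the distributions are identical:

```python
import math

def kl_divergence(p, q):
    # assumes q(x) > 0 wherever p(x) > 0; zero-probability terms of p drop out
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```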
20. Log Loss:
$ \text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) $
A measure of performance for a classification model where the prediction is a probability value between 0 and 1.
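For binary labels, the formula translates directly (predicted probabilities must lie strictly between 0 and 1, or the logarithm blows up):

```python
import math

def log_loss(y_true, y_prob):
    # binary cross-entropy; y_prob values must be in the open interval (0, 1)
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / n
```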
21. Support Vector Machine (SVM):
$ \min_{w, b} \frac{1}{2} ||w||^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i(w \cdot x_i - b)) $
A supervised learning model used for classification and regression analysis.
22. Linear Regression:
$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n + \epsilon $
A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
23. Singular Value Decomposition (SVD):
$ A = UDV^T $
A factorization of a real or complex matrix, used in linear algebra and data compression.
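The factorization can be checked numerically on a small hypothetical matrix:

```python
import numpy as np

# Verify A = U D V^T on a small symmetric matrix (illustrative example).
A = np.array([[3.0, 1.0], [1.0, 3.0]])
U, d, Vt = np.linalg.svd(A)          # d holds the singular values
reconstructed = U @ np.diag(d) @ Vt  # should reproduce A
```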
24. Lagrange Multiplier:
$ \max_x f(x) \quad \text{subject to} \quad g(x) = 0 $
A strategy for finding the local maxima and minima of a function subject to equality constraints.
25. Lagrange Function:
$ L(x, \lambda) = f(x) - \lambda g(x) $
A function used in optimization to find the extremes of one function given constraints from another.
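As a small worked example (illustrative, not from the text): maximize $f(x, y) = xy$ subject to $g(x, y) = x + y - 1 = 0$. The Lagrange function is
$ L(x, y, \lambda) = xy - \lambda(x + y - 1) $
Setting the partial derivatives to zero gives $\frac{\partial L}{\partial x} = y - \lambda = 0$ and $\frac{\partial L}{\partial y} = x - \lambda = 0$, so $x = y = \lambda$; the constraint then forces $x = y = \frac{1}{2}$, and the constrained maximum is $f = \frac{1}{4}$.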
Feel free to dive deeper into any of these concepts for a better understanding!