(Easy)
The weight update rule is as follows:

$$\theta \leftarrow \theta - \frac{\alpha}{m} X^\top (X\theta - y)$$

See No. 47 for the detailed derivation. NumPy's matrix multiplication follows the rule X(m, n) @ Y(n, p); note that the array y of shape (3,) must be reshaped into a column vector of shape (3, 1).
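To see why the reshape matters: subtracting a (3,) array from a (3, 1) column broadcasts to a (3, 3) matrix. A quick standalone check:

```python
import numpy as np

h = np.zeros((3, 1))         # column vector of predictions
y = np.array([1., 2., 3.])   # flat target array of shape (3,)

# Without the reshape, broadcasting silently produces a (3, 3) matrix.
print((h - y).shape)                  # (3, 3) -- wrong
print((h - y.reshape(-1, 1)).shape)   # (3, 1) -- intended
```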
import numpy as np

def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros((n, 1))
    for i in range(iterations):
        h = np.dot(X, theta)  # shape (3, 1)
        # reshape(-1, 1): -1 lets NumPy infer the dimension
        err = h - y.reshape(-1, 1)  # shape (3, 1)
        update = np.dot(X.T, err) / m  # shape (2, 1)
        theta -= alpha * update
    return theta

Clustering is NP-Hard, so heuristics such as K-means are used to solve it. For a sample set $D = \{x_1, \dots, x_m\}$, suppose the cluster partition is $C = \{C_1, \dots, C_k\}$; K-means aims to minimize the squared error

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,$$

where $\mu_i$ is the mean vector (centroid) of cluster $C_i$.
The algorithm proceeds as follows: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the centroids stop moving or the iteration limit is reached.

The implementation clusters using Euclidean distance.
import random

def k_means_clustering(points: list[tuple[float, float]], k: int, initial_centroids: list[tuple[float, float]], max_iterations: int) -> list[tuple[float, float]]:
    def euc_distance(p1, p2):
        n_dims = len(p1)
        return sum((p1[i] - p2[i]) ** 2 for i in range(n_dims)) ** 0.5

    def do_clustering(centroids):
        # Assign each point to its nearest centroid.
        labels = []
        for point in points:
            dist = [euc_distance(point, centroids[i]) for i in range(k)]
            labels.append(dist.index(min(dist)))
        return labels

    def update_centroids(labels, old_centroids, use_random=True):
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            subpoints = [points[j] for j in range(len(points)) if labels[j] == i]
            if subpoints:
                n_dims = len(subpoints[0])
                new_centroid = tuple(
                    sum(point[d] for point in subpoints) / len(subpoints) for d in range(n_dims)
                )
                new_centroids.append(new_centroid)
            elif use_random:
                # Empty cluster: re-seed from a random data point.
                new_centroids.append(random.choice(points))
            else:
                new_centroids.append(old_centroids[i])
        return new_centroids

    current_centroids = initial_centroids
    for _ in range(max_iterations):
        current_labels = do_clustering(current_centroids)
        new_centroids = update_centroids(current_labels, current_centroids, True)
        if all(euc_distance(c, n) < 1e-6 for c, n in zip(current_centroids, new_centroids)):
            break
        current_centroids = new_centroids
    return current_centroids
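As an aside, the assignment step can also be vectorized with NumPy broadcasting; a standalone sketch (made-up points, independent of the solution above):

```python
import numpy as np

pts = np.array([[0., 0.], [1., 0.], [10., 10.], [11., 10.]])
cents = np.array([[0., 0.], [10., 10.]])

# Pairwise distances via broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d)
dists = np.linalg.norm(pts[:, None, :] - cents[None, :, :], axis=2)
labels = dists.argmin(axis=1)   # nearest centroid index per point
print(labels)  # [0 0 1 1]
```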
This problem asks us to implement an ID3 decision tree. The core of ID3 is choosing the splitting feature by information gain, then building the tree recursively.

First compute the total entropy of the sample set:

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k.$$

Next introduce the conditional entropy:

$$H(D \mid A) = \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v).$$

In code, after splitting on feature $A$, each of the $V$ subsets computes its own entropy, giving $H(D_v)$; see the chooseBestFeatureToSplit function in the reference code. The splitDataSet function also removes the current feature, which simplifies building the tree recursively.

Finally, introduce the information gain:

$$g(D, A) = H(D) - H(D \mid A).$$

With these concepts, ID3 selects features by information gain: at each step choose the attribute $A^* = \arg\max_A g(D, A)$, i.e. the attribute that most reduces the uncertainty of the dataset $D$.
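The reference code is not reproduced in this section, so here is a minimal standalone sketch of the entropy and information-gain computations (the helper names entropy and information_gain are my own, not from the reference code):

```python
import math
from collections import Counter

def entropy(labels):
    """H(D) = -sum_k p_k log2 p_k over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A) for a discrete feature column A."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)   # |D_v|/|D| * H(D_v)
    return entropy(labels) - cond

# A feature that perfectly splits the labels has gain equal to H(D);
# an uninformative feature has gain 0.
labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))  # 1.0
print(information_gain([0, 1, 0, 1], labels))  # 0.0
```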
The sigmoid problem's solution reduces to a single expression:

import math

def sigmoid(z: float) -> float:
    return round(1 / (1 + math.exp(-z)), 4)
def single_neuron_model(features: list[list[float]], labels: list[int], weights: list[float], bias: float) -> tuple[list[float], float]:
    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    def dot(x, w):
        return sum(x_ * w_ for x_, w_ in zip(x, w))

    probabilities = []
    for feature_vector in features:
        z = dot(feature_vector, weights) + bias
        probabilities.append(sigmoid(z))
    mse = sum((p - float(l)) ** 2 for p, l in zip(probabilities, labels)) / len(labels)
    # Round to 4 decimal places.
    probabilities = [round(p, 4) for p in probabilities]
    mse = round(mse, 4)
    return probabilities, mse
This one is Easy; nothing further to add.

Write a one-hot encoding function. Good practice for NumPy usage, in particular array indexing and broadcasting.
import numpy as np

def to_categorical(x, n_col=None):
    # Default to enough columns to cover the largest class index.
    n_col = n_col or np.amax(x) + 1
    one_hots = np.zeros((x.shape[0], n_col))
    one_hots[np.arange(x.shape[0]), x] = 1  # integer array indexing
    return one_hots

Boosting is an ensemble learning method; its main idea is to boost weak base learners into a strong learner. The concrete steps are given below.
From this it is clear that Boosting is a sequential process.

For AdaBoost, remember two key steps: computing each base learner's weight $\alpha_m$, and updating the sample weights $w_{mi}$. The procedure is as follows.

Consider binary classification on a dataset $T = \{(x_1, y_1), \dots, (x_N, y_N)\}$, where each $x_i$ is a column vector with $n$ elements, i.e. $x_i \in \mathbb{R}^n$, and $y_i$ is a scalar with $y_i \in \{+1, -1\}$.

1) Initialize the sample weights: $D_1 = (w_{11}, \dots, w_{1N})$ with $w_{1i} = 1/N$.

2) For $m = 1, \dots, M$, repeat the following to obtain $M$ base learners: fit a base learner $G_m(x)$ on the data weighted by $D_m$, and compute its weighted error $e_m = \sum_{i=1}^{N} w_{mi}\, I(G_m(x_i) \ne y_i)$.

3) Compute the weight coefficient of $G_m$ (the base learner's weight in the final ensemble): $\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$.

4) Update the training sample weights: $w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))$.

5) Here $Z_m$ is a normalization factor that makes the elements of $D_{m+1}$ sum to 1, i.e. $Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i))$.

6) Build the final linear combination of classifiers: $f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$.

7) The final classifier is $G(x) = \mathrm{sign}(f(x))$.

The code itself is not complicated and is best read alongside the derivation. The key points to remember are the updates of $\alpha_m$ and $w_{mi}$.
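The referenced code is not included in this section; the following is a standalone sketch of AdaBoost with depth-1 threshold stumps (all names here are illustrative, not the reference implementation):

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=10):
    """AdaBoost with threshold stumps; y in {-1, +1}.

    Returns a list of (alpha_m, feature, threshold, polarity) stumps.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                      # D_1: uniform sample weights
    stumps = []
    for _ in range(n_rounds):
        # Exhaustively pick the stump minimizing the weighted error e_m.
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(X[:, j] <= thr, polarity, -polarity)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity)
        e, j, thr, polarity = best
        e = min(max(e, 1e-10), 1 - 1e-10)        # guard against division by zero
        alpha = 0.5 * np.log((1 - e) / e)        # base learner weight alpha_m
        pred = np.where(X[:, j] <= thr, polarity, -polarity)
        w = w * np.exp(-alpha * y * pred)        # reweight the samples
        w /= w.sum()                             # normalize by Z_m
        stumps.append((alpha, j, thr, polarity))
    return stumps

def predict_adaboost(stumps, X):
    """G(x) = sign(sum_m alpha_m G_m(x))."""
    f = np.zeros(len(X))
    for alpha, j, thr, polarity in stumps:
        f += alpha * np.where(X[:, j] <= thr, polarity, -polarity)
    return np.sign(f)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
model = fit_adaboost(X, y, n_rounds=5)
print(predict_adaboost(model, X))  # [ 1.  1. -1. -1.]
```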
This section is reproduced from a post on 博客园 (cnblogs).
Taking the logarithm of the expression above directly is numerically unstable. To avoid this, subtract $\max(x)$ first:

$$\log \mathrm{softmax}(x_i) = x_i - \max(x) - \log \sum_j \exp(x_j - \max(x)).$$
import numpy as np

def log_softmax(scores: list) -> np.ndarray:
    scores = np.asarray(scores)
    # Subtract the max for numerical stability.
    sub_max = scores - np.max(scores)
    log_sum_exp = np.log(np.sum(np.exp(sub_max)))
    return sub_max - log_sum_exp

Leaky ReLU addresses the Dying ReLU problem by allowing a small negative slope.
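A minimal sketch of Leaky ReLU, assuming the conventional default slope of 0.01:

```python
def leaky_relu(z: float, alpha: float = 0.01) -> float:
    # f(z) = z for z > 0, alpha * z otherwise
    return z if z > 0 else alpha * z

print(leaky_relu(5.0))    # 5.0
print(leaky_relu(-2.0))   # -0.02
```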
The dot product can be implemented by hand:

def dot_product(x1, x2):
    return sum(x1_ * x2_ for x1_, x2_ in zip(x1, x2))

or computed directly with np.inner.
This problem asks us to optimize a simple fully connected layer (no bias),

$$\hat{y} = X w,$$

using gradient descent. The loss is MSE, whose gradient is

$$\nabla_w \mathrm{MSE} = \frac{2}{m} X^\top (X w - y).$$

Batch gradient descent processes the entire dataset and updates the parameters with the average gradient, where $m$ is the sample size. Stochastic gradient descent updates on a single sample at a time ($m = 1$). Mini-batch gradient descent, the most common variant, processes a small batch at a time, where $m$ is the batch_size.
import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_iterations, batch_size=1, method='batch'):
    m = len(y)
    if method == 'batch':
        batch_size = m
    elif method == 'stochastic':
        batch_size = 1
    # 'mini_batch' uses the batch_size passed in.
    for _ in range(n_iterations):
        for i in range(0, m, batch_size):
            X_batch = X[i: i + batch_size]
            y_batch = y[i: i + batch_size]
            pred = X_batch.dot(weights)
            err = pred - y_batch
            gradient = 2 * X_batch.T.dot(err) / batch_size  # broadcast
            # update
            weights -= learning_rate * gradient
    return weights
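One detail worth noting: the range(0, m, batch_size) slicing pattern handles a final short batch automatically. A standalone illustration:

```python
import numpy as np

X = np.arange(10).reshape(10, 1)
batch_size = 4
# The last slice is shorter (2 rows) but still valid.
sizes = [len(X[i:i + batch_size]) for i in range(0, len(X), batch_size)]
print(sizes)  # [4, 4, 2]
```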
Deep-ML | Implement Gini Impurity Calculation for a Set of Classes
Commonly used to find the best split at a decision-tree node:

$$G = 1 - \sum_{k} p_k^2,$$

where $p_k$ is the probability of class $k$.
from collections import Counter

def gini_impurity(y):
    """
    Calculate Gini Impurity for a list of class labels.

    :param y: List of class labels
    :return: Gini Impurity rounded to three decimal places
    """
    n_labels = len(y)
    probs = [float(count) / n_labels for count in Counter(y).values()]
    return round(1 - sum(p ** 2 for p in probs), 3)

For this problem, mind the input validity checks; otherwise there is nothing further to add.
Deep-ML | Calculate Performance Metrics for a Classification Model
From the definition of the confusion matrix, we have

- accuracy = (TP + TN) / (TP + TN + FP + FN)
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- specificity = TN / (TN + FP)
- negative predictive value = TN / (TN + FN)
- F1 = 2 · precision · recall / (precision + recall)

Note the trade-off between FPs and FNs.
def performance_metrics(actual: list[int], predicted: list[int]) -> tuple:
    TP = sum(1 for a, p in zip(actual, predicted) if a == p == 1)
    TN = sum(1 for a, p in zip(actual, predicted) if a == p == 0)
    FN = sum(actual) - TP       # actual positives not predicted positive
    FP = sum(predicted) - TP    # predicted positives that are actually negative
    accuracy = float(TP + TN) / (TP + TN + FP + FN)
    precision = float(TP) / (TP + FP)
    negativePredictive = float(TN) / (TN + FN)
    recall = float(TP) / (TP + FN)
    specificity = float(TN) / (TN + FP)
    f1 = 2.0 / (1 / precision + 1 / recall)
    confusion_matrix = [[TP, FN], [FP, TN]]
    return confusion_matrix, round(accuracy, 3), round(f1, 3), round(specificity, 3), round(negativePredictive, 3)

import math
def binomial_probability(n, k, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    C = math.comb(n, k)
    probability = C * pow(p, k) * pow(1 - p, n - k)
    return round(probability, 5)

For dimension $i$ and position $pos$, the positional encoding is computed as follows. For even dimensions:

$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right).$$

For odd dimensions:

$$PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right).$$
(Generated by GPT-4o)
import numpy as np

def pos_encoding(position: int, d_model: int):
    # Validate the inputs.
    if position == 0 or d_model <= 0:
        return -1
    # Allocate the positional-encoding array.
    pos_enc = np.zeros((1, position, d_model), dtype=np.float16)
    # Fill in the encodings.
    for pos in range(position):
        for i in range(d_model):
            if i % 2 == 0:
                pos_enc[0, pos, i] = np.sin(pos / (10000 ** (i / d_model)))
            else:
                pos_enc[0, pos, i] = np.cos(pos / (10000 ** ((i - 1) / d_model)))
    return pos_enc

Adam (Kingma & Ba, 2014) is an effective optimizer that uses exponentially weighted moving averages to estimate the momentum (first moment) and second moment of the gradient. Specifically,

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2.$$
Typically $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Next, correct the bias to obtain $\hat{m}_t$ and $\hat{v}_t$:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.$$

Finally, rescale the gradient much as RMSProp does, giving the update term

$$g'_t = \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

and update the parameters:

$$\theta_t = \theta_{t-1} - g'_t.$$

The code below implements the complete Adam step.
import numpy as np

def adam_optimizer(parameter, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using the Adam optimizer.

    Adjusts the learning rate based on the moving averages of the gradient and squared gradient.

    :param parameter: Current parameter value
    :param grad: Current gradient
    :param m: First moment estimate
    :param v: Second moment estimate
    :param t: Current timestep
    :param learning_rate: Learning rate (default=0.001)
    :param beta1: First moment decay rate (default=0.9)
    :param beta2: Second moment decay rate (default=0.999)
    :param epsilon: Small constant for numerical stability (default=1e-8)
    :return: tuple: (updated_parameter, updated_m, updated_v)
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * np.square(grad)
    m_bias_corr = m / (1 - np.power(beta1, t))
    v_bias_corr = v / (1 - np.power(beta2, t))
    step = learning_rate * m_bias_corr / (np.sqrt(v_bias_corr) + epsilon)
    parameter -= step
    return np.round(parameter, 5), np.round(m, 5), np.round(v, 5)
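To sanity-check the update equations, here is a standalone loop (separate from adam_optimizer above) minimizing $f(\theta) = \theta^2$:

```python
import numpy as np

theta, m, v = 5.0, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = 2 * theta                          # gradient of theta^2
    m = beta1 * m + (1 - beta1) * g        # first moment
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # driven toward the minimum at 0
```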
The Softsign activation, $f(x) = \frac{x}{1 + |x|}$, reduces to a one-liner:

def softsign(x: float) -> float:
    return round(x / (1 + abs(x)), 4)