(Easy)
The weight update rule is as follows:

$$\theta \leftarrow \theta - \frac{\alpha}{m} X^\top (X\theta - y)$$

See No. 47 for the detailed derivation. NumPy's matrix multiplication follows the rule X(m, n) @ Y(n, p); note that the array y of shape (3,) must be reshaped into a column vector of shape (3, 1).
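To see why the reshape matters: subtracting a (3,) array from a (3, 1) column broadcasts to a (3, 3) matrix. A quick standalone check:

```python
import numpy as np

h = np.zeros((3, 1))         # column vector of predictions
y = np.array([1., 2., 3.])   # flat target array of shape (3,)

# Without the reshape, broadcasting silently produces a (3, 3) matrix.
print((h - y).shape)                  # (3, 3) -- wrong
print((h - y.reshape(-1, 1)).shape)   # (3, 1) -- intended
```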
import numpy as np

def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros((n, 1))
    for i in range(iterations):
        h = np.dot(X, theta)  # shape (3, 1)
        # reshape(-1, 1): -1 lets NumPy infer the dimension
        err = h - y.reshape(-1, 1)  # shape (3, 1)
        update = np.dot(X.T, err) / m  # shape (2, 1)
        theta -= alpha * update
    return theta

Clustering is NP-Hard, so heuristics such as K-means are used to solve it. For a sample set $D = \{x_1, \dots, x_m\}$, suppose the cluster partition is $C = \{C_1, \dots, C_k\}$; K-means aims to minimize the squared error

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,$$

where $\mu_i$ is the mean vector (centroid) of cluster $C_i$.
The algorithm proceeds as follows: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the centroids stop moving or the iteration limit is reached.

The implementation clusters using Euclidean distance.
import random

def k_means_clustering(points: list[tuple[float, float]], k: int, initial_centroids: list[tuple[float, float]], max_iterations: int) -> list[tuple[float, float]]:
    def euc_distance(p1, p2):
        n_dims = len(p1)
        return sum((p1[i] - p2[i]) ** 2 for i in range(n_dims)) ** 0.5

    def do_clustering(centroids):
        # Assign each point to its nearest centroid.
        labels = []
        for point in points:
            dist = [euc_distance(point, centroids[i]) for i in range(k)]
            labels.append(dist.index(min(dist)))
        return labels

    def update_centroids(labels, old_centroids, use_random=True):
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            subpoints = [points[j] for j in range(len(points)) if labels[j] == i]
            if subpoints:
                n_dims = len(subpoints[0])
                new_centroid = tuple(
                    sum(point[d] for point in subpoints) / len(subpoints) for d in range(n_dims)
                )
                new_centroids.append(new_centroid)
            elif use_random:
                # Empty cluster: re-seed from a random data point.
                new_centroids.append(random.choice(points))
            else:
                new_centroids.append(old_centroids[i])
        return new_centroids

    current_centroids = initial_centroids
    for _ in range(max_iterations):
        current_labels = do_clustering(current_centroids)
        new_centroids = update_centroids(current_labels, current_centroids, True)
        if all(euc_distance(c, n) < 1e-6 for c, n in zip(current_centroids, new_centroids)):
            break
        current_centroids = new_centroids
    return current_centroids
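As an aside, the assignment step can also be vectorized with NumPy broadcasting; a standalone sketch (made-up points, independent of the solution above):

```python
import numpy as np

pts = np.array([[0., 0.], [1., 0.], [10., 10.], [11., 10.]])
cents = np.array([[0., 0.], [10., 10.]])

# Pairwise distances via broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d)
dists = np.linalg.norm(pts[:, None, :] - cents[None, :, :], axis=2)
labels = dists.argmin(axis=1)   # nearest centroid index per point
print(labels)  # [0 0 1 1]
```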
This problem asks us to implement an ID3 decision tree. The core of ID3 is choosing the splitting feature by information gain, then building the tree recursively.

First compute the total entropy of the sample set:

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k.$$

Next introduce the conditional entropy:

$$H(D \mid A) = \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v).$$

In code, after splitting on feature $A$, each of the $V$ subsets computes its own entropy, giving $H(D_v)$; see the chooseBestFeatureToSplit function in the reference code. The splitDataSet function also removes the current feature, which simplifies building the tree recursively.

Finally, introduce the information gain:

$$g(D, A) = H(D) - H(D \mid A).$$

With these concepts, ID3 selects features by information gain: at each step choose the attribute $A^* = \arg\max_A g(D, A)$, i.e. the attribute that most reduces the uncertainty of the dataset $D$.
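The reference code is not reproduced in this section, so here is a minimal standalone sketch of the entropy and information-gain computations (the helper names entropy and information_gain are my own, not from the reference code):

```python
import math
from collections import Counter

def entropy(labels):
    """H(D) = -sum_k p_k log2 p_k over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A) for a discrete feature column A."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)   # |D_v|/|D| * H(D_v)
    return entropy(labels) - cond

# A feature that perfectly splits the labels has gain equal to H(D);
# an uninformative feature has gain 0.
labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))  # 1.0
print(information_gain([0, 1, 0, 1], labels))  # 0.0
```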
The sigmoid problem's solution reduces to a single expression:

import math

def sigmoid(z: float) -> float:
    return round(1 / (1 + math.exp(-z)), 4)
def single_neuron_model(features: list[list[float]], labels: list[int], weights: list[float], bias: float) -> tuple[list[float], float]:
    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    def dot(x, w):
        return sum(x_ * w_ for x_, w_ in zip(x, w))

    probabilities = []
    for feature_vector in features:
        z = dot(feature_vector, weights) + bias
        probabilities.append(sigmoid(z))
    mse = sum((p - float(l)) ** 2 for p, l in zip(probabilities, labels)) / len(labels)
    # Round to 4 decimal places.
    probabilities = [round(p, 4) for p in probabilities]
    mse = round(mse, 4)
    return probabilities, mse
This one is Easy; nothing further to add.

Write a one-hot encoding function. Good practice for NumPy usage, in particular array indexing and broadcasting.
import numpy as np

def to_categorical(x, n_col=None):
    # Default to enough columns to cover the largest class index.
    n_col = n_col or np.amax(x) + 1
    one_hots = np.zeros((x.shape[0], n_col))
    one_hots[np.arange(x.shape[0]), x] = 1  # integer array indexing
    return one_hots

Boosting is an ensemble learning method; its main idea is to boost weak base learners into a strong learner. The concrete steps are given below.
From this it is clear that Boosting is a sequential process.

For AdaBoost, remember two key steps: computing each base learner's weight $\alpha_m$, and updating the sample weights $w_{mi}$. The procedure is as follows.

Consider binary classification on a dataset $T = \{(x_1, y_1), \dots, (x_N, y_N)\}$, where each $x_i$ is a column vector with $n$ elements, i.e. $x_i \in \mathbb{R}^n$, and $y_i$ is a scalar with $y_i \in \{+1, -1\}$.

1) Initialize the sample weights: $D_1 = (w_{11}, \dots, w_{1N})$ with $w_{1i} = 1/N$.

2) For $m = 1, \dots, M$, repeat the following to obtain $M$ base learners: fit a base learner $G_m(x)$ on the data weighted by $D_m$, and compute its weighted error $e_m = \sum_{i=1}^{N} w_{mi}\, I(G_m(x_i) \ne y_i)$.

3) Compute the weight coefficient of $G_m$ (the base learner's weight in the final ensemble): $\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$.

4) Update the training sample weights: $w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))$.

5) Here $Z_m$ is a normalization factor that makes the elements of $D_{m+1}$ sum to 1, i.e. $Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i))$.

6) Build the final linear combination of classifiers: $f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$.

7) The final classifier is $G(x) = \mathrm{sign}(f(x))$.

The code itself is not complicated and is best read alongside the derivation. The key points to remember are the updates of $\alpha_m$ and $w_{mi}$.
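The referenced code is not included in this section; the following is a standalone sketch of AdaBoost with depth-1 threshold stumps (all names here are illustrative, not the reference implementation):

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=10):
    """AdaBoost with threshold stumps; y in {-1, +1}.

    Returns a list of (alpha_m, feature, threshold, polarity) stumps.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                      # D_1: uniform sample weights
    stumps = []
    for _ in range(n_rounds):
        # Exhaustively pick the stump minimizing the weighted error e_m.
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(X[:, j] <= thr, polarity, -polarity)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity)
        e, j, thr, polarity = best
        e = min(max(e, 1e-10), 1 - 1e-10)        # guard against division by zero
        alpha = 0.5 * np.log((1 - e) / e)        # base learner weight alpha_m
        pred = np.where(X[:, j] <= thr, polarity, -polarity)
        w = w * np.exp(-alpha * y * pred)        # reweight the samples
        w /= w.sum()                             # normalize by Z_m
        stumps.append((alpha, j, thr, polarity))
    return stumps

def predict_adaboost(stumps, X):
    """G(x) = sign(sum_m alpha_m G_m(x))."""
    f = np.zeros(len(X))
    for alpha, j, thr, polarity in stumps:
        f += alpha * np.where(X[:, j] <= thr, polarity, -polarity)
    return np.sign(f)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
model = fit_adaboost(X, y, n_rounds=5)
print(predict_adaboost(model, X))  # [ 1.  1. -1. -1.]
```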
This section is reproduced from a post on 博客园 (cnblogs).
Taking the logarithm of the expression above directly is numerically unstable. To avoid this, subtract $\max(x)$ first:

$$\log \mathrm{softmax}(x_i) = x_i - \max(x) - \log \sum_j \exp(x_j - \max(x)).$$
import numpy as np

def log_softmax(scores: list) -> np.ndarray:
    scores = np.asarray(scores)
    # Subtract the max for numerical stability.
    sub_max = scores - np.max(scores)
    log_sum_exp = np.log(np.sum(np.exp(sub_max)))
    return sub_max - log_sum_exp

Leaky ReLU addresses the Dying ReLU problem by allowing a small negative slope.
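A minimal sketch of Leaky ReLU, assuming the conventional default slope of 0.01:

```python
def leaky_relu(z: float, alpha: float = 0.01) -> float:
    # f(z) = z for z > 0, alpha * z otherwise
    return z if z > 0 else alpha * z

print(leaky_relu(5.0))    # 5.0
print(leaky_relu(-2.0))   # -0.02
```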
The dot product can be implemented by hand:

def dot_product(x1, x2):
    return sum(x1_ * x2_ for x1_, x2_ in zip(x1, x2))

or computed directly with np.inner.
This problem asks us to optimize a simple fully connected layer (no bias),

$$\hat{y} = X w,$$

using gradient descent. The loss is MSE, whose gradient is

$$\nabla_w \mathrm{MSE} = \frac{2}{m} X^\top (X w - y).$$

Batch gradient descent processes the entire dataset and updates the parameters with the average gradient, where $m$ is the sample size. Stochastic gradient descent updates on a single sample at a time ($m = 1$). Mini-batch gradient descent, the most common variant, processes a small batch at a time, where $m$ is the batch_size.
import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_iterations, batch_size=1, method='batch'):
    m = len(y)
    if method == 'batch':
        batch_size = m
    elif method == 'stochastic':
        batch_size = 1
    # 'mini_batch' uses the batch_size passed in.
    for _ in range(n_iterations):
        for i in range(0, m, batch_size):
            X_batch = X[i: i + batch_size]
            y_batch = y[i: i + batch_size]
            pred = X_batch.dot(weights)
            err = pred - y_batch
            gradient = 2 * X_batch.T.dot(err) / batch_size  # broadcast
            # update
            weights -= learning_rate * gradient
    return weights
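One detail worth noting: the range(0, m, batch_size) slicing pattern handles a final short batch automatically. A standalone illustration:

```python
import numpy as np

X = np.arange(10).reshape(10, 1)
batch_size = 4
# The last slice is shorter (2 rows) but still valid.
sizes = [len(X[i:i + batch_size]) for i in range(0, len(X), batch_size)]
print(sizes)  # [4, 4, 2]
```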
Deep-ML | Implement Gini Impurity Calculation for a Set of Classes
Commonly used to find the best split at a decision-tree node:

$$G = 1 - \sum_{k} p_k^2,$$

where $p_k$ is the probability of class $k$.
from collections import Counter

def gini_impurity(y):
    """
    Calculate Gini Impurity for a list of class labels.

    :param y: List of class labels
    :return: Gini Impurity rounded to three decimal places
    """
    n_labels = len(y)
    probs = [float(count) / n_labels for count in Counter(y).values()]
    return round(1 - sum(p ** 2 for p in probs), 3)

For this problem, mind the input validity checks; otherwise there is nothing further to add.
Deep-ML | Calculate Performance Metrics for a Classification Model
From the definition of the confusion matrix, we have

- accuracy = (TP + TN) / (TP + TN + FP + FN)
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- specificity = TN / (TN + FP)
- negative predictive value = TN / (TN + FN)
- F1 = 2 · precision · recall / (precision + recall)

Note the trade-off between FPs and FNs.
def performance_metrics(actual: list[int], predicted: list[int]) -> tuple:
    TP = sum(1 for a, p in zip(actual, predicted) if a == p == 1)
    TN = sum(1 for a, p in zip(actual, predicted) if a == p == 0)
    FN = sum(actual) - TP       # actual positives not predicted positive
    FP = sum(predicted) - TP    # predicted positives that are actually negative
    accuracy = float(TP + TN) / (TP + TN + FP + FN)
    precision = float(TP) / (TP + FP)
    negativePredictive = float(TN) / (TN + FN)
    recall = float(TP) / (TP + FN)
    specificity = float(TN) / (TN + FP)
    f1 = 2.0 / (1 / precision + 1 / recall)
    confusion_matrix = [[TP, FN], [FP, TN]]
    return confusion_matrix, round(accuracy, 3), round(f1, 3), round(specificity, 3), round(negativePredictive, 3)

import math
def binomial_probability(n, k, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    C = math.comb(n, k)
    probability = C * pow(p, k) * pow(1 - p, n - k)
    return round(probability, 5)

For dimension $i$ and position $pos$, the positional encoding is computed as follows. For even dimensions:

$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right).$$

For odd dimensions:

$$PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right).$$
(Generated by GPT-4o)
import numpy as np

def pos_encoding(position: int, d_model: int):
    # Validate the inputs.
    if position == 0 or d_model <= 0:
        return -1
    # Allocate the positional-encoding array.
    pos_enc = np.zeros((1, position, d_model), dtype=np.float16)
    # Fill in the encodings.
    for pos in range(position):
        for i in range(d_model):
            if i % 2 == 0:
                pos_enc[0, pos, i] = np.sin(pos / (10000 ** (i / d_model)))
            else:
                pos_enc[0, pos, i] = np.cos(pos / (10000 ** ((i - 1) / d_model)))
    return pos_enc

Adam (Kingma & Ba, 2014) is an effective optimizer that uses exponentially weighted moving averages to estimate the momentum (first moment) and second moment of the gradient. Specifically,

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2.$$
Typically $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Next, correct the bias to obtain $\hat{m}_t$ and $\hat{v}_t$:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.$$

Finally, rescale the gradient much as RMSProp does, giving the update term

$$g'_t = \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

and update the parameters:

$$\theta_t = \theta_{t-1} - g'_t.$$

The code below implements the complete Adam step.
import numpy as np

def adam_optimizer(parameter, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using the Adam optimizer.

    Adjusts the learning rate based on the moving averages of the gradient and squared gradient.

    :param parameter: Current parameter value
    :param grad: Current gradient
    :param m: First moment estimate
    :param v: Second moment estimate
    :param t: Current timestep
    :param learning_rate: Learning rate (default=0.001)
    :param beta1: First moment decay rate (default=0.9)
    :param beta2: Second moment decay rate (default=0.999)
    :param epsilon: Small constant for numerical stability (default=1e-8)
    :return: tuple: (updated_parameter, updated_m, updated_v)
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * np.square(grad)
    m_bias_corr = m / (1 - np.power(beta1, t))
    v_bias_corr = v / (1 - np.power(beta2, t))
    step = learning_rate * m_bias_corr / (np.sqrt(v_bias_corr) + epsilon)
    parameter -= step
    return np.round(parameter, 5), np.round(m, 5), np.round(v, 5)
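To sanity-check the update equations, here is a standalone loop (separate from adam_optimizer above) minimizing $f(\theta) = \theta^2$:

```python
import numpy as np

theta, m, v = 5.0, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = 2 * theta                          # gradient of theta^2
    m = beta1 * m + (1 - beta1) * g        # first moment
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # driven toward the minimum at 0
```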
The Softsign activation, $f(x) = \frac{x}{1 + |x|}$, reduces to a one-liner:

def softsign(x: float) -> float:
    return round(x / (1 + abs(x)), 4)