Solution 1.1 finished

Pascal Schindler 2021-11-25 12:10:55 +01:00
parent a4f4b6302c
commit ffa483afd3


@@ -86,7 +86,7 @@
    },
    "outputs": [],
    "source": [
-    "data = np.load(\"iris_data.npz\")\n",
+    "data = np.load(\"/Users/pascalschindler/Desktop/ML_Übungsblätter/Exercise_2/iris_data.npz\")\n",
    "train_samples = data[\"train_features\"]\n",
    "train_labels = data[\"train_labels\"]\n",
    "test_samples = data[\"test_features\"]\n",
@@ -192,13 +192,16 @@
    "\n",
    "\\begin{align} \n",
    "\\mathcal L_{\\mathrm{cat-NLL}} \n",
-    "=& - \\sum_{i=1}^N \\log p(c_i | \\boldsymbol x_i) \\\\ \n",
-    "=& - \\sum_{i=1}^N \\sum_{k=1}^K h_{i, k} \\log(p_{i,k}) \\\\ \n",
-    "=& - \\sum_{i=1}^N \\sum_{k=1}^K h_{i, k} \\log(\\textrm{softmax}(\\boldsymbol{w}_k^T \\boldsymbol \\phi(\\boldsymbol{x}_i))_k)\\\\ \n",
+    "=& - \\sum_{i=1}^N \\log p(c_i | \\boldsymbol x_i) \\\\\n",
+    "=& - \\sum_{i=1}^N \\sum_{k=1}^K h_{i, k} \\log(p_{i,k}) \\\\\n",
+    "=& - \\sum_{i=1}^N \\sum_{k=1}^K h_{i, k} \\log(\\textrm{softmax}(\\boldsymbol{w}_k^T \\boldsymbol \\phi(\\boldsymbol{x}_i))_k)\\\\\n",
    "=& - \\sum_{i=1}^N \\left(\\sum_{k=1}^K h_{i,k}\\boldsymbol{w}^T_k \\boldsymbol \\phi(\\boldsymbol{x}_i) - \\log \\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right).\n",
    "\\end{align}\n",
    "\n",
    "In order to use gradient-based optimization for this, we of course also need to derive the gradient.\n",
+    "Each summand is the loss for a single sample; the total loss sums these per-sample losses over all $N$ samples.\n",
+    "\n",
+    "\n",
    "\n",
    "### 1.1) Derivation (4 Points)\n",
    "Derive the gradient $\\dfrac{\\partial \\mathcal L_{\\mathrm{cat-NLL}}}{\\partial \\boldsymbol{w}}$ of the loss function w.r.t. the full weight vector $\\boldsymbol w \\equiv \\begin{pmatrix} \\boldsymbol w_1^T & \\dots & \\boldsymbol w_K^T \\end{pmatrix}^T$, which is obtained by stacking the class-specific weight vectors $\\boldsymbol w_k$.\n",
@@ -206,9 +209,73 @@
    "**Hint 1:** Follow the steps in the derivation of the gradient of the loss for the binary classification in the lecture.\n",
    "\n",
    "**Hint 2:** Derive the gradient not for the whole vector $\\boldsymbol w$ but only for $\\boldsymbol w_k$, i.e., $\\dfrac{\\partial \\mathcal L_{\\mathrm{cat-NLL}}}{\\partial \\boldsymbol{w}_k}$. The gradients for the individual\n",
-    "$\\boldsymbol w_k$ can then be stacked to obtain the full gradient.\n"
+    "$\\boldsymbol w_k$ can then be stacked to obtain the full gradient."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "\\begin{align}\n",
+    "\\mathcal L_{\\mathrm{cat-NLL}}\n",
+    "=& - \\sum_{i=1}^N \\left(\\sum_{k=1}^K h_{i,k}\\boldsymbol{w}^T_k \\boldsymbol \\phi(\\boldsymbol{x}_i) - \\log \\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right) \\\\\n",
+    "=& - \\sum_{i=1}^N \\sum_{k=1}^K h_{i,k}\\boldsymbol{w}^T_k \\boldsymbol \\phi(\\boldsymbol{x}_i) + \\sum_{i=1}^N \\sum_{k=1}^K h_{i,k} \\log \\left(\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right)\n",
+    "\\end{align}\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Taking the gradient $\\nabla_{\\boldsymbol{w}_k}\\mathcal L_{\\mathrm{cat-NLL}}$ with respect to one particular weight vector $\\boldsymbol{w}_k$, only the summand for class $k$ survives in the first term. In the second term, $\\sum_{k=1}^K h_{i,k} \\log \\left(\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right) = \\log \\left(\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right)$, since $\\sum_{k=1}^K h_{i,k} = 1$.\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "\\begin{align}\n",
+    "\\dfrac{\\partial \\mathcal L_{\\mathrm{cat-NLL}}}{\\partial \\boldsymbol{w}_k}\n",
+    "=& \\dfrac{\\partial}{\\partial \\boldsymbol{w}_k} \\left(- \\sum_{i=1}^N h_{i,k}\\boldsymbol{w}^T_k \\boldsymbol \\phi(\\boldsymbol{x}_i) + \\sum_{i=1}^N \\log \\left(\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))\\right)\\right) \\\\\n",
+    "=& - \\sum_{i=1}^N h_{i,k}\\boldsymbol \\phi(\\boldsymbol{x}_i) + \\sum_{i=1}^N \\frac{\\exp(\\boldsymbol{w}_k^T \\boldsymbol \\phi(\\boldsymbol{x}_i))}{\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))} \\boldsymbol \\phi(\\boldsymbol{x}_i) \\\\\n",
+    "=& \\sum_{i=1}^N \\left( \\frac{\\exp(\\boldsymbol{w}_k^T \\boldsymbol \\phi(\\boldsymbol{x}_i))}{\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))} - h_{i,k} \\right) \\boldsymbol \\phi(\\boldsymbol{x}_i)\n",
+    "\\end{align}"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "This is the gradient for one specific weight vector, i.e., one class. Computing it for all $K$ weight vectors and stacking the results gives the full gradient:\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "\\begin{align}\n",
+    "\\dfrac{\\partial \\mathcal L_{\\mathrm{cat-NLL}}}{\\partial \\boldsymbol{w}}\n",
+    "=& \\begin{pmatrix}\n",
+    "\\sum_{i=1}^N \\left( \\frac{\\exp(\\boldsymbol{w}_1^T \\boldsymbol \\phi(\\boldsymbol{x}_i))}{\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))} - h_{i,1} \\right) \\boldsymbol \\phi(\\boldsymbol{x}_i) \\\\\n",
+    "\\vdots \\\\\n",
+    "\\sum_{i=1}^N \\left( \\frac{\\exp(\\boldsymbol{w}_K^T \\boldsymbol \\phi(\\boldsymbol{x}_i))}{\\sum_{j=1}^K \\exp(\\boldsymbol{w}_j^T \\boldsymbol \\phi(\\boldsymbol{x}_i))} - h_{i,K} \\right) \\boldsymbol \\phi(\\boldsymbol{x}_i)\n",
+    "\\end{pmatrix}\n",
+    "\\end{align}"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
   {
    "cell_type": "markdown",
    "metadata": {},