{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/python_with_Birds.gif)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# แนะนำ Pandas (Introduction to Pandas)\n",
    "**20** minutes\n",
    "\n",
    "<font size=\"4\"> **วัตถุประสงค์**\n",
    "</font>\n",
    "\n",
    "<B> หลังจากทำทำแล็บ นศ.จะสามารถ </B>\n",
    "\n",
    "* ใช้งาน Pandas เข้าถึงและแสดงข้อมูลได้\n",
    "\n",
    "Ref: \n",
    "* http://pandas.pydata.org/pandas-docs/stable/getting_started/index.html\n",
    "* https://www.kaggle.com/leonardodata/analysis-from-2010-2019-spotify"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "ตอนนี้ในมือของคุณมีชุดข้อมูล (Dataset) เพลง เป็นข้อมูลเพลงฮิตในแต่ละปีตั้งแต่ปี 2010 ถึงปี 2019 โดยข้อมูลอยู่ในรูปแบบของตาราง แต่ละแถว (row) ในตารางเป็นข้อมูลของเพลงแต่ละเพลง และแต่ละคอลัมน์ (column) มีข้อมูลดังต่อไปนี้\n",
    "\n",
    "* **artist** - ชื่อศิลปิน (นักร้อง วงดนตรี)\n",
    "* Top genre คือแนวเพลงหรือประเภทของแนวดนตรี\n",
    "* Year คือปีของเพลงที่อยู่ในบิลบอร์ด (Billboard)\n",
    "* Bpm (Tempo) คือจังหวะเร็วช้าขนาดไหน หน่วยเป็น bpm (Beats.Per.Minute)\n",
    "* Nrgy (Energy) คือพลังหรือความพุ่งของเพลง\n",
    "* Dnce (Danceability) คือความดิ้น หรือความน่าเต้นของเพลง\n",
    "* dB (Loudness) คือระดับความดังในหน่วย decibel \n",
    "* Live (Liveness) คือระดับของการแสดงสด (Live) ของเพลง \n",
    "* Val (Valence) คือระดับความ positive ในเพลง (ค่าต่ำ: เศร้า-หดหู่-โกรธ,คาสูง: มีความสุข-ให้กำลังใจ)\n",
    "* Dur (Length) - คือความยาว (ระยะเวลา) ของเพลง/ดนตรี\n",
    "* Acous (Acousticness) คือความ “acoustic” ของเพลง\n",
    "* Spch (Speechiness) คือปริมาณของคำพูดหรือเนื้อร้อง\n",
    "* Popularity คือระดับความนิยม ยิ่งค่าสูงเท่าไหร่ ยิ่งเป็นที่นิยม\n",
    "\n",
    "<br>\n",
    "\n",
    "ชุดข้อมูลที่ได้รับอยู่ในรูปตาราง (Format CSV)\n",
    "\n",
    "(Source: spotifycharts.com, kaggle.com)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|#|title                                                                                               |artist                  |top genre                |year|bpm|nrgy|dnce|dB |live|val|dur|acous|spch|pop|\n",
    "|------|----------------------------------------------------------------------------------------------------|------------------------|-------------------------|----|---|----|----|---|----|---|---|-----|----|---|\n",
    "|1     |Memories                                                                                            |Maroon 5                |pop                      |2019|91 |32  |76  |-7 |8   |57 |189|84   |5   |99 |\n",
    "|2     |Lose You To Love Me                                                                                 |Selena Gomez            |dance pop                |2019|102|34  |51  |-9 |21  |9  |206|58   |4   |97 |\n",
    "|3     |Someone You Loved                                                                                   |Lewis Capaldi           |pop                      |2019|110|41  |50  |-6 |11  |45 |182|75   |3   |96 |\n",
    "|4     |Senorita                                                                                            |Shawn Mendes            |canadian pop             |2019|117|54  |76  |-6 |9   |75 |191|4    |3   |95 |\n",
    "|5     |How Do You Sleep?                                                                                   |Sam Smith               |pop                      |2019|111|68  |48  |-5 |8   |35 |202|15   |9   |93 |\n",
    "|6     |South of the Border (feat. Camila Cabello & Cardi B)                                                |Ed Sheeran              |pop                      |2019|98 |62  |86  |-6 |9   |67 |204|15   |8   |92 |\n",
    "|7     |Trampoline (with ZAYN)                                                                              |SHAED                   |electropop               |2019|127|46  |62  |-6 |14  |50 |184|56   |3   |92 |\n",
    "|8     |Happier                                                                                             |Marshmello              |brostep                  |2019|100|79  |69  |-3 |17  |67 |214|19   |5   |90 |\n",
    "|9     |Truth Hurts                                                                                         |Lizzo                   |escape room              |2019|158|62  |72  |-3 |12  |41 |173|11   |11  |90 |\n",
    "|10    |Good as Hell (feat. Ariana Grande) - Remix                                                          |Lizzo                   |escape room              |2019|96 |89  |67  |-3 |74  |48 |159|30   |6   |90 |\n",
    "|11    |Underneath the Tree                                                                                 |Kelly Clarkson          |dance pop                |2013|160|81  |51  |-5 |21  |69 |230|0    |5   |88 |\n",
    "|12    |Higher Love                                                                                         |Kygo                    |edm                      |2019|104|68  |69  |-7 |10  |40 |228|2    |3   |88 |\n",
    "|13    |Shape of You                                                                                        |Ed Sheeran              |pop                      |2017|96 |65  |83  |-3 |9   |93 |234|58   |8   |87 |\n",
    "|14    |Only Human                                                                                          |Jonas Brothers          |boy band                 |2019|94 |50  |80  |-6 |6   |87 |183|11   |7   |87 |\n",
    "|15    |All of Me                                                                                           |John Legend             |neo mellow               |2014|120|26  |42  |-7 |13  |33 |270|92   |3   |86 |\n",
    "|16    |Closer                                                                                              |The Chainsmokers        |electropop               |2017|95 |52  |75  |-6 |11  |66 |245|41   |3   |86 |\n",
    "|17    |One Kiss (with Dua Lipa)                                                                            |Calvin Harris           |dance pop                |2018|124|86  |79  |-3 |8   |59 |215|4    |11  |86 |\n",
    "|18    |Beautiful People (feat. Khalid)                                                                     |Ed Sheeran              |pop                      |2019|93 |65  |64  |-8 |8   |55 |198|12   |19  |86 |\n",
    "|19    |Sucker                                                                                              |Jonas Brothers          |boy band                 |2019|138|73  |84  |-5 |11  |95 |181|4    |6   |86 |\n",
    "|20    |Don't Call Me Up                                                                                    |Mabel                   |dance pop                |2019|99 |88  |67  |-3 |8   |23 |178|30   |15  |86 |\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Pandas คือ?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Pandas](https://pypi.org/project/pandas/)<sup>*</sup> เป็นไลบรารี่ที่ได้รับความนิยมอย่างมากอันหนึ่งของภาษาไพทอน ใช้ในการจัดการข้อมูล (Data wrangling/ Data cleaning) และการวิเคราะห์ข้อมูล (Data analysis) สามารถจัดการกับข้อมูลที่มีขนาดใหญ่ๆ ได้โดยที่ไม่มีปัญหา (ในขณะที่โปรแกรม Spreadsheets อย่าง Excel มีปัญหาเรื่องการประมวลผลช้าและไม่เสถียร)\n",
    "\n",
    "หากทำงานเกี่ยวข้องกับ Data (เช่น Data Scientist) Pandas จะเป็นตัวเลือกหลัก เพราะ Pandas สามารถเชื่อมต่อการแหล่งข้อมูลได้หลากหลาย (เช่น CSV file, Text file, Microsoft Excel, Clipboard, HDFS format, Google BIg Query, Oracle database, URL, HTML, JSON file)\n",
    "หลังจากนั้นก็สามารถจัดเตรียมข้อมูล (Data preparation) ทำความสะอาดข้อมูล (Data cleansing/Data scrubbing) และจัดรูปแบบให้พร้อมกับการนำไปวิเคราะห์ (Data analytics) จนถึงแสดงผล (Data visualization) เบื้องต้นก็ได้\n",
    "\n",
    "\n",
    "เนื่องจาก Pandas ไม่ใช่โมดูลมาตรฐานของไพทอน ดังนั้น ก่อนจะใช้งานจำเป็นต้องติดตั้งลงเครื่องก่อน (pip install หรือ conda install)\n",
    "\n",
    "เราจะเริ่มต้นเขียนโปรแกรมโดยการ import เข้ามา ณ ที่นี้จะตั้งชื่อว่า pd เพื่อให้เวลาเรียกใช้งานฟังก์ชันจะได้เขียนง่ายๆ เช่น <code>pd.read_csv()</code> \n",
    "\n",
    "(<sup>*</sup>Pandas มาจาก Panel data ซึ่งหมายถึงข้อมูลที่มีหลายมิติ)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## การอ่านข้อมูลจากไฟล์ในรูปแบบ DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import ไลบรารี่ที่จะใช้งาน"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### เช็ค Version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pandas version 1.5.1\n"
     ]
    }
   ],
   "source": [
    "print (\"Pandas version\",pd.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**ดาวน์โหลดชุดข้อมูลไฟล์ CSV เข้าสู่โปรแกรมในรูปแบบ DataFrame**\n",
    "\n",
    "_The top songs BY YEAR in the world by spotify. This dataset has several variables about the songs and is based on Billboard._\n",
    "\n",
    "(ชุดข้อมูลข้างต้นสามารถดาวโหลดได้จาก spotifycharts.com หรือ [kaggle.com](https://www.kaggle.com/leonardopena/top-spotify-songs-from-20102019-by-year) โดยข้อมูลดิบจะเรียงตามปี (Year) ในรูปของไฟล์ CSV (Comma-Separated Values) )\n",
    "\n",
    "\n",
    "ก่อนอื่น เราจะโหลดไฟล์ข้อมูล csv ที่อยู่ในเครื่องของเราเข้าสู่ไพทอนในรูปแบบ DataFrame (DataFrame คือ ตารางข้อมูลที่ถูกแบ่งเป็นแถวและคอลัมน์เหมือน Excel) โดยใช้ฟังก์ชัน <code>read_csv()</code>\n",
    "\n",
    "ตัวแปร <code>csv_path</code> เก็บที่อยู่ของไฟล์ (path) <b>.csv</b> ส่งเป็นอาร์กิวเมนต์ให้กับฟังก์ชั่น <code>read_csv</code>  ผลลัพธ์จะถูกเก็บในออบเจ็กต์ชื่อ  <code>df</code> ซึ่งเป็นชื่อย่อของตัวแปรที่นิยมใช้เก็บ DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Read data from CSV file\n",
    "csv_path = 'docs/Top_Spotify_songs_from_2010_2019.csv'\n",
    "df = pd.read_csv(csv_path, encoding='ISO-8859-1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "หมายเหตุ: ASCII: 7 bit, ISO-8859-1: 8 bits, UTF-8: 8-32 bits"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "เราสามารถแสดงตัวอย่างข้อมูล (Preview) 5 แถวบนสุดและ 5 แถวล่างสุดของ Dataframe ได้โดยใช้ฟังก์ชัน ใช้เมธอด <code>head()</code> และ <code>tail()</code> ตามลำดับ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Hey, Soul Sister</td>\n",
       "      <td>Train</td>\n",
       "      <td>neo mellow</td>\n",
       "      <td>2010</td>\n",
       "      <td>97</td>\n",
       "      <td>89</td>\n",
       "      <td>67</td>\n",
       "      <td>-4</td>\n",
       "      <td>8</td>\n",
       "      <td>80</td>\n",
       "      <td>217</td>\n",
       "      <td>19</td>\n",
       "      <td>4</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Love The Way You Lie</td>\n",
       "      <td>Eminem</td>\n",
       "      <td>detroit hip hop</td>\n",
       "      <td>2010</td>\n",
       "      <td>87</td>\n",
       "      <td>93</td>\n",
       "      <td>75</td>\n",
       "      <td>-5</td>\n",
       "      <td>52</td>\n",
       "      <td>64</td>\n",
       "      <td>263</td>\n",
       "      <td>24</td>\n",
       "      <td>23</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>TiK ToK</td>\n",
       "      <td>Kesha</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>120</td>\n",
       "      <td>84</td>\n",
       "      <td>76</td>\n",
       "      <td>-3</td>\n",
       "      <td>29</td>\n",
       "      <td>71</td>\n",
       "      <td>200</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Bad Romance</td>\n",
       "      <td>Lady Gaga</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>119</td>\n",
       "      <td>92</td>\n",
       "      <td>70</td>\n",
       "      <td>-4</td>\n",
       "      <td>8</td>\n",
       "      <td>71</td>\n",
       "      <td>295</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Just the Way You Are</td>\n",
       "      <td>Bruno Mars</td>\n",
       "      <td>pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>109</td>\n",
       "      <td>84</td>\n",
       "      <td>64</td>\n",
       "      <td>-5</td>\n",
       "      <td>9</td>\n",
       "      <td>43</td>\n",
       "      <td>221</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0                 title      artist        top genre  year  bpm  \\\n",
       "0           1      Hey, Soul Sister       Train       neo mellow  2010   97   \n",
       "1           2  Love The Way You Lie      Eminem  detroit hip hop  2010   87   \n",
       "2           3               TiK ToK       Kesha        dance pop  2010  120   \n",
       "3           4           Bad Romance   Lady Gaga        dance pop  2010  119   \n",
       "4           5  Just the Way You Are  Bruno Mars              pop  2010  109   \n",
       "\n",
       "   nrgy  dnce  dB  live  val  dur  acous  spch  pop  \n",
       "0    89    67  -4     8   80  217     19     4   83  \n",
       "1    93    75  -5    52   64  263     24    23   82  \n",
       "2    84    76  -3    29   71  200     10    14   80  \n",
       "3    92    70  -4     8   71  295      0     4   79  \n",
       "4    84    64  -5     9   43  221      2     4   78  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print first five rows of the dataframe\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "จะกำหนดจำนวนแถวที่ต้องการแสดงผลก็ทำได้ เช่น"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Hey, Soul Sister</td>\n",
       "      <td>Train</td>\n",
       "      <td>neo mellow</td>\n",
       "      <td>2010</td>\n",
       "      <td>97</td>\n",
       "      <td>89</td>\n",
       "      <td>67</td>\n",
       "      <td>-4</td>\n",
       "      <td>8</td>\n",
       "      <td>80</td>\n",
       "      <td>217</td>\n",
       "      <td>19</td>\n",
       "      <td>4</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Love The Way You Lie</td>\n",
       "      <td>Eminem</td>\n",
       "      <td>detroit hip hop</td>\n",
       "      <td>2010</td>\n",
       "      <td>87</td>\n",
       "      <td>93</td>\n",
       "      <td>75</td>\n",
       "      <td>-5</td>\n",
       "      <td>52</td>\n",
       "      <td>64</td>\n",
       "      <td>263</td>\n",
       "      <td>24</td>\n",
       "      <td>23</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0                 title  artist        top genre  year  bpm  nrgy  \\\n",
       "0           1      Hey, Soul Sister   Train       neo mellow  2010   97    89   \n",
       "1           2  Love The Way You Lie  Eminem  detroit hip hop  2010   87    93   \n",
       "\n",
       "   dnce  dB  live  val  dur  acous  spch  pop  \n",
       "0    67  -4     8   80  217     19     4   83  \n",
       "1    75  -5    52   64  263     24    23   82  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "ในกรณีที่ข้อมูลเป็นไฟล์ excel (.xlsx, .xls) Pandas จะใช้ไลบรารี [xlrd](https://pypi.org/project/xlrd/) โดยเรียกใช้ใช้ฟังก์ชั่น <code>read_excel()</code> ซึ่งผลลัพท์ที่ได้จะเป็น Dataframe เช่นเดียวกัน"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>601</th>\n",
       "      <td>602</td>\n",
       "      <td>Nothing Breaks Like a Heart (feat. Miley Cyrus)</td>\n",
       "      <td>Mark Ronson</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2019</td>\n",
       "      <td>114</td>\n",
       "      <td>79</td>\n",
       "      <td>60</td>\n",
       "      <td>-6</td>\n",
       "      <td>42</td>\n",
       "      <td>24</td>\n",
       "      <td>217</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>602</th>\n",
       "      <td>603</td>\n",
       "      <td>Kills You Slowly</td>\n",
       "      <td>The Chainsmokers</td>\n",
       "      <td>electropop</td>\n",
       "      <td>2019</td>\n",
       "      <td>150</td>\n",
       "      <td>44</td>\n",
       "      <td>70</td>\n",
       "      <td>-9</td>\n",
       "      <td>13</td>\n",
       "      <td>23</td>\n",
       "      <td>213</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Unnamed: 0                                            title  \\\n",
       "601         602  Nothing Breaks Like a Heart (feat. Miley Cyrus)   \n",
       "602         603                                 Kills You Slowly   \n",
       "\n",
       "               artist   top genre  year  bpm  nrgy  dnce  dB  live  val  dur  \\\n",
       "601       Mark Ronson   dance pop  2019  114    79    60  -6    42   24  217   \n",
       "602  The Chainsmokers  electropop  2019  150    44    70  -9    13   23  213   \n",
       "\n",
       "     acous  spch  pop  \n",
       "601      1     7   69  \n",
       "602      6     6   67  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read data from Excel File and print the first five rows\n",
    "\n",
    "xlsx_path = 'docs/Top_Spotify_songs_from_2010_2019.xlsx'\n",
    "\n",
    "df = pd.read_excel(xlsx_path)\n",
    "df.tail(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "ในกรณีที่เกิด Error “XLRDError: Excel xlsx file; not supported” ให้ทำการติดตั้ง xlrd โดยใช้คำสั่งต่อไปนี้ แล้วลองโหลดใหม่ดูอีกที"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: xlrd in /Users/tube.sc/.pyenv/versions/3.8.13/lib/python3.8/site-packages (2.0.1)\n"
     ]
    }
   ],
   "source": [
    "# Dependency needed to install file \n",
    "# Pandas.read_excel a.) uses a library called xlrd internally.\n",
    "\n",
    "!pip install xlrd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## ตรวจสอบชุดข้อมูล"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**เช็คจำนวนแถวและจำนวนคอลัมน์**\n",
    "ใน Pandas มีวิธีเช็คจำนวนแถวและจำนวนคอลัมน์ได้ง่ายๆ โดยใช้ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(603, 15)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ชุดข้อมูลมี 603 rows และ 15 columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**ดูค่าสถิติเบื้องต้นของข้อมูลแต่ละ column ใน DataFrame** เช่น mean (ค่าเฉลี่ย), median (ค่ากลาง), standard deviation (std), min (ค่าต่ำสุด), max (ค่าสูงสุด) ฯลฯ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "      <td>603.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>302.000000</td>\n",
       "      <td>2014.592040</td>\n",
       "      <td>118.545605</td>\n",
       "      <td>70.504146</td>\n",
       "      <td>64.379768</td>\n",
       "      <td>-5.578773</td>\n",
       "      <td>17.774461</td>\n",
       "      <td>52.225539</td>\n",
       "      <td>224.674959</td>\n",
       "      <td>14.326700</td>\n",
       "      <td>8.358209</td>\n",
       "      <td>66.520730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>174.215384</td>\n",
       "      <td>2.607057</td>\n",
       "      <td>24.795358</td>\n",
       "      <td>16.310664</td>\n",
       "      <td>13.378718</td>\n",
       "      <td>2.798020</td>\n",
       "      <td>13.102543</td>\n",
       "      <td>22.513020</td>\n",
       "      <td>34.130059</td>\n",
       "      <td>20.766165</td>\n",
       "      <td>7.483162</td>\n",
       "      <td>14.517746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>2010.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>-60.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>134.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>151.500000</td>\n",
       "      <td>2013.000000</td>\n",
       "      <td>100.000000</td>\n",
       "      <td>61.000000</td>\n",
       "      <td>57.000000</td>\n",
       "      <td>-6.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>35.000000</td>\n",
       "      <td>202.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>60.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>302.000000</td>\n",
       "      <td>2015.000000</td>\n",
       "      <td>120.000000</td>\n",
       "      <td>74.000000</td>\n",
       "      <td>66.000000</td>\n",
       "      <td>-5.000000</td>\n",
       "      <td>12.000000</td>\n",
       "      <td>52.000000</td>\n",
       "      <td>221.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>69.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>452.500000</td>\n",
       "      <td>2017.000000</td>\n",
       "      <td>129.000000</td>\n",
       "      <td>82.000000</td>\n",
       "      <td>73.000000</td>\n",
       "      <td>-4.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>69.000000</td>\n",
       "      <td>239.500000</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>76.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>603.000000</td>\n",
       "      <td>2019.000000</td>\n",
       "      <td>206.000000</td>\n",
       "      <td>98.000000</td>\n",
       "      <td>97.000000</td>\n",
       "      <td>-2.000000</td>\n",
       "      <td>74.000000</td>\n",
       "      <td>98.000000</td>\n",
       "      <td>424.000000</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>48.000000</td>\n",
       "      <td>99.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Unnamed: 0         year         bpm        nrgy        dnce  \\\n",
       "count  603.000000   603.000000  603.000000  603.000000  603.000000   \n",
       "mean   302.000000  2014.592040  118.545605   70.504146   64.379768   \n",
       "std    174.215384     2.607057   24.795358   16.310664   13.378718   \n",
       "min      1.000000  2010.000000    0.000000    0.000000    0.000000   \n",
       "25%    151.500000  2013.000000  100.000000   61.000000   57.000000   \n",
       "50%    302.000000  2015.000000  120.000000   74.000000   66.000000   \n",
       "75%    452.500000  2017.000000  129.000000   82.000000   73.000000   \n",
       "max    603.000000  2019.000000  206.000000   98.000000   97.000000   \n",
       "\n",
       "               dB        live         val         dur       acous        spch  \\\n",
       "count  603.000000  603.000000  603.000000  603.000000  603.000000  603.000000   \n",
       "mean    -5.578773   17.774461   52.225539  224.674959   14.326700    8.358209   \n",
       "std      2.798020   13.102543   22.513020   34.130059   20.766165    7.483162   \n",
       "min    -60.000000    0.000000    0.000000  134.000000    0.000000    0.000000   \n",
       "25%     -6.000000    9.000000   35.000000  202.000000    2.000000    4.000000   \n",
       "50%     -5.000000   12.000000   52.000000  221.000000    6.000000    5.000000   \n",
       "75%     -4.000000   24.000000   69.000000  239.500000   17.000000    9.000000   \n",
       "max     -2.000000   74.000000   98.000000  424.000000   99.000000   48.000000   \n",
       "\n",
       "              pop  \n",
       "count  603.000000  \n",
       "mean    66.520730  \n",
       "std     14.517746  \n",
       "min      0.000000  \n",
       "25%     60.000000  \n",
       "50%     69.000000  \n",
       "75%     76.000000  \n",
       "max     99.000000  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "หรือจะดูเฉพาะ mean (ค่าเฉลี่ย), median (ค่ากลาง), standard deviation (std), min (ค่าต่ำสุด), max (ค่าสูงสุด) ก็สามารถทำได้\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/3698961737.py:1: FutureWarning: The default value of numeric_only in DataFrame.mean is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
      "  df.mean()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Unnamed: 0     302.000000\n",
       "year          2014.592040\n",
       "bpm            118.545605\n",
       "nrgy            70.504146\n",
       "dnce            64.379768\n",
       "dB              -5.578773\n",
       "live            17.774461\n",
       "val             52.225539\n",
       "dur            224.674959\n",
       "acous           14.326700\n",
       "spch             8.358209\n",
       "pop             66.520730\n",
       "dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/530051474.py:1: FutureWarning: The default value of numeric_only in DataFrame.median is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
      "  df.median()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Unnamed: 0     302.0\n",
       "year          2015.0\n",
       "bpm            120.0\n",
       "nrgy            74.0\n",
       "dnce            66.0\n",
       "dB              -5.0\n",
       "live            12.0\n",
       "val             52.0\n",
       "dur            221.0\n",
       "acous            6.0\n",
       "spch             5.0\n",
       "pop             69.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/3390915376.py:1: FutureWarning: The default value of numeric_only in DataFrame.std is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
      "  df.std()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Unnamed: 0    174.215384\n",
       "year            2.607057\n",
       "bpm            24.795358\n",
       "nrgy           16.310664\n",
       "dnce           13.378718\n",
       "dB              2.798020\n",
       "live           13.102543\n",
       "val            22.513020\n",
       "dur            34.130059\n",
       "acous          20.766165\n",
       "spch            7.483162\n",
       "pop            14.517746\n",
       "dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/3962516015.py:1: FutureWarning: The default value of numeric_only in DataFrame.min is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
      "  df.min()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Unnamed: 0               1\n",
       "artist               3OH!3\n",
       "top genre     acoustic pop\n",
       "year                  2010\n",
       "bpm                      0\n",
       "nrgy                     0\n",
       "dnce                     0\n",
       "dB                     -60\n",
       "live                     0\n",
       "val                      0\n",
       "dur                    134\n",
       "acous                    0\n",
       "spch                     0\n",
       "pop                      0\n",
       "dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/1151452817.py:1: FutureWarning: The default value of numeric_only in DataFrame.max is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.\n",
      "  df.max()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Unnamed: 0               603\n",
       "artist             will.i.am\n",
       "top genre     tropical house\n",
       "year                    2019\n",
       "bpm                      206\n",
       "nrgy                      98\n",
       "dnce                      97\n",
       "dB                        -2\n",
       "live                      74\n",
       "val                       98\n",
       "dur                      424\n",
       "acous                     99\n",
       "spch                      48\n",
       "pop                       99\n",
       "dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**เช็คความผิดปกติใน DataFrame เบื้องต้น**\n",
    "\n",
    "หากเราใช้ฟังชั่น info() เราก็จะทราบว่าข้อมูลเบื้องต้นของ DataFrame เช่น มีกี่แถว, มีกี่คอลัมน์, Missing value มีมากน้อยเท่าไหร่, แต่ละคอลัมน์เป็น Data Type อะไรบ้าง ซึ่งเป็นการเช็คหาความผิดปกติใน DataFrame เบื้องต้นได้"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 603 entries, 0 to 602\n",
      "Data columns (total 15 columns):\n",
      " #   Column      Non-Null Count  Dtype \n",
      "---  ------      --------------  ----- \n",
      " 0   Unnamed: 0  603 non-null    int64 \n",
      " 1   title       603 non-null    object\n",
      " 2   artist      603 non-null    object\n",
      " 3   top genre   603 non-null    object\n",
      " 4   year        603 non-null    int64 \n",
      " 5   bpm         603 non-null    int64 \n",
      " 6   nrgy        603 non-null    int64 \n",
      " 7   dnce        603 non-null    int64 \n",
      " 8   dB          603 non-null    int64 \n",
      " 9   live        603 non-null    int64 \n",
      " 10  val         603 non-null    int64 \n",
      " 11  dur         603 non-null    int64 \n",
      " 12  acous       603 non-null    int64 \n",
      " 13  spch        603 non-null    int64 \n",
      " 14  pop         603 non-null    int64 \n",
      "dtypes: int64(12), object(3)\n",
      "memory usage: 70.8+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ชุดข้อมูลนี้ไม่มีช่องใดที่มีค่าเป็น NaN (ไม่มีข้อมูล)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Unnamed: 0', 'title', 'artist', 'top genre', 'year', 'bpm', 'nrgy',\n",
       "       'dnce', 'dB', 'live', 'val', 'dur', 'acous', 'spch', 'pop'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Unnamed: 0     int64\n",
       "title         object\n",
       "artist        object\n",
       "top genre     object\n",
       "year           int64\n",
       "bpm            int64\n",
       "nrgy           int64\n",
       "dnce           int64\n",
       "dB             int64\n",
       "live           int64\n",
       "val            int64\n",
       "dur            int64\n",
       "acous          int64\n",
       "spch           int64\n",
       "pop            int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**แสดง index ของ DataFrame**\n",
    "ตอนสร้าง DataFrame ถ้าไม่ได้กำหนด index จะเป็นเลขลำดับ 0, 1, 2, 3, … โดยอัตโนมัติ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=603, step=1)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "✄ ---------- เพิ่มเติม ----------- ✄ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Unnamed: 0    0\n",
       "title         0\n",
       "artist        0\n",
       "top genre     0\n",
       "year          0\n",
       "bpm           0\n",
       "nrgy          0\n",
       "dnce          0\n",
       "dB            0\n",
       "live          0\n",
       "val           0\n",
       "dur           0\n",
       "acous         0\n",
       "spch          0\n",
       "pop           0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ******************************************************************\n",
    "df.isnull().sum() # ตรวจสอบค่า null values ว่าในแต่ละคอลัมน์มีอยู่กี่ค่า"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Unnamed: 0    603\n",
       "title         603\n",
       "artist        603\n",
       "top genre     603\n",
       "year          603\n",
       "bpm           603\n",
       "nrgy          603\n",
       "dnce          603\n",
       "dB            603\n",
       "live          603\n",
       "val           603\n",
       "dur           603\n",
       "acous         603\n",
       "spch          603\n",
       "pop           603\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.count() # นับจำนวนข้อมูลในแต่ละคอลัมน์"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "#******************************************************************"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ตรวจสอบจำนวนเพลงที่...\n",
    "\n",
    "ชื่อเพลงไม่ซ้ำกัน 584 เพลง\n",
    "\n",
    "ชื่อศิลปินไม่ซ้ำกัน 184 เพลง"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Unnamed: 0    603\n",
       "title         584\n",
       "artist        184\n",
       "top genre      50\n",
       "year           10\n",
       "bpm           104\n",
       "nrgy           77\n",
       "dnce           70\n",
       "dB             14\n",
       "live           61\n",
       "val            94\n",
       "dur           144\n",
       "acous          75\n",
       "spch           39\n",
       "pop            71\n",
       "dtype: int64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ดูค่า nunique ของแต่ละคอลัมน์ (Count number of distinct elements in specified axis.\n",
    "\n",
    "df.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ✄ ****************************************************************** ✄\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## การเข้าถึงข้อมูล (เบื้องต้น)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**การเข้าถึง Column**\n",
    "\n",
    "เราสามารถดูข้อมูลหรือเข้าถึง column ที่ต้องได้ โดยการใส่ชื่อคอลัมน์เป็น string ในวงเล็บเหลี่ยม [] (วงเล็บเดี่ยว) หรือใช้ชื่อคอลัมน์เป็นชื่อ attribute ก็ได้\n",
    "\n",
    "Pandas มีออบเจ็กต์พิเศษ 2 ชนิดสำหรับใช้บรรจุข้อมูล ซีรีส์ (Pandas series) และ เดตาเฟรม (Pandas dataFrame)) ดังแสดงในรูป"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/Anatomy_of_Pandas_Data_Structures.png)\n",
    "[src] (https://towardsdatascience.com/how-to-master-pandas-for-data-science-b8ab0a9b1042)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      2010\n",
       "1      2010\n",
       "2      2010\n",
       "3      2010\n",
       "4      2010\n",
       "       ... \n",
       "598    2019\n",
       "599    2019\n",
       "600    2019\n",
       "601    2019\n",
       "602    2019\n",
       "Name: year, Length: 603, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access to the column year\n",
    "y = df['year']\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      2010\n",
       "1      2010\n",
       "2      2010\n",
       "3      2010\n",
       "4      2010\n",
       "       ... \n",
       "598    2019\n",
       "599    2019\n",
       "600    2019\n",
       "601    2019\n",
       "602    2019\n",
       "Name: year, Length: 603, dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = df.year\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ผลลัพธ์ที่ได้เป็น 1 column เรียกว่า Panda series (1-D dataframe)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ในกรณีที่ต้องการให้เป็น Dataframe ก็สามารถทำได้โดยใช้ วงเล็บเหลี่ยม [[ ]] (วงเล็บคู่) เช่น column <b>artist</b>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>artist</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Train</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Eminem</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Kesha</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lady Gaga</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Bruno Mars</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>598</th>\n",
       "      <td>Mark Ronson</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>599</th>\n",
       "      <td>Ed Sheeran</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>600</th>\n",
       "      <td>DJ Khaled</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>601</th>\n",
       "      <td>Mark Ronson</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>602</th>\n",
       "      <td>The Chainsmokers</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>603 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               artist\n",
       "0               Train\n",
       "1              Eminem\n",
       "2               Kesha\n",
       "3           Lady Gaga\n",
       "4          Bruno Mars\n",
       "..                ...\n",
       "598       Mark Ronson\n",
       "599        Ed Sheeran\n",
       "600         DJ Khaled\n",
       "601       Mark Ronson\n",
       "602  The Chainsmokers\n",
       "\n",
       "[603 rows x 1 columns]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the column as a dataframe\n",
    "y = df[['artist']]\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "หากต้องการหลายคอลัมน์ก็สามารถทำได้ แค่ใส่ชื่อ dataframe ในกรณีนี้คือ <code>df</code> และชื่อของชื่อของคอลัมน์ที่ต้องการในวงเล็บคู่ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>artist</th>\n",
       "      <th>year</th>\n",
       "      <th>top genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Train</td>\n",
       "      <td>2010</td>\n",
       "      <td>neo mellow</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Eminem</td>\n",
       "      <td>2010</td>\n",
       "      <td>detroit hip hop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Kesha</td>\n",
       "      <td>2010</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lady Gaga</td>\n",
       "      <td>2010</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Bruno Mars</td>\n",
       "      <td>2010</td>\n",
       "      <td>pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>598</th>\n",
       "      <td>Mark Ronson</td>\n",
       "      <td>2019</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>599</th>\n",
       "      <td>Ed Sheeran</td>\n",
       "      <td>2019</td>\n",
       "      <td>pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>600</th>\n",
       "      <td>DJ Khaled</td>\n",
       "      <td>2019</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>601</th>\n",
       "      <td>Mark Ronson</td>\n",
       "      <td>2019</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>602</th>\n",
       "      <td>The Chainsmokers</td>\n",
       "      <td>2019</td>\n",
       "      <td>electropop</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>603 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               artist  year        top genre\n",
       "0               Train  2010       neo mellow\n",
       "1              Eminem  2010  detroit hip hop\n",
       "2               Kesha  2010        dance pop\n",
       "3           Lady Gaga  2010        dance pop\n",
       "4          Bruno Mars  2010              pop\n",
       "..                ...   ...              ...\n",
       "598       Mark Ronson  2019        dance pop\n",
       "599        Ed Sheeran  2019              pop\n",
       "600         DJ Khaled  2019        dance pop\n",
       "601       Mark Ronson  2019        dance pop\n",
       "602  The Chainsmokers  2019       electropop\n",
       "\n",
       "[603 rows x 3 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access to multiple columns\n",
    "y = df[['artist','year','top genre']]\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<img src = \"images/DataEgTwo.png)\n",
    "            \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**จัดเรียงข้อมูล (Sorting): <code>sort_index()</code>, <code>sort_values()</code>**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "- <code>sort_index()</code> เรียงลำดับ row ตามค่าของ index\n",
    "- <code>sort_values()</code> เรียงลำดับ row ตามค่าของ column ที่กำหนด"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>442</th>\n",
       "      <td>443</td>\n",
       "      <td>Million Years Ago</td>\n",
       "      <td>Adele</td>\n",
       "      <td>british soul</td>\n",
       "      <td>2016</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-60</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>227</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>45</td>\n",
       "      <td>You Lost Me</td>\n",
       "      <td>Christina Aguilera</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>43</td>\n",
       "      <td>39</td>\n",
       "      <td>23</td>\n",
       "      <td>-6</td>\n",
       "      <td>14</td>\n",
       "      <td>7</td>\n",
       "      <td>257</td>\n",
       "      <td>85</td>\n",
       "      <td>4</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>88</td>\n",
       "      <td>1+1</td>\n",
       "      <td>BeyoncÈ</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2011</td>\n",
       "      <td>63</td>\n",
       "      <td>38</td>\n",
       "      <td>30</td>\n",
       "      <td>-7</td>\n",
       "      <td>7</td>\n",
       "      <td>26</td>\n",
       "      <td>274</td>\n",
       "      <td>38</td>\n",
       "      <td>3</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Baby</td>\n",
       "      <td>Justin Bieber</td>\n",
       "      <td>canadian pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>65</td>\n",
       "      <td>86</td>\n",
       "      <td>73</td>\n",
       "      <td>-5</td>\n",
       "      <td>11</td>\n",
       "      <td>54</td>\n",
       "      <td>214</td>\n",
       "      <td>4</td>\n",
       "      <td>14</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>460</th>\n",
       "      <td>461</td>\n",
       "      <td>Praying</td>\n",
       "      <td>Kesha</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2017</td>\n",
       "      <td>73</td>\n",
       "      <td>39</td>\n",
       "      <td>58</td>\n",
       "      <td>-7</td>\n",
       "      <td>11</td>\n",
       "      <td>32</td>\n",
       "      <td>230</td>\n",
       "      <td>49</td>\n",
       "      <td>3</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>314</th>\n",
       "      <td>315</td>\n",
       "      <td>I'll Show You</td>\n",
       "      <td>Justin Bieber</td>\n",
       "      <td>canadian pop</td>\n",
       "      <td>2015</td>\n",
       "      <td>192</td>\n",
       "      <td>61</td>\n",
       "      <td>36</td>\n",
       "      <td>-7</td>\n",
       "      <td>18</td>\n",
       "      <td>8</td>\n",
       "      <td>200</td>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>462</th>\n",
       "      <td>463</td>\n",
       "      <td>The Greatest</td>\n",
       "      <td>Sia</td>\n",
       "      <td>australian dance</td>\n",
       "      <td>2017</td>\n",
       "      <td>192</td>\n",
       "      <td>73</td>\n",
       "      <td>67</td>\n",
       "      <td>-6</td>\n",
       "      <td>6</td>\n",
       "      <td>73</td>\n",
       "      <td>210</td>\n",
       "      <td>1</td>\n",
       "      <td>27</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200</th>\n",
       "      <td>201</td>\n",
       "      <td>How Ya Doin'? (feat. Missy Elliott)</td>\n",
       "      <td>Little Mix</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2013</td>\n",
       "      <td>201</td>\n",
       "      <td>95</td>\n",
       "      <td>36</td>\n",
       "      <td>-3</td>\n",
       "      <td>37</td>\n",
       "      <td>51</td>\n",
       "      <td>211</td>\n",
       "      <td>9</td>\n",
       "      <td>48</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>362</th>\n",
       "      <td>363</td>\n",
       "      <td>L.A.LOVE (la la)</td>\n",
       "      <td>Fergie</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2015</td>\n",
       "      <td>202</td>\n",
       "      <td>39</td>\n",
       "      <td>48</td>\n",
       "      <td>-8</td>\n",
       "      <td>26</td>\n",
       "      <td>27</td>\n",
       "      <td>193</td>\n",
       "      <td>2</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>276</th>\n",
       "      <td>277</td>\n",
       "      <td>FourFiveSeconds</td>\n",
       "      <td>Rihanna</td>\n",
       "      <td>barbadian pop</td>\n",
       "      <td>2015</td>\n",
       "      <td>206</td>\n",
       "      <td>27</td>\n",
       "      <td>58</td>\n",
       "      <td>-6</td>\n",
       "      <td>13</td>\n",
       "      <td>35</td>\n",
       "      <td>188</td>\n",
       "      <td>88</td>\n",
       "      <td>5</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>603 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Unnamed: 0                                title              artist  \\\n",
       "442         443                    Million Years Ago               Adele   \n",
       "44           45                          You Lost Me  Christina Aguilera   \n",
       "87           88                                  1+1             BeyoncÈ   \n",
       "5             6                                 Baby       Justin Bieber   \n",
       "460         461                              Praying               Kesha   \n",
       "..          ...                                  ...                 ...   \n",
       "314         315                        I'll Show You       Justin Bieber   \n",
       "462         463                         The Greatest                 Sia   \n",
       "200         201  How Ya Doin'? (feat. Missy Elliott)          Little Mix   \n",
       "362         363                     L.A.LOVE (la la)              Fergie   \n",
       "276         277                      FourFiveSeconds             Rihanna   \n",
       "\n",
       "            top genre  year  bpm  nrgy  dnce  dB  live  val  dur  acous  spch  \\\n",
       "442      british soul  2016    0     0     0 -60     0    0  227      0     0   \n",
       "44          dance pop  2010   43    39    23  -6    14    7  257     85     4   \n",
       "87          dance pop  2011   63    38    30  -7     7   26  274     38     3   \n",
       "5        canadian pop  2010   65    86    73  -5    11   54  214      4    14   \n",
       "460         dance pop  2017   73    39    58  -7    11   32  230     49     3   \n",
       "..                ...   ...  ...   ...   ...  ..   ...  ...  ...    ...   ...   \n",
       "314      canadian pop  2015  192    61    36  -7    18    8  200      5    10   \n",
       "462  australian dance  2017  192    73    67  -6     6   73  210      1    27   \n",
       "200         dance pop  2013  201    95    36  -3    37   51  211      9    48   \n",
       "362         dance pop  2015  202    39    48  -8    26   27  193      2     9   \n",
       "276     barbadian pop  2015  206    27    58  -6    13   35  188     88     5   \n",
       "\n",
       "     pop  \n",
       "442    0  \n",
       "44    56  \n",
       "87    60  \n",
       "5     77  \n",
       "460   77  \n",
       "..   ...  \n",
       "314   68  \n",
       "462   76  \n",
       "200   50  \n",
       "362    0  \n",
       "276   80  \n",
       "\n",
       "[603 rows x 15 columns]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by='bpm')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Hey, Soul Sister</td>\n",
       "      <td>Train</td>\n",
       "      <td>neo mellow</td>\n",
       "      <td>2010</td>\n",
       "      <td>97</td>\n",
       "      <td>89</td>\n",
       "      <td>67</td>\n",
       "      <td>-4</td>\n",
       "      <td>8</td>\n",
       "      <td>80</td>\n",
       "      <td>217</td>\n",
       "      <td>19</td>\n",
       "      <td>4</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Love The Way You Lie</td>\n",
       "      <td>Eminem</td>\n",
       "      <td>detroit hip hop</td>\n",
       "      <td>2010</td>\n",
       "      <td>87</td>\n",
       "      <td>93</td>\n",
       "      <td>75</td>\n",
       "      <td>-5</td>\n",
       "      <td>52</td>\n",
       "      <td>64</td>\n",
       "      <td>263</td>\n",
       "      <td>24</td>\n",
       "      <td>23</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>TiK ToK</td>\n",
       "      <td>Kesha</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>120</td>\n",
       "      <td>84</td>\n",
       "      <td>76</td>\n",
       "      <td>-3</td>\n",
       "      <td>29</td>\n",
       "      <td>71</td>\n",
       "      <td>200</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Bad Romance</td>\n",
       "      <td>Lady Gaga</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>119</td>\n",
       "      <td>92</td>\n",
       "      <td>70</td>\n",
       "      <td>-4</td>\n",
       "      <td>8</td>\n",
       "      <td>71</td>\n",
       "      <td>295</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Just the Way You Are</td>\n",
       "      <td>Bruno Mars</td>\n",
       "      <td>pop</td>\n",
       "      <td>2010</td>\n",
       "      <td>109</td>\n",
       "      <td>84</td>\n",
       "      <td>64</td>\n",
       "      <td>-5</td>\n",
       "      <td>9</td>\n",
       "      <td>43</td>\n",
       "      <td>221</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>598</th>\n",
       "      <td>599</td>\n",
       "      <td>Find U Again (feat. Camila Cabello)</td>\n",
       "      <td>Mark Ronson</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2019</td>\n",
       "      <td>104</td>\n",
       "      <td>66</td>\n",
       "      <td>61</td>\n",
       "      <td>-7</td>\n",
       "      <td>20</td>\n",
       "      <td>16</td>\n",
       "      <td>176</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>599</th>\n",
       "      <td>600</td>\n",
       "      <td>Cross Me (feat. Chance the Rapper &amp; PnB Rock)</td>\n",
       "      <td>Ed Sheeran</td>\n",
       "      <td>pop</td>\n",
       "      <td>2019</td>\n",
       "      <td>95</td>\n",
       "      <td>79</td>\n",
       "      <td>75</td>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "      <td>61</td>\n",
       "      <td>206</td>\n",
       "      <td>21</td>\n",
       "      <td>12</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>600</th>\n",
       "      <td>601</td>\n",
       "      <td>No Brainer (feat. Justin Bieber, Chance the Ra...</td>\n",
       "      <td>DJ Khaled</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2019</td>\n",
       "      <td>136</td>\n",
       "      <td>76</td>\n",
       "      <td>53</td>\n",
       "      <td>-5</td>\n",
       "      <td>9</td>\n",
       "      <td>65</td>\n",
       "      <td>260</td>\n",
       "      <td>7</td>\n",
       "      <td>34</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>601</th>\n",
       "      <td>602</td>\n",
       "      <td>Nothing Breaks Like a Heart (feat. Miley Cyrus)</td>\n",
       "      <td>Mark Ronson</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2019</td>\n",
       "      <td>114</td>\n",
       "      <td>79</td>\n",
       "      <td>60</td>\n",
       "      <td>-6</td>\n",
       "      <td>42</td>\n",
       "      <td>24</td>\n",
       "      <td>217</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>602</th>\n",
       "      <td>603</td>\n",
       "      <td>Kills You Slowly</td>\n",
       "      <td>The Chainsmokers</td>\n",
       "      <td>electropop</td>\n",
       "      <td>2019</td>\n",
       "      <td>150</td>\n",
       "      <td>44</td>\n",
       "      <td>70</td>\n",
       "      <td>-9</td>\n",
       "      <td>13</td>\n",
       "      <td>23</td>\n",
       "      <td>213</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>603 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Unnamed: 0                                              title  \\\n",
       "0             1                                   Hey, Soul Sister   \n",
       "1             2                               Love The Way You Lie   \n",
       "2             3                                            TiK ToK   \n",
       "3             4                                        Bad Romance   \n",
       "4             5                               Just the Way You Are   \n",
       "..          ...                                                ...   \n",
       "598         599                Find U Again (feat. Camila Cabello)   \n",
       "599         600      Cross Me (feat. Chance the Rapper & PnB Rock)   \n",
       "600         601  No Brainer (feat. Justin Bieber, Chance the Ra...   \n",
       "601         602    Nothing Breaks Like a Heart (feat. Miley Cyrus)   \n",
       "602         603                                   Kills You Slowly   \n",
       "\n",
       "               artist        top genre  year  bpm  nrgy  dnce  dB  live  val  \\\n",
       "0               Train       neo mellow  2010   97    89    67  -4     8   80   \n",
       "1              Eminem  detroit hip hop  2010   87    93    75  -5    52   64   \n",
       "2               Kesha        dance pop  2010  120    84    76  -3    29   71   \n",
       "3           Lady Gaga        dance pop  2010  119    92    70  -4     8   71   \n",
       "4          Bruno Mars              pop  2010  109    84    64  -5     9   43   \n",
       "..                ...              ...   ...  ...   ...   ...  ..   ...  ...   \n",
       "598       Mark Ronson        dance pop  2019  104    66    61  -7    20   16   \n",
       "599        Ed Sheeran              pop  2019   95    79    75  -6     7   61   \n",
       "600         DJ Khaled        dance pop  2019  136    76    53  -5     9   65   \n",
       "601       Mark Ronson        dance pop  2019  114    79    60  -6    42   24   \n",
       "602  The Chainsmokers       electropop  2019  150    44    70  -9    13   23   \n",
       "\n",
       "     dur  acous  spch  pop  \n",
       "0    217     19     4   83  \n",
       "1    263     24    23   82  \n",
       "2    200     10    14   80  \n",
       "3    295      0     4   79  \n",
       "4    221      2     4   78  \n",
       "..   ...    ...   ...  ...  \n",
       "598  176      1     3   75  \n",
       "599  206     21    12   75  \n",
       "600  260      7    34   70  \n",
       "601  217      1     7   69  \n",
       "602  213      6     6   67  \n",
       "\n",
       "[603 rows x 15 columns]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "**การดูข้อมูลและการเข้าถึงข้อมูล: <code>iloc()</code>**\n",
    "\n",
    "วิธีหนึ่งในการเข้าถึงข้อมูลแต่ละช่องคือใช้เมธอด <code>iloc()</code> (**i**nteger-**loc**ation based indexing)\n",
    "\n",
    "ยกตัวอย่าง เราสามารถเข้าถึงแถวที่ 1 และคอลัมน์ที่ 2 ได้ดังนี้"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hey, Soul Sister'"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the value on the first row and the first column\n",
    "df.iloc[0, 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ถ้าเป็นแถวที่ 1 และคอลัมน์ที่ 3 เขียนได้ดังนี้"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Train'"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the value on the second row and the first column\n",
    "df.iloc[0,2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "เข้าถึงแถวที่ 2 และคอลัมน์ที่ 2 และแถวที่ 2 และคอลัมน์ที่ 3 ได้ดังนี้"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Love The Way You Lie'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the value on the first row and the third column\n",
    "df.iloc[1,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Eminem'"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the value on the second row and the third column\n",
    "df.iloc[1,2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "แสดงได้ดังรูป"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "![](images/2_loc.png)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "นอกจากจะใช้ index แล้วเรายังสามารถเข้าถึงคอลัมน์โดยใช้ชื่อได้เช่นกันโดยใช้เมธอด <code>loc()</code> (**loc**ation based indexing) (ผลลัพธ์เหมือนกับก่อนหน้านี้)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hey, Soul Sister'"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the column using the name\n",
    "\n",
    "df.loc[0, 'title']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Train'"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the column using the name\n",
    "\n",
    "df.loc[0, 'artist']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Love The Way You Lie'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the column using the name\n",
    "\n",
    "df.loc[1, 'title']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Eminem'"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the column using the name\n",
    "\n",
    "df.loc[1, 'artist']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**การตัด (Slicing)**\n",
    "\n",
    "เราสามารถทำการตัดข้อมูลโดยใช้ทั้งดัชนีและชื่อของคอลัมน์ได้"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Love The Way You Lie</td>\n",
       "      <td>Eminem</td>\n",
       "      <td>detroit hip hop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>TiK ToK</td>\n",
       "      <td>Kesha</td>\n",
       "      <td>dance pop</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  title  artist        top genre\n",
       "1  Love The Way You Lie  Eminem  detroit hip hop\n",
       "2               TiK ToK   Kesha        dance pop"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Slicing the dataframe\n",
    "\n",
    "z = df.iloc[1:3, 1:4]\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/3_loc.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>artist</th>\n",
       "      <th>top genre</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Hey, Soul Sister</td>\n",
       "      <td>Train</td>\n",
       "      <td>neo mellow</td>\n",
       "      <td>2010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Love The Way You Lie</td>\n",
       "      <td>Eminem</td>\n",
       "      <td>detroit hip hop</td>\n",
       "      <td>2010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>TiK ToK</td>\n",
       "      <td>Kesha</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bad Romance</td>\n",
       "      <td>Lady Gaga</td>\n",
       "      <td>dance pop</td>\n",
       "      <td>2010</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  title     artist        top genre  year\n",
       "0      Hey, Soul Sister      Train       neo mellow  2010\n",
       "1  Love The Way You Lie     Eminem  detroit hip hop  2010\n",
       "2               TiK ToK      Kesha        dance pop  2010\n",
       "3           Bad Romance  Lady Gaga        dance pop  2010"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Slicing the dataframe using name\n",
    "\n",
    "z = df.loc[0:3, 'title':'year']\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/4_loc.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**การจัดกลุ่มข้อมูลตาม column ที่กำหนด: <code>groupby()</code>**\n",
    "\n",
    "จัดกลุ่มข้อมูลตาม column ที่กำหนดโดยใช้เมธอด <code>groupby()</code>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/130325292.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.median is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.\n",
      "  df.groupby('year').median()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>26.0</td>\n",
       "      <td>125.0</td>\n",
       "      <td>82.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>61.0</td>\n",
       "      <td>221.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>65.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>78.0</td>\n",
       "      <td>126.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>57.0</td>\n",
       "      <td>231.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>66.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>122.0</td>\n",
       "      <td>125.0</td>\n",
       "      <td>79.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>225.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>175.0</td>\n",
       "      <td>126.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>52.0</td>\n",
       "      <td>227.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>67.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>239.5</td>\n",
       "      <td>125.0</td>\n",
       "      <td>72.0</td>\n",
       "      <td>63.5</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>13.5</td>\n",
       "      <td>54.0</td>\n",
       "      <td>223.5</td>\n",
       "      <td>8.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>65.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>316.0</td>\n",
       "      <td>120.0</td>\n",
       "      <td>75.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>52.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>68.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>403.5</td>\n",
       "      <td>113.0</td>\n",
       "      <td>71.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>43.0</td>\n",
       "      <td>217.5</td>\n",
       "      <td>5.5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>476.0</td>\n",
       "      <td>106.0</td>\n",
       "      <td>72.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>54.0</td>\n",
       "      <td>222.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2018</th>\n",
       "      <td>540.5</td>\n",
       "      <td>110.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>67.5</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>11.5</td>\n",
       "      <td>47.5</td>\n",
       "      <td>211.5</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>74.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019</th>\n",
       "      <td>588.0</td>\n",
       "      <td>104.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>52.0</td>\n",
       "      <td>202.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Unnamed: 0    bpm  nrgy  dnce   dB  live   val    dur  acous  spch   pop\n",
       "year                                                                          \n",
       "2010        26.0  125.0  82.0  67.0 -5.0  13.0  61.0  221.0    3.0   5.0  65.0\n",
       "2011        78.0  126.0  78.0  65.0 -5.0  14.0  57.0  231.0    5.0   5.0  66.0\n",
       "2012       122.0  125.0  79.0  67.0 -5.0  12.0  68.0  225.0    2.0   5.0  72.0\n",
       "2013       175.0  126.0  78.0  63.0 -5.0  13.0  52.0  227.0    4.0   5.0  67.0\n",
       "2014       239.5  125.0  72.0  63.5 -6.0  13.5  54.0  223.5    8.0   5.0  65.0\n",
       "2015       316.0  120.0  75.0  67.0 -5.0  13.0  52.0  220.0    6.0   5.0  68.0\n",
       "2016       403.5  113.0  71.0  63.0 -6.0  13.0  43.0  217.5    5.5   6.0  69.0\n",
       "2017       476.0  106.0  72.0  68.0 -5.0  12.0  54.0  222.0    9.0   6.0  71.0\n",
       "2018       540.5  110.0  67.0  67.5 -6.0  11.5  47.5  211.5    5.0   6.0  74.5\n",
       "2019       588.0  104.0  68.0  70.0 -6.0  10.0  52.0  202.0   13.0   5.0  85.0"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('year').median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/nb/qx4_7k_n2mx3zzwh39qtrg_c0000gn/T/ipykernel_62169/3811477203.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.\n",
      "  df.groupby('artist').mean().head()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>year</th>\n",
       "      <th>bpm</th>\n",
       "      <th>nrgy</th>\n",
       "      <th>dnce</th>\n",
       "      <th>dB</th>\n",
       "      <th>live</th>\n",
       "      <th>val</th>\n",
       "      <th>dur</th>\n",
       "      <th>acous</th>\n",
       "      <th>spch</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>artist</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3OH!3</th>\n",
       "      <td>32.0</td>\n",
       "      <td>2010.0</td>\n",
       "      <td>138.0</td>\n",
       "      <td>89.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>83.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>62.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5 Seconds of Summer</th>\n",
       "      <td>230.0</td>\n",
       "      <td>2014.0</td>\n",
       "      <td>160.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>44.0</td>\n",
       "      <td>202.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A Great Big World</th>\n",
       "      <td>250.0</td>\n",
       "      <td>2014.0</td>\n",
       "      <td>138.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>229.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>61.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Adam Lambert</th>\n",
       "      <td>30.0</td>\n",
       "      <td>2010.0</td>\n",
       "      <td>158.5</td>\n",
       "      <td>79.5</td>\n",
       "      <td>54.5</td>\n",
       "      <td>-4.5</td>\n",
       "      <td>21.5</td>\n",
       "      <td>62.0</td>\n",
       "      <td>227.5</td>\n",
       "      <td>1.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>62.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Adele</th>\n",
       "      <td>317.8</td>\n",
       "      <td>2014.7</td>\n",
       "      <td>119.3</td>\n",
       "      <td>47.3</td>\n",
       "      <td>52.3</td>\n",
       "      <td>-11.4</td>\n",
       "      <td>10.4</td>\n",
       "      <td>35.4</td>\n",
       "      <td>257.5</td>\n",
       "      <td>35.3</td>\n",
       "      <td>4.0</td>\n",
       "      <td>61.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Unnamed: 0    year    bpm  nrgy  dnce    dB  live   val  \\\n",
       "artist                                                                         \n",
       "3OH!3                      32.0  2010.0  138.0  89.0  68.0  -4.0  36.0  83.0   \n",
       "5 Seconds of Summer       230.0  2014.0  160.0  95.0  49.0  -4.0  33.0  44.0   \n",
       "A Great Big World         250.0  2014.0  138.0  15.0  45.0  -9.0   9.0   9.0   \n",
       "Adam Lambert               30.0  2010.0  158.5  79.5  54.5  -4.5  21.5  62.0   \n",
       "Adele                     317.8  2014.7  119.3  47.3  52.3 -11.4  10.4  35.4   \n",
       "\n",
       "                       dur  acous  spch   pop  \n",
       "artist                                         \n",
       "3OH!3                192.0    1.0   8.0  62.0  \n",
       "5 Seconds of Summer  202.0    0.0  13.0  71.0  \n",
       "A Great Big World    229.0   87.0   3.0  61.0  \n",
       "Adam Lambert         227.5    1.0   8.0  62.5  \n",
       "Adele                257.5   35.3   4.0  61.5  "
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('artist').mean().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Exercise]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1) จงเขียนโค้ด กำหนดให้ตัวแปร <code>q</code> เก็บข้อมูลคอลัมน์ <b>title</b> ในรูปแบบของ dataframe\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code below and press Shift+Enter to execute\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2) เขียนโค้ดกำหนดค่าให้ตัวแปร <code>q</code> เก็บ dataframe ที่ประกอบด้วยคอลัมน์ <b>artist</b> <b>title</b> และ <b>year</b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code below and press Shift+Enter to execute\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3) เขียนโค้ดเข้าถึงข้อมูลที่อยู่ในแถวที่ 102 และคอลัมน์ที่ 3 ของ <code>df</code>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Write your code below and press Shift+Enter to execute\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4) ค้นหาชื่อเพลงที่มีค่า <code>nrgy</code> ต่ำสุดและสูงสุด "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code below and press Shift+Enter to execute\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Author\n",
    "\n",
    "S.C.\n",
    "\n",
    "### Change Log\n",
    "\n",
    "    \n",
    "|  Date  |  Version | Change Description |\n",
    "|---|---|---|\n",
    "| 08-08-2021 | 0.1  | First edition |\n",
    "|   |   |   |\n",
    "|   |   |   |\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel(3.8.13))",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "toc-autonumbering": true,
  "toc-showcode": false,
  "toc-showmarkdowntxt": false,
  "toc-showtags": false
 },
 "nbformat": 4,
 "nbformat_minor": 4
}