<div class="tip inlineBlock error"> The ipynb version is here: [5.pandas 新增数据列.html](http://type.zimopy.com/usr/uploads/2022/12/1528750922.html) </div>

**The Markdown version follows**

# Adding new columns in pandas

1. Direct assignment
2. df.apply
3. df.assign
4. Conditional selection with per-group assignment

```python
import pandas as pd
```

```python
fpath = "../datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
```

```python
df.head()
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>ymd</th> <th>bWendu</th> <th>yWendu</th> <th>tianqi</th> <th>fengxiang</th> <th>fengli</th> <th>aqi</th> <th>aqiInfo</th> <th>aqiLevel</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>2018-01-01</td> <td>3℃</td> <td>-6℃</td> <td>晴~多云</td> <td>东北风</td> <td>1-2级</td> <td>59</td> <td>良</td> <td>2</td> </tr>
<tr> <th>1</th> <td>2018-01-02</td> <td>2℃</td> <td>-5℃</td> <td>阴~多云</td> <td>东北风</td> <td>1-2级</td> <td>49</td> <td>优</td> <td>1</td> </tr>
<tr> <th>2</th> <td>2018-01-03</td> <td>2℃</td> <td>-5℃</td> <td>多云</td> <td>北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> </tr>
<tr> <th>3</th> <td>2018-01-04</td> <td>0℃</td> <td>-8℃</td> <td>阴</td> <td>东北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> </tr>
<tr> <th>4</th> <td>2018-01-05</td> <td>3℃</td> <td>-6℃</td> <td>多云~晴</td> <td>西北风</td> <td>1-2级</td> <td>50</td> <td>优</td> <td>1</td> </tr>
</tbody>
</table>
</div>

# 1. Direct assignment

Example: clean the temperature columns and convert them to a numeric type

```python
# Strip the "℃" suffix and convert to integers.
# Plain df[col] = ... replaces the whole column, so the integer dtype is kept;
# in recent pandas versions df.loc[:, col] = ... may instead write into the
# existing object-dtype column and leave its dtype unchanged.
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
df["yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")
```

## Example: compute the temperature difference and add it as a column

```python
# Note: df["bWendu"] is a Series, so the subtraction also returns a Series
df.loc[:, "wencha"] = df["bWendu"] - df["yWendu"]
```

```python
df.head()
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top;
}
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>ymd</th> <th>bWendu</th> <th>yWendu</th> <th>tianqi</th> <th>fengxiang</th> <th>fengli</th> <th>aqi</th> <th>aqiInfo</th> <th>aqiLevel</th> <th>wencha</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>2018-01-01</td> <td>3</td> <td>-6</td> <td>晴~多云</td> <td>东北风</td> <td>1-2级</td> <td>59</td> <td>良</td> <td>2</td> <td>9</td> </tr>
<tr> <th>1</th> <td>2018-01-02</td> <td>2</td> <td>-5</td> <td>阴~多云</td> <td>东北风</td> <td>1-2级</td> <td>49</td> <td>优</td> <td>1</td> <td>7</td> </tr>
<tr> <th>2</th> <td>2018-01-03</td> <td>2</td> <td>-5</td> <td>多云</td> <td>北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>7</td> </tr>
<tr> <th>3</th> <td>2018-01-04</td> <td>0</td> <td>-8</td> <td>阴</td> <td>东北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>8</td> </tr>
<tr> <th>4</th> <td>2018-01-05</td> <td>3</td> <td>-6</td> <td>多云~晴</td> <td>西北风</td> <td>1-2级</td> <td>50</td> <td>优</td> <td>1</td> <td>9</td> </tr>
</tbody>
</table>
</div>

# 2. The df.apply method

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).

**With apply, setting axis to 0 or 1 decides whether the function is called once per column (axis=0) or once per row (axis=1).**

Example: add a temperature-type column:

1. If the high temperature (`bWendu`) is above 33 degrees, the type is 高温 (high)
2. If the low temperature (`yWendu`) is below -10 degrees, the type is 低温 (low)
3. Otherwise the type is 常温 (normal)

> `df.apply()` traverses the whole DataFrame. `x` is the row or column being processed on each call, and `x[label]` is a single value that can be tested in a condition. Whether `x` is a row (axis=1) or a column (axis=0, the default when `axis` is omitted) is decided by `axis`.

```python
def get_wendu_type(x):
    # x is one row of the DataFrame, passed in as a Series
    if x["bWendu"] > 33:
        return "高温"  # high temperature
    if x["yWendu"] < -10:
        return "低温"  # low temperature
    return "常温"  # normal temperature

# Note: axis=1 is required so that each Series passed in is a row,
# indexed by the DataFrame's column names
df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)
df
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>ymd</th> <th>bWendu</th> <th>yWendu</th> <th>tianqi</th> <th>fengxiang</th> <th>fengli</th> <th>aqi</th> <th>aqiInfo</th> <th>aqiLevel</th> <th>wencha</th> <th>wendu_type</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>2018-01-01</td> <td>3</td> <td>-6</td> <td>晴~多云</td> <td>东北风</td> <td>1-2级</td> <td>59</td> <td>良</td> <td>2</td> <td>9</td> <td>常温</td> </tr>
<tr> <th>1</th> <td>2018-01-02</td> <td>2</td> <td>-5</td> <td>阴~多云</td> <td>东北风</td> <td>1-2级</td> <td>49</td> <td>优</td> <td>1</td> <td>7</td> <td>常温</td> </tr>
<tr> <th>2</th> <td>2018-01-03</td> <td>2</td> <td>-5</td> <td>多云</td> <td>北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>7</td> <td>常温</td> </tr>
<tr> <th>3</th> <td>2018-01-04</td> <td>0</td> <td>-8</td> <td>阴</td> <td>东北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>8</td> <td>常温</td> </tr>
<tr> <th>4</th> <td>2018-01-05</td> <td>3</td> <td>-6</td> <td>多云~晴</td> <td>西北风</td> <td>1-2级</td> <td>50</td> <td>优</td> <td>1</td> <td>9</td> <td>常温</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>360</th> <td>2018-12-27</td> <td>-5</td> <td>-12</td> <td>多云~晴</td> <td>西北风</td> <td>3级</td> <td>48</td> <td>优</td> <td>1</td> <td>7</td> <td>低温</td> </tr>
<tr>
<th>361</th> <td>2018-12-28</td> <td>-3</td> <td>-11</td> <td>晴</td> <td>西北风</td> <td>3级</td> <td>40</td> <td>优</td> <td>1</td> <td>8</td> <td>低温</td> </tr>
<tr> <th>362</th> <td>2018-12-29</td> <td>-3</td> <td>-12</td> <td>晴</td> <td>西北风</td> <td>2级</td> <td>29</td> <td>优</td> <td>1</td> <td>9</td> <td>低温</td> </tr>
<tr> <th>363</th> <td>2018-12-30</td> <td>-2</td> <td>-11</td> <td>晴~多云</td> <td>东北风</td> <td>1级</td> <td>31</td> <td>优</td> <td>1</td> <td>9</td> <td>低温</td> </tr>
<tr> <th>364</th> <td>2018-12-31</td> <td>-2</td> <td>-10</td> <td>多云</td> <td>东北风</td> <td>1级</td> <td>56</td> <td>良</td> <td>2</td> <td>8</td> <td>常温</td> </tr>
</tbody>
</table>
<p>365 rows × 11 columns</p>
</div>

```python
# Count how often each temperature type occurs
df.loc[:, "wendu_type"].value_counts()
```

    常温    328
    高温     29
    低温      8
    Name: wendu_type, dtype: int64

# 3. The df.assign method

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones.

Example: convert the temperatures from Celsius to Fahrenheit

## Several new columns can be added at once

```python
# assign returns a new DataFrame; df itself is left unchanged
df.assign(
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
    # Celsius to Fahrenheit
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32,
)
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>ymd</th> <th>bWendu</th> <th>yWendu</th> <th>tianqi</th> <th>fengxiang</th> <th>fengli</th> <th>aqi</th> <th>aqiInfo</th> <th>aqiLevel</th> <th>wencha</th> <th>wendu_type</th> <th>yWendu_huashi</th> <th>bWendu_huashi</th> </tr>
</thead>
<tbody>
<tr> <th>0</th> <td>2018-01-01</td> <td>3</td> <td>-6</td> <td>晴~多云</td> <td>东北风</td> <td>1-2级</td> <td>59</td> <td>良</td> <td>2</td> <td>9</td> <td>常温</td> <td>21.2</td> <td>37.4</td> </tr>
<tr> <th>1</th> <td>2018-01-02</td> <td>2</td> <td>-5</td> <td>阴~多云</td> <td>东北风</td> <td>1-2级</td> <td>49</td> <td>优</td> <td>1</td> <td>7</td> <td>常温</td> <td>23.0</td>
<td>35.6</td> </tr>
<tr> <th>2</th> <td>2018-01-03</td> <td>2</td> <td>-5</td> <td>多云</td> <td>北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>7</td> <td>常温</td> <td>23.0</td> <td>35.6</td> </tr>
<tr> <th>3</th> <td>2018-01-04</td> <td>0</td> <td>-8</td> <td>阴</td> <td>东北风</td> <td>1-2级</td> <td>28</td> <td>优</td> <td>1</td> <td>8</td> <td>常温</td> <td>17.6</td> <td>32.0</td> </tr>
<tr> <th>4</th> <td>2018-01-05</td> <td>3</td> <td>-6</td> <td>多云~晴</td> <td>西北风</td> <td>1-2级</td> <td>50</td> <td>优</td> <td>1</td> <td>9</td> <td>常温</td> <td>21.2</td> <td>37.4</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>360</th> <td>2018-12-27</td> <td>-5</td> <td>-12</td> <td>多云~晴</td> <td>西北风</td> <td>3级</td> <td>48</td> <td>优</td> <td>1</td> <td>7</td> <td>低温</td> <td>10.4</td> <td>23.0</td> </tr>
<tr> <th>361</th> <td>2018-12-28</td> <td>-3</td> <td>-11</td> <td>晴</td> <td>西北风</td> <td>3级</td> <td>40</td> <td>优</td> <td>1</td> <td>8</td> <td>低温</td> <td>12.2</td> <td>26.6</td> </tr>
<tr> <th>362</th> <td>2018-12-29</td> <td>-3</td> <td>-12</td> <td>晴</td> <td>西北风</td> <td>2级</td> <td>29</td> <td>优</td> <td>1</td> <td>9</td> <td>低温</td> <td>10.4</td> <td>26.6</td> </tr>
<tr> <th>363</th> <td>2018-12-30</td> <td>-2</td> <td>-11</td> <td>晴~多云</td> <td>东北风</td> <td>1级</td> <td>31</td> <td>优</td> <td>1</td> <td>9</td> <td>低温</td> <td>12.2</td> <td>28.4</td> </tr>
<tr> <th>364</th> <td>2018-12-31</td> <td>-2</td> <td>-10</td> <td>多云</td> <td>东北风</td> <td>1级</td> <td>56</td> <td>良</td> <td>2</td> <td>8</td> <td>常温</td> <td>14.0</td> <td>28.4</td> </tr>
</tbody>
</table>
<p>365 rows × 13 columns</p>
</div>

# 4. Conditional selection with per-group assignment

Select the rows that match a condition first, then assign the new column's value for just that subset.

Example: if the difference between the high and low temperature is greater than 10 degrees, the difference counts as large

```python
# First create an empty column (this is the first way of creating a column: direct assignment)
df["wencha_type"] = ""
df.loc[df["bWendu"] - df["yWendu"] > 10, "wencha_type"] = "温差大"  # large difference
df.loc[df["bWendu"] - df["yWendu"] <= 10, "wencha_type"] = "温度正常"  # normal
```

```python
df["wencha_type"].value_counts()
```

    温度正常    187
    温差大     178
    Name: wencha_type, dtype: int64

The dataset for this post can be downloaded at the end of the previous post.

Last modified: December 14, 2022
© Reposting permitted with attribution
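
As a supplement to sections 2 and 4: the row-wise `df.apply` and the two-step `df.loc` assignment can also be written with NumPy's vectorized helpers. The sketch below is not part of the original post. It assumes NumPy is installed, uses `np.select` and `np.where`, and runs on a tiny hand-built frame (four rows copied from the tables above plus one made-up hot day) rather than the real CSV.

```python
import numpy as np
import pandas as pd

# Four rows taken from the tables above, plus one made-up hot day (last row)
# so that every category appears. The real CSV has 365 rows.
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-12-27", "2018-12-31", "hypothetical"],
    "bWendu": [3, 2, -5, -2, 35],
    "yWendu": [-6, -5, -12, -10, 22],
})

# Direct assignment, as in section 1
df["wencha"] = df["bWendu"] - df["yWendu"]

# Same rules as get_wendu_type, checked in the same order:
# the first condition that matches wins, otherwise the default is used.
df["wendu_type"] = np.select(
    [df["bWendu"] > 33, df["yWendu"] < -10],
    ["高温", "低温"],
    default="常温",
)

# One pass instead of creating an empty column and filling it twice
df["wencha_type"] = np.where(df["wencha"] > 10, "温差大", "温度正常")

print(df)
```

On a 365-row frame either style works; the vectorized form is likely to matter more on larger data, and `np.select` keeps the same first-match-wins order as the chain of `if` statements in `get_wendu_type`.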