Aug 17, 2022
You will often want to calculate the sum of one or more columns in a pandas DataFrame. Fortunately, this is easy to do with the sum() function.
This tutorial shows several examples of how to use it.
Example 1: Find the Sum of a Single Column
Suppose we have the following pandas DataFrame:
import pandas as pd
import numpy as np
#create DataFrame
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7]})
#view DataFrame
df
rating points assists rebounds
0 90 25 5 NaN
1 85 20 7 8
2 82 14 7 10
3 88 16 8 6
4 94 27 5 6
5 90 20 7 9
6 76 12 6 6
7 75 15 9 10
8 87 14 9 10
9 86 19 5 7
We can find the sum of the column named 'points' using the following syntax:
df['points'].sum()
182
The sum() function also excludes NA values by default. For example, if we find the sum of the 'rebounds' column, the first value, NaN, is simply left out of the calculation:
df['rebounds'].sum()
72.0
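To see how the default NaN handling can be changed, here is a minimal sketch that rebuilds only the rebounds column and compares the default call with the skipna and min_count parameters of Series.sum(). The variable names are illustrative and not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Same 'rebounds' values as in the DataFrame above
rebounds = pd.Series([np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7], name='rebounds')

default_total = rebounds.sum()              # NaN is skipped
strict_total = rebounds.sum(skipna=False)   # NaN propagates into the result
# min_count is the number of non-NaN values required for a numeric result
guarded_total = rebounds.sum(min_count=10)  # only 9 valid values, so NaN

print(default_total, strict_total, guarded_total)
```

Using skipna=False is a reasonable choice when a missing value should invalidate the total instead of being silently ignored.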
Example 2: Find the Sum of Multiple Columns
We can find the sum of multiple columns using the following syntax:
#find sum of points and rebounds columns
df[['rebounds', 'points']].sum()
rebounds 72.0
points 182.0
dtype: float64
Example 3: Find the Sum of All Columns
We can also find the sum of all columns using the following syntax:
#find sum of all columns in DataFrame
df.sum()
rating 853.0
points 182.0
assists 68.0
rebounds 72.0
dtype: float64
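For columns that are not numeric, sum() does not produce a numeric total: string columns are concatenated, and other types may raise an error. Pass numeric_only=True to restrict the calculation to numeric columns.
You can find the complete documentation for the sum() function in the pandas documentation.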
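As a hedged illustration of handling non-numeric columns explicitly, the sketch below uses a small made-up frame (the 'team' column and the name df_mixed are assumptions for this example only) and restricts the sum to numeric columns in two ways.

```python
import pandas as pd

# Hypothetical frame with one text column, used only for illustration
df_mixed = pd.DataFrame({'team': ['A', 'B', 'C'],
                         'points': [25, 20, 14]})

# numeric_only=True restricts the sum to numeric columns
numeric_totals = df_mixed.sum(numeric_only=True)
print(numeric_totals)

# Alternatively, select the numeric columns first and then sum them
selected_totals = df_mixed.select_dtypes(include='number').sum()
print(selected_totals)
```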
I can sum the items in column zero fine. But where do I change the code to sum column 2, or 3, or 4 in the matrix?
I’m easily stumped.
def main():
    matrix = []
    for i in range(2):
        s = input("Enter a 4-by-4 matrix row " + str(i) + ": ")
        items = s.split() # Extracts items from the string
        list = [ eval(x) for x in items ] # Convert items to numbers
        matrix.append(list)
    print("Sum of the elements in column 0 is", sumColumn(matrix))

def sumColumn(m):
    for column in range(len(m[0])):
        total = 0
        for row in range(len(m)):
            total += m[row][column]
        return total

main()
asked Apr 18, 2014 at 0:40
numpy could do this for you quite easily:
import numpy

def sumColumn(matrix):
    return numpy.sum(matrix, axis=0)  # axis=0 sums down the rows, giving one total per column
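The axis argument is easy to mix up. As a small sketch (the 2-by-3 matrix is made up for illustration), axis=0 produces one total per column and axis=1 produces one total per row:

```python
import numpy as np

matrix = [[1, 2, 3],
          [4, 5, 6]]

column_sums = np.sum(matrix, axis=0)  # collapse the rows: one total per column
row_sums = np.sum(matrix, axis=1)     # collapse the columns: one total per row

print(column_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())     # [6, 15]
```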
Of course, if you wanted to do it by hand, here’s how I would fix your code:
def sumColumn(m):
    answer = []
    for column in range(len(m[0])):
        t = 0
        for row in m:
            t += row[column]
        answer.append(t)
    return answer
Still, there is a simpler way, using zip:
def sumColumn(m):
    return [sum(col) for col in zip(*m)]
answered Apr 18, 2014 at 0:49
inspectorG4dget
One-liner:
column_sums = [sum([row[i] for row in M]) for i in range(0,len(M[0]))]
also
row_sums = [sum(row) for row in M]
for any rectangular, non-empty matrix (list of lists) M, e.g.
>>> M = [[1,2,3],
...      [4,5,6],
...      [7,8,9]]
>>> [sum([row[i] for row in M]) for i in range(0,len(M[0]))]
[12, 15, 18]
>>> [sum(row) for row in M]
[6, 15, 24]
answered Jan 30, 2015 at 17:55
ChrisW
Here is your code changed to return the sum of whatever column you specify:
def sumColumn(m, column):
    total = 0
    for row in range(len(m)):
        total += m[row][column]
    return total
column = 1
print("Sum of the elements in column", column, "is", sumColumn(matrix, column))
answered Apr 18, 2014 at 1:08
user3286261
To get the sum of all columns in the matrix you can use the below python numpy code:
matrixname.sum(axis=0)
answered Oct 30, 2018 at 10:43
import numpy as np
np.sum(M, axis=0)  # axis=0 gives one sum per column; axis=1 would give one sum per row
where M is the matrix
answered Nov 27, 2018 at 22:47
This can be made easier if you represent the matrix as a flat array:
m = [
1,2,3,4,
10,11,12,13,
100,101,102,103,
1001,1002,1003,1004
]
def sum_column(m, n):
    return sum(m[i] for i in range(n, 4 * 4, 4))
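The same idea can be written with an extended slice, which avoids hard-coding the matrix size in the range. This is only a sketch: the width parameter is an assumption added here to describe the row length of the flat, row-major list.

```python
def sum_column(flat, n, width=4):
    """Sum column n of a row-major flat list with `width` items per row."""
    # flat[n::width] picks item n, n + width, n + 2 * width, and so on
    return sum(flat[n::width])

m = [1, 2, 3, 4,
     10, 11, 12, 13,
     100, 101, 102, 103,
     1001, 1002, 1003, 1004]

print(sum_column(m, 0))  # 1112
print(sum_column(m, 3))  # 1124
```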
answered Apr 18, 2014 at 0:50
michaelmeyer
How do I add up all of the values of a column in a python array? Ideally I want to do this without importing any additional libraries.
input_val = [[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
output_val = [3, 6, 9, 12, 15]
I know this can be done in a nested for loop; I'm wondering if there is a better way (like a list comprehension)?
asked Apr 17, 2017 at 21:04
zip and sum can get that done:
Code:
[sum(x) for x in zip(*input_val)]
zip takes the contents of the input list and transposes them, so that the elements at the same position in each contained list are produced together. This lets sum see the first element of each contained list, then on the next iteration the second element of each list, and so on.
Test Code:
input_val = [[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
print([sum(x) for x in zip(*input_val)])
Results:
[3, 6, 9, 12, 15]
answered Apr 17, 2017 at 21:08
Stephen Rauch
In case you decide to use any library, numpy easily does this:
np.sum(input_val,axis=0)
answered Apr 17, 2017 at 21:09
JavNoor
You may also use sum with zip within the map function:
# In Python 3.x
>>> list(map(sum, zip(*input_val)))
[3, 6, 9, 12, 15]
# map returns an iterator in Python 3, so wrap it in list() to see the values
# In Python 2.x, the explicit list() call is not needed as `map` returns a list
>>> map(sum, zip(*input_val))
[3, 6, 9, 12, 15]
answered Apr 17, 2017 at 21:10
Moinuddin Quadri
Try this:
input_val = [[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
output_val = [sum([i[b] for i in input_val]) for b in range(len(input_val[0]))]
print(output_val)
answered Apr 17, 2017 at 21:12
Ajax1234
Please construct your array using the NumPy library:
import numpy as np
Create the array using the array() function and save it in a variable:
arr = np.array(([1, 2, 3, 4, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]))
Apply the sum() function to the array, setting the axis parameter to zero so that it sums down each column:
arr.sum(axis=0)
answered Aug 1, 2020 at 22:15
Zhannie
This should work:
[sum(i) for i in zip(*input_val)]
answered Apr 17, 2017 at 21:09
Alex
I guess you can use:
import numpy as np
new_list = sum(map(np.array, input_val))
answered Apr 17, 2017 at 21:11
Pedro Lobito
I think this is the most pythonic way of doing this
# list() is needed in Python 3, where map returns an iterator
list(map(sum, [x for x in zip(*input_val)]))
answered Apr 17, 2017 at 21:14
Asav Patel
One-liner using list comprehensions: for each column (length of one row), make a list of all the entries in that column, and sum that list.
output_val = [sum([input_val[i][j] for i in range(len(input_val))])
              for j in range(len(input_val[0]))]
answered Apr 17, 2017 at 21:10
Prune
Try this code. This will make output_val end up as [3, 6, 9, 12, 15] given your input_val:
input_val = [[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
vals_length = len(input_val[0])
output_val = [0] * vals_length # init output list with 0's
for i in range(vals_length): # iterate for each index in the inputs
    for vals in input_val:
        output_val[i] += vals[i] # add to the same index
print(output_val) # [3, 6, 9, 12, 15]
answered Apr 17, 2017 at 21:11
LLLLLL
Using NumPy, you can easily solve this in one line:
1: Input
input_val = [[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
2: Numpy does the math for you
import numpy as np
np.sum(input_val, axis=0)
3: Then finally the results
array([ 3, 6, 9, 12, 15])
answered Nov 22, 2018 at 7:30
output_val = input_val.sum(axis=0)
This would make the code even simpler, I guess, provided input_val is already a NumPy array (a plain Python list has no sum method).
answered Jan 28, 2018 at 1:26
You can simply use the built-in sum function instead of np.sum.
input_val = np.array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]])
sum(input_val)
output: array([ 3, 6, 9, 12, 15])
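The built-in sum works here because iterating over a 2-D NumPy array yields its rows, which are then added element-wise. A minimal sketch comparing it with np.sum (the variable names are illustrative):

```python
import numpy as np

input_val = np.array([[1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5]])

builtin_result = sum(input_val)            # adds the row arrays element-wise
numpy_result = np.sum(input_val, axis=0)   # explicit column-wise sum
grand_total = np.sum(input_val)            # without axis, every element is summed

print(builtin_result.tolist())  # [3, 6, 9, 12, 15]
print(numpy_result.tolist())    # [3, 6, 9, 12, 15]
print(int(grand_total))         # 45
```

Note that the two spellings differ when no axis is given: the built-in sum still returns per-column totals, while np.sum returns a single number.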
answered Jun 21, 2022 at 7:40
In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.
We’ll cover the following cases:
- Sum all rows of one or multiple columns
- Sum by column name/label into a new column
- Adding values by index
- Dealing with nan values
- Sum values that meet a certain condition
Creating the dataset
We’ll start by creating a simple dataset
# Python3
# import pandas into your Python environment.
import pandas as pd
# Now, let's create the dataframe
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
"quarter": [1, 1, 1] ,
"consumer_budg": [15000, 35000, 45000],
"enterprise_budg": [20000, 30000, 40000] })
budget.head()
 | person | quarter | consumer_budg | enterprise_budg |
---|---|---|---|---|
0 | John | 1 | 15000 | 20000 |
1 | Kim | 1 | 35000 | 30000 |
2 | Bob | 1 | 45000 | 40000 |
How to sum a column? (or more)
For a single column we’ll simply use the Series sum() method.
# one column
budget['consumer_budg'].sum()
95000
The DataFrame also has a sum() method, which we’ll use to add multiple columns:
# adding multiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()
We’ll receive a Series object with the results:
consumer_budg      95000
enterprise_budg    90000
dtype: int64
Sum row values into a new column
A more interesting case is computing a row-wise total by adding the values of several columns in each row. See the simple example below:
# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']
We have created a new column as shown below:
 | person | quarter | consumer_budg | enterprise_budg | total_budget |
---|---|---|---|---|---|
0 | John | 1 | 15000 | 20000 | 35000 |
1 | Kim | 1 | 35000 | 30000 | 65000 |
2 | Bob | 1 | 45000 | 40000 | 85000 |
Note: We could have also used the loc indexer to subset by label.
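A minimal sketch of the loc variant mentioned in the note, assuming the same budget DataFrame as above: the two budget columns are selected by label and then summed across each row with sum(axis=1).

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})

# Select all rows and the two budget columns by label, then sum across each row
budget_cols = ['consumer_budg', 'enterprise_budg']
budget['total_budget'] = budget.loc[:, budget_cols].sum(axis=1)
print(budget)
```

Summing a selection with axis=1 scales better than chaining + when there are many columns to add.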
Adding columns by index
We can also refer to the columns to sum by their integer position, using the iloc indexer.
# by index
budget['total_budget'] = budget.iloc[:, 2] + budget.iloc[:, 3]
The result is the same as above.
Sum with conditions
In this example, we would like to define a column named high_budget and populate it only if the total_budget is over the 80K threshold.
budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']
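If the goal is a single conditional total instead of a new column, a boolean mask can be combined with sum(). This is a sketch under the assumption that the budget DataFrame and its total_budget column are built as shown earlier; the names over_threshold and high_total are illustrative.

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

# Boolean mask: True for rows whose total exceeds the 80K threshold
over_threshold = budget['total_budget'] > 80000

# Sum one column over only the rows selected by the mask
high_total = budget.loc[over_threshold, 'total_budget'].sum()
print(high_total)  # 85000
```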
Adding columns with null values
Here we might need a bit of pre-processing to get rid of the null values using fillna().
Let’s quickly create a sample dataset containing null values (see last row).
# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
"quarter": [1, 1, 1,1] ,
"consumer_budg": [15000, 35000, 45000, 50000],
"enterprise_budg": [20000, 30000, 40000, np.nan ] })
 | person | quarter | consumer_budg | enterprise_budg | high_budget |
---|---|---|---|---|---|
0 | John | 1 | 15000 | 20000.0 | 35000.0 |
1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
3 | Court | 1 | 50000 | NaN | NaN |
Now let's use the DataFrame fillna() method to replace all the null values with zeros so that we can sum the column values.
budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan
Voilà:
 | person | quarter | consumer_budg | enterprise_budg | high_budget |
---|---|---|---|---|---|
0 | John | 1 | 15000 | 20000.0 | 35000.0 |
1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
3 | Court | 1 | 50000 | 0.0 | 50000.0 |
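As an alternative sketch that leaves the original NaN values in place, the row totals can be computed with Series.add and its fill_value parameter, or with sum(axis=1), which skips NaN by default. The names via_add and via_sum are illustrative.

```python
import numpy as np
import pandas as pd

budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", "Court"],
                           "quarter": [1, 1, 1, 1],
                           "consumer_budg": [15000, 35000, 45000, 50000],
                           "enterprise_budg": [20000, 30000, 40000, np.nan]})

# fill_value=0 treats a missing operand as 0 for this addition only
via_add = budget_nan['consumer_budg'].add(budget_nan['enterprise_budg'], fill_value=0)

# sum(axis=1) skips NaN by default, so each row total is still computed
via_sum = budget_nan[['consumer_budg', 'enterprise_budg']].sum(axis=1)

print(via_add.tolist())  # [35000.0, 65000.0, 85000.0, 50000.0]
print(via_sum.tolist())  # [35000.0, 65000.0, 85000.0, 50000.0]
```

Both approaches keep the NaN in enterprise_budg, which can matter if missing values need to stay distinguishable from real zeros later on.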
- Getting the sum of a column
- Cumulative sum with groupby
- Getting the sum of columns based on conditions on other columns' values
We will look at how to get the sum of a pandas DataFrame column, along with related techniques such as computing a cumulative sum with groupby and summing DataFrame columns based on conditions on the values of other columns.
Getting the Sum of a Column
First we create a random array using the NumPy library, and then we get the sum of each column using the sum() function.
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.random.randint(0,10,size=(10, 4)),
columns=list('1234'))
print(df)
Total = df['1'].sum()
print ("Column 1 sum:",Total)
Total = df['2'].sum()
print ("Column 2 sum:",Total)
Total = df['3'].sum()
print ("Column 3 sum:",Total)
Total = df['4'].sum()
print ("Column 4 sum:",Total)
If you run this code, you will get output like the following (the values will differ in your case because the data is random):
1 2 3 4
0 2 2 3 8
1 9 4 3 1
2 8 5 6 0
3 9 5 7 4
4 2 7 3 7
5 9 4 1 3
6 6 7 7 3
7 0 4 2 8
8 0 6 6 4
9 5 8 7 2
Column 1 sum: 50
Column 2 sum: 52
Column 3 sum: 45
Column 4 sum: 40
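The four separate calls can also be collapsed into a single DataFrame.sum() call. The sketch below assumes a seeded NumPy generator (np.random.default_rng, which the original code does not use) so that repeated runs give the same data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
df = pd.DataFrame(rng.integers(0, 10, size=(10, 4)), columns=list('1234'))

column_totals = df.sum()  # Series indexed by column name
for name, total in column_totals.items():
    print(f"Column {name} sum: {total}")
```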
Cumulative Sum with groupby
We can compute a cumulative sum using the groupby method. Consider the following DataFrame with the columns Date, Fruit, and Sale:
import pandas as pd
df = pd.DataFrame(
{
'Date':
['08/09/2018',
'10/09/2018',
'08/09/2018',
'10/09/2018'],
'Fruit':
['Apple',
'Apple',
'Banana',
'Banana'],
'Sale':
[34,
12,
22,
27]
})
If we want to compute the cumulative sum of Sale per fruit across the dates, we can do the following:
import pandas as pd
df = pd.DataFrame(
{
'Date':
['08/09/2018',
'10/09/2018',
'08/09/2018',
'10/09/2018'],
'Fruit':
['Apple',
'Apple',
'Banana',
'Banana'],
'Sale':
[34,
12,
22,
27]
})
print(df.groupby(by=['Fruit','Date']).sum().groupby(level=[0]).cumsum())
After running the code above, we get the following output, which shows the cumulative sum for each fruit at each date:
Fruit Date Sale
Apple 08/09/2018 34
10/09/2018 46
Banana 08/09/2018 22
10/09/2018 49
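When each (Fruit, Date) pair appears only once and the rows are already in date order within each fruit, as in this data, a shorter sketch is to call cumsum() directly on the grouped column. The Cumulative column name is an assumption for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['08/09/2018', '10/09/2018', '08/09/2018', '10/09/2018'],
    'Fruit': ['Apple', 'Apple', 'Banana', 'Banana'],
    'Sale': [34, 12, 22, 27],
})

# Running total of Sale within each Fruit, following the existing row order
df['Cumulative'] = df.groupby('Fruit')['Sale'].cumsum()
print(df)
```

This keeps the original flat layout and adds the running total next to each row instead of producing a MultiIndex result.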
Getting the Sum of Columns Based on Conditions on Other Columns' Values
This approach computes the sum where a given condition is True and replaces the sum with a given value where the condition is False. Consider the following code:
import numpy as np
import pandas as pd
df = pd.DataFrame(
np.random.randn(5,3),
columns=list('xyz'))
df['sum'] = df.loc[df['x'] > 0,['x','y']].sum(axis=1)
df['sum'] = df['sum'].fillna(0)  # assign back instead of using inplace on a column selection
print(df)
In the code above, we added a new column sum to the DataFrame. It holds the sum of the columns ['x', 'y'] for the rows where x is greater than 0; for all other rows, sum is replaced with 0.
After running the code, we get output like the following (the values will differ in your case).
x y z sum
0 -1.067619 1.053494 0.179490 0.000000
1 -0.349935 0.531465 -1.350914 0.000000
2 -1.650904 1.534314 1.773287 0.000000
3 2.486195 0.800890 -0.132991 3.287085
4 1.581747 -0.667217 -0.182038 0.914530
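The same result can be sketched in one step with Series.where, which keeps values where the condition holds and substitutes the given replacement elsewhere. The seeded generator is an assumption added so the example is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=list('xyz'))

# Row-wise sum of x and y, kept where x > 0 and replaced by 0 elsewhere
df['sum'] = (df['x'] + df['y']).where(df['x'] > 0, 0)
print(df)
```

This avoids the intermediate NaN values and the separate fillna step.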