How to find the sum of a column in Python

  • Codecamp Editorial Team

Aug 17, 2022
1 min read


You will often want to compute the sum of one or more columns in a pandas DataFrame. Fortunately, this is easy to do in pandas with the sum() function.

This tutorial shows several examples of how to use this function.

Example 1: Find the sum of a single column

Suppose we have the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
 'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
 'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
 'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view DataFrame 
df

   rating  points  assists  rebounds
0      90      25        5       NaN
1      85      20        7       8.0
2      82      14        7      10.0
3      88      16        8       6.0
4      94      27        5       6.0
5      90      20        7       9.0
6      76      12        6       6.0
7      75      15        9      10.0
8      87      14        9      10.0
9      86      19        5       7.0

We can find the sum of the column named 'points' using the following syntax:

df['points'].sum()

182

The sum() function also excludes NA values by default. For example, if we find the sum of the 'rebounds' column, the first value, NaN, is simply left out of the calculation:

df['rebounds'].sum()

72.0
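If a missing value should instead make the whole total missing, the behavior can be changed. The following is a minimal sketch, assuming the standard `skipna` and `min_count` parameters of `Series.sum`; the Series `s` is made up for illustration and is not part of the article's data.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the article): one missing value
s = pd.Series([np.nan, 8, 10, 6])

total_default = s.sum()              # NaN is skipped by default
total_strict = s.sum(skipna=False)   # NaN propagates into the result

# min_count: require at least this many non-NA values, otherwise return NaN
total_min_count = s.sum(min_count=4)

print(total_default, total_strict, total_min_count)
```

Here only three values are present, so asking for `min_count=4` also yields NaN.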

Example 2: Find the sum of multiple columns

We can find the sum of multiple columns using the following syntax:

#find sum of points and rebounds columns
df[['rebounds', 'points']].sum()

rebounds     72.0
points      182.0
dtype: float64

Example 3: Find the sum of all columns

We can also find the sum of all columns using the following syntax:

#find sum of all columns in DataFrame
df.sum()

rating      853.0
points      182.0
assists      68.0
rebounds     72.0
dtype: float64

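For non-numeric columns, the result depends on the column contents and the pandas version: string columns are typically concatenated rather than added, and columns that cannot be summed were silently dropped by older versions but raise an error in pandas 2.0 and later. Pass numeric_only=True to sum only the numeric columns.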
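To keep text columns out of the totals, one option is to restrict the sum to numeric columns. This is a small sketch, assuming the `numeric_only` parameter of `DataFrame.sum` and the `select_dtypes` method; the frame `df_mixed` and its values are hypothetical.

```python
import pandas as pd

# Illustrative frame (hypothetical data): one text column, one numeric column
df_mixed = pd.DataFrame({"team": ["A", "B", "C"], "points": [25, 20, 14]})

# Restrict the reduction to numeric columns
numeric_totals = df_mixed.sum(numeric_only=True)
print(numeric_totals)

# Alternatively, select the numeric columns explicitly before summing
selected_totals = df_mixed.select_dtypes(include="number").sum()
print(selected_totals)
```

Both variants return a Series that contains only the 'points' total.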

The full description of the sum() function is available in the official pandas documentation.

I can sum the items in column zero fine. But where do I change the code to sum column 2, or 3, or 4 in the matrix?
I’m easily stumped.

def main():
    matrix = []

    for i in range(2):
        s = input("Enter a 4-by-4 matrix row " + str(i) + ": ") 
        items = s.split() # Extracts items from the string
        list = [ eval(x) for x in items ] # Convert items to numbers   
        matrix.append(list)

    print("Sum of the elements in column 0 is", sumColumn(matrix))

def sumColumn(m):
    for column in range(len(m[0])):
        total = 0
        for row in range(len(m)):
            total += m[row][column]
        return total

main()

asked Apr 18, 2014 at 0:40 by vonbraun

numpy could do this for you quite easily:

import numpy

def sumColumn(matrix):
    return numpy.sum(matrix, axis=0)  # axis=0 sums down the rows, giving one total per column

Of course, if you wanted to do it by hand, here’s how I would fix your code:

def sumColumn(m):
    answer = []
    for column in range(len(m[0])):
        t = 0
        for row in m:
            t += row[column]
        answer.append(t)
    return answer

Still, there is a simpler way, using zip:

def sumColumn(m):
    return [sum(col) for col in zip(*m)]

answered Apr 18, 2014 at 0:49 by inspectorG4dget

One-liner:

column_sums = [sum([row[i] for row in M]) for i in range(0,len(M[0]))]

also

row_sums = [sum(row) for row in M]

for any rectangular, non-empty matrix (list of lists) M. e.g.

>>> M = [[1,2,3],
...      [4,5,6],
...      [7,8,9]]
>>>
>>> [sum([row[i] for row in M]) for i in range(0,len(M[0]))]
[12, 15, 18] 
>>> [sum(row) for row in M]
[6, 15, 24]

answered Jan 30, 2015 at 17:55 by ChrisW

Here is your code changed to return the sum of whatever column you specify:

def sumColumn(m, column):
    total = 0
    for row in range(len(m)):
        total += m[row][column]
    return total

column = 1
print("Sum of the elements in column", column, "is", sumColumn(matrix, column))

answered Apr 18, 2014 at 1:08 by user3286261

To get the sum of all columns in the matrix you can use the below python numpy code:

matrixname.sum(axis=0)

answered Oct 30, 2018 at 10:43 by Vaka Chiranjeevi

import numpy as np
np.sum(M, axis=0)  # axis=0 gives one total per column; axis=1 would give row totals

where M is the matrix
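Since the axis argument is easy to mix up, a quick check on a small non-square array shows which axis produces per-column totals. This is only a sketch; the array `M` below is made up for illustration.

```python
import numpy as np

# Illustrative 2-by-3 array, so row and column totals have different lengths
M = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = M.sum(axis=0)  # collapse the rows: one total per column
row_sums = M.sum(axis=1)  # collapse the columns: one total per row

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

Using a non-square array makes a wrong axis obvious, because the result has the wrong length.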

answered Nov 27, 2018 at 22:47 by Fernando

This can be made easier if you represent the matrix as a flat array:

m = [
    1,2,3,4,
    10,11,12,13,
    100,101,102,103,
    1001,1002,1003,1004
]

def sum_column(m, n):
    return sum(m[i] for i in range(n, 4 * 4, 4))
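The function above hard-codes a 4-by-4 shape. A slightly more general sketch, assuming the matrix is stored row by row and the caller supplies the column count (the name `sum_column_flat` and its parameters are hypothetical), could use slicing with a step:

```python
def sum_column_flat(flat, col, n_cols):
    """Sum column `col` of a row-major flat list that has `n_cols` columns."""
    # Every n_cols-th element, starting at index `col`, belongs to that column
    return sum(flat[col::n_cols])

m = [
    1, 2, 3, 4,
    10, 11, 12, 13,
    100, 101, 102, 103,
    1001, 1002, 1003, 1004,
]

print(sum_column_flat(m, 0, 4))  # 1112
print(sum_column_flat(m, 3, 4))  # 1124
```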

answered Apr 18, 2014 at 0:50 by michaelmeyer

How do I add up all of the values of a column in a python array? Ideally I want to do this without importing any additional libraries.

input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]

output_val = [3, 6, 9, 12, 15]

I know this can be done with a nested for loop; I'm wondering whether there is a better way (like a list comprehension).

asked Apr 17, 2017 at 21:04 by Alexander

zip and sum can get that done:

Code:

[sum(x) for x in zip(*input_val)]

zip takes the contents of the input list and transposes them so that each element of the contained lists is produced at the same time. This allows the sum to see the first elements of each contained list, then next iteration will get the second element of each list, etc…

Test Code:

input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]

print([sum(x) for x in zip(*input_val)])

Results:

[3, 6, 9, 12, 15]
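To make the transposition step visible, here is a small sketch that prints what zip(*rows) produces before anything is summed. The lists `rows` and `ragged` are made up for illustration.

```python
rows = [[1, 2, 3],
        [4, 5, 6]]

# zip(*rows) is the same as zip(rows[0], rows[1]): it groups the
# elements that share a position, so it yields the columns
columns = list(zip(*rows))
print(columns)  # [(1, 4), (2, 5), (3, 6)]

column_sums = [sum(col) for col in columns]
print(column_sums)  # [5, 7, 9]

# Caveat: zip stops at the shortest row, so a ragged matrix is silently truncated
ragged = [[1, 2, 3], [4, 5]]
ragged_sums = [sum(col) for col in zip(*ragged)]
print(ragged_sums)  # [5, 7]
```

The last case is worth keeping in mind: rows of unequal length do not raise an error, the extra elements are simply ignored.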

answered Apr 17, 2017 at 21:08 by Stephen Rauch

In case you decide to use any library, numpy easily does this:

import numpy as np
np.sum(input_val, axis=0)

answered Apr 17, 2017 at 21:09 by JavNoor

You may also use sum with zip within the map function:

# In Python 3.x 
>>> list(map(sum, zip(*input_val)))
[3, 6, 9, 12, 15]
# wrap the result in list() because map returns a lazy iterator in Python 3

# In Python 2.x, explicit type-casting to list is not needed as `map` returns list
>>> map(sum, zip(*input_val))
[3, 6, 9, 12, 15]

answered Apr 17, 2017 at 21:10 by Moinuddin Quadri

Try this:

input_val = [[1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5]]

output_val = [sum([i[b] for i in input_val]) for b in range(len(input_val[0]))]

print(output_val)

answered Apr 17, 2017 at 21:12 by Ajax1234

Please construct your array using the NumPy library:

import numpy as np

Create the array using the array() function and save it in a variable:

arr = np.array(([1, 2, 3, 4, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]))

Apply the sum() method to the array, selecting the columns by setting the axis parameter to zero:

arr.sum(axis = 0)

answered Aug 1, 2020 at 22:15 by Zhannie

This should work:

[sum(i) for i in zip(*input_val)]

answered Apr 17, 2017 at 21:09 by Alex

I guess you can use:

import numpy as np
new_list = sum(map(np.array, input_val))

answered Apr 17, 2017 at 21:11 by Pedro Lobito

I think this is the most pythonic way of doing this

list(map(sum, zip(*input_val)))  # list() materializes the lazy map object in Python 3

answered Apr 17, 2017 at 21:14 by Asav Patel

One-liner using list comprehensions: for each column (length of one row), make a list of all the entries in that column, and sum that list.

output_val = [sum([input_val[i][j] for i in range(len(input_val))]) 
                 for j in range(len(input_val[0]))]

answered Apr 17, 2017 at 21:10 by Prune

Try this code. This will make output_val end up as [3, 6, 9, 12, 15] given your input_val:

input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]

vals_length = len(input_val[0])
output_val = [0] * vals_length # init empty output array with 0's
for i in range(vals_length): # iterate for each index in the inputs
    for vals in input_val:
        output_val[i] += vals[i] # add to the same index

print(output_val) # [3, 6, 9, 12, 15]

answered Apr 17, 2017 at 21:11 by LLL

Using Numpy you can easily solve this issue in one line:

1: Input

input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]

2: Numpy does the math for you

np.sum(input_val,axis=0)

3: Then finally the results

array([ 3,  6,  9, 12, 15])

answered Nov 22, 2018 at 7:30 by Tom Souza

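output_val = np.asarray(input_val).sum(axis=0)

This would make the code even simpler, I guess. It assumes NumPy is imported as np; the np.asarray call is needed because a plain list of lists has no sum method.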

answered Jan 28, 2018 at 1:26 by Rop Shan

You can simply use the built-in sum function instead of np.sum.

import numpy as np

input_val = np.array([[1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5]])
sum(input_val)  # adds the rows element-wise, giving one total per column

output: array([ 3,  6,  9, 12, 15])

answered Jun 21, 2022 at 7:40 by Nadir

In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.

We’ll cover the following cases:

  • Sum all rows of one or multiple columns
  • Sum by column name/label into a new column
  • Adding values by index
  • Dealing with nan values
  • Sum values that meet a certain condition

Creating the dataset

We’ll start by creating a simple dataset

# Python3
# import pandas into your Python environment.
import pandas as pd

# Now, let's create the dataframe 
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                        "quarter": [1, 1, 1] ,
                        "consumer_budg": [15000, 35000, 45000],
                         "enterprise_budg": [20000, 30000, 40000] })
budget.head()

  person  quarter  consumer_budg  enterprise_budg
0   John        1          15000            20000
1    Kim        1          35000            30000
2    Bob        1          45000            40000

How to sum a column? (or more)

For a single column we’ll simply use the Series sum() method.

# one column
budget['consumer_budg'].sum()

95000

The DataFrame also has a sum() method, which we’ll use to add multiple columns:

# adding multiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()

We’ll receive a Series object with the results:

consumer_budg      95000
enterprise_budg    90000
dtype: int64

Sum row values into a new column

A more interesting case is computing a new value for each row by adding the values of several columns. See the simple example below.

# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

We have created a new column as shown below:

  person  quarter  consumer_budg  enterprise_budg  total_budget
0   John        1          15000            20000         35000
1    Kim        1          35000            30000         65000
2    Bob        1          45000            40000         85000

Note: We could have also used the loc indexer to subset by label.

Adding columns by index

We can also refer to the columns to sum by position, using the iloc indexer.

# by position: columns 2 and 3 are consumer_budg and enterprise_budg
budget['total_budget'] = budget.iloc[:, 2] + budget.iloc[:, 3]

The result is the same as above.
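When more than two columns need to be added per row, writing out every + becomes tedious. A minimal sketch of an alternative, assuming the `axis=1` argument of `DataFrame.sum` and reusing the budget data from above:

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})

# List the columns once, then sum across them for every row (axis=1)
cols = ["consumer_budg", "enterprise_budg"]
budget["total_budget"] = budget[cols].sum(axis=1)

print(budget["total_budget"].tolist())  # [35000, 65000, 85000]
```

Adding a further column to the total then only requires extending the `cols` list.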

Sum with conditions

In this example, we would like to define a column named high_budget and populate it only if the total_budget is over the 80K threshold.

budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']
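If the goal is a single total over the rows that meet the condition, rather than a new column, a boolean mask is one way to get it. The sketch below rebuilds the budget data so that it runs on its own; the variable names `is_high` and `high_total` are chosen for illustration.

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})
budget["total_budget"] = budget["consumer_budg"] + budget["enterprise_budg"]

# Boolean mask: True for rows whose total exceeds the threshold
is_high = budget["total_budget"] > 80000

# Sum of total_budget over the matching rows only
high_total = budget.loc[is_high, "total_budget"].sum()
print(high_total)  # 85000

# Keep the total where the condition holds, NaN elsewhere
budget["high_budget"] = budget["total_budget"].where(is_high)
print(budget)
```

The `where` call gives the same column as the query-based assignment: rows that fail the condition are left as NaN.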

Adding columns with null values

Here we might need a bit of pre-processing to get rid of the null values using fillna().

Let’s quickly create a sample dataset containing null values (see last row).

# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
                        "quarter": [1, 1, 1,1] ,
                        "consumer_budg": [15000, 35000, 45000, 50000],
                         "enterprise_budg": [20000, 30000, 40000, np.nan ] })

  person  quarter  consumer_budg  enterprise_budg  high_budget
0   John        1          15000          20000.0      35000.0
1    Kim        1          35000          30000.0      65000.0
2    Bob        1          45000          40000.0      85000.0
3  Court        1          50000              NaN          NaN

Now let's use the DataFrame fillna() method to replace all null values with zeros so that we can sum the column values.

budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan

Voilà

  person  quarter  consumer_budg  enterprise_budg  high_budget
0   John        1          15000              0.0      50000.0
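Overwriting the nulls with fillna() changes the stored data. If the original NaN values should stay in place, two alternatives are sketched here, assuming the `fill_value` parameter of `Series.add` and the default NaN skipping of `DataFrame.sum`; the frame is a trimmed copy of the sample data.

```python
import numpy as np
import pandas as pd

budget_nan = pd.DataFrame({"consumer_budg": [15000, 35000, 45000, 50000],
                           "enterprise_budg": [20000, 30000, 40000, np.nan]})

# Plain + propagates NaN into the result
plain = budget_nan["consumer_budg"] + budget_nan["enterprise_budg"]

# add(..., fill_value=0) treats a missing operand as 0 for this operation only
filled = budget_nan["consumer_budg"].add(budget_nan["enterprise_budg"], fill_value=0)

# A row-wise sum skips NaN by default
row_sum = budget_nan[["consumer_budg", "enterprise_budg"]].sum(axis=1)

print(plain.tolist())
print(filled.tolist())   # [35000.0, 65000.0, 85000.0, 50000.0]
print(row_sum.tolist())  # [35000.0, 65000.0, 85000.0, 50000.0]
```

In both alternatives the enterprise_budg column still contains its NaN afterwards.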
  1. Method to get the sum of a column
  2. Cumulative sum with groupby
  3. Method to get the sum of columns based on the values of other columns

How to get the sum of a pandas column

We will look at how to get the sum of a pandas DataFrame column, along with related techniques such as computing a cumulative sum with groupby and summing DataFrame columns conditionally on the values of other columns.

Method to get the sum of a column

First we create a random array using the NumPy library, and then we get the sum of each column using the sum() function.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randint(0,10,size=(10, 4)),
    columns=list('1234'))
print(df)
Total = df['1'].sum()
print ("Column 1 sum:",Total)
Total = df['2'].sum()
print ("Column 2 sum:",Total)
Total = df['3'].sum()
print ("Column 3 sum:",Total)
Total = df['4'].sum()
print ("Column 4 sum:",Total) 

If you run this code, you will get output like the following (the values may differ in your case):

   1  2  3  4
0  2  2  3  8
1  9  4  3  1
2  8  5  6  0
3  9  5  7  4
4  2  7  3  7
5  9  4  1  3
6  6  7  7  3
7  0  4  2  8
8  0  6  6  4
9  5  8  7  2
Column 1 sum: 50
Column 2 sum: 52
Column 3 sum: 45
Column 4 sum: 40
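The four near-identical statements can be collapsed into one call. The following is a sketch under two assumptions that are not in the original: it seeds NumPy's `default_rng` generator so that repeated runs print the same numbers, and it calls `sum()` on the whole DataFrame instead of on each column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the run is repeatable
df = pd.DataFrame(rng.integers(0, 10, size=(10, 4)), columns=list("1234"))

totals = df.sum()  # one total per column, as a Series indexed by column name
for name, total in totals.items():
    print(f"Column {name} sum: {total}")
```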

Cumulative sum with groupby

We can get a cumulative sum using the groupby method. Consider the following DataFrame with the columns Date, Fruit, and Sale:

import pandas as pd

df = pd.DataFrame(
    {
        'Date': 
             ['08/09/2018', 
              '10/09/2018', 
              '08/09/2018', 
              '10/09/2018'],
        'Fruit': 
             ['Apple', 
              'Apple', 
              'Banana', 
              'Banana'],
        'Sale':
             [34,
              12,
              22,
              27]
    })

If we want to compute the cumulative sum of Sale per fruit and for each date, we can do this:

import pandas as pd

df = pd.DataFrame(
    {
        'Date': 
             ['08/09/2018', 
              '10/09/2018', 
              '08/09/2018', 
              '10/09/2018'],
        'Fruit': 
             ['Apple', 
              'Apple', 
              'Banana', 
              'Banana'],
        'Sale':
             [34,
              12,
              22,
              27]
    })

print(df.groupby(by=['Fruit','Date']).sum().groupby(level=[0]).cumsum())

After running the code above, we get the following output, which shows the cumulative sum for each fruit at each date:

                   Sale
Fruit  Date            
Apple  08/09/2018    34
       10/09/2018    46
Banana 08/09/2018    22
       10/09/2018    49
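When the frame already has one row per fruit and date, in date order, the running total can also be attached as a new column without the intermediate sum(). This is a sketch under that assumption; the column name `CumSale` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
    "Fruit": ["Apple", "Apple", "Banana", "Banana"],
    "Sale": [34, 12, 22, 27],
})

# Running total of Sale within each fruit, following the existing row order
df["CumSale"] = df.groupby("Fruit")["Sale"].cumsum()
print(df)
```

If a date can appear several times for the same fruit, the grouped sum() from the original code is still needed first to collapse the duplicates.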
        

Method to get the sum of columns based on the values of other columns

This method computes the sum where a given condition is true and replaces the sum with a given value where the condition is false. Consider the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(5,3), 
    columns=list('xyz'))

df['sum'] = df.loc[df['x'] > 0,['x','y']].sum(axis=1)

df['sum'] = df['sum'].fillna(0)  # assign back instead of a chained inplace fillna
print(df)

In the code above, we added a new column sum to the DataFrame. It holds the sum of the columns ['x', 'y'] for rows where ['x'] is greater than 0; for all other rows we replace sum with 0.

After running the code, we get the following output (the values may differ in your case).

          x         y         z       sum
0 -1.067619  1.053494  0.179490  0.000000
1 -0.349935  0.531465 -1.350914  0.000000
2 -1.650904  1.534314  1.773287  0.000000
3  2.486195  0.800890 -0.132991  3.287085
4  1.581747 -0.667217 -0.182038  0.914530
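The same result can be produced in a single step without the intermediate NaN values. A minimal sketch, assuming the `Series.where` method and using fixed illustrative numbers instead of random ones so that the outcome is reproducible:

```python
import pandas as pd

# Fixed illustrative values (not the random data from the example above)
df = pd.DataFrame({"x": [-1.0, 2.0, 1.5],
                   "y": [1.0, 0.5, -0.5],
                   "z": [0.2, -0.1, 0.3]})

# x + y where x > 0, otherwise 0
df["sum"] = (df["x"] + df["y"]).where(df["x"] > 0, 0)

print(df["sum"].tolist())  # [0.0, 2.5, 1.0]
```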
