How to Iterate Through a Dataframe in Python

How to Iterate Over Rows in a Pandas DataFrame

Discussing how to iterate over rows in pandas and why it's better to avoid it (if possible)

Giorgos Myrianthous

Introduction

Iterating over pandas DataFrames is not a best practice. Consider it only when it is absolutely necessary, after you have exhausted every other option that is likely to be more elegant and efficient.

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided

— pandas documentation

In today's article, we will discuss how to avoid iterating through DataFrames in pandas. We'll also go through a "checklist" that you may want to consult before choosing an iterative approach. Additionally, we will explore how to iterate in cases where no other option suits your specific use-case. Lastly, we will discuss why you should avoid modifying pandas objects while iterating over them.

Do you really need to iterate over rows?

As highlighted in the official pandas documentation, iterating through DataFrames is generally slow and can usually be avoided. pandas newcomers are often unfamiliar with the concept of vectorisation and unaware that most operations in pandas can (and should) be performed without explicit iteration.
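To get a rough feel for the gap, the sketch below times a row-by-row loop against the equivalent vectorised expression. The frame `df_demo`, its column names, the row count and the repeat count are all arbitrary choices made for illustration, and the absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical example data: 5,000 rows of random integers.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    'x': rng.integers(0, 100, size=5_000),
    'y': rng.integers(0, 100, size=5_000),
})


def loop_sum(frame):
    """Add the two columns row by row with iterrows()."""
    return [row['x'] + row['y'] for _, row in frame.iterrows()]


def vectorised_sum(frame):
    """Add the two columns in a single vectorised operation."""
    return frame['x'] + frame['y']


# Both approaches should produce the same values.
assert loop_sum(df_demo) == vectorised_sum(df_demo).tolist()

# Time each approach over a few repetitions and report the totals.
loop_time = timeit.timeit(lambda: loop_sum(df_demo), number=3)
vec_time = timeit.timeit(lambda: vectorised_sum(df_demo), number=3)
print(f'iterrows loop: {loop_time:.4f}s, vectorised: {vec_time:.4f}s')
```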

Before attempting to iterate through pandas objects, you must first ensure that none of the options below suit the needs of your use-case:

  • Vectorisation over iteration: pandas comes with a rich set of built-in methods whose performance is optimised. Most operations can be performed using one of these methods. You can also take a look at numpy and check whether any of its functions can be used in your context.
  • Applying a function to rows: A common requirement is to apply a function to every row when that function is designed to work on one row at a time rather than on the full DataFrame or Series. In such cases, prefer the apply() method over iterating through the pandas object. For more details, you can refer to the section of the pandas documentation that explains how to apply your own or another library's functions to pandas objects.
  • Iterative manipulations: If you need to perform iterative manipulations and performance is a concern, you may want to look into cython or numba. For more details on these tools, you can read the relevant section of the pandas documentation.
  • Printing a DataFrame: If you want to print out a DataFrame, simply use the DataFrame.to_string() method to render it as console-friendly tabular output.
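As a minimal sketch of the first two checklist items and the printing shortcut, the example below uses a small hypothetical frame named `orders` and a hypothetical helper `describe_row`. Neither comes from the pandas documentation. They only illustrate what each option looks like in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
orders = pd.DataFrame({
    'price': [10.0, 20.0, 30.0],
    'quantity': [1, 2, 3],
})

# Vectorisation: operate on whole columns instead of looping over rows.
orders['total'] = orders['price'] * orders['quantity']

# numpy functions also work on whole columns, e.g. a conditional label.
orders['size'] = np.where(orders['total'] > 50, 'large', 'small')


def describe_row(row):
    """Hypothetical function that only makes sense for one row at a time."""
    return f"{row['quantity']} x {row['price']:.2f} ({row['size']})"


# apply() with axis=1 passes each row to the function as a Series.
orders['summary'] = orders.apply(describe_row, axis=1)

# to_string() renders the whole frame as console-friendly text.
print(orders.to_string())
```

Note that apply() with axis=1 still calls the function once per row, so it is mainly a tidier alternative to a hand-written loop. The vectorised lines are the ones that avoid per-row work altogether.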

Iterating over the rows of a DataFrame

If none of the above options works for you, you may still need to iterate through pandas objects. You can do so using either of the built-in methods iterrows() or itertuples().

Before seeing both methods in action, let's create an example DataFrame that we'll use to iterate over.

    import pandas as pd

    df = pd.DataFrame({
        'colA': [1, 2, 3, 4, 5],
        'colB': ['a', 'b', 'c', 'd', 'e'],
        'colC': [True, True, False, True, False],
    })
    print(df)

       colA colB   colC
    0     1    a   True
    1     2    b   True
    2     3    c  False
    3     4    d   True
    4     5    e  False

  • The pandas.DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. Note that this method does not preserve dtypes across rows, because each row is converted into a Series with a single dtype. If you need to preserve the dtypes of the pandas object, use the itertuples() method instead.

    for index, row in df.iterrows():
        print(row['colA'], row['colB'], row['colC'])

    1 a True
    2 b True
    3 c False
    4 d True
    5 e False

  • The pandas.DataFrame.itertuples() method iterates over DataFrame rows as namedtuples. In general, itertuples() is expected to be faster than iterrows().

    for row in df.itertuples():
        print(row.colA, row.colB, row.colC)

    1 a True
    2 b True
    3 c False
    4 d True
    5 e False
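
The dtype caveat is easiest to see on a frame that mixes integer and float columns. In the sketch below, the frame `mixed` and its column names are hypothetical. iterrows() hands back each row as a single Series, so the integer value is upcast to float, whereas itertuples() keeps the integer as it is. The last loop also shows the optional `index` and `name` arguments of itertuples().

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
mixed = pd.DataFrame({'n_items': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows(): each row becomes one Series, so its values share one dtype.
_, first_row = next(mixed.iterrows())
print(first_row.dtype)        # float64
print(first_row['n_items'])   # 1.0 (the integer was upcast to float)

# itertuples(): each row is a namedtuple, so values keep their own types.
first_tuple = next(mixed.itertuples())
print(first_tuple.n_items)    # 1

# Optional arguments: drop the index field and rename the tuple type.
for item in mixed.itertuples(index=False, name='Item'):
    print(item)
```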


Modifying while iterating over rows

At this point, it's important to highlight that you should never modify a pandas DataFrame or Series you are iterating over. Depending on the data types of your pandas object, the iterator may return a copy of the object rather than a view. In that case, anything you write goes to the copy and leaves the original object unchanged.

For instance, let's suppose we want to double the values of each row in colA. An iterative approach won't do the trick:

    for index, row in df.iterrows():
        row['colA'] = row['colA'] * 2
    print(df)

       colA colB   colC
    0     1    a   True
    1     2    b   True
    2     3    c  False
    3     4    d   True
    4     5    e  False

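In similar use-cases, you should use the apply() method instead (for simple arithmetic like this, a vectorised expression such as df['colA'] * 2 works as well).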

    df['colA'] = df['colA'].apply(lambda x: x * 2)
    print(df)

       colA colB   colC
    0     2    a   True
    1     4    b   True
    2     6    c  False
    3     8    d   True
    4    10    e  False
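
For a simple arithmetic change like this one, a vectorised expression is shorter still than apply(). If a row-by-row loop is genuinely unavoidable, one workable pattern is to collect the new values in a list while iterating and assign them only after the loop has finished, so the frame is never modified mid-iteration. The sketch below uses a fresh copy of the example data under the hypothetical name `df2`. The rule it applies (add 1 where colB is 'a' or 'e') is made up for illustration.

```python
import pandas as pd

# Hypothetical copy of the example data.
df2 = pd.DataFrame({
    'colA': [1, 2, 3, 4, 5],
    'colB': ['a', 'b', 'c', 'd', 'e'],
})

# Vectorised: double the whole column in one operation.
df2['colA'] = df2['colA'] * 2
print(df2['colA'].tolist())  # [2, 4, 6, 8, 10]

# Loop fallback: only read inside the loop and build up the new values.
new_values = []
for row in df2.itertuples():
    if row.colB in ('a', 'e'):
        new_values.append(row.colA + 1)
    else:
        new_values.append(row.colA)

# Assign once, after the iteration is over.
df2['colA'] = new_values
print(df2['colA'].tolist())  # [3, 4, 6, 8, 11]
```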

Final Thoughts

In today's article, we discussed why it's important to avoid iterative approaches when working with pandas objects and to prefer vectorised methods, or any other approach that suits your specific use-case.

pandas comes with a rich set of built-in methods that are optimised to work on large pandas objects, and you should favour these over an iterative solution whenever possible. If you still need to iterate over a DataFrame, you can use the iterrows() or itertuples() methods.

Lastly, we discussed why you must always avoid modifying a pandas object that you are iterating through as this may not work as expected.



Source: https://towardsdatascience.com/how-to-iterate-over-rows-in-a-pandas-dataframe-6aa173fc6c84
