Pandas merge two dataframes with different number of rows

Last update on August 19 2022 21:50:47 (UTC/GMT +8 hours)

Pandas Joining and merging DataFrame: Exercise-14 with Solution

Write a Pandas program to merge two given dataframes with different columns.

Table of Contents Show

Pandas Joining and merging DataFrame: Exercise-14 with Solution
Python: Tips of the Day
Concatenate Examples
Merge Examples
How do I merge two DataFrames with different number of rows?
Can you merge DataFrames of different lengths Pandas?
How do I merge two DataFrames in Pandas based on rows?
How do I merge two DataFrames with different indexes?

Test Data:

data1:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3

data2:
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Sample Solution:

Python Code :

import pandas as pd
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']}) 
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
print("Original DataFrames:")
print(data1)
print("--------------------")
print(data2)
print("\nMerge two dataframes with different columns:")
result = pd.concat([data1,data2], axis=0, ignore_index=True)
print(result)

Sample Output:

Original DataFrames:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
--------------------
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Merge two dataframes with different columns:
     P    Q    R    S key1 key2
0   P0   Q0  NaN  NaN   K0   K0
1   P1   Q1  NaN  NaN   K0   K1
2   P2   Q2  NaN  NaN   K1   K0
3   P3   Q3  NaN  NaN   K2   K1
4  NaN  NaN   R0   S0   K0   K0
5  NaN  NaN   R1   S1   K1   K0
6  NaN  NaN   R2   S2   K1   K0
7  NaN  NaN   R3   S3   K2   K0

Python Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to combine the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
Next: Write a Pandas program to Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame.

Python: Tips of the Day

Using yield to create a simple iterator:

>>> def foo(lst):
>>>   for x in lst:
>>>       yield x
>>>       yield x*2

>>> a = [1, 3]
>>> list(foo(a))
[1, 2, 3, 6]

Topics covered in this section:

Join and merge options available
Data used in this example
Concatenate examples

Concatenate (stack) by rows
Concatenate (stack) by columns

Merge examples

Merge and keep matching observations only
Merge and keep all observations
Merge and keep matching observations and all observations from the left data frame
Merge and keep matching observations and all observations from the right data frame
Merging on different columns with unique values

There are a few methods you can use to combine data frames in Python. These methods are .concat(), and .merge(). Both of them are apart of the Pandas library. So let’s import Pandas!

pandas.concat() method

The pandas.concat() method combines two data frames by stacking them on top of each other. If one of the data frames does not contain a variable column or variable rows, observations in that data frame will be filled with NaN values.

pandas.merge() Method

The pandas.merge() method joins two data frames by a “key” variable that contains unique values.

With pandas.merge(), you can only combine 2 data frames at a time. If you have more than 2 data frames to merge, you will have to use this method multiple times. Let’s go over pandas.merge() and some of the available arguments to pass. Here is the general structure and the recommended bare minimum arguments to pass. This method has more argument to pass if desired.

pandas.merge(left_data_frame, right_data_frame, on= , how= )

left is one of the data frames
right is the other data frame
on is the variable, a.k.a the column, you want to merge on. This variable has to have the same name in both data frames.

If the data frames has different column names for the merge variables you can use left_on and right_on.

left_on is the variable name in the left data frame to be merged on
right_on is the variable name in the left data frame to be merged on

how is where you pass the options of merging. These include:

“inner”, where only the observations with matching values based on the “on” argument that is passed are kept.
“left”, where all observations will be kept from the data frame in the left argument regardless if there is matching values with the data frame in the right argument. Observations that do not have a matching value based on the on argument in the “right” data frame will be discarded.
“right”, where all observations will be kept from the data frame in the right argument regardless if there is matching values with the data frame in the left argument. Observations that do not have a matching value based on the on argument in the “left” data frame will be discarded.
“outer”, all observations will be kept from both data frames.

Data used for Examples

Normally, we would use a real data set for our examples. However, for this section we will create a few data sets so it will be easier to demonstrate what is occurring.

Creating 4 data frames

dataframe1 = pd.DataFrame({'ID': ['0011','0013','0014','0016','0017'], 
                           'First Name': ['Joseph', 'Mike', 'Jordan', 'Steven', 'Susan']})
dataframe2 = pd.DataFrame({'ID': ['0010','0011','0013','0014','0017'], 
                           'Last Name': ['Gordan', 'Johnson', 'Might' , 'Jackson', 'Shack']})
dataframe3 = pd.DataFrame({'ID': ['0020','0022','0025'],
                           'First Name': ['Adam', 'Jackie', 'Sue']})
dataframe4 = pd.DataFrame({'Key': ['0020','0022','0025'],
                           'Scores': [95, 90, 80]})

	First Name	ID
0	Joseph	0011
1	Mike	0013
2	Jordan	0014
3	Steven	0016
4	Susan	0017

	ID	Last Name
0	0010	Gordan
1	0011	Johnson
2	0013	Might
3	0014	Jackson
4	0017	Shack

	First Name	ID
0	Adam	0020
1	Jackie	0022
2	Sue	0025

	Key	Scores
0	0020	95
1	0022	90
2	0025	80

Concatenate Examples

With pandas.concat(), you pass the data frames that you want to join as a list. You can pass as many as you need to join. Remember, this method joins the data frames by rows (stacking them on top of each other) by default. To join them by columns (put the data frames side to side), you pass the “axis= 1” argument.

Concatenation Example 1. Joining data frames by rows (stacking one on top of another)

If you were to join data frames by rows with an uneven number of columns, i.e. one data frame has more columns than the other, the data frame without the columns will have “NaN” values filled in where data is missing. This will be shown in example 2.

new_concat_ROWS_dataframe = pd.concat([dataframe1, dataframe3])

new_concat_ROWS_dataframe

	First Name	ID
0	Joseph	0011
1	Mike	0013
2	Jordan	0014
3	Steven	0016
4	Susan	0017
0	Adam	0020
1	Jackie	0022
2	Sue	0025

Look at the index, notice how Python kept the original index from both data frames and just stacked them on top of each other as well. If you wish for a new index starting at 0, pass the “ignore_index” argument as “true”.

new_concat_ROWS_dataframe = pd.concat([dataframe1, dataframe3], ignore_index= "true")

new_concat_ROWS_dataframe

	First Name	ID
0	Joseph	0011
1	Mike	0013
2	Jordan	0014
3	Steven	0016
4	Susan	0017
5	Adam	0020
6	Jackie	0022
7	Sue	0025

Concatenation Example 2. Joining data frames by columns (stacking them side by side)

Passing the “axis= 1” argument will join the data frames by columns, placing the data frames next to each other. If one data frame has more rows than the other, the data frame that has less rows will be filled with “NaN” values where the extra rows will occur.

new_concat_COL_dataframe = pd.concat([dataframe1, dataframe3], axis=1)

new_concat_COL_dataframe

	First Name	ID	First Name	ID
0	Joseph	0011	Adam	0020
1	Mike	0013	Jackie	0022
2	Jordan	0014	Sue	0025
3	Steven	0016	NaN	Nan
4	Susan	0017	NaN	Nan

In this situation, it’s clear that stacking the data frames on top of each other (joining by rows) is what we want. The second example would make for a horrible data frame.

Merge Examples

Merge Example 1. Keep matching observations only

Passing the “how = ‘inner'” argument will keep observations that have a match on the merge variable in both data frames.

new_merged_dataframe = pd.merge(dataframe1, dataframe2, on= "ID", how= "inner")

new_merged_dataframe

	First Name	ID	Last Name
0	Joseph	0011	Johnson
1	Mike	0013	Might
2	Jordan	0014	Jackson
3	Susan	0017	Shack

Merge Example 2. Keep all observations

Passing the “how = ‘outer'” argument will keep all observations from both data frames.

new_OUTER_merged_dataframe = pd.merge(dataframe1, dataframe2, on= "ID", how= "outer")

new_OUTER_merged_dataframe

	First Name	ID	Last Name
0	Joseph	0011	Johnson
1	Mike	0013	Might
2	Jordan	0014	Jackson
3	Steven	0016	NaN
4	Susan	0017	Shack
5	Nan	0010	Gordan

Merge Example 3. Keeping match observations and all observations from the left data frame

Passing the “how= ‘left'” argument will keep all observations in the data frame that is being passed in the left argument regardless if there is a matching value in the data frame that is being passed in the right argument. Rows without a matching merge variable from the right data frame will be discarded.

new_LEFT_merged_dataframe = pd.merge(dataframe1, dataframe2, on= "ID", how= "left")

new_LEFT_merged_dataframe

	First Name	ID	Last Name
0	Joseph	0011	Johnson
1	Mike	0013	Might
2	Jordan	0014	Jackson
3	Steven	0016	NaN
4	Susan	0017	Shack

Merge Example 4. Keeping match observations and all observations from the right data frame

Passing the “how= ‘right'” argument will keep all observations in the data frame that is being passed in the right argument regardless if there is a matching value in the data frame that is being passed in the left argument. Rows without a matching merge variable from the left data frame will be discarded.

new_RIGHT_merged_dataframe = pd.merge(dataframe1, dataframe2, on= "ID", how= "right")

new_RIGHT_merged_dataframe

	First Name	ID	Last Name
0	Joseph	0011	Johnson
1	Mike	0013	Might
2	Jordan	0014	Jackson
3	Susan	0017	Shack
4	NaN	0010	Gordan

Merge Example 5. Merging on different columns with unique values

If the two data frames each contain the unique identifier, but are stored under different columns, you can merge using the left_on and right_on arguments. If going this route, you have to pass both arguments. The general structure is left_on = column_name_with_unique_identifier, right_on = column_name_with_unique_identifier.

Note: Merging data frames this way will keep both columns stated in the left_on and right_on arguments.

merged_dataframe = pd.merge(dataframe3, dataframe4, left_on= "ID", right_on= "Key", how= "inner")

merged_dataframe

	First Name	ID	Key	Scores
0	Adam	0020	0020	95
1	Jackie	0022	0022	90
2	Sue	0025	0025	80

There is more useful information and advanced concatenation/ merging methods on the official documentation page.

How do I merge two DataFrames with different number of rows?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.

Can you merge DataFrames of different lengths Pandas?

It can be done using the merge() method.

How do I merge two DataFrames in Pandas based on rows?

The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.

How do I merge two DataFrames with different indexes?

How to Merge Two Pandas DataFrames on Index.

Use join: By default, this performs a left join. df1. join(df2).

Use merge. By default, this performs an inner join. pd. merge(df1, df2, left_index=True, right_index=True).

Use concat. By default, this performs an outer join..

Pandas merge two dataframes with different number of rows

Pandas Joining and merging DataFrame: Exercise-14 with Solution

Python: Tips of the Day

pandas.concat() method

pandas.merge() Method

Data used for Examples

Concatenate Examples

Merge Examples

How do I merge two DataFrames with different number of rows?

Can you merge DataFrames of different lengths Pandas?

How do I merge two DataFrames in Pandas based on rows?

How do I merge two DataFrames with different indexes?

Related Posts

Merge multiple excel sheets into one workbook online free

How to make planted flowers in merge mansion

Advertising

LATEST NEWS

After how many weeks pregnancy is safe

Icd 10 major depressive disorder recurrent moderate

Where is the account number on a td bank check

Blood in discharge but not period reddit

Bank of america home affordable mortgage program

2005 jeep grand cherokee 3 inch lift kit

Cooking frozen salmon fillets in air fryer

If a friend drives my car are they insured

Best place to see redwoods in san francisco

Travel requirements to st thomas virgin islands

Advertising

Populer

Advertising

About

Legal

Help

Social