pandas-inspired Excel scripting for Office Scripts
Now that you’ve learned how to aggregate and summarize data within a DataFrame, you may often need to combine multiple DataFrames to enrich your analysis. Whether you are combining datasets based on shared keys, appending rows, or joining columns, the ability to merge DataFrames is essential in data processing.
frosts provides two powerful methods for combining DataFrames: .merge()
for key-based joins, and .concat()
for stacking data row-wise.
.merge(other: DataFrame, on: string[], how: "inner" | "left" | "outer" = "inner")
Merges the current DataFrame with another one based on key columns, similar to SQL joins.
other
: Another DataFrame
to merge withon
: Column name(s) used as the join key(s).how
: Type of join to perform.
"inner"
(default): Only keeps rows with matches in both DataFrames."left"
: Keeps all rows from the current DataFrame, matching where possible from other."outer"
: Keeps all rows from both DataFrames, filling in null for missing values.Let’s say we have two DataFrames
employees
EmployeeID | Name | Department |
---|---|---|
1 | Alice | HR |
2 | Bob | Engineering |
3 | Charlie | Marketing |
4 | Diana | Sales |
salaries
EmployeeID | Salary |
---|---|
2 | 90000 |
3 | 75000 |
4 | 68000 |
5 | 72000 |
employees.merge(salaries,["Employee ID"],"inner");
EmployeeID | Name | Department | Salary |
---|---|---|---|
2 | Bob | Engineering | 90000 |
3 | Charlie | Marketing | 75000 |
4 | Diana | Sales | 68000 |
Only rows where EmployeeID
exists in both DataFrames will appear.
employees.merge(salaries,["Employee ID"], "left");
EmployeeID | Name | Department | Salary |
---|---|---|---|
1 | Alice | HR | null |
2 | Bob | Engineering | 90000 |
3 | Charlie | Marketing | 75000 |
4 | Diana | Sales | 68000 |
Keeps all rows from employees
, adds data from salaries
when possible
EmployeeID | Name | Department | Salary |
---|---|---|---|
1 | Alice | HR | null |
2 | Bob | Engineering | 90000 |
3 | Charlie | Marketing | 75000 |
4 | Diana | Sales | 68000 |
5 | null | null | 72000 |
Keep all rows from both DataFrames, filling missing values with null
.
You can also join on multiple shared keys, for example the following join
const df1 = new DataFrame([
["EmployeeID", "Date", "HoursWorked"],
[101, "2024-01-01", 8],
[101, "2024-01-02", 7],
[102, "2024-01-01", 6]
]);
const df2 = new DataFrame([
["EmployeeID", "Date", "Project"],
[101, "2024-01-01", "Alpha"],
[101, "2024-01-02", "Beta"],
[103, "2024-01-01", "Gamma"]
]);
const result = df1.merge(df2, ["EmployeeID", "Date"], "inner");
Would result in this table
EmployeeID | Date | HoursWorked | Project |
---|---|---|---|
101 | 2024-01-01 | 8 | Alpha |
101 | 2024-01-02 | 7 | Beta |
This type of join is especially useful when working with time series data or logs and need to match both an ID and a timestamp, as shown here.
.validate_key(key: DataFrame, on: [string, string] | string, errors: "raise" | "return" = "raise")
Checks whether all join key values in the current DataFrame exist in the corresponding column of another DataFrame.
key
: A reference DataFrame (typically the lookup table or foreign key source).on
: Column(s) to match.
string
, the same column name is used in both DataFrames.[leftCol, rightCol]
, matches this[leftCol]
to key[rightCol]
.errors = "return"
: An array of missing values.errors = "raise"
: Throws an error and stops execution if mismatches are found.void
.âś… Use When:
1) Checking keys with the same column names
df.check_key(referenceTable, "ProjectID");
Validates that all ProjectIDs
in df
exist in referenceTable
.
2) Checking keys with different column names
df.check_key(referenceTable, ["UserID", "StaffID"]);
Checks if every UserID
in df
has a match in the StaffID
column of referenceTable
.
3) Failing fast on missing keys
df.check_key(referenceTable, "ProjectID", "raise");
If any ProjectID
is not found, would throw
KeyIncompleteError: The following values were not found in the selected key
[1234, 4567, 8910]
4) Graceful fallback
const missing = df.check_key(referenceTable, "ProjectID", "return");
if (missing?.length) {
// Return here instead of crashing main
// possibly send to PowerAutomate/logic app
return JSON.stringify(missing)
//Output: "[1234,4567,8910]"
}
.concat(other:DataFrame, columnSelection: ("inner"|"outer"|"left") = "outer")
The .concat()
method appends the rows of the other
DataFrame to the current one. It aligns columns based on the columnSelection mode
"outer"
includes all columns from both DataFrames, filling empties with null
(default)"inner"
includes only shared columns'left'
includes all columns from the first DataFrame
, filling missing values in the second with null
For the inputs tables
df1
Name | Age | Department |
---|---|---|
Alice | 30 | Sales |
Bob | 25 | Marketing |
df2
Name | Age | Location |
---|---|---|
Carol | 28 | New York |
Dave | 35 | Chicago |
df1.concat(df2)
Name | Age | Department | Location |
---|---|---|---|
Alice | 30 | Sales | null |
Bob | 25 | Marketing | null |
Carol | 28 | null | New York |
Dave | 35 | null | Chicago |
df1.concat(df2, "inner");
Name | Age |
---|---|
Alice | 30 |
Bob | 25 |
Carol | 28 |
Dave | 35 |
df1.concat(df2, "left");
Name | Age | Department |
---|---|---|
Alice | 30 | Sales |
Bob | 25 | Marketing |
Carol | 28 | null |
Dave | 35 | null |
✅ With your datasets successfully combined, the final step is often saving or sharing your results—let’s look at how to export and import DataFrames using Excel, CSV, and JSON.