[JSON] Support Rename Fields for JSON operator
Closed this issue · 13 comments
Issue Description
Current State
- It is very difficult to manipulate JSON data with the JSON operator.
Proposed Change
- Please fetch this JSON Schema to implement the functions.
- Manipulating JSON data
JSON schema pseudo code
JsonOperator:
  Task: Rename fields
  Input:
    data:
      type: object
      description: Original data, which can be a JSON object or array of objects.
    fields:
      type: array
      description: An array of objects specifying the fields to be renamed.
      items:
        type: object
        properties:
          currentField:
            type: string
            description: The field name in the original data to be replaced, supports nested paths if "supportDotNotation" is true.
          newField:
            type: string
            description: The new field name that will replace the currentField, supports nested paths if "supportDotNotation" is true.
    # supportDotNotation:
    #   type: boolean
    #   default: true
    #   description: Determines whether to interpret field names as paths using dot notation. If false, fields are treated as literal keys.
    conflictResolution:
      type: string
      enum: [overwrite, skip, error]
      default: overwrite
      description: Defines how conflicts are handled when the newField already exists in the data.
  Output:
    data:
      type: object
      description: The modified data with the specified fields renamed.
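As a rough illustration of the input contract above, here is a minimal Python sketch that checks a request and fills in the conflictResolution default. The function name normalize_input and the error messages are hypothetical, and the sketch follows the pseudo schema in this issue rather than any final task definition.

```python
from typing import Any, Dict

# Values taken from the enum in the pseudo schema above.
ALLOWED_MODES = ("overwrite", "skip", "error")


def normalize_input(request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a rename request and apply the conflictResolution default.

    Hypothetical helper: the name and messages are illustrative only.
    """
    data = request.get("data")
    if not isinstance(data, (dict, list)):
        raise ValueError("'data' must be an object or an array of objects")

    fields = request.get("fields")
    if not isinstance(fields, list):
        raise ValueError("'fields' must be an array")
    for i, field in enumerate(fields):
        for name in ("currentField", "newField"):
            if not isinstance(field, dict) or not isinstance(field.get(name), str):
                raise ValueError(f"fields[{i}].{name} must be a string")

    # The schema declares "overwrite" as the default strategy.
    mode = request.get("conflictResolution", "overwrite")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"conflictResolution must be one of {ALLOWED_MODES}")

    return {"data": data, "fields": fields, "conflictResolution": mode}
```

Rejecting unknown strategies up front keeps the rename logic itself free of validation branches.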
Key Features:
conflictResolution: Handling conflicts when renaming fields in JSON, especially when working with nested objects and dot notation, is critical to avoid data loss or unexpected behavior. Allow users to specify how they want conflicts to be resolved (e.g., via a parameter such as conflictResolution: 'overwrite'|'skip'|'error'). This approach:
- Provides flexibility and control to the user.
- Adapts to different use cases.
Here are different strategies to manage conflicts and some considerations for each.
1. Overwrite the Existing Field (Default Behavior)
Description: If the newField already exists in the object, overwrite its value with the value from currentField.
Pros:
- Simple and straightforward.
- Useful when the intention is to replace the existing value.
Cons:
- Can lead to data loss if not used carefully.
Implementation:
# Overwrite: any existing value at new_key is replaced, so no existence check is needed.
obj[new_key] = obj.pop(current_key)
2. Skip the Renaming Operation
Description: If the newField already exists, skip the renaming operation for that particular field.
Pros:
- Prevents accidental overwriting of data.
- Safeguards against potential conflicts without altering the existing data.
Cons:
- The currentField remains unchanged, which might not be the desired outcome.
Implementation:
# Rename only when new_key is free; otherwise leave both fields untouched.
if new_key not in obj:
    obj[new_key] = obj.pop(current_key)
3. Merge Values
Description: If both currentField and newField exist and contain objects or arrays, merge the two values. This approach is more complex but can be very powerful.
Pros:
- Preserves both sets of data.
- Useful for combining information rather than choosing one over the other.
Cons:
- Can be complex to implement, especially if the data types of currentField and newField differ.
- May require custom logic depending on how you want to merge the data (e.g., combining arrays, merging objects, etc.).
Implementation:
if new_key in obj:
    if isinstance(obj[new_key], dict) and isinstance(obj[current_key], dict):
        # Merge dictionaries
        obj[new_key].update(obj.pop(current_key))
    elif isinstance(obj[new_key], list) and isinstance(obj[current_key], list):
        # Merge lists
        obj[new_key].extend(obj.pop(current_key))
    else:
        # Handle other types (overwrite, append, etc.)
        obj[new_key] = obj.pop(current_key)
else:
    obj[new_key] = obj.pop(current_key)
4. Rename with a Suffix or Prefix
Description: If the newField already exists, rename the new field by appending a suffix or prefix (e.g., _1, _conflict) to avoid conflicts.
Pros:
- Both original and new data are preserved.
- Easy to track conflicts.
Cons:
- The resulting data structure may become less predictable or harder to work with if many conflicts occur.
Implementation:
suffix = 1
original_new_key = new_key
while new_key in obj:
    new_key = f"{original_new_key}_{suffix}"
    suffix += 1
obj[new_key] = obj.pop(current_key)
5. Return an Error or Warning
Description: If a conflict is detected, stop the operation and return an error or warning to the user. This forces the user to address the conflict before proceeding.
Pros:
- Prevents accidental data overwriting.
- Makes the user aware of potential issues immediately.
Cons:
- Halts the process, which might be undesirable in automated workflows.
Implementation:
if new_key in obj:
    raise ValueError(f"Conflict detected: '{new_key}' already exists.")
else:
    obj[new_key] = obj.pop(current_key)
Summary:
- Overwrite: Simple and effective, but can lead to data loss.
- Skip: Safe but may leave data unchanged.
- Error/Warning: Forces user intervention; best for critical operations.
Choose the strategy that best aligns with your application's needs and the user's expectations. Implementing a combination of these strategies, such as providing a default behavior with options for customization, can offer the best balance between usability and robustness.
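The three strategies that appear in the schema enum (overwrite, skip, error) could be combined behind a single parameter roughly as follows. This is a minimal sketch for top-level keys only; the function name rename_key and its boolean return value are assumptions made for illustration, not part of the proposal.

```python
from typing import Any, Dict


def rename_key(obj: Dict[str, Any], current_key: str, new_key: str,
               conflict_resolution: str = "overwrite") -> bool:
    """Rename a top-level key in place; return True if the rename happened.

    Hypothetical helper covering the overwrite, skip and error strategies.
    """
    if conflict_resolution not in ("overwrite", "skip", "error"):
        raise ValueError(f"Unknown conflictResolution: {conflict_resolution!r}")
    if current_key not in obj or current_key == new_key:
        return False  # nothing to rename

    if new_key in obj:
        if conflict_resolution == "skip":
            return False  # keep both fields as they are
        if conflict_resolution == "error":
            raise ValueError(f"Conflict detected: '{new_key}' already exists.")
        # "overwrite" falls through and replaces the existing value.

    obj[new_key] = obj.pop(current_key)
    return True


record = {"name": "John Doe", "fullName": "J. Doe"}
rename_key(record, "name", "fullName", "skip")  # returns False, record is unchanged
```

Returning a flag instead of silently skipping lets the caller report which renames were not applied.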
Example Usage:
Scenario: Input data as JSON object
// input
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  },
  "fields": [
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "state", "newField": "address.state"}
  ],
  // "supportDotNotation": true,
  "conflictResolution": "overwrite"
}
Conflict Resolution Scenarios:
1. Overwrite (Default):
- The state field in data would be moved to address.state, overwriting the existing address.state field.
- Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "conflict"
    }
  }
}
2. Skip:
- The renaming of state to address.state would be skipped, so both state and address.state remain unchanged.
- Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  }
}
3. Error:
- The process would raise an error, stopping execution, because address.state already exists.
ValueError: Conflict detected: 'address.state' already exists.
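The three scenarios above can be reproduced with a small dot-notation sketch. It assumes that paths only traverse nested objects (no array indexes), that missing parents of the target path are created, and that the input is copied rather than mutated; the names rename_fields and _parent_and_key are hypothetical.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


def _parent_and_key(root: Dict[str, Any], path: str,
                    create: bool = False) -> Optional[Tuple[Dict[str, Any], str]]:
    """Walk a dotted path; return (parent dict, last key) or None if unreachable."""
    *parents, last = path.split(".")
    node: Any = root
    for part in parents:
        if not isinstance(node, dict):
            return None
        if part not in node:
            if not create:
                return None
            node[part] = {}  # assumption: create missing objects on the target path
        node = node[part]
    return (node, last) if isinstance(node, dict) else None


def rename_fields(data: Dict[str, Any], fields: List[Dict[str, str]],
                  conflict_resolution: str = "overwrite") -> Dict[str, Any]:
    """Return a copy of data with each currentField moved to newField."""
    result = copy.deepcopy(data)
    for field in fields:
        src = _parent_and_key(result, field["currentField"])
        if src is None or src[1] not in src[0]:
            continue  # source path does not exist: nothing to rename
        dst = _parent_and_key(result, field["newField"], create=True)
        if dst is None:
            raise ValueError(f"Cannot create path '{field['newField']}'.")
        (src_parent, src_key), (dst_parent, dst_key) = src, dst
        if dst_key in dst_parent:
            if conflict_resolution == "skip":
                continue
            if conflict_resolution == "error":
                raise ValueError(
                    f"Conflict detected: '{field['newField']}' already exists.")
        dst_parent[dst_key] = src_parent.pop(src_key)
    return result


data = {"address": {"state": "CA"}, "state": "conflict"}
fields = [{"currentField": "state", "newField": "address.state"}]
print(rename_fields(data, fields, "skip"))
# {'address': {'state': 'CA'}, 'state': 'conflict'}
```

Working on a deep copy means a failure in "error" mode leaves the caller's original data untouched, even if earlier renames in the list had already been applied to the copy.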
Scenario: Input Data as an Array of Objects
If the input data is an array of objects, the logic needs to be adapted to handle each object in the array individually. The schema and the function would process each object within the array according to the specified fields and conflictResolution rules.
Below is an example demonstrating how the "Rename Fields" operation would work with input data that is an array of objects.
Input
{
  "data": [
    {
      "name": "John Doe",
      "age": 30,
      "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "value": "john.doe@example.com"
        }
      ]
    },
    {
      "name": "Jane Smith",
      "age": 28,
      "address": {
        "street": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // Note: Jane Smith does not have a "contacts" field
    }
  ],
  "fields": [
    {"currentField": "name", "newField": "fullName"},
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "contacts.0.value", "newField": "contacts.0.contactInfo"},
    {"currentField": "age", "newField": "yearsOld"}
  ],
  // "supportDotNotation": true,
  "conflictResolution": "skip"
}
Explanation:
- Field "name": The "name" field will be renamed to "fullName" for each object in the array.
- Field "address.street": The "street" field inside the "address" object will be renamed to "road" for each object.
- Field "contacts.0.value": The "value" field inside the first element of the "contacts" array will be renamed to "contactInfo" for the first object, but this step will be skipped for the second object because the "contacts" field does not exist.
- Field "age": The "age" field will be renamed to "yearsOld" for each object.
Output:
{
  "data": [
    {
      "fullName": "John Doe",
      "yearsOld": 30,
      "address": {
        "road": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "contactInfo": "john.doe@example.com"
        }
      ]
    },
    {
      "fullName": "Jane Smith",
      "yearsOld": 28,
      "address": {
        "road": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // The "contacts" field is not present, so no renaming occurs for "contacts.0.value"
    }
  ]
}
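The array case, including numeric path segments such as contacts.0.value, might be handled along these lines. This is a sketch under stated assumptions: numeric segments index into arrays, only object keys are renamed, missing target parents are not created, and a record whose source path is missing is left alone. The names rename_fields_in_records, _child and _parent are hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple, Union

_MISSING = object()  # sentinel for "path segment not found"


def _child(node: Any, part: str) -> Any:
    """Return node[part] for objects, node[int(part)] for arrays, else _MISSING."""
    if isinstance(node, dict):
        return node.get(part, _MISSING)
    if isinstance(node, list) and part.isdigit() and int(part) < len(node):
        return node[int(part)]
    return _MISSING


def _parent(root: Any, path: str) -> Tuple[Any, str]:
    """Resolve everything but the last segment; parent is None if unreachable."""
    *parents, last = path.split(".")
    node = root
    for part in parents:
        node = _child(node, part)
        if node is _MISSING:
            return None, last
    return node, last


def _rename_one(record: Dict[str, Any], current: str, new: str, mode: str) -> None:
    src, src_key = _parent(record, current)
    dst, dst_key = _parent(record, new)
    # Only object keys are renamed; a missing source or target parent is a no-op.
    if not isinstance(src, dict) or src_key not in src or not isinstance(dst, dict):
        return
    if dst_key in dst:
        if mode == "skip":
            return
        if mode == "error":
            raise ValueError(f"Conflict detected: '{new}' already exists.")
    dst[dst_key] = src.pop(src_key)


def rename_fields_in_records(data: Union[Dict[str, Any], List[Dict[str, Any]]],
                             fields: List[Dict[str, str]],
                             conflict_resolution: str = "overwrite") -> Any:
    """Apply every rename to each object, accepting one object or an array."""
    result = copy.deepcopy(data)
    records = result if isinstance(result, list) else [result]
    for record in records:
        for field in fields:
            _rename_one(record, field["currentField"], field["newField"],
                        conflict_resolution)
    return result
```

Each object is processed independently, so a path that is missing in one record (such as contacts for Jane Smith) does not affect the renames applied to the others.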
Rules for the Component Hackathon
- Each issue will only be assigned to one person/team at a time.
- You can only work on one issue at a time.
- To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
- Ensure you address all feedback and suggestions provided by the Instill AI team.
- If no commits are made within five days, the issue may be reassigned to another contributor.
- Join our Discord to engage in discussions and seek assistance in #hackathon channel. For technical queries, you can tag @chuang8511.
Component Contribution Guideline | Documentation | Official Go Tutorial
I am interested. Can I work on this issue?
Hey @Danbaba1 I have removed you as an assignee because there has been no activity for the past 2 weeks 🙏 Please raise again if you are still working on it, thanks
I would like to give this issue a try. Can you please assign it to me?
@AkashJana18 Sounds good, I have assigned it to you!
Hey @chuang8511 @ShihChun-H Could you please guide me on where to make the changes for implementing JSON manipulation with the JsonOperator schema? I haven’t worked with this tech stack before, so any pointers on relevant files, modules, or general structure would be very helpful. Thanks in advance!
I would like to work on this issue. Can you please assign it to me?
Hey @gagan-bhullar-tech, I am already working on it. Would you like to collaborate?
@AkashJana18
Sorry, I posted the wrong JSON schema.
Could you take a look at this?
We have built the task definition, so all you have to do is work on the Golang implementation.
@chuang8511 so the Golang implementation needs to be done in the pipeline-backend repo?
@AkashJana18
Yes, please check the guideline.
Hey @AkashJana18 , how's it going?
I wanted to let you know that we will need a PR by the end of this week (8th Nov) since we are closing this event.
Please submit:
- Your PR for this ticket
- This Google form
to ensure your contribution is counted!
Alternatively, if you cannot complete this within the time frame but would still like to contribute, you are more than welcome to, but please note it would not be within the scope of Hacktoberfest 2024.
Thank you and look forward to your contribution! ✨