Python Power Tools

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

PowerTools

PowerTools is a utility library designed to simplify and enhance your experience with Python, Apache Spark, and AWS Glue Spark. It provides a collection of tools and functions to streamline your data processing workflows.

Installation

You can install PowerTools using pip:

pip install powertools

Usage

Quick Start

from pyspark.sql import functions as f  # assumed source of the `f` alias used below

from lps_glue import LPSGlue

path = "s3://your-bucket/your-prefix/"  # placeholder; replace with your data location

with LPSGlue(spark_shell=True) as lpsglue:
    df = lpsglue.read.csv(path)   # Read data from CSV
    df = lpsglue.tran.add_column(df, 'example_col1', f.lit('example'))  # Add column
    lpsglue.write.hudi(
        df=df,
        path=path,
        primary_key='pk1',
        partition_by=["part1", "part2"],
        order_by='ts',
        dedup=False
    ) # Write df in HUDI format

Python Utilities

*Work In Progress:*
  1. data manipulation using pandas
  2. parallelization using concurrent.futures
  3. and more. Stay tuned for updates!
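
Until these utilities are released, the second item can be approximated with the standard library alone. The following is a minimal sketch, not part of PowerTools: `row_count` and `map_parallel` are hypothetical names chosen for illustration, and the inputs are in-memory strings standing in for files.

```python
from concurrent.futures import ThreadPoolExecutor


def row_count(csv_text: str) -> int:
    """Count data rows in a CSV string, excluding the header line."""
    lines = [line for line in csv_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def map_parallel(func, items, max_workers=4):
    """Apply func to every item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


# Hypothetical in-memory inputs standing in for files.
samples = ["id,name\n1,a\n2,b\n", "id,name\n3,c\n", "id,name\n"]
counts = map_parallel(row_count, samples)
print(counts)
```

Threads suit I/O-bound work such as reading many small files; for CPU-bound work, `ProcessPoolExecutor` from the same module is usually the better fit.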

Spark Utilities

*Coming Soon*

Glue Spark Utilities

There are 5 main modules available in Glue Spark Utilities.

1. Read

Read data in any of the formats below using Spark, without installing additional dependencies.

CSV

  lpsglue.read.csv(path=filename)

PARQUET

  lpsglue.read.parquet(path=filename)

HUDI

  lpsglue.read.hudi(path=filename)

DELTA LAKE

  lpsglue.read.delta(path=filename)
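
Because the four readers above share the same call shape, a job could pick one from configuration instead of hard-coding it. The sketch below is an illustration under assumptions, not part of the library: `read_as` and `StubReader` are hypothetical, and the stub only records the call so the example runs without Spark. In a real job you would pass `lpsglue.read` as the reader.

```python
SUPPORTED_FORMATS = ("csv", "parquet", "hudi", "delta")


def read_as(reader, fmt, path):
    """Call reader.<fmt>(path=path) for one of the four formats listed above."""
    key = fmt.lower()
    if key not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return getattr(reader, key)(path=path)


class StubReader:
    """Hypothetical stand-in for lpsglue.read; returns the call it received."""

    def csv(self, path):
        return ("csv", path)

    def parquet(self, path):
        return ("parquet", path)

    def hudi(self, path):
        return ("hudi", path)

    def delta(self, path):
        return ("delta", path)


result = read_as(StubReader(), "Parquet", "data/events")
print(result)
```

Rejecting unknown format names up front keeps a typo in configuration from surfacing later as an unrelated attribute error.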

2. Tran

3. Write

4. Log

5. AWS

Contributing

We welcome contributions to PowerTools! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request on our GitHub repository.

License

PowerTools is licensed under the MIT License.