microsoft/picologging

Picologging leaks memory when used in an ASGI Framework

Opened this issue · 9 comments

Hi, I've been using picologging in an ASGI framework called Starlite. It looks like picologging creates a memory leak when used in this framework. If I switch to the Python stdlib logging module, the memory leak goes away.

I'm providing a reproducible example below. The script makes a request to an external API, validates the response data, and then saves it to Postgres. I removed the code that saves to Postgres because a logging statement in that function is all that's needed to exacerbate the memory leak. This also helps showcase the issue, since nothing is being done with the data.

To run this, you just need Redis and Postgres connection strings, and the following dependencies (the full install command is shown after the list):

pip install ...

  • starlite[full]
  • saq
  • uvicorn
  • sqlalchemy
  • python-dotenv
  • httpx

Assuming the Python file is called main.py, you can run the script in a terminal with: uvicorn main:app

And you'll see memory usage climb quickly within a few minutes. If you switch out picologging for the standard Python logging module, the memory leak goes away.
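If you want numbers rather than eyeballing Activity Monitor, one rough way is to poll the resident set size of the uvicorn process (the PID placeholder is whatever your uvicorn process is):

while true; do ps -o rss= -p <uvicorn PID>; sleep 5; done   # RSS in kilobytes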

import asyncio
import datetime
import os

import httpx
import picologging
import uvicorn
from dotenv import load_dotenv
from saq import CronJob, Queue, Worker
from sqlalchemy import Column, UniqueConstraint, create_engine
from sqlalchemy.dialects.postgresql import FLOAT, INTEGER, JSONB, TEXT
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import declarative_base
from sqlalchemy.types import TIMESTAMP
from starlite import DTOFactory, LoggingConfig, Starlite, get
from starlite.plugins.sql_alchemy import (
    SQLAlchemyConfig,
    SQLAlchemyPlugin,
    SQLAlchemySessionConfig,
)

########################################################
#################### ENV & LOGGING #####################
load_dotenv()

DATABASE_URL = os.getenv("DATABASE_URL")
REDIS_URL = os.getenv("REDIS_URL")

logging_config = LoggingConfig(
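    # NOTE: as far as I understand, Starlite picks up picologging automatically
    # when it's installed, so this same config drives either picologging or the
    # stdlib logging module; "queue_listener" is the built-in queue-based handler.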
    loggers={
        "app": {
            "level": "DEBUG",
            "handlers": ["queue_listener"],
            "propagate": False,
        }
    }
)

logger = picologging.getLogger("app")


########################################################
######## SQLALCHEMY CONFIG & DB MODELS  ########
Base = declarative_base()
engine = create_engine(url=DATABASE_URL, echo=True, echo_pool=True)

async_engine = create_async_engine(
    url=DATABASE_URL,
    future=True,
    echo=False,
    echo_pool=True,
    max_overflow=10,
    pool_size=5,
    pool_timeout=30,
    pool_recycle=300,
    pool_pre_ping=True,
    pool_use_lifo=True,  # use lifo to reduce the number of idle connections
)

async_session_factory: async_sessionmaker[AsyncSession] = async_sessionmaker(
    async_engine, expire_on_commit=True, autoflush=True
)

session_config = SQLAlchemySessionConfig(expire_on_commit=True, autoflush=True)

sqlalchemyconfig = SQLAlchemyConfig(
    dependency_key="async_session",
    engine_instance=async_engine,
    session_config=session_config,
    session_maker_app_state_key="db_session_maker",
)

dto_factory = DTOFactory(
    plugins=[
        SQLAlchemyPlugin(
            config=sqlalchemyconfig,
        )
    ]
)


async def upsert_sql(
    model,
    objs: list,
    conflict_mode: str = "do_nothing",
    index_elements: list = None,
    constraint: str = None,
) -> None:

    # I've removed the actual code to save to SQL here, because...
    logger.debug("Simply logging in this function causes a big memory leak")


class LUNCValidator(Base):
    __tablename__ = "lunc_validators"

    uid = Column(INTEGER, primary_key=True, nullable=False)
    last_updated = Column(TIMESTAMP(timezone=True), onupdate=datetime.datetime.now)
    operatorAddress = Column(TEXT, nullable=False)
    tokens = Column(FLOAT, nullable=False)
    delegatorShares = Column(FLOAT, nullable=False)
    upTime = Column(FLOAT, nullable=False)
    status = Column(TEXT, nullable=False)
    accountAddress = Column(TEXT, nullable=False)
    description = Column(JSONB, nullable=False)
    votingPower = Column(JSONB, nullable=False)
    commissionInfo = Column(JSONB, nullable=False)
    rewardsPool = Column(JSONB, nullable=False)
    selfDelegation = Column(JSONB, nullable=False)

    __table_args__ = (UniqueConstraint("operatorAddress"),)


DTO_LUNCVALIDATOR = dto_factory("DTO_LUNCVALIDATOR", LUNCValidator, exclude=["uid"])

Base.metadata.create_all(engine)

########################################################
############ Reusable HTTPX Requests Func ##############


async def fetcha(
    url: str,
    params: dict = None,
) -> list:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url=url, params=params)

    results = resp.json()
    return results


########################################################
######## REQUEST JSON AND STORE TO SQL ################
async def make_request(ctx) -> None:
    logger.info("Starting to make request")

    url = "https://fcd.terrarebels.net/v1/staking/validators"
    data = await fetcha(url=url)

    logger.info(f"{len(data)} results in response")

    index_elements = ["operatorAddress"]

    # Validate data using Starlite DTOs (Pydantic behind the scenes)
    dtos = [DTO_LUNCVALIDATOR(**i) for i in data]

    # Save to SQL Function
    # Comment out this function and the memory leak becomes smaller... no idea why.
    await upsert_sql(
        LUNCValidator,
        dtos,
        conflict_mode="update",
        index_elements=index_elements,
    )


## SAQ CONFIG: SCHEDULES REQUEST EVERY 10 SECONDS

queue = Queue.from_url(REDIS_URL)

tb = CronJob(make_request, cron="* * * * * */10")  # Every 10 seconds
worker = Worker(queue=queue, functions=[], cron_jobs=[tb])


async def tasks_on_startup() -> None:
    logger.warning("Starting SAQ worker")

    asyncio.create_task(worker.start())
    asyncio.create_task(worker.schedule())

    logger.info("SAQ started and tasks scheduled")


async def tasks_on_shutdown() -> None:
    logger.warning("Stopping SAQ worker")

    asyncio.create_task(worker.abort())
    asyncio.create_task(worker.stop())

    logger.info("SAQ worker should be dead")


######## BASIC STARLITE APP #########


@get("/")
def hello_world() -> dict[str, str]:
    """Handler function that returns a greeting dictionary."""
    return {"hello": "world"}


app = Starlite(
    route_handlers=[hello_world],
    on_startup=[tasks_on_startup],
    on_shutdown=[tasks_on_shutdown],
    logging_config=logging_config,
    debug=True,
)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

s/Starlite/Litestar 😄

dkull commented

Also had a leak with picologging; all was good when I swapped back to std logging.
The leak manifested when I passed a logger object deep into threaded + asyncio code. The leak only happened when I called a log function deep down; it didn't leak when the log level was higher than the message, or when I didn't call the log function. A rough sketch of the pattern is below.
Bashed my head against it for a few hours but couldn't crack it; probably leaking fds that the GC can't collect, or something. Migrating back to std logging for now.
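Roughly this shape (a minimal sketch with made-up names, not my actual code):

import asyncio
from concurrent.futures import ThreadPoolExecutor

import picologging

picologging.basicConfig(level=picologging.DEBUG)
logger = picologging.getLogger("deep")


def deep_worker(log, i: int) -> None:
    # The leak only showed up when this call actually emitted a record,
    # i.e. when the logger's level allowed the message through.
    log.debug("processing item %s", i)


async def main() -> None:
    loop = asyncio.get_running_loop()
    # The logger object is passed down into code running on worker threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(100_000):
            await loop.run_in_executor(pool, deep_worker, logger, i)


if __name__ == "__main__":
    asyncio.run(main())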

Thanks for the report. Sorry I haven't replied earlier; I'll repro this and fix the leak. Hopefully it's one of the ones we're tackling in #137.

For the original sample, I was able to get it running, but I wasn't able to reproduce a memory leak. I wonder if it's down to the versions being run? picologging 0.9.2 came with some memory fixes.

To be sure, I've written a minimal sample that uses the Litestar queue logger handler (similar to the OP's sample). Litestar has changed some of the APIs.

Under Memray, over 100,000 HTTP calls, memory usage is stable:

[Memray screenshot (2023-09-26): memory usage stable over 100,000 requests]
from litestar import Litestar, get, Request
from litestar.logging import LoggingConfig
from litestar.testing import TestClient


logging_config = LoggingConfig(
    loggers={
        "app": {
            "level": "DEBUG",
            "handlers": ["queue_listener"],
            "propagate": False,
        }
    }
)


@get("/")
def hello_world(request: Request) -> dict[str, str]:
    """Handler function that returns a greeting dictionary."""
    request.logger.info("No results in response")
    request.logger.debug("doing things...")
    return {"hello": "world"}


app = Litestar(
    route_handlers=[hello_world],
    logging_config=logging_config,
    debug=True,
)

if __name__ == "__main__":
    with TestClient(app=app) as client:
        for _ in range(100_000):
            response = client.get("/")
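For reference, this is roughly how I run it under Memray (assuming the sample above is saved as sample.py; the output filename is arbitrary):

python -m memray run -o pico_repro.bin sample.py
python -m memray flamegraph pico_repro.bin   # produces an HTML flamegraph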

@dkull

The leak manifested when I passed a logger object deep into threaded + asyncio code. The leak only happened when I called a log function deep down; it didn't leak when the log level was higher than the message, or when I didn't call the log function.

Could you please give me some more detail? I believe everyone; I just can't reproduce a leak to identify where it's coming from.

Hey @tonybaloney, thanks for looking into this. I'm not 100% certain which version of picologging I was using when I submitted this ticket, but I'm pretty sure it was 0.9.2. I don't think Starlite (now called Litestar) had any version constraints on picologging, and I found picologging==0.9.2 in an old requirements.txt file for a Starlite project I had.

No one else from the Litestar project has reported this issue with picologging, so there must be something unique about my usage. I was requesting large JSON responses in background tasks, generally 2-5 MB or more. Passing the data around and logging in each function exacerbated the leak.

Are you able to repro with the example I provided? It was made using starlite==1.51.12. You'll have to update the endpoint, as it's been migrated since I reported this issue; try this URL to get large JSON responses: https://fcd.terraclassic.community/v1/staking/validators

Let me know if you have any issues running it.

Try updating to 0.9.3, which has a few bug fixes.
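For example:

pip install --upgrade "picologging>=0.9.3"
pip show picologging   # check which version ended up installed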

Also, there was a memory leak in Litestar/Starlite related to the storage of large responses, but as you pointed out, this one only occurs when using picologging and not standard logging. Probably worth the update anyway: litestar-org/litestar#1137

@tonybaloney I just ran my original script with picologging 0.9.3. I didn't have time to set up Memray, but looking at Activity Monitor on macOS, you can see memory consumption is already quite high after ~5 minutes of requests.

[Screenshot: Activity Monitor showing high memory usage with picologging 0.9.3 after ~5 minutes]

And for comparison, here's memory consumption with the standard logging library. I turned it on, ate dinner, and came back after 30 minutes or so.

[Screenshot: Activity Monitor showing stable memory usage with stdlib logging after ~30 minutes]

For reference, Starlite normally consumes around 90-125 MB of memory at idle.

What version of Python are you using for tests?
I noticed a leak after switching from Python 3.8 to Python 3.11. I am using picologging with another web framework.

I've observed the memory leak on 3.10 and 3.11; I haven't tried earlier Python versions.