Why FinOps begins with developers
Cost optimisation is often seen as an operational task—left for DevOps or FinOps teams who analyze cloud bills long after the infrastructure is provisioned. However, true cost efficiency begins much earlier: with the code developers write and the architectural choices embedded in every commit.
Even the smallest decisions can have a significant impact on costs, and they set a precedent that shapes the team's culture and future development habits.
The bad habit of adding temporary things "just for a while"
Recently, I was reviewing log patterns in Datadog and noticed a large number of log entries that didn't seem to carry any useful information: just a printout of the x-forwarded-for header, with no additional context.
After digging into the application code, I found that the line had been committed almost six months earlier for debugging purposes and was never removed.
```python
from fastapi import Request  # the handler uses a FastAPI/Starlette-style Request
import logging

def get_ip_from_request(request: Request) -> str:
    if h := request.headers.get("x-forwarded-for"):
        # tmp for debug
        logging.info(f"x-forwarded-for header: {h}")
        # IP addresses are in order (e.g. client, proxy1, proxy2)
        return h.split(",")[0].strip()
    return request.client.host
```

Why is this a problem at all, and what are the consequences?
It all depends on the scale of the application. In this case, the get_ip_from_request function was called around 20 million times per month, which resulted in an additional 120 million log entries during the period it was in the codebase.
The most important thing is, of course, the costs. Datadog charges $3.75 per 1 million logs indexed with a 30-day retention period. So the cost of this temporary debugging code was around $75 per month, totaling $450 over the 6-month period.
And that's only for a 30-day retention period. If logs were kept for 60 days (pricing at this tier is negotiated per contract, but it's roughly 1.5x the 30-day rate), the cost would be $112.50 per month and $675 for half a year. With a 90-day retention period (roughly 2.3x the 30-day rate per million), the cost would go up to $172.50 monthly and $1,035 in total.
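The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The 60- and 90-day multipliers are the approximations from the text, not official list prices:

```python
# Approximate Datadog indexing cost of the stray debug log.
RATE_30_DAY = 3.75  # USD per 1M indexed logs, 30-day retention
MULTIPLIERS = {30: 1.0, 60: 1.5, 90: 2.3}  # rough per-contract scaling

def monthly_cost(logs_per_month: int, retention_days: int = 30) -> float:
    """Cost in USD of indexing `logs_per_month` log entries."""
    rate = RATE_30_DAY * MULTIPLIERS[retention_days]
    return logs_per_month / 1_000_000 * rate

calls_per_month = 20_000_000  # get_ip_from_request invocations
for days in (30, 60, 90):
    per_month = monthly_cost(calls_per_month, days)
    print(f"{days}-day retention: ${per_month:.2f}/month, "
          f"${per_month * 6:.2f} over 6 months")
```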
The second issue is log hygiene. Excessive noise leads to lower-quality logs. During incident response, it's much harder to find the root cause of a problem if there are a lot of logs that are not relevant to the incident. It's like finding a needle in a haystack.
What could be done better and how to prevent this from happening?
First of all, if such debug information is needed, it should be added with the proper log level, so it's not just a temporary code snippet that will be forgotten.
```python
from fastapi import Request
import logging

def get_ip_from_request(request: Request) -> str:
    if h := request.headers.get("x-forwarded-for"):
        logging.debug(f"x-forwarded-for header: {h}")
        return h.split(",")[0].strip()
    return request.client.host
```
Thanks to this, the log entries are generated only when the DEBUG level is enabled, and the costs would be significantly lower. Most likely close to zero, since DEBUG-level logs are, in most cases, not enabled in production environments.
The other step is to have a safety net in place. Even if production is set to the DEBUG level, there should be filtering on the logging backend side that excludes such logs from indexing, allowing you only to live-tail them.
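In Datadog this is typically configured as an index exclusion filter. As an application-side analogue, a stdlib `logging.Filter` attached to the shipping handler can guarantee DEBUG records never reach the indexed backend. A sketch, with `StreamHandler` standing in for the real log shipper:

```python
import logging

class DropDebugFromShipping(logging.Filter):
    """Safety net: block DEBUG records at the handler that ships logs
    to the indexed backend, even if a logger's level lets them through."""
    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno > logging.DEBUG

shipping_handler = logging.StreamHandler()  # stands in for the log shipper
shipping_handler.addFilter(DropDebugFromShipping())

root = logging.getLogger()
root.setLevel(logging.DEBUG)       # even with DEBUG enabled globally...
root.addHandler(shipping_handler)  # ...this handler drops DEBUG records
```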
It would be even better to have alerting set up for such cases. For example, if there are more than 1000 log entries with a DEBUG level from the same logger in a short period of time, an alert should be sent to the team.
Conclusion
To prevent these kinds of costly mistakes and stop money from leaking through unoptimized code, organizations must focus on three core areas:
- Developer education: Engineers need to understand the financial impact of their code. With proper training on cloud pricing models and observability costs, they can take full responsibility for the architectural and debugging decisions they make.
- Regular log reviews: Teams should hold recurring sessions to review log volumes, identify repeating or noisy patterns, and clean them up before they compound into massive bills.
- Automated alerting: The most important step after remediating a costly pattern is to ensure it doesn't return. Setting up proper alerting for unusual spikes in log volume or specific log levels acts as a critical safety net.
By the time a cloud bill arrives at the end of the month, the money is already spent. Embedding cost awareness directly into the engineering culture ensures you are building scalable, performant, and cost-efficient applications from day one.