Chef Ideas

We believe that the best way to build software is in close collaboration with the people who use it. We invite you to submit your ideas using the form below. Please be sure to include the problem you are trying to solve and the benefits of implementing the idea.

We do our best to implement as many ideas as we can. Our Product team will evaluate all submitted ideas in a timely manner and assign each to one of the following categories: will integrate into the product roadmap, further research is needed, or unlikely to implement.

Thanks for collaborating with us!

Automate HA Logging Should Be Improved for Troubleshooting

When an Automate HA cluster fails over, the PostgreSQL cluster logging is minimal to non-existent.

Overall, logging for all modules should improve to the point of usability before Automate HA is marked GA.

  • For examples of good logging, see the logging output of a standalone Chef Server

  • Or of Chef Backend, for example the replication lag indications in its logs

  • Sean Horn
  • Mar 28 2023
  • New
  • Matt Gough commented
    29 Mar 03:32pm

    Automate HA pglogs from backend psql nodes are not captured by chef-automate gatherlog bundles. These logs are pretty essential to troubleshooting.

    On a live system they are in the directory:
    /hab/svc/automate-ha-postgresql/var/pg_log

    In the gatherlog bundle, the /hab/svc/automate-ha-postgresql/var/ directory is not captured:
    /hab/svc/automate-ha-postgresql > ls
    total 0
    drwxr-x---@ 15 user 1083951318 480 20 Mar 14:32 config
    drwxr-xr-x@ 12 user 1083951318 384 20 Mar 14:32 logs
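
Until the bundle includes that directory, one possible stopgap is to archive it by hand on each backend PostgreSQL node. The sketch below is only an illustration under assumptions: the directory path is the one quoted in the comment above, while the function name `archive_pg_logs` and the output file name are hypothetical and not part of any Chef tooling.

```python
import tarfile
from pathlib import Path

# Path quoted in the comment above; adjust it if your install differs.
PG_LOG_DIR = Path("/hab/svc/automate-ha-postgresql/var/pg_log")


def archive_pg_logs(log_dir: Path, out_file: Path) -> list[str]:
    """Pack every regular file under log_dir into a gzip-compressed tarball.

    Returns the archive member names so the caller can confirm what was
    captured. out_file should live outside log_dir so the archive does not
    try to include itself.
    """
    if not log_dir.is_dir():
        raise FileNotFoundError(f"log directory not found: {log_dir}")

    names: list[str] = []
    with tarfile.open(out_file, "w:gz") as tar:
        for path in sorted(log_dir.rglob("*")):
            if path.is_file():
                # Store members under the directory's own name, e.g. pg_log/<file>.
                arcname = (Path(log_dir.name) / path.relative_to(log_dir)).as_posix()
                tar.add(path, arcname=arcname)
                names.append(arcname)
    return names


# Example call (hypothetical output name), run with read access to the log directory:
# archive_pg_logs(PG_LOG_DIR, Path("/tmp/pg_log_bundle.tar.gz"))
```

The resulting tarball could then be attached to a support ticket alongside the regular bundle.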

  • Matt Gough commented
    29 Mar 09:04am

    Proper PostgreSQL logs would be ideal as well. Can we ship PostgreSQL with logging already set up?
    https://www.loggly.com/use-cases/postgresql-logs-logging-setup-and-troubleshooting/

  • Sean Horn commented
    28 Mar 04:49pm
    • Debug logging that applies to all services

    • Actual debug logging

    • Replication lag, so customers can choose the correct system to fail over to, or none at all

    • Per-request output for services. This could be enabled by debug logging
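
To illustrate why a replication lag figure helps with choosing a failover target, here is a rough sketch of how lag can be derived from PostgreSQL write-ahead log positions (LSNs). It assumes the LSNs have already been collected, for example with `pg_current_wal_lsn()` on the leader and `pg_last_wal_replay_lsn()` on each follower (PostgreSQL 10 or later naming). The helper names and node names are hypothetical, and this is not how Automate HA itself reports lag.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(leader_lsn: str, follower_lsn: str) -> int:
    """Bytes of WAL the follower still has to replay to catch up with the leader."""
    return lsn_to_int(leader_lsn) - lsn_to_int(follower_lsn)


def pick_failover_candidate(leader_lsn: str, followers: dict[str, str]) -> tuple[str, int]:
    """Return (node name, lag in bytes) for the follower with the least lag."""
    lags = {
        name: replication_lag_bytes(leader_lsn, lsn)
        for name, lsn in followers.items()
    }
    best = min(lags, key=lags.get)
    return best, lags[best]


# Hypothetical node names and LSN values, for illustration only.
candidate, lag = pick_failover_candidate(
    "0/3000100",
    {"backend-2": "0/3000000", "backend-3": "0/30000F0"},
)
print(candidate, lag)
```

If every follower shows a large lag, that is a signal that failing over to none of them may be the safer choice.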

  • +7