Basic Airflow Troubleshooting in QDS

Description:

If you are configuring an Airflow cluster in QDS, here are some things to check while troubleshooting:

  1. Airflow installation - Airflow is installed inside a virtual environment located at “/usr/lib/python27/virtualenv”. Airflow requires Python 2.7, which is available only in this virtualenv inside Qubole cluster AMIs, so activate the virtualenv before invoking the Airflow CLI, as in the example below.
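
     For example (a minimal sketch, assuming the standard virtualenv bin/activate layout under that path):

     source /usr/lib/python27/virtualenv/bin/activate
     airflow list_dags    # any Airflow CLI command now runs inside the virtualenv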

  2. Location of service logs - Logs of the Airflow services (e.g., scheduler, webserver, Celery workers) are available at “/media/ephemeral0/logs/airflow”. These services are instantiated during cluster bringup, so refer to these logs first when troubleshooting a cluster that fails to come up; see the example below.
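
     For example, to inspect these logs (the scheduler.log file name is illustrative; list the directory to see the exact names on your cluster):

     ls /media/ephemeral0/logs/airflow
     tail -f /media/ephemeral0/logs/airflow/scheduler.log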

  3. Airflow Home - The environment variable $AIRFLOW_HOME is permanently set to “/usr/lib/airflow” for all machine users. The Airflow configuration file (airflow.cfg), the DAGs folder (“dags”), and the logs folder (“logs”) are located inside $AIRFLOW_HOME. Note that logs of the jobs triggered by Airflow are written to $AIRFLOW_HOME/logs.
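
     A quick way to confirm this layout from a shell on the cluster:

     echo $AIRFLOW_HOME    # /usr/lib/airflow
     ls $AIRFLOW_HOME      # expect airflow.cfg, dags/, logs/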

  4. Restarting Airflow services - each service is managed by monit and can be restarted with:
     Scheduler - sudo monit restart scheduler
     Webserver - sudo monit restart webserver
     Celery workers - sudo monit restart worker
     RabbitMQ - sudo monit restart rabbitmq
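
     To check the state of the monit-managed services before or after a restart (a standard monit command; the service names above should appear in its output):

     sudo monit summary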
  5. Set up a cron job to clean up root-partition space consumed by task logs. Edit the crontab - sudo crontab -e
     Add the following line at the end and save (it removes task log files older than seven days every night at midnight; the path is spelled out because cron does not inherit $AIRFLOW_HOME) - 0 0 * * * /bin/find /usr/lib/airflow/logs -type f -mtime +7 -exec rm -f {} \;
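
     Before installing the entry, you can preview which files it would delete by running the same find without the -exec clause:

     /bin/find /usr/lib/airflow/logs -type f -mtime +7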
 