Description:
If you are configuring an Airflow cluster in QDS, here are some things to check while troubleshooting:
- Airflow installation - Airflow is installed inside a virtual environment located at "/usr/lib/python27/virtualenv". Airflow requires Python 2.7, which is available only in this virtualenv inside Qubole cluster AMIs. Activate the virtualenv before invoking the Airflow CLI, as in the example below.
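A minimal sketch, assuming the standard virtualenv layout (bin/activate) and the Airflow 1.x CLI that shipped in the Python 2.7 era:

    # Activate the Airflow virtualenv before using the CLI
    source /usr/lib/python27/virtualenv/bin/activate
    # The airflow command is now on PATH; list registered DAGs (Airflow 1.x syntax)
    airflow list_dags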
- Location of service logs - Logs of the Airflow services (scheduler, webserver, Celery workers, etc.) are available at "/media/ephemeral0/logs/airflow". These services are instantiated during cluster bringup, so refer to these logs when troubleshooting cluster startup; see the example below.
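For instance (the exact log file name below is an assumption; check the directory listing on your cluster first):

    # See which service logs exist
    ls /media/ephemeral0/logs/airflow
    # Follow the scheduler log while the cluster comes up (file name assumed)
    tail -f /media/ephemeral0/logs/airflow/scheduler.log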
- Airflow Home - The environment variable $AIRFLOW_HOME is permanently set to "/usr/lib/airflow" for all machine users. The Airflow configuration file (airflow.cfg), DAGs (the dags/ folder), and logs (the logs/ folder) live inside the $AIRFLOW_HOME folder. Please note that logs of the jobs triggered by Airflow are available at $AIRFLOW_HOME/logs; a quick check is shown below.
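A quick way to confirm the layout; the per-task path reflects the Airflow 1.x convention of logs/<dag_id>/<task_id>/..., which is an assumption here:

    echo $AIRFLOW_HOME      # /usr/lib/airflow
    ls $AIRFLOW_HOME        # airflow.cfg, dags/, logs/, ...
    # Task logs are grouped per DAG and task (Airflow 1.x layout assumed)
    ls $AIRFLOW_HOME/logs/<dag_id>/<task_id>/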
- Restarting services - The Airflow services are managed by monit and can be restarted individually:
  Scheduler - sudo monit restart scheduler
  Webserver - sudo monit restart webserver
  Celery workers - sudo monit restart worker
  RabbitMQ - sudo monit restart rabbitmq
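If a service does not come back after a restart, its monit status can help; these are standard monit subcommands:

    # Show the status of all monit-managed services
    sudo monit summary
    # Detailed status for one service (the name must match the monit config)
    sudo monit status scheduler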
- Log cleanup - Set up a cron job to reclaim root partition space filled by task logs. Edit the crontab with sudo crontab -e, add the following line at the end, and save; it deletes log files older than seven days at midnight every day:
  0 0 * * * /bin/find $AIRFLOW_HOME/logs -type f -mtime +7 -exec rm -f {} \;
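One caveat (an assumption about how the variable is exported, not something stated above): cron runs jobs with a minimal environment, so $AIRFLOW_HOME may be empty inside root's crontab if it is only set via shell profiles. Spelling out the path avoids the issue:

    # Same cleanup with the literal path ($AIRFLOW_HOME = /usr/lib/airflow)
    0 0 * * * /bin/find /usr/lib/airflow/logs -type f -mtime +7 -exec rm -f {} \;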