
Airflow on Docker: Can't Write to Volume (Permission Denied)

Question Detail

Goal

I’m trying to run a simple DAG that creates a pandas DataFrame and writes it to a file. The DAG runs in a Docker container with Airflow, and the file is written to a named volume.

Problem

When I start the container, I get the error:

Broken DAG: [/usr/local/airflow/dags/simple_datatest.py] [Errno 13] Permission denied: '/usr/local/airflow/data/local_data_input.csv'

Question

Why am I getting this error, and how can I fix it so that the file is written properly?

Context

I am loosely following a tutorial here, but I’ve modified the DAG. I’m using the puckel/docker-airflow image from Docker Hub. I’ve mounted a volume pointing at the DAG directory, and I’ve created a named volume to hold the data the DAG writes (created by running docker volume create airflow-data).

The run command is:

docker run -d -p 8080:8080 \
-v /path/to/local/airflow/dags:/usr/local/airflow/dags \
-v airflow-data:/usr/local/airflow/data:Z \
puckel/docker-airflow \
webserver

The DAG located at the /usr/local/airflow/dags path on the container is defined as follows:

import airflow
from airflow import DAG
from airflow.operators import BashOperator
from datetime import datetime, timedelta
import pandas as pd

# Following are defaults which can be overridden later on
default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2021, 12, 31),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('datafile', default_args=default_args)

def task_make_local_dataset():
  print("task_make_local_dataset")
  local_data_create=pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
  local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')

t1 = BashOperator(
    task_id='write_local_dataset',
    python_callable=task_make_local_dataset(),
    bash_command='python3 ~/airflow/dags/datatest.py',
    dag=dag)

The error in the DAG appears to be in the line

local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')

I don’t have permission to write to this location.

Attempts

I’ve tried changing the location of the data directory in the container, but Airflow can’t access it. Do I have to change permissions? Writing a file from inside a container seems like such a simple thing that most people would want to do, so I’m guessing I’m just missing something.
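If it’s an ownership problem, I’d expect the mounted data directory to be owned by root while the image runs Airflow as its own airflow user. Something like the following should confirm that and work around it (the container name is a placeholder for whatever docker ps shows):

# check who owns the mounted volume and which user Airflow runs as
docker exec -it <container> bash -c 'ls -ld /usr/local/airflow/data && id'

# if the directory is owned by root, hand it over to the airflow user
docker exec -u root -it <container> chown -R airflow: /usr/local/airflow/data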

Question Answer

Don’t use the Puckel Docker image. It hasn’t been maintained for years, and Airflow 1.10 reached end of life in June 2021. You should only look at Airflow 2, and Airflow has an official reference image that you can use: https://airflow.apache.org/docs/docker-stack/index.html

Airflow 2 also has quick-start guides you can use, based on the image and Docker Compose: https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
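For reference, the Compose quick start boils down to a handful of commands (the version in the URL is only an example; take the current one from the guide). Note the AIRFLOW_UID variable: it makes the containers run with your host UID, which is exactly what prevents permission-denied errors on mounted volumes like the one in the question.

# download the official docker-compose file (pin the version you actually want)
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'

# create the mounted directories and set your host UID for the containers
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env

# initialize the database, then bring everything up
docker-compose up airflow-init
docker-compose up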

It also has an official Helm chart that can be used to productionize your setup: https://airflow.apache.org/docs/helm-chart/stable/index.html
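A minimal install from the chart looks roughly like this (the release name and namespace are placeholders):

# register the official chart repository and install a release
helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace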

Don’t waste your (and others’) time on Puckel and Airflow 1.10.
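For completeness, here is a sketch of what the DAG from the question could look like on Airflow 2. Two bugs in the original are worth noting regardless of the image: BashOperator takes no python_callable argument, and task_make_local_dataset() is called at parse time instead of being passed by reference, which is why the permission error surfaces as a Broken DAG import error rather than a task failure. The /opt/airflow/data path assumes you add a matching volume to the Compose file:

from datetime import datetime, timedelta

import pandas as pd

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2 import path

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

def task_make_local_dataset():
    # runs inside the worker at execution time, not at parse time
    local_data_create = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
    local_data_create.to_csv('/opt/airflow/data/local_data_input.csv')

with DAG(
    'datafile',
    default_args=default_args,
    start_date=datetime(2021, 12, 31),
    schedule_interval=None,
) as dag:
    PythonOperator(
        task_id='write_local_dataset',
        python_callable=task_make_local_dataset,  # pass the function, don't call it
    )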
