This article shows how to install pyarrow on Alpine 3.20, and how to build a Docker image that includes the pyarrow package on top of python:3.12.7-alpine.
I need to build a Python app on an Alpine image to keep the image as secure as possible.
With the base image python:3.12.7-slim I can build the app image successfully, because that image is based on Debian. Alpine is different, and this is the situation I encountered.
Alpine does not support libraries for the pyarrow package
When I install the pyarrow package on the python:3.12.7-alpine image (this image has Alpine version 3.20) I get the following error.

/ # pip install pyarrow
Collecting pyarrow
Downloading pyarrow-18.0.0.tar.gz (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 5.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pyarrow
Building wheel for pyarrow (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for pyarrow (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [783 lines of output]
running bdist_wheel
running build
running build_py
creating build/lib.linux-x86_64-cpython-312/pyarrow
...
copying pyarrow/vendored/version.py -> build/lib.linux-x86_64-cpython-312/pyarrow/vendored
running build_ext
creating /tmp/pip-install-ogcm3i1n/pyarrow_45fbd110dc51439481e9486f9d43996a/build/temp.linux-x86_64-cpython-312
-- Running cmake for PyArrow
cmake -DCMAKE_INSTALL_PREFIX=/tmp/pip-install-ogcm3i1n/pyarrow_45fbd110dc51439481e9486f9d43996a/build/lib.linux-x86_64-cpython-312/pyarrow -DPYTHON_EXECUTABLE=/usr/local/bin/python3.12 -DPython3_EXECUTABLE=/usr/local/bin/python3.12 -DPYARROW_CXXFLAGS= -DPYARROW_BUNDLE_ARROW_CPP=off -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_GENERATE_COVERAGE=off -DCMAKE_BUILD_TYPE=release /tmp/pip-install-ogcm3i1n/pyarrow_45fbd110dc51439481e9486f9d43996a
error: command 'cmake' failed: No such file or directory
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (pyarrow)

Searching the internet at the time of writing, I found many people hitting this error when installing pyarrow on Alpine. You can see some of the issues on the Apache Arrow GitHub repository:

- Python Alpine and Pyarrow · Issue #43731 · apache/arrow · GitHub
- [Python] Failed to build pyarrow using python:3.10-alpine docker image · Issue #39846 · apache/arrow · GitHub
This is where it gets really tricky, because my Python application requires the pyarrow package and I did not know how to proceed.
I spent more than a week trying different approaches, and every image build failed. Finally, I found an issue on Apache Arrow's GitHub mentioning that the Arrow C++ library has to be built from source before the pyarrow package can be installed.
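The log above shows pip downloading pyarrow-18.0.0.tar.gz, the source distribution, rather than a prebuilt wheel, which is why it tries to run cmake. Prebuilt manylinux wheels target glibc, while Alpine uses musl, so pip does not treat them as compatible. The following is a minimal sketch for checking what the interpreter reports about its platform. The helper name describe_platform is hypothetical, and the exact values returned on Alpine are an assumption I have not verified.

```python
import platform
import sysconfig


def describe_platform() -> dict:
    """Collect platform details that hint at which wheels are compatible."""
    # libc_ver() inspects the running interpreter binary; it returns empty
    # strings when it cannot identify the C library.
    libc_name, libc_version = platform.libc_ver()
    return {
        "platform": sysconfig.get_platform(),
        "libc": libc_name or "unknown",
        "libc_version": libc_version or "unknown",
        # manylinux wheels are built against glibc, so anything else
        # (for example musl on Alpine) cannot use them.
        "is_glibc": libc_name == "glibc",
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

Running this inside both python:3.12.7-slim and python:3.12.7-alpine and comparing the output should make the difference between the two base images visible.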
Install pyarrow on Alpine
Below is the Dockerfile I am using for my Python app. I will explain some important points.
FROM python:3.12.7-alpine
# Setup env
ENV LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONFAULTHANDLER=1 \
ACCEPT_EULA=Y
# Check the latest version for the package 'arrow' at https://github.com/apache/arrow/releases
ARG ARROW_VERSION=18.0.0
# Set the working directory in the container
WORKDIR /app
# Copy requirements first to leverage caching
COPY requirements.txt .
# Install build dependencies and required packages
RUN apk add --no-cache --virtual .build-deps \
autoconf \
bash \
bison \
boost-dev \
brotli-dev \
build-base \
bzip2-dev \
cargo \
ca-certificates \
clang \
clang-dev \
cmake \
curl \
curl-dev \
flex \
gcc \
g++ \
git \
grpc-dev \
jemalloc-dev \
libc-dev \
libffi-dev \
libgcc \
libjpeg-turbo-dev \
libstdc++ \
libxml2-dev \
libxslt-dev \
libre2 \
lld \
llvm-dev \
linux-headers \
lz4-dev \
make \
musl-dev \
ncurses-libs \
ninja \
openssl-dev \
postgresql-dev \
protobuf-dev \
rapidjson-dev \
re2-dev \
rust \
snappy-dev \
thrift-dev \
unixodbc-dev \
utf8proc-dev \
xsimd-dev \
xz-dev \
zlib-dev \
zstd-dev \
py3-pip \
py3-numpy \
py3-wheel \
&& apk upgrade --no-cache openssl \
&& pip install --upgrade --no-cache-dir pip Werkzeug \
&& pip install --no-cache-dir cython numpy pandas pipenv pytest setuptools six \
&& git clone --no-checkout https://github.com/apache/arrow.git /arrow \
&& cd /arrow \
&& git checkout tags/apache-arrow-${ARROW_VERSION} \
&& mkdir -p /arrow/cpp/build \
&& cd /arrow/cpp/build \
&& cmake .. --preset ninja-release-python-maximal -DARROW_GANDIVA=OFF -DARROW_ACERO=OFF -DARROW_AZURE=OFF -DARROW_CUDA=OFF -DARROW_BUILD_TESTS=OFF \
&& ninja -j$(nproc) \
&& ninja install \
&& pip install --no-cache-dir pyarrow \
&& cd /app && pip install --no-cache-dir -r requirements.txt \
&& rm -rf /var/cache/apk/* /arrow /usr/local/lib/python3.12/site-packages/examples
# Copy the rest of the application code
COPY . .
# Expose the desired port (change if necessary)
EXPOSE 80
# Run the application
CMD ["python", "app.py"]

- ARG ARROW_VERSION=18.0.0: First, check the latest version of Arrow at https://github.com/apache/arrow/releases. You can choose whichever version you want to install; in my case I need Arrow 18.0.0.
- RUN apk add --no-cache --virtual .build-deps ...: This command installs the libraries needed during the build.
- git clone --no-checkout ... to ninja install: The commands from the first of these to the second clone the Arrow repository, then compile and install the Arrow C++ library.
- pip install --no-cache-dir pyarrow: This installs the pyarrow package on Alpine once the Arrow C++ library has been compiled and installed.
The most important part of the Dockerfile above is this code. This is how you can install pyarrow on Alpine.
&& git clone --no-checkout https://github.com/apache/arrow.git /arrow \
&& cd /arrow \
&& git checkout tags/apache-arrow-${ARROW_VERSION} \
&& mkdir -p /arrow/cpp/build \
&& cd /arrow/cpp/build \
&& cmake .. --preset ninja-release-python-maximal -DARROW_GANDIVA=OFF -DARROW_ACERO=OFF -DARROW_AZURE=OFF -DARROW_CUDA=OFF -DARROW_BUILD_TESTS=OFF \
&& ninja -j$(nproc) \
&& ninja install \
&& pip install --no-cache-dir pyarrow \
Build Docker image with pyarrow and Alpine
Now that you have the Dockerfile above, save it to your machine, then change the port in the EXPOSE instruction and the file name in the CMD instruction, along with anything else your application needs.
Run the following command to build the image. I assume you saved the Dockerfile in your repository as Dockerfile.alpine.
docker build -t my-app:latest -f Dockerfile.alpine .
The image build may take up to 30 minutes because of the Arrow compilation step.
With python:3.12.7-slim, my image is about 1.6 GB in size (including about 700 MB of app code).
With the Dockerfile above, using python:3.12.7-alpine, my image is about 3.4 GB in size. That is roughly double the size of the slim image. Much of the extra size likely comes from the build toolchain installed as .build-deps, which this Dockerfile never removes, plus the compiled Arrow libraries.
The large image size is a drawback. In my case security takes priority, so the increased size is acceptable.
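After the build, it may be worth confirming that pyarrow actually imports inside the image before shipping it. Below is a minimal sketch, assuming a hypothetical helper named check_module (it is not part of the Dockerfile above) that reports whether a module imports and which version it exposes.

```python
import importlib


def check_module(module_name: str = "pyarrow") -> tuple[bool, str]:
    """Try to import a module and report its version, without raising."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # A missing package or an unresolved shared library both land here.
        return False, f"{module_name} import failed: {exc}"
    # Not every module defines __version__, so fall back to a placeholder.
    version = getattr(module, "__version__", "unknown")
    return True, f"{module_name} {version} imported successfully"
```

Inside the container, check_module("pyarrow") should report the version that was built. A quicker one-off check with the image name used above would be:
docker run --rm my-app:latest python -c "import pyarrow; print(pyarrow.__version__)"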
Conclusion
I hope this article helps you if you run into the error of not being able to install pyarrow on Alpine.
Alpine seems to be very popular with companies now because of its security benefits. I struggled with this error for more than a week, and I hope the steps above are useful for you.