Skip to content

Type Mapping

Native Types

Proto Pyarrow Note
bool bool_
bytes binary/large_binary Configurable
double float64
enum int32/string/binary/large_string/large_binary/dictionary(int32, string)/dictionary(int32, binary) Configurable
fixed32 uint32
fixed64 uint64
float float32
int32 int32
int64 int64
message struct
sfixed32 int32
sfixed64 int64
sint32 int32
sint64 int64
string string/large_string Configurable
uint32 uint32
uint64 uint64
protarrow.ProtarrowConfig(
    string_type=pa.large_string(),
    binary_type=pa.large_binary(),
    enum_type=pa.large_string(),
)

When using a string or binary enum_type, the type must match the corresponding string_type or binary_type. For example, enum_type=pa.large_string() requires string_type=pa.large_string().

Dictionary-encoded enums

Enums can be dictionary-encoded for memory-efficient string or binary representation:

protarrow.ProtarrowConfig(
    enum_type=pa.dictionary(pa.int32(), pa.string()),
)

Dictionary-encoded enums with pa.large_string() or pa.large_binary() values are not currently supported due to a limitation in PyArrow that prevents creating DictionaryArray with large string or large binary value types.

Other types

Proto Pyarrow Note
repeated list_/large_list Configurable
map map_
google.protobuf.BoolValue bool_
google.protobuf.BytesValue binary/large_binary Configurable
google.protobuf.DoubleValue float64
google.protobuf.Empty struct([])
google.protobuf.Int32Value int32
google.protobuf.Int64Value int64
google.protobuf.StringValue string/large_string Configurable
google.protobuf.Timestamp timestamp("ns", "UTC") Unit and timezone are configurable
google.protobuf.UInt32Value uint32
google.protobuf.UInt64Value uint64
google.type.Date date32()
google.type.TimeOfDay time64/time32 Unit and type are configurable
google.type.Duration duration("ns") Unit is configurable
protarrow.ProtarrowConfig(
    list_array_type=pa.LargeListArray,
    timestamp_type=pa.timestamp("s", "UTC"),
    time_of_day_type=pa.time32("s"),
    duration_type=pa.duration("s"),
)

Date range limitation

google.type.Date is converted through Python's datetime.date, which only supports dates from 0001-01-01 to 9999-12-31. Proto Date values outside this range (e.g. Date(year=0, month=0, day=0)) cannot be represented as datetime.date. These values are stored using a special sentinel value and will round-trip back as Date(year=0, month=0, day=0) regardless of the original month and day.

Nullability

By default, nullability follows the convention imposed by protobuf:

  • Primitive field, list, map, list value, map key and map value are non-nullable.
  • Non-repeated messages and optional are the only nullable fields.

Some of this can be configured:

protarrow.ProtarrowConfig(
    list_nullable=True,
    map_nullable=True,
    list_value_nullable=True,
    map_value_nullable=True,
)

Map/List values fields names

You can also customize the name of the pa.list_ and pa.map_ items names. This doesn't semantically change the schema of the table, but may change its string representation.

protarrow.ProtarrowConfig(
    list_value_name="array",
    map_value_name="map_value",
)

For example this will change a repated int32 field's arrow type from ListType(list<item: int32>) to ListType(list<array: int32>).