arrow: Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Help page | Topics |
Functions available in Arrow dplyr queries | acero arrow-dplyr arrow-functions arrow-verbs |
Array Classes | Array DictionaryArray FixedSizeListArray LargeListArray ListArray MapArray StructArray |
ArrayData class | ArrayData |
Create an Arrow Array | arrow_array |
Report information on the package's capabilities | arrow_available arrow_info arrow_with_acero arrow_with_dataset arrow_with_gcs arrow_with_json arrow_with_parquet arrow_with_s3 arrow_with_substrait |
Create an Arrow Table | arrow_table |
Convert an object to an Arrow Array | as_arrow_array as_arrow_array.Array as_arrow_array.ChunkedArray as_arrow_array.Scalar |
Convert an object to an Arrow Table | as_arrow_table as_arrow_table.arrow_dplyr_query as_arrow_table.Dataset as_arrow_table.default as_arrow_table.RecordBatch as_arrow_table.RecordBatchReader as_arrow_table.Schema as_arrow_table.Table |
Convert an object to an Arrow ChunkedArray | as_chunked_array as_chunked_array.Array as_chunked_array.ChunkedArray |
Convert an object to an Arrow DataType | as_data_type as_data_type.DataType as_data_type.Field as_data_type.Schema |
Convert an object to an Arrow RecordBatch | as_record_batch as_record_batch.arrow_dplyr_query as_record_batch.RecordBatch as_record_batch.Table |
Convert an object to an Arrow RecordBatchReader | as_record_batch_reader as_record_batch_reader.arrow_dplyr_query as_record_batch_reader.Dataset as_record_batch_reader.function as_record_batch_reader.RecordBatch as_record_batch_reader.RecordBatchReader as_record_batch_reader.Scanner as_record_batch_reader.Table |
Convert an object to an Arrow Schema | as_schema as_schema.Schema as_schema.StructType |
Create a Buffer | buffer |
Buffer class | Buffer |
Call an Arrow compute function | call_function |
Create a Chunked Array | chunked_array |
ChunkedArray class | ChunkedArray |
Compression Codec class | Codec |
Check whether a compression codec is available | codec_is_available |
Compressed stream classes | CompressedInputStream CompressedOutputStream compression |
Concatenate zero or more Arrays | c.Array concat_arrays |
Concatenate one or more Tables | concat_tables |
Copy files between FileSystems | copy_files |
Manage the global CPU thread pool in libarrow | cpu_count set_cpu_count |
Create a source bundle that includes all third-party dependencies | create_package_with_all_dependencies |
CSV Convert Options | csv_convert_options |
CSV Parsing Options | csv_parse_options |
CSV Reading Options | csv_read_options |
CSV Writing Options | csv_write_options |
CSV dataset file format | CsvFileFormat |
File reader options | CsvConvertOptions CsvParseOptions CsvReadOptions CsvWriteOptions JsonParseOptions JsonReadOptions TimestampParser |
Arrow CSV and JSON table reader classes | CsvTableReader JsonTableReader |
Create Arrow data types | binary bool boolean data-type date32 date64 decimal decimal128 decimal256 duration FixedSizeListType fixed_size_binary fixed_size_list_of float float16 float32 float64 halffloat int16 int32 int64 int8 large_binary large_list_of large_utf8 list_of MapType map_of null string struct time32 time64 timestamp uint16 uint32 uint64 uint8 utf8 |
Multi-file datasets | Dataset DatasetFactory FileSystemDataset FileSystemDatasetFactory InMemoryDataset UnionDataset |
Create a DatasetFactory | dataset_factory |
DataType class | DataType |
Create a dictionary type | dictionary |
DictionaryType class | DictionaryType |
Arrow expressions | Expression |
ExtensionArray class | ExtensionArray |
ExtensionType class | ExtensionType |
FeatherReader class | FeatherReader |
Create a Field | field |
Field class | Field |
Dataset file formats | FileFormat IpcFileFormat ParquetFileFormat |
FileSystem entry info | FileInfo |
File selector | FileSelector |
FileSystem classes | FileSystem GcsFileSystem LocalFileSystem S3FileSystem SubTreeFileSystem |
Format-specific write options | FileWriteOptions |
FixedWidthType class | FixedWidthType |
Connect to a Flight server | flight_connect |
Explicitly close a Flight client | flight_disconnect |
Get data from a Flight server | flight_get |
Send data to a Flight server | flight_put |
Format-specific scan options | CsvFragmentScanOptions FragmentScanOptions JsonFragmentScanOptions ParquetFragmentScanOptions |
Connect to a Google Cloud Storage (GCS) bucket | gs_bucket |
Construct Hive partitioning | hive_partition |
Extract a schema from an object | infer_schema |
Infer the Arrow Array type from an R object | infer_type type |
InputStream classes | BufferReader InputStream MemoryMappedFile RandomAccessFile ReadableFile |
Install or upgrade the Arrow library | install_arrow |
Install pyarrow for use with reticulate | install_pyarrow |
Manage the global I/O thread pool in libarrow | io_thread_count set_io_thread_count |
JSON dataset file format | JsonFileFormat |
List available Arrow C++ compute functions | list_compute_functions |
See available resources on a Flight server | flight_path_exists list_flights |
Load a Python Flight server | load_flight_server |
Apply a function to a stream of RecordBatches | map_batches |
Value matching for Arrow objects | is_in match_arrow |
Message class | Message |
MessageReader class | MessageReader |
Create a new read/write memory mapped file of a given size | mmap_create |
Open a memory mapped file | mmap_open |
Extension types | new_extension_array new_extension_type register_extension_type reregister_extension_type unregister_extension_type |
Open a multi-file dataset | open_dataset |
Open a multi-file dataset of CSV or other delimiter-separated format | open_csv_dataset open_delim_dataset open_tsv_dataset |
OutputStream classes | BufferOutputStream FileOutputStream OutputStream |
ParquetArrowReaderProperties class | ParquetArrowReaderProperties |
ParquetFileReader class | ParquetFileReader |
ParquetFileWriter class | ParquetFileWriter |
ParquetReaderProperties class | ParquetReaderProperties |
ParquetWriterProperties class | ParquetWriterProperties |
Define Partitioning for a Dataset | DirectoryPartitioning DirectoryPartitioningFactory HivePartitioning HivePartitioningFactory Partitioning |
Read a CSV or other delimited file with Arrow | read_csv2_arrow read_csv_arrow read_delim_arrow read_tsv_arrow |
Read a Feather file (an Arrow IPC file) | read_feather read_ipc_file |
Read Arrow IPC stream format | read_ipc_stream |
Read a JSON file | read_json_arrow |
Read a Message from a stream | read_message |
Read a Parquet file | read_parquet |
Read a Schema from a stream | read_schema |
Create a RecordBatch | record_batch |
RecordBatch class | RecordBatch |
RecordBatchReader classes | RecordBatchFileReader RecordBatchReader RecordBatchStreamReader |
RecordBatchWriter classes | RecordBatchFileWriter RecordBatchStreamWriter RecordBatchWriter |
Register user-defined functions | register_scalar_function |
Connect to an AWS S3 bucket | s3_bucket |
Create an Arrow Scalar | scalar StructScalar |
Arrow scalars | Scalar |
Scan the contents of a dataset | Scanner ScannerBuilder |
Create a schema or extract one from an object | schema |
Schema class | Schema |
Show the details of an Arrow Execution Plan | show_exec_plan |
Table class | Table |
Create an Arrow object from a DuckDB connection | to_arrow |
Create a (virtual) DuckDB table from an Arrow object | to_duckdb |
Combine and harmonize schemas | unify_schemas |
Tabulate values in Arrow objects (analogous to 'table') | value_counts |
Extension type for generic typed vectors | vctrs_extension_array vctrs_extension_type |
Write CSV file to disk | write_csv_arrow |
Write a dataset | write_dataset |
Write a dataset into partitioned flat files | write_csv_dataset write_delim_dataset write_tsv_dataset |
Write a Feather file (an Arrow IPC file) | write_feather write_ipc_file |
Write Arrow IPC stream format | write_ipc_stream |
Write Parquet file to disk | write_parquet |
Write Arrow data to a raw vector | write_to_raw |