  1. What are the pros and cons of the Apache Parquet format compared …

    Apr 24, 2016 · Parquet files are most commonly compressed with the Snappy compression algorithm. Snappy compressed files are splittable and quick to inflate. Big data systems want to …

  2. Reading / Fixing a corrupt parquet file - Stack Overflow

    Sep 3, 2024 · Reading / Fixing a corrupt parquet file

  3. Is it possible to read parquet files in chunks? - Stack Overflow

    Nov 29, 2019 · The Parquet format stores the data in chunks, but there isn't a documented way to read it in chunks like read_csv. Is there a way to read parquet files in chunks?

  4. Python: save pandas data frame to parquet file - Stack Overflow

    Dec 9, 2016 · Is it possible to save a pandas data frame directly to a parquet file? If not, what would be the suggested process? The aim is to be able to send the parquet file to another team, which they …

  5. Extension of Apache parquet files, is it '.pqt' or '.parquet'?

    Oct 19, 2021 · I wonder if there is a consensus regarding the extension of parquet files. I have seen a shorter .pqt extension, which has the typical three letters (as in csv, tsv, txt, etc.), and then there is a rather …

  6. How to append new data to an existing parquet file?

    Apr 27, 2023 · I have parquet files with some data in them. I want to add more data to them frequently every day. I want to do this without having to load the object to memory and then concatenate and …

  7. Updating values in apache parquet file - Stack Overflow

    Mar 3, 2015 · I have quite a hefty parquet file where I need to change values for one of the columns. One way to do this would be to update those values in the source text files and recreate the parquet file, but I'm …

  8. indexing - Index in Parquet - Stack Overflow

    Basically, Parquet has added two new structures to the Parquet layout: Column Index and Offset Index. Below is a more detailed technical explanation of what they solve and how. Problem Statement: In the …

  9. How do I get schema / column names from parquet file?

    Nov 24, 2015 · Also, Cloudera (which supports and contributes heavily to Parquet) has a nice page with examples on usage of hangxie's parquet-tools. An example from that page for your use case: …

  10. Using polars is indeed faster than pandas 2 BUT NOT parquet file and ...

    Sep 25, 2023 · However, the memory usage of polars is the same as pandas 2, which is 753MB. If I save the csv file as a parquet file with the pyarrow engine, pandas 2 has the same speed as polars, or pandas is even …