SS22 Scientific Coding with Julia

Saving dataframes

Now that we have prepared our data source and seen how much work it was to restore the original data types, it makes sense to save the current state of our dataframe. Our data source was provided in CSV format, which has the major disadvantage that Julia has to guess the data type of each column, the decimal separator, and so on. It is also possible to store this so-called metadata along with the dataframe. The Arrow format, which Arrow.jl reads and writes, carries enough metadata to know how each item has to be interpreted (at least for simple types). Additionally, this format is compatible across different computer architectures and can be read from various programming languages (e.g. Julia, Python, and R).

To write a dataframe to an Arrow file, simply call:

using Arrow

Arrow.write("survey.arrow", df_survey)

Loading the file again is slightly more complicated, since an Arrow.Table is immutable by default. We therefore first convert it into a named tuple of vectors, which can then be transformed into a mutable dataframe.

using Arrow, DataFrames

df_survey = DataFrame(Arrow.columntable(Arrow.Table("survey.arrow")))
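To check that the column types really survive without any guessing, a small round trip can be run. The following is only a sketch: `df_demo`, its column names, and the file name `demo.arrow` are made up for illustration and stand in for `df_survey` and `survey.arrow`.

```julia
using Arrow, DataFrames

# Hypothetical toy data standing in for df_survey; the column names are made up.
df_demo = DataFrame(id = [1, 2, 3],
                    score = [1.5, 2.5, 3.5],
                    city = ["Innsbruck", "Wien", "Graz"])

# Write to a temporary directory so that no existing file is overwritten.
path = joinpath(mktempdir(), "demo.arrow")
Arrow.write(path, df_demo)

# Load the file back in the same way as shown above.
df_back = DataFrame(Arrow.columntable(Arrow.Table(path)))

# Compare element types and values column by column.
for name in names(df_demo)
    same_type = eltype(df_back[!, name]) == eltype(df_demo[!, name])
    same_values = df_back[!, name] == df_demo[!, name]
    println(name, ": same type = ", same_type, ", same values = ", same_values)
end
```

If a column reports a different element type here, it is probably not one of the simple types mentioned above and may need to be converted by hand after loading.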
CC BY-SA 4.0 - Gregor Ehrensperger, Peter Kandolf, Jonas Kusch. Last modified: September 09, 2022. Website built with Franklin.jl and the Julia programming language.