$ go version
go version go1.7 linux/amd64
Reading CSV files with encoding/csv is quite slow out of the box (tl;dr: 3x slower than a simple Java program, 1.5x slower than the obvious Python code). A typical example:
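package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("mock_data.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	for {
		line, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			// line is nil on error, so indexing it below would panic.
			log.Fatal(err)
		}
		if line[0] == "42" {
			fmt.Println(line)
		}
	}
}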
Python3 equivalent:
import csv

with open('mock_data.csv') as f:
    r = csv.reader(f)
    for row in r:
        if row[0] == "42":
            print(row)
Equivalent Java code [EDIT: not actually equivalent, since it only splits on commas; please see pauldraper's comment below for a better test]:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadCsv {
    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader("mock_data.csv"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Naive split: does not handle quoted fields or embedded commas.
                String[] data = line.split(",");
                if (data[0].equals("42")) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Tested on a 50 MB CSV file of 1,000,002 lines, generated as:
data = ",Carl,Gauss,[email protected],Male,30.4.17.77\n"
with open("mock_data.csv", "w") as f:
    f.write("id,first_name,last_name,email,gender,ip_address\n")
    f.write(("1" + data) * int(1e6))
    f.write("42" + data)
Results:
Go: avg 1.489 secs
Python: avg 0.933 secs (1.5x faster)
Java: avg 0.493 secs (3.0x faster)
Go's error reporting is obviously better than what you get with that Java code, and I'm not sure about Python, but people have been complaining about encoding/csv slowness, so it's probably worth investigating whether the csv package can be made faster.