Potential Memory Leak in libpostal_parse_address #676

@BookGin

Hi!

There seems to be a memory leak in libpostal_parse_address. Memory usage keeps increasing over time, even when the same address is parsed repeatedly.


My country is

This issue is not specific to any country or address. I tried using other addresses or random strings, but the issue still remains.


Here's how I'm using libpostal

The program parses the example address 10M times and, every 100,000 iterations, uses Linux pmap to dump its memory usage to a numbered file.

// gcc -o app app.c $(pkg-config --cflags --libs libpostal)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libpostal/libpostal.h>

int main(int argc, char **argv) {
    if (!libpostal_setup() || !libpostal_setup_parser()) {
        exit(EXIT_FAILURE);
    }

    libpostal_address_parser_options_t options = libpostal_get_address_parser_default_options();

    int count = 10000000;
    int batch = 100000;
    for (int i = 0; i < count; i++) {
        libpostal_address_parser_response_t *parsed = libpostal_parse_address("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA", options);
        libpostal_address_parser_response_destroy(parsed);
        if (i % batch == 0) {
            // Snapshot memory usage every `batch` iterations into N.txt.
            char command[256];
            snprintf(command, sizeof command, "pmap -x %d > %d.txt", (int)getpid(), i / batch + 1);
            puts(command);
            system(command);
        }
    }

    libpostal_teardown();
    libpostal_teardown_parser();
}

Here's what I did

See above.


Here's what I got

The memory usage increases over time.

echo "File                     Kbytes   RSS    Dirty"; for i in {5..100..5}; do echo -n "$i.txt: " && cat $i.txt | grep total; done
File                     Kbytes   RSS    Dirty
5.txt: total kB         1942360 1924872 1921816
10.txt: total kB         2007900 1960788 1957732
15.txt: total kB         2007900 1980316 1977260
20.txt: total kB         2073436 1999848 1996792
25.txt: total kB         2073436 2019380 2016324
30.txt: total kB         2073436 2038912 2035856
35.txt: total kB         2204508 2058444 2055388
40.txt: total kB         2204508 2077972 2074916
45.txt: total kB         2204508 2097504 2094448
50.txt: total kB         2204508 2117036 2113980
55.txt: total kB         2204508 2136568 2133512
60.txt: total kB         2204508 2156100 2153044
65.txt: total kB         2204508 2175632 2172576
70.txt: total kB         2466652 2195160 2192104
75.txt: total kB         2466652 2214692 2211636
80.txt: total kB         2466652 2234224 2231168
85.txt: total kB         2466652 2253756 2250700
90.txt: total kB         2466652 2273288 2270232
95.txt: total kB         2466652 2292816 2289760
100.txt: total kB         2466652 2312348 2309292

I also ran the program under valgrind for 1M iterations, but it does not report any memory leak.

valgrind ./app2
==3615986== Memcheck, a memory error detector
==3615986== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3615986== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==3615986== Command: ./app2
==3615986==
==3615986== Warning: set address range perms: large range [0x2f85a040, 0x3fa385f0) (undefined)
==3615986== Warning: set address range perms: large range [0x3fa39040, 0x4fc175f0) (undefined)
==3615986== Warning: set address range perms: large range [0x3fa391ca, 0x4fc171ca) (defined)
==3615986== Warning: set address range perms: large range [0x3fa39028, 0x4fc17608) (noaccess)
==3615986== Warning: set address range perms: large range [0x6577c040, 0x82a05c8c) (undefined)
==3615986== Warning: set address range perms: large range [0x2f85a028, 0x3fa38608) (noaccess)
==3615986== Warning: set address range perms: large range [0x6577c028, 0x82a05ca4) (noaccess)
==3615986==
==3615986== HEAP SUMMARY:
==3615986==     in use at exit: 0 bytes in 0 blocks
==3615986==   total heap usage: 71,539,052 allocs, 71,539,052 frees, 7,820,286,857 bytes allocated
==3615986==
==3615986== All heap blocks were freed -- no leaks are possible
==3615986==
==3615986== For lists of detected and suppressed errors, rerun with: -s
==3615986== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Here's what I was expecting

The memory usage should not increase over time.


For parsing issues, please answer "yes" or "no" to all that apply.

This is not a parsing issue.


Here's what I think could be improved

See above.

More information:

  1. libpostal git version: 8f2066b1d30f4290adf59cacc429980f139b8545
  2. OS: Ubuntu 20.04.6 LTS 5.4.0-192-generic
