Skip to content

Save 16 bytes RAM by placing esp8266_gpioToFn (core_esp8266_wiring_digital.cpp) array in PROGMEM #6703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 5, 2019

Conversation

dok-net
Copy link
Contributor

@dok-net dok-net commented Nov 2, 2019

There are macros for each of the GPIO pin function registers GPF0 - GPF15, and there is also a macro that takes the GPIO # as argument. This latter macros uses a lookup table. From global grep over the source tree, there seems to be no other use of this except in the pinMode() function, that IMHO would not be called from an ISR - therefore, no IRAM_ATTR constraint seems to exists.

In this case, 16 bytes in RAM can be saved by marking the array PROGMEM, and using the alignment correction function in the macro.

@dok-net
Copy link
Contributor Author

dok-net commented Nov 2, 2019

For regression considerations, in The Official Arduino AVR core ArduinoCore-avr, Arduino.h, the corresponding macros used in their pinMode() implementation are:

#define digitalPinToPort(P) ( pgm_read_byte( digital_pin_to_port_PGM + (P) ) )
#define digitalPinToBitMask(P) ( pgm_read_byte( digital_pin_to_bit_mask_PGM + (P) ) )

So that's also in flash, not RAM.

@TD-er
Copy link
Contributor

TD-er commented Nov 3, 2019

except in the pinMode() function, that IMHO would not be called from an ISR

I do agree on your opinion about that, but do you expect there are sketches out there using pinMode() from ISR?

Another point against this PR may be that this does probably have an effect on timing when bit-banging pins.
A lot of libraries out there interfacing with sensors use bit-banging to interface with the sensor and sadly some of them are quite timing critical (way too critical in my opinion).
This change does make those libraries depending on the flash speed, which may vary a lot among available nodes.

@dok-net
Copy link
Contributor Author

dok-net commented Nov 3, 2019

@TD-er I've to correct myself, because obviously pinMode() must not be called from ISRs, it's completely lacking the ICACHE_RAM_ATTR attribute.
Are your aware of libraries that actually use pinMode() instead of digitalWrite() to alter the output state of pins? I myself was (mis)lead into this use a while back in EspSoftwareSerial, but have since returned to the well documented use of setting pinMode() to OUTPUT and switching state only via digitalWrite(). Any non-ICACHE_RAM_ATTR function like pinMode() itself could trigger a reload from flash for its code, so this part is non-realtime to begin with.
TL;DR: No, I personally am more confident than before that this PR is fine.

@TD-er
Copy link
Contributor

TD-er commented Nov 3, 2019

I've looked through the libraries included in ESPEasy and it seems BlynkApi does do a lot of pinMode switching. (no idea why though)

template<class Proto>
BLYNK_FORCE_INLINE
void BlynkApi<Proto>::processCmd(const void* buff, size_t len)

Also NewPingESP8266 does switch a lot between pin modes.

Our own implementation of the Dallas 1wire code does have a lot of switches between pin modes (I really don't like that implementation, since it appears to be way too time critical)

Reading the DHT11/22 is also based on switching pin mode.

Our implementation of SHT1x also uses pin mode switching for reading the sensor.

Reading the P1 interface of a smart meter is using the pin mode switching.

PMSx003 (PlanTower dust particle meters) in our code is also using pin mode switching during initialization.

Our code for TTP229 key pad uses pin mode for reading.

Our code for HX711 load cell uses pin mode switching at init of the sensor.

Our code for Kamstrup 401 uses pin mode switching during read.

That's just a quick grep over my code base.

@dok-net
Copy link
Contributor Author

dok-net commented Nov 3, 2019

@TD-er Please check this out. Everything is better than without patch - just why doesn't the extra size of the array in flash show in the build logs...

void setup()
{
    Serial.begin(115200);
    pinMode(D6, INPUT);
}

void loop()
{
    auto start = ESP.getCycleCount();
    for (int i = 0; i < 500000; ++i)
        pinMode(D6, i & 1 ? INPUT : OUTPUT);
    Serial.print("expired cpu cycles = "); Serial.println(ESP.getCycleCount() - start);
}

/*
    Without PR:
IROM   *: 231264          - code in flash         (default or ICACHE_FLASH_ATTR)
IRAM   *: 27116   \ 32768 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...)
DATA   *: 1268  )         - initialized variables (global, static) in RAM\HEAP
RODATA *: 728   ) \ 81920 - constants             (global, static) in RAM\HEAP
BSS    *: 25256 )         - zeroed variables      (global, static) in RAM\HEAP
Program size: 260,376 bytes (used 25% of a 1,044,464 byte maximum) (26.98 secs
Minimum Memory Usage: 27252 bytes (33% of a 81920 byte maximum))

expired cpu cycles = 65266556
expired cpu cycles = 65259568
expired cpu cycles = 65255096
expired cpu cycles = 65262053
expired cpu cycles = 65259225
expired cpu cycles = 65255637
expired cpu cycles = 65261328
expired cpu cycles = 65260323
expired cpu cycles = 65254784
expired cpu cycles = 65262488
expired cpu cycles = 65259197
expired cpu cycles = 65255634

Including PR:
IROM   *: 231264          - code in flash         (default or ICACHE_FLASH_ATTR)
IRAM   *: 27116   \ 32768 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...)
DATA   *: 1252  )         - initialized variables (global, static) in RAM\HEAP
RODATA *: 728   ) \ 81920 - constants             (global, static) in RAM\HEAP
BSS    *: 25256 )         - zeroed variables      (global, static) in RAM\HEAP
Program size: 260,360 bytes (used 25% of a 1,044,464 byte maximum) (27.68 secs)
Minimum Memory Usage: 27236 bytes (33% of a 81920 byte maximum)

expired cpu cycles = 63516546
expired cpu cycles = 63509021
expired cpu cycles = 63506651
expired cpu cycles = 63511641
expired cpu cycles = 63509578
expired cpu cycles = 63504772
expired cpu cycles = 63511526
expired cpu cycles = 63508721
expired cpu cycles = 63505631
expired cpu cycles = 63511196
expired cpu cycles = 63509576
expired cpu cycles = 63504772
*/

@TD-er
Copy link
Contributor

TD-er commented Nov 3, 2019

@dok-net
That's about 2.7% increase in performance. Not bad!

@dok-net
Copy link
Contributor Author

dok-net commented Nov 3, 2019

It will have to be the smaller macro expansion vs. the larger table size.

@devyte
Copy link
Collaborator

devyte commented Nov 4, 2019

I'm not sure about the array in PROGMEM, I can't help thinking about the 1-wire protocols like OneWire or single-wire uart (like the dynamixel bus). In such cases, the usage is usually as follows:

  1. init pin to input
  2. change pinMode to output
  3. send data
  4. change pinMode to input
  5. receive data

Which would be ok, except for async incoming data. In such cases, the sequence would be as follows:

  1. init pin to input
  2. incoming data => IRQ
  3. in ISR, read data
  4. change pinMode to output
  5. send ack or reset or whatever
  6. change pinMode to input

That means that async incoming data could be implemented with pinMode changes within the ISR, which I guess would mean the esp8266_gpioToFn table would need to be in RAM?
I'm not sure about all of this, I'd say it's worth investigating.

@dok-net
Copy link
Contributor Author

dok-net commented Nov 5, 2019

@devyte But I noted above, that calling pinMode() from ISR is illegal to begin with - it's not pinned to IRAM. Why would any library use this and keep crashing :-) That besides the obvious mistake of doing time consuming stuff directly from an ISR.

@devyte devyte merged commit d14419e into esp8266:master Nov 5, 2019
@dok-net dok-net deleted the esp8266_gpioToFn branch November 5, 2019 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants