This repository was archived by the owner on Feb 12, 2022. It is now read-only.

Caching files #36

Open

wants to merge 32 commits into master

Conversation

@mjsobrep commented Sep 1, 2019

Issue: #19

Description of changes:
Added support for caching files by saving them to disk and tracking them in an SQLite db. The total cache size is tracked, and when the maximum is exceeded the least recently used file is deleted.
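A minimal sketch of that eviction loop, assuming the PR's `cache` table with `file`, `last_accessed`, and `size` columns (names illustrative, not a drop-in patch):

```python
import os
import sqlite3

def evict_lru(conn, max_cache_bytes):
    """Delete least-recently-used cache entries until the total size fits.

    Assumes a `cache` table with `file`, `last_accessed`, and `size`
    columns, roughly as in this PR.
    """
    total = conn.execute('SELECT COALESCE(SUM(size), 0) FROM cache').fetchone()[0]
    while total > max_cache_bytes:
        row = conn.execute(
            'SELECT file, size FROM cache ORDER BY last_accessed LIMIT 1'
        ).fetchone()
        if row is None:
            break
        path, size = row
        if os.path.exists(path):
            os.remove(path)  # drop the audio file itself
        conn.execute('DELETE FROM cache WHERE file = ?', (path,))
        total -= size
    conn.commit()
```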

TODO

  • Expose the cache size to the launch system
  • Allow the cache location to be set in the launch system
  • Test more widely (only tested with the simple action server example)
  • Document all of this for general use
  • Add unit tests for this functionality

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov bot commented Sep 1, 2019

Codecov Report

Merging #36 (ee320dd) into master (c8ebbf7) will increase coverage by 4.96%.
The diff coverage is 94.68%.

❗ Current head ee320dd differs from pull request most recent head b3829ee. Consider uploading reports for the commit b3829ee to get more accurate results


@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   77.45%   82.41%   +4.96%     
==========================================
  Files           3        4       +1     
  Lines         275      364      +89     
==========================================
+ Hits          213      300      +87     
- Misses         62       64       +2     
Flag Coverage Δ
ROS_1 82.41% <94.68%> (+4.96%) ⬆️
kinetic 82.41% <94.68%> (+4.96%) ⬆️
melodic 82.41% <94.68%> (+4.96%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
tts/src/tts/db.py 86.84% <86.84%> (ø)
tts/src/tts/synthesizer.py 92.53% <100.00%> (+8.19%) ⬆️
tts/src/tts/__init__.py 96.42% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfd3802...b3829ee.

@mjsobrep (Author) commented Sep 1, 2019

I can't get the code coverage up because of #37. When those tests do start running, it will probably be a good idea to append random strings to any test text that needs to reach the server; otherwise repeated test runs will just hit the cache. There also need to be some tests that intentionally hit the cache.

@mjsobrep (Author) commented Sep 9, 2019

@AAlon is this something that you all are interested in? Should I keep working on it?

@AAlon (Contributor) commented Sep 13, 2019

@AAlon is this something that you all are interested in? Should I keep working on it?

Definitely! Thank you for contributing. We'll be reviewing this soon.

@cevans87 (Contributor) left a comment

Thanks for submitting this. I'm a fan of the approach. Just a few things that I think could show up as bugs and some suggestions.

I'll get back to you on how we can deal with test coverage after I understand #37 a bit better.

@mjsobrep (Author) commented Nov 9, 2019

@cevans87 Thanks for the feedback, and sorry for being slow to get back to it. I have gotten rid of the size table per your recommendation and pulled some logic into the db class to improve readability. I have tested both normal operation and deleting cached files from the disk without telling the db.

@mjsobrep (Author) commented

I just found a bug with this: if there is no network connection, the error utterance ('the polly service cannot be reached ...') gets cached as if it were the requested text. The system will then always play that error audio whenever that text is requested.

@AAlon (Contributor) commented Jan 6, 2020

I just found a bug with this: if there is no network connection, the error utterance ('the polly service cannot be reached ...') gets cached as if it were the requested text. The system will then always play that error audio whenever that text is requested.

Are you actively working on that? If so, we should probably wait before re-reviewing. It'd be great if you could add some unit tests too. Thanks!

@mjsobrep (Author) commented

@AAlon I took a look at it a while back but haven't had time to come up with a good fix. The way errors are handled is a little odd: instead of propagating up, they are handled by passing a different sound file. Fixing this will require the errors to propagate, which I actually think would be a good thing. In many situations the current error handling would be poor for human-robot interaction; allowing integrators (like me) to customize how errors are handled would be better. I should get to it by the middle of February at the latest; I'm pretty busy until then.

I'm not sure how to add unit tests, ref #37

@mjsobrep (Author) commented

@cevans87 @AAlon I fixed the caching of the error file and added a bunch of new tests to get the code coverage up to a reasonable level. I think this is ready for a new review.

@mjsobrep (Author) commented

Want to check back in on this.

@mm318 (Contributor) left a comment

Thank you very much for the significant contribution!

I have some comments and questions.

Comment on lines +90 to +153
class DummyEngine:
    """A dummy engine which exists to facilitate testing. Can either
    be set to act as if it is connected or disconnected. Will create files where
    they are expected, but they will not be actual audio files."""

    def __init__(self):
        self.connected = True
        self.file_size = 50000

    def __call__(self, **kwargs):
        """Put a file at the specified location and return reasonable dummy
        values. If not connected, fills in the Exception fields.

        Args:
            **kwargs: dictionary with fields: output_format, voice_id, sample_rate,
                text_type, text, output_path

        Returns: A json version of a string with fields: Audio File, Audio Type,
            Exception (if there is an exception), Traceback (if there is an exception),
            and if successful Amazon Polly Response Metadata
        """
        if self.connected:
            with open(kwargs['output_path'], 'wb') as f:
                f.write(os.urandom(self.file_size))
            output_format = kwargs['OutputFormat'] if 'OutputFormat' in kwargs else 'ogg_vorbis'
            resp = json.dumps({
                'Audio File': kwargs['output_path'],
                'Audio Type': output_format,
                'Amazon Polly Response Metadata': {'some header': 'some data'}
            })
            return SynthesizerResponse(resp)
        else:
            current_dir = os.path.dirname(os.path.abspath(__file__))
            error_ogg_filename = 'connerror.ogg'
            error_details = {
                'Audio File': os.path.join(current_dir, '../src/tts/data', error_ogg_filename),
                'Audio Type': 'ogg',
                'Exception': {
                    'dummy head': 'dummy val'
                    # 'Type': str(exc_type),
                    # 'Module': exc_type.__module__,
                    # 'Name': exc_type.__name__,
                    # 'Value': str(e),
                },
                'Traceback': 'some traceback'
            }
            return SynthesizerResponse(json.dumps(error_details))

    def set_connection(self, connected):
        """Set the connection state.

        Args:
            connected: boolean, whether to act connected or not
        """
        self.connected = connected

    def set_file_sizes(self, size):
        """Set the target file size for future files in bytes.

        Args:
            size: the number of bytes to make the next files
        """
        self.file_size = size
Contributor

I'm kind of on the fence about this class being here instead of being under test, such as in test_unit_synthesizer.py.

Contributor

Yep, this belongs under test/

        import uuid

        db = DB()
        init_num_files = db.get_num_files()
Contributor

Should you have an assert that init_num_files == 0? So later on, your test should be db.get_num_files() == 1 instead of db.get_num_files() == init_num_files + 1.

Contributor

This may need a deletion of /tmp/polly.db at the end of each test case.

What is the right way to delete a database and all the audio files associated with it?
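One possible cleanup hook, assuming the default `/tmp/polly.db` location used in this PR (the class and constant names here are illustrative, and audio files would need similar tracking):

```python
import os
import tempfile
import unittest

# Default db location in this PR; audio files would need similar bookkeeping.
DB_PATH = os.path.join(tempfile.gettempdir(), 'polly.db')

class CacheTestCase(unittest.TestCase):
    def tearDown(self):
        # Remove the database so each test case starts from an empty cache.
        if os.path.exists(DB_PATH):
            os.remove(DB_PATH)
```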

        self.assertEqual(db.get_num_files(), init_num_files + 1)

    def test_multiple_novel(self):
Contributor

It's probably better named test_multiple_unique.

        self.assertEqual(db.get_num_files(), init_num_files)

    def test_no_connection_existing(self):
        from tts.db import DB
Contributor

Unused import.

        import json
        import os

        db = DB()
Contributor

Unused variable.

Contributor

Though, it would be better to have some tests for db.get_num_files().

        self.assertEqual(audio_file1, audio_file2)
        self.assertTrue(os.path.exists(audio_file2))

    def test_no_connection_novel(self):
Contributor

test_no_connection_nonexisting?


        self.assertFalse(os.path.exists(audio_file1))

    def test_big_db(self):
Contributor

I think test_db_size_limit would be a better name.

@mm318 mm318 requested review from AAlon and removed request for timrobotson March 23, 2020 09:51
@AAlon (Contributor) left a comment

Thanks again for taking the time and effort to iterate on this @mjsobrep! Most of my comments are very minor, but I do have one concern around performance and SQLite that I'd like to see addressed before finalizing this implementation.


        # need to look at the hash itself.
        db = DB()
        db_search_result = db.ex(
            'SELECT file, audio_type FROM cache WHERE hash=?', tmp_filename).fetchone()
@AAlon (Contributor) commented Apr 2, 2020

The synthesizer shouldn't need to know SQL: ex should be private, and the DB class should abstract queries away via methods like db.find_record(key), db.update_last_accessed(key), and db.add(record). Maybe also rename DB to SQLiteBackedCache or something?
Alternatively, keep DB as is and create a new SQLiteBackedCache that abstracts the database access.


        :param kw: what AmazonPolly needs to synthesize
        :return: response from AmazonPolly
        """
        if 'output_path' not in kw:
            tmp_filename = hashlib.md5(kw['text']).hexdigest()
            tmp_filepath = os.path.join(os.sep, 'tmp', 'voice_{}_{}'.format(tmp_filename, str(time.time())))
            tmp_filename = hashlib.md5(
Contributor

Needs a better name; filename_hash maybe.

        not given, the utterance is added to the cache. If a
        filename is specified, then we will assume that the
        file is being managed by the user and it will not
        be added to the cache.

        :param kw: what AmazonPolly needs to synthesize
        :return: response from AmazonPolly
        """
        if 'output_path' not in kw:
Contributor

Would be nice to break this function into 2-3 smaller ones (perhaps at the if/else blocks).

            tmp_filepath = os.path.join(os.sep, 'tmp', 'voice_{}_{}'.format(tmp_filename, str(time.time())))
            tmp_filename = hashlib.md5(
                json.dumps(kw, sort_keys=True)).hexdigest()
            tmp_filepath = os.path.join(
Contributor

I know it was like this before but the prefix tmp_ doesn't make much sense to me, especially now that we're caching things.

            tmp_filename = hashlib.md5(
                json.dumps(kw, sort_keys=True)).hexdigest()
            tmp_filepath = os.path.join(
                os.sep, 'tmp', 'voice_{}'.format(tmp_filename))
Contributor

Use tempfile.gettempdir() instead of hardcoding.
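i.e., something along these lines (`filename_hash` is a placeholder for the md5 digest):

```python
import os
import tempfile

# tempfile.gettempdir() respects TMPDIR and works on non-Unix platforms,
# unlike the hardcoded os.path.join(os.sep, 'tmp', ...).
filename_hash = 'abc123'  # placeholder for the md5 digest
filepath = os.path.join(tempfile.gettempdir(), 'voice_{}'.format(filename_hash))
```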

Comment on lines +241 to +247
            while db.get_size() > self.max_cache_bytes and db.get_num_files() > 1:
                remove_res = db.ex(
                    'select file, min(last_accessed), size from cache'
                ).fetchone()
                db.remove_file(remove_res['file'])
                rospy.loginfo('removing %s to maintain cache size, new size: %i',
                              remove_res['file'], db.get_size())
Contributor

This logic should be part of the DB class' add method (or some intermediary Cache class).

Comment on lines +210 to +240
        # TODO: add a test that deletes a file without telling the db and tries to synthesize it
        if os.path.exists(db_search_result['file']):
            file_found = True
            db.ex('update cache set last_accessed=? where hash=?',
                  current_time, tmp_filename)
            synth_result = PollyResponse(json.dumps({
                'Audio File': db_search_result['file'],
                'Audio Type': db_search_result['audio_type'],
                'Amazon Polly Response Metadata': ''
            }))
            rospy.loginfo('audio file was already cached at: %s',
                          db_search_result['file'])
        else:
            rospy.logwarn(
                'A file in the database did not exist on the disk, removing from db')
            db.remove_file(db_search_result['file'])
        if not file_found:  # haven't cached this yet
            rospy.loginfo('Caching file')
            synth_result = self.engine(**kw)
            res_dict = json.loads(synth_result.result)
            if 'Exception' not in res_dict:
                file_name = res_dict['Audio File']
                if file_name:
                    file_size = os.path.getsize(file_name)
                    db.ex('''insert into cache(
                        hash, file, audio_type, last_accessed, size)
                        values (?,?,?,?,?)''', tmp_filename, file_name,
                          res_dict['Audio Type'], current_time, file_size)
                    rospy.loginfo(
                        'generated new file, saved to %s and cached', file_name)
        # make sure the cache hasn't grown too big
Contributor

Performance concerns

  • Potentially doing many more transactions than needed; only need to do at most one transaction per _call_engine call. For example, if the new file caused us to go beyond the size limit by 1000 bytes, and we need to remove the 10 oldest files in order to clear that much space, we'd be doing more than 10 transactions.
  • Everything seems to be synchronous, so even if we only have one transaction (say, to insert a new entry), returning synth_result is still blocked on completion of that transaction. Using WAL and performing checkpointing in a separate thread is one way to overcome that.

Performance Impact discussion

In part, the value of this node is in enabling Human-Robot interaction via Polly, and that heavily depends on the latency; if I'm asking the robot a question and it answers 5 seconds later, that's not very useful.

We know that with this implementation there will be increased latency; the question is how much. Would it add 1 ms or 500 ms? Maybe it's even less than 1 ms; I really don't know. Do you?
It would feel a bit premature to me to merge this without doing a simple benchmark in which we mock out line 228 (the self.engine() call) and measure the added latency (in lieu of any other data).
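A crude stand-in for that benchmark might look like the following (in-memory SQLite, engine call omitted entirely; numbers will vary by machine and by the on-disk configuration):

```python
import sqlite3
import time

# Simulate only the cache-hit path: one SELECT plus one last_accessed UPDATE.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE cache (hash text PRIMARY KEY, file text, last_accessed real)')
conn.execute("INSERT INTO cache VALUES ('deadbeef', '/tmp/voice_deadbeef', 0.0)")
conn.commit()

n = 1000
start = time.time()
for _ in range(n):
    conn.execute('SELECT file FROM cache WHERE hash=?', ('deadbeef',)).fetchone()
    conn.execute('UPDATE cache SET last_accessed=? WHERE hash=?',
                 (time.time(), 'deadbeef'))
conn.commit()
per_call_ms = (time.time() - start) * 1000.0 / n
print('avg cache-hit overhead: %.3f ms' % per_call_ms)
```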

Path forward

Generally, it seems like we're reinventing the wheel here by using SQLite directly. There are dozens of libraries that implement all the convenience functionality we'd ever need (see DiskCache), as well as performance/transaction optimizations. So to summarize:

  • Why not use some of the battle-tested disk-backed caches out there? Did you choose SQLite because of lack of integration with ROS (i.e. not available via rosdep)?
  • If we have to continue using SQLite directly, we need to test the effect on latency. Again, maybe there's no issue at all, but we need to make sure.

@mjsobrep (Author)

Sorry I have been so slow to respond. I'm going to start here before going anywhere else, since the answers on this thread may change what is being done elsewhere.

I totally agree that latency is a major issue in these types of systems. SQLite is fast and I would imagine that it is at least an order of magnitude faster than making a web call that downloads a file, probably multiple orders of magnitude better. But you are correct, I don't know that for a fact.

I chose SQLite because it is built into python, it is stable, and it is well known/documented. I could certainly see another tool being used, but I don't know which other tools would check those boxes. DiskCache that you link to looks great. Is that a dependency you all are ok with?

@mjsobrep (Author)

Hey @AAlon, want to check back in on this. If you would like me to refactor this to DiskCache or make some timing tests for the current implementation, I can do either of those. DiskCache would make for cleaner, more maintainable code, but I'm not sure if that is a dependency that y'all want in here. Just let me know; I'm hoping to get some code done in a few places over the next few weeks.

Contributor

I think that's fine, but we should probably take a look at the benchmarking mentioned previously.

@mjsobrep (Author)

what is "that" in "that's fine"?

Contributor

Sorry, I'll be specific. @mjsobrep, if you would like to implement some timing tests to provide a ground truth and test your changes here (along with possibly refactoring to DiskCache), I think that's a good start.

I'm not sure if you've seen it, but the DiskCache timing info seems promising.

class DB(object):
    """DB: a class to manage the database for tracking cached files"""

    def __init__(self, db_location='/tmp/polly.db'):
Contributor

tempfile.gettempdir() instead of /tmp?

    def make_db(self):
        self.ex('''CREATE TABLE IF NOT EXISTS cache (
            hash text PRIMARY KEY,
            file text NOT NULL,
Contributor

file is actually a filepath, correct?

5 participants