Skip to content

Sort order disobeyed #546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vmatekole opened this issue Mar 10, 2016 · 15 comments
Closed

Sort order disobeyed #546

vmatekole opened this issue Mar 10, 2016 · 15 comments

Comments

@vmatekole
Copy link

Hi!

I am working with Spree and ElasticSearch gem and have the following code:

records = search_result.limit(Spree::Config.products_per_page).page(page).records
records.joins(:classifications).where("spree_products_taxons.taxon_id IN(#{@taxon.id})").reorder("spree_products_taxons.position asc")

Resulting SQL:

"SELECT \"spree_products\".* FROM \"spree_products\" INNER JOIN \"spree_products_taxons\" ON \"spree_products_taxons\".\"product_id\" = \"spree_products\".\"id\" WHERE \"spree_products\".\"deleted_at\" IS NULL AND \"spree_products\".\"id\" IN (1760, 1666, 1750, 1672, 1815, 1318, 1361, 1762, 1678, 6392, 1808, 1768, 1669, 1759, 1722, 1965, 1745, 6284, 1740, 1743, 7014, 1735, 1826, 1734, 1732, 1683, 1671, 7428, 1773, 1820, 7321, 1772, 1753, 1766, 1742, 1694, 7319, 2564, 6423, 7337, 1806, 1728, 1727, 1819, 1818, 1739, 1729, 1816) AND (spree_products_taxons.taxon_id IN(247))  ORDER BY spree_products_taxons.position asc"

the resulting order disobeys the order by, but strangely enough when I do a records.first or records.pluck the output is in the correct order.

@vmatekole
Copy link
Author

Hi!

It appears my issue is similar to #217. I have had to result to the same solution, return ids and then do another search with a correct order clause. Any thoughts on this?

@karmi
Copy link
Contributor

karmi commented Mar 14, 2016

Hi, where the reorder method is coming from?

UPDATE: I see it's coming from the ActiveRecord API.

There's an integration test for the order here, maybe we need to update that: https://github.com/elastic/elasticsearch-rails/blob/master/elasticsearch-model/test/integration/active_record_basic_test.rb#L208

Would it be possible to model your situation with the setup in that integration test? It would help me a lot when I'd have a piece of runnable code so I can try the situation myself.

@karmi
Copy link
Contributor

karmi commented Mar 14, 2016

(...) strangely enough when I do a records.first or records.pluck the output is in the correct order (...)

The records are loaded only when .to_a is called on them (which .first etc do implicitly). But I guess your issue is that the ORDER BY clause is missing from the final SQL query?

@fabiode
Copy link

fabiode commented Apr 25, 2016

@vmatekole I'm having the EXACT same issue, and when I mean EXACTLY, it's true. I'm trying to use Spree::Classification to reorder the Elasticsearch result, but with no success at all.

It appears the collection and the array that Elasticsearch brings to Rails interface aren't refreshed, or something like that. What did worked it was to use pluck(:id), reload all Products and order it, but then the pagination problem appears, since I've used the from parameter in Elasticsearch query.

A possible solution is to index the product position and sort in Elasticsearch, but everytime the taxon classification is updated, we'll have to update the whole index.

I don't know, maybe it's because the position is dealed inside Spree::Classification and not in Spree::Product and, as a PROXY Collection, Elasticsearch can't understand it. (I'm just thinking outloud, I don't know if this makes sense at all.)

@karmi
Copy link
Contributor

karmi commented Apr 30, 2016

Hi, I've left it out of my mind a little bit, been too busy with something else. Again, is it possible to model the situation either as an integration test, or as a standalone script like this one: https://github.com/elastic/elasticsearch-rails/blob/master/elasticsearch-model/examples/activerecord_article.rb, so we have a common ground and code to reason about?

@vmatekole
Copy link
Author

vmatekole commented May 30, 2016

@karmi and @fabiode. Sorry, somehow I missed these mentions. @karmi I will look at the integration test. @fabiode I have had a frustrating time with ES and Spree :( ... I think the rule of thumb should be to avoid ES when it comes to rendering the taxon pages as client requirements typically want ordering of products. I thought of similar solution but like you said it means maintaining positions within ES and that feels ugggh. I resulted in using classifications for rendering pages and sorting is done client-side using a JS lib. I only use ES for search at the moment :( ... Happy to collaborate on an overall better solution that includes ES.

@karmi
Copy link
Contributor

karmi commented May 30, 2016

@vmatekole, thanks, either fiddling with the integration test or having something like the standalone scripts in the examples directory would help...

@jamescgibson
Copy link

jamescgibson commented Jan 23, 2017

I'm having (I think) the same issue here, except without Spree. If I use .records to return an ActiveRecord collection, then that collection will always be in the order that it came out of Elasticsearch, even if I apply orders to it.

The most baffling thing is that if I do mycollection.search(...).records.order("column ASC").to_sql, and then paste that SQL into my Database, I get the correct, re-ordered results.

This makes me think that elasticsearch-model is doing something really funky, and even breaking .to_sql.

In the mean time, I'm doing what @fabiode suggested and just plucking IDs, and then creating a new query, but that is (understandably) less than ideal.

If helpful this is all with Postgres 9.5, elasticsearch-model 0.1.9, and Rails 4.2.

@camkidman
Copy link

Wanted to comment that I'm running into this issue as well. Doing any kind of .order doesn't seem to order the records how I want them. I'll get an example organized and see if I can dig into this too.

@camkidman
Copy link

camkidman commented May 31, 2018

Okay here's my test case so far with a new rails app:

Environment

(master ✘)✭ ᐅ ruby -v 
ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin16]
(master ✘)✭ ᐅ rails -v
Rails 5.1.6
(master ✘)✭ ᐅ grep 'elasticsearch' Gemfile.lock
    elasticsearch (5.0.5)
      elasticsearch-api (= 5.0.5)
      elasticsearch-transport (= 5.0.5)
    elasticsearch-api (5.0.5)
    elasticsearch-model (5.0.2)
      elasticsearch (~> 5)
    elasticsearch-rails (5.0.2)
    elasticsearch-transport (5.0.5)
  elasticsearch-model
  elasticsearch-rails
ᐅ psql -d ordering_development
psql (10.3)
Type "help" for help.

ordering_development=# SELECT version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.3 on x86_64-apple-darwin17.3.0, compiled by Apple LLVM version 9.0.0 (clang-900.0.39.2), 64-bit
(1 row)

Test Setup

rails new ordering -d postgresql
cd ordering
rails g model Thing name:text
rails db:create
rails db:migrate
bundle install
# app/models/thing.rb
class Thing < ApplicationRecord
  include ::Elasticsearch::Model
end

# config/initializers/elasticsearch.rb
config = {
  hosts: { host: 'localhost', port: 9200, protocol: Rails.env.production? ? 'https' : 'http' }
}

Elasticsearch::Model.client = Elasticsearch::Client.new(config)

# Gemfile
# Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
gem 'rails', '~> 5.1.6'
# Use postgresql as the database for Active Record
gem 'pg', '>= 0.18', '< 2.0'
# ...
gem 'elasticsearch-model'
gem 'elasticsearch-rails'

Now if I create some models, the index, do an import, and search on them...

10000.times do |i|
Thing.create(name: "aaabbb#{i}", created_at: rand(8.months).seconds.ago)
end

Thing.__elasticsearch__.create_index!
Thing.__elasticsearch__.import
Thing.search("aaa*", size: 200).records.order(created_at: :desc).map(&:created_at)
Thing Load (1.4ms)  SELECT "things".* FROM "things" WHERE "things"."id" IN (1001, 1004, 1006, 1007, 1013, 1032, 1033, 1035, 1036, 1044, 1045, 1052, 1054, 1055, 1061, 1064, 1069, 1073, 1075, 1082, 1084, 1087, 1089, 1099, 1102, 1110, 1115, 1123, 1124, 1128, 1131, 1136, 1145, 1146, 1147, 1156, 1159, 1160, 1161, 1164, 1167, 1169, 1172, 1174, 1185, 1210, 1212, 1219, 1230, 1231, 1246, 1249, 1251, 1254, 1255, 1262, 1267, 1274, 1275, 1283, 1305, 1307, 1309, 1313, 1314, 1315, 1316, 1320, 1323, 1325, 1326, 1327, 1330, 1333, 1343, 1345, 1349, 1350, 1351, 1353, 1359, 1361, 1364, 1369, 1371, 1379, 1381, 1385, 1388, 1392, 1397, 1400, 1403, 1409, 1414, 1419, 1422, 1430, 1432, 1435, 1444, 1448, 1450, 1451, 1455, 1462, 1463, 1478, 1479, 1480, 1484, 1494, 1498, 1501, 1502, 1503, 1509, 1514, 1517, 1523, 1526, 1531, 1535, 1536, 1542, 1544, 1550, 1559, 1561, 1563, 1564, 1566, 1570, 1579, 1588, 1589, 1590, 1591, 1592, 1594, 1606, 1607, 1612, 1614, 1620, 1633, 1635, 1645, 1647, 1648, 1664, 1678, 1687, 1692, 1696, 1699, 1703, 1719, 1724, 1725, 1728, 1734, 1735, 1737, 1740, 1743, 1754, 1756, 1760, 1762, 1769, 1781, 1784, 1789, 1796, 1802, 1810, 1814, 1829, 1838, 1840, 1843, 1844, 1850, 1852, 1859, 1866, 1868, 1874, 1876, 1877, 1878, 1879, 1890, 1893, 1898, 1899, 1900, 1908, 1915) ORDER BY "things"."created_at" DESC
[Mon, 05 Feb 2018 11:24:35 UTC +00:00,
 Mon, 16 Apr 2018 11:47:25 UTC +00:00,
 Sun, 20 May 2018 18:11:08 UTC +00:00,
 Wed, 27 Dec 2017 05:58:52 UTC +00:00,
 Thu, 30 Nov 2017 19:12:43 UTC +00:00,
 Mon, 11 Dec 2017 23:12:01 UTC +00:00,
 Mon, 21 May 2018 06:13:42 UTC +00:00,
 Thu, 02 Nov 2017 20:45:03 UTC +00:00,
 Sat, 11 Nov 2017 00:12:56 UTC +00:00,
 Fri, 04 May 2018 05:05:01 UTC +00:00,
 Mon, 02 Apr 2018 14:38:22 UTC +00:00,
# ...
]

However, if I copy + paste that SQL right into a psql session:

  id  |    name    |         created_at         |         updated_at
------+------------+----------------------------+----------------------------
 1703 | aaabbb1702 | 2018-05-28 03:53:45.73405  | 2018-05-31 18:21:49.735116
 1330 | aaabbb1329 | 2018-05-27 13:31:34.979919 | 2018-05-31 18:21:47.980966
 1592 | aaabbb1591 | 2018-05-26 22:33:22.161538 | 2018-05-31 18:21:49.162501
 1231 | aaabbb1230 | 2018-05-26 00:21:32.497358 | 2018-05-31 18:21:47.498465
 1740 | aaabbb1739 | 2018-05-25 11:54:18.872323 | 2018-05-31 18:21:49.873199
 1275 | aaabbb1274 | 2018-05-25 10:23:37.715866 | 2018-05-31 18:21:47.716802
 1353 | aaabbb1352 | 2018-05-21 11:18:30.066591 | 2018-05-31 18:21:48.06767
 1033 | aaabbb1032 | 2018-05-21 06:13:42.6024   | 2018-05-31 18:21:46.60328
 1462 | aaabbb1461 | 2018-05-20 19:28:09.49157  | 2018-05-31 18:21:48.492978
 1006 | aaabbb1005 | 2018-05-20 18:11:08.512187 | 2018-05-31 18:21:46.513139
 1480 | aaabbb1479 | 2018-05-20 01:05:35.609906 | 2018-05-31 18:21:48.610879
 1645 | aaabbb1644 | 2018-05-17 20:03:38.461354 | 2018-05-31 18:21:49.462322
...

Am I doing something wrong or is this an issue? I'd be happy to spend some time diving into the problem if it is indeed a problem.

@jazzytomato
Copy link
Contributor

jazzytomato commented Aug 29, 2018

@camkidman I am having a similar problem with those versions

rails (4.2.8)
pg (0.19.0)

    elasticsearch (5.0.3)
      elasticsearch-api (= 5.0.3)
      elasticsearch-transport (= 5.0.3)
    elasticsearch-api (5.0.3)
    elasticsearch-model (0.1.9)
      elasticsearch (> 0.4)
    elasticsearch-rails (0.1.9)
    elasticsearch-transport (5.0.3)
  elasticsearch-model
  elasticsearch-rails

@jazzytomato
Copy link
Contributor

I noticed that the order will be lost if the call to the method order isn't done directly on the records from elastic search.

Because of the monkey patch on .to_a here

# Re-order records based on the order from Elasticsearch hits
# by redefining `to_a`, unless the user has called `order()`
#
sql_records.instance_exec(response.response['hits']['hits']) do |hits|
define_singleton_method :to_a do
if defined?(::ActiveRecord) && ::ActiveRecord::VERSION::MAJOR >= 4
self.load
else
self.__send__(:exec_queries)
end
@records.sort_by { |record| hits.index { |hit| hit['_id'].to_s == record.id.to_s } }
end
end

and here

# Intercept call to the `order` method, so we can ignore the order from Elasticsearch
#
def order(*args)
sql_records = records.__send__ :order, *args
# Redefine the `to_a` method to the original one
#
sql_records.instance_exec do
define_singleton_method(:to_a) do
if defined?(::ActiveRecord) && ::ActiveRecord::VERSION::MAJOR >= 4
self.load
else
self.__send__(:exec_queries)
end
@records
end
end
sql_records
end
end

For example Thing.distinct.order('title asc') will call order on an ActiveRecord::Relation and the monkey patch will not be reverted.

However, Thing.order('title asc').distinct will work because order is sent to Elasticsearch::Model::Response::Records which is then delegated to the AR adapter.

Sometimes however, when chaining scopes and building complex AR relations, there is no simple way of just calling order('something') first.

The way I solved it was by removing the order right away by calling order('') when I know I will need to order later, e.g.

relation = relation.order('')
# ...
relation.scope1.scope2.order('something asc') 

@jazzytomato
Copy link
Contributor

jazzytomato commented Aug 30, 2018

For those using a more recent version of ActiveRecord, calling .to_a will fix the problem. Because the monkey patch on .to_a is removed but not the one on .records.

@camkidman in your test set up, this would fix it: Thing.search("aaa*", size: 200).records.order(created_at: :desc).to_a.map(&:created_at)

So there are two issue, one is fairly simple to fix (taking into account .records in the order method here). I could submit a fix if you are interested.

The other issue that I explained in my comment above is a bit more complicated, I think maybe the monkey patch shouldn't be implicit and provided as an option, i.e. options[:preserve_order], but that would mean a breaking change. Otherwise, we could have an option the other way around, i.e. options[:custom_order] that would prevent monkey patching.

@estolfo
Copy link
Contributor

estolfo commented Sep 14, 2018

For anyone following this ticket, I've opened this pull request, which resolves this issue. Can you please test it out and let me know if the changes do indeed solve the problem?

@estolfo
Copy link
Contributor

estolfo commented Sep 17, 2018

Closing, this is fixed by d92efd8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants