ValueError: Length of values (1) does not match length of index with sc.pp.calculate_qc_metrics(adata) #2008

Closed
pchiang5 opened this issue Oct 1, 2021 · 10 comments
Labels: Needs info❔ (More information needed), Upstream

@pchiang5
pchiang5 commented Oct 1, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

I also tried 'log1p = False', which produced a different error. Thank you.

sc.pp.calculate_qc_metrics(adata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/scanpy/preprocessing/_qc.py", line 294, in calculate_qc_metrics
    obs_metrics = describe_obs(
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/scanpy/preprocessing/_qc.py", line 111, in describe_obs
    obs_metrics[f"log1p_total_{expr_type}"] = np.log1p(
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/frame.py", line 3612, in __setitem__
    self._set_item(key, value)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/frame.py", line 3784, in _set_item
    value = self._sanitize_column(value)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4509, in _sanitize_column
    com.require_length_match(value, self.index)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/common.py", line 531, in require_length_match
    raise ValueError(
ValueError: Length of values (1) does not match length of index (35255)

sc.pp.calculate_qc_metrics(adata, log1p = False)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/frame.py", line 995, in __repr__
    self.to_string(
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/frame.py", line 1131, in to_string
    return fmt.DataFrameRenderer(formatter).to_string(
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1053, in to_string
    string = string_formatter.to_string()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/string.py", line 25, in to_string
    text = self._get_string_representation()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/string.py", line 40, in _get_string_representation
    strcols = self._get_strcols()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/string.py", line 31, in _get_strcols
    strcols = self.fmt.get_strcols()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 540, in get_strcols
    strcols = self._get_strcols_without_index()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 804, in _get_strcols_without_index
    fmt_values = self.format_col(i)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 818, in format_col
    return format_array(
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1240, in format_array
    return fmt_obj.get_result()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1271, in get_result
    fmt_values = self._format_strings()
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1518, in _format_strings
    return list(self.get_result_as_array())
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1482, in get_result_as_array
    formatted_values = format_values_with(float_format)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1456, in format_values_with
    values = format_with_na_rep(values, formatter, na_rep)
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1427, in format_with_na_rep
    [
  File "/root/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1428, in <listcomp>
    formatter(val) if not m else na_rep
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Versions

WARNING: If you miss a compact list, please try print_header!

anndata 0.7.6
scanpy 1.7.2
sinfo 0.3.1

PIL 8.3.2
anndata 0.7.6
beta_ufunc NA
binom_ufunc NA
cffi 1.14.6
colorama 0.4.4
concurrent NA
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
dunamai 1.6.0
encodings NA
genericpath NA
get_version 3.5
h5py 3.4.0
joblib 1.0.1
kiwisolver 1.3.2
legacy_api_wrap 0.0.0
llvmlite 0.37.0
matplotlib 3.4.3
mpl_toolkits NA
natsort 7.1.1
nbinom_ufunc NA
ntpath NA
numba 0.54.0
numexpr 2.7.3
numpy 1.20.3
opcode NA
packaging 21.0
pandas 1.3.3
pkg_resources NA
posixpath NA
pycparser 2.20
pyexpat NA
pyparsing 2.4.7
pytz 2021.1
scanpy 1.7.2
scipy 1.7.1
setuptools_scm NA
sinfo 0.3.1
six 1.16.0
sklearn 1.0
sphinxcontrib NA
sre_compile NA
sre_constants NA
sre_parse NA
tables 3.6.1

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.31
24 logical CPU cores, x86_64

Session information updated at 2021-10-01 14:56

@mtvector
mtvector commented Oct 8, 2021

I'm suddenly having a similar problem too, in addition to the other issue I raised.

@sfortma2
sfortma2 commented Oct 16, 2021

I'm also suddenly getting "ValueError: Length of values (1) does not match length of index" from certain Scanpy functions, such as sc.pl.scatter(adata, 'n_counts', 'n_genes', color='mt_frac'), and from NumPy operations such as adata.obs['log_counts'] = np.log(adata.obs['n_counts']). The error is not caused by my adata file: it also occurs with datasets that previously ran without errors.

@ivirshup
Member

Could you update your version of scanpy and see whether the issue persists? I believe this was an incompatibility with the 1.3.0 release of pandas (pandas-dev/pandas#42376), which was fixed in scanpy 1.8.1 (#1917).

@ivirshup ivirshup added the Needs info❔ More information needed label Oct 19, 2021
@ivirshup ivirshup self-assigned this Oct 19, 2021
@mtvector
I had this issue on scanpy 1.8.1 with pandas 1.3.3.

@michalk8
Contributor

I encountered a similar issue last week; it happened because adata.obs contained a column holding an n_obs x 1 matrix (the row sums of a scipy.sparse.spmatrix) instead of a 1-D array. The code below reproduces the formatter error (pandas==1.3.3):

import scanpy as sc
from scipy.sparse import csr_matrix

adata = sc.datasets.pbmc3k()
adata.X = csr_matrix(adata.X)
adata.obs['total_counts'] = adata.X.sum(1)  # returns an (n_obs, 1) numpy.matrix; pandas doesn't complain
adata.obs  # raises the formatter error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/core/frame.py in __repr__(self)
    993         else:
    994             width = None
--> 995         self.to_string(
    996             buf=buf,
    997             max_rows=max_rows,

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, min_rows, max_cols, show_dimensions, decimal, line_width, max_colwidth, encoding)
   1129                 decimal=decimal,
   1130             )
-> 1131             return fmt.DataFrameRenderer(formatter).to_string(
   1132                 buf=buf,
   1133                 encoding=encoding,

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in to_string(self, buf, encoding, line_width)
   1051 
   1052         string_formatter = StringFormatter(self.fmt, line_width=line_width)
-> 1053         string = string_formatter.to_string()
   1054         return save_to_buffer(string, buf=buf, encoding=encoding)
   1055 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/string.py in to_string(self)
     23 
     24     def to_string(self) -> str:
---> 25         text = self._get_string_representation()
     26         if self.fmt.should_show_dimensions:
     27             text = "".join([text, self.fmt.dimensions_info])

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/string.py in _get_string_representation(self)
     38             return self._empty_info_line
     39 
---> 40         strcols = self._get_strcols()
     41 
     42         if self.line_width is None:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/string.py in _get_strcols(self)
     29 
     30     def _get_strcols(self) -> list[list[str]]:
---> 31         strcols = self.fmt.get_strcols()
     32         if self.fmt.is_truncated:
     33             strcols = self._insert_dot_separators(strcols)

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in get_strcols(self)
    538         Render a DataFrame to a list of columns (as lists of strings).
    539         """
--> 540         strcols = self._get_strcols_without_index()
    541 
    542         if self.index:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in _get_strcols_without_index(self)
    802                 int(self.col_space.get(c, 0)), *(self.adj.len(x) for x in cheader)
    803             )
--> 804             fmt_values = self.format_col(i)
    805             fmt_values = _make_fixed_width(
    806                 fmt_values, self.justify, minimum=header_colwidth, adj=self.adj

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_col(self, i)
    816         frame = self.tr_frame
    817         formatter = self._get_formatter(i)
--> 818         return format_array(
    819             frame.iloc[:, i]._values,
    820             formatter,

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
   1238     )
   1239 
-> 1240     return fmt_obj.get_result()
   1241 
   1242 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result(self)
   1269 
   1270     def get_result(self) -> list[str]:
-> 1271         fmt_values = self._format_strings()
   1272         return _make_fixed_width(fmt_values, self.justify)
   1273 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in _format_strings(self)
   1516 
   1517     def _format_strings(self) -> list[str]:
-> 1518         return list(self.get_result_as_array())
   1519 
   1520 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result_as_array(self)
   1480             float_format = lambda value: self.float_format % value
   1481 
-> 1482         formatted_values = format_values_with(float_format)
   1483 
   1484         if not self.fixed_width:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_values_with(float_format)
   1454             values = self.values
   1455             is_complex = is_complex_dtype(values)
-> 1456             values = format_with_na_rep(values, formatter, na_rep)
   1457 
   1458             if self.fixed_width:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_with_na_rep(values, formatter, na_rep)
   1425             mask = isna(values)
   1426             formatted = np.array(
-> 1427                 [
   1428                     formatter(val) if not m else na_rep
   1429                     for val, m in zip(values.ravel(), mask.ravel())

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in <listcomp>(.0)
   1426             formatted = np.array(
   1427                 [
-> 1428                     formatter(val) if not m else na_rep
   1429                     for val, m in zip(values.ravel(), mask.ravel())
   1430                 ]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/core/frame.py in _repr_html_(self)
   1045                 decimal=".",
   1046             )
-> 1047             return fmt.DataFrameRenderer(formatter).to_html(notebook=True)
   1048         else:
   1049             return None

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in to_html(self, buf, encoding, classes, notebook, border, table_id, render_links)
   1027             render_links=render_links,
   1028         )
-> 1029         string = html_formatter.to_string()
   1030         return save_to_buffer(string, buf=buf, encoding=encoding)
   1031 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in to_string(self)
     70 
     71     def to_string(self) -> str:
---> 72         lines = self.render()
     73         if any(isinstance(x, str) for x in lines):
     74             lines = [str(x) for x in lines]

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in render(self)
    619         self.write("<div>")
    620         self.write_style()
--> 621         super().render()
    622         self.write("</div>")
    623         return self.elements

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in render(self)
     76 
     77     def render(self) -> list[str]:
---> 78         self._write_table()
     79 
     80         if self.should_show_dimensions:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in _write_table(self, indent)
    246             self._write_header(indent + self.indent_delta)
    247 
--> 248         self._write_body(indent + self.indent_delta)
    249 
    250         self.write("</table>", indent)

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in _write_body(self, indent)
    393     def _write_body(self, indent: int) -> None:
    394         self.write("<tbody>", indent)
--> 395         fmt_values = self._get_formatted_values()
    396 
    397         # write values

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in _get_formatted_values(self)
    583 
    584     def _get_formatted_values(self) -> dict[int, list[str]]:
--> 585         return {i: self.fmt.format_col(i) for i in range(self.ncols)}
    586 
    587     def _get_columns_formatted_values(self) -> list[str]:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/html.py in <dictcomp>(.0)
    583 
    584     def _get_formatted_values(self) -> dict[int, list[str]]:
--> 585         return {i: self.fmt.format_col(i) for i in range(self.ncols)}
    586 
    587     def _get_columns_formatted_values(self) -> list[str]:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_col(self, i)
    816         frame = self.tr_frame
    817         formatter = self._get_formatter(i)
--> 818         return format_array(
    819             frame.iloc[:, i]._values,
    820             formatter,

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
   1238     )
   1239 
-> 1240     return fmt_obj.get_result()
   1241 
   1242 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result(self)
   1269 
   1270     def get_result(self) -> list[str]:
-> 1271         fmt_values = self._format_strings()
   1272         return _make_fixed_width(fmt_values, self.justify)
   1273 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in _format_strings(self)
   1516 
   1517     def _format_strings(self) -> list[str]:
-> 1518         return list(self.get_result_as_array())
   1519 
   1520 

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result_as_array(self)
   1480             float_format = lambda value: self.float_format % value
   1481 
-> 1482         formatted_values = format_values_with(float_format)
   1483 
   1484         if not self.fixed_width:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_values_with(float_format)
   1454             values = self.values
   1455             is_complex = is_complex_dtype(values)
-> 1456             values = format_with_na_rep(values, formatter, na_rep)
   1457 
   1458             if self.fixed_width:

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in format_with_na_rep(values, formatter, na_rep)
   1425             mask = isna(values)
   1426             formatted = np.array(
-> 1427                 [
   1428                     formatter(val) if not m else na_rep
   1429                     for val, m in zip(values.ravel(), mask.ravel())

~/.miniconda3/envs/cellrank/lib/python3.8/site-packages/pandas/io/formats/format.py in <listcomp>(.0)
   1426             formatted = np.array(
   1427                 [
-> 1428                     formatter(val) if not m else na_rep
   1429                     for val, m in zip(values.ravel(), mask.ravel())
   1430                 ]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
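
In the example above, `adata.X.sum(1)` on a `csr_matrix` returns a 2-D `numpy.matrix` of shape (n_obs, 1), not a 1-D array, and that 2-D column is what later trips up the pandas formatter. A minimal sketch of the usual workaround, assuming all you need is per-cell totals: flatten the row sums with `np.asarray(...).ravel()` before assigning them. A toy 4 x 3 matrix and a plain DataFrame stand in for `adata.X` and `adata.obs` here (an assumption, to avoid downloading pbmc3k).

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy counts: 4 cells x 3 genes (made-up values for illustration)
X = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 0],
                         [0, 1, 1]], dtype=np.float64))
obs = pd.DataFrame(index=[f"cell{i}" for i in range(X.shape[0])])

row_sums = X.sum(axis=1)  # numpy.matrix with shape (n_obs, 1)

# np.asarray drops the matrix subclass; ravel() then yields shape (n_obs,)
obs["total_counts"] = np.asarray(row_sums).ravel()

print(type(row_sums).__name__, row_sums.shape)
print(obs)  # renders normally because the column is 1-D
```

Calling `.ravel()` directly on a `numpy.matrix` returns another (1, N) matrix, which is why the conversion to a plain ndarray comes first; `row_sums.A1` is an equivalent shortcut.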

@ivirshup
Member

With pandas 1.3.4 and 1.3.3

  • I can't replicate the initial issue
  • I can replicate @michalk8's example

This looks like an upstream pandas issue. Tomorrow I'll check whether it has already been reported to pandas and, if not, open an issue. It may be a fairly easy fix (e.g. validating the value's shape more carefully during column assignment in pandas), but it can take a while to work out how to fix things there.

AFAIK, we removed the calls in scanpy that assigned (n x 1) matrices to pandas because of a related, non-formatting error.

Is the current scanpy release assigning these matrices anywhere?
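
The shape check suggested in this comment can be approximated on the user side until pandas validates it. A minimal sketch, assuming you control the assignment: `assign_1d_column` is a hypothetical helper (not part of pandas or scanpy) that accepts 1-D values or an (n, 1) / (1, n) matrix, flattens the latter, and refuses anything else before it reaches the DataFrame.

```python
import numpy as np
import pandas as pd


def assign_1d_column(df: pd.DataFrame, key: str, values) -> None:
    """Hypothetical helper: store `values` as df[key] only if they are 1-D
    (or an (n, 1) / (1, n) matrix that can be flattened) of length len(df)."""
    arr = np.asarray(values)  # converts numpy.matrix to a plain ndarray
    if arr.ndim == 2 and 1 in arr.shape:
        arr = arr.ravel()  # flatten (n, 1) or (1, n) to (n,)
    if arr.ndim != 1:
        raise ValueError(f"expected 1-D values for column {key!r}, got shape {arr.shape}")
    if arr.shape[0] != len(df):
        raise ValueError(
            f"length of values ({arr.shape[0]}) does not match length of index ({len(df)})"
        )
    df[key] = arr


df = pd.DataFrame(index=["a", "b", "c"])
assign_1d_column(df, "total", np.matrix([[1.0], [2.0], [3.0]]))  # flattened, then stored
print(df)
```

Failing early at assignment time means a bad shape surfaces where the column is created, rather than later when the DataFrame is printed.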

@sfortma2
Many thanks to everyone for their input. The bug is indeed due to an issue with pandas ≥1.3. I am running scanpy 1.8.1, and I can confirm that the indexing problem persists with pandas 1.3.0, 1.3.2, and the latest 1.3.4, but goes away after downgrading to 1.2.5.

@Defphoenix
Thanks for everyone's input. I tried to solve this problem by downgrading pandas to 1.1.5. The cause may be that, in Python 3.9 and above, pandas changed how it handles matrices.

@ivirshup
Member

Opened a PR to pandas which should hopefully fix this: pandas-dev/pandas#42376

@ivirshup
Member

My PR was merged, so this should be resolved with the next version of pandas.

6 participants