Add type annotations to Index and MultiIndex #1882

xinrong-meng · 2020-11-02T17:55:56Z

No description provided.

databricks/koalas/indexes.py

xinrong-meng · 2020-11-02T18:11:47Z

databricks/koalas/indexes.py

        index_map[i], index_map[j], = index_map[j], index_map[i]
        result = DataFrame(self._kdf._internal.copy(index_map=OrderedDict(index_map))).index
-        return result
+        return result  # type: ignore


Otherwise, Mypy expects "Index" as the return type.

xinrong-meng · 2020-11-02T18:11:56Z

databricks/koalas/indexes.py

        index_map = OrderedDict(zip(sdf.columns, names))
        internal = InternalFrame(spark_frame=sdf, index_map=index_map)
-        return DataFrame(internal).index
+        return DataFrame(internal).index  # type: ignore


Otherwise, Mypy expects "Index" as the return type.

Oh, so it seems that mypy requires us to add some kind of abstract methods to the Index class?

Maybe we can ignore that rule, like we did for the inplace parameter? 😲

Thanks @itholic, this is a good idea!

Unfortunately, I couldn't find an option in https://mypy.readthedocs.io/en/stable/config_file.html that can ignore this rule.

I do agree this should be a common problem for other projects. Let me search some more and see if there is a better solution.

Yep, Sure :)
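
For readers following this thread, here is a minimal sketch of the situation being discussed. The `Index`, `MultiIndex`, and `DataFrame` classes below are hypothetical stand-ins, not the koalas classes. It assumes the `index` property is annotated as returning `Index` while the calling method is annotated as returning `MultiIndex`, which is the kind of mismatch that makes mypy complain. `typing.cast` is shown as one possible alternative to `# type: ignore`; it is not what this PR does.

```python
from typing import List, cast


class Index:
    """Hypothetical stand-in for an index class (not the koalas Index)."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)


class MultiIndex(Index):
    """Hypothetical stand-in for a multi-level index class."""

    def swaplevel(self, i: int = -2, j: int = -1) -> "MultiIndex":
        names = list(self.names)
        names[i], names[j] = names[j], names[i]
        result = DataFrame(names).index
        # `DataFrame.index` is annotated as returning Index, so a bare
        # `return result` would be flagged by mypy as an incompatible
        # return value. cast() only informs the type checker; it performs
        # no conversion or check at runtime.
        return cast(MultiIndex, result)


class DataFrame:
    """Hypothetical stand-in whose `index` property is typed as the base class."""

    def __init__(self, names: List[str]) -> None:
        self._names = list(names)

    @property
    def index(self) -> Index:
        # Builds a MultiIndex when there are several levels, but the
        # annotation only promises the base class.
        if len(self._names) > 1:
            return MultiIndex(self._names)
        return Index(self._names)
```

Compared with `# type: ignore`, a cast states the intended type explicitly and leaves other errors on the same line visible, at the cost of being unchecked if the assumption ever becomes wrong.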

codecov-io · 2020-11-02T18:31:33Z

Codecov Report

Merging #1882 into master will decrease coverage by 2.88%.
The diff coverage is 96.20%.

@@            Coverage Diff             @@
##           master    #1882      +/-   ##
==========================================
- Coverage   94.20%   91.31%   -2.89%     
==========================================
  Files          40       40              
  Lines        9915     9833      -82     
==========================================
- Hits         9340     8979     -361     
- Misses        575      854     +279

Impacted Files	Coverage Δ
databricks/koalas/indexes.py	`95.04% <96.20%> (-2.02%)`	⬇️
databricks/koalas/usage_logging/__init__.py	`25.66% <0.00%> (-66.65%)`	⬇️
databricks/koalas/usage_logging/usage_logger.py	`47.82% <0.00%> (-52.18%)`	⬇️
databricks/koalas/__init__.py	`76.66% <0.00%> (-13.34%)`	⬇️
databricks/conftest.py	`88.13% <0.00%> (-11.87%)`	⬇️
databricks/koalas/spark/functions.py	`88.88% <0.00%> (-7.41%)`	⬇️
databricks/koalas/accessors.py	`86.13% <0.00%> (-6.94%)`	⬇️
databricks/koalas/namespace.py	`78.08% <0.00%> (-4.93%)`	⬇️
databricks/koalas/generic.py	`90.56% <0.00%> (-4.79%)`	⬇️
databricks/koalas/frame.py	`93.40% <0.00%> (-3.33%)`	⬇️
... and 14 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bb22748...dfd76ca.

databricks/koalas/indexes.py

itholic

Seems fine to me except for one question.

databricks/koalas/indexes.py

HyukjinKwon · 2020-11-03T05:22:51Z

databricks/koalas/indexes.py

        return isinstance(self.spark.data_type, IntegralType)

-    def item(self):
+    def item(self) -> Union[Scalar, Tuple]:


Strictly speaking, using Scalar is not correct because this can return a pandas instance too, for example pd.Timestamp, which is not in Scalar from my cursory reading. But I think it's okay; we can fix it later separately.

This makes sense. How about we define a pandas Scalar separately? @HyukjinKwon
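
A rough sketch of what a separate alias could look like. Everything here is an assumption for illustration: the members of `Scalar` are a guess at a plain-Python scalar alias, `PandasScalar` is a hypothetical name, and `Timestamp` is a small stand-in class used in place of `pd.Timestamp` so the snippet runs without pandas installed.

```python
import datetime
import decimal
from typing import List, Tuple, Union


class Timestamp:
    """Hypothetical stand-in for pandas.Timestamp (not the real class)."""

    def __init__(self, iso: str) -> None:
        self.value = datetime.datetime.fromisoformat(iso)


# Assumed shape of a plain-Python scalar alias; the project's real
# Scalar definition may list different members.
Scalar = Union[int, float, bool, str, bytes, decimal.Decimal, None]

# Hypothetical wider alias that also admits pandas-specific scalar objects.
PandasScalar = Union[Scalar, Timestamp]


def item(values: List[object]) -> Union[PandasScalar, Tuple]:
    """Return the single element of `values`, mirroring the annotated
    return shape of the `item` method discussed above."""
    if len(values) != 1:
        raise ValueError("can only convert a sequence of size 1 to a scalar")
    return values[0]  # type: ignore
```

With the real library, the stand-in would be replaced by the pandas types the method can actually return; which ones belong in the alias is left open here.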

databricks/koalas/indexes.py

HyukjinKwon · 2020-11-03T05:24:27Z

Looks pretty good