Support for Row type Ser/De and exposing the CreateDataFrame API #338
Merged: imback82 merged 43 commits into dotnet:master from Niharikadutta:nidutta/RowTypeSerDeSupport on Jan 10, 2020
Changes from 9 commits
Commits
044a01e Initial commit for Row Ser/De and exposing CreateDataFrame (Niharikadutta)
b1d8dfc Reverting changes to Basic.cs (Niharikadutta)
5c7dd03 Merge branch 'master' into nidutta/RowTypeSerDeSupport (Niharikadutta)
b6b82be Added example in Basic.cs about how to use CreateDataFrame (Niharikadutta)
ae31dd2 Removing unneccessary comments (Niharikadutta)
9546229 Removing whitespace (Niharikadutta)
ca094d7 Added GenericRow support and some tests, removed comments and added C… (Niharikadutta)
3c58aa7 Exposing GenericRow tests and cleaned up comments. (Niharikadutta)
74ff143 Made GenericRow as internal (Niharikadutta)
236ea30 PR review comments removing whitespaces and commented out functions (Niharikadutta)
520528f exposed CreateDataFrame API that does not take in a schema but infers… (Niharikadutta)
1ca7a6c PR review comments - 2 (Niharikadutta)
25deb3d Update src/csharp/Microsoft.Spark/Sql/GenericRow.cs (Niharikadutta)
ac8a27f PR review comment changes - 3 (Niharikadutta)
fedff8d merging latest from repo (Niharikadutta)
bac8899 Removed commented code (Niharikadutta)
b754147 Using composition in Row to avoid duplication with GenericRow (Niharikadutta)
3c3e677 Changed Row Ser/De to utilize existing framework and added jvm create… (Niharikadutta)
499fc47 Merge branch 'master' into nidutta/RowTypeSerDeSupport (Niharikadutta)
66f26c9 Fixed indentation and added E2E tests for CreateDataFrame (Niharikadutta)
8943ca8 PR review changes (Niharikadutta)
67f1375 Added logic to test properties of created DataFrame in test (Niharikadutta)
bb011d7 PR review changes (Niharikadutta)
f3eec4e Sorted usings (Niharikadutta)
56dd332 Update src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/SparkSessionTe… (Niharikadutta)
aa4913c Update src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/SparkSessionTe… (Niharikadutta)
68145a0 Update src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/SparkSessionTe… (Niharikadutta)
a940b9a Update src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/SparkSessionTe… (Niharikadutta)
e1a46fc PR review changes (Niharikadutta)
6bf02d8 Update src/csharp/Microsoft.Spark/Sql/SparkSession.cs (Niharikadutta)
bc8029a Update src/csharp/Microsoft.Spark/Sql/SparkSession.cs (Niharikadutta)
7c5bce6 Update src/csharp/Microsoft.Spark/Sql/SparkSession.cs (Niharikadutta)
fcc0a13 Update src/csharp/Microsoft.Spark/Sql/SparkSession.cs (Niharikadutta)
fae4b2f PR review changes - exposed CreateDataFrame for string and int withou… (Niharikadutta)
6a4bd4d Made changes as per PR review comments (Niharikadutta)
4f5bfff PR review comments changes (Niharikadutta)
09bdf4d PR review comment changes (Niharikadutta)
149044a PR review comments changes (Niharikadutta)
e281692 PR review changes (Niharikadutta)
ea84838 PR review changes (Niharikadutta)
defe8da Merge branch 'master' into nidutta/RowTypeSerDeSupport (Niharikadutta)
c12a576 update (Niharikadutta)
7a45963 Merge branch 'nidutta/RowTypeSerDeSupport' of github.com:Niharikadutt… (Niharikadutta)
44 changes: 44 additions & 0 deletions
src/csharp/Microsoft.Spark.UnitTest/Sql/GenericRowTests.cs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using System; | ||
using System.Collections.Generic; | ||
using System.IO; | ||
using System.Linq; | ||
using Microsoft.Spark.Interop.Ipc; | ||
using Microsoft.Spark.Network; | ||
using Microsoft.Spark.Sql; | ||
using Microsoft.Spark.Sql.Types; | ||
using Microsoft.Spark.UnitTest.TestUtils; | ||
using Microsoft.Spark.Utils; | ||
using Moq; | ||
using Razorvine.Pickle; | ||
using Xunit; | ||
|
||
namespace Microsoft.Spark.UnitTest | ||
{ | ||
public class GenericRowTests | ||
{ | ||
[Fact] | ||
public void GenericRowTest() | ||
{ | ||
var row = new GenericRow(new object[] { 1, "abc" }); | ||
|
||
// Validate Size(). | ||
Assert.Equal(2, row.Size()); | ||
|
||
// Validate [] operator. | ||
Assert.Equal(1, row[0]); | ||
Assert.Equal("abc", row[1]); | ||
|
||
// Validate Get*(int). | ||
Assert.Equal(1, row.Get(0)); | ||
Assert.Equal("abc", row.Get(1)); | ||
Assert.Equal(1, row.GetAs<int>(0)); | ||
Assert.ThrowsAny<Exception>(() => row.GetAs<string>(0)); | ||
Assert.Equal("abc", row.GetAs<string>(1)); | ||
Assert.ThrowsAny<Exception>(() => row.GetAs<int>(1)); | ||
} | ||
} | ||
} |
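The `GetAs<T>` assertions in the test above depend on how a direct cast from a boxed `object` behaves: unboxing succeeds only when the requested type matches the boxed type exactly. The following is a minimal standalone sketch of that behavior, not code from this PR. `DirectCast` mirrors the `(T)Get(index)` cast used by `GenericRow.GetAs<T>`; `ConvertingCast` is a hypothetical alternative introduced here only for comparison.

```csharp
using System;

public static class BoxedCastDemo
{
    // Same shape as GetAs<T>: a plain cast from object to T.
    // For value types this is an unboxing cast and requires an exact type match.
    public static T DirectCast<T>(object value) => (T)value;

    // Hypothetical alternative (not in the PR): fall back to Convert.ChangeType
    // so that a boxed int can be read as a long.
    public static T ConvertingCast<T>(object value) =>
        value is T typed ? typed : (T)Convert.ChangeType(value, typeof(T));

    public static void Main()
    {
        object boxed = 1; // a boxed int, like the first column in the test row

        Console.WriteLine(DirectCast<int>(boxed)); // prints 1

        try
        {
            DirectCast<long>(boxed); // int -> long is not allowed when unboxing
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException"); // prints InvalidCastException
        }

        Console.WriteLine(ConvertingCast<long>(boxed)); // prints 1
    }
}
```

This is also why the TODO on `GetAs<T>` matters: if a value written as `long` comes back boxed as `int`, a caller asking for `long` hits the failing branch.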
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,7 @@ | |
using System.Collections.Generic; | ||
using System.IO; | ||
using System.Linq; | ||
using Microsoft.Spark.Sql; | ||
|
||
namespace Microsoft.Spark.Interop.Ipc | ||
{ | ||
|
@@ -25,6 +26,7 @@ internal class PayloadHelper | |
private static readonly byte[] s_byteArrayTypeId = new[] { (byte)'r' }; | ||
private static readonly byte[] s_intArrayTypeId = new[] { (byte)'l' }; | ||
Review comment: I'm not sure why this is called |
||
private static readonly byte[] s_dictionaryTypeId = new[] { (byte)'e' }; | ||
private static readonly byte[] s_rowArrTypeId = new[] { (byte)'R' }; | ||
|
||
private static readonly ConcurrentDictionary<Type, bool> s_isDictionaryTable = | ||
new ConcurrentDictionary<Type, bool>(); | ||
|
@@ -231,6 +233,15 @@ internal static void ConvertArgsToBytes( | |
SerDe.Write(destination, argProvider.Reference.Id); | ||
break; | ||
|
||
case IEnumerable<GenericRow> argRowArray: | ||
SerDe.Write(destination, (int)argRowArray.Count()); | ||
foreach (GenericRow r in argRowArray) | ||
{ | ||
SerDe.Write(destination, (int)r.Values.Length); | ||
ConvertArgsToBytes(destination, r.Values, true); | ||
} | ||
break; | ||
|
||
default: | ||
throw new NotSupportedException( | ||
string.Format($"Type {arg.GetType()} is not supported")); | ||
|
@@ -283,6 +294,11 @@ internal static byte[] GetTypeId(Type type) | |
{ | ||
return s_intArrayTypeId; | ||
} | ||
|
||
if (typeof(IEnumerable<GenericRow>).IsAssignableFrom(type)) | ||
{ | ||
return s_rowArrTypeId; | ||
} | ||
break; | ||
} | ||
|
||
|
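The new `IEnumerable<GenericRow>` case writes the row count, then for each row its column count followed by the column values. Below is a rough standalone sketch of that layout, under several assumptions that the diff does not confirm: integers are written big-endian, each value is prefixed with a one-byte type id (`'i'` for int and `'c'` for string are illustrative choices here), and strings are length-prefixed UTF-8. `RowArrayLayoutSketch` and its methods are hypothetical names, not part of `PayloadHelper` or `SerDe`.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RowArrayLayoutSketch
{
    // Assumption: 32-bit integers go on the wire in big-endian order.
    public static void WriteInt(Stream s, int value)
    {
        byte[] buf = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(buf, value);
        s.Write(buf, 0, buf.Length);
    }

    // Assumption: every value carries a one-byte type id prefix.
    public static void WriteValue(Stream s, object value)
    {
        switch (value)
        {
            case int i:
                s.WriteByte((byte)'i');
                WriteInt(s, i);
                break;
            case string str:
            {
                byte[] utf8 = Encoding.UTF8.GetBytes(str);
                s.WriteByte((byte)'c');
                WriteInt(s, utf8.Length);
                s.Write(utf8, 0, utf8.Length);
                break;
            }
            default:
                throw new NotSupportedException($"Type {value?.GetType()} is not supported");
        }
    }

    // Layout: [row count] then, per row, [column count][values...].
    public static byte[] Encode(IEnumerable<object[]> rows)
    {
        // Materialize once so the sequence is not enumerated twice
        // (once for the count, once for the values).
        var list = new List<object[]>(rows);
        using (var ms = new MemoryStream())
        {
            WriteInt(ms, list.Count);
            foreach (object[] row in list)
            {
                WriteInt(ms, row.Length);
                foreach (object v in row)
                {
                    WriteValue(ms, v);
                }
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        var rows = new List<object[]>
        {
            new object[] { 1, "abc" },
            new object[] { 2, "de" },
        };
        // 4 (row count) + 17 (first row) + 16 (second row) bytes.
        Console.WriteLine(Encode(rows).Length); // prints 37
    }
}
```

Copying the rows into a list before writing is a design choice of this sketch: calling `Count()` on a lazy `IEnumerable` and then iterating it walks the sequence twice.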
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using System; | ||
using System.Collections.Generic; | ||
using System.Linq; | ||
using Microsoft.Spark.Sql.Types; | ||
|
||
namespace Microsoft.Spark.Sql | ||
{ | ||
/// <summary> | ||
/// Represents a row object in RDD, equivalent to GenericRow in Spark. | ||
/// </summary> | ||
public sealed class GenericRow | ||
{ | ||
/// <summary> | ||
/// Constructor for the GenericRow class. | ||
/// </summary> | ||
/// <param name="values">Column values for a row</param> | ||
internal GenericRow(object[] values) | ||
{ | ||
Values = values; | ||
//TODO: | ||
//Convert() -> implement type checking for not implemented exception | ||
} | ||
|
||
/// <summary> | ||
/// Values representing this row. | ||
/// </summary> | ||
public object[] Values { get; } | ||
|
||
/// <summary> | ||
/// Returns the number of columns in this row. | ||
/// </summary> | ||
/// <returns>Number of columns in this row</returns> | ||
public int Size() => Values.Length; | ||
|
||
/// <summary> | ||
/// Returns the column value at the given index. | ||
/// </summary> | ||
/// <param name="index">Index to look up</param> | ||
/// <returns>A column value</returns> | ||
public object this[int index] => Get(index); | ||
|
||
/// <summary> | ||
/// Returns the column value at the given index. | ||
/// </summary> | ||
/// <param name="index">Index to look up</param> | ||
/// <returns>A column value</returns> | ||
public object Get(int index) | ||
{ | ||
if (index >= Size()) | ||
{ | ||
throw new IndexOutOfRangeException($"index ({index}) >= column counts ({Size()})"); | ||
} | ||
else if (index < 0) | ||
{ | ||
throw new IndexOutOfRangeException($"index ({index}) < 0"); | ||
} | ||
|
||
return Values[index]; | ||
} | ||
|
||
///// <summary> | ||
///// Returns the column value whose column name is given. | ||
///// </summary> | ||
///// <param name="columnName">Column name to look up</param> | ||
///// <returns>A column value</returns> | ||
//public object Get(string columnName) => | ||
// Get(Schema.Fields.FindIndex(f => f.Name == columnName)); | ||
|
||
/// <summary> | ||
/// Returns the string version of this row. | ||
/// </summary> | ||
/// <returns>String version of this row</returns> | ||
public override string ToString() | ||
{ | ||
var cols = new List<string>(); | ||
foreach (object item in Values) | ||
{ | ||
cols.Add(item?.ToString() ?? string.Empty); | ||
} | ||
|
||
return $"[{(string.Join(",", cols.ToArray()))}]"; | ||
} | ||
|
||
/// <summary> | ||
/// Returns the column value at the given index, as a type T. | ||
/// TODO: If the original type is "long" and its value fits into | ||
/// an "int", Pickler will serialize the value as int. | ||
/// Since the value is boxed, <see cref="GetAs{T}(int)"/> will throw an exception. | ||
/// </summary> | ||
/// <typeparam name="T">Type to convert to</typeparam> | ||
/// <param name="index">Index to look up</param> | ||
/// <returns>A column value as a type T</returns> | ||
public T GetAs<T>(int index) => (T)Get(index); | ||
|
||
///// <summary> | ||
///// Returns the column value whose column name is given, as a type T. | ||
///// TODO: If the original type is "long" and its value can be | ||
///// fit into the "int", Pickler will serialize the value as int. | ||
///// Since the value is boxed, <see cref="GetAs{T}(string)"/> will throw an exception. | ||
///// </summary> | ||
///// <typeparam name="T">Type to convert to</typeparam> | ||
///// <param name="columnName">Column name to look up</param> | ||
///// <returns>A column value as a type T</returns> | ||
//public T GetAs<T>(string columnName) => (T)Get(columnName); | ||
|
||
/// <summary> | ||
/// Checks if the given object is same as the current object. | ||
/// </summary> | ||
/// <param name="obj">Other object to compare against</param> | ||
/// <returns>True if the other object is equal.</returns> | ||
public override bool Equals(object obj) | ||
{ | ||
if (obj is null) | ||
{ | ||
return false; | ||
} | ||
|
||
if (ReferenceEquals(this, obj)) | ||
{ | ||
return true; | ||
} | ||
|
||
if (obj is GenericRow otherRow) | ||
{ | ||
return Values.SequenceEqual(otherRow.Values); | ||
} | ||
|
||
return false; | ||
} | ||
|
||
/// <summary> | ||
/// Returns the hash code of the current object. | ||
/// </summary> | ||
/// <returns>The hash code of the current object</returns> | ||
public override int GetHashCode() => base.GetHashCode(); | ||
|
||
//TODO: | ||
///// <summary> | ||
///// Converts the values to .NET values. Currently, only the simple types such as | ||
///// int, string, etc. are supported (which are already converted correctly by | ||
///// the Pickler). Note that explicit type checks against the schema are not performed. | ||
///// </summary> | ||
//private void Convert() | ||
//{ | ||
// foreach (object val in Values) | ||
// { | ||
// TypeCode valType = Type.GetTypeCode(val.GetType()); | ||
// if (valType == TypeCode.Object) | ||
// { | ||
// switch (valType) | ||
// { | ||
// case object[]: | ||
// SerDe.Write(destination, (int)arg); | ||
// break; | ||
|
||
// case TypeCode.Int64: | ||
// SerDe.Write(destination, (long)arg); | ||
// break; | ||
// } | ||
// } | ||
// } | ||
//} | ||
} | ||
} |
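In the file above, `Equals` compares `Values` element by element while `GetHashCode` defers to `base.GetHashCode()`, so two rows that compare equal can still produce different hash codes and behave unexpectedly in a `HashSet` or as dictionary keys. One possible way to keep the two consistent is sketched below. `ValueRow` is a hypothetical stand-in class written for this illustration, and the 17/31 hash combination is just one common choice, not what the PR does.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for a row type with value-based equality.
public sealed class ValueRow
{
    public ValueRow(object[] values) => Values = values;

    public object[] Values { get; }

    // Element-wise comparison, as in the Equals override shown in the diff.
    public override bool Equals(object obj) =>
        ReferenceEquals(this, obj) ||
        (obj is ValueRow other && Values.SequenceEqual(other.Values));

    // Combine the element hash codes so that equal rows hash equally.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (object v in Values)
            {
                hash = hash * 31 + (v?.GetHashCode() ?? 0);
            }
            return hash;
        }
    }
}

public static class ValueRowDemo
{
    public static void Main()
    {
        var a = new ValueRow(new object[] { 1, "abc" });
        var b = new ValueRow(new object[] { 1, "abc" });

        Console.WriteLine(a.Equals(b));                        // prints True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // prints True
    }
}
```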