HDFS-17316 Add hadoop-compat-bench #6535

Closed · wants to merge 4 commits
hadoop-tools/hadoop-compat-bench/HdfsCompatBenchIssue.md (new file, 58 lines)
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# Compatibility Benchmark over HCFS Implementations

## <a name="Background"></a> Background

The Hadoop-Compatible File System (HCFS) is a core concept in the big data storage ecosystem.
It provides unified interfaces with generally clear semantics,
and has become the de facto standard for industry storage systems to follow and conform to.
A series of HCFS implementations already exist in Hadoop,
such as S3AFileSystem for Amazon's S3 object store,
WASB for Microsoft's Azure Blob Storage, and the OSS connector for Alibaba Cloud Object Storage,
along with more maintained by storage service providers themselves.

## <a name="Problems"></a> Problems

However, as indicated by the [`HCFS Introduction`](hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md),
there is no formal suite for assessing the compatibility of all such HCFS implementations.
Whether the functionality is well implemented and meets the core compatibility expectations
therefore relies mainly on the service provider's own report.
Meanwhile, Hadoop keeps evolving, and new features are continuously contributed to the HCFS interfaces
for existing implementations to follow and adopt. Hadoop thus also needs a tool
to quickly assess whether a specific HCFS implementation supports these features.
Besides, the well-known `hadoop` command line tool (hdfs shell) is used to interact directly with an HCFS storage system;
most commands correspond to specific HCFS interfaces and work well.
Still, some cases are complicated and may not work, such as the `expunge` command.
To check such commands against an HCFS, we also need an approach to identify them.

## <a name="Proposal"></a> Proposal

Accordingly, we propose to define a formal HCFS compatibility benchmark and to provide a corresponding tool
that performs the compatibility assessment of an HCFS storage system.
The benchmark and tool should cover both HCFS interfaces and hdfs shell commands.
Since different scenarios require different kinds of compatibility,
the benchmark can define multiple suites.

## <a name="Benefits"></a> Benefits

We intend the benchmark and tool to be useful for both storage providers and storage users.
End users can use them to evaluate the compatibility level of a storage system and
determine whether it is suitable for the required scenarios.
Storage providers can use them to quickly generate an objective and reliable report
about the core functions of their storage service.
For instance, if an HCFS scores 100% on a suite named 'tpcds',
that demonstrates that all functions needed by a TPC-DS program are well supported.
The benchmark also serves as a guide for mapping storage service capabilities to HCFS interfaces, such as storage classes on S3.
hadoop-tools/hadoop-compat-bench/pom.xml (new file, 118 lines)
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-project</artifactId>
<version>3.5.0-SNAPSHOT</version>
<relativePath>../../hadoop-project</relativePath>
</parent>
<artifactId>hadoop-compat-bench</artifactId>
<version>3.5.0-SNAPSHOT</version>
<packaging>jar</packaging>

<description>Apache Hadoop Compatibility</description>
<name>Apache Hadoop Compatibility Benchmark</name>

<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<!-- Should we keep this? -->
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>compile</scope>
</dependency>

<!-- For test -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-client</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.apache.hadoop.fs.compat.HdfsCompatibility</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
<configuration>
<archive>
<manifest>
<mainClass>org.apache.hadoop.fs.compat.hdfs.HdfsCompatMiniCluster</mainClass>
</manifest>
</archive>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<forkedProcessTimeoutInSeconds>3600</forkedProcessTimeoutInSeconds>
</configuration>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
<resource>
<directory>shell</directory>
</resource>
</resources>
</build>
</project>
hadoop-tools/hadoop-compat-bench/shell/cases/attr.t (new file, 58 lines)
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/"

echo "1..10"

# 1. chown
hadoop fs -chown "hadoop-compat-bench-user" "${baseDir}/dat"
expect_out "chown" "user:hadoop-compat-bench-user" hadoop fs -stat "user:%u" "${baseDir}/dat"

# 2. chgrp
hadoop fs -chgrp "hadoop-compat-bench-group" "${baseDir}/dat"
expect_out "chgrp" "group:hadoop-compat-bench-group" hadoop fs -stat "group:%g" "${baseDir}/dat"

# 3. chmod
hadoop fs -chmod 777 "${baseDir}/dat"
expect_out "chmod" "perm:777" hadoop fs -stat "perm:%a" "${baseDir}/dat"

# 4. touch
hadoop fs -touch -m -t "20000615:000000" "${baseDir}/dat"
expect_out "touch" "date:2000-06-.*" hadoop fs -stat "date:%y" "${baseDir}/dat"

# 5. setfattr
expect_ret "setfattr" 0 hadoop fs -setfattr -n "user.key" -v "value" "${baseDir}/dat"

# 6. getfattr
expect_out "getfattr" ".*value.*" hadoop fs -getfattr -n "user.key" "${baseDir}/dat"

# 7. setfacl
expect_ret "setfacl" 0 hadoop fs -setfacl -m "user:foo:---" "${baseDir}/dat"

# 8. getfacl
expect_out "getfacl" ".*foo.*" hadoop fs -getfacl "${baseDir}/dat"

# 9. setrep
hadoop fs -setrep 1 "${baseDir}/dat"
expect_out "setrep" "replication:1" hadoop fs -stat "replication:%r" "${baseDir}/dat"

# 10. checksum
expect_ret "checksum" 0 hadoop fs -checksum "${baseDir}/dat" # TODO
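
The `expect_ret` and `expect_out` helpers come from `misc.sh`, which is not part of this chunk. A minimal sketch of TAP-style helpers matching the call signatures used above might look like the following (a hypothetical reconstruction, not the PR's actual definitions):

```shell
#!/bin/sh
# Hypothetical sketch of the TAP-style assertion helpers used by the
# cases above; the real definitions live in shell/misc.sh of this PR.

caseId=0

# expect_ret NAME EXPECTED_EXIT CMD...
# Runs CMD and asserts on its exit code, printing "ok"/"not ok" TAP lines.
expect_ret() {
  name="$1"; expected="$2"; shift 2
  "$@" >/dev/null 2>&1
  actual=$?
  caseId=$((caseId + 1))
  if [ "$actual" -eq "$expected" ]; then
    echo "ok $caseId - $name"
  else
    echo "not ok $caseId - $name (exit $actual, expected $expected)"
  fi
}

# expect_out NAME PATTERN CMD...
# Runs CMD and asserts that some line of its stdout matches PATTERN.
expect_out() {
  name="$1"; pattern="$2"; shift 2
  out=$("$@" 2>/dev/null)
  caseId=$((caseId + 1))
  if echo "$out" | grep -q "$pattern"; then
    echo "ok $caseId - $name"
  else
    echo "not ok $caseId - $name (output did not match '$pattern')"
  fi
}
```

The `echo "1..10"` line at the top of the script is the TAP plan; each helper call then emits one numbered `ok`/`not ok` result line.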
hadoop-tools/hadoop-compat-bench/shell/cases/concat.t (new file, 36 lines)
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/src1"
hadoop fs -put "${localDir}/dat" "${baseDir}/src2"

echo "1..3"

# 1. touchz
hadoop fs -touchz "${baseDir}/dat"
expect_out "touchz" "size:0" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 2. concat
expect_ret "concat" 0 hadoop fs -concat "${baseDir}/dat" "${baseDir}/src1" "${baseDir}/src2"
# expect_out "size:26" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 3. getmerge
hadoop fs -getmerge "${baseDir}" "${localDir}/merged"
expect_ret "getmerge" 0 test -s "${localDir}/merged"
hadoop-tools/hadoop-compat-bench/shell/cases/copy.t (new file, 33 lines)
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"

echo "1..3"

# 1. copyFromLocal
expect_ret "copyFromLocal" 0 hadoop fs -copyFromLocal "${localDir}/dat" "${baseDir}/"

# 2. cp
hadoop fs -cp "${baseDir}/dat" "${baseDir}/dat2"
expect_ret "cp" 0 hadoop fs -test -f "${baseDir}/dat2"

# 3. copyToLocal
hadoop fs -copyToLocal "${baseDir}/dat2" "${localDir}/"
expect_ret "copyToLocal" 0 test -f "${localDir}/dat2"
hadoop-tools/hadoop-compat-bench/shell/cases/fileinfo.t (new file, 51 lines)
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/"
hadoop fs -mkdir -p "${baseDir}/dir/sub"

echo "1..9"

# 1. ls
expect_lines "ls" 2 ".*dat.*" ".*dir.*" hadoop fs -ls "${baseDir}"

# 2. lsr
expect_lines "lsr" 3 ".*dat.*" ".*dir.*" ".*sub.*" hadoop fs -lsr "${baseDir}"

# 3. count
expect_out "count" ".*13.*" hadoop fs -count "${baseDir}"

# 4. du
expect_out "du" ".*13.*" hadoop fs -du "${baseDir}"

# 5. dus
expect_out "dus" ".*13.*" hadoop fs -dus "${baseDir}"

# 6. df
expect_ret "df" 0 hadoop fs -df "${baseDir}"

# 7. stat
expect_out "stat" "size:13" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 8. test
expect_ret "test" 0 hadoop fs -test -f "${baseDir}/dat"

# 9. find
expect_out "find" ".*dat.*" hadoop fs -find "${baseDir}" -name "dat" -print
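
`expect_lines`, also from `misc.sh`, takes a line count followed by that many patterns and then the command. A hypothetical sketch consistent with the calls above (the real definition and exact semantics are in `shell/misc.sh` of this PR):

```shell
#!/bin/sh
# Hypothetical sketch of expect_lines: NAME N PAT1 ... PATN CMD...
# Asserts that the command's output contains a line matching each of the
# N patterns. Real definition is in shell/misc.sh of this PR.

caseId=0

expect_lines() {
  name="$1"; n="$2"; shift 2
  pats=""
  i=0
  while [ "$i" -lt "$n" ]; do
    pats="$pats $1"   # patterns used by the cases contain no whitespace
    shift
    i=$((i + 1))
  done
  out=$("$@" 2>/dev/null)
  caseId=$((caseId + 1))
  failed=0
  for p in $pats; do
    echo "$out" | grep -q "$p" || failed=1
  done
  if [ "$failed" -eq 0 ]; then
    echo "ok $caseId - $name"
  else
    echo "not ok $caseId - $name"
  fi
}
```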
hadoop-tools/hadoop-compat-bench/shell/cases/modification.t (new file, 34 lines)
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"

echo "1..4"

# 1. mkdir
expect_ret "mkdir" 0 hadoop fs -mkdir -p "${baseDir}/dir"

# 2. put
expect_ret "put" 0 hadoop fs -put "${localDir}/dat" "${baseDir}/"

# 3. appendToFile
expect_ret "appendToFile" 0 hadoop fs -appendToFile "${localDir}/dat" "${baseDir}/dat"

# 4. truncate
expect_ret "truncate" 0 hadoop fs -truncate 13 "${baseDir}/dat"