Failure with large object size with multi clients and high number of threads

Failure with large object size with multi clients and high number of threads

karan singh
Hello guys,

I'm having problems with COSBench when PUTting large objects (128M) from multiple clients in parallel (8) with a large number of threads per client (128). My backend storage is Ceph Jewel, accessed over S3.

COSBench debug logging is enabled, but I'm not seeing any useful messages, and the job still fails.

Are there any special considerations (other than those mentioned in the user guide) for running COSBench with large numbers of threads and clients and large object sizes?

If I run the same job with 2 RGW, 8 clients, 64M objects and 128 workers, it works. However, if I increase the object size to 128M, it fails.
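For scale, the two configurations above can be compared with a little arithmetic. This is only a rough upper bound, assuming every worker has exactly one object in flight at the same moment and that 1 MB means 2^20 bytes (the ratio between the two cases is the same either way). It says nothing about how COSBench buffers data internally; the helper name `in_flight` is made up for this sketch.

```python
# Rough upper bound on concurrent PUTs and bytes in flight for a write stage.
# Assumption: one object in flight per worker, 1 MB = 2**20 bytes.
MIB = 1024 * 1024

def in_flight(clients, workers_per_client, object_mib):
    """Return (concurrent PUTs, total bytes in flight) across all clients."""
    concurrent_puts = clients * workers_per_client
    return concurrent_puts, concurrent_puts * object_mib * MIB

working = in_flight(clients=8, workers_per_client=128, object_mib=64)
failing = in_flight(clients=8, workers_per_client=128, object_mib=128)

for label, (puts, nbytes) in (("64M", working), ("128M", failing)):
    print(f"{label}: {puts} concurrent PUTs, {nbytes / 1024**3:.0f} GiB in flight")
```

Both cases drive 1024 concurrent PUTs against the two RGW endpoints; only the amount of data each PUT has to push doubles, so any individual request stays open roughly twice as long.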

Below are the logs and workload file.


2016-08-06 00:37:54,009 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/0.4.2.c3/log/mission/MA5CA41A0E4.log
2016-08-06 00:37:54,010 [DEBUG] [Log4jLogManager] - log level has been set to DEBUG
2016-08-06 00:37:54,012 [DEBUG] [S3Storage] - initialize S3 client with storage config: {path_style_access=false, endpoint=http://10.5.13.140:80, max_connections=200, proxyport=, accesskey=S3user1, secretkey=S3user1key, logging=true, timeout=999999, proxyhost=}
2016-08-06 00:37:54,014 [DEBUG] [S3Storage] - S3 client has been initialized
2016-08-06 00:37:54,014 [DEBUG] [S3Storage] - initialize S3 client with storage config: {path_style_access=false, endpoint=http://10.5.13.140:80, max_connections=200, proxyport=, accesskey=S3user1, secretkey=S3user1key, logging=true, timeout=999999, proxyhost=}
2016-08-06 00:37:54,015 [DEBUG] [S3Storage] - S3 client has been initialized
2016-08-06 00:37:54,015 [DEBUG] [S3Storage] - initialize S3 client with storage config: {path_style_access=false, endpoint=http://10.5.13.140:80, max_connections=200, proxyport=, accesskey=S3user1, secretkey=S3user1key, logging=true, timeout=999999, proxyhost=}
2016-08-06 00:37:54,016 [DEBUG] [S3Storage] - S3 client has been initialized
2016-08-06 00:37:54,016 [DEBUG] [S3Storage] - initialize S3 client with storage config: {path_style_access=false, endpoint=http://10.5.13.140:80, max_connections=200, proxyport=, accesskey=S3user1, secretkey=S3user1key, logging=true, timeout=999999, proxyhost=}

Full Logs here : https://paste.fedoraproject.org/403997/7799147/raw/
-----------------

My workload file looks like
------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<workload name="2-rgw-10Gbe-8-client-s3-1024-workers" description="2-rgw-10Gbe-8-client-s3-1024-workers" config="">

    <workflow config="">
       
        <workstage name="init" closuredelay="0" config="">
            <work name="Initialize all containers" type="init"
                workers="1" interval="5" division="container"
                runtime="0" rampup="30" rampdown="0" afr="0" totalOps="8"
                totalBytes="0" config="containers=r(1101,1180)">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.140:80"/>
                <operation type="init" ratio="100" division="container"
                    config="objects=r(0,0);sizes=c(0)B;containers=r(1101,1180)" id="op4"/>
            </work>

        </workstage>
               
        <workstage name="client-write-128M" closuredelay="0" config="">
            <work name="client1-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client1" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.140:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1101,1110);objects=r(1,500);sizes=c(128)MB" id="op1"/>
            </work>
            <work name="client2-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client2" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.141:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1111,1120);objects=r(1,500);sizes=c(128)MB" id="op2"/>
            </work>
            <work name="client3-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client3" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.140:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1121,1130);objects=r(1,500);sizes=c(128)MB" id="op1"/>
            </work>
            <work name="client4-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client4" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.141:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1131,1140);objects=r(1,500);sizes=c(128)MB" id="op2"/>
            </work>
            <work name="client5-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client5" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.140:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1141,1150);objects=r(1,500);sizes=c(128)MB" id="op1"/>
            </work>
            <work name="client6-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client6" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.141:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1151,1160);objects=r(1,500);sizes=c(128)MB" id="op2"/>
            </work>
            <work name="client7-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client7" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.140:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(1161,1170);objects=r(1,500);sizes=c(128)MB" id="op1"/>
            </work>
            <work name="client8-write-128M" type="normal" workers="128"
                interval="5" division="none" runtime="300" rampup="30"
                rampdown="0" afr="200000" totalOps="0" totalBytes="0"
                driver="client8" config="">
                <auth type="none"/>
                <storage type="s3" config="accesskey=S3user1;secretkey=S3user1key;timeout=999999;max_connections=200;endpoint=http://10.5.13.141:80"/>
                <operation type="write" ratio="100" division="none"
                    config="containers=r(11711,1180);objects=r(1,500);sizes=c(128)MB" id="op2"/>
            </work>                               
        </workstage>
    </workflow>
</workload>
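
A workload file like this one can be sanity-checked before submitting it. The sketch below is not part of COSBench; it is a hypothetical helper that parses a trimmed-down copy of the workload, reads the `containers=r(min,max)` selectors, and reports write operations whose range is inverted or falls outside the range created by the init stage. It assumes `r(min,max)` denotes an inclusive range, and the function names are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down copy of the posted workload (two of the eight write works).
SAMPLE = """<workload name="sample">
  <workflow>
    <workstage name="init">
      <work name="init" type="init" workers="1" config="containers=r(1101,1180)">
        <storage type="s3" config="endpoint=http://10.5.13.140:80"/>
      </work>
    </workstage>
    <workstage name="client-write-128M">
      <work name="client7-write-128M" type="normal" workers="128">
        <operation type="write" config="containers=r(1161,1170);objects=r(1,500);sizes=c(128)MB"/>
      </work>
      <work name="client8-write-128M" type="normal" workers="128">
        <operation type="write" config="containers=r(11711,1180);objects=r(1,500);sizes=c(128)MB"/>
      </work>
    </workstage>
  </workflow>
</workload>"""

RANGE = re.compile(r"containers=r\((\d+),(\d+)\)")

def container_range(config):
    """Extract (min, max) from a 'containers=r(min,max)' selector, or None."""
    match = RANGE.search(config or "")
    return (int(match.group(1)), int(match.group(2))) if match else None

def check_workload(xml_text):
    """Return a list of human-readable problems found in write operations."""
    root = ET.fromstring(xml_text)
    init_range = None
    for work in root.iter("work"):
        if work.get("type") == "init":
            init_range = container_range(work.get("config"))
    problems = []
    for work in root.iter("work"):
        for op in work.iter("operation"):
            rng = container_range(op.get("config"))
            if op.get("type") != "write" or rng is None:
                continue
            low, high = rng
            if low > high:
                problems.append(f"{work.get('name')}: inverted range r({low},{high})")
            elif init_range and not (init_range[0] <= low and high <= init_range[1]):
                problems.append(f"{work.get('name')}: r({low},{high}) outside init range")
    return problems

problems = check_workload(SAMPLE)
for problem in problems:
    print(problem)
```

Run against the sample, this flags the client8 selector `r(11711,1180)`, whose lower bound is larger than its upper bound and looks like a typo for 1171. Whether that contributes to the failure is not established in this thread.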

Re: Failure with large object size with multi clients and high number of threads

ywang19
Administrator
Could you also check log/system.log?


Re: Failure with large object size with multi clients and high number of threads

karan singh
Hello Ywang,

Thanks for your response, this works now. The problem was that one of the COSBench clients was misconfigured.

- Karan -

Re: Failure with large object size with multi clients and high number of threads

karan singh
Hi,

My bad, it looks like this problem still exists. We have investigated further and found the following:

#1 2 RGW, S3, 8 clients, 128M object size fails, BUT 2 RGW, S3, 8 clients with 1M, 32M or 64M object sizes works
#2 On all COSBench clients we are seeing the message "Encountered an exception and couldn't reset the stream to retry" (see logs below)
#3 The same workload, i.e. 2 RGW, 8 clients, 128M, works well with Swift

This leads us to think there is something wrong in COSBench or in the AWS Java SDK that it uses:
https://github.com/aws/aws-sdk-java/issues/191

The current upstream AWS Java SDK version is 1.11.26: https://github.com/aws/aws-sdk-java/releases

Do you think upgrading the AWS Java SDK in COSBench will fix this? I am not very familiar with the Java SDK, so it would be nice if the COSBench developers could help me with that.
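
The wording of the error suggests that a PUT hit a transient failure and the client then could not rewind the request body to send it again. The sketch below is only a Python analogy of that situation, not the AWS Java SDK or COSBench code: `OneShotStream`, `put_with_retry` and `flaky_sender` are hypothetical names, and the failure is simulated. It shows that a retry succeeds when the body can be rewound and fails when the body is a one-shot stream.

```python
import io

class OneShotStream(io.RawIOBase):
    """Read-only stream that cannot seek, like data generated on the fly."""
    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)

def put_with_retry(body, send, attempts=2):
    """Send body; on a connection error rewind and retry if the stream allows it."""
    start = body.tell() if body.seekable() else None
    for attempt in range(1, attempts + 1):
        try:
            return send(body.read())
        except ConnectionError:
            if attempt == attempts:
                raise
            if start is None:
                # Nothing left to resend: the body was consumed and cannot be rewound.
                raise RuntimeError("couldn't reset the stream to retry")
            body.seek(start)

def flaky_sender():
    """Return a send() that fails on its first call and succeeds afterwards."""
    calls = {"n": 0}
    def send(payload):
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("simulated transient failure")
        return len(payload)
    return send

# Seekable body: the retry rewinds and the upload goes through.
ok = put_with_retry(io.BytesIO(b"x" * 1024), flaky_sender())

# One-shot body: the same transient failure becomes a hard error.
try:
    put_with_retry(OneShotStream(b"x" * 1024), flaky_sender())
    outcome = "uploaded"
except RuntimeError as exc:
    outcome = str(exc)

print(ok, "|", outcome)
```

If this analogy holds, the message is a symptom of an earlier failed attempt (for example a timeout or dropped connection under load), so the first failure is worth looking for in the driver and RGW logs.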

Mission logs
--------------
2016-08-10 19:00:46,278 [INFO] [NoneStorage] - performing PUT at /mycontainers1167/myobjects239
2016-08-10 19:00:46,357 [WARN] [S3Storage] - below exception encountered when creating object myobjects73 at mycontainers1168: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:00:46,359 [INFO] [NoneStorage] - performing PUT at /mycontainers1168/myobjects74
2016-08-10 19:00:46,776 [INFO] [NoneStorage] - performing PUT at /mycontainers1169/myobjects27
2016-08-10 19:00:46,880 [INFO] [NoneStorage] - performing PUT at /mycontainers1170/myobjects427
2016-08-10 19:00:47,627 [INFO] [NoneStorage] - performing PUT at /mycontainers1161/myobjects59
2016-08-10 19:00:48,193 [INFO] [NoneStorage] - performing PUT at /mycontainers1162/myobjects199
2016-08-10 19:00:49,412 [INFO] [NoneStorage] - performing PUT at /mycontainers1164/myobjects191
2016-08-10 19:00:49,412 [INFO] [NoneStorage] - performing PUT at /mycontainers1163/myobjects139
2016-08-10 19:00:49,664 [INFO] [NoneStorage] - performing PUT at /mycontainers1165/myobjects259
2016-08-10 19:00:49,751 [INFO] [NoneStorage] - performing PUT at /mycontainers1166/myobjects183
2016-08-10 19:00:49,817 [INFO] [NoneStorage] - performing PUT at /mycontainers1167/myobjects275
2016-08-10 19:00:51,787 [INFO] [NoneStorage] - performing PUT at /mycontainers1168/myobjects111
2016-08-10 19:00:52,378 [INFO] [NoneStorage] - performing PUT at /mycontainers1169/myobjects211
2016-08-10 19:00:53,170 [WARN] [S3Storage] - below exception encountered when creating object myobjects402 at mycontainers1166: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:00:53,172 [INFO] [NoneStorage] - performing PUT at /mycontainers1170/myobjects403
2016-08-10 19:00:54,992 [INFO] [NoneStorage] - performing PUT at /mycontainers1161/myobjects151
2016-08-10 19:00:55,262 [INFO] [NoneStorage] - performing PUT at /mycontainers1162/myobjects227
2016-08-10 19:00:56,039 [INFO] [NoneStorage] - performing PUT at /mycontainers1163/myobjects491
2016-08-10 19:00:56,382 [INFO] [NoneStorage] - performing PUT at /mycontainers1164/myobjects473
2016-08-10 19:00:56,581 [INFO] [NoneStorage] - performing PUT at /mycontainers1165/myobjects147
2016-08-10 19:00:57,148 [INFO] [NoneStorage] - performing PUT at /mycontainers1166/myobjects175
2016-08-10 19:00:57,236 [INFO] [NoneStorage] - performing PUT at /mycontainers1167/myobjects307
2016-08-10 19:00:57,638 [INFO] [NoneStorage] - performing PUT at /mycontainers1168/myobjects123
2016-08-10 19:00:57,639 [INFO] [NoneStorage] - performing PUT at /mycontainers1169/myobjects167
2016-08-10 19:00:58,195 [INFO] [NoneStorage] - performing PUT at /mycontainers1170/myobjects15
2016-08-10 19:01:00,142 [WARN] [S3Storage] - below exception encountered when creating object myobjects66 at mycontainers1170: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:01:00,144 [INFO] [NoneStorage] - performing PUT at /mycontainers1161/myobjects67
2016-08-10 19:01:00,314 [INFO] [NoneStorage] - performing PUT at /mycontainers1162/myobjects331
2016-08-10 19:01:00,491 [WARN] [S3Storage] - below exception encountered when creating object myobjects37 at mycontainers1170: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:01:00,492 [INFO] [NoneStorage] - performing PUT at /mycontainers1163/myobjects38
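
To see how often the retry error occurs and which containers it hits, the mission log can be summarized with a short script. This is a sketch that assumes the log line layout shown above; the sample embeds a few lines copied from the excerpt, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# A few lines copied from the mission log excerpt above.
LOG = """\
2016-08-10 19:00:46,278 [INFO] [NoneStorage] - performing PUT at /mycontainers1167/myobjects239
2016-08-10 19:00:46,357 [WARN] [S3Storage] - below exception encountered when creating object myobjects73 at mycontainers1168: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:00:46,359 [INFO] [NoneStorage] - performing PUT at /mycontainers1168/myobjects74
2016-08-10 19:01:00,142 [WARN] [S3Storage] - below exception encountered when creating object myobjects66 at mycontainers1170: Encountered an exception and couldn't reset the stream to retry
2016-08-10 19:01:00,491 [WARN] [S3Storage] - below exception encountered when creating object myobjects37 at mycontainers1170: Encountered an exception and couldn't reset the stream to retry
"""

WARN = re.compile(r"\[WARN\] \[S3Storage\] - .*creating object (\S+) at (\S+): (.*)$")

def summarize(text):
    """Count PUT attempts and stream-reset failures per container."""
    puts = 0
    failures = Counter()
    for line in text.splitlines():
        if "performing PUT" in line:
            puts += 1
        match = WARN.search(line)
        if match and "couldn't reset the stream" in match.group(3):
            failures[match.group(2)] += 1
    return puts, failures

puts, failures = summarize(LOG)
print(f"PUT attempts logged: {puts}")
for container, count in sorted(failures.items()):
    print(f"{container}: {count} stream-reset failure(s)")
```

To run it on a real log, replace `LOG` with the contents of the mission log file under `log/mission/`.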