Limiting resources used by IBM Cloud private and Orient Me

IBM Conductor for Containers has been rebranded IBM Cloud private with version 1.2.0 (https://www.ibm.com/developerworks/community/blogs/fe25b4ef-ea6a-4d86-a629-6f87ccf4649e/entry/IBM_Cloud_private_formerly_IBM_Spectrum_Conductor_for_Containers_version_1_2_0_is_now_available?lang=en)

IBM released version 6.0.0.1 of Orient Me and with it added new applications increasing the total amount of pods in play. Each pod requires some resources to run. Recently there has been some frustration for those who work with Connections trying to get Orient Me up and running on smaller servers for testing purposes or for deployment to SMB customers.

I spent some time looking at how to limit the resources consumed by decreasing the number of pods.

Kubernetes allows you to scale up or down your pods. This can be done on the command line or via the UI

Since I prefer the command line here is how you scale an application and it’s effect on the number of pods. There are two ways in which this is done, by Replica Sets and Stateful Sets. I won’t go into the difference of both because I’m not even wholly sure myself but suffice to say that most of OM applications use Replica Sets.

Replica Sets

I’m using analysisservice as an example because it is at the top when commands are run.

# kubectl get pods
NAME                                                   READY STATUS RESTARTS AGE
analysisservice-1093785398-31ks2 1/1        Running        0             8m
analysisservice-1093785398-hf90j 1/1        Running        0             8m

# kubectl get rs
NAME                                        DESIRED CURRENT READY AGE
analysisservice-1093785398 2                 2                   2            9m

The following command tells K8s to change the number of pods to be 1 that will accept load.

# kubectl scale –replicas=1 rs/analysisservice-1093785398
replicaset “analysisservice-1093785398” scaled

Below shows that just the one pod is ready to accept load. Note that the desired number is two. This means that this will be the default value if all the pods are deleted or the OS restarted.

# kubectl get rs
NAME                                        DESIRED CURRENT READY AGE
analysisservice-1093785398 2                 2                   1            9m

The pod that is going to not accept load is destroyed and a new one replaces it.

# kubectl get pods
NAME                                                   READY STATUS                  RESTARTS AGE
analysisservice-1093785398-31ks2 1/1        Running                  0                   18m
analysisservice-1093785398-4njpn 1/1       Terminating           0                    5m
analysisservice-1093785398-fmnrd 0/1     ContainerCreating 0                   3s

You can see that the new pod is not “ready” and thus not accepting any load.

# kubectl get pods
NAME                                                   READY STATUS RESTARTS AGE
analysisservice-1093785398-31ks2 1/1        Running 0                   19m
analysisservice-1093785398-fmnrd 0/1      Running 0                   43s

The reverse is true and you can scale the number of pods upwards. ICp can do this with policies based on CPU usage creating more pods and then decreasing them when the load drops.

The above approach does not persist over OS restarts or deletion of all the pods. To persist these changes the following steps need to be followed.

# kubectl get deployment
NAME               DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
analysisservice 2                2                   2                        2                      34m

This command amends the deployment configuration which was set in complete.6_0.yaml in the OM binaries.

# kubectl edit deployment analysisservice
apiVersion: extensions/v1beta1
kind: Deployment

This will open in vi though you can change your editor if you prefer. Under the spec section you want to amend the number of replicas

spec:
replicas: 1
selector:
matchLabels:
mService: analysisservice
name: analysisservice
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1

Ignore the status section. Save and close (:wq)

# kubectl get deployment
NAME                    DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
analysisservice      1                   1                   1                      1                       44m

This time the second pod is not listed with a 0/1 ready value. The second pod has been deleted.

# kubectl get pods
NAME                                                     READY STATUS RESTARTS AGE
analysisservice-1093785398-kz76m 1/1        Running  0                   17m

You can use the following command to open all application deployments and update using vi all the applications at one time.

# kubectl edit deployment

When you save and close the applications will be updated in line the values you set for the replicas.

# kubectl get deployment
NAME                                    DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
analysisservice                     1                  1                   1                         1                      55m
haproxy                                 1                  1                   1                         1                      57m
indexingservice                   1                  1                   1                         1                      55m
itm-services                         1                  1                   1                         1                      55m
mail-service                        1                  1                   1                          1                     55m
orient-webclient                1                  1                   1                         1                      55m
people-migrate                  1                  1                   1                         1                      55m
people-relation                  1                  1                   1                         1                      55m
people-scoring                   1                  1                   1                         1                      55m
redis-sentinel                     1                  1                   1                         1                      57m
retrievalservice                  1                  1                   1                         1                      55m
solr1                                     1                  1                   1                         1                      57m
solr2                                    1                  1                   1                         1                      57m
solr3                                    1                  1                   1                         1                      57m
zookeeper-controller-1   1                  1                   1                         1                      57m
zookeeper-controller-2  1                  1                   1                         1                      57m
zookeeper-controller-3  1                  1                   1                         1                      57m

To delete the additional solr and zookeper-controller pods you needs to run the following.

# kubectl delete deployment zookeeper-controller-2 zookeeper-controller-3
# kubectl delete deployment solr2 solr3

Running the following shows the number of pods have decreased by quite a lot.

# kubectl get pods

Checking the ReplicaSets again shows the values have decreased.

# kubectl get rs

Mongo and redis-server do not use Replica Sets, they use StatefulSets.

StatefulSets

The following command shows that there are 3 pods for each application.

# kubectl get statefulsets
NAME          DESIRED CURRENT AGE
mongo          3                 3                  1h
redis-server 3                 3                  1h

In the same vain as before you edit the replicas decreasing/increasing them as you see fit.

# kubectl edit statefulsets
statefulset “mongo” edited
statefulset “redis-server” edited

The end result is that only the one ReplicaSet is configured.

# kubectl get statefulsets
NAME          DESIRED CURRENT AGE
mongo          1                 1                  1h
redis-server 1                 1                  1h

The effect is seen when you list the pods.

# kubectl get pods
NAME              READY STATUS RESTARTS AGE
mongo-0          2/2        Running 0                   1h
redis-server-0 1/1         Running 0                   1h

At install time

These changes can be made at install time by updating the various .yml files in /microservices/hybridcloud/templates/* and /microservices/hybridcloud/templates/complete.6_0.yaml and then running install.sh.

Finally

I have only experimented on the default applications and have not touched those from the kube-system namespace which are the ICp applications and not OM specific.

I haven’t tried this on a working system yet, purely a detached single node running all roles with hostpath configuration.

Since there is no load on the server my measurements with regards to resources consumed pre and post changes is far from scientific but looking at the UI the amount of CPU and memory is certainly less then previously used.

I have no idea as yet whether this will break OM but I will persist and see whether it does or whether it works swimmingly. If anyone tries this out then please feedback to me.

BTW – I restarted the OS and had a couple of problems with analysisservice and indexingservice pods not being ready and shown as unhealthy but after deleting haproxy, redis-server-0 and redis-sentinel all my pods are showing as healthy.

IBM, please please provide a relatively simple way (ideally at install time) for us to cut the deployment down to bare bones maybe a small, medium or large deployment as you do with traditional Connections?

Update 05/07/2017

Once I integrated the server with a working Connections 6.0 server with latest fixes applied the ITM bar did not work. Nico Meisenzahl has also been looking into this and we hope to have a working set up soon

Update 07/07/2017

Nico created a great blog updating the yml files to decrease the amount of pods/containers during installation of Orient Me.

Advertisements

Orient Me and mongoDB connection failures

I have been banging against a mongoDB wall for a good few days as explained in another post but I’m slowly getting there. The problem I was facing was that the migration application in the people-migrate container wasn’t working.

# npm run start migrate
npm info it worked if it ends with ok
npm info using npm@3.10.8
npm info using node@v6.9.1
npm info lifecycle people-datamigration-service@0.0.1~prestart: people-datamigration-service@0.0.1
npm info lifecycle people-datamigration-service@0.0.1~start: people-datamigration-service@0.0.1

> people-datamigration-service@0.0.1 start /usr/src/app
> cross-env NODE_ENV=production node lib/server.js “migrate”

2017-04-20T13:19:56.761Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/relationshipdb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
2017-04-20T13:19:56.766Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/datamigrationdb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
2017-04-20T13:19:56.767Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/profiledb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
It will be retried for the next request.

/usr/src/app/node_modules/mongodb/lib/mongo_client.js:338
          throw err
          ^
MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
    at Pool.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/topologies/server.js:327:35)
    at emitOne (events.js:96:13)
    at Pool.emit (events.js:188:7)
    at Connection.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/pool.js:274:12)
    at Connection.g (events.js:291:16)
    at emitTwo (events.js:106:13)
    at Connection.emit (events.js:191:7)
    at Socket.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/connection.js:177:49)
    at Socket.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at connectErrorNT (net.js:1020:8)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

If I specify the location of migrationConfig I get the same result.

# npm run start migrate config:/usr/src/app/migrationConfig

Oddly enough, if I run the above command outside of /usr/src/app/ directory it fails. It doesn’t actually read the file you specify, it always looks for migrationConfig in relation to the working directory where you are when you issue it. Of course I may have the syntax wrong but if I do not then it’s a bit sloppy.

On to the problem which seems to be name resolution. The error I was getting was

Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]

It seems to be trying to connect to mongo-0 over 27017.

# kubectl exec -it $(kubectl get pods | grep people-migrate | awk ‘{print $1}’) bash

# ping mongo-0
ping: mongo-0: Name or service not known

# ping mongo
PING mongo.default.svc.cluster.local (10.1.67.163) 56(84) bytes of data.
64 bytes from 10.1.67.163 (10.1.67.163): icmp_seq=1 ttl=63 time=0.063 ms

# ping mongo-0.mongo
PING mongo-0.mongo.default.svc.cluster.local (10.1.67.163) 56(84) bytes of data.
64 bytes from 10.1.67.163 (10.1.67.163): icmp_seq=1 ttl=63 time=0.087 ms

This was the cause, “mongo-0” was not resolving for me and this is confirmed by another that there container works the same. To work around this I added an entry to the container’s host file.

# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.1.67.176     people-migrate-4029352936-n8fzl
10.1.67.163     mongo-0 mongo-0.mongo

Now the migration app works but I also have mongo-sidecar errors which I’m not clear on as to whether they are supposed to be there.

Update – 27/04/17

This only gets me so far. This allows me to get the data migrated from Connections Profiles in to MongoDB but when the container is torn down and replaced with another the host file entry is gone. Also, there are the following errors in the logs for itm-services containers that I cannot exec to to update the hosts file.

Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
It will be retried for the next request.

/usr/src/app/node_modules/mongodb/lib/mongo_client.js:338
          throw err
          ^
MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
    at Pool.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/topologies/server.js:327:35)
    at emitOne (events.js:96:13)
    at Pool.emit (events.js:188:7)
    at Connection.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/pool.js:274:12)
    at Connection.g (events.js:291:16)
    at emitTwo (events.js:106:13)
    at Connection.emit (events.js:191:7)
    at Socket.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/connection.js:177:49)
    at Socket.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at connectErrorNT (net.js:1020:8)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Update 28/04/17

During the (excellent) Connections Pink Developer Workshop hosted by IBM we were given access to a SoftLayer server running CentOS 7.3 where we installed CfC and Orient Me. The installer worked just fine with no signs of the mongoDB errors above. I have come across two others who have the same errors I have documented above.

I sparked up a CentOS 7.3 server on Bluemix for a few hours and the install with the same binaries worked just fine. I compared what yum has installed and installed all on my local CentOS 7.3 server and the same problem occurred. I changed my NIC device name swapping it from ens192 to match Bluemix and eth0 but the result is the same.

Update 05/05/17

This week I was lucky to visit the Dublin labs with a customer discussing Watson Workspace, Watson Work Services, XPages and Pink. I used a couple of hours of those two days to have a chat with David McDonagh and a colleague of his Bruno to look into the problems I was having with Mongo.

The crux of it was that the node I was using as the master, boot, worker and proxy was under a great deal of strain, mainly CPU strain, which seemed to be causing the problem. This would make sense since the differences between my ESXi server and Bluemix are the resources available to it.

I bumped up the resources available to the single node but although the install went OK the problems persisted. It wasn’t until today that I got it working but not with a single node but rather two nodes. Node 1 ran boot, master and proxy roles whilst node 2 was the worker node. I gave a generous helping of resources to both and the thankfully the installation went smoothly and more importantly the errors above are no more.

I have some further work to see how much I can scale the resources back because it does have an impact on my ESXi host and the other guests on it.

Connections Pink and container orchestration using CfC

A while ago I started dabbling with Docker after reading some great blogs about ELK by Klaus Bild and Christoph Stoettner thinking I could do with a tool like ELK to analyse log files and to give me something tangible to work with whilst learning about Docker.

After a lot of hard learning and some frustrating hours I got my head around containers and how they could be used to my advantage and got ELK running natively on Ubuntu and then on my work Windows 7 laptop.

A few months before Connect 2017 news was leaking about Connections Pink and its architecture and how the applications will run within containers. Recently Jason Gary Roy held a webinar (Open Mic Webcast: Think Pink – The Future of IBM Connections – 07 March 2017) replaying some of his slides from Connect 2017 and in the video he mentions (briefly) CfC in combination with Docker and containers.

I asked the question in the IBM Connections Community Skype chat and a few people told me that CfC was an IBM product called IBM Spectrum Conductor for Containers. I looked through the community for CfC and realised how important having an orchestration tool is for running multiple containers and scaling for high availability. This was a long way away from running three containers on my laptop.

Installing CfC was pretty easy and well documented in the CfC community. Installation wise you need to install on Ubuntu 16.04 or RHEL although I am sure CentOS will work. I’ll get to that next week.

What you end up with is a rather nice UI which does many of the hard things for you such as networking, setting up persistent storage for your containers, moving applications to other nodes, automatic scaling when demand requires and many more.

What I also liked is that it acts as a private repository for your containers avoiding you needing to push to Docker Hub for storage.

In the latest version you can install on a single node which is great for testing purposes but it also allows you to add and remove worker nodes when you want to branch out.

I asked in the CfC Slack channel what the future looks like for CfC because if it requires a license then it is another hurdle to overcome when selling in Connections. The response I got was:

“We are intending to keep providing a free version that customer can use and deploy as it is a packaging of open-source. Business discussion on what to do beyond that are still ongoing so I can’t comment. Options include providing commercial support or additional add-ons around the open-source for a commercial product. Right now this is a community effort, and we are currently looking  technical feedback  and understanding of what use cases people would like to use CfC for.  Looking forward to  your participation.”

Since the product is built on the following open technologies I would hope that a free option remains available going forward.

Another other benefit for using CfC is that IBM are using it for Pink. I assume that most of the documentation referring to orchestration of the containers will reference CfC in some form. Getting to know it now, I hope, will make deploying Pink containers easier.

Thanks to Michele Buccarello for answering my questions.

CwC has been built with below individual components

Core component:

  • Kubernetes and Mesosphere API/CLI
  • GUI
  • Installer for HA
  • Authentication through LDAP
  • An App store
  • A Private image registry

Sample applications:

  • Frontend
  • Liberty
  • Nginx
  • Redis
  • Tomcat

Built in Network

  • Flannel
  • Calico

Built in persistent Storage

  • NFS
  • Hostpath
  • GlusterFs

Supported CPU Architecture

  • PowerPC LE
  • x86