Sametime 8.5.2 upgrade failed – DB2 password

I was asked to assist a colleague who was upgrading a customer’s Sametime environment which had failed when upgrading the SSC to 8.5.2.

Initially when logging into the SSC the version was showing as 8.5.2 with WAS at which is good but I couldn’t access policies nor any of the servers which is of course bad. Also users were getting a blank screen when accessing the meeting center with an error referring to policies.

The database tables showed that the SSC was at the old version too.

I checked the various SystemOut.logs and saw the following written to the STConsoleServer log file.

[28/06/12 18:30:20:419 BST] 00000032 PolicyBeanDAO E   Exception Failed to get the Policy for: default, for product: im java.sql.SQLException: [ibm][db2][jcc][t4][2013][11249] Connection authorization failure occurred.  Reason: User ID or Password invalid.DSRA0010E: SQL State = null, Error Code = -4,214

So I checked the password for the DB2 administrative account by trying to connect to itself forcing the use of the password which I knew (since I built the servers).

db2 connect to stsc user db2admin
Enter current password for db2admin:
SQL30082N  Security processing failed with reason “24” (“USERNAME AND/ORPASSWORD INVALID”).  SQLSTATE=08001

So the password is wrong which I confirmed by attempting to change the password using passwd.

I thought I’d go a little further (since I had root!) and checked the date in which the password was changed and also who changed it by looking at the /var/log/secure files for the date shown below.

# passwd db2admin -S
db2admin PS 2012-06-20 0 99999 7 -1 (Password set, MD5 crypt.)

The first step was to change the password for db2admin back to what it was previously and recycle the SSC. This allowed me to access policies and users to continue using the meeting servers.

At first it was thought running STSC db2admin might fix it but this fails and looking at the wrapper it calls update.ddl which essentially runs three SQL queries. This fails because the tables and columns referenced have already been updated.

I decided to try and replicate this on a lab deployment and during the upgrade I came across a few unrelated problems which were resolved by running through Upgrading Sametime System Console to 8.5.2 IFR 1 fails during installation and trying again.

Once I got the lab server to the same state I then tried registering the SSC again because these tables are updated by the -upgrade script. Before doing this I checked the DBAppPassword= parameter in /opt/IBM/WebSphere/SSC/STSCServerCell/console/ to ensure that the password listed was correct and run it. Low and behold the tables updated (below).

I also checked https://<<SSC URL>>:9443/console/deployment/ListOfAllProductDeployments to ensure that the version was correct there. I can now access all the Sametime component servers.

The crux of the problem was that the password detailed in /opt/IBM/WebSphere/SSC/STSCServerCell/console/ was different to that of the OS at the time of the upgrade.

More NFS file locking and Connections

So I posted a blog NFS file locking and it’s affect on Connections not long ago so when I came across a similar problem with my lab deployment of Connections 3.0.1 on CentOS I thought it would be easy to sort out…..

I was seeing the following exception in the systemOut.log:

[7/9/12 22:27:51:172 BST] 00000022 SystemOut     O java.lang.reflect.InvocationTargetException
[7/9/12 22:27:51:173 BST] 00000022 SystemOut     O      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[7/9/12 22:27:51:173 BST] 00000022 SystemOut     O      at sun.reflect.NativeConstructorAccessorImpl.newInstance(
[7/9/12 22:27:51:174 BST] 00000022 SystemOut     O      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
[7/9/12 22:27:51:174 BST] 00000022 SystemOut     O      at java.lang.reflect.Constructor.newInstance(
[7/9/12 22:27:51:175 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:175 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:176 BST] 00000022 SystemOut     O      at<init>(
[7/9/12 22:27:51:176 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:176 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:177 BST] 00000022 SystemOut     O      at<init>(
[7/9/12 22:27:51:177 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:178 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:178 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:178 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:179 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:179 BST] 00000022 SystemOut     O Caused by: No locks available
[7/9/12 22:27:51:180 BST] 00000022 SystemOut     O      at
[7/9/12 22:27:51:180 BST] 00000022 SystemOut     O      at java.nio.channels.FileChannel.tryLock(
[7/9/12 22:27:51:181 BST] 00000022 SystemOut     O      at<init>(
[7/9/12 22:27:51:181 BST] 00000022 SystemOut     O      … 15 more

I tried the steps in my previous blog to no avail. I also edited /etc/fstab on the client appending “lock” as follows. /opt/IBM/LotusConnections/data/shared nfs rw,lock,hard,intr 0 0

I re-read some CentOS documentation picking up on the following:

“nfslock also has to be started for both the NFS client and server to function properly. To start NFS locking as root type: /sbin/service nfslock start. If NFS is set to start at boot, please ensure that nfslock also starts by running chkconfig –list nfslock. If nfslock is not set to on, this implies that you will need to manually run the /sbin/service nfslock start each time the computer starts. To set nfslock to automatically start on boot, type the following command in a terminal chkconfig nfslock on. ”

After enabling nfslock on the two clients and the server as well as setting it to start at boot I was up and running again. Odd thing was that this was running fine up until a prolonged shut down of my servers.

Missing status updates in Connections

A customer had a particularly difficult and awkward problem. A number of user reported that they could not see the status updates of others users who were part of the same network.

This seemed to affect a handful of users in the following pattern. User A cannot see User B’s status updates whilst User C can see User B’s updates. The problem is associated with User B. To c0mplicate things further User A shares a network with User B but does not follow him whilst User C also shares a network but does follow User B. So if you follow User B you get to see their status updates in your Status Updates. You should not need to follow the user to see their status updates the fact that you’re sharing a network should be sufficient.

I logged a PMR which thankfully found it’s way into the ever helpful hands of David McCarthy. An ISSC colleague mentioned that this had been seen in IBM’s deployment of Connections (w3) by Luiz Benietz and others so getting to the bottom of this would kill two birds with one stone.

IBM had me extract data from PEOPLEDB and HOMEPAGE databases using the DB2 commands below to provide the PERSON_ID of the users within HOMEPAGE to make sure it matches with the PROF_GUID within PEOPLEDB which for User B is E833339D-AED2-425F-8600-64CEFD85A3A5.

Comparison of NR_NETWORK showed that the data in there was correct.

PROF_GUID = E833339D-AED2-425F-8600-64CEFD85A3A5

PERSON_ID = 0bdd7880-8d9a-4fe1-ab13-77c54bb429bc
EXID = E833339D-AED2-425F-8600-64CEFD85A3A5

Using the PERSON_ID I was then able to retrieve data from the HOMEPAGE.NR_NEWS_STATUS_NETWORK table which details all the status updates posted by that PERSON_ID which in the HOMEPAGE.NR_NEWS_STATUS_NETWORK table is identified by ACTOR_UUID. This table also shows the READER_ID (detailed below) as well as a BRIEF_DESC column which shows the first 200 (or so) characters of the status update.

db2 “select * from HOMEPAGE.NR_NEWS_STATUS_NETWORK where ACTOR_UUID = ‘0bdd7880-8d9a-4fe1-ab13-77c54bb429bc‘”

ACTOR_UUID = 0bdd7880-8d9a-4fe1-ab13-77c54bb429bc
READER_ID = PERSON_ID of the person that is consuming the status updates ie those listed in the READER_ID column should be able to see the status updates.

To find out the names of the people who can read the updates you will need to use the READER_ID from HOMEPAGE.NR_NEWS_STATUS_NETWORK and find out the name of the user from the HOMEPAGE.PERSON by using the READER_ID converted to PERSON_ID.

db2 “select DISPLAYNAME from HOMEPAGE.PERSON where PERSON_ID = ‘714fbcd9-1740-477b-972d-a6456a35e4ea'”

I contacted the user and they could indeed see User B’s status updates.

To see how many people User B has in their network the following command was run which produced a value of about 140.

db2 “select count(*) from HOMEPAGE.NR_NETWORK where PERSON_ID = ‘0bdd7880-8d9a-4fe1-ab13-77c54bb429bc‘”

IBM asked for the following trace to be enabled whilst User B posted a status update and in the trace.log an exception was seen during the processing of one of the threads.

*=info: java.sql.*=all: com.ibatis.sqlmap.*=all:*=all:*=all

It was suggested that be installed but that would not be possible. The following fixes were applied which aimed to improve that code path to handle the case where the use records in the Homepage database are out of sync.     LO63965     VBUT8LNH89     Enhancement to News aggregation service for better handling of exceptional cases where user records become inconsistent with Profiles     LO66468     SCRD8QCDDK     Profiles Sync issue. Too many requests made on Event consumption to Profiles colleague feed

Unfortunately the problem was still apparent after the iFixes so more debugging was enabled, this time*=all:*=all

Thankfully this time the trace.log provided much more valuable information.

[30/05/12 16:14:59:301 BST] 00000067 StatusUpdateS 3 insertStatusUpdate Going to insert into db with a batch process this number of records: 144 and BUCKET_SIZE_BATCH_INSERT: 100
[ … snip …]
[30/05/12 16:14:59:308 BST] 00000067 NewsStatusNet 3 insertBatch – doInSqlMapClient rowsaffected 100
[30/05/12 16:14:59:308 BST] 00000067 NewsStatusNet < insertBatch RETURN 100
[30/05/12 16:14:59:312 BST] 00000067 StatusUpdateS < insertBatch RETURN 100
[ Here – it’s missing the insert for the remaining 44 records ]
[30/05/12 16:14:59:312 BST] 00000067 StatusUpdateS 3 insertStatusUpdate Going to insert into db with a batch process this number of records: 3 and BUCKET_SIZE_BATCH_INSERT: 100″

The response from L3 was:

The root cause is a coding error in a routine performing batch insert. The issue should be reproducible for people in the following conditions:

  • The number of people in the intersection of the sets (colleagues, followers)  for the author of the status update is > 100
  • The number of colleagues – number of people in the intersection is > 100 (this is what happen for this very specific PMR)
  • The number of followers – number of people in the intersection is > 100.

So the problem we are seeing is because the user had more than 100 people in their network and/or followers and the code would not add the 44 others to the database. pressing further I asked for an iFix for this which is too complicated and the only option is to go to CR1.

“Upon further review – the issue appears to be indirectly fixed in….We performed a code review in – this code path was entirely re-written. The final section of the batch is inserted in the new code in”

Up until this point the problem still hasn’t been resolved as updating Connections has not been planned yet but IBM support are fairly confident that CR1 will iron out this. With regards to w3, David McCarthy was able to track down one user who reported the problem initially who now (after the upgrade) can see all status updates.