IBM DB2 NUMDB value changes when migrating IBM Connections databases causing application problems

A very recent go live of IBM Connections 5.5 from 4.5 resulted in an error affecting Metrics. Metrics was not working whatsoever.

A look at cogserver.log showed DB2 exceptions. I checked the DB2 client on the Cognos node and it could connect. I noticed that all the daily refreshes had failed, so it had to be database related.

I had set NUMDB to 25 after the databases were created and before the 4.5 data was transferred, so I checked its current value. Running db2 get dbm cfg | find "NUMDB" returned 15, which is not what I had set. I checked my notes and I had indeed set it to 25.

I looked at db2diag.log and could see that the value had been changed around the time of go live; in fact, it had changed after the databases were created and after I had set the value to 25.

PID     : 7376                 TID : 3468           PROC : db2bp.exe
INSTANCE: DB2                  NODE : 000           DB   : HOMEPAGE
APPID   : *LOCAL.DB2.
HOSTNAME:
EDUID   : 3468
FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:30
CHANGE  : CFG DBM: "Numdb" From: "25"  To: "15"

The entry above was logged at the same time as activity on the HOMEPAGE database.
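Finding an entry like this by eye is slow in a large db2diag.log. The following is a minimal sketch, not a DB2 tool, of how such CHANGE records could be pulled out with a script. The function name find_dbm_changes and the sample text are my own; the record layout is assumed to match the excerpt above, with records separated by blank lines.

```python
import re

# Matches the CHANGE line written when a DBM configuration parameter is updated.
# Straight and typographic quotes are both accepted, because copied logs
# often contain the latter.
CHANGE_RE = re.compile(
    r'CHANGE\s*:\s*CFG DBM:\s*["“](?P<param>\w+)["”]\s*'
    r'From:\s*["“](?P<old>[^"”]*)["”]\s*'
    r'To:\s*["“](?P<new>[^"”]*)["”]'
)
# The "DB   : NAME" field of a record (does not match "DB2" or "DBM").
DB_RE = re.compile(r'\bDB\s*:\s*(\S+)')


def find_dbm_changes(text, param="NUMDB"):
    """Return (database, old_value, new_value) for each logged change to param."""
    changes = []
    # Assumption: db2diag.log records are separated by blank lines.
    for record in re.split(r'\n\s*\n', text):
        m = CHANGE_RE.search(record)
        if m and m.group("param").upper() == param.upper():
            db = DB_RE.search(record)
            changes.append((db.group(1) if db else None, m.group("old"), m.group("new")))
    return changes


# Sample record, shortened from the excerpt above.
sample = (
    'PID     : 7376                 TID : 3468           PROC : db2bp.exe\n'
    'INSTANCE: DB2                  NODE : 000           DB   : HOMEPAGE\n'
    'FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:30\n'
    'CHANGE  : CFG DBM: "Numdb" From: "25"  To: "15"\n'
)
print(find_dbm_changes(sample))  # [('HOMEPAGE', '25', '15')]
```

Running something like this against the whole log after a migration would show every point at which NUMDB was touched and which database was active at the time.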

I checked the scripts I had written and all seemed well. I then checked the lcWizard log directory and found the following in C:\Users\db2admin\lcWizard\log\dbWizard\homepage_upgrade-50CR3-55.log:

UPDATE DBM CFG USING NUMDB 15
DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed
successfully.

In ..\Wizards\connections.sql\homepage\db2\upgrade-50CR3-55.sql I found the culprit.

-- -----------------------------------------------------------------
-- Defect 164873:
-- -----------------------------------------------------------------

UPDATE DBM CFG USING NUMDB 15@

I really have no idea why IBM would put that in there. There are 16 databases if you include FEBDB. Why have it for HOMEPAGE and not the other database scripts?
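To avoid being caught out again, the upgrade scripts can be checked for instance-level configuration changes before they are run. This is a rough sketch under my own assumptions: the helper names are hypothetical, the root folder passed to scan_tree is whatever directory holds your copy of the connections.sql scripts, and only single-line UPDATE DBM CFG statements are detected.

```python
import re
from pathlib import Path

# Instance-wide statements such as "UPDATE DBM CFG USING NUMDB 15@".
DBM_CFG_RE = re.compile(
    r'^\s*UPDATE\s+(?:DBM\s+CFG|DATABASE\s+MANAGER\s+(?:CFG|CONFIGURATION))'
    r'\s+USING\s+(\w+)\s+(\w+)',
    re.IGNORECASE,
)


def find_dbm_cfg_updates(sql_text):
    """Return (line_number, parameter, value) for each DBM CFG update in a script."""
    hits = []
    for lineno, line in enumerate(sql_text.splitlines(), start=1):
        m = DBM_CFG_RE.match(line)
        if m:
            hits.append((lineno, m.group(1).upper(), m.group(2)))
    return hits


def scan_tree(root):
    """Print every DBM CFG update found in the .sql files under root."""
    for path in sorted(Path(root).rglob("*.sql")):
        text = path.read_text(errors="replace")
        for lineno, param, value in find_dbm_cfg_updates(text):
            print(f"{path}:{lineno}: {param} -> {value}")


# The statement from the HOMEPAGE upgrade script shown above.
sample = "-- Defect 164873:\n\nUPDATE DBM CFG USING NUMDB 15@\n"
print(find_dbm_cfg_updates(sample))  # [(3, 'NUMDB', '15')]
```

Any hit is worth reading before the wizard runs, since a DBM CFG update affects the whole instance and not just the database being upgraded.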

I updated NUMDB again, stopped all the application servers and restarted DB2 for good measure, and all is well now.

DB2 log saturation during IBM Connections database transfer

During a 4.5 -> 5.5 migration I got the following when running the transfer scripts for METRICS and PEOPLEDB.

[02/03/16 16:33:26.659 CET] com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-964, DRIVER=3.69.49
[02/03/16 16:33:26.659 CET] com.ibm.db2.jcc.am.SqlException: [jcc][103][10843][3.69.49] Non-recoverable chain-breaking exception occurred during batch processing.  The batch is terminated non-atomically. ERRORCODE=-4225, SQLSTATE=null
[02/03/16 16:33:26.659 CET] error.executing.transfer
err.dbtransfer.exception.labelclass com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][3.69.49] Batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null
com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][3.69.49] Batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null
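The useful part of that output is buried in the first line. SQLCODE -1476 only says the transaction was rolled back because of another error, and the real cause is the code carried in SQLERRMC; -964 is the "transaction log is full" error. Below is a small sketch for pulling those fields out of a transfer log line. The function names and the two-entry lookup table are my own, not part of any IBM tooling.

```python
import re

# Meanings as given in the DB2 message reference; only the two codes seen here.
KNOWN = {
    -1476: "transaction rolled back because of the error named in SQLERRMC",
    -964: "transaction log for the database is full",
}

FIELD_RE = re.compile(r'\b(SQLCODE|SQLSTATE|SQLERRMC)=([^,\s]+)')


def parse_db2_error(line):
    """Return the SQLCODE, SQLSTATE and SQLERRMC fields of a JCC error line."""
    return dict(FIELD_RE.findall(line))


def explain(line):
    """Return a short note for every numeric code in the line that we recognise."""
    notes = []
    fields = parse_db2_error(line)
    for key in ("SQLCODE", "SQLERRMC"):
        value = fields.get(key, "")
        # SQLERRMC can hold non-numeric tokens for other errors, so check first.
        if value.lstrip("-").isdigit() and int(value) in KNOWN:
            notes.append(f"{key}={value}: {KNOWN[int(value)]}")
    return notes


line = ("com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch "
        "element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, "
        "SQLERRMC=-964, DRIVER=3.69.49")
for note in explain(line):
    print(note)
```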

Looking in the db2diag.log I saw the following

2016-02-03-18.49.00.983000+060 E44991171F646        LEVEL: Error
PID     : 2348                 TID : 1580           PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000           DB   : METRICS
APPHDL  : 0-809                APPID: 15.91.29.211.49843.160203173533
AUTHID  : DB2ADMIN             HOSTNAME:
EDUID   : 1580                 EDUNAME: db2agent (METRICS) 0
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860
MESSAGE : ADM1823E  The active log is full and is held by application handle
          "0-809".  Terminate this application by COMMIT, ROLLBACK or FORCE
          APPLICATION.

2016-02-03-18.49.00.983000+060 E44991819F610        LEVEL: Error
PID     : 2348                 TID : 1580           PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000           DB   : METRICS
APPHDL  : 0-809                APPID: 15.91.29.211.49843.160203173533
AUTHID  : DB2ADMIN             HOSTNAME:
EDUID   : 1580                 EDUNAME: db2agent (METRICS) 0
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:6666
MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE
          "Log File has reached its saturation point"
          DIA8309C Log file was full.

This means the DB2 transaction log filled up. The following pages have more information:

http://www-01.ibm.com/support/docview.wss?uid=swg21623212

http://www-01.ibm.com/support/docview.wss?uid=swg21617184

http://bpmadmin.blogspot.com/2014/04/db2-sql-error-sqlcode-1476.html

To get the data transferred I used the following values (the values you need may differ) and commands

db2 update db cfg for metrics using LOGFILSIZ 10000
db2 update db cfg for metrics using LOGPRIMARY 80
db2 update db cfg for metrics using LOGSECOND 40

db2stop
db2start

db2 get db cfg for metrics
Log file size (4KB)                         (LOGFILSIZ) = 10000
Number of primary log files                (LOGPRIMARY) = 80
Number of secondary log files               (LOGSECOND) = 40

I was then able to run the transfer for both these databases.

You may want to change the values back to the defaults afterwards, as the larger log settings take up more disk space and may affect performance.
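To put a number on the disk space: LOGFILSIZ is expressed in 4 KB pages, so the space the logs can occupy is roughly LOGFILSIZ x 4 KB x the number of log files. The sketch below works that out for the values used above. It ignores the small per-file header, and the function name is my own.

```python
PAGE_BYTES = 4096  # LOGFILSIZ is expressed in 4 KB pages


def log_space_bytes(logfilsiz, logprimary, logsecond):
    """Return (primary_bytes, maximum_bytes) for the given log settings.

    Primary log files are allocated up front; secondary files are only
    allocated when the primary files are exhausted.
    """
    per_file = logfilsiz * PAGE_BYTES
    primary = per_file * logprimary
    maximum = per_file * (logprimary + logsecond)
    return primary, maximum


primary, maximum = log_space_bytes(logfilsiz=10000, logprimary=80, logsecond=40)
print(f"primary logs:        {primary / 2**30:.2f} GiB")   # 3.05 GiB
print(f"with secondary logs: {maximum / 2**30:.2f} GiB")   # 4.58 GiB
```

So with these settings each database can take up to about 4.6 GiB of log space, which is worth knowing before applying them to several databases on the same volume.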

Clearing LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT as part of IBM Connections CCM migration

Nearing the end of a 4.5 -> 5.5 migration of IBM Connections I saw hundreds of lines of exceptions in the Infrastructure SystemOut.log. These exceptions only appeared after the content store and database data were transferred to the target. I couldn’t see any problem in the UI, which worries me more than if an error had shown up somewhere.

[3/8/16 16:20:19:148 CET] 0000085d SRTServletRes W com.ibm.ws.webcontainer.srt.SRTServletResponse setStatus WARNING: Cannot set status. Response already committed.
[3/8/16 16:20:19:148 CET] 0000085d SRTServletRes W com.ibm.ws.webcontainer.srt.SRTServletResponse addHeader SRVE8094W: WARNING: Cannot set header. Response already committed.
[3/8/16 16:20:19:148 CET] 0000085d WASSessionCor W SessionAffinityManager setCookie SESN0066E: The response is already committed to the client. The session cookie cannot be set.
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O java.lang.IllegalStateException
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.webcontainer.srt.SRTServletResponse.addSessionCookie(SRTServletResponse.java:2175)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.session.SessionAffinityManager.setCookie(SessionAffinityManager.java:589)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.session.SessionManager.adaptAndSetCookie(SessionManager.java:747)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.session.SessionManager.createSession(SessionManager.java:734)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.session.SessionContext.getIHttpSession(SessionContext.java:505)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.session.SessionContext.getIHttpSession(SessionContext.java:426)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.webcontainer.srt.SRTRequestContext.getSession(SRTRequestContext.java:113)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.webcontainer.srt.SRTServletRequest.getSession(SRTServletRequest.java:2168)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O     at com.ibm.ws.webcontainer.srt.SRTServletRequest.getSession(SRTServletRequest.java:2152)

These exceptions were triggered every 11 minutes.

[3/8/16 16:51:30:018 CET] 00000c98 ThreadHttpReq E   Exception with request in this thread : null
[3/8/16 16:51:30:018 CET] 00000c98 ThreadHttpReq E   Exception with request in this thread : null
[3/8/16 16:52:00:018 CET] 00000c9b ThreadHttpReq E   Exception with request in this thread : null
[3/8/16 16:52:00:018 CET] 00000c9b ThreadHttpReq E   Exception with request in this thread : null

The above exception appeared constantly. This exception stopped after clearing the /temp, /wstemp and /translog directories but the other exceptions remained.

Enabling trace got me a bit more.

[3/8/16 16:20:19:148 CET] 0000085d SRTServletRes W com.ibm.ws.webcontainer.srt.SRTServletResponse setStatus WARNING: Cannot set status. Response already committed.
[3/8/16 16:20:19:148 CET] 0000085d SRTServletRes W com.ibm.ws.webcontainer.srt.SRTServletResponse addHeader SRVE8094W: WARNING: Cannot set header. Response already committed.
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe > com.ibm.ws.webcontainer.servlet.ServletWrapper handleRequest ServletWrapper[/ic/errors/errorMini.jsp:null] ,request-> com.ibm.lconn.core.web.util.lang.I18NFilter$LCServletRequest@afb184af ,response-> com.ibm.ws.webcontainer.srt.SRTServletResponse@9bd4bb03 ENTRY
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe 1 com.ibm.ws.webcontainer.servlet.ServletWrapper handleRequest   request--->/connections/opensocial/basic/rest/activitystreams/@me/@all/@all<---
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe 1 com.ibm.ws.webcontainer.servlet.ServletWrapper handleRequest handling request for resource [/connections/opensocial/ic/errors/errorMini.jsp]
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe > com.ibm.ws.webcontainer.servlet.ServletWrapper loadServlet, className-->[com.ibm._jsp._errorMini], servletName[/ic/errors/errorMini.jsp] ENTRY
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe < com.ibm.ws.webcontainer.servlet.ServletWrapper loadServlet, Found target for className-->[com.ibm._jsp._errorMini], servletName[/ic/errors/errorMini.jsp] RETURN
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe 3 com.ibm.ws.webcontainer.servlet.ServletWrapper handleRequest internal servlet --> false
[3/8/16 16:20:19:148 CET] 0000085d ServletWrappe > com.ibm.ws.webcontainer.servlet.ServletWrapper service  ENTRY  this->[ServletWrapper[/ic/errors/errorMini.jsp:null]] ,className-->[com.ibm._jsp._errorMini] ,request->[com.ibm.lconn.core.web.util.lang.I18NFilter$LCServletRequest@afb184af] ,response->[com.ibm.ws.webcontainer.srt.SRTServletResponse@9bd4bb03
[3/8/16 16:20:19:148 CET] 0000085d WASSessionCor W SessionAffinityManager setCookie SESN0066E: The response is already committed to the client. The session cookie cannot be set.
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O   java.lang.IllegalStateException
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O       at com.ibm.ws.webcontainer.srt.SRTServletResponse.addSessionCookie(SRTServletResponse.java:2175)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O       at com.ibm.ws.session.SessionAffinityManager.setCookie(SessionAffinityManager.java:589)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O       at com.ibm.ws.session.SessionManager.adaptAndSetCookie(SessionManager.java:747)
[3/8/16 16:20:19:148 CET] 0000085d SystemOut     O       at com.ibm.ws.session.SessionManager.createSession(SessionManager.java:734)

In access_log I saw the following.

x.x.x.x - - [08/Mar/2016:16:20:19 +0100] "POST /connections/opensocial/basic/rest/activitystreams/@me/@all/@all HTTP/1.1" 400 68

I raised a PMR and Kevin Holohan quickly got back to me asking me to step through “Mass notifications from CCM.” I wondered how this would fit in with my problem but performed the following steps:

db2 connect to FNOS

db2 select count(*) as ROWS from LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT

ROWS
-----------
        938

db2 "select count(*) as REMOVED from LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT where ENTRY_STATUS = 2 AND OBJECT_ID <> x'00000000000000000000000000000000'"

REMOVED
-----------
        937

db2 "DELETE FROM LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT WHERE ENTRY_STATUS = 2 AND OBJECT_ID <> x'00000000000000000000000000000000'"

db2 select count(*) as ROWS from LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT

ROWS
-----------
          1

Of course I backed up FNOS first and shut down the application servers before doing this. On start up the exceptions were gone and I have a clean log.
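The steps above follow a pattern worth keeping for any manual clean-up: count what the DELETE will match, delete, then confirm what is left. The sketch below illustrates that pattern only. It uses Python's built-in sqlite3 module as a stand-in for DB2, drops the LCUSER schema qualifier, and runs against made-up rows, so treat it as a demonstration of the check rather than something to point at FNOS.

```python
import sqlite3

NULL_ID = bytes(16)  # same value as x'00000000000000000000000000000000'
WHERE = "ENTRY_STATUS = 2 AND OBJECT_ID <> ?"
TABLE = "UT_CLBACTIVITYSTREAMQUEUEENT"


def purge_queue(conn):
    """Count the matching rows, delete them, and roll back if the counts disagree."""
    cur = conn.cursor()
    expected = cur.execute(
        f"SELECT COUNT(*) FROM {TABLE} WHERE {WHERE}", (NULL_ID,)
    ).fetchone()[0]
    cur.execute(f"DELETE FROM {TABLE} WHERE {WHERE}", (NULL_ID,))
    if cur.rowcount != expected:
        conn.rollback()
        raise RuntimeError(f"expected {expected} rows, deleted {cur.rowcount}")
    conn.commit()
    remaining = cur.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]
    return expected, remaining


# Illustrative data only: three queued entries plus one row with the all-zero ID.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (OBJECT_ID BLOB, ENTRY_STATUS INTEGER)")
rows = [(bytes([i]) * 16, 2) for i in range(1, 4)] + [(NULL_ID, 2)]
conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?)", rows)
conn.commit()

print(purge_queue(conn))  # (3, 1): three rows removed, the all-zero row kept
```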

As this system was not in production yet I do not know whether old notifications were being sent.

Just yesterday another customer called me for a different reason, but I saw exactly the same exceptions in their logs and sent him the steps I used above. This morning he emailed me to say the exceptions have stopped. This customer is a big user of CCM and migrated to 5.0 from 4.5 a number of months ago. He had over 27000 rows in LCUSER.UT_CLBACTIVITYSTREAMQUEUEENT, and his users had not received old notifications of the kind described in the developerWorks blog.

I will add this to my migration steps from now onwards when migrating CCM.

IBM Connections 5.5 DB2 migration fails due to full transaction logs

During a database transfer from Connections 4.5 CR05 (DB2 10.1) to Connections 5.5 (DB2 10.5.0.7) I ran across a number of transfer failures using the tool. After a bit of digging such as looking at db2diag.log and DB2 Technotes I found the problem was that the DB2 transaction logs were being filled. Below are some example errors.

[02/03/16 16:33:26.659 CET] com.ibm.db2.jcc.am.SqlTransactionRollbackException: Error for batch element #1: DB2 SQL Error: SQLCODE=-1476, SQLSTATE=40506, SQLERRMC=-964, DRIVER=3.69.49
[02/03/16 16:33:26.659 CET] com.ibm.db2.jcc.am.SqlException: [jcc][103][10843][3.69.49] Non-recoverable chain-breaking exception occurred during batch processing.  The batch is terminated non-atomically. ERRORCODE=-4225, SQLSTATE=null
[02/03/16 16:33:26.659 CET] error.executing.transfer
err.dbtransfer.exception.labelclass com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][3.69.49] Batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null
com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][3.69.49] Batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null

db2diag.log:

EDUID   : 1580                 EDUNAME: db2agent (METRICS) 0
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:6666
MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE
          "Log File has reached its saturation point"
          DIA8309C Log file was full.

The technote at http://www-01.ibm.com/support/docview.wss?uid=swg21623212 suggests increasing LOGFILSIZ, LOGPRIMARY and LOGSECOND. On my second attempt at changing these settings I found values that worked (for me).

db2 update db cfg for metrics using LOGFILSIZ 10000
db2 update db cfg for metrics using LOGPRIMARY 80
db2 update db cfg for metrics using LOGSECOND 40
db2stop
db2start

I had to increase the default values for Metrics and Profiles as they contain a lot of data.

You may want to reset the values after migration so you do not impact disk space.

Cannot share folders with a community

A customer notified me of a problem a user faced when trying to share a folder to a community. Quickly we found the problem was with the community and not the folder as the folder could be shared with other communities and various folders could not be shared with this specific community. This was a community that was created in Connections 3.0.1.

The user saw different errors in the web browser compared with the Windows connector.

[Screenshots 2013-07-15_170714 and 2013-07-15_170726: the errors shown in the web browser and in the Windows connector]

I found a forum entry, but it did not provide any resolution or technical details. I looked into SNCOMM and FILES and could not see anything obviously wrong, so I raised a PMR.

IBM quickly came back and asked me to run FilesDataIntegrityService.syncAllCommunityShares(). This command is meant to be run after an upgrade/migration to Connections 4.0, so I ran it with some doubt. In my wsadmin window I saw a number of lines of output, two of them relating to this particular community.

[08/07/13 15:08:18:462 IST] 000028fd SyncCommunity I   EJPVJ9418I: The community 8ac6c344-43d8-4321-a292-2b952c55bd9d has been synchronized and now has visibility PRIVATE and name Parent Community.
[08/07/13 15:08:20:541 IST] 000028fd SyncCommunity I   EJPVJ9418I: The community feba9b69-6b05-45db-adb9-ea1c6d26073f has been synchronized and now has visibility PRIVATE and name sub-community.
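When the command prints many of these lines, picking out the ones for a given community by eye is tedious. Here is a minimal sketch for extracting the community ID, visibility and name from EJPVJ9418I messages. The message wording is taken from the two lines above; the function name is my own, and other message formats are simply ignored.

```python
import re

SYNC_RE = re.compile(
    r'EJPVJ9418I: The community (?P<id>[0-9a-f-]{36}) has been synchronized '
    r'and now has visibility (?P<visibility>\w+) and name (?P<name>.+?)\.?$'
)


def parse_sync_line(line):
    """Return a dict with id, visibility and name, or None if the line does not match."""
    m = SYNC_RE.search(line.strip())
    return m.groupdict() if m else None


lines = [
    "[08/07/13 15:08:18:462 IST] 000028fd SyncCommunity I   EJPVJ9418I: The community "
    "8ac6c344-43d8-4321-a292-2b952c55bd9d has been synchronized and now has "
    "visibility PRIVATE and name Parent Community.",
    "[08/07/13 15:08:20:541 IST] 000028fd SyncCommunity I   EJPVJ9418I: The community "
    "feba9b69-6b05-45db-adb9-ea1c6d26073f has been synchronized and now has "
    "visibility PRIVATE and name sub-community.",
]
for entry in map(parse_sync_line, lines):
    print(entry["id"], entry["visibility"], entry["name"])
```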

I’m not sure what these lines mean (awaiting an answer from IBM) but it worked and the user can now share his folder with the community.