/storage/seat is massive

If this partition gets too big it causes issues. SEAT stands for:

  • S – Stats
  • E – Events
  • A – Alarms
  • T – Tasks

For this article we're interested in cases where the growth is in events. In the example below we can see that eventtblsp is 54GB, the largest directory in /storage/seat:

du -sh /storage/seat/vpostgres/*
1.3M alarmtblsp
54G eventtblsp
2.9G hs1tblsp
900M hs2tblsp
308M hs3tblsp
400M hs4tblsp
2.3G tasktblsp

Firstly, you need to identify which records are filling the database:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres 

SELECT COUNT(0),
event_type
FROM vpxv_event_all
GROUP BY event_type
ORDER BY 1 DESC LIMIT 5;

The output in this case is

16297499 | vim.event.UserLoginSessionEvent
16295260 | vim.event.UserLogoutSessionEvent
2690546  | DatastoreFileDownloadEvent
1187037  | vim.event.TaskEvent
486828   | esx.problem.visorfs.ramdisk.full

You should then investigate using the vSphere Client to see what's issuing all these events.

In this case we should investigate what dcui@127.0.0.1 is doing and try to stop it.
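
If you'd rather stay in the database, a query along these lines shows which accounts are generating the logins (this assumes the vpxv_event_all view exposes a username column; the column name may differ between vCenter versions):

SELECT COUNT(0),
username
FROM vpxv_event_all
WHERE event_type = 'vim.event.UserLoginSessionEvent'
GROUP BY username
ORDER BY 1 DESC LIMIT 5;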

Often it's not so simple to stop these events. They often come from backup software which VMware admins have no control over. In this instance we've used a workaround: a procedure in the database which removes these unwanted events regularly.

Please change the procedure as appropriate to your issue.

ldt_keep_from - this can be changed to keep any number of days of events. In our example we're keeping 10 days of information.

lv_event_type1 and lv_event_type2 - these are the 2 event types we have an issue with in this example.

If you've only 1 event type then you'll need to change the code

(event_type = '''|| lv_event_type1||''' or event_type = '''||lv_event_type2||''')

to

(event_type = '''|| lv_event_type1||''')

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

CREATE OR REPLACE FUNCTION remove_some_Events ()
returns VARCHAR as $$
DECLARE
 event_partition VARCHAR (2000);
 lv_event_type1 VARCHAR (200);
 lv_event_type2 VARCHAR (200);
 ldt_keep_from date;
 ln_count_rows_removed_event int;
 ln_count_rows_removed_event_arg int;
 ln_rowcount int;
 return_count VARCHAR := 'DEFAULT';
BEGIN
 ln_count_rows_removed_event = 0;
 ln_count_rows_removed_event_arg = 0;
 -- This configures how many days of this data you want to keep.
 -- Change it as you feel comfortable.
 ldt_keep_from = timezone('UTC', now()) - (10 || ' days')::interval;
 -- These are the event types we're going to remove for issue 1
 lv_event_type1 = 'vim.event.UserLoginSessionEvent';
 lv_event_type2 = 'vim.event.UserLogoutSessionEvent';
 -- Loop over the 92 event partitions
 FOR part IN 1..92 LOOP
   -- Remove the event arguments first, as they reference the events
   event_partition = 'delete from VPX_EVENT_ARG_' || CAST(part AS TEXT)
            || ' where event_id in (select event_id from VPX_EVENT_' || CAST(part AS TEXT)
            || ' where create_time < '''||ldt_keep_from ||''' and (event_type = '''||
           lv_event_type1||''' or event_type = '''||lv_event_type2||'''))';
   EXECUTE event_partition;
   get diagnostics ln_rowcount = row_count;
   ln_count_rows_removed_event_arg = ln_count_rows_removed_event_arg + ln_rowcount;
   ln_rowcount = 0;
   -- Then remove the events themselves, using the same date window
   event_partition = 'delete from VPX_EVENT_' || CAST(part AS TEXT)
            || ' where create_time < '''||ldt_keep_from ||''' and (event_type = '''||
           lv_event_type1||''' or event_type = '''||lv_event_type2||''')';
   EXECUTE event_partition;
   get diagnostics ln_rowcount = row_count;
   ln_count_rows_removed_event = ln_count_rows_removed_event + ln_rowcount;
   ln_rowcount = 0;
 END LOOP;
 return_count = 'vpx_events_removed=' || ln_count_rows_removed_event || ',vpx_event_arguments_removed=' || ln_count_rows_removed_event_arg;
 RETURN return_count;
END
$$ language plpgsql;

This can be run manually the first time. It may take a long time depending on the amount of data which needs to be removed. It can be executed with all other services running.

VCDB=# select remove_some_Events();

vpx_events_removed=1988331,vpx_event_arguments_removed=78653978
(1 row)

Once you're happy you can package it in a simple shell script.

vi /storage/updatemgr/run_cleanup.sh

#!/bin/sh
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT * FROM remove_some_Events();"
:wq!

chmod 777 /storage/updatemgr/run_cleanup.sh
/storage/updatemgr/run_cleanup.sh

Now it’s best to schedule this to run every night.

crontab -e
0  2 * * * /storage/updatemgr/run_cleanup.sh
:wq!

       

Performance Statistics in vCenter using too many resources

Statistics are important to show how your VMs are performing. However this can become a balancing act: too many performance statistics can degrade performance. These stats are like a data warehouse and can affect the critical operational side of vCenter.

The stats are stored in /storage/seat.

Here is an example of a setup which has nearly 100GB of historical stats.

du -sh /storage/seat/vpostgres/*
1.3M /storage/seat/vpostgres/alarmtblsp
54M /storage/seat/vpostgres/eventtblsp
26G /storage/seat/vpostgres/hs1tblsp
39G /storage/seat/vpostgres/hs2tblsp
22G /storage/seat/vpostgres/hs3tblsp
11G /storage/seat/vpostgres/hs4tblsp
2.3M /storage/seat/vpostgres/tasktblsp

hs1tblsp  --  past day stats
hs2tblsp  --  past week stats
hs3tblsp  --  past month stats
hs4tblsp  --  past year stats
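
A quick way to see which of the historical stat tables actually hold the most rows (and therefore which interval is costing you the space) is to ask postgres itself. This is a read-only check; n_live_tup is an estimate, and the vpx_hist_stat naming pattern is the same one used by the truncate function later in this section.

SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname LIKE 'vpx_hist_stat%'
ORDER BY n_live_tup DESC
LIMIT 10;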

Double check that your statistics levels are set to the default (Level 1). Anything higher than that on any of the collection intervals will cause a big increase in the amount of data collected.

If you have vRealize Operations (vROps), you may be able to disable this stat collection and just rely on vROps for your statistics.

You may want to clear out some of this data. VMware KB 2110031 covers how to trim it.

I'm a DBA however, and I do love to truncate. This is fast and efficient. The first step is to stop all the services except postgres, then copy and paste in the script below. All your historical stats will be removed once this script is run.

service-control --stop --all
service-control --start vmware-vpostgres
/opt/vmware/vpostgres/current/bin/psql -U postgres VCDB

Copy and paste in the code below and press return

CREATE OR REPLACE FUNCTION cleanup_hist_stats_truncate (SaveData INTEGER)
 returns void as $$
 DECLARE r record;
 DECLARE sql_stmt varchar(2000);
 DECLARE affected_rows integer;
 DECLARE affected_rows_step1 integer;
 BEGIN
 -- Empty the sample time tables first
 TRUNCATE TABLE VPX_SAMPLE_TIME1;
 TRUNCATE TABLE VPX_SAMPLE_TIME2;
 TRUNCATE TABLE VPX_SAMPLE_TIME3;
 TRUNCATE TABLE VPX_SAMPLE_TIME4;
 -- Then truncate every historical stats table
 FOR r IN SELECT table_name FROM information_schema.tables
          WHERE table_name LIKE 'vpx_hist_stat%' AND table_type='BASE TABLE'
 LOOP
     sql_stmt = 'TRUNCATE TABLE ' || r.table_name;
     EXECUTE sql_stmt;
 END LOOP;
 END
 $$ language plpgsql;
 select cleanup_hist_stats_truncate(1);
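
Once the truncate finishes and you've quit psql, remember to bring the rest of the services back up (the mirror image of the stop command above):

service-control --start --all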
            

vcac database growing too large 7.x

First step is to check your postgres logs.

These are in /storage/db/pg_data/pg_log

more postgresql.log

You may find: ERROR,XX001,"could not read block 0 in file ""base/8198863/8987722"": read only 0 of 8192 bytes",,,,,"automatic vacuum of table ""vrhb.public.docs_core_authcredentialsservicestate""",,,,""

The fix to stop your database from growing:

psql vrhb

reindex table vrhb.public.docs_core_authcredentialsservicestate;

vacuum full vrhb.public.docs_core_authcredentialsservicestate;

\c vcac

vacuum analyze;

\q

You can then verify that the messages are gone from your postgresql.log.
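
For example, a quick grep against the current log (the file name may differ if your logs have rotated):

grep -c "could not read block" /storage/db/pg_data/pg_log/postgresql.log

A count of 0 in a fresh log means the error is no longer being raised.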

recreate postgres instance in VCSA

Dump out the VCDB database:

/opt/vmware/vpostgres/current/bin/pg_dump -d VCDB -U postgres -Fp -c > /storage/updatemgr/VCDBackUp

Drop the database.

/opt/vmware/vpostgres/current/bin/dropdb -U postgres VCDB

Dump out the instance:
su -m -c "/opt/vmware/vpostgres/current/bin/pg_dumpall -c -f /storage/updatemgr/GSS_dumpall.sql" postgres

Back up the data directory:
cp -a /storage/db/vpostgres /storage/db/vpostgres_bk

Remove all dependencies including seat data

cd /storage/db/vpostgres
rm -rf *

rm -rf /storage/seat/vpostgres/tasktblsp/*
rm -rf /storage/seat/vpostgres/alarmtblsp/*
rm -rf /storage/seat/vpostgres/hs2tblsp/*
rm -rf /storage/seat/vpostgres/hs3tblsp/*
rm -rf /storage/seat/vpostgres/hs1tblsp/*
rm -rf /storage/seat/vpostgres/hs4tblsp/*
rm -rf /storage/seat/vpostgres/eventtblsp/*

Create and start new instance

su - vpostgres -c "/opt/vmware/vpostgres/current/bin/initdb -D /storage/db/vpostgres"
su - vpostgres -c "/opt/vmware/vpostgres/current/bin/pg_ctl -D /storage/db/vpostgres start"
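
Before carrying on, it's worth a quick sanity check that the new instance is actually running (not part of the original steps, just a pg_ctl status call):

su - vpostgres -c "/opt/vmware/vpostgres/current/bin/pg_ctl -D /storage/db/vpostgres status"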

Create postgres user
/opt/vmware/vpostgres/current/bin/createuser -s postgres

Import the instance:
/opt/vmware/vpostgres/current/bin/psql -U postgres -f /storage/updatemgr/GSS_dumpall.sql postgres

Create VCDB and user.
/opt/vmware/vpostgres/current/bin/createdb -U postgres VCDB
Connect to the database:
/opt/vmware/vpostgres/current/bin/psql -U postgres VCDB
psql.bin (9.4.12 (VMware Postgres 9.4.12.0-5574246 release))
Type "help" for help.

Run the following queries:
create schema vc;
ALTER USER vc SET search_path to vc,pg_catalog;
ALTER USER postgres SET search_path to vc,pg_catalog;
ALTER USER vc WITH PASSWORD 'd#Bu|AWL+x&{Y3j5'; -- get this from /etc/vmware-vpx/vcdb.properties

ALTER DATABASE "VCDB" OWNER TO vc;

VCDB=> \q

We will now restore the previously backed up database:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -f /storage/updatemgr/VCDBackUp >/tmp/output_backup.txt 2>&1
Wait until it finishes.
Check the log on /tmp/output_backup.txt

Once it completes test by querying a table:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U vc
select * from vpx_entity; -- confirm that rows are returned.

VCDB=> \q

Restore a backup of VCDB

/opt/vmware/vpostgres/current/bin/dropdb -U postgres VCDB

/opt/vmware/vpostgres/current/bin/createdb -U postgres VCDB

Connect to the database:
/opt/vmware/vpostgres/current/bin/psql -U postgres VCDB
psql.bin (9.4.12 (VMware Postgres 9.4.12.0-5574246 release))
Type "help" for help.

Run the following queries:
create schema vc;
ALTER USER vc SET search_path to vc,pg_catalog;
ALTER USER postgres SET search_path to vc,pg_catalog;
ALTER USER vc WITH PASSWORD 'd#Bu|AWL+x&{Y3j5'; -- get this from /etc/vmware-vpx/vcdb.properties

ALTER DATABASE "VCDB" OWNER TO vc;

VCDB=> \q

We will now restore the previously backed up database:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -f /storage/core/VCDBackUp >/tmp/output_backup.txt 2>&1
Wait until it finishes.
Check the log on /tmp/output_backup.txt

Once it completes test by querying a table:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U vc
select * from vpx_entity; -- confirm that rows are returned.

VCDB=> \q

Nightly Vacuum analyze

You’ll have to create a file on your appliance and run it.

The application needs to be up for this script to work.

It will cause extra IO but no locking. If you're using synchronization, please do this off peak.

Copy the script below into vi. The cron entry further down expects the script in /tmp, so create it there:

vi /tmp/vacuum_nightly.sh

chmod u+x /tmp/vacuum_nightly.sh

/tmp/vacuum_nightly.sh

Let me know if there’s any issue

ymurphy@vmware.com

After running this manually a few times, I'd suggest you run it nightly on a cron.

crontab -e

20 2 * * * /tmp/vacuum_nightly.sh

Output from the script can be seen in stats_collector.log in the same directory as the script.

#!/bin/bash
error=0
errorlog='stats_collector.log'
# Locate the bin directory of the running postgres instance
pbin=`ps -ea -o cmd | grep '[p]ostgres -D' | awk '{ gsub ("postgres$","",$1); print $1}'`
run_log='optimise_output.log'
# Work out which OS user owns postgres and which database it is serving
dbownername=`ps -eo uname:20,pid,pcpu,pmem,sz,tty,stat,time,cmd --sort -time | grep -m 1 'postgres:' | awk '{print $1}'`
databasename=`ps -f -N --sort -pid | grep -m 1 'postgres:' | awk '{print $10}'`

function vacuumaz {

echo $dbownername database owner
echo $databasename database name

# Run vacuumdb against whichever appliance database was detected
if [ "$databasename" = "secureall" ]; then
 $pbin/vacuumdb -U secureall -d secureall 2>>$errorlog
elif [ "$databasename" = "vcac" ]; then
 $pbin/vacuumdb -U postgres -d vcac 2>>$errorlog
elif [ "$databasename" = "VCDB" ]; then
 $pbin/vacuumdb -U postgres -d VCDB > /dev/null 2>>$errorlog
elif $pbin/vacuumdb -U postgres -d $databasename > /dev/null 2>>$errorlog ; then
 printf '\n No Errors on vacuumdb \n'
 printf '\n'
else
 echo "vacuumdb ERROR. Please review $errorlog file"
 printf '\n'
 error=1
fi

echo `/bin/date "+%Y-%m-%dT%H:%M:%S"` Finish vacuumdb
echo `/bin/date "+%Y-%m-%dT%H:%M:%S"` Finish vacuumdb >> $errorlog
}

# Main Program

vacuumaz


Is my postgres database bloated?

I personally believe that autovacuum isn't aggressive enough on any of our appliances out of the box. This causes bloat and slower response times.

Bloat can be seen using this script.

If the tbloat or ibloat ratio is greater than 1.2 (i.e. more than 20% bloat), this indicates that autovacuum isn't doing its job and you need to implement a nightly vacuum. See the Nightly Vacuum analyze section above.

SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC;
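
The easiest way to run it is to save the query to a file (bloat_check.sql is just an example name) and feed it to psql against whichever database you're checking:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -f bloat_check.sql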


Removing duplicate rows for all databases

Day to day we have to delete duplicate data from the database.

An upgrade requires a primary key where one didn't exist before, or a DB let its guard down and didn't enforce the PKs (yeah, looking at you postgres!).

Or if an upgrade can't create a new PK then this is useful. The SQL Server commands to find these duplicates are covered as well.
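
As a rough sketch of the postgres side (my_table and key_col are placeholder names - adjust them to your table and the columns that should be unique), the classic ctid trick deletes the extra copies while keeping one row per key:

-- Keep the row with the lowest ctid for each key_col value and delete the rest
DELETE FROM my_table a
USING my_table b
WHERE a.ctid > b.ctid
AND a.key_col = b.key_col;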
